Reference Transcript Mismatch Reporter
This tool can be used to identify variants where the reference genome build
doesn’t match the Ensembl reference transcript used by VEP for variant consequence
annotations. In these cases, the REF nucleotide(s) at a variant position will differ
from the Ensembl transcript nucleotide(s) at the corresponding mutation
position. The amino acid change reported in the Amino_acids field of the VEP CSQ
annotation will therefore differ from the amino acids of the translated Ensembl
transcript at that position.
This will lead to errors in some downstream tools, e.g. pVACseq, which rely on
both the Amino_acids field and the translated Ensembl transcript peptide
sequence (as reported by the Wildtype plugin) to predict the impact of the
mutation on the transcript peptide sequence. Since those two fields disagree in
the cases described above, pVACseq cannot make predictions for such variants.
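To make the disagreement concrete, here is a minimal Python sketch of a per-transcript check. It is not the tool's implementation. It assumes that the Wildtype plugin stores the full wildtype peptide in a CSQ field named WildtypeProtein, and that Protein_position is 1-based as in VEP output. The function name is_mismatch is hypothetical.

```python
def is_mismatch(amino_acids: str, protein_position: str, wildtype_protein: str) -> bool:
    """Return True if VEP's reference amino acid(s) disagree with the wildtype
    peptide at the annotated protein position.

    wildtype_protein is assumed to be the full peptide taken from the Wildtype
    plugin's (assumed) WildtypeProtein CSQ field.
    """
    if not (amino_acids and protein_position and wildtype_protein):
        return False  # nothing to compare (e.g. a non-coding consequence)
    ref_aa = amino_acids.split("/")[0]  # "R/W" -> "R"; synonymous "R" -> "R"
    if ref_aa == "-" or "*" in ref_aa:
        # Insertions have no reference residues, and stop codons may not be part
        # of the wildtype sequence; this sketch skips both cases.
        return False
    position = protein_position.split("/")[0]  # drop "/length" (VEP --total_length)
    start = position.split("-")[0]  # "3-4" -> "3"
    if not start.isdigit():
        return False  # unknown positions such as "?" cannot be checked
    offset = int(start) - 1  # Protein_position is 1-based
    observed = wildtype_protein[offset:offset + len(ref_aa)]
    return observed != ref_aa
```

For a wildtype peptide MKTAYIAKQR, an annotation of A/V at position 4 agrees with the sequence. K/V at position 4 does not, which is the situation this tool reports.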
Such errors might affect only a small number of variants if there are minor differences between the reference used and the Ensembl transcripts, but they might also be more widespread, for example, if users aligned to GRCh37 but used a GRCh38 VEP cache.
The input VCF needs to be annotated by VEP, including annotation with the
Wildtype VEP plugin available as part of pVACtools.
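Before running the tool, you may want to confirm that a VCF carries the required annotations. VEP declares the CSQ sub-field names in the CSQ INFO header line, after "Format: ". The sketch below reads those names and checks for Amino_acids and WildtypeProtein. The WildtypeProtein field name is an assumption about the Wildtype plugin's output, and csq_fields, read_csq_fields and has_wildtype_annotation are hypothetical helper names.

```python
import gzip
import re
from typing import Iterable, List


def csq_fields(header_lines: Iterable[str]) -> List[str]:
    """Return the CSQ sub-field names declared in VCF header lines."""
    for line in header_lines:
        if not line.startswith("##"):
            break  # reached the #CHROM line or data: no CSQ definition found
        if line.startswith("##INFO=<ID=CSQ,"):
            # VEP lists the sub-fields after "Format: " in the Description.
            match = re.search(r'Format: ([^"]+)"', line)
            if match:
                return match.group(1).split("|")
    return []


def read_csq_fields(vcf_path: str) -> List[str]:
    """Read CSQ sub-field names from a plain or gzip-compressed VCF."""
    opener = gzip.open if vcf_path.endswith(".gz") else open
    with opener(vcf_path, "rt") as handle:
        return csq_fields(handle)


def has_wildtype_annotation(fields: List[str]) -> bool:
    # WildtypeProtein is assumed to be the field added by the Wildtype plugin.
    return "Amino_acids" in fields and "WildtypeProtein" in fields
```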
This tool will report the number of variants and transcripts in a VCF that
are affected by this issue and print this information to stdout. It will also
write a .mismatch.tsv file next to the VCF that provides further details on the
problematic variants.
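The summary counts variants and transcripts separately because a single variant can be annotated against several transcripts, and only some of those may be mismatched. Here is a minimal sketch of such a tally. It rests on two hypothetical assumptions, and the tool's exact counting rules may differ:

- a variant counts as affected when at least one of its transcript annotations is mismatched;
- transcripts are counted as variant-transcript pairs.

TranscriptCheck and summarize are illustrative names.

```python
from typing import Iterable, NamedTuple, Tuple


class TranscriptCheck(NamedTuple):
    variant: str      # hypothetical variant key, e.g. "chr1:12345:A>G"
    transcript: str   # Ensembl transcript ID from the CSQ Feature field
    mismatch: bool    # outcome of the Amino_acids vs. wildtype comparison


def summarize(checks: Iterable[TranscriptCheck]) -> Tuple[int, int]:
    """Return (affected variants, affected variant-transcript pairs)."""
    variants = set()
    pairs = set()
    for check in checks:
        if check.mismatch:
            variants.add(check.variant)
            pairs.add((check.variant, check.transcript))
    return len(variants), len(pairs)
```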
This tool also allows the user to either soft-filter or hard-filter the VCF
using the --filter [soft|hard] parameter. Soft-filtering will tag the
problematic variants with a custom VCF FILTER, CSQ_MISMATCH, while hard-filtering
will produce a new VCF that has these variants removed. When using a filter,
the output VCF will be written to a .filtered.vcf file next to your input VCF
file. You can set a different output file using the --output-vcf parameter.
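The two filter modes can be sketched on a single VCF data line split into its tab-separated columns. This is an illustration, not the tool's code. apply_filter and default_output_path are hypothetical names. The sketch assumes standard VCF FILTER semantics: PASS or . is replaced, while other failed filters are kept and joined with ;. The default-path handling only covers uncompressed .vcf inputs.

```python
from pathlib import Path
from typing import List, Optional

FILTER_ID = "CSQ_MISMATCH"  # FILTER value named in this documentation


def default_output_path(input_vcf: str) -> str:
    """Hypothetical default: 'dir/input.vcf' -> 'dir/input.filtered.vcf'."""
    path = Path(input_vcf)
    return str(path.with_name(path.stem + ".filtered.vcf"))


def apply_filter(columns: List[str], mismatch: bool, mode: str) -> Optional[List[str]]:
    """Return the VCF columns to write, or None if the record is dropped."""
    if mode not in ("soft", "hard"):
        raise ValueError(f"unknown filter mode: {mode!r}")
    if not mismatch:
        return columns  # unaffected records pass through unchanged
    if mode == "hard":
        return None  # hard filter: remove the variant from the output VCF
    updated = list(columns)  # copy so the caller's list is not modified
    current = updated[6]  # FILTER is the 7th column of a VCF data line
    if current in (".", "PASS"):
        updated[6] = FILTER_ID
    else:
        updated[6] = current + ";" + FILTER_ID  # keep previously failed filters
    return updated
```

A soft-filtered file also needs a matching ##FILTER header line declaring CSQ_MISMATCH, so that downstream VCF parsers accept the new filter value.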
Usage
usage: ref-transcript-mismatch-reporter [-h] [-f {soft,hard}] [-o OUTPUT_VCF]
                                        input_vcf

A tool to identify variants in a VCF where the reference genome used to align
and call variants doesn't match the Ensembl reference transcript used by VEP
for variant consequence annotations.

positional arguments:
  input_vcf             A VEP-annotated VCF file with Wildtype plugin
                        annotation

optional arguments:
  -h, --help            show this help message and exit
  -f {soft,hard}, --filter {soft,hard}
                        soft: Write a soft-filtered VCF file which identifies
                        variants with mismatched VEP annotations with the
                        CSQ_MISMATCH filter. hard: Write a hard-filtered VCF
                        file that removes variants with mismatched VEP
                        annotations.
  -o OUTPUT_VCF, --output-vcf OUTPUT_VCF
                        Path to write the output VCF file to if a --filter is
                        chosen. If not provided, the output VCF file will be
                        written next to the input VCF file with a
                        .filtered.vcf file ending.
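The interface above maps directly onto Python's argparse. The following sketch rebuilds an equivalent parser from the help text. It is not taken from the pVACtools source, and build_parser is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical argparse parser mirroring the documented help text."""
    parser = argparse.ArgumentParser(
        prog="ref-transcript-mismatch-reporter",
        description=(
            "A tool to identify variants in a VCF where the reference genome used "
            "to align and call variants doesn't match the Ensembl reference "
            "transcript used by VEP for variant consequence annotations."
        ),
    )
    parser.add_argument(
        "input_vcf",
        help="A VEP-annotated VCF file with Wildtype plugin annotation",
    )
    parser.add_argument(
        "-f", "--filter",
        choices=["soft", "hard"],  # argparse rejects any other value
        help="soft: Write a soft-filtered VCF file which identifies variants with "
             "mismatched VEP annotations with the CSQ_MISMATCH filter. hard: Write "
             "a hard-filtered VCF file that removes variants with mismatched VEP "
             "annotations.",
    )
    parser.add_argument(
        "-o", "--output-vcf",  # stored as args.output_vcf
        help="Path to write the output VCF file to if a --filter is chosen. If not "
             "provided, the output VCF file will be written next to the input VCF "
             "file with a .filtered.vcf file ending.",
    )
    return parser
```

Because --filter declares choices, argparse rejects any value other than soft or hard with a usage error before any VCF is read.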
Example Command
ref-transcript-mismatch-reporter input.vcf --filter soft