VCF Readcount Annotator

The VCF Readcount Annotator will take an output file from bam-readcount and add its data to your VCF. It supports both DNA and RNA readcounts.

DNA readcounts are identified by specifying DNA in the list of positional arguments. Depth, allele counts, and VAFs are then written to the DP, AD, and AF fields, respectively. Forward and reverse strand allele counts are written in the ADF and ADR fields, respectively.

RNA readcounts are identified by specifying RNA in the list of positional arguments. Depth, allele counts, and VAFs are then written to the RDP, RAD, and RAF fields, respectively. Forward and reverse strand allele counts are written in the RADF and RADR fields, respectively.
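
As a rough illustration of how the values above relate to one bam-readcount line, the following Python sketch derives depth, allele counts, strand counts, and a VAF for one alternate allele. It is not the annotator's implementation. It assumes the usual tab-separated bam-readcount layout (chromosome, position, reference base, depth, then one colon-separated entry per base with the count in the second field and the plus and minus strand counts in the sixth and seventh fields) and assumes the VAF is the alternate allele count divided by the depth. The names base_entry, parse_readcount_line, and summarize are hypothetical.

```python
# Illustrative sketch only; not the vcf-readcount-annotator's own code.

def base_entry(base, count, plus, minus):
    """Build one per-base entry; fields not used below are zero placeholders."""
    used = [base, str(count), "0", "0", "0", str(plus), str(minus)]
    return ":".join(used + ["0"] * 7)


def parse_readcount_line(line):
    """Split one bam-readcount line into reference base, depth, and per-base counts."""
    fields = line.rstrip("\n").split("\t")
    ref, depth = fields[2], int(fields[3])
    per_base = {}
    for entry in fields[4:]:
        parts = entry.split(":")
        # Assumed layout: parts[1] = count, parts[5] = plus strand, parts[6] = minus strand
        per_base[parts[0]] = {
            "count": int(parts[1]),
            "plus": int(parts[5]),
            "minus": int(parts[6]),
        }
    return ref, depth, per_base


def summarize(line, alt):
    """Return DNA-style field values for the reference and one alternate allele."""
    ref, depth, per_base = parse_readcount_line(line)
    empty = {"count": 0, "plus": 0, "minus": 0}
    ref_counts = per_base.get(ref, empty)
    alt_counts = per_base.get(alt, empty)
    # Assumption: VAF = alternate allele count / depth
    vaf = alt_counts["count"] / depth if depth else 0.0
    return {
        "DP": depth,
        "AD": (ref_counts["count"], alt_counts["count"]),
        "AF": round(vaf, 5),
        "ADF": (ref_counts["plus"], alt_counts["plus"]),
        "ADR": (ref_counts["minus"], alt_counts["minus"]),
    }


# Made-up example: depth 20, 15 reads supporting A (reference), 5 supporting G
example_line = "\t".join([
    "chr1", "12345", "A", "20",
    base_entry("=", 0, 0, 0),
    base_entry("A", 15, 8, 7),
    base_entry("C", 0, 0, 0),
    base_entry("G", 5, 3, 2),
    base_entry("T", 0, 0, 0),
    base_entry("N", 0, 0, 0),
])

result = summarize(example_line, "G")
print(result)
```

For RNA readcounts the same values would go to the R-prefixed fields (RDP, RAD, RAF, RADF, RADR) instead.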

If your VCF is a multi-sample VCF, you have to pick one of the samples in your VCF by setting the --sample-name option. This is the sample that the readcounts will be written for.

By default the output VCF will be written to a .readcount.vcf file next to your input VCF file. You can set a different output file using the --output-vcf parameter.

SNVs and indels are usually run through bam-readcount separately because indels must be run in insertion-centric mode (-i option). When using the vcf-readcount-annotator, the --variant-type option can then be used to annotate your VCF with those two files separately. For example, you could run the vcf-readcount-annotator once with the --variant-type snv option, using the SNV bam-readcount output file, and then annotate the output file from that step with indel information by using the --variant-type indel option and the indel bam-readcount output file. This is generally recommended because the all option, used with a concatenated bam-readcount output file (containing both SNVs and indels), cannot handle an SNV and an indel at the same position. That situation produces duplicated bam-readcount entries in the concatenated file, one from the SNV and one from the indel, which might contain conflicting information that the vcf-readcount-annotator cannot resolve.

Example commands for running the vcf-readcount-annotator with SNVs and indels separately:

vcf-readcount-annotator <input_vcf> <snv_bam_readcount_file> <DNA|RNA> \
-s <sample_name> -t snv -o <snv_annotated_vcf>

vcf-readcount-annotator <snv_annotated_vcf> <indel_bam_readcount_file> <DNA|RNA> \
-s <sample_name> -t indel -o <annotated_vcf>
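
If you are unsure whether a concatenated bam-readcount file is affected by the same-position conflict described above, a quick check is to look for positions that occur more than once. The Python sketch below is one hypothetical way to do that. It assumes only that the first two tab-separated columns of each line are the chromosome and the position; the function name duplicated_positions and the example lines are made up for illustration.

```python
# Illustrative sketch only; not part of the vcf-readcount-annotator.
from collections import Counter


def duplicated_positions(lines):
    """Return sorted (chromosome, position) pairs that occur on more than one line."""
    counts = Counter()
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        chrom, pos = line.rstrip("\n").split("\t")[:2]
        counts[(chrom, int(pos))] += 1
    return sorted(key for key, n in counts.items() if n > 1)


# Made-up lines, truncated to the first four columns for brevity
example_lines = [
    "chr1\t100\tA\t20",
    "chr1\t100\tA\t22",  # same position again, e.g. from the indel run
    "chr1\t250\tC\t31",
    "chr2\t100\tG\t18",
]

duplicates = duplicated_positions(example_lines)
print(duplicates)
```

With a real file, the lines could come from iterating over an open file handle. An empty result means no position is duplicated; any position reported is a candidate for the conflict, which is a reason to prefer the two-step snv and indel runs.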

Usage

usage: vcf-readcount-annotator [-h] [-s SAMPLE_NAME] [-o OUTPUT_VCF]
                               [-t {snv,indel,all}]
                               input_vcf bam_readcount_file {DNA,RNA}

A tool that will add the data from bam-readcount files to the VCF sample
column.

positional arguments:
  input_vcf             A VCF file
  bam_readcount_file    A bam-readcount output file
  {DNA,RNA}             The type of data in the bam_readcount_file. If `DNA`
                        is chosen, the readcounts will be written to the AD,
                        AF, and DP fields. If `RNA` is chosen, the readcounts
                        will be written to the RAD, RAF, and RDP fields.

optional arguments:
  -h, --help            show this help message and exit
  -s SAMPLE_NAME, --sample-name SAMPLE_NAME
                        If the input_vcf contains multiple samples, the name
                        of the sample to annotate.
  -o OUTPUT_VCF, --output-vcf OUTPUT_VCF
                        Path to write the output VCF file. If not provided,
                        the output VCF file will be written next to the input
                        VCF file with a .readcount.vcf file ending.
  -t {snv,indel,all}, --variant-type {snv,indel,all}
                        The type of variant to process. `snv` will only
                        annotate SNVs. `indel` will only annotate InDels.
                        `all` will annotate all variant types. `snv` and
                        `indel` mode currently do not support multi-allelic
                        VCF entries that contain both SNVs and InDels. It is
                        recommended to split multi-allelic sites before
                        running in `snv` or `indel` mode.