Transform Split Values

The Transform Split Values tool extracts and manipulates values from existing sample fields in a VCF and outputs the results to a TSV file. The field to manipulate is given as the second positional argument (format_field), followed by one or more operations to apply to it.

Supported operations are the following:

  • ref: Extract the first value in an R-number field (the reference value).

  • alt: Extract the second value in an R-number field (the alt value).

  • sum: Calculate the sum of all the numbers in the field.

  • min: Calculate the minimum of all the numbers in the field.

  • max: Calculate the maximum of all the numbers in the field.

  • mean: Calculate the mean of all the numbers in the field.

  • median: Calculate the median of all the numbers in the field.

  • stdev: Calculate the standard deviation of all the numbers in the field.

  • ref_ratio: The first value in an R-number field divided by the sum of all the numbers (the reference ratio).

  • alt_ratio: The second value in an R-number field divided by the sum of all the numbers (the alt ratio).
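
As a rough illustration of what each operation computes, the following Python sketch applies the operations listed above to a list of numbers, such as the two values of an AD field at a decomposed site. The function name apply_operation is hypothetical and is not part of the tool. The sketch assumes the sample standard deviation for stdev, because the text does not say which variant the tool uses.

```python
import statistics


def apply_operation(values, operation):
    """Apply one of the documented operations to a list of numbers.

    Hypothetical helper for illustration; not the tool's own code.
    """
    if operation == "ref":
        return values[0]
    if operation == "alt":
        return values[1]
    if operation == "sum":
        return sum(values)
    if operation == "min":
        return min(values)
    if operation == "max":
        return max(values)
    if operation == "mean":
        return statistics.mean(values)
    if operation == "median":
        return statistics.median(values)
    if operation == "stdev":
        # Assumption: sample standard deviation (needs at least two values).
        return statistics.stdev(values)
    if operation == "ref_ratio":
        return values[0] / sum(values)
    if operation == "alt_ratio":
        return values[1] / sum(values)
    raise ValueError(f"unknown operation: {operation}")


# Example: a two-value R-number field (reference count, alt count).
counts = [12, 8]
for op in ("ref", "alt", "sum", "mean", "ref_ratio", "alt_ratio"):
    print(op, apply_operation(counts, op))
```

In this sketch a field whose values sum to zero raises ZeroDivisionError for the two ratio operations; how the tool itself reports that case is not described here.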

If your VCF is a multi-sample VCF, you must select one of its samples with the --sample-name option. Values are extracted for that sample only.
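
To show why a sample name is needed, the sketch below pulls one format field for one named sample out of a single VCF record, using the standard VCF column layout (nine fixed columns followed by one column per sample). The helper sample_field_values and the sample names NORMAL and TUMOR are made up for the example; this is not how the tool is implemented, only a minimal illustration of the lookup.

```python
def sample_field_values(header_line, record_line, sample_name, format_field):
    """Return the numeric values of one FORMAT field for one sample.

    Hypothetical helper; assumes tab-delimited VCF text lines.
    """
    header = header_line.rstrip("\n").split("\t")
    record = record_line.rstrip("\n").split("\t")
    # Columns 0-8 are fixed (CHROM .. FORMAT); sample columns start at index 9.
    sample_index = header.index(sample_name)
    format_keys = record[8].split(":")
    sample_values = record[sample_index].split(":")
    raw = sample_values[format_keys.index(format_field)]
    return [float(v) for v in raw.split(",")]


header = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tNORMAL\tTUMOR"
record = "chr1\t100\t.\tA\tT\t.\tPASS\t.\tGT:AD\t0/0:20,0\t0/1:12,8"
print(sample_field_values(header, record, "TUMOR", "AD"))  # [12.0, 8.0]
```

The sketch does not handle missing values (a "." entry) or a sample name that is absent from the header; both would raise ValueError here.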

By default, the output TSV is written to a .tsv file next to your input VCF file. You can set a different output file using the --output-tsv option.

Usage

usage: transform-split-values [-h] [-t INPUT_TSV] [-s SAMPLE_NAME]
                              [-o OUTPUT_TSV]
                              input_vcf format_field
                              {ref,alt,sum,min,max,mean,median,stdev,ref_ratio,alt_ratio}
                              [{ref,alt,sum,min,max,mean,median,stdev,ref_ratio,alt_ratio} ...]

positional arguments:
  input_vcf             The VCF file from which to extract information. Multi-
                        allelic sites must be decomposed.
  format_field          The multi-value format field to report.
  {ref,alt,sum,min,max,mean,median,stdev,ref_ratio,alt_ratio}
                        The operation to execute on the chosen field. ref:
                        Extract the first value in a R-number field (the
                        reference value). alt: Extract the second value in a
                        R-number field (the alt value). sum: Calculate the sum
                        of all the numbers in the field. min: Calculate the
                        minimum of all the numbers in the field. max:
                        Calculate the maximum of all the numbers in the field.
                        mean: Calculate the mean of all the numbers in the
                        field. median: Calculate the median of all the numbers
                        in the field. stdev: Calculate the standard deviation
                        of all the numbers in the field. ref_ratio: The first
                        value in a R-number field divided by the sum of all
                        the numbers (the reference ratio). alt_ratio: The
                        second value in a R-number field divided by the sum of
                        all the numbers (the alt ratio).

optional arguments:
  -h, --help            show this help message and exit
  -t INPUT_TSV, --input_tsv INPUT_TSV
                        A TSV report file to add information to. Required
                        columns are CHROM, POS, REF, ALT. These are used to
                        match each TSV entry to a VCF entry. Must be tab-
                        delimited.
  -s SAMPLE_NAME, --sample-name SAMPLE_NAME
                        If the input_vcf contains multiple samples, the name
                        of the sample to extract information for.
  -o OUTPUT_TSV, --output-tsv OUTPUT_TSV
                        Path to write the output report TSV file. If not
                        provided, the output TSV will be written next to the
                        input VCF with a .tsv file ending.
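
The --input_tsv option described above matches each TSV entry to a VCF entry on the CHROM, POS, REF and ALT columns. The Python sketch below illustrates that kind of matching by appending one column to a tab-delimited report. The helper add_column, the column name alt_ratio_example, the GENE column and the "NA" placeholder for unmatched rows are all assumptions made for the example, not documented behavior of the tool.

```python
import csv
import io


def add_column(tsv_text, values_by_variant, column_name):
    """Append one column to a tab-delimited report.

    values_by_variant maps (CHROM, POS, REF, ALT) string tuples to a value.
    Hypothetical helper; rows without a match get "NA" (an assumption).
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    fieldnames = list(reader.fieldnames) + [column_name]
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        # The four required columns together identify one variant.
        key = (row["CHROM"], row["POS"], row["REF"], row["ALT"])
        row[column_name] = values_by_variant.get(key, "NA")
        writer.writerow(row)
    return out.getvalue()


report = "CHROM\tPOS\tREF\tALT\tGENE\nchr1\t100\tA\tT\tABC\nchr2\t200\tG\tC\tXYZ\n"
values = {("chr1", "100", "A", "T"): 0.4}
print(add_column(report, values, "alt_ratio_example"))
```

Keying on all four columns is what makes the requirement for decomposed multi-allelic sites matter: each row then corresponds to exactly one REF/ALT pair.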