
mavis/convert/vcf

PANDAS_DEFAULT_NA_VALUES

PANDAS_DEFAULT_NA_VALUES = [
    '-1.#IND',
    '1.#QNAN',
    '1.#IND',
    '-1.#QNAN',
    '#N/A',
    'N/A',
    'NA',
    '#NA',
    'NULL',
    'NaN',
    '-NaN',
    'nan',
    '-nan',
]

class VcfInfoType

inherits TypedDict

Attributes

  • SVTYPE (str)
  • CHR2 (str)
  • CIPOS (Tuple[int, int])
  • CIEND (Tuple[int, int])
  • CILEN (Tuple[int, int])
  • CT (str)
  • END (Optional[int])
  • PRECISE (bool)

class VcfRecordType

Attributes

  • id (str)
  • pos (int)
  • chrom (str)
  • alts (List[Optional[str]])
  • info (VcfInfoType)
  • ref (str)
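The two types above can be sketched in Python to show the shape of a record. This is a reconstruction from the attribute lists only. `total=False` on `VcfInfoType` is an assumption (not every record carries every INFO key), and `ExampleVcfRecord` is a hypothetical stand-in: any object exposing these attributes fits the same shape. The sample values reuse the `bnd_V` breakend from the VCF 4.2 specification, with illustrative INFO values.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, TypedDict


class VcfInfoType(TypedDict, total=False):
    # Assumption: total=False, because INFO keys are optional per record.
    SVTYPE: str
    CHR2: str
    CIPOS: Tuple[int, int]
    CIEND: Tuple[int, int]
    CILEN: Tuple[int, int]
    CT: str
    END: Optional[int]
    PRECISE: bool


@dataclass
class ExampleVcfRecord:
    """Hypothetical container exposing the attributes listed for VcfRecordType."""

    id: str
    pos: int
    chrom: str
    alts: List[Optional[str]]
    info: VcfInfoType
    ref: str


# Breakend bnd_V from the VCF 4.2 specification; INFO values are illustrative.
record = ExampleVcfRecord(
    id='bnd_V',
    pos=321682,
    chrom='2',
    alts=[']13:123456]AGTNNNNNCAT'],
    info={'SVTYPE': 'BND', 'CIPOS': (-10, 10), 'PRECISE': False},
    ref='T',
)
```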

parse_bnd_alt()

Parses the ALT field of a VCF breakend (BND) record using the VCF 4.2 specification.

Assumes that the reference base is always the outermost base. This follows the specification and also Manta output, since the specification does not cover some cases.

  • r = reference base/sequence
  • u = untemplated sequence/alternate sequence
  • p = chromosome:position

alt format    orients
ru[p[         LR
[p[ur         RR
]p]ur         RL
ru]p]         LL
def parse_bnd_alt(alt: str) -> Tuple[str, int, str, str, str, str]:

Args

  • alt (str)

Returns

  • Tuple[str, int, str, str, str, str]
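The orientation table above can be implemented with one regular expression per ALT form. This is a minimal sketch, not the module's code: the name `parse_bnd_alt_sketch`, the element order of the returned tuple (chromosome, position, orientation at break1, orientation at break2, reference base, untemplated sequence), and raising `ValueError` on unsupported input are all assumptions. As in the description, the single outermost base is taken as the reference base and any remaining bases as untemplated sequence.

```python
import re
from typing import Tuple

# Chromosome may contain ':' (greedy match backtracks to the last colon before the digits).
_CHR_POS = r'(?P<chr>[^\[\]]+):(?P<pos>\d+)'

# (pattern, orientation at break1, orientation at break2), one entry per table row.
_BND_FORMS = [
    (re.compile(r'^(?P<ref>[A-Za-z])(?P<useq>[A-Za-z]*)\[' + _CHR_POS + r'\[$'), 'L', 'R'),  # ru[p[
    (re.compile(r'^\[' + _CHR_POS + r'\[(?P<useq>[A-Za-z]*)(?P<ref>[A-Za-z])$'), 'R', 'R'),  # [p[ur
    (re.compile(r'^\]' + _CHR_POS + r'\](?P<useq>[A-Za-z]*)(?P<ref>[A-Za-z])$'), 'R', 'L'),  # ]p]ur
    (re.compile(r'^(?P<ref>[A-Za-z])(?P<useq>[A-Za-z]*)\]' + _CHR_POS + r'\]$'), 'L', 'L'),  # ru]p]
]


def parse_bnd_alt_sketch(alt: str) -> Tuple[str, int, str, str, str, str]:
    """Return (chrom, pos, orient1, orient2, ref, untemplated_seq) for a BND ALT string."""
    for pattern, orient1, orient2 in _BND_FORMS:
        match = pattern.match(alt)
        if match:
            return (
                match.group('chr'),
                int(match.group('pos')),
                orient1,
                orient2,
                match.group('ref'),
                match.group('useq'),
            )
    raise ValueError(f'unsupported breakend ALT: {alt!r}')
```

For example, `parse_bnd_alt_sketch(']13:123456]AGTNNNNNCAT')` splits the trailing bases into the outermost reference base `T` and the untemplated sequence `AGTNNNNNCA`.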

convert_imprecise_breakend()

Handles IMPRECISE calls, which express positional uncertainty through the CIPOS/CIEND/CILEN fields.

  • bp1_s = breakpoint1 start
  • bp1_e = breakpoint1 end
  • bp2_s = breakpoint2 start
  • bp2_e = breakpoint2 end

Insertion and deletion edge case, in which bp1_e > bp2_s. E.g., bp1_s = 1890, bp1_e = 2000, bp2_s = 1900, bp2_e = 1900.

break1 ------------------------=======================--------------
break2 ------------------------==========---------------------------

Insertion edge case, in which bp1_s > bp1_e. E.g., bp1_s = 1890, bp1_e = 1800, bp2_s = 1800, bp2_e = 1800.

break1 ------------------------==-----------------------------------
break2 ------------------------=------------------------------------

Insertion edge case, in which bp1_s > bp2_s. E.g., bp1_s = 1950, bp1_e = 2000, bp2_s = 1900, bp2_e = 3000.

break1 ------------------------==-----------------------------------
break2 -----------------------========------------------------------

def convert_imprecise_breakend(std_row: Dict, record: List[VcfRecordType], bp_end: int):

Args

  • std_row (Dict)
  • record (List[VcfRecordType])
  • bp_end (int)
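A minimal sketch of how the breakpoint windows for an imprecise call arise and which of the edge-case conditions they hit. It assumes the standard VCF meaning of CIPOS/CIEND (offsets around POS and END) and ignores CILEN. The function names are hypothetical, and the sketch only flags the conditions; it does not reproduce how `convert_imprecise_breakend` resolves them. The second check is written as `bp1_s > bp1_e` to agree with the worked example values (bp1_s = 1890, bp1_e = 1800). Note that the conditions can overlap: the third worked example also satisfies `bp1_e > bp2_s`.

```python
from typing import List, Tuple


def ci_windows(
    pos: int,
    end: int,
    cipos: Tuple[int, int] = (0, 0),
    ciend: Tuple[int, int] = (0, 0),
) -> Tuple[int, int, int, int]:
    """Return (bp1_s, bp1_e, bp2_s, bp2_e) by offsetting POS with CIPOS and END with CIEND."""
    return pos + cipos[0], pos + cipos[1], end + ciend[0], end + ciend[1]


def imprecise_edge_cases(windows: Tuple[int, int, int, int]) -> List[str]:
    """Name every edge-case condition from the description that the windows satisfy."""
    bp1_s, bp1_e, bp2_s, _bp2_e = windows
    cases = []
    if bp1_e > bp2_s:
        cases.append('bp1_e > bp2_s')  # break1 window runs past the start of break2
    if bp1_s > bp1_e:
        cases.append('bp1_s > bp1_e')  # break1 window is inverted
    if bp1_s > bp2_s:
        cases.append('bp1_s > bp2_s')  # break1 starts after break2 starts
    return cases


# An insertion-like call (END == POS) with a wide CIPOS reproduces the first worked example.
first_example = ci_windows(pos=1900, end=1900, cipos=(-10, 100))
```

Here `first_example` is `(1890, 2000, 1900, 1900)`, and `imprecise_edge_cases(first_example)` reports only `'bp1_e > bp2_s'`.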

convert_record()

Converts a VCF record.

def convert_record(record: VcfRecordType) -> List[Dict]:

Args

  • record (VcfRecordType)

Returns

  • List[Dict]

Note

CT = connection type. If given, this field is used to determine the orientation at the breakpoints. From https://groups.google.com/forum/#!topic/delly-users/6Mq2juBraRY, we can expect certain CT types for certain event types:

  • translocation/inverted translocation: 3to3, 3to5, 5to3, 5to5
  • inversion: 3to3, 5to5
  • deletion: 3to5
  • duplication: 5to3
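The CT note above can be expressed as a lookup. In this minimal sketch, `EXPECTED_CT` transcribes the event types listed, while `ct_to_orientations` is a hypothetical helper, not part of the module. Translating a 3' end to `L` and a 5' end to `R` (the same codes as the `parse_bnd_alt` table) is an assumption; it yields LR for a deletion (3to5) and RL for a duplication (5to3).

```python
from typing import Dict, FrozenSet, Tuple

_ALL_CT = frozenset({'3to3', '3to5', '5to3', '5to5'})

# CT values expected for each event type, transcribed from the note above.
EXPECTED_CT: Dict[str, FrozenSet[str]] = {
    'translocation': _ALL_CT,
    'inverted translocation': _ALL_CT,
    'inversion': frozenset({'3to3', '5to5'}),
    'deletion': frozenset({'3to5'}),
    'duplication': frozenset({'5to3'}),
}

# Assumption: a 3' end keeps the sequence to the left (L), a 5' end to the right (R).
_END_TO_ORIENT = {'3': 'L', '5': 'R'}


def ct_to_orientations(ct: str) -> Tuple[str, str]:
    """Translate a CT value such as '3to5' into (orientation at break1, orientation at break2)."""
    try:
        end1, end2 = ct.split('to')
        return _END_TO_ORIENT[end1], _END_TO_ORIENT[end2]
    except (ValueError, KeyError):
        raise ValueError(f'unrecognized CT value: {ct!r}') from None
```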

pandas_vcf()

Read a standard VCF file into a pandas DataFrame.

def pandas_vcf(input_file: str) -> Tuple[List[str], pd.DataFrame]:

Args

  • input_file (str)

Returns

  • Tuple[List[str], pd.DataFrame]
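A minimal sketch of how a function with this signature could read a VCF with pandas. The `##` meta-information lines are collected as the header, and the `#CHROM` line becomes the column header. The name `pandas_vcf_sketch`, reading every column as `str` before converting `POS`, and treating `.` as missing alongside a subset of `PANDAS_DEFAULT_NA_VALUES` are assumptions, not the module's confirmed behavior. `keep_default_na=False` makes pandas use only the listed tokens as missing values.

```python
from typing import List, Tuple

import pandas as pd

# Subset of PANDAS_DEFAULT_NA_VALUES, repeated so the sketch is self-contained.
NA_VALUES = ['#N/A', 'N/A', 'NA', '#NA', 'NULL', 'NaN', '-NaN', 'nan', '-nan']


def pandas_vcf_sketch(input_file: str) -> Tuple[List[str], pd.DataFrame]:
    """Return the '##' header lines and the variant rows of a VCF file."""
    header_lines: List[str] = []
    with open(input_file, 'r') as fh:
        for line in fh:
            if not line.startswith('##'):
                break  # reached the '#CHROM ...' column header
            header_lines.append(line.rstrip('\n'))
    df = pd.read_csv(
        input_file,
        sep='\t',
        skiprows=len(header_lines),  # the '#CHROM' line is then read as the header
        dtype=str,  # keep chromosome names such as '1' as strings
        na_values=NA_VALUES + ['.'],  # '.' marks a missing value in VCF
        keep_default_na=False,
    )
    # '#CHROM' -> 'CHROM'
    df = df.rename(columns={df.columns[0]: df.columns[0].lstrip('#')})
    df['POS'] = df['POS'].astype(int)
    return header_lines, df
```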

convert_file()

Process a VCF file.

def convert_file(input_file: str) -> List[Dict]:

Args

  • input_file (str): the input file name

Returns

  • List[Dict]

Raises

  • err: [description]