mavis/convert/vcf
PANDAS_DEFAULT_NA_VALUES
PANDAS_DEFAULT_NA_VALUES = [
'-1.#IND',
'1.#QNAN',
'1.#IND',
'-1.#QNAN',
'#N/A',
'N/A',
'NA',
'#NA',
'NULL',
'NaN',
'-NaN',
'nan',
'-nan',
]
class VcfInfoType
inherits TypedDict
Attributes
- SVTYPE (str)
- CHR2 (str)
- CIPOS (Tuple[int, int])
- CIEND (Tuple[int, int])
- CT (str)
- END (Optional[int])
- PRECISE (bool)
class VcfRecordType
Attributes
- id (str)
- pos (int)
- chrom (str)
- alts (List[Optional[str]])
- info (VcfInfoType)
- ref (str)
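To make the shape of these two types concrete, here is a minimal sketch that mirrors the attribute lists above as `TypedDict` classes. The class names `ExampleVcfInfo` and `ExampleVcfRecord`, the use of `total=False` for the info fields, and all values in the sample record are illustrative assumptions, not the module's actual definitions.

```python
from typing import List, Optional, Tuple, TypedDict


class ExampleVcfInfo(TypedDict, total=False):
    # total=False is an assumption: INFO fields are often absent in real VCFs
    SVTYPE: str
    CHR2: str
    CIPOS: Tuple[int, int]
    CIEND: Tuple[int, int]
    CT: str
    END: Optional[int]
    PRECISE: bool


class ExampleVcfRecord(TypedDict):
    id: str
    pos: int
    chrom: str
    alts: List[Optional[str]]
    info: ExampleVcfInfo
    ref: str


# A hypothetical deletion record, written out by hand for illustration
record: ExampleVcfRecord = {
    'id': 'example1',
    'pos': 1000,
    'chrom': '1',
    'alts': ['<DEL>'],
    'info': {'SVTYPE': 'DEL', 'END': 2000, 'CT': '3to5'},
    'ref': 'N',
}
```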
parse_bnd_alt()
Parses the ALT statement from VCF files using the VCF 4.2 specification.
Assumes that the reference base is always the outermost base (this is based on the spec and also on Manta results, as the spec was missing some cases).
- r = reference base/seq
- u = untemplated sequence/alternate sequence
- p = chromosome:position
alt format | orients |
---|---|
ru[p[ | LR |
[p[ur | RR |
]p]ur | RL |
ru]p] | LL |
def parse_bnd_alt(alt: str) -> Tuple[str, int, str, str, str, str]:
Args
- alt (str)
Returns
Tuple[str, int, str, str, str, str]
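The four ALT formats in the table can be matched with one regular expression each. The sketch below is not the module's implementation. The name `sketch_parse_bnd_alt`, the `'L'`/`'R'` orientation strings, and the order of the returned tuple (chromosome, position, first orientation, second orientation, reference base, untemplated sequence) are assumptions chosen to fit the documented return type.

```python
import re
from typing import Tuple

# p = chromosome:position
_LOC = r'(?P<chrom>[^:\[\]]+):(?P<pos>\d+)'

# One pattern per table row; the reference base is the outermost base
_BND_PATTERNS = [
    (re.compile(r'^(?P<ref>\w)(?P<useq>\w*)\[' + _LOC + r'\[$'), 'L', 'R'),  # ru[p[
    (re.compile(r'^\[' + _LOC + r'\[(?P<useq>\w*)(?P<ref>\w)$'), 'R', 'R'),  # [p[ur
    (re.compile(r'^\]' + _LOC + r'\](?P<useq>\w*)(?P<ref>\w)$'), 'R', 'L'),  # ]p]ur
    (re.compile(r'^(?P<ref>\w)(?P<useq>\w*)\]' + _LOC + r'\]$'), 'L', 'L'),  # ru]p]
]


def sketch_parse_bnd_alt(alt: str) -> Tuple[str, int, str, str, str, str]:
    """Return (chrom, pos, orient1, orient2, ref, untemplated_seq); assumed order."""
    for pattern, orient1, orient2 in _BND_PATTERNS:
        match = pattern.match(alt)
        if match:
            return (
                match.group('chrom'),
                int(match.group('pos')),
                orient1,
                orient2,
                match.group('ref'),
                match.group('useq'),
            )
    raise ValueError(f'unsupported BND alt: {alt!r}')
```

For example, `sketch_parse_bnd_alt('G]17:198982]')` matches the `ru]p]` row and yields the LL orientation pair with an empty untemplated sequence.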
convert_record()
converts a vcf record
def convert_record(record: VcfRecordType) -> List[Dict]:
Args
- record (VcfRecordType)
Returns
List[Dict]
Note
CT = connection type. If given, this field will be used in determining the orientation at the breakpoints. From https://groups.google.com/forum/#!topic/delly-users/6Mq2juBraRY, we can expect certain CT types for certain event types:
- translocation/inverted translocation: 3to3, 3to5, 5to3, 5to5
- inversion: 3to3, 5to5
- deletion: 3to5
- duplication: 5to3
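The expected CT values in the note can be written down as a lookup table. The sketch below also shows one way a CT string could be turned into an orientation pair. The mapping of `3` to left (`'L'`) and `5` to right (`'R'`) is an assumption, chosen so that a deletion (3to5) comes out as LR, and the names `EXPECTED_CT` and `ct_to_orients` are hypothetical.

```python
from typing import Dict, Set, Tuple

# Expected connection types per event type, taken from the note above
EXPECTED_CT: Dict[str, Set[str]] = {
    'translocation': {'3to3', '3to5', '5to3', '5to5'},
    'inverted translocation': {'3to3', '3to5', '5to3', '5to5'},
    'inversion': {'3to3', '5to5'},
    'deletion': {'3to5'},
    'duplication': {'5to3'},
}

# Assumption: a 3' end maps to the left orientation, a 5' end to the right
_END_TO_ORIENT = {'3': 'L', '5': 'R'}


def ct_to_orients(ct: str) -> Tuple[str, str]:
    """Split a CT value such as '3to5' into an orientation pair."""
    first, second = ct.split('to')
    return _END_TO_ORIENT[first], _END_TO_ORIENT[second]
```

Under this assumption, `ct_to_orients('3to5')` gives the LR pair and `ct_to_orients('5to3')` gives RL.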
pandas_vcf()
Read a standard vcf file into a pandas dataframe
def pandas_vcf(input_file: str) -> Tuple[List[str], pd.DataFrame]:
Args
- input_file (str)
Returns
Tuple[List[str], pd.DataFrame]
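The return type suggests that the `##` meta-information lines are returned separately from the tabular records. The sketch below shows one way to do this with `pandas.read_csv`. It is not the module's implementation: the name `sketch_pandas_vcf`, the shortened `NA_VALUES` list, the chosen `dtype` mapping, and the renaming of `#CHROM` to `CHROM` are all assumptions.

```python
from typing import List, Tuple

import pandas as pd

# Shortened stand-in for PANDAS_DEFAULT_NA_VALUES (assumption, for brevity)
NA_VALUES = ['#N/A', 'N/A', 'NA', 'NULL', 'NaN', 'nan']


def sketch_pandas_vcf(input_file: str) -> Tuple[List[str], pd.DataFrame]:
    # Collect the leading '##' meta-information lines
    header_lines: List[str] = []
    with open(input_file) as fh:
        for line in fh:
            if not line.startswith('##'):
                break
            header_lines.append(line.rstrip('\n'))

    # The first line after the meta lines ('#CHROM ...') names the columns
    df = pd.read_csv(
        input_file,
        sep='\t',
        skiprows=len(header_lines),
        dtype={'#CHROM': str, 'POS': int},  # keep chromosome names as strings
        na_values=NA_VALUES,
        keep_default_na=False,  # only the explicit list is treated as missing
    )
    return header_lines, df.rename(columns={'#CHROM': 'CHROM'})
```

Passing an explicit `na_values` list together with `keep_default_na=False` keeps pandas from treating other strings, such as empty fields, as missing.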
convert_file()
process a VCF file
def convert_file(input_file: str) -> List[Dict]:
Args
- input_file (str): the input file name
Returns
List[Dict]
Raises
err
: [description]