mavis/annotate/variant

class Annotation

A fusion of two transcripts created by the associated breakpoint_pair. Also holds the other annotations for overlapping, encompassed, and nearest genes.

Attributes

encompassed_genes (Set[Gene])
genes_proximal_to_break1 (Set[Gene])
genes_proximal_to_break2 (Set[Gene])
genes_overlapping_break1 (Set[Gene])
genes_overlapping_break2 (Set[Gene])
proximity (int)
fusion (Optional[FusionTranscript])
transcript1 (Optional[Transcript])
transcript2 (Optional[Transcript])

Annotation.__init__()

Holds a breakpoint call and a set of transcripts; other information is gathered relative to these.

def __init__(
    self,
    bpp: BreakpointPair,
    transcript1: Optional[Transcript] = None,
    transcript2: Optional[Transcript] = None,
    proximity: int = 5000,
    **kwargs,
):

Args

bpp (BreakpointPair): the breakpoint pair call. It is adjusted based on the transcripts and then stored
transcript1 (Optional[Transcript]): transcript at the first breakpoint
transcript2 (Optional[Transcript]): transcript at the second breakpoint
proximity (int)

Annotation.add_gene()

Adds an input_gene to the current set of annotations, checking which set it should be added to.

def add_gene(self, input_gene: Gene):

Args

input_gene (Gene): the input_gene being added
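
The set-selection step can be illustrated with a minimal sketch. It is not the MAVIS implementation: it assumes both breakpoints lie on one chromosome, represents genes and breakpoints as plain inclusive (start, end) tuples, and uses the hypothetical helpers overlaps, gap, and classify_gene. The returned labels simply reuse the attribute names listed for Annotation.

```python
from typing import Set, Tuple

# Assumption: inclusive (start, end) coordinates on a single chromosome.
Interval = Tuple[int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two inclusive intervals share at least one position."""
    return a[0] <= b[1] and b[0] <= a[1]


def gap(a: Interval, b: Interval) -> int:
    """Distance between two intervals (0 if they overlap)."""
    if overlaps(a, b):
        return 0
    return b[0] - a[1] if a[1] < b[0] else a[0] - b[1]


def classify_gene(
    gene: Interval, break1: Interval, break2: Interval, proximity: int = 5000
) -> Set[str]:
    """Hypothetical helper: name the annotation set(s) a gene would join."""
    labels: Set[str] = set()
    if overlaps(gene, break1):
        labels.add('genes_overlapping_break1')
    if overlaps(gene, break2):
        labels.add('genes_overlapping_break2')
    if labels:
        return labels
    # Entirely between the two breakpoints: encompassed by the event.
    if break1[1] < gene[0] and gene[1] < break2[0]:
        return {'encompassed_genes'}
    # Otherwise keep it only if it is within `proximity` of a breakpoint.
    if gap(gene, break1) <= proximity:
        labels.add('genes_proximal_to_break1')
    if gap(gene, break2) <= proximity:
        labels.add('genes_proximal_to_break2')
    return labels


b1, b2 = (1000, 1000), (20000, 20000)
print(classify_gene((900, 1100), b1, b2))
print(classify_gene((5000, 6000), b1, b2))
print(classify_gene((100, 500), b1, b2))
```

A gene that matches none of the conditions yields an empty set in this sketch, meaning it would not be recorded at all.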

Annotation.flatten()

Generates a dictionary of the annotation information as strings.

def flatten(self) -> Dict:

Returns

Dict: dictionary of attribute names and values
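
As a rough illustration of what "as strings" can mean for set-valued attributes, the sketch below converts a plain dictionary into string values. The helper name flatten_row, the ';' separator, and the sorted ordering are assumptions made for the example, not the documented behaviour of Annotation.flatten().

```python
from typing import Any, Dict


def flatten_row(attrs: Dict[str, Any]) -> Dict[str, str]:
    """Hypothetical helper: render every value as a string for tabular output."""
    row: Dict[str, str] = {}
    for key, value in attrs.items():
        if isinstance(value, (set, frozenset, list, tuple)):
            # Assumption: collections are joined with ';' in sorted order
            # so that the output is deterministic.
            row[key] = ';'.join(sorted(str(v) for v in value))
        else:
            row[key] = str(value)
    return row


print(flatten_row({'encompassed_genes': {'GENE_B', 'GENE_A'}, 'proximity': 5000}))
```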

class IndelCall

Attributes

nterm_aligned (int)
cterm_aligned (int)
ref_seq (str)
mut_seq (str)
ins_seq (str)
del_seq (str)
is_dup (bool)
terminates (bool)

IndelCall.__init__()

Given two sequences, and assuming there is a single difference between them, calls an indel which accounts for the change.

def __init__(self, refseq: str, mutseq: str):

Args

refseq (str): The reference (amino acid) sequence
mutseq (str): The mutated (amino acid) sequence
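
The idea of calling a single indel from two sequences can be sketched by trimming the matching prefix and suffix and treating what remains as the deleted and inserted segments. This is a simplified illustration under that assumption only: the function name call_single_indel is hypothetical, and the sketch does not model the is_dup or terminates attributes.

```python
from typing import Tuple


def call_single_indel(ref: str, mut: str) -> Tuple[int, int, str, str]:
    """Hypothetical helper returning (n_matched, c_matched, del_seq, ins_seq)."""
    shortest = min(len(ref), len(mut))
    # Count residues that agree starting from the N-terminus.
    n = 0
    while n < shortest and ref[n] == mut[n]:
        n += 1
    # Count residues that agree from the C-terminus, without re-using
    # positions already consumed by the N-terminal match.
    c = 0
    while c < shortest - n and ref[-1 - c] == mut[-1 - c]:
        c += 1
    del_seq = ref[n:len(ref) - c]  # present in the reference only
    ins_seq = mut[n:len(mut) - c]  # present in the mutant only
    return n, c, del_seq, ins_seq


print(call_single_indel('MKTAYIAK', 'MKTAGGYIAK'))  # insertion of GG
print(call_single_indel('MKTAYIAK', 'MKTIAK'))      # deletion of AY
```

Because the suffix match is capped at the residues left over after the prefix match, repeated residues around the change are attributed to the N-terminal side in this sketch.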

IndelCall.hgvs_protein_notation()

Returns the HGVS protein notation for an indel call.

def hgvs_protein_notation(self) -> Optional[str]:

Returns

Optional[str]
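
For orientation, the sketch below formats a deletion, insertion, or deletion-insertion in single-letter HGVS-style protein notation. It is an assumption-laden illustration: the helper format_protein_indel and its arguments are hypothetical, duplications, substitutions, and terminal changes are out of scope, and the exact strings produced by hgvs_protein_notation() may differ.

```python
from typing import Optional


def format_protein_indel(
    ref: str, n_matched: int, del_seq: str, ins_seq: str
) -> Optional[str]:
    """Hypothetical helper.

    n_matched is the number of residues that agree from the N-terminus,
    so the first changed residue has 1-based position n_matched + 1.
    """
    if not del_seq and not ins_seq:
        return None  # nothing to report
    if del_seq:
        start = n_matched + 1
        end = n_matched + len(del_seq)
        if start == end:
            span = f'{del_seq[0]}{start}'
        else:
            span = f'{del_seq[0]}{start}_{del_seq[-1]}{end}'
        suffix = f'delins{ins_seq}' if ins_seq else 'del'
        return f'p.{span}{suffix}'
    # Pure insertion: described by the two flanking reference residues.
    if n_matched == 0 or n_matched >= len(ref):
        return None  # no residue on one side; not handled in this sketch
    left, right = ref[n_matched - 1], ref[n_matched]
    return f'p.{left}{n_matched}_{right}{n_matched + 1}ins{ins_seq}'


print(format_protein_indel('MKTAYIAK', 4, '', 'GG'))
print(format_protein_indel('MKTAYIAK', 3, 'AY', ''))
```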

flatten_fusion_translation()

For a given fusion product (translation), gathers the information to be output to the tabbed files.

def flatten_fusion_translation(translation: Translation) -> Dict:

Args

translation (Translation): the translation which is on the fusion transcript

Returns

Dict: the dictionary of column names to values

call_protein_indel()

Compares the fusion protein/aa sequence to the reference protein/aa sequence and returns an HGVS notation indel call.

def call_protein_indel(
    ref_translation: Translation,
    fusion_translation: Translation,
    reference_genome: Optional[ReferenceGenome] = None,
) -> str:

Args

ref_translation (Translation): the reference protein/translation
fusion_translation (Translation): the fusion protein/translation
reference_genome (Optional[ReferenceGenome]): the reference genome object used to fetch the reference translation AA sequence

Returns

str: the HGVS protein indel notation

choose_more_annotated()

For a given set of annotations, if there are annotations which contain transcripts and annotations that are simply intergenic regions, discards the intergenic region annotations.

Similarly, if there are annotations where both breakpoints fall in a transcript and annotations where one or more breakpoints land in an intergenic region, discards those that land in the intergenic region.

def choose_more_annotated(ann_list: List[Annotation]) -> List[Annotation]:

Args

ann_list (List[Annotation]): list of input annotations

Returns

List[Annotation]: the filtered list

Warning

Input annotations are assumed to be the same event (the same validation_id); the logic used would not apply to different events.
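
The filtering rule described for choose_more_annotated() can be sketched as "keep only the tier with the most transcript-bearing breakpoints". In this illustration the Ann class is a hypothetical stand-in, and None is assumed to represent a breakpoint that lands in an intergenic region; the real Annotation objects may encode this differently.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Ann:
    """Hypothetical stand-in; None marks an intergenic breakpoint."""
    transcript1: Optional[str]
    transcript2: Optional[str]


def keep_most_annotated(ann_list: List[Ann]) -> List[Ann]:
    """Keep annotations with the highest number of breakpoints in transcripts."""
    tiers = {0: [], 1: [], 2: []}
    for ann in ann_list:
        n_tx = sum(t is not None for t in (ann.transcript1, ann.transcript2))
        tiers[n_tx].append(ann)
    # Prefer both-in-transcript, then one-in-transcript, then intergenic only.
    return tiers[2] or tiers[1] or tiers[0]


anns = [Ann('T1', 'T2'), Ann('T1', None), Ann(None, None)]
print(keep_most_annotated(anns))
```

As with the documented function, this only makes sense when every annotation in the list describes the same event.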

choose_transcripts_by_priority()

For each set of annotations with the same combination of genes, chooses the annotation with the most "best_transcripts" or the most "alphanumeric" choice of transcripts. Throws an error if they are identical.

def choose_transcripts_by_priority(ann_list: List[Annotation]) -> List[Annotation]:

Args

ann_list (List[Annotation]): input annotations

Returns

List[Annotation]: the filtered list

Warning

Input annotations are assumed to be the same event (the same validation_id); the logic used would not apply to different events.
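
One possible reading of the priority rule is sketched below: group annotations by gene pair, rank each group by the number of best transcripts and then by transcript name, and raise if the top two candidates cannot be told apart. The Tx and Ann classes, the is_best flag, and the use of name ordering as the "alphanumeric" tie-break are all assumptions for illustration, not the MAVIS implementation.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Tx:
    """Hypothetical transcript stand-in."""
    name: str
    gene: str
    is_best: bool = False


@dataclass(frozen=True)
class Ann:
    """Hypothetical annotation stand-in."""
    transcript1: Tx
    transcript2: Tx


def priority_key(ann: Ann) -> Tuple[int, str, str]:
    """Lower sorts first: more best transcripts, then names in order."""
    n_best = sum(t.is_best for t in (ann.transcript1, ann.transcript2))
    return (-n_best, ann.transcript1.name, ann.transcript2.name)


def pick_by_priority(ann_list: List[Ann]) -> List[Ann]:
    groups: Dict[Tuple[str, str], List[Ann]] = defaultdict(list)
    for ann in ann_list:
        groups[(ann.transcript1.gene, ann.transcript2.gene)].append(ann)
    chosen = []
    for genes, anns in groups.items():
        ranked = sorted(anns, key=priority_key)
        if len(ranked) > 1 and priority_key(ranked[0]) == priority_key(ranked[1]):
            raise ValueError(f'cannot choose between identical annotations for {genes}')
        chosen.append(ranked[0])
    return chosen


a = Ann(Tx('ENST2', 'G1', True), Tx('ENST9', 'G2', True))
b = Ann(Tx('ENST1', 'G1'), Tx('ENST9', 'G2', True))
print(pick_by_priority([a, b]) == [a])
```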
