variant module

class mavis.annotate.variant.Annotation(bpp, transcript1=None, transcript2=None, proximity=5000, data=None, **kwargs)

Bases: mavis.breakpoint.BreakpointPair

A fusion of two transcripts created by the associated breakpoint_pair. Also holds the other annotations for overlapping, encompassed, and nearest genes.

Holds a breakpoint call and a set of transcripts; other information is gathered relative to these.

Parameters:
  • bpp (BreakpointPair) – the breakpoint pair call. Will be adjusted and then stored based on the transcripts
  • transcript1 (Transcript) – transcript at the first breakpoint
  • transcript2 (Transcript) – transcript at the second breakpoint
  • data (dict) – optional dictionary to hold related attributes
  • event_type (SVTYPE) – the type of event
add_gene(input_gene)

Adds an input_gene to the current set of annotations. Checks which set it should be added to.

Parameters:input_gene (input_gene) – the input_gene being added
flatten()

generates a dictionary of the annotation information as strings

Returns:dictionary of attribute names and values
Return type:dict of str by str
single_transcript()
class mavis.annotate.variant.IndelCall(refseq, mutseq)

Bases: object

Given two sequences, and assuming there is a single difference between them, calls an indel which accounts for the change.

Parameters:
  • refseq (str) – The reference (amino acid) sequence
  • mutseq (str) – The mutated (amino acid) sequence
nterm_aligned

the number of characters aligned consecutively from the start of both strings

Type:int
cterm_aligned

the number of characters aligned consecutively from the end of both strings

Type:int
is_dup

flag to indicate a duplication

Type:bool
ref_seq

the reference sequence

Type:str
mut_seq

the mutated sequence

Type:str
ins_seq

the inserted sequence

Type:str
del_seq

the deleted sequence

Type:str
terminates

flag to indicate that both sequences end in a stop amino acid

Type:bool
hgvs_protein_notation()

returns the HGVS protein notation for an indel call
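The nterm_aligned and cterm_aligned attributes of IndelCall are described as counts of consecutively matching characters from each end of the two strings. The following is a minimal sketch of that idea only, not the library's implementation: the helper names and the example sequences are hypothetical, and cases where the aligned prefix and suffix overlap (for example, duplications) are not handled.

```python
def count_aligned_prefix(ref: str, mut: str) -> int:
    """Count characters that match consecutively from the start of both strings."""
    count = 0
    for ref_char, mut_char in zip(ref, mut):
        if ref_char != mut_char:
            break
        count += 1
    return count


def count_aligned_suffix(ref: str, mut: str) -> int:
    """Count characters that match consecutively from the end of both strings."""
    return count_aligned_prefix(ref[::-1], mut[::-1])


# Hypothetical amino acid sequences; '*' marks a stop codon
ref_seq = "MKTAYIAKQR*"
mut_seq = "MKTAYGGIAKQR*"

nterm_aligned = count_aligned_prefix(ref_seq, mut_seq)
cterm_aligned = count_aligned_suffix(ref_seq, mut_seq)

# With no overlap between prefix and suffix, the unaligned middle of the
# mutated sequence is the inserted sequence (assumption of this sketch)
ins_seq = mut_seq[nterm_aligned:len(mut_seq) - cterm_aligned]
print(nterm_aligned, cterm_aligned, ins_seq)
```

In this example the unaligned middle of the reference is empty, so the change is a pure insertion.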

mavis.annotate.variant.annotate_events(bpps, annotations, reference_genome, max_proximity=5000, min_orf_size=200, min_domain_mapping_match=0.95, max_orf_cap=3, log=<mavis.util.Log object>, filters=None)
Parameters:
  • bpps (list of BreakpointPair) – list of events
  • annotations – reference annotations
  • reference_genome (dict of string by string) – dictionary of reference sequences by name
  • max_proximity (int) – see max_proximity
  • min_orf_size (int) – see min_orf_size
  • min_domain_mapping_match (float) – see min_domain_mapping_match
  • max_orf_cap (int) – see max_orf_cap
  • log (callable) – callable function to take in strings and time_stamp args
  • filters (list of callable) – list of functions taking in a list and returning a list for filtering
Returns:

list of the putative annotations

Return type:

list of Annotation
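The filters parameter is described as a list of functions that each take a list and return a list. As a rough illustration of that contract, the sketch below applies such functions one after another. How annotate_events actually invokes its filters is not stated here, so the sequential application, the helper apply_filters, and the two toy filters are all assumptions made for illustration.

```python
from typing import Callable, List, Optional

ListFilter = Callable[[list], list]


def apply_filters(items: list, filters: Optional[List[ListFilter]] = None) -> list:
    """Pass the list through each filter in order (assumed behaviour)."""
    for list_filter in filters or []:
        items = list_filter(items)
    return items


def drop_empty(items: list) -> list:
    """Toy filter: remove falsy entries."""
    return [item for item in items if item]


def keep_first_two(items: list) -> list:
    """Toy filter: keep at most two entries."""
    return items[:2]


filtered = apply_filters(["a", "", "b", "c"], [drop_empty, keep_first_two])
unfiltered = apply_filters(["a", "", "b", "c"])
print(filtered, unfiltered)
```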

mavis.annotate.variant.call_protein_indel(ref_translation, fusion_translation, reference_genome=None)

Compares the fusion protein/AA sequence to the reference protein/AA sequence and returns an HGVS notation indel call.

Parameters:
  • ref_translation (Translation) – the reference protein/translation
  • fusion_translation (Translation) – the fusion protein/translation
  • reference_genome – the reference genome object used to fetch the reference translation AA sequence
Returns:

the HGVS protein indel notation

Return type:

str

mavis.annotate.variant.choose_more_annotated(ann_list)

For a given set of annotations, if some annotations contain transcripts and others are simply intergenic regions, discard the intergenic region annotations.

Similarly, if there are annotations where both breakpoints fall in a transcript and annotations where one or more breakpoints land in an intergenic region, discard those that land in the intergenic region.

Parameters:ann_list (list of Annotation) – list of input annotations

Warning

Input annotations are assumed to be the same event (the same validation_id); the logic used would not apply to different events.

Returns:the filtered list
Return type:list of Annotation
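The preference order described for choose_more_annotated (both breakpoints in a transcript, then one, then none) can be sketched as below. This is an illustrative approximation rather than the library code: ToyAnnotation is a hypothetical stand-in for Annotation, and a missing transcript (None) is assumed to mean the breakpoint lands in an intergenic region.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ToyAnnotation:
    """Hypothetical stand-in for Annotation; None means intergenic."""
    transcript1: Optional[str]
    transcript2: Optional[str]


def choose_more_annotated_sketch(ann_list: List[ToyAnnotation]) -> List[ToyAnnotation]:
    """Keep only the most fully annotated group of annotations."""
    both = [a for a in ann_list
            if a.transcript1 is not None and a.transcript2 is not None]
    # Exactly one of the two breakpoints has a transcript
    one = [a for a in ann_list
           if (a.transcript1 is None) != (a.transcript2 is None)]
    if both:
        return both
    if one:
        return one
    # Every annotation is intergenic at both breakpoints
    return list(ann_list)


candidates = [
    ToyAnnotation("T1", "T2"),
    ToyAnnotation("T1", None),
    ToyAnnotation(None, None),
]
chosen = choose_more_annotated_sketch(candidates)
print(chosen)
```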
mavis.annotate.variant.choose_transcripts_by_priority(ann_list)

For each set of annotations with the same combination of genes, choose the annotation with the most “best_transcripts” or the most “alphanumeric” choices of transcript. Throws an error if they are identical.

Parameters:ann_list (list of Annotation) – input annotations

Warning

Input annotations are assumed to be the same event (the same validation_id); the logic used would not apply to different events.

Returns:the filtered list
Return type:list of Annotation
mavis.annotate.variant.flatten_fusion_transcript(spliced_fusion_transcript)
mavis.annotate.variant.flatten_fusion_translation(translation)

for a given fusion product (translation) gather the information to be output to the tabbed files

Parameters:translation (Translation) – the translation which is on the fusion transcript
Returns:the dictionary of column names to values
Return type:dict
mavis.annotate.variant.overlapping_transcripts(ref_ann, breakpoint)
Parameters:
  • ref_ann (dict of list of Gene by str) – the reference list of genes split by chromosome
  • breakpoint (Breakpoint) – the breakpoint in question
Returns:

a list of possible transcripts

Return type:

list of PreTranscript
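To make the shape of the ref_ann argument concrete (genes grouped in lists keyed by chromosome name), here is a small sketch of looking up transcripts that overlap a breakpoint. It is not the library implementation: the Toy classes, their attribute names, and the plain interval-overlap test are assumptions, and any strand handling the real function may perform is ignored.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToyTranscript:
    """Hypothetical transcript with inclusive genomic coordinates."""
    name: str
    start: int
    end: int


@dataclass
class ToyGene:
    """Hypothetical gene holding its transcripts."""
    name: str
    transcripts: List[ToyTranscript] = field(default_factory=list)


@dataclass
class ToyBreakpoint:
    """Hypothetical breakpoint: a chromosome and an inclusive interval."""
    chr: str
    start: int
    end: int


def overlapping_transcripts_sketch(
    ref_ann: Dict[str, List[ToyGene]], breakpoint: ToyBreakpoint
) -> List[ToyTranscript]:
    """Collect transcripts on the breakpoint's chromosome that overlap it."""
    hits = []
    for gene in ref_ann.get(breakpoint.chr, []):
        for transcript in gene.transcripts:
            # Two inclusive intervals overlap when each starts before the other ends
            if transcript.start <= breakpoint.end and breakpoint.start <= transcript.end:
                hits.append(transcript)
    return hits


ref_ann = {
    "1": [ToyGene("GENE_A", [ToyTranscript("A-1", 100, 500),
                             ToyTranscript("A-2", 600, 900)])],
    "2": [ToyGene("GENE_B", [ToyTranscript("B-1", 100, 500)])],
}
found = overlapping_transcripts_sketch(ref_ann, ToyBreakpoint("1", 450, 460))
print([t.name for t in found])
```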