breakpoint module

class mavis.breakpoint.Breakpoint(chr, start, end=None, orient='?', strand='?', seq=None)

Bases: mavis.interval.Interval

Class for storing information about an SV breakpoint. Coordinates are 1-indexed.

Parameters:
  • chr (str) – the chromosome
  • start (int) – the genomic position of the breakpoint
  • end (int) – if the breakpoint is uncertain (a range) then specify the end of the range here
  • orient (ORIENT) – the orientation (which side is retained at the break)
  • strand (STRAND) – the strand
  • seq (str) – the sequence

Examples

>>> Breakpoint('1', 1, 2)
>>> Breakpoint('1', 1)
>>> Breakpoint('1', 1, 2, 'R')
>>> Breakpoint('1', 1, orient='R')
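Because coordinates are 1-indexed and a breakpoint with an end covers an inclusive range, converting to Python's 0-indexed, end-exclusive slicing is easy to get wrong. The sketch below uses a hypothetical SimpleBreakpoint dataclass (not the MAVIS class) to show the arithmetic, assuming that an omitted end means a single, specific position (end = start) and that start..end is inclusive.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SimpleBreakpoint:
    """Hypothetical stand-in for illustration only; not mavis.breakpoint.Breakpoint."""
    chr: str
    start: int                 # 1-indexed, inclusive
    end: Optional[int] = None  # assumption: defaults to start (a specific breakpoint)
    orient: str = '?'
    strand: str = '?'

    def __post_init__(self):
        if self.end is None:
            self.end = self.start
        if self.start < 1 or self.end < self.start:
            raise ValueError('expected 1 <= start <= end')

    def __len__(self) -> int:
        # number of candidate positions in the inclusive range
        return self.end - self.start + 1

    def slice_of(self, sequence: str) -> str:
        # 1-indexed inclusive [start, end] maps to the 0-indexed slice [start - 1, end)
        return sequence[self.start - 1:self.end]
```

For example, SimpleBreakpoint('1', 3, 5).slice_of('GATACA') returns 'TAC' (positions 3 to 5), and len(SimpleBreakpoint('1', 1)) is 1.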
key
to_dict()
class mavis.breakpoint.BreakpointPair(b1, b2, stranded=False, opposing_strands=None, untemplated_seq=None, data=None, **kwargs)

Bases: object

Parameters:
  • b1 (Breakpoint) – the first breakpoint
  • b2 (Breakpoint) – the second breakpoint
  • stranded (bool) – if not stranded then +/- is equivalent to -/+
  • opposing_strands (bool) – whether the strands at the two breakpoints are opposite, i.e. +/- rather than +/+
  • untemplated_seq (str) – sequence between the breakpoints that is not part of either breakpoint
  • data (dict) – optional dictionary of attributes associated with this pair

Note

untemplated_seq should always be given with respect to the positive/forward reference strand.

Example

>>> BreakpointPair(Breakpoint('1', 1), Breakpoint('1', 9999), opposing_strands=True)
>>> BreakpointPair(Breakpoint('1', 1, strand='+'), Breakpoint('1', 9999, strand='-'))
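The second example omits opposing_strands because it can be read off the two strands. A minimal sketch of that inference, plus the unstranded equivalence described for the stranded parameter, follows. Both function names are hypothetical. Rejecting an explicit opposing_strands that contradicts the strands, and treating +/+ as equivalent to -/- when unstranded, are assumptions rather than documented MAVIS behavior.

```python
from typing import Optional, Tuple


def infer_opposing_strands(strand1: str, strand2: str,
                           opposing_strands: Optional[bool] = None) -> bool:
    """Hypothetical helper; strands are '+', '-' or '?' (unknown)."""
    if '?' in (strand1, strand2):
        # The strands alone cannot answer the question; require an explicit value.
        if opposing_strands is None:
            raise ValueError('opposing_strands must be given when a strand is unknown')
        return opposing_strands
    inferred = strand1 != strand2
    # Assumption: an explicit value that contradicts the strands is an error.
    if opposing_strands is not None and opposing_strands != inferred:
        raise ValueError('opposing_strands conflicts with the breakpoint strands')
    return inferred


def strand_key(strand1: str, strand2: str, stranded: bool) -> Tuple[str, str]:
    """Hypothetical comparison key: when not stranded, +/- and -/+ compare equal."""
    if stranded:
        return (strand1, strand2)
    # Unstranded: flipping both strands describes the same pair
    # (assumption: this also makes +/+ equivalent to -/-).
    flip = {'+': '-', '-': '+', '?': '?'}
    return min((strand1, strand2), (flip[strand1], flip[strand2]))
```

With these helpers, infer_opposing_strands('+', '-') is True, and strand_key('+', '-', stranded=False) equals strand_key('-', '+', stranded=False).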
breakpoint_sequence_homology(reference_genome)

For a given pair of breakpoints, matches the sequence at each breakpoint against the sequence opposite the partner breakpoint. The comparison uses only the reference genome and ignores novel or untemplated sequence; for this reason, insertions never return any homologous sequence.

For example, a small duplication event CTT => CTTCTT:

GATACATTTCTTCTTGAAAA reference
---------<========== first breakpoint
===========>-------- second breakpoint
---------CT-CT------ first break homology
-------TT-TT-------- second break homology
Parameters:
  • reference_genome (dict of Bio.SeqRecord by str) – dict of reference sequences by template/chr name
Returns:
  • str - homologous sequence at the first breakpoint
  • str - homologous sequence at the second breakpoint
Return type: tuple
Raises: AttributeError – for non-specific breakpoints (breakpoints given as a range rather than a single position)
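The diagram can be reproduced with plain strings. The sketch below handles only the duplication layout shown (same chromosome, first breakpoint orient 'R', second breakpoint orient 'L'). It takes the chromosome as a single str instead of the dict of Bio.SeqRecord that the method expects, and it caps each homology at the distance between the breakpoints. That cap is an assumption chosen because it matches the diagram; it does not describe the MAVIS implementation, and duplication_homology is a hypothetical name.

```python
from typing import Tuple


def duplication_homology(ref: str, b1: int, b2: int) -> Tuple[str, str]:
    """Homology for a same-chromosome pair with break1 orient 'R' at 1-indexed
    position b1 and break2 orient 'L' at position b2 (b1 < b2).

    Assumption: each homology is capped at b2 - b1 bases, which reproduces the
    diagram; this is not claimed to be how MAVIS bounds the comparison.
    """
    def base(p: int) -> str:
        return ref[p - 1]  # 1-indexed position -> 0-indexed string access

    limit = b2 - b1
    # First breakpoint: walk right from b1 and compare with the bases just after b2.
    first = []
    for i in range(limit):
        p, q = b1 + i, b2 + 1 + i
        if q > len(ref) or base(p) != base(q):
            break
        first.append(base(p))
    # Second breakpoint: walk left from b2 and compare with the bases just before b1.
    second = []
    for i in range(limit):
        p, q = b2 - i, b1 - 1 - i
        if q < 1 or base(p) != base(q):
            break
        second.append(base(p))
    # second was collected right-to-left; reverse it to read on the forward strand
    return ''.join(first), ''.join(reversed(second))
```

With the first breakpoint at position 10 and the second at position 12, duplication_homology('GATACATTTCTTCTTGAAAA', 10, 12) returns ('CT', 'TT'), matching the two homology rows of the diagram.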
classmethod classify(pair, distance=None)

Uses the chromosomes, orientations, and strands to determine the possible structural variant types that this pair could support.

Parameters:
  • pair (BreakpointPair) – the pair to classify
  • distance (callable) – if defined, passed to net_size() and used to narrow the list of putative types (deletion vs insertion)
Returns:

the set of possible SVTYPE values

Return type:

set of SVTYPE

Example

>>> bpp = BreakpointPair(Breakpoint('1', 1), Breakpoint('1', 9999), opposing_strands=True)
>>> BreakpointPair.classify(bpp)
{'inversion'}
>>> bpp = BreakpointPair(Breakpoint('1', 1, orient='L'), Breakpoint('1', 9999, orient='R'), opposing_strands=False)
>>> BreakpointPair.classify(bpp)
{'deletion', 'insertion'}

See the related theory documentation.
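A simplified sketch of the classification rules follows, written as a hypothetical standalone function over plain values rather than BreakpointPair objects. Unknown ('?') orientations are expanded to both 'L' and 'R', combinations that cannot occur contribute nothing, and the distance-based narrowing is omitted. The type strings follow the examples above; 'duplication', 'translocation', and 'inverted translocation' are assumed names.

```python
from itertools import product
from typing import Set, Tuple


def classify_sketch(chr1: str, orient1: str, chr2: str, orient2: str,
                    opposing_strands: bool) -> Set[str]:
    """Hypothetical, simplified classifier; assumes break1 precedes break2."""
    def expand(orient: str) -> Tuple[str, ...]:
        # an unknown orientation could be either left ('L') or right ('R')
        return ('L', 'R') if orient == '?' else (orient,)

    types = set()
    for o1, o2 in product(expand(orient1), expand(orient2)):
        if chr1 == chr2:
            if opposing_strands:
                if o1 == o2:                 # LL or RR on opposite strands
                    types.add('inversion')
            elif (o1, o2) == ('L', 'R'):     # left of break1 joined to right of break2
                types.update({'deletion', 'insertion'})
            elif (o1, o2) == ('R', 'L'):
                types.add('duplication')
        elif opposing_strands:
            if o1 == o2:
                types.add('inverted translocation')
        elif o1 != o2:
            types.add('translocation')
        # any other combination is treated as invalid and adds nothing
    return types
```

classify_sketch('1', '?', '1', '?', True) returns {'inversion'} and classify_sketch('1', 'L', '1', 'R', False) returns {'deletion', 'insertion'}, in line with the two examples.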

copy()
flatten()

Returns the key-value pairs for the breakpoint pair information, in a form that can be written directly as a tab-delimited row.
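To show what a flat key-value record that maps onto one tab-delimited row looks like, here is a sketch over plain tuples. The function names and column names are hypothetical and may not match the columns MAVIS actually writes.

```python
from typing import Dict, Optional, Tuple

# (chr, start, end, orient, strand): a hypothetical lightweight breakpoint representation
BreakTuple = Tuple[str, int, int, str, str]
FIELDS = ('chromosome', 'position_start', 'position_end', 'orientation', 'strand')


def flatten_pair(break1: BreakTuple, break2: BreakTuple, opposing_strands: bool,
                 stranded: bool, untemplated_seq: Optional[str] = None) -> Dict[str, str]:
    """Build a flat dict of strings, one key per output column (names hypothetical)."""
    row = {}
    for prefix, bp in (('break1', break1), ('break2', break2)):
        for field, value in zip(FIELDS, bp):
            row[f'{prefix}_{field}'] = str(value)
    row['opposing_strands'] = str(opposing_strands)
    row['stranded'] = str(stranded)
    row['untemplated_seq'] = '' if untemplated_seq is None else untemplated_seq
    return row


def to_tab_row(row: Dict[str, str], header: Tuple[str, ...]) -> str:
    # A fixed header keeps the column order stable across rows.
    return '\t'.join(row[col] for col in header)
```

Because dicts preserve insertion order, tuple(row) can serve as the header for the first row, and the same header can then be reused for every later row.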

get_bed_repesentation()
interchromosomal

True if the breakpoints are on different chromosomes, False otherwise

Type: bool
is_putative_indel
net_size(distance=<function BreakpointPair.<lambda>>)

Returns the size of the event for this pair. Mainly applicable to indels.
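For an indel, the net size can be read as inserted bases minus deleted bases, and when the breakpoints are ranges the result becomes a range. The sketch assumes a same-chromosome pair with break1 orient 'L' and break2 orient 'R', 1-indexed inclusive ranges, and that the deleted bases are the positions strictly between the breakpoints. It is a plain-integer illustration with a hypothetical name, not the distance-aware MAVIS method.

```python
from typing import Tuple


def net_size_range(b1: Tuple[int, int], b2: Tuple[int, int],
                   untemplated_seq: str = '') -> Tuple[int, int]:
    """Return (min, max) net size: len(untemplated_seq) minus the deleted bases.

    b1 and b2 are (start, end) 1-indexed inclusive ranges, with b1 before b2.
    Assumption: deleted bases are the positions strictly between the breakpoints.
    """
    b1_start, b1_end = b1
    b2_start, b2_end = b2
    min_deleted = max(0, b2_start - b1_end - 1)  # breakpoints as close as possible
    max_deleted = max(0, b2_end - b1_start - 1)  # breakpoints as far apart as possible
    inserted = len(untemplated_seq)
    return inserted - max_deleted, inserted - min_deleted
```

For specific breakpoints at 100 and 200 with untemplated sequence 'AC', positions 101 to 199 (99 bases) are deleted, so net_size_range((100, 100), (200, 200), 'AC') returns (-97, -97). A negative value indicates a net loss of sequence.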

untemplated_shift(reference_genome)

Gives, for each breakpoint, the range of positions it could occupy when the untemplated sequence is shifted along the alignment.
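One way to compute such ranges is the rotation technique used to left- and right-align indels. An insertion can slide one base left whenever the reference base it would cross equals the last inserted base, with the inserted sequence rotating by one; the same rule applies symmetrically to the right. The sketch below applies this to a simple insertion between adjacent positions on a plain string. It is a swapped-in illustration with a hypothetical name, not the MAVIS implementation.

```python
from typing import Tuple


def untemplated_shift_sketch(ref: str, pos: int,
                             inserted: str) -> Tuple[Tuple[int, int], Tuple[int, int]]:
    """Insertion of `inserted` immediately after 1-indexed position `pos`.

    Returns ((break1_min, break1_max), (break2_min, break2_max)), where break1 is
    the last reference base before the insertion (orient 'L') and break2 is the
    first reference base after it (orient 'R').
    """
    # Shift left while the reference base at the insertion point equals the last
    # inserted base; the inserted sequence rotates right by one on each step.
    left, seq = pos, inserted
    while left >= 1 and seq and ref[left - 1] == seq[-1]:
        seq = seq[-1] + seq[:-1]
        left -= 1
    # Shift right while the next reference base equals the first inserted base;
    # the inserted sequence rotates left by one on each step.
    right, seq = pos, inserted
    while right < len(ref) and seq and ref[right] == seq[0]:
        seq = seq[1:] + seq[0]
        right += 1
    return (left, right), (left + 1, right + 1)
```

For example, inserting 'T' after position 3 of 'GATTTACA' yields the same sequence (GATTTTACA) for any insertion point from 2 to 5, so untemplated_shift_sketch('GATTTACA', 3, 'T') returns ((2, 5), (3, 6)).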