breakpoint module

class mavis.breakpoint.Breakpoint(chr, start, end=None, orient='?', strand='?', seq=None)

Bases: mavis.interval.Interval

Class for storing information about an SV breakpoint. Coordinates are given as 1-indexed.

Parameters:

chr (str) – the chromosome
start (int) – the genomic position of the breakpoint
end (int) – if the breakpoint is uncertain (a range) then specify the end of the range here
orient (ORIENT) – the orientation (which side is retained at the break)
strand (STRAND) – the strand
seq (str) – the sequence

Examples

>>> Breakpoint('1', 1, 2)
>>> Breakpoint('1', 1)
>>> Breakpoint('1', 1, 2, 'R')
>>> Breakpoint('1', 1, orient='R')
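
The examples above mix exact breakpoints (only start given) with uncertain ones (a start and end range). The following is a minimal sketch of that idea using a hypothetical stand-in class, SketchBreakpoint, which is not part of mavis. It assumes that an omitted end means the breakpoint covers the single position start, and that both coordinates are 1-indexed and inclusive.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SketchBreakpoint:
    """Hypothetical stand-in for illustration only; not the mavis class."""
    chr: str
    start: int                 # 1-indexed, inclusive
    end: Optional[int] = None  # assumption: defaults to start for an exact breakpoint
    orient: str = '?'          # '?' mirrors the unspecified default in the signature above
    strand: str = '?'

    def __post_init__(self):
        # frozen dataclass, so fill in the default end via object.__setattr__
        if self.end is None:
            object.__setattr__(self, 'end', self.start)
        if self.start < 1 or self.end < self.start:
            raise ValueError('expected 1 <= start <= end')

    def __len__(self):
        # number of candidate positions covered by the breakpoint
        return self.end - self.start + 1
```

Under these assumptions, an exact breakpoint has length 1 and an uncertain one has the length of its range.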

key

to_dict()

class mavis.breakpoint.BreakpointPair(b1, b2, stranded=False, opposing_strands=None, untemplated_seq=None, data=None, **kwargs)

Bases: object

Parameters:

b1 (Breakpoint) – the first breakpoint
b2 (Breakpoint) – the second breakpoint
stranded (bool) – if not stranded then +/- is equivalent to -/+
opposing_strands (bool) – are the strands at the breakpoint opposite? i.e. +/- instead of +/+
untemplated_seq (str) – seq between the breakpoints that is not part of either breakpoint
data (dict) – optional dictionary of attributes associated with this pair

Note

untemplated_seq should always be given with respect to the positive/forward reference strand
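
If an untemplated sequence was read off the negative strand, it would need to be reverse-complemented before being passed in. The helper names below (reverse_complement, to_forward_strand) are hypothetical and written only for this illustration; they are not taken from mavis, and the sketch handles only the bases A, C, G, T, and N.

```python
# translation table mapping each base to its complement (assumption: only ACGTN are present)
_COMPLEMENT = str.maketrans('ACGTNacgtn', 'TGCANtgcan')


def reverse_complement(seq):
    """Complement every base, then reverse the string."""
    return seq.translate(_COMPLEMENT)[::-1]


def to_forward_strand(seq, strand):
    """Return seq expressed on the positive strand; '+' sequences pass through unchanged."""
    return reverse_complement(seq) if strand == '-' else seq
```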

Example

>>> BreakpointPair(Breakpoint('1', 1), Breakpoint('1', 9999), opposing_strands=True)
>>> BreakpointPair(Breakpoint('1', 1, strand='+'), Breakpoint('1', 9999, strand='-'))

breakpoint_sequence_homology(reference_genome)

For a given set of breakpoints, matches the sequence opposite the partner breakpoint. The comparison is made against the reference genome and does not use novel or untemplated sequence. For this reason, insertions will never return any homologous sequence.

small duplication event CTT => CTTCTT

GATACATTTCTTCTTGAAAA reference
---------<========== first breakpoint
===========>-------- second breakpoint
---------CT-CT------ first break homology
-------TT-TT-------- second break homology

Parameters:	reference_genome (`dict` of `Bio.SeqRecord` by `str`) – dict of reference sequence by template/chr name
Returns:

`str` – homologous sequence at the first breakpoint
`str` – homologous sequence at the second breakpoint
Return type:	tuple
Raises:	`AttributeError` – for non-specific breakpoints
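
The duplication diagram above can be reproduced with a simplified sketch. This is not the library's implementation: homology_sketch is a hypothetical function that assumes both breakpoints are exact, lie on the same chromosome, and are on the positive strand, and it takes the reference as a plain str rather than a dict of Bio.SeqRecord. It walks away from each breakpoint in step with the partner and collects reference bases while they agree.

```python
def homology_sketch(ref, break1, orient1, break2, orient2):
    """Simplified illustration: same chromosome, '+' strand, exact 1-indexed breakpoints."""
    lo, hi = break1 - 1, break2 - 1  # 0-indexed breakpoint positions

    def walk(p1, p2, step1, step2):
        matched = []
        # stop at the sequence ends, or once either pointer crosses the partner breakpoint
        while 0 <= p1 < len(ref) and 0 <= p2 < len(ref) and p2 > lo and p1 < hi:
            if ref[p1] != ref[p2]:
                break
            matched.append(ref[p1])
            p1 += step1
            p2 += step2
        if step1 < 0:
            matched.reverse()  # bases were collected right-to-left
        return ''.join(matched)

    # direction of travel into the retained side of each breakpoint
    step1 = -1 if orient1 == 'L' else 1
    step2 = -1 if orient2 == 'R' else 1
    # first pass: start on the first breakpoint and just past the second
    first = walk(lo, hi + step2, step1, step2)
    # second pass: start on the second breakpoint and just past the first, moving the other way
    second = walk(lo - step1, hi, -step1, -step2)
    return first, second
```

With the reference from the diagram, a first breakpoint at position 10 (orientation R), and a second at position 12 (orientation L), the sketch yields CT and TT, matching the homology rows shown.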

classmethod classify(pair, distance=None)

Uses the chromosomes, orientations, and strands to determine the possible structural variant types that this pair could support.

Parameters:

pair (BreakpointPair) – the pair to classify
distance (callable) – if defined, will be passed to net_size to use in narrowing the list of putative types (del vs ins)
Returns:	a list of possible SVTYPE
Return type:	`list` of `SVTYPE`

Example

>>> bpp = BreakpointPair(Breakpoint('1', 1), Breakpoint('1', 9999), opposing_strands=True)
>>> BreakpointPair.classify(bpp)
['inversion']
>>> bpp = BreakpointPair(Breakpoint('1', 1, orient='L'), Breakpoint('1', 9999, orient='R'), opposing_strands=False)
>>> BreakpointPair.classify(bpp)
{'deletion', 'insertion'}

see related theory documentation
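
As a rough illustration of how chromosome, orientation, and strand can narrow the event type, here is a simplified decision sketch. It is not the library's classify: classify_sketch is a hypothetical function that assumes both orientations are specified ('L' or 'R'), that the first breakpoint is the leftmost when both are on the same chromosome, and that type names are plain strings. Only 'inversion', 'deletion', and 'insertion' appear in the examples above; the other names are assumptions.

```python
def classify_sketch(chr1, orient1, chr2, orient2, opposing_strands):
    """Simplified illustration of breakpoint pair classification; returns a set of type names."""
    same_orient = orient1 == orient2
    # opposing strands are only consistent with L/L or R/R,
    # and matching strands only with L/R or R/L
    if same_orient != opposing_strands:
        return set()
    if chr1 != chr2:
        return {'inverted translocation'} if opposing_strands else {'translocation'}
    if opposing_strands:
        return {'inversion'}
    if (orient1, orient2) == ('L', 'R'):
        # sequence between the breaks is lost, and novel sequence may also be gained
        return {'deletion', 'insertion'}
    return {'duplication'}  # R then L: the region between the breaks is repeated
```

An empty set here means the combination does not describe a consistent event under these assumptions.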

copy()

flatten()

Returns the key-value pairs for the breakpoint pair information, in a form that can be written directly as a row of a tab-delimited file.

get_bed_repesentation()

interchromosomal

True if the breakpoints are on different chromosomes, False otherwise

Type:	`bool`

is_putative_indel

net_size(distance=<function BreakpointPair.<lambda>>)

Returns the size of the event for a given pair. Mainly applicable to indels.
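
For an indel, one way to think about the net size is inserted bases minus removed reference bases. The sketch below is a hypothetical simplification, not the library method: net_size_sketch assumes exact breakpoints on one chromosome, returns a plain int (the real return type is not shown here), and uses an assumed default distance of the absolute coordinate difference.

```python
def net_size_sketch(pos1, pos2, untemplated_seq='', distance=lambda a, b: abs(a - b)):
    """Simplified illustration: net change in length for an exact indel-like pair."""
    # reference bases strictly between the two breakpoint positions are removed
    removed = max(distance(pos1, pos2) - 1, 0)
    # untemplated bases are novel sequence gained between the breakpoints
    return len(untemplated_seq) - removed
```

Under these assumptions, a negative result indicates a net loss of sequence (deletion-like) and a positive result a net gain (insertion-like). Passing a custom distance callable lets the caller measure the gap in another coordinate system.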

untemplated_shift(reference_genome)

Gives, for each breakpoint, the range of possible alignment positions obtained by shifting the untemplated sequence.