mavis/breakpoint

class Breakpoint

inherits Interval

Class for storing information about an SV breakpoint. Coordinates are given as 1-indexed.

Attributes

  • orient (str)
  • chr (str)
  • strand (str)
  • seq (str)

Breakpoint.__init__()

def __init__(
    self,
    chr: str,
    start: int,
    end: Optional[int] = None,
    orient=ORIENT.NS,
    strand=STRAND.NS,
    seq: Optional[str] = None,
):

Args

  • chr (str): the chromosome
  • start (int): the genomic position of the breakpoint
  • end (Optional[int]): if the breakpoint is uncertain (a range) then specify the end of the range here
  • orient (ORIENT): the orientation (which side is retained at the break)
  • strand (STRAND): the strand
  • seq (Optional[str]): the sequence associated with the breakpoint

Examples

>>> Breakpoint('1', 1, 2)
>>> Breakpoint('1', 1)
>>> Breakpoint('1', 1, 2, 'R')
>>> Breakpoint('1', 1, orient='R')
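
The sketch below is a minimal stand-in, not the MAVIS class itself. It illustrates how `start` and `end` can describe either an exact position or an uncertainty range in 1-indexed coordinates. `SketchBreakpoint` and `is_specific` are hypothetical names, and defaulting `end` to `start` and using an inclusive length are assumptions drawn from the argument descriptions above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SketchBreakpoint:
    """Hypothetical stand-in for a breakpoint; not the MAVIS class."""

    chr: str
    start: int                  # 1-indexed genomic position
    end: Optional[int] = None   # set only when the position is uncertain
    orient: str = '?'           # placeholder meaning "not specified"

    def __post_init__(self):
        # Assumption: an omitted end means the breakpoint is a single position.
        if self.end is None:
            self.end = self.start
        if self.start < 1 or self.end < self.start:
            raise ValueError('expected 1 <= start <= end')

    def __len__(self) -> int:
        # Inclusive range, so a single position has length 1.
        return self.end - self.start + 1

    def is_specific(self) -> bool:
        return len(self) == 1


exact = SketchBreakpoint('1', 1)
uncertain = SketchBreakpoint('1', 1, 2, 'R')
```

Here `exact` covers one position, while `uncertain` covers positions 1 through 2.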

class BreakpointPair

Attributes

  • break1 (Breakpoint)
  • break2 (Breakpoint)
  • stranded (bool)
  • opposing_strands (bool)
  • untemplated_seq (Optional[str])
  • data (Dict)

BreakpointPair.interchromosomal()

bool: True if the breakpoints are on different chromosomes, False otherwise

@property
def interchromosomal(self) -> bool:

Args

  • self

Returns

  • bool
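
Because `interchromosomal` is declared with `@property`, it is read as an attribute rather than called. A minimal sketch of that behaviour, using hypothetical stand-ins (`Bp`, `SketchPair`) and assuming chromosome names are compared as plain strings:

```python
from collections import namedtuple

# Hypothetical stand-in carrying only the fields this check needs.
Bp = namedtuple('Bp', ['chr', 'start'])


class SketchPair:
    """Hypothetical stand-in for a breakpoint pair; not the MAVIS class."""

    def __init__(self, break1, break2):
        self.break1 = break1
        self.break2 = break2

    @property
    def interchromosomal(self) -> bool:
        # Assumption: chromosome names are compared as plain strings.
        return self.break1.chr != self.break2.chr


same_chr = SketchPair(Bp('1', 100), Bp('1', 9999))
diff_chr = SketchPair(Bp('1', 100), Bp('X', 9999))
```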

BreakpointPair.flatten()

Returns the breakpoint pair's information as key-value pairs that can be written directly as a row of a tab-delimited file.

def flatten(self):
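
As a rough illustration of what "written directly as a tab row" could look like, the sketch below writes a flat dict with the standard `csv` module. The column names are hypothetical; this section does not list the keys that `flatten` actually produces.

```python
import csv
import io

# Hypothetical flattened row; the real column names are not listed in this section.
row = {
    'break1_chromosome': '1',
    'break1_position_start': 1,
    'break2_chromosome': '1',
    'break2_position_start': 9999,
    'opposing_strands': False,
}

buffer = io.StringIO()
writer = csv.DictWriter(
    buffer, fieldnames=list(row), delimiter='\t', lineterminator='\n'
)
writer.writeheader()   # one header line with the dict keys
writer.writerow(row)   # one data line with the dict values
lines = buffer.getvalue().splitlines()
```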

BreakpointPair.net_size()

Returns the size of the event for a given pair. Mainly applicable to indels

def net_size(self, distance=lambda x, y: Interval(abs(x - y))) -> Interval:

Args

  • distance
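
Judging from the default in the signature, `distance` is a callable that takes two positions and returns an interval. The sketch below mirrors that shape with a hypothetical `SketchInterval` stand-in; `padded_distance` and its `slack` parameter are invented purely to show that a custom callable may return a range instead of a single value.

```python
class SketchInterval:
    """Hypothetical stand-in for Interval: an inclusive start..end range."""

    def __init__(self, start, end=None):
        self.start = start
        self.end = start if end is None else end

    def __repr__(self):
        return f'SketchInterval({self.start}, {self.end})'


def genomic_distance(x, y):
    # Same shape as the default shown in the signature: an exact distance.
    return SketchInterval(abs(x - y))


def padded_distance(x, y, slack=10):
    # Hypothetical alternative: report the distance as a range with some slack.
    d = abs(x - y)
    return SketchInterval(max(0, d - slack), d + slack)


exact = genomic_distance(100, 250)
padded = padded_distance(100, 250)
```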

Returns

  • Interval: the size of the event, given as a range

BreakpointPair.breakpoint_sequence_homology()

For a given pair of breakpoints, matches the sequence opposite the partner breakpoint. The comparison is made against the reference genome and does not use novel or untemplated sequence. For this reason, insertions never return any homologous sequence.

small duplication event CTT => CTTCTT

GATACATTTCTTCTTGAAAA    reference
---------<==========    first breakpoint
===========>--------    second breakpoint
---------CT-CT------    first break homology
-------TT-TT--------    second break homology

def breakpoint_sequence_homology(self, reference_genome: ReferenceGenome):

Args

  • reference_genome (ReferenceGenome): dict of reference sequence by template/chr name

Returns

  • Tuple[str,str]: the homologous sequence at the first and second breakpoints

Raises

  • AttributeError: for non-specific breakpoints
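
To make the duplication example above concrete, the sketch below rebuilds the rearranged sequence from the reference string in the diagram. It does not compute homology and is not the MAVIS implementation. The breakpoint positions (10 and 12, 1-indexed) and the retained sides are read off the `<` and `>` markers, which is an interpretation of the diagram.

```python
reference = 'GATACATTTCTTCTTGAAAA'

# 1-indexed breakpoint positions read off the diagram (an interpretation):
# the first breakpoint retains the sequence to its right,
# the second breakpoint retains the sequence to its left.
break1_pos = 10  # '<' in the diagram
break2_pos = 12  # '>' in the diagram

# Sequence kept up to and including the second breakpoint.
left_part = reference[:break2_pos]
# Sequence kept from the first breakpoint onward.
right_part = reference[break1_pos - 1:]

# The segment between the two breakpoints appears in both parts.
duplicated_segment = reference[break1_pos - 1:break2_pos]
rearranged = left_part + right_part
```

Joining the two retained parts repeats the segment between the breakpoints, which is the `CTT => CTTCTT` event named in the example.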

BreakpointPair.untemplated_shift()

Gives, for each breakpoint, the range of possible alignment positions obtained by shifting the untemplated sequence.

def untemplated_shift(self, reference_genome):

Args

  • reference_genome

classify_breakpoint_pair()

Uses the chromosomes, orientations, and strands to determine the possible structural variant types that this pair could support.

def classify_breakpoint_pair(pair: BreakpointPair, distance: Optional[Callable] = None) -> Set[str]:

Args

  • pair (BreakpointPair): the pair to classify
  • distance (Optional[Callable]): if defined, will be passed to net size to use in narrowing the list of putative types (del vs ins)

Returns

  • Set[str]: a set of possible SVTYPE values

Examples

>>> bpp = BreakpointPair(Breakpoint('1', 1), Breakpoint('1', 9999), opposing_strands=True)
>>> classify_breakpoint_pair(bpp)
{'inversion'}
>>> bpp = BreakpointPair(Breakpoint('1', 1, orient='L'), Breakpoint('1', 9999, orient='R'), opposing_strands=False)
>>> classify_breakpoint_pair(bpp)
{'deletion', 'insertion'}
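
The two examples above can be reproduced with a simplified decision table. This is a hypothetical sketch, not the MAVIS implementation: `sketch_classify` is an invented name, the duplication and translocation branches and their labels are assumptions not stated in this section, and the first breakpoint is assumed to lie before the second.

```python
def sketch_classify(chr1, orient1, chr2, orient2, opposing_strands):
    """Hypothetical, simplified decision table; not the MAVIS implementation."""
    if chr1 != chr2:
        # Assumed labels for events joining two different chromosomes.
        if opposing_strands:
            return {'inverted translocation'}
        return {'translocation'}
    if opposing_strands:
        return {'inversion'}
    if orient1 == 'L' and orient2 == 'R':
        # Matches the documented example: the outer flanks are retained.
        return {'deletion', 'insertion'}
    if orient1 == 'R' and orient2 == 'L':
        # Assumption: the inner segment is retained on both sides.
        return {'duplication'}
    # Orientation not specified: keep every same-strand candidate.
    return {'deletion', 'insertion', 'duplication'}


inversion_call = sketch_classify('1', '?', '1', '?', opposing_strands=True)
indel_call = sketch_classify('1', 'L', '1', 'R', opposing_strands=False)
```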