mavis/breakpoint
class Breakpoint
inherits Interval
Class for storing information about an SV breakpoint. Coordinates are given as 1-indexed.
Attributes
- orient (str)
- chr (str)
- strand (str)
- seq (str)
Breakpoint.__init__()
def __init__(
self,
chr: str,
start: int,
end: Optional[int] = None,
orient=ORIENT.NS,
strand=STRAND.NS,
seq: Optional[str] = None,
):
Args
- chr (str): the chromosome
- start (int): the genomic position of the breakpoint
- end (Optional[int]): if the breakpoint is uncertain (a range), specify the end of the range here
- orient (ORIENT): the orientation (which side is retained at the break)
- strand (STRAND): the strand
- seq (Optional[str]): the sequence
Examples
>>> Breakpoint('1', 1, 2)
>>> Breakpoint('1', 1)
>>> Breakpoint('1', 1, 2, 'R')
>>> Breakpoint('1', 1, orient='R')
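The constructor arguments above can be illustrated with a small stand-in class. This is only a sketch: `BreakpointSketch` is a hypothetical name, it does not inherit from Interval, the `"?"` defaults are placeholders for ORIENT.NS and STRAND.NS, and the rule that an omitted `end` collapses to `start` is an assumption rather than something this page states.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BreakpointSketch:
    """Hypothetical stand-in for Breakpoint; not the MAVIS class."""

    chr: str
    start: int                  # 1-indexed genomic position
    end: Optional[int] = None   # only needed when the position is uncertain
    orient: str = "?"           # placeholder for ORIENT.NS (not specified)
    strand: str = "?"           # placeholder for STRAND.NS (not specified)
    seq: Optional[str] = None

    def __post_init__(self) -> None:
        # Assumption: an omitted end collapses the range to a single position.
        if self.end is None:
            self.end = self.start
        if self.start < 1 or self.end < self.start:
            raise ValueError("expected 1 <= start <= end")

    def __len__(self) -> int:
        # Inclusive coordinates: the range 1..2 covers two positions.
        return self.end - self.start + 1


exact = BreakpointSketch("1", 1)
uncertain = BreakpointSketch("1", 1, 2, orient="R")
print(len(exact), len(uncertain))  # 1 2
```

With inclusive 1-indexed coordinates, an exact breakpoint has length 1 and the uncertain breakpoint spanning positions 1 to 2 has length 2.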
class BreakpointPair
Attributes
- break1 (Breakpoint)
- break2 (Breakpoint)
- stranded (bool)
- opposing_strands (bool)
- untemplated_seq (Optional[str])
- data (Dict)
BreakpointPair.interchromosomal()
bool: True if the breakpoints are on different chromosomes, False otherwise
@property
def interchromosomal(self) -> bool:
Args
- self
Returns
bool
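As a rough illustration of the check described above, the sketch below compares the chromosome names of the two breakpoints. `Bp` and `PairSketch` are hypothetical stand-ins, and comparing the names as plain strings is an assumption.

```python
from collections import namedtuple

# Hypothetical, minimal stand-ins; the real classes carry more state.
Bp = namedtuple("Bp", ["chr", "start"])


class PairSketch:
    def __init__(self, break1, break2):
        self.break1 = break1
        self.break2 = break2

    @property
    def interchromosomal(self) -> bool:
        # Assumption: chromosome names are compared as plain strings.
        return self.break1.chr != self.break2.chr


same = PairSketch(Bp("1", 100), Bp("1", 9999))
different = PairSketch(Bp("1", 100), Bp("X", 500))
print(same.interchromosomal, different.interchromosomal)  # False True
```

Because the member is declared with @property, it is read as an attribute, without call parentheses.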
BreakpointPair.flatten()
Returns the key-value pairs for the breakpoint pair information so that they can be written directly as a row of a tab-delimited file.
def flatten(self):
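To show what "written directly as a tab row" could look like in practice, here is a minimal sketch using the standard csv module. The dictionary stands in for the output of flatten(); its column names are illustrative assumptions, not the actual keys.

```python
import csv
import io

# Hypothetical flattened record; the column names are illustrative only.
row = {
    "break1_chromosome": "1",
    "break1_position_start": 1,
    "break1_position_end": 2,
    "break2_chromosome": "1",
    "break2_position_start": 9999,
    "break2_position_end": 9999,
    "opposing_strands": False,
}

buffer = io.StringIO()
writer = csv.DictWriter(
    buffer, fieldnames=list(row), delimiter="\t", lineterminator="\n"
)
writer.writeheader()
writer.writerow(row)  # one flat dict becomes one tab-separated line

header, line = buffer.getvalue().splitlines()
print(line)
```

Keeping the record flat (no nested objects) is what lets a generic writer such as csv.DictWriter emit it without custom serialization.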
BreakpointPair.net_size()
Returns the size of the event for a given pair. Mainly applicable to indels.
def net_size(self, distance=lambda x, y: Interval(abs(x - y))) -> Interval:
Args
- distance
Returns
Interval
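The arithmetic behind an indel size can be sketched with plain integers. Everything here is an assumption made for illustration: the function name `net_size_sketch`, the use of exact positions instead of Interval objects, and the sign convention (inserted bases minus deleted bases). Only the shape of the `distance` argument mirrors the signature above.

```python
from typing import Callable


def net_size_sketch(
    pos1: int,
    pos2: int,
    untemplated_seq: str = "",
    distance: Callable[[int, int], int] = lambda x, y: abs(x - y),
) -> int:
    """Inserted bases minus deleted bases for exact breakpoints pos1 < pos2.

    Assumption: pos1 is the last retained base on the left and pos2 the first
    retained base on the right, so the bases strictly between them are lost.
    """
    deleted = distance(pos1, pos2) - 1
    inserted = len(untemplated_seq)
    return inserted - deleted


print(net_size_sketch(10, 15))          # -4: bases 11-14 removed
print(net_size_sketch(10, 11, "ACGT"))  # 4: adjacent breaks, four bases added
print(net_size_sketch(13, 15, "TTT"))   # 2: one base replaced by three
```

Passing a different `distance` callable would let the same calculation run in another coordinate space, which appears to be the purpose of the parameter in the signature above.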
BreakpointPair.breakpoint_sequence_homology()
For a given pair of breakpoints, matches the sequence opposite the partner breakpoint. This sequence comparison is done against a reference genome and does not use novel or untemplated sequence. For this reason, insertions will never return any homologous sequence.
small duplication event CTT => CTTCTT

GATACATTTCTTCTTGAAAA reference
---------<========== first breakpoint
===========>-------- second breakpoint
---------CT-CT------ first break homology
-------TT-TT-------- second break homology
def breakpoint_sequence_homology(self, reference_genome: ReferenceGenome):
Args
- reference_genome (ReferenceGenome): dict of reference sequence by template/chr name
Returns
Tuple[str, str]: homologous sequence at the first and second breakpoints
Raises
AttributeError: for non-specific breakpoints
BreakpointPair.untemplated_shift()
Gives, for each breakpoint, the range of possible alignment positions obtained by shifting the untemplated sequence.
def untemplated_shift(self, reference_genome):
Args
- reference_genome
classify_breakpoint_pair()
Uses the chromosomes, orientations, and strands to determine the structural variant types that this pair could support.
def classify_breakpoint_pair(pair: BreakpointPair, distance: Optional[Callable] = None) -> Set[str]:
Args
- pair (BreakpointPair): the pair to classify
- distance (Optional[Callable]): if defined, will be passed to net_size to use in narrowing the list of putative types (del vs ins)
Returns
Set[str]: a set of possible SVTYPE values
Examples
>>> bpp = BreakpointPair(Breakpoint('1', 1), Breakpoint('1', 9999), opposing_strands=True)
>>> classify_breakpoint_pair(bpp)
{'inversion'}
>>> bpp = BreakpointPair(Breakpoint('1', 1, orient='L'), Breakpoint('1', 9999, orient='R'), opposing_strands=False)
>>> classify_breakpoint_pair(bpp)
{'deletion', 'insertion'}
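The two examples above follow a decision pattern that can be sketched as a standalone function. This is a simplified illustration, not the library's implementation: `classify_sketch` is a hypothetical name, the type labels other than those in the examples are assumed, orientations are passed as plain 'L'/'R' strings, and the optional narrowing by net size is left out.

```python
from typing import Optional, Set


def classify_sketch(
    chr1: str,
    chr2: str,
    opposing_strands: bool,
    orient1: Optional[str] = None,
    orient2: Optional[str] = None,
) -> Set[str]:
    """Simplified classification; break1 is assumed to precede break2."""
    if chr1 != chr2:
        return {"inverted translocation"} if opposing_strands else {"translocation"}
    if opposing_strands:
        return {"inversion"}
    if (orient1, orient2) == ("L", "R"):
        # Flanks retained, middle replaced: net size decides del vs ins.
        return {"deletion", "insertion"}
    if (orient1, orient2) == ("R", "L"):
        return {"duplication"}
    if orient1 is not None and orient1 == orient2:
        # Same orientation on the same strand is inconsistent in this sketch.
        return set()
    # Orientation unknown or incomplete: this sketch does not narrow further.
    return {"deletion", "insertion", "duplication"}


print(sorted(classify_sketch("1", "1", True)))             # ['inversion']
print(sorted(classify_sketch("1", "1", False, "L", "R")))  # ['deletion', 'insertion']
```

Returning a set rather than a single label reflects that the chromosome, orientation, and strand information alone may not distinguish between candidate types.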
Note