mavis/validate/base

class Evidence

Attributes

assembly_max_kmer_size (int)
bam_cache (BamCache)
classification (Optional[str])
compatible_flanking_pairs (Set)
compatible_window1 (Optional[Interval])
compatible_window2 (Optional[Interval])
config (Dict)
contigs (List)
counts (List[int])
flanking_pairs (Set)
half_mapped (Tuple[Set, Set])
median_fragment_size (int)
read_length (int)
reference_genome (Dict)
spanning_reads (Set)
split_reads (Tuple[Set, Set])
stdev_fragment_size (int)
strand_determining_read (int)
inner_window1 (Interval)
inner_window2 (Interval)
outer_window1 (Interval)
outer_window2 (Interval)

Evidence.collect_from_outer_window()

determines if evidence should be collected from the outer window (looking for flanking evidence) or should be limited to the inner window (split/spanning/contig only)

def collect_from_outer_window(self):

Returns

bool: True if evidence should be collected from the outer window, False if collection is limited to the inner window

Evidence.supporting_reads()

convenience method to return all flanking, split and spanning reads associated with an evidence object

def supporting_reads(self):
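
As a rough illustration of what "all flanking, split and spanning reads" amounts to, the sketch below pools the three evidence collections into one set. It uses plain read names instead of pysam.AlignedSegment objects, and the helper name `all_supporting_reads` is hypothetical; this is not the MAVIS implementation.

```python
from typing import Set, Tuple


def all_supporting_reads(
    flanking_pairs: Set[Tuple[str, str]],
    split_reads: Tuple[Set[str], Set[str]],
    spanning_reads: Set[str],
) -> Set[str]:
    """Hypothetical helper: pool every read that supports an event."""
    result: Set[str] = set()
    # flanking evidence is stored as (read, mate) pairs, so add both members
    for read, mate in flanking_pairs:
        result.update((read, mate))
    # split reads are kept per breakpoint: (first breakpoint, second breakpoint)
    for reads in split_reads:
        result.update(reads)
    result.update(spanning_reads)
    return result


reads = all_supporting_reads(
    flanking_pairs={("f1", "f1_mate")},
    split_reads=({"s1"}, {"s1", "s2"}),
    spanning_reads={"p1"},
)
```

Because the result is a set, a read that supports the event in more than one way (for example `s1`, seen at both breakpoints) is only reported once.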

Evidence.collect_spanning_read()

spanning read: a read covering BOTH breakpoints

This is only applicable to small events. Soft-clipped reads do not need to be looked for here since they will already have been collected

def collect_spanning_read(self, read: pysam.AlignedSegment):

Args

read (pysam.AlignedSegment): the putative spanning read

Returns

bool:
- True: the read was collected and stored in the current evidence object
- False: the read was not collected
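
The definition "a read covering BOTH breakpoints" can be sketched as a simple containment test on reference coordinates. The `Interval` alias, the function name `covers_both_breakpoints`, and the assumption that the first breakpoint lies to the left of the second are all illustrative; the quality filters the real method may apply are not shown.

```python
from typing import Tuple

# inclusive (start, end) on the reference; a stand-in for the MAVIS Interval type
Interval = Tuple[int, int]


def covers_both_breakpoints(
    read_start: int, read_end: int, first: Interval, second: Interval
) -> bool:
    """Hypothetical check: the aligned span of the read contains both breakpoints.

    Assumes `first` is positioned at or before `second` on the same chromosome.
    """
    return read_start <= first[0] and read_end >= second[1]


spans = covers_both_breakpoints(100, 250, (120, 120), (200, 200))
stops_short = covers_both_breakpoints(100, 150, (120, 120), (200, 200))
```

Here `spans` is true because the read aligns across positions 120 and 200, while `stops_short` is false because the read ends before the second breakpoint. This also shows why the method only applies to small events: both breakpoints have to fit inside a single read.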

Evidence.collect_compatible_flanking_pair()

checks if a given read pair meets the minimum quality criteria to be counted as evidence and stored as support for this event

def collect_compatible_flanking_pair(
    self, read: pysam.AlignedSegment, mate: pysam.AlignedSegment, compatible_type: str
) -> bool:

Args

read (pysam.AlignedSegment): the read to add
mate (pysam.AlignedSegment): the mate
compatible_type (str): the type we are collecting for

Returns

bool:
- True: the pair was collected and stored in the current evidence object
- False: the pair was not collected

Raises

ValueError: if the input reads are not a valid pair

Note

see theory - types of flanking evidence

Evidence.collect_flanking_pair()

checks if a given read pair meets the minimum quality criteria to be counted as evidence and stored as support for this event

def collect_flanking_pair(self, read: pysam.AlignedSegment, mate: pysam.AlignedSegment):

Args

read (pysam.AlignedSegment): the read to add
mate (pysam.AlignedSegment): the mate

Returns

bool:
- True: the pair was collected and stored in the current evidence object
- False: the pair was not collected

Raises

ValueError: if the input reads are not a valid pair

Note

see theory - types of flanking evidence

Evidence.collect_split_read()

adds a split read if it passes the criteria filters and raises a warning if it does not

def collect_split_read(self, read: pysam.AlignedSegment, first_breakpoint: bool):

Args

read (pysam.AlignedSegment): the read to add
first_breakpoint (bool): add to the first breakpoint (or second if false)

Returns

bool:
- True: the read was collected and stored in the current evidence object
- False: the read was not collected

Raises

NotSpecifiedError: if the breakpoint orientation is not specified

Evidence.decide_sequenced_strand()

given a set of reads, determines the sequenced strand (if possible) and then returns the majority strand found

def decide_sequenced_strand(self, reads: Set[pysam.AlignedSegment]):

Args

reads (Set[pysam.AlignedSegment])

Returns

STRAND: the sequenced strand

Raises

ValueError: input was an empty set or the ratio was not sufficient to decide on a strand
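
The majority-vote behaviour described above, including both error cases, might look roughly like the following. It operates on per-read strand calls ('+' or '-') rather than on pysam.AlignedSegment objects, and the function name `majority_strand` and the `min_ratio` default of 0.75 are assumptions made for this sketch, not values taken from MAVIS.

```python
from collections import Counter
from typing import Iterable


def majority_strand(strands: Iterable[str], min_ratio: float = 0.75) -> str:
    """Hypothetical majority vote over per-read strand calls ('+' or '-')."""
    counts = Counter(strands)
    total = sum(counts.values())
    if total == 0:
        raise ValueError('cannot decide a strand from an empty set of reads')
    # most_common(1) returns the single most frequent call and its count
    strand, count = counts.most_common(1)[0]
    if count / total < min_ratio:
        raise ValueError(
            f'strand ratio {count / total:.2f} is below the required {min_ratio}'
        )
    return strand


decided = majority_strand(['+', '+', '+', '-'])
```

With three of four reads on the positive strand the ratio is exactly 0.75, which meets the assumed threshold, so `decided` is `'+'`. An even split such as `['+', '-']` raises ValueError instead of guessing.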

Evidence.assemble_contig()

uses the split reads and the partners of the half mapped reads to create a contig representing the sequence across the breakpoints

if it is not strand specific then sequences are sorted alphanumerically and only the first of a pair is kept (paired by sequence)

def assemble_contig(self):
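
One way to read the non-strand-specific rule is that each sequence is paired with its reverse complement, the two are sorted, and only the first is kept. That interpretation is an assumption, as are the helper names `reverse_complement` and `collapse_strands`; the sketch only covers this de-duplication step, not the assembly itself.

```python
from typing import Iterable, List

# translation table for complementing DNA bases (N stays N)
_COMPLEMENT = str.maketrans('ACGTN', 'TGCAN')


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def collapse_strands(sequences: Iterable[str]) -> List[str]:
    """Hypothetical helper: keep one orientation per (sequence, reverse complement) pair."""
    kept = set()
    for seq in sequences:
        # sorting the two orientations and keeping the first gives a stable choice
        kept.add(min(seq, reverse_complement(seq)))
    return sorted(kept)


collapsed = collapse_strands(['ACGG', 'CCGT', 'TTTT'])
```

`ACGG` and `CCGT` are reverse complements of each other, so they collapse to the single entry `ACGG`, and `TTTT` is represented by `AAAA`.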

Evidence.load_evidence()

opens the associated bam file, then reads and stores the evidence. Also does some preliminary read-quality filtering

def load_evidence(self):

Evidence.generate_window()

given an input breakpoint, uses the current evidence settings to determine an appropriate window/range in which to search for supporting reads

def generate_window(self, breakpoint: Breakpoint) -> Interval:

Args

breakpoint (Breakpoint): the breakpoint we are generating the evidence window for

Returns

Interval: the range from which reads should be fetched from the bam when looking for evidence for this event
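
To make the idea of a search window concrete, the sketch below pads a breakpoint on both sides by the largest fragment size one would expect from the library. The padding rule (median plus a number of standard deviations), the `stdev_count` parameter, and the function name `search_window` are assumptions for illustration; the actual method may also account for orientation, read length, and other settings.

```python
from typing import Tuple


def search_window(
    breakpoint_start: int,
    breakpoint_end: int,
    median_fragment_size: int,
    stdev_fragment_size: int,
    stdev_count: float = 3.0,
) -> Tuple[int, int]:
    """Hypothetical window: pad the breakpoint by the largest expected fragment."""
    max_fragment = round(median_fragment_size + stdev_count * stdev_fragment_size)
    # positions are treated as 1-based, so the window cannot start before 1
    start = max(1, breakpoint_start - max_fragment)
    end = breakpoint_end + max_fragment
    return start, end


window = search_window(1000, 1010, median_fragment_size=400, stdev_fragment_size=50)
near_start = search_window(100, 100, median_fragment_size=400, stdev_fragment_size=50)
```

With a median of 400 and a standard deviation of 50, the padding is 550 bases, so `window` is `(450, 1560)`. For a breakpoint near the start of the chromosome the left edge is clipped, giving `near_start == (1, 650)`.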