mavis/validate/base
class Evidence
inherits BreakpointPair
Attributes
- assembly_max_kmer_size (int)
- bam_cache (BamCache)
- classification (Optional[str])
- compatible_flanking_pairs (Set)
- compatible_window1 (Optional[Interval])
- compatible_window2 (Optional[Interval])
- config (Dict)
- contigs (List)
- counts (List[int])
- flanking_pairs (Set)
- half_mapped (Tuple[Set, Set])
- median_fragment_size (int)
- read_length (int)
- reference_genome (Dict)
- spanning_reads (Set)
- split_reads (Tuple[Set, Set])
- stdev_fragment_size (int)
- strand_determining_read (int)
- inner_window1 (Interval)
- inner_window2 (Interval)
- outer_window1 (Interval)
- outer_window2 (Interval)
Evidence.collect_from_outer_window()
determines if evidence should be collected from the outer window (looking for flanking evidence) or should be limited to the inner window (split/spanning/contig only)
def collect_from_outer_window(self):
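Returns
- bool: True if evidence should be collected from the outer window, False if collection should be limited to the inner window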
Evidence.supporting_reads()
convenience method to return all flanking, split and spanning reads associated with an evidence object
def supporting_reads(self):
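The union this method describes can be sketched as follows. This is a minimal sketch, not the MAVIS implementation: it assumes (hypothetically) that reads are plain hashable stand-ins (strings here), that each entry of `flanking_pairs` is a `(read, mate)` tuple, and that `split_reads` is a `(first_breakpoint, second_breakpoint)` tuple of sets as in the attribute list. The helper name `supporting_reads_sketch` is hypothetical.

```python
from typing import Hashable, Set, Tuple


def supporting_reads_sketch(
    flanking_pairs: Set[Tuple[Hashable, Hashable]],
    split_reads: Tuple[Set[Hashable], Set[Hashable]],
    spanning_reads: Set[Hashable],
) -> Set[Hashable]:
    """Union of all flanking, split and spanning reads (hypothetical helper)."""
    reads: Set[Hashable] = set()
    # assumption: each flanking pair is a (read, mate) tuple; both are supporting reads
    for read, mate in flanking_pairs:
        reads.add(read)
        reads.add(mate)
    # split reads are stored per breakpoint: (first, second)
    reads.update(split_reads[0])
    reads.update(split_reads[1])
    reads.update(spanning_reads)
    return reads


example = supporting_reads_sketch(
    flanking_pairs={("r1", "r1_mate")},
    split_reads=({"s1"}, {"s2", "s1"}),
    spanning_reads={"sp1"},
)
print(sorted(example))  # ['r1', 'r1_mate', 's1', 's2', 'sp1']
```

Using a set means a read that supports the event in more than one way (here `s1`, split at both breakpoints) is only counted once.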
Evidence.collect_spanning_read()
spanning read: a read covering BOTH breakpoints
This is only applicable to small events. There is no need to look for soft-clipped reads here since they will already have been collected.
def collect_spanning_read(self, read: pysam.AlignedSegment):
Args
- read (pysam.AlignedSegment): the putative spanning read
Returns
- bool:
  - True: the read was collected and stored in the current evidence object
  - False: the read was not collected
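The core "covers BOTH breakpoints" test can be sketched as below. This is a simplified illustration under stated assumptions, not the MAVIS code: `SimpleInterval` and `SimpleRead` are hypothetical stand-ins (inclusive reference coordinates) rather than `Interval` or `pysam.AlignedSegment`, and the event is assumed to be small and intrachromosomal so both breakpoints share one chromosome.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimpleInterval:
    """Hypothetical breakpoint interval (inclusive reference coordinates)."""
    start: int
    end: int


@dataclass(frozen=True)
class SimpleRead:
    """Hypothetical aligned read: chromosome plus inclusive aligned reference span."""
    chrom: str
    ref_start: int
    ref_end: int


def covers_both_breakpoints(read: SimpleRead, chrom: str,
                            break1: SimpleInterval, break2: SimpleInterval) -> bool:
    """Return True if the read's aligned span covers both breakpoint intervals.

    Assumes a small intrachromosomal event, so both breakpoints share `chrom`.
    """
    if read.chrom != chrom:
        return False
    leftmost = min(break1.start, break2.start)
    rightmost = max(break1.end, break2.end)
    return read.ref_start <= leftmost and read.ref_end >= rightmost


b1, b2 = SimpleInterval(120, 120), SimpleInterval(200, 205)
print(covers_both_breakpoints(SimpleRead("chr1", 100, 250), "chr1", b1, b2))  # True
print(covers_both_breakpoints(SimpleRead("chr1", 150, 300), "chr1", b1, b2))  # False
```

A real spanning read for, say, a small deletion would also carry the event in its alignment (for example a large deletion in its CIGAR); this sketch only checks positional coverage.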
Evidence.collect_compatible_flanking_pair()
checks if a given read pair meets the minimum quality criteria to be counted as evidence and stored as support for this event
def collect_compatible_flanking_pair(
self, read: pysam.AlignedSegment, mate: pysam.AlignedSegment, compatible_type: str
) -> bool:
Args
- read (pysam.AlignedSegment): the read to add
- mate (pysam.AlignedSegment): the mate
- compatible_type (str): the type we are collecting for
Returns
- bool:
  - True: the pair was collected and stored in the current evidence object
  - False: the pair was not collected
Raises
- ValueError: if the input reads are not a valid pair
Evidence.collect_flanking_pair()
checks if a given read pair meets the minimum quality criteria to be counted as evidence and stored as support for this event
def collect_flanking_pair(self, read: pysam.AlignedSegment, mate: pysam.AlignedSegment):
Args
- read (pysam.AlignedSegment): the read to add
- mate (pysam.AlignedSegment): the mate
Returns
- bool:
  - True: the pair was collected and stored in the current evidence object
  - False: the pair was not collected
Raises
- ValueError: if the input reads are not a valid pair
Note
- see theory - types of flanking evidence
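The pair check described above (validate the pair, apply minimum quality criteria, then confirm the pair flanks the event) can be sketched as follows. This is a hypothetical illustration, not the MAVIS criteria: `PairedRead` is a stand-in that merely reuses a few `pysam.AlignedSegment` attribute names, the windows are assumed `(chrom, start, end)` tuples with inclusive coordinates, and `min_mapping_quality=20` is an arbitrary assumed threshold.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PairedRead:
    """Hypothetical stand-in mirroring a few pysam.AlignedSegment attribute names."""
    query_name: str
    is_read1: bool
    is_unmapped: bool
    mapping_quality: int
    reference_name: str
    reference_start: int


def check_flanking_pair(read, mate, window1, window2, min_mapping_quality=20):
    """Return True if (read, mate) looks like a flanking pair for the two windows.

    window1 / window2 are hypothetical (chrom, start, end) tuples, inclusive.
    Raises ValueError if the two reads are not mates of one another.
    """
    # validity: same template name, exactly one read1 and one read2
    if read.query_name != mate.query_name or read.is_read1 == mate.is_read1:
        raise ValueError("input reads are not a valid pair")
    # minimum quality criteria (the threshold is an assumption)
    for r in (read, mate):
        if r.is_unmapped or r.mapping_quality < min_mapping_quality:
            return False

    def inside(r, window):
        chrom, start, end = window
        return r.reference_name == chrom and start <= r.reference_start <= end

    # one read must land in each window, in either order
    return (inside(read, window1) and inside(mate, window2)) or (
        inside(read, window2) and inside(mate, window1)
    )


r = PairedRead("q1", True, False, 60, "chr1", 1000)
m = PairedRead("q1", False, False, 60, "chr2", 5000)
print(check_flanking_pair(r, m, ("chr1", 900, 1100), ("chr2", 4900, 5100)))  # True
```

Raising on an invalid pair but returning False on a low-quality pair mirrors the split between the Raises and Returns sections above.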
Evidence.collect_split_read()
adds a split read if it passes the criteria filters and raises a warning if it does not
def collect_split_read(self, read: pysam.AlignedSegment, first_breakpoint: bool):
Args
- read (pysam.AlignedSegment): the read to add
- first_breakpoint (bool): add to the first breakpoint (or second if false)
Returns
- bool:
  - True: the read was collected and stored in the current evidence object
  - False: the read was not collected
Raises
- NotSpecifiedError: if the breakpoint orientation is not specified
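Why an unspecified orientation is an error becomes clearer in a sketch: the orientation decides which end of the read must be soft-clipped. This is a hypothetical illustration, not the MAVIS filter. It assumes the convention that `"L"` means sequence to the left of the breakpoint is retained (so the read is clipped at its end) and `"R"` the reverse, treats any other value as unspecified, and uses a local `NotSpecifiedError` stand-in, a hypothetical `ClippedRead` with inclusive aligned coordinates, and an assumed `min_clip=5`. The soft-clip operation code 4 matches pysam's `cigartuples` encoding.

```python
from dataclasses import dataclass
from typing import List, Tuple

BAM_CSOFT_CLIP = 4  # CIGAR operation code for a soft clip in pysam's cigartuples


class NotSpecifiedError(Exception):
    """Hypothetical stand-in for the error raised on an unspecified orientation."""


@dataclass(frozen=True)
class ClippedRead:
    """Hypothetical read: inclusive aligned reference span plus pysam-style cigartuples."""
    aligned_start: int
    aligned_end: int
    cigartuples: List[Tuple[int, int]]


def is_split_read_candidate(read, bp_start, bp_end, orient, min_clip=5):
    """Check that the read is soft-clipped on the side implied by the orientation.

    Assumed convention: 'L' keeps sequence left of the breakpoint, so the read is
    clipped at its end; 'R' keeps sequence to the right, so it is clipped at its start.
    """
    if orient == "L":
        op, length = read.cigartuples[-1]
        boundary = read.aligned_end
    elif orient == "R":
        op, length = read.cigartuples[0]
        boundary = read.aligned_start
    else:
        raise NotSpecifiedError(f"breakpoint orientation not specified: {orient!r}")
    clipped_enough = op == BAM_CSOFT_CLIP and length >= min_clip
    # the alignment must stop (or start) inside the breakpoint interval
    return clipped_enough and bp_start <= boundary <= bp_end


read = ClippedRead(100, 150, [(0, 51), (4, 20)])  # 51M20S
print(is_split_read_candidate(read, 148, 152, "L"))  # True
print(is_split_read_candidate(read, 148, 152, "R"))  # False
```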
Evidence.decide_sequenced_strand()
given a set of reads, determines the sequenced strand (if possible) and then returns the majority strand found
def decide_sequenced_strand(self, reads: Set[pysam.AlignedSegment]):
Args
- reads (Set[pysam.AlignedSegment])
Returns
- STRAND: the sequenced strand
Raises
- ValueError: input was an empty set or the ratio was not sufficient to decide on a strand
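The majority vote with its two failure modes (empty input, insufficient ratio) can be sketched like this. It is a minimal sketch under assumptions: the per-read strand calls are taken as already computed (`'+'` or `'-'` strings standing in for `STRAND` values), and `min_ratio=0.75` is a hypothetical threshold, not a documented MAVIS default.

```python
from collections import Counter
from typing import Iterable


def decide_majority_strand(strands: Iterable[str], min_ratio: float = 0.75) -> str:
    """Return the majority strand if it accounts for at least min_ratio of the calls.

    Each element of `strands` is the (hypothetical, pre-computed) sequenced strand
    of one read; min_ratio is an assumed threshold.
    """
    counts = Counter(strands)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("input was an empty set")
    strand, count = counts.most_common(1)[0]
    if count / total < min_ratio:
        raise ValueError(f"ratio {count / total:.2f} is not sufficient to decide on a strand")
    return strand


print(decide_majority_strand(["+", "+", "+", "-"]))  # +
```

With any `min_ratio` above 0.5, an even split always raises, so the arbitrary tie-breaking of `Counter.most_common` never leaks into the result.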
Evidence.assemble_contig()
uses the split reads and the partners of the half-mapped reads to create a contig representing the sequence across the breakpoints
if the evidence is not strand-specific, the sequences are sorted alphanumerically and only the first sequence of each pair is kept (pairs are matched by sequence)
def assemble_contig(self):
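The deduplication step for non-strand-specific input can be sketched as below. This is a hedged reading of "paired by sequence": it assumes, as an interpretation rather than something the text confirms, that a pair is a sequence and its reverse complement. The helpers `reverse_complement` and `collapse_unstranded` are hypothetical names and do not perform the assembly itself.

```python
from typing import Iterable, List

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (IUPAC codes beyond ACGTN are not handled)."""
    return seq.translate(_COMPLEMENT)[::-1]


def collapse_unstranded(sequences: Iterable[str]) -> List[str]:
    """Keep one representative per {sequence, reverse complement} pair.

    Sequences are sorted alphanumerically and the first member of each pair wins.
    """
    kept: List[str] = []
    seen = set()
    for seq in sorted(set(sequences)):
        if seq in seen:
            continue
        kept.append(seq)
        # mark both orientations as handled so the partner is skipped later
        seen.add(seq)
        seen.add(reverse_complement(seq))
    return kept


print(collapse_unstranded(["TTGCA", "TGCAA", "ACGTA"]))  # ['ACGTA', 'TGCAA']
```

Here `TTGCA` is dropped because it is the reverse complement of `TGCAA`, which sorts first.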
Evidence.load_evidence()
opens the associated bam file, reads and stores the evidence, and does some preliminary read-quality filtering
def load_evidence(self):
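The "preliminary read-quality filtering" step might look like the generator below. The specific filters (duplicates, QC failures, low mapping quality) and the `min_mapping_quality=10` threshold are assumptions chosen for illustration, not the filters MAVIS applies. `FetchedRead` is a hypothetical stand-in reusing a few `pysam.AlignedSegment` attribute names so the sketch runs without a BAM file; in practice the reads would come from fetching the evidence windows of the bam file.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class FetchedRead:
    """Hypothetical stand-in using a few pysam.AlignedSegment attribute names."""
    query_name: str
    is_duplicate: bool = False
    is_qcfail: bool = False
    mapping_quality: int = 60


def prefilter_reads(reads: Iterable[FetchedRead],
                    min_mapping_quality: int = 10) -> Iterator[FetchedRead]:
    """Yield only reads that pass some basic (assumed) read-quality filters."""
    for read in reads:
        # drop PCR/optical duplicates and reads flagged as failing vendor QC
        if read.is_duplicate or read.is_qcfail:
            continue
        # drop ambiguously placed reads (threshold is an assumption)
        if read.mapping_quality < min_mapping_quality:
            continue
        yield read


reads = [FetchedRead("a"), FetchedRead("b", is_duplicate=True),
         FetchedRead("c", mapping_quality=5), FetchedRead("d", is_qcfail=True)]
print([r.query_name for r in prefilter_reads(reads)])  # ['a']
```

Writing the filter as a generator keeps memory flat when it sits directly on top of a streaming fetch.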
Evidence.generate_window()
given an input breakpoint, uses the current evidence settings to determine an appropriate window (range) in which to search for supporting reads
def generate_window(self, breakpoint: Breakpoint) -> Interval:
Args
- breakpoint (Breakpoint): the breakpoint we are generating the evidence window for
Returns
- Interval: the range where reads should be read from the bam looking for evidence for this event
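One plausible way to derive such a window from the fragment-size attributes is sketched below. The formula is an assumption for illustration, not the MAVIS one: the maximum expected fragment size is taken as `median_fragment_size + stdevs * stdev_fragment_size` with a hypothetical `stdevs=3`, and the breakpoint is padded symmetrically by that minus one `read_length`. That pad is roughly the farthest a read can start from the breakpoint while its fragment still spans it. `Window` and `symmetric_window` are hypothetical names; coordinates are 1-based and inclusive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    """Hypothetical inclusive genomic range (1-based)."""
    start: int
    end: int


def symmetric_window(bp_start: int, bp_end: int, median_fragment_size: int,
                     stdev_fragment_size: int, read_length: int,
                     stdevs: int = 3) -> Window:
    """Pad the breakpoint interval on both sides (assumed formula).

    max_fragment = median + stdevs * stdev; the pad is that minus one read length,
    roughly the farthest a read can start while its fragment still spans the breakpoint.
    """
    max_fragment = median_fragment_size + stdevs * stdev_fragment_size
    pad = max(max_fragment - read_length, 0)
    # clamp at position 1 so the window never runs off the start of the chromosome
    return Window(start=max(1, bp_start - pad), end=bp_end + pad)


print(symmetric_window(1000, 1000, 400, 50, 100))  # Window(start=550, end=1450)
```

A symmetric pad is the simplest choice; an orientation-aware window could reasonably search further on the retained side of the breakpoint than on the other.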