mavis/validate/base

class Evidence

Attributes

assembly_max_kmer_size (int)
bam_cache (BamCache)
classification (Optional[str])
compatible_flanking_pairs (Set)
compatible_window1 (Optional[Interval])
compatible_window2 (Optional[Interval])
config (Dict)
contigs (List[Contig])
counts (List[int])
flanking_pairs (Set)
half_mapped (Tuple[Set, Set])
median_fragment_size (int)
read_length (int)
reference_genome (Dict)
spanning_reads (Set)
split_reads (Tuple[Set, Set])
stdev_fragment_size (int)
strand_determining_read (int)
inner_window1 (Interval)
inner_window2 (Interval)
outer_window1 (Interval)
outer_window2 (Interval)

Evidence.collect_from_outer_window()

determines whether evidence should be collected from the outer window (looking for flanking evidence) or limited to the inner window (split, spanning, and contig evidence only)

def collect_from_outer_window(self):

Returns

bool: True if evidence should be collected from the outer window, False if collection is limited to the inner window
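The rule used to make this decision is not described here. The following is a minimal sketch under a hypothetical rule: interchromosomal events always use the outer window, and intrachromosomal events use it only when the breakpoints are at least `min_event_size` bases apart. The `SimpleEvent` class, the function signature, and the threshold value are all illustrative assumptions, not the MAVIS implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimpleEvent:
    """Hypothetical stand-in for an event with two breakpoints."""
    chr1: str
    pos1: int  # position of the first breakpoint
    chr2: str
    pos2: int  # position of the second breakpoint


def collect_from_outer_window(event: SimpleEvent, min_event_size: int = 1000) -> bool:
    """Return True if flanking evidence should be collected (assumed rule)."""
    # breakpoints on different chromosomes: any discordant pair is informative
    if event.chr1 != event.chr2:
        return True
    # intrachromosomal: assume only large events shift the fragment size
    # enough to be detected from flanking pairs (threshold is hypothetical)
    return abs(event.pos2 - event.pos1) >= min_event_size
```

Under these assumptions, a 50 bp deletion would be searched for only in the inner window, while a translocation would always trigger flanking-pair collection.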

Evidence.supporting_reads()

convenience method that returns all flanking, split, and spanning reads associated with this evidence object

def supporting_reads(self):
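Based on the attribute types listed above (`flanking_pairs` as a Set, `split_reads` as a Tuple[Set, Set], `spanning_reads` as a Set), the union can be sketched as a standalone function. It assumes that each element of `flanking_pairs` is a `(read, mate)` tuple; that layout and the function signature are assumptions for illustration.

```python
from typing import Hashable, Set, Tuple


def supporting_reads(
    flanking_pairs: Set[Tuple[Hashable, Hashable]],
    split_reads: Tuple[Set[Hashable], Set[Hashable]],
    spanning_reads: Set[Hashable],
) -> Set[Hashable]:
    """Union of all flanking, split, and spanning reads (sketch)."""
    result: Set[Hashable] = set()
    # assumed layout: each flanking pair is a (read, mate) tuple
    for read, mate in flanking_pairs:
        result.update((read, mate))
    # split reads are stored as one set per breakpoint
    for reads in split_reads:
        result.update(reads)
    result.update(spanning_reads)
    return result
```

Returning a set means a read that supports the event in more than one way (for example, a split read that is also part of a flanking pair) is counted only once.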

Evidence.decide_sequenced_strand()

given a set of reads, determines the sequenced strand of each read (where possible) and returns the majority strand

def decide_sequenced_strand(self, reads: Set[pysam.AlignedSegment]):

Args

reads (Set[pysam.AlignedSegment]): the reads used to decide the strand

Returns

STRAND: the sequenced strand

Raises

ValueError: the input was an empty set, or the ratio was not sufficient to decide on a strand
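A majority vote with a minimum-ratio check matches the behaviour described above. The sketch below is hypothetical: `ReadFlags` stands in for the two `pysam.AlignedSegment` flags it reads (`is_read1`, `is_reverse`), `strand_determining_read` mirrors the attribute of the same name, and the `min_ratio` default of 0.75 is an assumed value, not a MAVIS setting. It also assumes that the strand-determining read aligns in the same orientation as the sequenced strand and that its mate aligns opposite.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class ReadFlags:
    """Hypothetical stand-in for the pysam.AlignedSegment flags used here."""
    is_read1: bool
    is_reverse: bool


def read_strand(read: ReadFlags, strand_determining_read: int) -> str:
    # is this read the one (read1 or read2) that determines the strand?
    is_determining = read.is_read1 == (strand_determining_read == 1)
    if is_determining:
        # assumed: the determining read aligns on the sequenced strand
        return '-' if read.is_reverse else '+'
    # its mate aligns in the opposite orientation
    return '+' if read.is_reverse else '-'


def decide_sequenced_strand(
    reads: Iterable[ReadFlags], strand_determining_read: int, min_ratio: float = 0.75
) -> str:
    counts = Counter(read_strand(r, strand_determining_read) for r in reads)
    total = sum(counts.values())
    if total == 0:
        raise ValueError('cannot decide the strand of an empty set of reads')
    strand, count = counts.most_common(1)[0]
    if count / total < min_ratio:
        raise ValueError('ratio {:.2f} is not sufficient to decide on a strand'.format(count / total))
    return strand
```

For example, with `strand_determining_read=2`, three reads implying '+' and one implying '-' give a ratio of 0.75, which meets the assumed threshold and returns '+'; an even split raises `ValueError`.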

Evidence.assemble_contig()

uses the split reads and the partners (mates) of the half-mapped reads to create a contig representing the sequence across the breakpoints

if the data is not strand-specific, sequences are sorted alphanumerically and only the first of each pair (paired by sequence) is kept

def assemble_contig(self):
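One plausible reading of the non-strand-specific rule is that a sequence and its reverse complement form a pair (they describe the same fragment when strand is unknown), and that sorting the pair alphanumerically and keeping the first member gives one canonical representative. The sketch below implements only that assumed deduplication step; `collapse_unstranded` is a hypothetical helper, and the assembly itself (which uses `assembly_max_kmer_size`) is not shown.

```python
from typing import Iterable, List

# complement table for uppercase DNA (assumes uppercase input)
_COMPLEMENT = str.maketrans('ACGTN', 'TGCAN')


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def collapse_unstranded(sequences: Iterable[str]) -> List[str]:
    """Keep one representative per (sequence, reverse complement) pair."""
    kept = set()
    for seq in sequences:
        # sort the pair alphanumerically and keep only the first member
        kept.add(min(seq, reverse_complement(seq)))
    return sorted(kept)
```

For example, `ACGG` and `CCGT` are reverse complements of each other, so both collapse to `ACGG`, and the assembler would see that sequence only once.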

Evidence.generate_window()

given an input breakpoint, uses the current evidence settings to determine an appropriate window (range) in which to search for supporting reads

def generate_window(self, breakpoint: Breakpoint) -> Interval:

Args

breakpoint (Breakpoint): the breakpoint we are generating the evidence window for

Returns

Interval: the range from which reads should be fetched from the BAM file when looking for evidence of this event
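The exact window arithmetic is not given here. A minimal sketch, assuming that the inner window (split and spanning reads) covers any read of `read_length` that overlaps the breakpoint by at least one base, and that the outer window (flanking pairs) extends by a maximum expected fragment size of `median_fragment_size + stdev_count * stdev_fragment_size`. `Interval` and `SimpleBreakpoint` are minimal stand-ins, and `generate_windows` and `stdev_count` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Interval:
    """Minimal stand-in for the Interval type (1-based, inclusive)."""
    start: int
    end: int


@dataclass(frozen=True)
class SimpleBreakpoint:
    """Breakpoint positions may be uncertain, so a breakpoint is a range."""
    start: int
    end: int


def generate_windows(
    bp: SimpleBreakpoint,
    read_length: int,
    median_fragment_size: int,
    stdev_fragment_size: int,
    stdev_count: int = 3,  # hypothetical tolerance, not a MAVIS default
) -> Tuple[Interval, Interval]:
    # inner window: reads that overlap the breakpoint by at least one base
    inner = Interval(max(1, bp.start - read_length + 1), bp.end + read_length - 1)
    # outer window: mates of flanking pairs may lie up to the maximum
    # expected fragment size away from the breakpoint
    max_fragment = median_fragment_size + stdev_count * stdev_fragment_size
    outer = Interval(max(1, bp.start - max_fragment), bp.end + max_fragment)
    return inner, outer
```

For a breakpoint at 1000-1010 with 100 bp reads and a fragment size of 400 ± 50, this gives an inner window of 901-1109 and an outer window of 450-1560. Both windows are clamped at position 1 so they never extend past the start of the chromosome.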