mavis/validate/align

Should take in a SAM file from an aligner like bwa aln or bwa mem and convert it into a

class SUPPORTED_ALIGNER

inherits MavisNamespace

supported aligners

Attributes

BLAT: blat
BWA_MEM: bwa mem

class DiscontinuousAlignment

inherits BreakpointPair

DiscontinuousAlignment.query_coverage()

interval representing the total region of the input sequence that is covered by the combination of alignments

def query_coverage(self):

DiscontinuousAlignment.query_consumption()

fraction of the query sequence which is aligned (everything not soft-clipped) in either alignment

def query_consumption(self):
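
As a rough illustration of the idea (not the MAVIS implementation), the sketch below computes the aligned fraction of a query from two CIGARs. It assumes CIGARs are given as lists of hypothetical `(operation, length)` tuples using SAM operation letters, that both alignments are reported on the same orientation of the query, that no hard clipping is present, and that each CIGAR has at least one aligned operation. The helper names are invented for this example.

```python
from typing import List, Tuple

# Hypothetical CIGAR representation: (SAM operation letter, length)
Cigar = List[Tuple[str, int]]

QUERY_OPS = {'M', '=', 'X', 'I', 'S'}  # operations that consume query bases
ALIGNED_OPS = {'M', '=', 'X', 'I'}  # query-consuming operations other than soft-clipping


def aligned_query_interval(cigar: Cigar) -> Tuple[int, int]:
    """Return the half-open [start, end) query interval that is not soft-clipped."""
    pos = 0
    start = None
    end = None
    for op, length in cigar:
        if op in ALIGNED_OPS:
            if start is None:
                start = pos
            end = pos + length
        if op in QUERY_OPS:
            pos += length
    return start, end


def query_consumption(cigar1: Cigar, cigar2: Cigar) -> float:
    """Fraction of query positions aligned in at least one of the two alignments."""
    query_length = sum(length for op, length in cigar1 if op in QUERY_OPS)
    covered = set()
    for cigar in (cigar1, cigar2):
        start, end = aligned_query_interval(cigar)
        covered.update(range(start, end))
    return len(covered) / query_length


# First alignment covers query bases 0-39, second covers bases 50-99
left = [('=', 40), ('S', 60)]
right = [('S', 50), ('=', 50)]
print(query_consumption(left, right))
```

Using a set of covered positions keeps overlapping alignments from being counted twice, at the cost of memory proportional to the query length.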

DiscontinuousAlignment.score()

scores the alignment between 0 and 1, penalizing events that interrupt the alignment. A split alignment counts as a single event

def score(self, consec_bonus=10) -> float:

Args

consec_bonus

Returns

float
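
One way a score of this kind could be computed is sketched below. This is an assumption for illustration only, not a description of the MAVIS formula: each run of exact matches earns its length plus a bonus for every consecutive pair of matched bases, and the total is divided by the score of a single uninterrupted run of the same aligned length. CIGARs are hypothetical `(operation, length)` tuples and `alignment_score` is an invented name.

```python
from typing import List, Tuple

# Hypothetical CIGAR representation: (SAM operation letter, length)
Cigar = List[Tuple[str, int]]


def alignment_score(cigar: Cigar, consec_bonus: int = 10) -> float:
    """Score a single alignment between 0 and 1, rewarding long uninterrupted match runs."""
    # each '=' run of length n earns n plus a bonus for its n - 1 consecutive pairs
    matched = sum(
        length + (length - 1) * consec_bonus for op, length in cigar if op == '='
    )
    # aligned query bases: matches, mismatches and insertions (soft clips excluded)
    aligned = sum(length for op, length in cigar if op in {'=', 'X', 'I'})
    if aligned == 0:
        return 0.0
    # best possible score: all aligned bases form one uninterrupted run
    best = aligned + (aligned - 1) * consec_bonus
    return matched / best


perfect = [('=', 100)]
one_mismatch = [('=', 50), ('X', 1), ('=', 49)]
print(alignment_score(perfect), alignment_score(one_mismatch))
```

Under this scheme a larger `consec_bonus` penalizes each interruption more heavily, because every break in a match run forfeits the bonus for the pairs it splits.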

get_aligner_version()

runs the aligner in a subprocess without arguments and parses the version number from its output

def get_aligner_version(aligner: str) -> str:

Args

aligner (str)

Returns

str

Examples

>>> get_aligner_version('blat')
'36x2'
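
A minimal sketch of this approach follows, assuming the usage banner of each tool contains its version string. The regular expressions, the sample banners, and the names `parse_version` and `run_and_parse` are illustrative assumptions, not the patterns MAVIS uses.

```python
import re
import subprocess
from typing import Optional

# Assumed banner patterns, keyed by the aligner names listed in SUPPORTED_ALIGNER
VERSION_PATTERNS = {
    'blat': r'BLAT v\.\s*(\S+)',
    'bwa mem': r'Version:\s*(\S+)',
}


def parse_version(aligner: str, output: str) -> Optional[str]:
    """Extract the version string from an aligner's usage banner, or None if absent."""
    match = re.search(VERSION_PATTERNS[aligner], output)
    return match.group(1) if match else None


def run_and_parse(aligner: str) -> Optional[str]:
    """Run the aligner executable with no arguments and parse its combined output."""
    executable = aligner.split()[0]  # 'bwa mem' is invoked through the 'bwa' executable
    proc = subprocess.run(
        [executable],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # some tools print their usage banner to stderr
        universal_newlines=True,
    )
    return parse_version(aligner, proc.stdout)


# Illustrative banner text; real output may differ between releases
sample = 'blat - Standalone BLAT v. 36x2 fast sequence search command line tool'
print(parse_version('blat', sample))
```

Separating the parsing from the subprocess call makes the pattern matching testable on machines where the aligner is not installed.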

convert_to_duplication()

Given a breakpoint call, tests whether the untemplated sequence matches the preceding reference sequence. If it does, the event is annotated as a duplication and the new breakpoint pair is returned. If not, the original breakpoint pair is returned

def convert_to_duplication(alignment, reference_genome: ReferenceGenome):

Args

alignment
reference_genome (ReferenceGenome)
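
The core comparison can be sketched on plain strings. This is a simplified illustration, assuming the reference is available as a Python string and the insertion point is a 0-based offset; the function `duplicated_region` is hypothetical and does not use the `ReferenceGenome` or `BreakpointPair` types.

```python
from typing import Optional, Tuple


def duplicated_region(
    reference: str, insert_pos: int, inserted: str
) -> Optional[Tuple[int, int]]:
    """
    Return the half-open [start, end) reference interval that the inserted sequence
    duplicates, or None if the inserted sequence does not match the reference
    bases immediately preceding the insertion point.

    insert_pos is the number of reference bases before the insertion (0-based offset).
    """
    start = insert_pos - len(inserted)
    if not inserted or start < 0:
        return None
    if reference[start:insert_pos].upper() == inserted.upper():
        return start, insert_pos
    return None


reference = 'ACGTTGCA'
# 'GTT' inserted after the fifth base repeats reference[2:5]
print(duplicated_region(reference, 5, 'GTT'))
# 'GTA' does not match the preceding bases, so the call stays an insertion
print(duplicated_region(reference, 5, 'GTA'))
```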

call_read_events()

Given a read, return breakpoint pairs representing all putative events

def call_read_events(read, secondary_read=None, is_stranded=False):

Args

read
secondary_read
is_stranded
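
To illustrate how events can be read off a single alignment, the sketch below walks a CIGAR and reports the reference positions flanking each insertion or deletion. It is a simplified stand-in, not the MAVIS logic: it ignores strand, secondary reads, and minimum event sizes, and it uses hypothetical `(operation, length)` tuples with 0-based reference coordinates.

```python
from typing import List, Tuple

# Hypothetical CIGAR representation: (SAM operation letter, length)
Cigar = List[Tuple[str, int]]

REFERENCE_ALIGNED_OPS = {'M', '=', 'X', 'N'}  # advance along the reference without an indel


def indel_events(ref_start: int, cigar: Cigar) -> List[Tuple[str, int, int]]:
    """
    Return (event type, last reference base before the event, first reference base after it)
    for every insertion and deletion in the alignment.
    """
    events = []
    ref_pos = ref_start  # 0-based position of the next reference base to be consumed
    for op, length in cigar:
        if op == 'D':
            events.append(('deletion', ref_pos - 1, ref_pos + length))
            ref_pos += length
        elif op == 'I':
            # insertions consume no reference, so the flanking bases are adjacent
            events.append(('insertion', ref_pos - 1, ref_pos))
        elif op in REFERENCE_ALIGNED_OPS:
            ref_pos += length
        # soft clips ('S') and hard clips ('H') consume no reference bases
    return events


cigar = [('=', 10), ('D', 5), ('=', 10), ('I', 3), ('=', 7)]
for event in indel_events(100, cigar):
    print(event)
```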

read_breakpoint()

convert a given read to a single breakpoint

def read_breakpoint(read):

Args

read
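
A plausible way to derive one breakpoint from a clipped read is sketched below. The rule used here is an assumption for illustration: the breakpoint is placed at whichever end of the alignment carries the larger soft clip, and the orientation names the side of the break on which the aligned sequence lies. Coordinates are 0-based, the CIGAR is a non-empty list of hypothetical `(operation, length)` tuples, and `breakpoint_from_read` is an invented name.

```python
from typing import List, Tuple

# Hypothetical CIGAR representation: (SAM operation letter, length)
Cigar = List[Tuple[str, int]]

REFERENCE_OPS = {'M', '=', 'X', 'D', 'N'}  # operations that consume reference bases


def breakpoint_from_read(ref_start: int, cigar: Cigar) -> Tuple[int, str]:
    """Return (reference position, orientation) for the clipped end of an alignment."""
    left_clip = cigar[0][1] if cigar[0][0] == 'S' else 0
    right_clip = cigar[-1][1] if cigar[-1][0] == 'S' else 0
    ref_length = sum(length for op, length in cigar if op in REFERENCE_OPS)
    if left_clip > right_clip:
        # clipped on the left: the aligned sequence lies to the right of the break
        return ref_start, 'R'
    # clipped on the right (or tied): the aligned sequence lies to the left of the break
    return ref_start + ref_length - 1, 'L'


print(breakpoint_from_read(100, [('S', 30), ('=', 70)]))
print(breakpoint_from_read(100, [('=', 70), ('S', 30)]))
```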

call_paired_read_event()

For a given pair of reads, call all applicable events. Assumes there is a major event spanning both reads, then calls indels from the individual reads. The two reads should be alternate alignments of the same read

def call_paired_read_event(read1, read2, is_stranded=False):

Args

read1
read2
is_stranded

align_sequences()

calls the alignment tool and parses the return output for a set of sequences

def align_sequences(
    sequences: Dict[str, str],
    input_bam_cache: 'BamCache',
    reference_genome: ReferenceGenome,
    aligner: str,
    aligner_reference: str,
    aligner_output_file='aligner_out.temp',
    aligner_fa_input_file='aligner_in.fa',
    aligner_output_log='aligner_out.log',
    blat_limit_top_aln=25,
    blat_min_identity=0.7,
    clean_files=True,
    **kwargs,
):

Args

sequences (Dict[str, str]): dictionary of sequences by name
input_bam_cache (BamCache): bam cache to be used as a template for reading the alignments
reference_genome (ReferenceGenome): the reference genome
aligner (str): the name of the aligner to be used
aligner_reference (str): path to the aligner reference file
aligner_output_file
aligner_fa_input_file
aligner_output_log
blat_limit_top_aln
blat_min_identity
clean_files
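
The input-preparation step can be sketched as follows: the named sequences are written to a FASTA file and a command line is assembled for the aligner. This is a hedged illustration rather than the MAVIS code path; `write_fasta` and `bwa_mem_command` are hypothetical helpers, and only the basic `bwa mem <reference> <input.fa>` form is shown, with no tuning options.

```python
from typing import Dict, List


def write_fasta(sequences: Dict[str, str], path: str) -> None:
    """Write sequences to a FASTA file, one record per name, in sorted name order."""
    with open(path, 'w') as fh:
        for name, seq in sorted(sequences.items()):
            fh.write('>{}\n{}\n'.format(name, seq))


def bwa_mem_command(aligner_reference: str, fasta_path: str) -> List[str]:
    """Build the argument list for a basic bwa mem run (SAM is written to stdout)."""
    return ['bwa', 'mem', aligner_reference, fasta_path]


# The argument list can be passed to subprocess.run with stdout redirected
# to the aligner output file; the paths below are placeholders.
print(bwa_mem_command('reference.fa', 'aligner_in.fa'))
```

Building the command as a list rather than a single string avoids shell quoting problems when paths contain spaces.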

select_contig_alignments()

standardizes/simplifies reads, filters bad/irrelevant alignments, and adds the remaining contig alignments to the contigs

def select_contig_alignments(evidence, reads_by_query):

Args

evidence
reads_by_query