mavis.align
Should take in a SAM file from an aligner like bwa aln or bwa mem and convert it into a
SUPPORTED_ALIGNER
SUPPORTED_ALIGNER = MavisNamespace(
BWA_MEM='bwa mem', BLAT='blat', __name__='mavis.align.SUPPORTED_ALIGNER'
)
class mavis.align.SplitAlignment
inherits BreakpointPair
mavis.align.SplitAlignment.query_coverage()
interval representing the total region of the input sequence that is covered by the combination of alignments
def query_coverage(self):
mavis.align.SplitAlignment.query_consumption()
fraction of the query sequence which is aligned (everything not soft-clipped) in either alignment
def query_consumption(self):
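The two query-based measures above can be illustrated with a minimal sketch. It assumes each alignment is reduced to the (start, end) range of query positions it aligns (0-based, end-exclusive); that representation and the standalone function signatures are assumptions made for the example, not the MAVIS data model.

```python
from typing import List, Tuple

# Hypothetical representation: the query positions aligned (not soft-clipped)
# by one alignment, 0-based and end-exclusive.
Span = Tuple[int, int]


def query_coverage(spans: List[Span]) -> Span:
    """Smallest interval of the query that contains every aligned span."""
    return (min(start for start, _ in spans), max(end for _, end in spans))


def query_consumption(spans: List[Span], query_length: int) -> float:
    """Fraction of query bases aligned by at least one of the alignments."""
    covered = set()
    for start, end in spans:
        covered.update(range(start, end))
    return len(covered) / query_length


# Two alignments of a 120 bp query that overlap by 10 bases
spans = [(0, 60), (50, 100)]
coverage = query_coverage(spans)
consumption = query_consumption(spans, 120)
print(coverage)     # (0, 100)
print(consumption)  # 100 of the 120 query bases are aligned
```

Taking the union of positions means bases aligned by both alignments are counted once.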
mavis.align.SplitAlignment.score()
scores events between 0 and 1 penalizing events interrupting the alignment. Counts a split alignment as a single event
def score(self, consec_bonus=10):
Args
- consec_bonus
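One way such a score could be computed is sketched below. The formula is an assumption chosen to show how a consecutive-match bonus penalizes interruptions; it is not confirmed to be the MAVIS implementation. Each run of matching bases earns its length plus `consec_bonus` for every base that directly follows another match, and the total is divided by the score of an uninterrupted match of the full query.

```python
from typing import List


def run_score(length: int, consec_bonus: int) -> int:
    """Score of one uninterrupted run of matches (assumed scheme)."""
    if length <= 0:
        return 0
    return length + (length - 1) * consec_bonus


def score(match_runs: List[int], query_length: int, consec_bonus: int = 10) -> float:
    """Score in [0, 1]; 1 means the whole query matched without interruption."""
    best = run_score(query_length, consec_bonus)
    total = sum(run_score(n, consec_bonus) for n in match_runs)
    return total / best


perfect = score([100], 100)           # one unbroken run of 100 matches
one_break = score([50, 50], 100)      # same matches, interrupted once
three_breaks = score([25] * 4, 100)   # same matches, interrupted three times
print(perfect, one_break, three_breaks)
```

Under this scheme every interruption costs one bonus, so alignments with the same number of matched bases rank lower the more fragmented they are.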
mavis.align.get_aligner_version()
runs the aligner in a subprocess without arguments and parses the version number from the output
def get_aligner_version(aligner):
Args
- aligner
Examples
>>> get_aligner_version('blat')
'36x2'
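The approach can be sketched in two steps: capture whatever the aligner prints when run without arguments, then search that text with a regular expression. The helper names and the regular expressions below are assumptions about the usage text of each tool, not taken from the MAVIS source.

```python
import re
import subprocess
from typing import List

# Assumed patterns for the usage text printed by each aligner
VERSION_PATTERNS = {
    'blat': r'blat - Standalone BLAT v\. (\d+(x\d+)?)',
    'bwa mem': r'Version: (\d+(\.\d+)*)',
}


def run_without_arguments(command: List[str]) -> str:
    """Run a command and return stdout and stderr combined as text."""
    proc = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # usage text is often printed to stderr
        universal_newlines=True,
    )
    return proc.stdout


def parse_version(aligner: str, text: str) -> str:
    """Extract the version string for the given aligner from its usage text."""
    match = re.search(VERSION_PATTERNS[aligner], text)
    if match is None:
        raise ValueError('unable to parse a version for {}'.format(aligner))
    return match.group(1)


# Illustrative usage lines (assumed wording), so the sketch runs without the tools
blat_version = parse_version(
    'blat', 'blat - Standalone BLAT v. 36x2 fast sequence search command line tool'
)
bwa_version = parse_version('bwa mem', 'Version: 0.7.17-r1188')
print(blat_version, bwa_version)
```

With the aligner installed, the two helpers would be combined as `parse_version('blat', run_without_arguments(['blat']))`.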
mavis.align.convert_to_duplication()
Given a breakpoint call, tests whether the untemplated sequence matches the preceding reference sequence. If it does, the call is annotated as a duplication and the new breakpoint pair is returned; otherwise the original breakpoint pair is returned
def convert_to_duplication(alignment, reference_genome):
Args
- alignment
- reference_genome
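The core test described above, whether the untemplated sequence repeats the reference bases immediately before the break, can be sketched on plain strings. The function name, its parameters, and the 0-based coordinate convention are hypothetical; the real function works on alignment and reference genome objects.

```python
def is_duplication(reference: str, break_pos: int, untemplated_seq: str) -> bool:
    """True when the inserted bases repeat the reference just before the break.

    break_pos is the 0-based index of the first reference base that follows
    the untemplated sequence (assumed convention for this example).
    """
    if not untemplated_seq:
        return False
    start = break_pos - len(untemplated_seq)
    if start < 0:
        # not enough preceding reference sequence to compare against
        return False
    return reference[start:break_pos].upper() == untemplated_seq.upper()


reference = 'ACGTTTGACCA'
dup = is_duplication(reference, 7, 'TTG')      # reference[4:7] is 'TTG'
not_dup = is_duplication(reference, 7, 'CCC')  # inserted bases are novel
print(dup, not_dup)  # True False
```

When the test succeeds, the event can be reported as a duplication of the interval `[start, break_pos)` instead of an insertion at `break_pos`.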
mavis.align.call_read_events()
Given a read, return breakpoint pairs representing all putative events
def call_read_events(read, secondary_read=None, is_stranded=False):
Args
- read
- secondary_read
- is_stranded
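As an illustration of calling events from a single read, the sketch below walks a CIGAR and reports each insertion or deletion as a pair of flanking reference positions plus any inserted sequence. It uses pysam-style numeric operation codes; the function name and the returned tuple layout are assumptions for the example and do not reflect the breakpoint pair objects MAVIS returns.

```python
from typing import List, Tuple

# pysam-style numeric CIGAR operation codes
M, I, D, N, S, H, P, EQ, X = range(9)

Event = Tuple[int, int, str]  # (last ref base before, first ref base after, inserted)


def call_indels(ref_start: int, cigar: List[Tuple[int, int]], query: str) -> List[Event]:
    """Return putative indel events from one alignment (0-based positions)."""
    events = []
    ref_pos = ref_start
    query_pos = 0
    for op, length in cigar:
        if op in (M, EQ, X):
            # aligned bases consume both reference and query
            ref_pos += length
            query_pos += length
        elif op == I:
            inserted = query[query_pos:query_pos + length]
            events.append((ref_pos - 1, ref_pos, inserted))
            query_pos += length
        elif op == D:
            events.append((ref_pos - 1, ref_pos + length, ''))
            ref_pos += length
        elif op == N:
            ref_pos += length  # skipped region, not reported here
        elif op == S:
            query_pos += length  # soft clip consumes query only
        # H and P consume neither reference nor query
    return events


query = 'N' * 5 + 'A' * 10 + 'GGG' + 'C' * 10 + 'T' * 8
cigar = [(S, 5), (EQ, 10), (I, 3), (EQ, 10), (D, 4), (EQ, 8)]
events = call_indels(100, cigar, query)
print(events)  # [(109, 110, 'GGG'), (119, 124, '')]
```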
mavis.align.read_breakpoint()
convert a given read to a single breakpoint
def read_breakpoint(read):
Args
- read
mavis.align.call_paired_read_event()
For a given pair of reads, call all applicable events. Assumes there is a major event from both reads, and then calls indels from the individual reads
def call_paired_read_event(read1, read2, is_stranded=False):
Args
- read1
- read2
- is_stranded
mavis.align.align_sequences()
calls the alignment tool and parses the return output for a set of sequences
def align_sequences(
sequences,
input_bam_cache,
reference_genome,
aligner,
aligner_reference,
aligner_output_file='aligner_out.temp',
aligner_fa_input_file='aligner_in.fa',
aligner_output_log='aligner_out.log',
blat_limit_top_aln=25,
blat_min_identity=0.7,
clean_files=True,
log=DEVNULL,
**kwargs
):
Args
- sequences (Dict[str,str]): dictionary of sequences by name
- input_bam_cache (BamCache): bam cache to be used as a template for reading the alignments
- reference_genome: the reference genome
- aligner (SUPPORTED_ALIGNER): the name of the aligner to be used
- aligner_reference (str): path to the aligner reference file
- aligner_output_file
- aligner_fa_input_file
- aligner_output_log
- blat_limit_top_aln
- blat_min_identity
- clean_files
- log
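The first half of this workflow, writing the named sequences to a FASTA input file and assembling the aligner command, might look like the sketch below. The helper names and the `reference.fa` path are hypothetical; only the basic `bwa mem <reference> <input>` invocation is assumed, and the command is built but not executed so the example runs without bwa installed.

```python
import os
import tempfile
from typing import Dict, List


def write_fasta(sequences: Dict[str, str], path: str) -> None:
    """Write sequences (by name) to a FASTA file, one record per sequence."""
    with open(path, 'w') as fh:
        for name, seq in sorted(sequences.items()):
            fh.write('>{}\n{}\n'.format(name, seq))


def bwa_mem_command(aligner_reference: str, fasta_path: str) -> List[str]:
    """Argument list for a basic bwa mem run; SAM output goes to stdout."""
    return ['bwa', 'mem', aligner_reference, fasta_path]


sequences = {'contig1': 'ACGTACGT', 'contig2': 'TTGACCA'}
with tempfile.TemporaryDirectory() as tmp_dir:
    fasta_path = os.path.join(tmp_dir, 'aligner_in.fa')
    write_fasta(sequences, fasta_path)
    with open(fasta_path) as fh:
        fasta_text = fh.read()
    command = bwa_mem_command('reference.fa', fasta_path)  # hypothetical reference path

print(fasta_text)
print(command[:3])
```

Running the command would typically be done with `subprocess`, redirecting stdout to the aligner output file and stderr to the log before the alignments are parsed.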
mavis.align.select_contig_alignments()
standardizes/simplifies reads and filters bad/irrelevant alignments, then adds the contig alignments to the contigs
def select_contig_alignments(evidence, reads_by_query):
Args
- evidence
- reads_by_query