mavis/validate/align
Should take in a SAM file from an aligner like bwa aln or bwa mem and convert it into a
class SUPPORTED_ALIGNER
inherits MavisNamespace
supported aligners
Attributes
class DiscontinuousAlignment
inherits BreakpointPair
DiscontinuousAlignment.query_coverage()
interval representing the total region of the input sequence that is covered by the combination of alignments
def query_coverage(self):
DiscontinuousAlignment.query_consumption()
fraction of the query sequence which is aligned (everything not soft-clipped) in either alignment
def query_consumption(self):
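The two query-based measures above can be illustrated with a minimal sketch. It is not the class's implementation: it assumes each alignment is summarised as an inclusive, 0-based interval on the query sequence, and the standalone functions and the `query_length` parameter are hypothetical stand-ins for the methods.

```python
from typing import List, Tuple

# Assumption: an aligned block is an inclusive, 0-based (start, end) pair
# of positions on the query (input) sequence.
Interval = Tuple[int, int]


def query_coverage(aligned: List[Interval]) -> Interval:
    """Smallest interval that spans every aligned block on the query."""
    return (min(start for start, _ in aligned), max(end for _, end in aligned))


def query_consumption(aligned: List[Interval], query_length: int) -> float:
    """Fraction of query bases that are aligned in at least one alignment."""
    covered = set()
    for start, end in aligned:
        covered.update(range(start, end + 1))
    return len(covered) / query_length


# Two alignments of a 120 bp contig that overlap by 10 bases on the query
blocks = [(0, 59), (50, 99)]
coverage = query_coverage(blocks)
consumption = query_consumption(blocks, query_length=120)
```

Bases that appear in both alignments are counted once, so an overlap between the two alignments does not inflate the consumption.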
DiscontinuousAlignment.score()
scores events between 0 and 1, penalizing events that interrupt the alignment. Counts a split alignment as a single event
def score(self, consec_bonus=10) -> float:
Args
- consec_bonus
Returns
float
get_aligner_version()
runs the aligner in a subprocess without arguments and tries to parse the version number from its output
def get_aligner_version(aligner: str) -> str:
Args
- aligner (str)
Returns
str
Examples
>>> get_aligner_version('blat')
'36x2'
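The run-then-parse approach described above could look roughly like the following sketch. The helper names `parse_version` and `run_and_parse` and the regular expression are assumptions made for illustration, not the module's code, and the sample banner strings are illustrative rather than captured output.

```python
import re
import subprocess
from typing import Optional


def parse_version(output: str) -> Optional[str]:
    """Pull a version token such as '36x2' or '0.7.17' out of usage text.

    Hypothetical helper: looks for 'v.', 'v' or 'Version:' followed by a
    token that starts with a digit.
    """
    match = re.search(
        r'\bv(?:ersion)?\.?:?\s*(\d[\w.\-]*)', output, flags=re.IGNORECASE
    )
    return match.group(1) if match else None


def run_and_parse(aligner: str) -> Optional[str]:
    """Run the aligner with no arguments and parse whatever it prints."""
    proc = subprocess.run(
        [aligner],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # usage text often goes to stderr
        text=True,
        check=False,  # a bare invocation commonly exits non-zero
    )
    return parse_version(proc.stdout)


# Illustrative banner strings (assumed, not captured from a real run)
blat_like = parse_version('blat - Standalone BLAT v. 36x2 fast sequence search')
bwa_like = parse_version('Program: bwa\nVersion: 0.7.17-r1188\n')
```

`run_and_parse` raises `FileNotFoundError` if the executable is not on the `PATH`, so callers may want to handle that case explicitly.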
convert_to_duplication()
Given a breakpoint call, tests whether the untemplated sequence matches the preceding reference sequence. If it does, the call is annotated as a duplication and the new breakpoint pair is returned. If not, the original breakpoint pair is returned
def convert_to_duplication(alignment, reference_genome: ReferenceGenome):
Args
- alignment
- reference_genome (ReferenceGenome)
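The test at the heart of this function, comparing the untemplated sequence against the reference bases just before the breakpoint, can be sketched on plain strings. The function name `is_duplication` and the coordinate convention are assumptions for illustration. The real function works on an alignment object and a `ReferenceGenome`, and returns a breakpoint pair rather than a boolean.

```python
def is_duplication(reference: str, break_pos: int, untemplated_seq: str) -> bool:
    """Return True if untemplated_seq repeats the reference bases that
    immediately precede break_pos.

    Assumption: break_pos is the 0-based index of the first reference base
    that follows the inserted (untemplated) sequence.
    """
    size = len(untemplated_seq)
    if size == 0 or size > break_pos:
        # nothing to compare, or not enough preceding reference sequence
        return False
    preceding = reference[break_pos - size:break_pos]
    return preceding.upper() == untemplated_seq.upper()


reference = 'ACGTTTGACCA'
# The three bases before index 7 are 'TTG', so inserting 'TTG' there
# duplicates them
dup = is_duplication(reference, 7, 'TTG')
not_dup = is_duplication(reference, 7, 'TTC')
```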
call_read_events()
Given a read, return breakpoint pairs representing all putative events
def call_read_events(read, secondary_read=None, is_stranded=False):
Args
- read
- secondary_read
- is_stranded
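One way to picture how putative events are found in a single read is to walk its CIGAR and record each insertion and deletion with its reference coordinates. The sketch below assumes pysam-style `(operation, length)` CIGAR tuples and 0-based reference positions. The function `cigar_events` and its tuple output are hypothetical, whereas the real function returns breakpoint pairs.

```python
from typing import List, Tuple

# pysam-style CIGAR operation codes
MATCH, INS, DEL, REF_SKIP, SOFT_CLIP, EQUAL, DIFF = 0, 1, 2, 3, 4, 7, 8


def cigar_events(
    ref_start: int, cigar: List[Tuple[int, int]]
) -> List[Tuple[str, int, int]]:
    """List (type, start, end) for each indel in a CIGAR.

    Insertions are reported as the two reference bases that flank them.
    Deletions are reported as the first and last deleted reference base.
    """
    events = []
    ref_pos = ref_start
    for op, length in cigar:
        if op == INS:
            # consumes query only, so the reference position is unchanged
            events.append(('ins', ref_pos - 1, ref_pos))
        elif op == DEL:
            events.append(('del', ref_pos, ref_pos + length - 1))
            ref_pos += length
        elif op in (MATCH, EQUAL, DIFF, REF_SKIP):
            ref_pos += length
        # soft and hard clips consume no reference bases
    return events


cigar = [(SOFT_CLIP, 5), (MATCH, 10), (INS, 3), (MATCH, 10), (DEL, 4), (MATCH, 20)]
events = cigar_events(100, cigar)
```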
read_breakpoint()
convert a given read to a single breakpoint
def read_breakpoint(read):
Args
- read
call_paired_read_event()
For a given pair of reads, call all applicable events. Assumes there is a major event from both reads and then calls indels from the individual reads. The two reads should be alternate alignments of the same read
def call_paired_read_event(read1, read2, is_stranded=False):
Args
- read1
- read2
- is_stranded
align_sequences()
calls the alignment tool and parses the return output for a set of sequences
def align_sequences(
    sequences: Dict[str, str],
    input_bam_cache: 'BamCache',
    reference_genome: ReferenceGenome,
    aligner: str,
    aligner_reference: str,
    aligner_output_file='aligner_out.temp',
    aligner_fa_input_file='aligner_in.fa',
    aligner_output_log='aligner_out.log',
    blat_limit_top_aln=25,
    blat_min_identity=0.7,
    clean_files=True,
    **kwargs,
):
Args
- sequences (Dict[str, str]): dictionary of sequences by name
- input_bam_cache (BamCache): bam cache to be used as a template for reading the alignments
- reference_genome (ReferenceGenome): the reference genome
- aligner (str): the name of the aligner to be used
- aligner_reference (str): path to the aligner reference file
- aligner_output_file
- aligner_fa_input_file
- aligner_output_log
- blat_limit_top_aln
- blat_min_identity
- clean_files
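Because `sequences` is a dictionary of sequences by name and the default `aligner_fa_input_file` is a `.fa` file, a plausible first step is writing the sequences out as FASTA for the aligner to read. The sketch below shows only that step. The helpers `fasta_text` and `write_fasta` are hypothetical and are not taken from the module.

```python
from typing import Dict


def fasta_text(sequences: Dict[str, str]) -> str:
    """Render name -> sequence pairs as FASTA, one record per entry."""
    return ''.join(f'>{name}\n{seq}\n' for name, seq in sequences.items())


def write_fasta(sequences: Dict[str, str], path: str = 'aligner_in.fa') -> None:
    """Write the FASTA text to the file the aligner will be pointed at."""
    with open(path, 'w') as handle:
        handle.write(fasta_text(sequences))


contigs = {'contig1': 'ACGTACGT', 'contig2': 'TTGACCA'}
rendered = fasta_text(contigs)
```

Keying the records by name makes it possible to match each alignment in the aligner output back to the sequence that produced it.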
select_contig_alignments()
standardize/simplify reads and filter bad/irrelevant alignments, then add the contig alignments to the contigs
def select_contig_alignments(evidence, reads_by_query):
Args
- evidence
- reads_by_query