
mavis/validate/align

Should take in a SAM file from an aligner like bwa aln or bwa mem and convert it into a

class SUPPORTED_ALIGNER

inherits MavisNamespace

supported aligners

Attributes

class DiscontinuousAlignment

inherits BreakpointPair

DiscontinuousAlignment.query_coverage()

interval representing the total region of the input sequence that is covered by the combination of alignments

def query_coverage(self):
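As a rough illustration of the idea (not the MAVIS implementation), assume each alignment covers an inclusive `(start, end)` range on the query sequence. The combined coverage interval then runs from the smallest start to the largest end. The function name and the tuple representation are hypothetical.

```python
from typing import Tuple

Interval = Tuple[int, int]  # inclusive (start, end) on the query sequence


def total_query_span(aln1: Interval, aln2: Interval) -> Interval:
    """Smallest interval containing both aligned query ranges."""
    return (min(aln1[0], aln2[0]), max(aln1[1], aln2[1]))
```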

DiscontinuousAlignment.query_consumption()

fraction of the query sequence which is aligned (everything not soft-clipped) in either alignment

def query_consumption(self):
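A minimal sketch of this fraction, assuming the aligned (non-soft-clipped) part of each alignment is given as an inclusive `(start, end)` range on the query. Positions covered by both alignments are counted once. The helper name and inputs are hypothetical and do not come from MAVIS.

```python
from typing import List, Tuple


def aligned_fraction(ranges: List[Tuple[int, int]], query_length: int) -> float:
    """Fraction of query positions covered by at least one inclusive range."""
    covered = set()
    for start, end in ranges:
        covered.update(range(start, end + 1))
    return len(covered) / query_length
```

For a 100 bp query with alignments covering positions 0-49 and 40-79, the overlapping bases are not double counted.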

DiscontinuousAlignment.score()

scores events between 0 and 1, penalizing events that interrupt the alignment. A split alignment counts as a single event

def score(self, consec_bonus=10) -> float:

Args

  • consec_bonus

Returns

  • float

get_aligner_version()

executes a subprocess to try and run the aligner without arguments and parse the version number from the output

def get_aligner_version(aligner: str) -> str:

Args

  • aligner (str)

Returns

  • str

Examples

>>> get_aligner_version('blat')
'36x2'
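The approach described above could be sketched as follows. This is an assumption about how such parsing might work, not the MAVIS code: the regular expression, the helper names `parse_version` and `run_and_parse`, and the sample banner text in the usage note are all illustrative.

```python
import re
import subprocess
from typing import Optional


def parse_version(output: str) -> Optional[str]:
    """Return the first token following 'v.' or 'version' in the text, if any."""
    match = re.search(r'(?:v\.|version:?)\s*(\S+)', output, flags=re.IGNORECASE)
    return match.group(1) if match else None


def run_and_parse(aligner: str) -> Optional[str]:
    """Run the aligner with no arguments and parse its combined output.

    Raises FileNotFoundError if the executable is not on the PATH.
    """
    proc = subprocess.run(
        [aligner], stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    return parse_version(proc.stdout)
```

Separating the parsing from the subprocess call allows the pattern to be checked against captured text, for example a banner such as `'blat - Standalone BLAT v. 36x2 fast sequence search command line tool'`, without the aligner being installed.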

convert_to_duplication()

Given a breakpoint call, tests whether the untemplated sequence matches the preceding reference sequence. If it does, the call is annotated as a duplication and the new breakpoint pair is returned. If not, the original breakpoint pair is returned

def convert_to_duplication(alignment, reference_genome: ReferenceGenome):

Args

  • alignment
  • reference_genome (ReferenceGenome)
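The duplication test can be illustrated on plain strings. This sketch assumes the reference is a Python string and that `insert_pos` is the 0-based index at which the untemplated sequence is inserted. Both the function name and this coordinate convention are hypothetical, and the real function operates on breakpoint pairs and a `ReferenceGenome`.

```python
def is_duplication(reference: str, insert_pos: int, untemplated: str) -> bool:
    """True if the inserted bases repeat the reference bases immediately before them."""
    size = len(untemplated)
    if size == 0 or insert_pos < size:
        # nothing inserted, or not enough preceding reference to compare against
        return False
    preceding = reference[insert_pos - size:insert_pos]
    return preceding.upper() == untemplated.upper()
```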

call_read_events()

Given a read, return breakpoint pairs representing all putative events

def call_read_events(read, secondary_read=None, is_stranded=False):

Args

  • read
  • secondary_read
  • is_stranded
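One way to picture how putative events can be read off a single alignment is to walk its CIGAR operations and record every insertion or deletion. The sketch below assumes the CIGAR is a list of `(operation, length)` tuples using the numeric operation codes from the SAM specification. The function name and the returned tuple layout are hypothetical, and the real function returns breakpoint pairs rather than tuples.

```python
from typing import List, Tuple

# CIGAR operation codes in SAM specification order: M, I, D, N, S, H, P, =, X
MATCH, INS, DEL, SKIP, SOFT_CLIP, HARD_CLIP, PAD, EQ, DIFF = range(9)


def indel_events(ref_start: int, cigar: List[Tuple[int, int]]) -> List[Tuple[str, int, int]]:
    """Return (type, ref_pos_before, ref_pos_after) for each insertion or deletion.

    Positions are 0-based reference coordinates of the aligned bases flanking the event.
    """
    events = []
    ref_pos = ref_start  # next reference position to be consumed
    for op, length in cigar:
        if op == INS:
            events.append(('ins', ref_pos - 1, ref_pos))
        elif op == DEL:
            events.append(('del', ref_pos - 1, ref_pos + length))
            ref_pos += length
        elif op in (MATCH, EQ, DIFF, SKIP):
            ref_pos += length
        # soft clips, hard clips and padding consume no reference bases
    return events
```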

read_breakpoint()

convert a given read to a single breakpoint

def read_breakpoint(read):

Args

  • read

call_paired_read_event()

For a given pair of reads, call all applicable events. Assumes there is a major event between the two reads and then calls indels from the individual reads. The two reads should be alternate alignments of the same read

def call_paired_read_event(read1, read2, is_stranded=False):

Args

  • read1
  • read2
  • is_stranded

align_sequences()

calls the alignment tool and parses the return output for a set of sequences

def align_sequences(
    sequences: Dict[str, str],
    input_bam_cache: 'BamCache',
    reference_genome: ReferenceGenome,
    aligner: str,
    aligner_reference: str,
    aligner_output_file='aligner_out.temp',
    aligner_fa_input_file='aligner_in.fa',
    aligner_output_log='aligner_out.log',
    blat_limit_top_aln=25,
    blat_min_identity=0.7,
    clean_files=True,
    **kwargs,
):

Args

  • sequences (Dict[str, str]): dictionary of sequences by name
  • input_bam_cache (BamCache): bam cache to be used as a template for reading the alignments
  • reference_genome (ReferenceGenome): the reference genome
  • aligner (str): the name of the aligner to be used
  • aligner_reference (str): path to the aligner reference file
  • aligner_output_file
  • aligner_fa_input_file
  • aligner_output_log
  • blat_limit_top_aln
  • blat_min_identity
  • clean_files
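The `sequences` dictionary has to reach the aligner as a FASTA file (the `aligner_fa_input_file` argument). A minimal sketch of that step, assuming one record per dictionary entry with the key as the record name; the helper names are hypothetical and are not part of MAVIS.

```python
from typing import Dict


def format_fasta(sequences: Dict[str, str]) -> str:
    """Render each named sequence as a two-line FASTA record."""
    return ''.join(f'>{name}\n{seq}\n' for name, seq in sequences.items())


def write_fasta(sequences: Dict[str, str], path: str = 'aligner_in.fa') -> None:
    """Write the sequences to the FASTA file that the aligner will read."""
    with open(path, 'w') as handle:
        handle.write(format_fasta(sequences))
```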

select_contig_alignments()

standardize/simplify reads and filter bad/irrelevant alignments, then add the contig alignments to the contigs

def select_contig_alignments(evidence, reads_by_query):

Args

  • evidence
  • reads_by_query