mavis.bam.cigar

holds methods related to processing cigar tuples. Cigar tuples are generally a list of tuples where the first element in each tuple is the CIGAR value (e.g. 1 for an insertion), and the second element is the number of consecutive bases with that value

EVENT_STATES

EVENT_STATES = {CIGAR.D, CIGAR.I, CIGAR.X}

ALIGNED_STATES

ALIGNED_STATES = {CIGAR.M, CIGAR.X, CIGAR.EQ}

REFERENCE_ALIGNED_STATES

REFERENCE_ALIGNED_STATES = ALIGNED_STATES | {CIGAR.D, CIGAR.N}

QUERY_ALIGNED_STATES

QUERY_ALIGNED_STATES = ALIGNED_STATES | {CIGAR.I, CIGAR.S}

CLIPPING_STATE

CLIPPING_STATE = {CIGAR.S, CIGAR.H}
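As an illustration of how these state sets can be used, the sketch below sums tuple lengths over a set of states to get the number of reference and query bases a cigar covers. It assumes the CIGAR values follow the usual pysam numbering (M=0 through X=8); the helper name `consumed_length` is hypothetical and not part of the module.

```python
# pysam-style numeric CIGAR codes (assumed to match mavis.constants.CIGAR)
M, I, D, N, S, H, P, EQ, X = range(9)

ALIGNED_STATES = {M, X, EQ}
REFERENCE_ALIGNED_STATES = ALIGNED_STATES | {D, N}
QUERY_ALIGNED_STATES = ALIGNED_STATES | {I, S}


def consumed_length(cigar, states):
    """Sum the lengths of all tuples whose state is in `states` (hypothetical helper)."""
    return sum(length for state, length in cigar if state in states)


cigar = [(S, 5), (EQ, 10), (I, 2), (X, 1), (D, 3), (EQ, 7)]
reference_length = consumed_length(cigar, REFERENCE_ALIGNED_STATES)  # 10 + 1 + 3 + 7
query_length = consumed_length(cigar, QUERY_ALIGNED_STATES)  # 5 + 10 + 2 + 1 + 7
```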

mavis.bam.cigar.recompute_cigar_mismatch()

for cigar tuples where M is used, recomputes the tuples to replace M with X/= for increased utility and specificity

def recompute_cigar_mismatch(read, ref):

Args

  • read (pysam.AlignedSegment): the input read
  • ref (str): the reference sequence

Returns

  • List[Tuple[int,int]]: the cigar tuple
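The idea can be sketched on plain strings rather than a pysam.AlignedSegment. The function below is a simplified stand-in, not the module's implementation: `split_matches` and its `ref_start` parameter are hypothetical, and the CIGAR codes are assumed to follow pysam numbering.

```python
# pysam-style numeric CIGAR codes (assumed)
M, I, D, N, S, H, P, EQ, X = range(9)


def split_matches(cigar, query_seq, ref_seq, ref_start=0):
    """Replace each M block with runs of = and X by comparing bases (hypothetical helper)."""
    result = []
    qpos, rpos = 0, ref_start
    for state, length in cigar:
        if state == M:
            for offset in range(length):
                same = query_seq[qpos + offset] == ref_seq[rpos + offset]
                new_state = EQ if same else X
                if result and result[-1][0] == new_state:
                    # extend the current run of matches or mismatches
                    result[-1] = (new_state, result[-1][1] + 1)
                else:
                    result.append((new_state, 1))
            qpos += length
            rpos += length
        else:
            result.append((state, length))
            if state in {I, S, EQ, X}:  # states that consume query bases
                qpos += length
            if state in {D, N, EQ, X}:  # states that consume reference bases
                rpos += length
    return result


recomputed = split_matches([(M, 8)], 'ACGTTACG', 'ACGATACG')
```

Here the single mismatch at the fourth base splits the M block into three tuples.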

mavis.bam.cigar.longest_fuzzy_match()

computes the longest sequence of exact matches, allowing for interruptions by mismatch (X) events

def longest_fuzzy_match(cigar, max_fuzzy_interupt=1):

Args

  • cigar: cigar tuples
  • max_fuzzy_interupt (int): number of mismatches allowed
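One possible reading of this behaviour is sketched below. It assumes that each X tuple counts as one interruption and that only = bases contribute to the reported length; neither detail is stated above, so treat both as assumptions. The name `longest_match_with_interrupts` is hypothetical.

```python
# pysam-style numeric CIGAR codes (assumed)
I, EQ, X = 1, 7, 8


def longest_match_with_interrupts(cigar, max_interrupts=1):
    """Longest total of = bases spanning at most `max_interrupts` X tuples (assumed semantics)."""
    best = 0
    for start, (first_state, _) in enumerate(cigar):
        if first_state != EQ:
            continue  # a window must begin on an exact match
        total, interrupts = 0, 0
        for state, length in cigar[start:]:
            if state == EQ:
                total += length
            elif state == X and interrupts < max_interrupts:
                interrupts += 1
            else:
                break  # any other event, or too many mismatches, ends the window
        best = max(best, total)
    return best


cigar = [(EQ, 5), (X, 1), (EQ, 4), (X, 1), (EQ, 8), (I, 2), (EQ, 6)]
one_interrupt = longest_match_with_interrupts(cigar, max_interrupts=1)
```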

mavis.bam.cigar.longest_exact_match()

returns the longest consecutive exact match

def longest_exact_match(cigar):

Args

  • cigar (List[Tuple[int,int]]): the cigar tuples
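A minimal sketch of the same idea, assuming consecutive = tuples have already been merged and that = is encoded as 7 (pysam numbering). The name `longest_eq_run` is hypothetical.

```python
EQ = 7  # pysam-style code for an exact match (assumed)


def longest_eq_run(cigar):
    """Length of the longest single = tuple, or 0 if there is none (hypothetical helper)."""
    return max((length for state, length in cigar if state == EQ), default=0)


longest = longest_eq_run([(7, 4), (8, 1), (7, 9), (1, 2), (7, 3)])
```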

mavis.bam.cigar.score()

scores a cigar based on Smith-Waterman (SW) alignment properties, with gap extension penalties

def score(cigar, **kwargs):

Args

  • cigar (List[Tuple[mavis.constants.CIGAR,int]]): list of cigar tuple values

Returns

  • int: the score value
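Scoring with gap extension (affine gap) penalties can be sketched as follows. The weights and keyword names here are illustrative assumptions, since the keyword arguments accepted by score() are not listed above; `affine_score` is a hypothetical name, and states other than =, X, I and D are simply ignored in this sketch.

```python
# pysam-style numeric CIGAR codes (assumed)
I, D, EQ, X = 1, 2, 7, 8


def affine_score(cigar, match=2, mismatch=-1, gap=-4, gap_extend=-1):
    """Score a cigar with an affine gap penalty (illustrative weights, not the library's)."""
    total = 0
    for state, length in cigar:
        if state == EQ:
            total += match * length
        elif state == X:
            total += mismatch * length
        elif state in (I, D):
            # opening a gap costs `gap`; each further base costs `gap_extend`
            total += gap + gap_extend * (length - 1)
    return total


example_score = affine_score([(EQ, 10), (X, 1), (I, 3), (EQ, 5)])  # 20 - 1 - 6 + 10
```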

mavis.bam.cigar.match_percent()

calculates the percent of aligned bases (matches or mismatches) that are matches

def match_percent(cigar):

Args

  • cigar
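The calculation can be sketched as below. This version returns a fraction between 0 and 1 and rejects cigars that contain M (where matches and mismatches cannot be told apart) or no aligned bases at all; whether the library scales the result or raises in the same cases is an assumption. The name `match_fraction` is hypothetical.

```python
# pysam-style numeric CIGAR codes (assumed)
M, I, EQ, X = 0, 1, 7, 8


def match_fraction(cigar):
    """Fraction of =/X bases that are = (hypothetical helper)."""
    if any(state == M for state, _ in cigar):
        raise ValueError('M does not distinguish matches from mismatches')
    matches = sum(length for state, length in cigar if state == EQ)
    mismatches = sum(length for state, length in cigar if state == X)
    if matches + mismatches == 0:
        raise ValueError('cigar has no aligned (= or X) bases')
    return matches / (matches + mismatches)


fraction = match_fraction([(EQ, 8), (X, 2), (I, 3), (EQ, 10)])  # 18 of 20 aligned bases
```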

mavis.bam.cigar.join()

given a number of cigar lists, joins them and merges any consecutive tuples with the same cigar value

def join(*pos):

Examples

>>> join([(1, 1), (4, 7)], [(4, 3), (2, 4)])
[(1, 1), (4, 10), (2, 4)]
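The merging rule can be sketched in a few lines. This is an illustrative re-implementation under the hypothetical name `join_cigars`, not the module's code.

```python
def join_cigars(*cigars):
    """Concatenate cigar lists, merging neighbouring tuples that share a state."""
    merged = []
    for cigar in cigars:
        for state, length in cigar:
            if merged and merged[-1][0] == state:
                merged[-1] = (state, merged[-1][1] + length)
            else:
                merged.append((state, length))
    return merged


joined = join_cigars([(1, 1), (4, 7)], [(4, 3), (2, 4)])
```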

mavis.bam.cigar.extend_softclipping()

given some input cigar, extends softclipping if there are mismatches/insertions/deletions close to the end of the aligned portion. The stopping point is defined by the min_exact_to_stop_softclipping parameter. This function will raise an error if there is no exact-match aligned portion long enough to signal the stop

def extend_softclipping(cigar, min_exact_to_stop_softclipping):

Args

  • cigar
  • min_exact_to_stop_softclipping (int): number of exact matches to terminate extension

Returns

  • Tuple[List[Tuple[mavis.constants.CIGAR,int]], int]: new cigar list and shift from the original start position
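A simplified sketch of the idea: find the outermost = tuples that are at least the required length, soft clip everything outside them, and report how many reference bases were clipped from the start. It assumes pysam numbering and a cigar without hard clipping; `clip_to_anchors` is a hypothetical name and the library's exact handling of edge cases may differ.

```python
# pysam-style numeric CIGAR codes (assumed)
M, I, D, N, S, H, P, EQ, X = range(9)
QUERY_CONSUMING = {M, I, S, EQ, X}
REFERENCE_CONSUMING = {M, D, N, EQ, X}


def clip_to_anchors(cigar, min_exact):
    """Soft clip everything outside the outermost = runs of at least `min_exact` bases."""
    anchors = [i for i, (state, length) in enumerate(cigar) if state == EQ and length >= min_exact]
    if not anchors:
        raise ValueError('no exact match long enough to stop soft clipping')
    first, last = anchors[0], anchors[-1]
    head, tail = cigar[:first], cigar[last + 1:]
    head_clip = sum(n for state, n in head if state in QUERY_CONSUMING)
    tail_clip = sum(n for state, n in tail if state in QUERY_CONSUMING)
    # reference bases that are no longer covered at the start of the alignment
    shift = sum(n for state, n in head if state in REFERENCE_CONSUMING)
    new_cigar = []
    if head_clip:
        new_cigar.append((S, head_clip))
    new_cigar.extend(cigar[first:last + 1])
    if tail_clip:
        new_cigar.append((S, tail_clip))
    return new_cigar, shift


clipped, shift = clip_to_anchors([(S, 2), (EQ, 3), (X, 1), (EQ, 10), (D, 2), (EQ, 2)], 6)
```

In this example the leading 3= and 1X are absorbed into the soft clip, so the alignment start moves 4 reference bases to the right.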

mavis.bam.cigar.compute()

given a ref and alt sequence, computes the cigar representing the alt

returns the cigar tuples along with the start position of the alt relative to the ref

def compute(ref, alt, force_softclipping=True, min_exact_to_stop_softclipping=6):

Args

  • ref
  • alt
  • force_softclipping
  • min_exact_to_stop_softclipping

mavis.bam.cigar.convert_for_igv()

IGV does not support the extended CIGAR values that distinguish match (=) from mismatch (X), so these are converted to M

def convert_for_igv(cigar):

Args

  • cigar

Examples

>>> convert_for_igv([(7, 4), (8, 1), (7, 5)])
[(0, 10)]
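The conversion in the example above can be sketched as follows, assuming pysam numbering (M=0, ==7, X=8). The name `collapse_to_m` is hypothetical.

```python
# pysam-style numeric CIGAR codes (assumed)
M, EQ, X = 0, 7, 8


def collapse_to_m(cigar):
    """Rewrite = and X as M and merge the resulting neighbours."""
    result = []
    for state, length in cigar:
        if state in (EQ, X):
            state = M
        if result and result[-1][0] == state:
            result[-1] = (state, result[-1][1] + length)
        else:
            result.append((state, length))
    return result


collapsed = collapse_to_m([(7, 4), (8, 1), (7, 5)])
```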

mavis.bam.cigar.alignment_matches()

counts the number of aligned bases irrespective of match/mismatch. This is equivalent to counting all CIGAR.M

def alignment_matches(cigar):

Args

  • cigar
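The equivalence can be checked with a small sketch: the same alignment written with =/X and with M gives the same count. The helper name `count_aligned` is hypothetical and the codes are assumed to follow pysam numbering.

```python
# pysam-style numeric CIGAR codes (assumed)
M, I, S, EQ, X = 0, 1, 4, 7, 8


def count_aligned(cigar):
    """Count bases in M, = or X tuples (hypothetical helper)."""
    return sum(length for state, length in cigar if state in {M, EQ, X})


extended = [(S, 5), (EQ, 10), (X, 1), (EQ, 4), (I, 2), (EQ, 6)]
legacy = [(S, 5), (M, 15), (I, 2), (M, 6)]  # the same alignment written with M only
extended_count = count_aligned(extended)
legacy_count = count_aligned(legacy)
```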

mavis.bam.cigar.merge_indels()

For a given cigar tuple, merges adjacent insertions/deletions

def merge_indels(cigar):

Args

  • cigar

Examples

>>> merge_indels([(CIGAR.EQ, 10), (CIGAR.I, 3), (CIGAR.D, 4), (CIGAR.I, 2), (CIGAR.D, 2), (CIGAR.EQ, 10)])
[(CIGAR.EQ, 10), (CIGAR.I, 5), (CIGAR.D, 6), (CIGAR.EQ, 10)]
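A sketch of the merging shown in the example, with insertions emitted before deletions. It assumes pysam numbering and is an illustration only; `merge_adjacent_indels` is a hypothetical name.

```python
# pysam-style numeric CIGAR codes (assumed)
I, D, EQ = 1, 2, 7


def merge_adjacent_indels(cigar):
    """Collapse each run of I/D tuples into at most one I followed by one D."""
    result = []
    pending = {I: 0, D: 0}

    def flush():
        # emit the accumulated insertion first, then the deletion
        for state in (I, D):
            if pending[state]:
                result.append((state, pending[state]))
                pending[state] = 0

    for state, length in cigar:
        if state in pending:
            pending[state] += length
        else:
            flush()
            result.append((state, length))
    flush()
    return result


merged = merge_adjacent_indels([(EQ, 10), (I, 3), (D, 4), (I, 2), (D, 2), (EQ, 10)])
```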

mavis.bam.cigar.hgvs_standardize_cigar()

extends alignments as long as matches are possible, and calls insertions before deletions

def hgvs_standardize_cigar(read, reference_seq):

Args

  • read
  • reference_seq

mavis.bam.cigar.convert_string_to_cigar()

Given a cigar string, converts it to the appropriate cigar tuple

def convert_string_to_cigar(string):

Args

  • string

Examples

>>> convert_string_to_cigar('8M2I1D9X')
[(CIGAR.M, 8), (CIGAR.I, 2), (CIGAR.D, 1), (CIGAR.X, 9)]
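Parsing a cigar string into tuples can be sketched with a regular expression. This version returns plain integer codes using pysam numbering (an assumption) rather than CIGAR constants, does not validate malformed input, and uses the hypothetical name `parse_cigar`.

```python
import re

# map each operation character to its pysam-style numeric code (assumed)
CODES = {op: code for code, op in enumerate('MIDNSHP=X')}


def parse_cigar(string):
    """Convert a cigar string such as '8M2I1D9X' into (code, length) tuples."""
    pairs = re.findall(r'(\d+)([MIDNSHP=X])', string)
    return [(CODES[op], int(length)) for length, op in pairs]


parsed = parse_cigar('8M2I1D9X')
```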

mavis.bam.cigar.merge_internal_events()

merges events (insertions, deletions, mismatches) within a cigar if they are between exact matches on either side (anchors) and separated by fewer exact matches than the given parameter

does not merge two mismatches; the merged span must contain a deletion or insertion

def merge_internal_events(cigar, inner_anchor=10, outer_anchor=10):

Args

  • cigar (List): a list of tuples of cigar states and counts
  • inner_anchor (int): minimum number of consecutive exact matches separating events
  • outer_anchor (int): minimum consecutively aligned exact matches to anchor an end for merging

Returns

  • List: new list of cigar tuples with merged events

Examples

>>> merge_internal_events([(CIGAR.EQ, 10), (CIGAR.X, 1), (CIGAR.EQ, 2), (CIGAR.D, 1), (CIGAR.EQ, 10)])
[(CIGAR.EQ, 10), (CIGAR.I, 3), (CIGAR.D, 4), (CIGAR.EQ, 10)]