Skip to content

mavis/bam/cigar

holds methods related to processing cigar tuples. Cigar tuples are generally an iterable list of tuples where the first element in each tuple is the CIGAR value (i.e. 1 for an insertion), and the second value is the frequency

EVENT_STATES

EVENT_STATES = {CIGAR.D, CIGAR.I, CIGAR.X}

ALIGNED_STATES

ALIGNED_STATES = {CIGAR.M, CIGAR.X, CIGAR.EQ}

REFERENCE_ALIGNED_STATES

REFERENCE_ALIGNED_STATES = ALIGNED_STATES | {CIGAR.D, CIGAR.N}

QUERY_ALIGNED_STATES

QUERY_ALIGNED_STATES = ALIGNED_STATES | {CIGAR.I, CIGAR.S}

CLIPPING_STATE

CLIPPING_STATE = {CIGAR.S, CIGAR.H}

recompute_cigar_mismatch()

for cigar tuples where M is used, recompute to replace with X/= for increased utility and specificity

def recompute_cigar_mismatch(read: pysam.AlignedSegment, ref: str) -> CigarTuples:

Args

  • read (pysam.AlignedSegment): the input read
  • ref (str): the reference sequence

Returns

longest_fuzzy_match()

computes the longest sequence of exact matches allowing for 'x' event interrupts

def longest_fuzzy_match(cigar: CigarTuples, max_fuzzy_interupt: int = 1) -> int:

Args

  • cigar (CigarTuples): cigar tuples
  • max_fuzzy_interupt (int): number of mismatches allowed

Returns

  • int

longest_exact_match()

returns the longest consecutive exact match

def longest_exact_match(cigar: CigarTuples) -> int:

Args

Returns

  • int

score()

scoring based on sw alignment properties with gap extension penalties

def score(cigar: CigarTuples, **kwargs) -> int:

Args

Returns

  • int: the score value

match_percent()

calculates the percent of aligned bases (matches or mismatches) that are matches

def match_percent(cigar: CigarTuples) -> float:

Args

Returns

  • float

join()

given a number of cigar lists, joins them and merges any consecutive tuples with the same cigar value

def join(*pos):

Examples

>>> join([(1, 1), (4, 7)], [(4, 3), (2, 4)])
[(1, 1), (4, 10), (2, 4)]

extend_softclipping()

given some input cigar, extends softclipping if there are mismatches/insertions/deletions close to the end of the aligned portion. The stopping point is defined by the min_exact_to_stop_softclipping parameter. this function will throw an error if there is no exact match aligned portion to signal stop

def extend_softclipping(
    cigar: CigarTuples, min_exact_to_stop_softclipping: int
) -> Tuple[CigarTuples, int]:

Args

  • cigar (CigarTuples)
  • min_exact_to_stop_softclipping (int): number of exact matches to terminate extension

Returns

  • Tuple[CigarTuples, int]: new cigar list and shift from the original start position

compute()

given a ref and alt sequence compute the cigar string representing the alt

returns the cigar tuples along with the start position of the alt relative to the ref

def compute(
    ref: str, alt: str, force_softclipping: bool = True, min_exact_to_stop_softclipping: int = 6
) -> Tuple[CigarTuples, int]:

Args

  • ref (str)
  • alt (str)
  • force_softclipping (bool)
  • min_exact_to_stop_softclipping (int)

Returns

convert_for_igv()

igv does not support the extended CIGAR values for match v mismatch

def convert_for_igv(cigar: CigarTuples) -> CigarTuples:

Args

Returns

Examples

>>> convert_for_igv([(7, 4), (8, 1), (7, 5)])
[(0, 10)]

alignment_matches()

counts the number of aligned bases irrespective of match/mismatch this is equivalent to counting all CIGAR.M

def alignment_matches(cigar: CigarTuples) -> int:

Args

Returns

  • int

merge_indels()

For a given cigar tuple, merges adjacent insertions/deletions

def merge_indels(cigar: CigarTuples) -> CigarTuples:

Args

Returns

Examples

>>> merge_indels([(CIGAR.EQ, 10), (CIGAR.I, 3), (CIGAR.D, 4), (CIGAR.I, 2), (CIGAR.D, 2), (CIGAR.EQ, 10)])
[(CIGAR.EQ, 10), (CIGAR.I, 5), (CIGAR.D, 6), (CIGAR.EQ, 10)]

hgvs_standardize_cigar()

extend alignments as long as matches are possible. call insertions before deletions

def hgvs_standardize_cigar(read: pysam.AlignedSegment, reference_seq: str) -> CigarTuples:

Args

  • read (pysam.AlignedSegment)
  • reference_seq (str)

Returns

convert_string_to_cigar()

Given a cigar string, converts it to the appropriate cigar tuple

def convert_string_to_cigar(string: str) -> CigarTuples:

Args

  • string (str)

Returns

Examples

>>> convert_string_to_cigar('8M2I1D9X')
[(CIGAR.M, 8), (CIGAR.I, 2), (CIGAR.D, 1), (CIGAR.X, 9)]

merge_internal_events()

merges events (insertions, deletions, mismatches) within a cigar if they are between exact matches on either side (anchors) and separated by less exact matches than the given parameter

does not merge two mismatches, must contain a deletion/insertion

def merge_internal_events(
    cigar: CigarTuples, inner_anchor: int = 10, outer_anchor: int = 10
) -> CigarTuples:

Args

  • cigar (CigarTuples): a list of tuples of cigar states and counts
  • inner_anchor (int): minimum number of consecutive exact matches separating events
  • outer_anchor (int): minimum consecutively aligned exact matches to anchor an end for merging

Returns

  • CigarTuples: new list of cigar tuples with merged events

Examples

>>> merge_internal_events([(CIGAR.EQ, 10), (CIGAR.X, 1), (CIGAR.EQ, 2), (CIGAR.D, 1), (CIGAR.EQ, 10)])
[(CIGAR.EQ, 10), (CIGAR.I, 3), (CIGAR.D, 4), (CIGAR.EQ, 10)]