mavis.bam.cigar
Holds methods for processing CIGAR tuples. CIGAR tuples are generally a list of tuples where the first element of each tuple is the CIGAR operation (e.g. 1 for an insertion) and the second element is the length of that operation, i.e. the number of bases it covers.
EVENT_STATES
EVENT_STATES = {CIGAR.D, CIGAR.I, CIGAR.X}
ALIGNED_STATES
ALIGNED_STATES = {CIGAR.M, CIGAR.X, CIGAR.EQ}
REFERENCE_ALIGNED_STATES
REFERENCE_ALIGNED_STATES = ALIGNED_STATES | {CIGAR.D, CIGAR.N}
QUERY_ALIGNED_STATES
QUERY_ALIGNED_STATES = ALIGNED_STATES | {CIGAR.I, CIGAR.S}
CLIPPING_STATE
CLIPPING_STATE = {CIGAR.S, CIGAR.H}
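To illustrate how these sets can be used, the sketch below rebuilds three of them from the integer operation codes defined in the SAM specification (the same codes pysam uses) and derives how many reference and query bases an alignment covers. The CIGAR enum here is only a stand-in for mavis.constants.CIGAR, and the helper names reference_length and query_length are hypothetical, not part of this module.

```python
from enum import IntEnum


class CIGAR(IntEnum):
    """Stand-in for mavis.constants.CIGAR using the SAM operation codes."""
    M = 0
    I = 1
    D = 2
    N = 3
    S = 4
    H = 5
    P = 6
    EQ = 7
    X = 8


ALIGNED_STATES = {CIGAR.M, CIGAR.X, CIGAR.EQ}
REFERENCE_ALIGNED_STATES = ALIGNED_STATES | {CIGAR.D, CIGAR.N}
QUERY_ALIGNED_STATES = ALIGNED_STATES | {CIGAR.I, CIGAR.S}


def reference_length(cigar):
    """Number of reference bases covered by the alignment (hypothetical helper)."""
    return sum(length for op, length in cigar if op in REFERENCE_ALIGNED_STATES)


def query_length(cigar):
    """Number of query bases described by the cigar (hypothetical helper)."""
    return sum(length for op, length in cigar if op in QUERY_ALIGNED_STATES)


cigar = [(CIGAR.S, 5), (CIGAR.EQ, 10), (CIGAR.I, 2), (CIGAR.D, 3), (CIGAR.X, 1)]
print(reference_length(cigar))  # 14
print(query_length(cigar))  # 18
```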
mavis.bam.cigar.recompute_cigar_mismatch()
For cigar tuples that use M (alignment match), recomputes the tuples so that each M block is replaced with the more specific = (exact match) and X (mismatch) values.
def recompute_cigar_mismatch(read, ref):
Args
- read (pysam.AlignedSegment): the input read
- ref (str): the reference sequence
Returns
List[Tuple[int,int]]: the cigar tuple
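The idea can be sketched without pysam by passing the pieces explicitly. The function below is hypothetical (split_matches is not part of this module) and assumes the query sequence, the reference sequence, and the 0-based reference start of the alignment are available as plain values; the real function takes a pysam.AlignedSegment instead.

```python
# SAM operation codes, used here in place of mavis.constants.CIGAR
M, I, D, N, S, H, P, EQ, X = range(9)


def split_matches(cigar, query_seq, ref_seq, ref_start=0):
    """Replace every M block with =/X blocks by comparing bases directly."""
    result = []
    qpos, rpos = 0, ref_start
    for op, length in cigar:
        if op == M:
            for offset in range(length):
                same = query_seq[qpos + offset].upper() == ref_seq[rpos + offset].upper()
                new_op = EQ if same else X
                # grow the previous block when the state does not change
                if result and result[-1][0] == new_op:
                    result[-1] = (new_op, result[-1][1] + 1)
                else:
                    result.append((new_op, 1))
            qpos += length
            rpos += length
        else:
            result.append((op, length))
            if op in (I, S, EQ, X):  # operations that consume query bases
                qpos += length
            if op in (D, N, EQ, X):  # operations that consume reference bases
                rpos += length
    return result


# query GG|ACGTA|T|CC aligned as 2S5M1I2M against reference ACCTACC
print(split_matches([(S, 2), (M, 5), (I, 1), (M, 2)], 'GGACGTATCC', 'ACCTACC'))
# [(4, 2), (7, 2), (8, 1), (7, 2), (1, 1), (7, 2)]
```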
mavis.bam.cigar.longest_fuzzy_match()
computes the longest sequence of exact matches, allowing the sequence to be interrupted by mismatch (X) events
def longest_fuzzy_match(cigar, max_fuzzy_interupt=1):
Args
- cigar: cigar tuples
- max_fuzzy_interupt (int): number of mismatches allowed
mavis.bam.cigar.longest_exact_match()
returns the longest consecutive exact match
def longest_exact_match(cigar):
Args
- cigar (List[Tuple[int,int]]): the cigar tuples
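A minimal sketch of this calculation, assuming the cigar uses the extended = value (code 7) and that adjacent = blocks have already been merged. The name longest_exact_match_sketch is hypothetical, and returning 0 for a cigar without exact matches is an assumption.

```python
EQ = 7  # SAM code for an exact match (=)


def longest_exact_match_sketch(cigar):
    """Length of the longest single = block, or 0 if there is none."""
    return max((length for op, length in cigar if op == EQ), default=0)


print(longest_exact_match_sketch([(7, 4), (8, 1), (7, 12), (1, 2), (7, 6)]))  # 12
```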
mavis.bam.cigar.score()
scores the cigar based on Smith-Waterman (SW) alignment properties, with gap extension penalties
def score(cigar, **kwargs):
Args
- cigar (List[Tuple[mavis.constants.CIGAR,int]]): list of cigar tuple values
Returns
int: the score value
mavis.bam.cigar.match_percent()
calculates the percent of aligned bases (matches or mismatches) that are matches
def match_percent(cigar):
Args
- cigar
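The calculation can be sketched as follows. This version assumes the cigar already distinguishes = from X, returns a fraction between 0 and 1, and raises ValueError when there are no aligned bases; the name match_fraction and those choices are assumptions rather than documented behaviour.

```python
EQ, X = 7, 8  # SAM codes for exact match (=) and mismatch (X)


def match_fraction(cigar):
    """Fraction of =/X bases that are exact matches."""
    matches = sum(length for op, length in cigar if op == EQ)
    mismatches = sum(length for op, length in cigar if op == X)
    if matches + mismatches == 0:
        raise ValueError('cigar has no = or X bases to compare')
    return matches / (matches + mismatches)


# 8 matches and 2 mismatches; the insertion (1, 3) is ignored
print(match_fraction([(7, 8), (8, 2), (1, 3)]))  # 0.8
```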
mavis.bam.cigar.join()
given a number of cigar lists, joins them and merges any consecutive tuples with the same cigar value
def join(*pos):
Examples
>>> join([(1, 1), (4, 7)], [(4, 3), (2, 4)])
[(1, 1), (4, 10), (2, 4)]
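The merging behaviour in the example can be reproduced with a short loop. This is a sketch of the technique, not the module's source; the name join_sketch is hypothetical.

```python
def join_sketch(*cigars):
    """Concatenate cigar lists, merging neighbouring tuples that share an operation."""
    merged = []
    for cigar in cigars:
        for op, length in cigar:
            if merged and merged[-1][0] == op:
                # same operation as the previous tuple: add the lengths together
                merged[-1] = (op, merged[-1][1] + length)
            else:
                merged.append((op, length))
    return merged


print(join_sketch([(1, 1), (4, 7)], [(4, 3), (2, 4)]))
# [(1, 1), (4, 10), (2, 4)]
```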
mavis.bam.cigar.extend_softclipping()
given some input cigar, extends soft clipping if there are mismatches/insertions/deletions close to the end of the aligned portion. The stopping point is defined by the min_exact_to_stop_softclipping parameter. This function raises an error if there is no exactly matching aligned portion to signal the stop.
def extend_softclipping(cigar, min_exact_to_stop_softclipping):
Args
- cigar
- min_exact_to_stop_softclipping (int): number of exact matches to terminate extension
Returns
Tuple[List[Tuple[mavis.constants.CIGAR,int]], int]: new cigar list and shift from the original start position
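One way to picture the behaviour is the sketch below. It makes several assumptions that this page does not confirm: blocks of M or = with at least the minimum length act as anchors, everything outside the first and last anchor becomes soft clipping, hard clipping is left untouched, and the error type is ValueError. All names in it are hypothetical.

```python
# SAM operation codes, used here in place of mavis.constants.CIGAR
M, I, D, N, S, H, P, EQ, X = range(9)
QUERY_OPS = {M, I, S, EQ, X}      # operations that consume query bases
REFERENCE_OPS = {M, D, N, EQ, X}  # operations that consume reference bases


def _merge(cigar):
    """Merge neighbouring tuples that share an operation."""
    merged = []
    for op, length in cigar:
        if merged and merged[-1][0] == op:
            merged[-1] = (op, merged[-1][1] + length)
        else:
            merged.append((op, length))
    return merged


def _clip(blocks):
    """Turn blocks into soft clipping; also report the reference bases they covered."""
    clipped, ref_consumed = [], 0
    for op, length in blocks:
        if op == H:
            clipped.append((op, length))  # hard clipping is kept as-is
            continue
        if op in QUERY_OPS:
            clipped.append((S, length))
        if op in REFERENCE_OPS:
            ref_consumed += length
    return clipped, ref_consumed


def extend_softclipping_sketch(cigar, min_exact):
    cigar = list(cigar)
    anchors = [i for i, (op, n) in enumerate(cigar) if op in (M, EQ) and n >= min_exact]
    if not anchors:
        raise ValueError('no aligned block long enough to stop soft clipping')
    first, last = anchors[0], anchors[-1]
    head, shift = _clip(cigar[:first])
    tail, _ = _clip(cigar[last + 1:])
    return _merge(head + cigar[first:last + 1] + tail), shift


print(extend_softclipping_sketch([(S, 2), (EQ, 2), (X, 1), (EQ, 10), (I, 1), (EQ, 3)], 6))
# ([(4, 5), (7, 10), (4, 4)], 3)
```

The shift of 3 in the example reflects the two matched bases and one mismatch at the start that no longer count as aligned, so the alignment start moves 3 reference bases to the right.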
mavis.bam.cigar.compute()
Given a ref and an alt sequence, computes the cigar representing the alt.
Returns the cigar tuples along with the start position of the alt relative to the ref.
def compute(ref, alt, force_softclipping=True, min_exact_to_stop_softclipping=6):
Args
- ref
- alt
- force_softclipping
- min_exact_to_stop_softclipping
mavis.bam.cigar.convert_for_igv()
IGV does not support the extended CIGAR values that distinguish match (=) from mismatch (X), so these are converted back to M
def convert_for_igv(cigar):
Args
- cigar
Examples
>>> convert_for_igv([(7, 4), (8, 1), (7, 5)])
[(0, 10)]
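The conversion shown in the example amounts to relabelling = (7) and X (8) as M (0) and merging the resulting neighbours. A sketch under that reading, with the hypothetical name convert_for_igv_sketch:

```python
M, EQ, X = 0, 7, 8  # SAM codes for M, = and X


def convert_for_igv_sketch(cigar):
    """Collapse =/X blocks into M blocks and merge adjacent blocks."""
    result = []
    for op, length in cigar:
        if op in (EQ, X):
            op = M
        if result and result[-1][0] == op:
            result[-1] = (op, result[-1][1] + length)
        else:
            result.append((op, length))
    return result


print(convert_for_igv_sketch([(7, 4), (8, 1), (7, 5)]))  # [(0, 10)]
```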
mavis.bam.cigar.alignment_matches()
counts the number of aligned bases, irrespective of match/mismatch. This is equivalent to counting all CIGAR.M bases in a cigar that does not use the extended =/X values.
def alignment_matches(cigar):
Args
- cigar
mavis.bam.cigar.merge_indels()
For a given cigar tuple, merges adjacent insertions/deletions
def merge_indels(cigar):
Args
- cigar
Examples
>>> merge_indels([(CIGAR.EQ, 10), (CIGAR.I, 3), (CIGAR.D, 4), (CIGAR.I, 2), (CIGAR.D, 2), (CIGAR.EQ, 10)])
[(CIGAR.EQ, 10), (CIGAR.I, 5), (CIGAR.D, 6), (CIGAR.EQ, 10)]
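A sketch that reproduces the example with integer codes (I = 1, D = 2, = is 7). Emitting the combined insertion before the combined deletion follows the example output; the name merge_indels_sketch is hypothetical.

```python
I, D = 1, 2  # SAM codes for insertion and deletion


def merge_indels_sketch(cigar):
    """Collapse each run of adjacent I/D blocks into one I block and one D block."""
    result = []
    inserted = deleted = 0
    # the (None, 0) sentinel flushes a run that reaches the end of the cigar
    for op, length in list(cigar) + [(None, 0)]:
        if op == I:
            inserted += length
        elif op == D:
            deleted += length
        else:
            if inserted:
                result.append((I, inserted))
            if deleted:
                result.append((D, deleted))
            inserted = deleted = 0
            if op is not None:
                result.append((op, length))
    return result


print(merge_indels_sketch([(7, 10), (1, 3), (2, 4), (1, 2), (2, 2), (7, 10)]))
# [(7, 10), (1, 5), (2, 6), (7, 10)]
```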
mavis.bam.cigar.hgvs_standardize_cigar()
extends alignments for as long as matches are possible, and calls insertions before deletions
def hgvs_standardize_cigar(read, reference_seq):
Args
- read
- reference_seq
mavis.bam.cigar.convert_string_to_cigar()
Given a cigar string, converts it to the appropriate cigar tuple
def convert_string_to_cigar(string):
Args
- string
Examples
>>> convert_string_to_cigar('8M2I1D9X')
[(CIGAR.M, 8), (CIGAR.I, 2), (CIGAR.D, 1), (CIGAR.X, 9)]
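Parsing of this kind can be sketched with a regular expression. The version below returns the integer SAM codes rather than CIGAR members, rejects strings that are not a plain sequence of count/operation pairs, and does not handle the '*' placeholder; the name parse_cigar_string and the ValueError are assumptions.

```python
import re

# position in this string gives the SAM operation code: M=0 ... '='=7, X=8
CODES = {symbol: code for code, symbol in enumerate('MIDNSHP=X')}


def parse_cigar_string(string):
    """Convert a cigar string such as '8M2I1D9X' into (operation, length) tuples."""
    if not re.fullmatch(r'(\d+[MIDNSHP=X])+', string):
        raise ValueError(f'not a valid cigar string: {string!r}')
    return [(CODES[op], int(count)) for count, op in re.findall(r'(\d+)([MIDNSHP=X])', string)]


print(parse_cigar_string('8M2I1D9X'))  # [(0, 8), (1, 2), (2, 1), (8, 9)]
```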
mavis.bam.cigar.merge_internal_events()
Merges events (insertions, deletions, mismatches) within a cigar if they lie between exact matches on either side (anchors) and are separated by fewer exact matches than the given parameter.
Does not merge two mismatches; the merged events must include a deletion or an insertion.
def merge_internal_events(cigar, inner_anchor=10, outer_anchor=10):
Args
- cigar (List): a list of tuples of cigar states and counts
- inner_anchor (int): minimum number of consecutive exact matches separating events
- outer_anchor (int): minimum consecutively aligned exact matches to anchor an end for merging
Returns
List: new list of cigar tuples with merged events
Examples
>>> merge_internal_events([(CIGAR.EQ, 10), (CIGAR.X, 1), (CIGAR.EQ, 2), (CIGAR.D, 1), (CIGAR.EQ, 10)])
[(CIGAR.EQ, 10), (CIGAR.I, 3), (CIGAR.D, 4), (CIGAR.EQ, 10)]