read module

class mavis.bam.read.SamRead(reference_name=None, next_reference_name=None, alignment_score=None, **kwargs)[source]

Bases: pysam.libcalignedsegment.AlignedSegment

Subclass that extends the pysam.AlignedSegment class with some utility methods and convenient representations

Allows next_reference_name and reference_name to be set directly so that it does not depend on a bam header

alignment_id()[source]
classmethod copy(pysamread)[source]
classmethod copy_onto(pysamread, copyread=None)[source]
deletion_sequences(reference_genome)[source]

returns the reference sequences for all deletions

insertion_sequences()[source]

returns the inserted sequence for all insertions
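
As an illustration of what collecting inserted sequences can look like, the following is a minimal sketch that walks a CIGAR and slices the read sequence at each insertion. It is not the SamRead implementation: the function name inserted_sequences and the plain (operation, length) tuples are assumptions made for the example, with operation codes taken from the SAM specification.

```python
# CIGAR operation codes as defined in the SAM specification (M, I, D, N, S, H, P, =, X)
MATCH, INS, DEL, REF_SKIP, SOFT_CLIP, HARD_CLIP, PAD, EQ, DIFF = range(9)

# operations that consume bases of the query (read) sequence
CONSUMES_QUERY = {MATCH, INS, SOFT_CLIP, EQ, DIFF}


def inserted_sequences(query_sequence, cigartuples):
    """Collect the read bases covered by each insertion (hypothetical helper)."""
    sequences = []
    query_pos = 0
    for op, length in cigartuples:
        if op == INS:
            sequences.append(query_sequence[query_pos:query_pos + length])
        if op in CONSUMES_QUERY:
            query_pos += length
    return sequences


# 4 matched bases, a 2-base insertion, then 4 more matched bases
example = inserted_sequences('ACGTTTACGT', [(EQ, 4), (INS, 2), (EQ, 4)])
print(example)
```

Deletions do not consume read bases, so the equivalent lookup for deletion_sequences has to slice the reference genome instead, which is why that method takes reference_genome as an argument.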

key()[source]

uses the stored _key attribute, if available. This keeps the hash from changing when, for example, the reference start is changed, while still allowing the key to be calculated for non-SamRead objects

As a result, to change the hash behaviour the user must be explicit and call the set_key method

next_reference_name

reference name of the mate/next read (None if no AlignmentFile is associated)

reference_name

reference name

set_key()[source]

Warning

Using this method sets the _key attribute which is used for comparison and hashing. If you alter this attribute while items are in a hashed state it may lead to unexpected results such as duplicates of a single object within a set
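
The hazard described in this warning can be reproduced with a small stand-in class. This is a sketch only: KeyedRead and the attributes that make up its key are hypothetical and are not taken from SamRead.

```python
class KeyedRead:
    """Hypothetical stand-in for a read whose hash comes from a stored key."""

    def __init__(self, name, reference_start):
        self.name = name
        self.reference_start = reference_start
        self._key = None
        self.set_key()

    def set_key(self):
        # snapshot the attributes that define identity at this moment
        self._key = (self.name, self.reference_start)

    def key(self):
        return self._key

    def __hash__(self):
        return hash(self.key())

    def __eq__(self, other):
        return isinstance(other, KeyedRead) and self.key() == other.key()


read = KeyedRead('read1', 100)
seen = {read}

# changing the position leaves the stored key untouched, so lookups still work
read.reference_start = 150
still_found = read in seen

# refreshing the key while the object is already in the set changes its hash,
# so the set no longer recognises it and accepts it a second time
read.set_key()
found_after_set_key = read in seen
seen.add(read)
duplicates = len(seen)
```

In this sketch the safe pattern is to remove the object from any set or dict before calling set_key and to add it back afterwards.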

mavis.bam.read.breakpoint_pos(read, orient='?')[source]

assumes the breakpoint is the position following softclipping on the side with more softclipping (unless an orientation has been specified)

Parameters:
  • read (AlignedSegment) – the read object
  • orient (ORIENT) – the orientation
Returns:

the position of the breakpoint in the input read

Return type:

int
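
The rule above can be sketched without pysam by working on CIGAR tuples directly. This is an assumption-laden illustration rather than the MAVIS code: the function name, the 'L'/'R' orientation labels, the 0-based coordinates and the handling of ties are all choices made for the example.

```python
SOFT_CLIP = 4  # CIGAR code for soft clipping in the SAM specification


def breakpoint_from_softclipping(cigartuples, reference_start, reference_end, orient='?'):
    """Return the aligned position adjacent to the soft-clipped end.

    reference_start is 0-based inclusive and reference_end is 0-based
    exclusive. orient is 'L' (aligned part lies left of the breakpoint),
    'R' (aligned part lies right of it) or '?' (decide from the clipping).
    These labels are assumptions for this sketch.
    """
    first_op, first_len = cigartuples[0]
    last_op, last_len = cigartuples[-1]
    left_clip = first_len if first_op == SOFT_CLIP else 0
    right_clip = last_len if last_op == SOFT_CLIP else 0
    if not left_clip and not right_clip:
        raise ValueError('read has no soft clipping')

    if orient == '?':
        # more clipping on the left means the aligned part is on the right;
        # ties fall back to 'L', an arbitrary choice made for this sketch
        orient = 'R' if left_clip > right_clip else 'L'

    if orient == 'R':
        if not left_clip:
            raise ValueError('no soft clipping on the left side')
        return reference_start  # first aligned base
    if not right_clip:
        raise ValueError('no soft clipping on the right side')
    return reference_end - 1  # last aligned base


left_clipped = breakpoint_from_softclipping([(4, 10), (0, 40)], 100, 140)
right_clipped = breakpoint_from_softclipping([(0, 40), (4, 10)], 100, 140)
```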

mavis.bam.read.calculate_alignment_score(read, consec_bonus=1)[source]

calculates a score for comparing alignments

Parameters: read (pysam.AlignedSegment) – the input read
Returns: the score
Return type: float
mavis.bam.read.convert_events_to_softclipping(read, orientation, max_event_size, min_anchor_size=None)[source]

given an alignment, simplifies it by grouping everything past the first anchor, including the first event considered too large, then unaligning that group and turning it into softclipping

mavis.bam.read.map_ref_range_to_query_range(read, ref_range)[source]
Parameters:
Returns:

1-based inclusive range

Return type:

Interval

mavis.bam.read.nsb_align(ref, seq, weight_of_score=0.5, min_overlap_percent=1, min_match=0, min_consecutive_match=1, scoring_function=<function calculate_alignment_score>)[source]

given a reference string and a smaller sequence string, computes the best non-space-breaking alignment, i.e. an alignment that does not allow for indels (straight-match). Positions in the aligned segments are given relative to the length of the reference sequence (1-based)

Parameters:
  • ref (str) – the reference sequence
  • seq (str) – the sequence being aligned
  • weight_of_score (float) – when scoring alignments, this determines the amount of weight to place on the cigar match. Should be a number between 0 and 1
  • min_overlap_percent (float) – the minimum amount of overlap of the input sequence with the reference. Should be a number between 0 and 1
  • min_match (float) – the minimum proportion of matches compared to the total
  • scoring_function (callable) – any function that will take a read as input and return a float used in comparing alignments to choose the best alignment
Returns:

list of aligned segments

Return type:

list of AlignedSegment

Note

using a higher min_match may improve performance as low quality alignments are rejected more quickly. However this may also result in no match being returned when there is no high quality match to be found.
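
The idea of a straight-match alignment can be shown with a deliberately simplified sketch. It only considers placements where the sequence lies entirely within the reference (comparable to the default min_overlap_percent of 1), ranks them by match count alone, and returns plain tuples rather than aligned segments. The function name ungapped_alignments and its return format are assumptions for the example and do not reflect how nsb_align is implemented.

```python
def ungapped_alignments(ref, seq, min_match=0.0):
    """Slide seq along ref without gaps and keep the best placements.

    Returns a list of (start, matches) tuples, where start is the 1-based
    position in ref of the first base of seq. Placements whose fraction of
    matching bases is below min_match are rejected.
    """
    if not seq:
        return []
    best = []
    best_matches = -1
    for offset in range(len(ref) - len(seq) + 1):
        matches = sum(1 for r, s in zip(ref[offset:], seq) if r == s)
        if matches / len(seq) < min_match:
            continue
        if matches > best_matches:
            best, best_matches = [(offset + 1, matches)], matches
        elif matches == best_matches:
            best.append((offset + 1, matches))
    return best


hits = ungapped_alignments('AAACGTTTT', 'CGTT')
no_hits = ungapped_alignments('AAACGTTTT', 'CGAT', min_match=1.0)
```

The second call shows the trade-off mentioned in the note: with a strict min_match, a sequence that has no perfect placement yields no alignment at all.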

mavis.bam.read.orientation_supports_type(read, event_type)[source]

checks if the orientation is compatible with the type of event

Parameters:
  • read (AlignedSegment) – a read from the pair
  • event_type (SVTYPE) – the type of event to check
Returns:

  • True - the read pair is in the correct orientation for this event type
  • False - the read is not in the correct orientation

Return type:

bool

mavis.bam.read.pileup(reads, filter_func=None)[source]

For a given set of reads generate a pileup of all reads (excluding those for which the filter_func returns True)

Parameters:
  • reads (iterable of pysam.AlignedSegment) – reads to pileup
  • filter_func (callable) – function which takes in a read and returns True if it should be ignored and False otherwise
Returns:

tuples of genomic position and read count at that position

Return type:

iterable of tuple of int and int

Note

returns positions using 1-based indexing
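
A pileup of this kind can be approximated in a few lines. The sketch below is not the MAVIS implementation: the Read namedtuple is a hypothetical stand-in for pysam.AlignedSegment, and every position between reference_start and reference_end is counted as covered, which ignores deletions and skipped regions.

```python
from collections import Counter, namedtuple

# hypothetical minimal read: 0-based inclusive start, 0-based exclusive end
Read = namedtuple('Read', ['reference_start', 'reference_end', 'mapping_quality'])


def simple_pileup(reads, filter_func=None):
    """Count reads per position, reporting positions with 1-based indexing."""
    counts = Counter()
    for read in reads:
        # reads for which filter_func returns True are ignored
        if filter_func is not None and filter_func(read):
            continue
        for pos in range(read.reference_start, read.reference_end):
            counts[pos + 1] += 1  # convert 0-based to 1-based
    return sorted(counts.items())


reads = [Read(0, 3, 60), Read(1, 4, 60), Read(2, 5, 0)]
# ignore reads with a mapping quality of zero
depth = simple_pileup(reads, filter_func=lambda r: r.mapping_quality == 0)
```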

mavis.bam.read.read_pair_type(read)[source]

assumptions based on Illumina pairs: only 4 possible combinations

Parameters: read (AlignedSegment) – the input read
Returns: the type of input read pair
Return type: READ_PAIR_TYPE
Raises: NotImplementedError – for any read that does not fall into the four expected configurations (see below)
++++> <---- is LR same-strand
++++> ++++> is LL opposite
<---- <---- is RR opposite
<---- ++++> is RL same-strand
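
The four configurations in the diagram can be written as a lookup on the strands of the leftmost and rightmost reads of the pair. This is a sketch under stated assumptions: the function name pair_type and the plain string labels are illustrative and stand in for the READ_PAIR_TYPE values, and deciding which read is leftmost is left to the caller.

```python
# (leftmost read is reverse, rightmost read is reverse) -> label from the diagram
PAIR_TYPES = {
    (False, True): 'LR',   # ++++> <----
    (False, False): 'LL',  # ++++> ++++>
    (True, True): 'RR',    # <---- <----
    (True, False): 'RL',   # <---- ++++>
}


def pair_type(left_is_reverse, right_is_reverse):
    """Classify a read pair from the strand of each read (hypothetical helper)."""
    return PAIR_TYPES[(bool(left_is_reverse), bool(right_is_reverse))]


standard_pair = pair_type(left_is_reverse=False, right_is_reverse=True)
```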
mavis.bam.read.sequence_complexity(seq)[source]

basic measure of sequence complexity

mavis.bam.read.sequenced_strand(read, strand_determining_read=2)[source]

determines the strand that was sequenced

Parameters:
  • read (AlignedSegment) – the read being used to determine the strand
  • strand_determining_read (int) – which read in the read pair is the same as the sequenced strand
Returns:

the strand that was sequenced

Return type:

STRAND

Raises:

ValueError – if strand_determining_read is not 1 or 2

Warning

if the input pair is unstranded, the result will not be representative of the strand sequenced, since the assumed convention is not followed
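
One way to read the strand_determining_read convention is sketched below: the read designated as strand-determining reports the sequenced strand directly, and its mate reports the opposite strand. The function name, the boolean arguments and the '+'/'-' return values are assumptions for the example and stand in for the AlignedSegment flags and STRAND values.

```python
def infer_sequenced_strand(is_read1, is_reverse, strand_determining_read=2):
    """Infer the sequenced strand from one read of a stranded pair."""
    if strand_determining_read not in (1, 2):
        raise ValueError(
            'expected 1 or 2, found: {}'.format(strand_determining_read))
    read_number = 1 if is_read1 else 2
    aligned_strand = '-' if is_reverse else '+'
    if read_number == strand_determining_read:
        # this read has the same orientation as the sequenced strand
        return aligned_strand
    # the mate is sequenced from the other end, so flip the strand
    return '+' if aligned_strand == '-' else '-'


# read 1 aligned to the forward strand, with read 2 determining the strand
strand = infer_sequenced_strand(is_read1=True, is_reverse=False)
```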