
mavis/bam/cache

class BamCache

caches reads by query name so that a read's mate can be retrieved without seeking around the file again when that region has already been read

Attributes

  • fh (pysam.AlignmentFile): the open bam file handle
  • stranded (bool)
  • cache (Dict): cached reads, keyed by query name
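The class description amounts to a dictionary from query name to reads: mates share a query name, so once a region has been read, a read's mate can be found by a dictionary lookup instead of another pass over the file. The sketch below illustrates only that idea and does not use pysam. The `Read` dataclass (a stand-in for `pysam.AlignedSegment`), the `ReadNameCache` class and its `get_mate` method are hypothetical, not the MAVIS implementation.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Set


# Hypothetical stand-in for pysam.AlignedSegment, keeping only the fields used here.
# frozen=True makes instances hashable so they can be stored in sets.
@dataclass(frozen=True)
class Read:
    query_name: str
    reference_start: int
    is_read1: bool


class ReadNameCache:
    """Hypothetical sketch: reads grouped by query name so mates can be looked up."""

    def __init__(self) -> None:
        self.cache: Dict[str, Set[Read]] = defaultdict(set)

    def add_read(self, read: Read) -> None:
        self.cache[read.query_name].add(read)

    def get_mate(self, read: Read) -> Set[Read]:
        # Mates share a query name, so the other member(s) of the pair are every
        # cached read with the same name except the read itself.
        # dict.get does not insert a key, even on a defaultdict.
        return {r for r in self.cache.get(read.query_name, set()) if r != read}


name_cache = ReadNameCache()
r1 = Read("pair1", 100, True)
r2 = Read("pair1", 450, False)
name_cache.add_read(r1)
name_cache.add_read(r2)
print(name_cache.get_mate(r1))  # {Read(query_name='pair1', reference_start=450, is_read1=False)}
```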

BamCache.__init__()

def __init__(self, bamfile: Union[pysam.AlignmentFile, str], stranded: bool = False):

Args

  • bamfile (Union[pysam.AlignmentFile, str]): path to the input bam file, or an already open pysam.AlignmentFile
  • stranded (bool)
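Because `bamfile` accepts either a path or an open `pysam.AlignmentFile`, the constructor has to normalize it to a single file handle. A minimal sketch of that normalization follows, assuming paths are opened in `"rb"` (binary BAM read) mode. The helper name `open_alignment_file` is hypothetical, and the opener is passed in (in practice it would be `pysam.AlignmentFile`) so the sketch runs without pysam installed.

```python
from typing import Any, Callable, Union


def open_alignment_file(
    bamfile: Union[Any, str],
    opener: Callable[[str, str], Any],
) -> Any:
    """Hypothetical helper: return an open handle whether given a path or a handle.

    ``opener`` stands in for ``pysam.AlignmentFile``.
    """
    if isinstance(bamfile, str):
        # A path was given: open it in binary read mode ("rb" reads BAM).
        return opener(bamfile, "rb")
    # Otherwise assume it is already an open alignment file and reuse it as-is.
    return bamfile
```

Reusing a caller-supplied handle means the caller and the cache share one file pointer, which is worth keeping in mind when the same handle is also iterated elsewhere.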

BamCache.valid_chr()

checks if a reference name exists in the bam file header

def valid_chr(self, chrom: str) -> bool:

Args

  • chrom (str)

Returns

  • bool
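Checking a reference name against the header reduces to a membership test on the header's reference names. With pysam, those names are available as `AlignmentFile.references`. The function below is a hypothetical sketch that takes that sequence directly so it can run without a BAM file; it is not the MAVIS method.

```python
from typing import Sequence


def valid_chr(references: Sequence[str], chrom: str) -> bool:
    """Hypothetical sketch: True if ``chrom`` is one of the header's reference names.

    With pysam, ``references`` would be ``AlignmentFile.references``.
    """
    # Exact string match: "1" and "chr1" are treated as different names.
    return chrom in references


header_refs = ("chr1", "chr2", "chrX")
print(valid_chr(header_refs, "chr2"))  # True
print(valid_chr(header_refs, "2"))     # False
```

This sketch does no prefix normalization. Whether MAVIS treats `1` and `chr1` as equivalent is not stated on this page.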

BamCache.has_read()

checks if a read query name exists in the current cache

def has_read(self, read: pysam.AlignedSegment) -> bool:

Args

  • read (pysam.AlignedSegment)

Returns

  • bool
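Since the lookup is by query name, `has_read` answers "has this template been seen?" rather than "is this exact alignment cached?". The hypothetical sketch below shows that consequence, assuming a cache shaped as a dict from query name to reads. `types.SimpleNamespace` stands in for `pysam.AlignedSegment`.

```python
from types import SimpleNamespace
from typing import Any, Dict, Set


def has_read(cache: Dict[str, Set[Any]], read: Any) -> bool:
    """Hypothetical sketch: True if the read's query name is already a cache key."""
    # Only the query name is compared, so a cached mate (which shares the query
    # name) also makes this return True for the other read of the pair.
    return read.query_name in cache


read_cache = {"pair1": {"pair1/1"}}
print(has_read(read_cache, SimpleNamespace(query_name="pair1")))  # True
print(has_read(read_cache, SimpleNamespace(query_name="pair2")))  # False
```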

BamCache.fetch_from_bins()

wrapper around the fetch method that collects the reads up front instead of iterating lazily, to avoid errors caused by the file pointer position changing from within the loop. Optionally caches reads and can limit the number of reads returned

def fetch_from_bins(
    self,
    input_chrom: str,
    start: int,
    stop: int,
    read_limit: int = 10000,
    cache: bool = False,
    sample_bins: int = 3,
    cache_if: Callable = lambda x: True,
    min_bin_size: int = 10,
    filter_if: Callable = lambda x: False,
) -> Set[pysam.AlignedSegment]:

Args

  • input_chrom (str): the chromosome
  • start (int): the start position
  • stop (int): the end position
  • read_limit (int): the maximum number of reads to parse
  • cache (bool): flag to store reads
  • sample_bins (int): number of bins to split the region into
  • cache_if (Callable): function applied to each read to determine whether it should be cached
  • min_bin_size (int)
  • filter_if (Callable)

Returns

  • Set[pysam.AlignedSegment]: set of reads gathered from the region
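This page does not say how the region is divided or how `read_limit` is shared between bins, so the following is only a sketch under stated assumptions: the region is split into `sample_bins` contiguous bins, with fewer bins if any would be shorter than `min_bin_size`; the read budget is split evenly between bins; `filter_if` drops reads; and `cache_if` selects which of the kept reads are cached. Coordinates are treated as an inclusive range, which may not match the convention MAVIS uses. The names `Read`, `split_into_bins` and `fetch_from_bins_sketch` are hypothetical, and `fetch` is injected (in practice it would wrap `pysam.AlignmentFile.fetch`) so the code runs without a BAM file.

```python
from typing import Any, Callable, Dict, Iterable, List, NamedTuple, Optional, Set, Tuple


class Read(NamedTuple):
    """Hypothetical hashable stand-in for pysam.AlignedSegment."""
    query_name: str
    reference_start: int


def split_into_bins(start: int, stop: int, sample_bins: int, min_bin_size: int) -> List[Tuple[int, int]]:
    """Split the inclusive range [start, stop] into contiguous, non-overlapping bins.

    Assumption: use ``sample_bins`` bins, but fewer if that would make bins
    shorter than ``min_bin_size`` (always at least one bin).
    """
    if stop < start or sample_bins < 1 or min_bin_size < 1:
        raise ValueError("need start <= stop, sample_bins >= 1 and min_bin_size >= 1")
    length = stop - start + 1
    nbins = max(1, min(sample_bins, length // min_bin_size))
    edges = [start + (length * i) // nbins for i in range(nbins + 1)]
    return [(edges[i], edges[i + 1] - 1) for i in range(nbins)]


def fetch_from_bins_sketch(
    fetch: Callable[[str, int, int], Iterable[Any]],
    chrom: str,
    start: int,
    stop: int,
    read_limit: int = 10000,
    sample_bins: int = 3,
    min_bin_size: int = 10,
    filter_if: Callable[[Any], bool] = lambda x: False,
    cache_if: Callable[[Any], bool] = lambda x: True,
    cache: Optional[Dict[str, Set[Any]]] = None,
) -> Set[Any]:
    bins = split_into_bins(start, stop, sample_bins, min_bin_size)
    # Assumption: the read budget is shared evenly between bins, so reads are
    # sampled across the whole region rather than only from its start.
    per_bin = max(1, read_limit // len(bins))
    reads: Set[Any] = set()
    for bin_start, bin_end in bins:
        # Copy the bin's reads into a list first; with a real pysam handle this
        # means no iterator is left open while other work touches the same file.
        taken = 0
        for read in list(fetch(chrom, bin_start, bin_end)):
            if taken >= per_bin:
                break
            if filter_if(read):
                continue  # filtered reads are neither returned nor cached
            reads.add(read)  # a read returned by two bins is only kept once
            taken += 1
            if cache is not None and cache_if(read):
                cache.setdefault(read.query_name, set()).add(read)
    return reads
```

Returning a set matters with a real BAM file: a region fetch returns every read overlapping the region, so a read spanning a bin boundary can come back from two neighbouring bins.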

BamCache.close()

close the bam file handle

def close(self):
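Because the class exposes `close()` (context-manager support is not documented on this page), the standard library's `contextlib.closing` can guarantee that the handle is released even if an exception is raised. With the real class this would read `with closing(BamCache("sample.bam")) as bam_cache: ...`, where the path is a placeholder. The sketch below uses a hypothetical stand-in class so it runs without pysam.

```python
from contextlib import closing


class HypotheticalBamCache:
    """Stand-in with the same ``close()`` method as BamCache (no real file involved)."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True


# contextlib.closing calls .close() when the block exits, even on an exception.
with closing(HypotheticalBamCache()) as bam_cache:
    pass  # with the real class, e.g. bam_cache.fetch_from_bins(...)

print(bam_cache.closed)  # True
```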