mavis/bam/cache
class BamCache
caches reads by query name so that a read's mate can be retrieved without seeking back through the file when that section has already been read
Attributes
- fh (pysam.AlignmentFile)
- stranded (bool)
- cache (Dict)
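The idea of a name-keyed read cache can be sketched as follows. This is a minimal illustration, not the BamCache implementation: `Read` is a hypothetical stand-in for `pysam.AlignedSegment`, and `NameCache`, `add_read`, and `get_mates` are names assumed here for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class Read:
    """Hypothetical stand-in for pysam.AlignedSegment (only the fields used here)."""

    query_name: str
    reference_name: str
    reference_start: int
    is_read1: bool


class NameCache:
    """Illustrative cache keyed by query name (assumed structure, not MAVIS code)."""

    def __init__(self) -> None:
        self.cache: Dict[str, Set[Read]] = {}

    def add_read(self, read: Read) -> None:
        # Group all alignments that share a query name under one key.
        self.cache.setdefault(read.query_name, set()).add(read)

    def has_read(self, read: Read) -> bool:
        # Membership is decided by query name only.
        return read.query_name in self.cache

    def get_mates(self, read: Read) -> List[Read]:
        # A mate shares the query name but is the other read of the pair.
        candidates = self.cache.get(read.query_name, ())
        return [r for r in candidates if r.is_read1 != read.is_read1]
```

Because both mates of a pair share a query name, a single dictionary lookup is enough to find a mate that was already seen, with no additional seek in the file.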
BamCache.__init__()
def __init__(self, bamfile: Union[pysam.AlignmentFile, str], stranded: bool = False):
Args
- bamfile (Union[pysam.AlignmentFile, str]): path to the input bam file, or an already open pysam.AlignmentFile
- stranded (bool)
BamCache.valid_chr()
checks if a reference name exists in the bam file header
def valid_chr(self, chrom: str) -> bool:
Args
- chrom (str)
Returns
bool
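A check of this kind is useful for catching naming mismatches (for example `chr1` versus `1`) before fetching. The sketch below assumes the header exposes its reference names through a `references` attribute, as `pysam.AlignmentFile` does; `HeaderStub` is a hypothetical stand-in so the example runs without a BAM file, and the actual BamCache method may perform the lookup differently.

```python
from typing import Iterable, Tuple


class HeaderStub:
    """Hypothetical stand-in exposing `references` like pysam.AlignmentFile."""

    def __init__(self, references: Iterable[str]) -> None:
        self.references: Tuple[str, ...] = tuple(references)


def valid_chr(fh: HeaderStub, chrom: str) -> bool:
    # True only when the exact reference name is listed in the header.
    return chrom in fh.references
```

With a header that lists `1`, `2`, and `X`, a query for `chr1` is reported as invalid, which signals that the caller needs to normalize the chromosome name first.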
BamCache.has_read()
checks if a read query name exists in the current cache
def has_read(self, read: pysam.AlignedSegment) -> bool:
Args
- read (pysam.AlignedSegment)
Returns
bool
BamCache.fetch_from_bins()
wrapper around the fetch method. Returns the reads as a collection (a set, per the signature) rather than a lazy iterator, to avoid errors caused by the file pointer position changing from within the loop. Also caches reads if requested and can limit the number of reads parsed
def fetch_from_bins(
self,
input_chrom: str,
start: int,
stop: int,
read_limit: int = 10000,
cache: bool = False,
sample_bins: int = 3,
cache_if: Callable = lambda x: True,
min_bin_size: int = 10,
filter_if: Callable = lambda x: False,
) -> Set[pysam.AlignedSegment]:
Args
- input_chrom (str): the chromosome
- start (int): the start position
- stop (int): the end position
- read_limit (int): the maximum number of reads to parse
- cache (bool): flag to store reads
- sample_bins (int): number of bins to split the region into
- cache_if (Callable): function to check against a read to determine if it should be cached
- min_bin_size (int)
- filter_if (Callable)
Returns
Set[pysam.AlignedSegment]: set of reads gathered from the region
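How the parameters above might interact can be sketched in plain Python. Everything here is an assumption made for illustration: `split_into_bins` and `sample_reads` are hypothetical helpers, reads are represented by plain strings, the read limit is assumed to be divided evenly between bins, and `filter_if` returning True is assumed to drop a read (inferred from the defaults in the signature). The actual BamCache logic may differ.

```python
from typing import Callable, List, Set, Tuple


def split_into_bins(
    start: int, stop: int, sample_bins: int, min_bin_size: int
) -> List[Tuple[int, int]]:
    """Split the inclusive range [start, stop] into contiguous bins (hypothetical helper)."""
    if sample_bins <= 0 or min_bin_size <= 0:
        raise ValueError("sample_bins and min_bin_size must be positive")
    length = stop - start + 1
    if length <= 0:
        raise ValueError("stop must not be smaller than start")
    # Use fewer bins when the region is too short for the requested number.
    n_bins = min(sample_bins, max(1, length // min_bin_size))
    base, extra = divmod(length, n_bins)
    bins = []
    pos = start
    for i in range(n_bins):
        size = base + (1 if i < extra else 0)  # spread the remainder over the first bins
        bins.append((pos, pos + size - 1))
        pos += size
    return bins


def sample_reads(
    reads_by_bin: List[List[str]],
    read_limit: int,
    cache: bool = False,
    cache_if: Callable[[str], bool] = lambda read: True,
    filter_if: Callable[[str], bool] = lambda read: False,
) -> Tuple[Set[str], Set[str]]:
    """Collect reads bin by bin; returns (kept reads, cached reads)."""
    # Assumption: the overall limit is shared evenly between the bins.
    per_bin_limit = max(1, read_limit // max(1, len(reads_by_bin)))
    kept: Set[str] = set()
    cached: Set[str] = set()
    for bin_reads in reads_by_bin:
        # Copy the bin first so the loop does not depend on a live iterator.
        for parsed, read in enumerate(list(bin_reads)):
            if parsed >= per_bin_limit:
                break
            if filter_if(read):
                continue  # assumed meaning: True drops the read
            kept.add(read)
            if cache and cache_if(read):
                cached.add(read)
    return kept, cached
```

Splitting the region into bins and capping each bin separately means a deep region is sampled along its whole length rather than only from its first reads, which is one plausible reason for the `sample_bins` and `read_limit` parameters to exist together.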
BamCache.close()
close the bam file handle
def close(self):