mavis/bam/cache
class BamCache
caches reads by query name so that read mates can be retrieved without seeking back through the file when that section has already been read
Attributes
- fh (pysam.AlignmentFile)
- stranded (bool)
- cache (Dict)
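A minimal sketch of the name-keyed caching idea. It assumes the cache maps each query name to the set of reads seen with that name. `FakeRead`, `NameCache`, `add_read` and `get_mates` are hypothetical names used only for illustration. Mates are identified here by a shared query name and opposite `is_read1` flags, which is a simplification that ignores secondary and supplementary alignments.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class FakeRead:
    """Hypothetical stand-in for pysam.AlignedSegment, keeping only the fields used here."""
    query_name: str
    is_read1: bool
    reference_start: int


class NameCache:
    """Hypothetical name-keyed read cache: query_name -> set of reads seen with that name."""

    def __init__(self) -> None:
        self.cache: Dict[str, Set[FakeRead]] = defaultdict(set)

    def add_read(self, read: FakeRead) -> None:
        # store the read under its query name; both mates end up in the same bucket
        self.cache[read.query_name].add(read)

    def has_read(self, read: FakeRead) -> bool:
        # membership is by query name, so a read counts as present once its mate is cached
        return read.query_name in self.cache

    def get_mates(self, read: FakeRead) -> List[FakeRead]:
        # simplification: a mate shares the query name but has the opposite read1/read2 flag
        return [r for r in self.cache.get(read.query_name, set()) if r.is_read1 != read.is_read1]
```

Once the first mate has been cached while scanning one region, the second mate can be paired from memory instead of triggering another fetch.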
BamCache.__init__()
def __init__(self, bamfile: Union[pysam.AlignmentFile, str], stranded: bool = False):
Args
- bamfile (Union[pysam.AlignmentFile, str]): path to the input BAM file, or an already-open pysam.AlignmentFile
- stranded (bool)
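Because `bamfile` may be either a path or an open handle, the constructor has to normalize it. The following is a hypothetical sketch of that pattern, not the MAVIS code. The `opener` hook and the `CacheSketch` name are assumptions; in practice the opener would presumably be something like `pysam.AlignmentFile(path, "rb")`.

```python
from typing import Any, Callable, Dict, Set, Union


class CacheSketch:
    """Hypothetical sketch: accept either a path or an already-open file handle."""

    def __init__(
        self,
        bamfile: Union[str, Any],
        opener: Callable[[str], Any],  # hypothetical hook, e.g. lambda p: pysam.AlignmentFile(p, "rb")
        stranded: bool = False,
    ) -> None:
        # open the file only when given a path; reuse a handle that is already open
        self.fh = opener(bamfile) if isinstance(bamfile, str) else bamfile
        self.stranded = stranded
        # reads keyed by query name, as in the class description
        self.cache: Dict[str, Set[Any]] = {}
```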
BamCache.valid_chr()
checks if a reference name exists in the bam file header
def valid_chr(self, chrom: str) -> bool:
Args
- chrom (str)
Returns
bool
BamCache.has_read()
checks if a read query name exists in the current cache
def has_read(self, read: pysam.AlignedSegment) -> bool:
Args
- read (pysam.AlignedSegment)
Returns
bool
BamCache.fetch_from_bins()
wrapper around the fetch method. The reads are collected and returned as a set rather than an iterator, to avoid errors caused by the file pointer position changing from within the loop. Also caches reads if requested and can limit the number of reads returned
def fetch_from_bins(
self,
input_chrom: str,
start: int,
stop: int,
read_limit: int = 10000,
cache: bool = False,
sample_bins: int = 3,
cache_if: Callable = lambda x: True,
min_bin_size: int = 10,
filter_if: Callable = lambda x: False,
) -> Set[pysam.AlignedSegment]:
Args
- input_chrom (str): the chromosome
- start (int): the start position
- stop (int): the end position
- read_limit (int): the maximum number of reads to parse
- cache (bool): flag to store reads
- sample_bins (int): number of bins to split the region into
- cache_if (Callable): function applied to a read to determine whether it should be cached
- min_bin_size (int): minimum size of each bin
- filter_if (Callable): function applied to a read to determine whether it should be excluded
Returns
Set[pysam.AlignedSegment]: set of reads gathered from the region
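The binning and read-limit behaviour described above can be sketched in plain Python. This is a hypothetical illustration, not the MAVIS implementation. The names `FakeRead`, `split_bins`, `fetch_from_bins_sketch` and the `fetch(start, stop)` callable are all assumptions. The sketch also assumes three further details: `read_limit` is shared evenly across bins, filtered reads do not count towards the limit, and bins are inclusive coordinate ranges.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Optional, Set, Tuple


@dataclass(frozen=True)
class FakeRead:
    """Hypothetical stand-in for pysam.AlignedSegment (only the fields used here)."""
    query_name: str
    reference_start: int


def split_bins(start: int, stop: int, sample_bins: int, min_bin_size: int) -> List[Tuple[int, int]]:
    """Split inclusive [start, stop] into at most sample_bins bins, each at least min_bin_size wide."""
    bin_size = max(min_bin_size, math.ceil((stop - start + 1) / sample_bins))
    return [(s, min(s + bin_size - 1, stop)) for s in range(start, stop + 1, bin_size)]


def fetch_from_bins_sketch(
    fetch: Callable[[int, int], Iterable[FakeRead]],
    start: int,
    stop: int,
    read_limit: int = 10000,
    sample_bins: int = 3,
    min_bin_size: int = 10,
    cache: Optional[Dict[str, Set[FakeRead]]] = None,
    cache_if: Callable[[FakeRead], bool] = lambda read: True,
    filter_if: Callable[[FakeRead], bool] = lambda read: False,
) -> Set[FakeRead]:
    bins = split_bins(start, stop, sample_bins, min_bin_size)
    # assumption: share the read budget across bins, giving the remainder to the earliest bins
    base, extra = divmod(read_limit, len(bins))
    budgets = [base + (1 if i < extra else 0) for i in range(len(bins))]

    result: Set[FakeRead] = set()
    for (bin_start, bin_end), budget in zip(bins, budgets):
        kept = 0
        # materialize the bin's reads before processing them, so nothing in the loop moves the iterator
        for read in list(fetch(bin_start, bin_end)):
            if kept >= budget:
                break
            if filter_if(read):
                continue  # excluded reads are neither returned nor cached
            result.add(read)
            kept += 1
            if cache is not None and cache_if(read):
                cache.setdefault(read.query_name, set()).add(read)
    return result
```

Each bin gets its own share of the budget. That way, a high-coverage stretch at the start of the window cannot consume the whole limit, which is presumably why the region is sampled in bins rather than fetched in one pass.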
BamCache.close()
close the bam file handle
def close(self):