cache module

class mavis.bam.cache.BamCache(bamfile, stranded=False)

Bases: object

Caches reads by query name so that read mates can be retrieved without seeking around the file when that section of the file has already been read.

Parameters: bamfile (str) – path to the input BAM file
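The caching idea can be sketched with a plain dict that maps each query name to the set of reads sharing that name, so a read and its mate land in the same bucket. This is a minimal sketch under that assumption, not the MAVIS implementation: `FakeRead` and `NameCache` are hypothetical stand-ins for `pysam.AlignedSegment` and `BamCache`, and `has_read` follows this module's description of it (a check by query name only).

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeRead:
    """Hypothetical stand-in for pysam.AlignedSegment (only the fields used here)."""
    query_name: str
    reference_id: int
    reference_start: int
    is_read1: bool = True


class NameCache:
    """Hypothetical sketch of a read cache keyed by query name."""

    def __init__(self):
        # query_name -> set of reads sharing that name (a read and its mate(s))
        self.cache = defaultdict(set)

    def add_read(self, read):
        self.cache[read.query_name].add(read)

    def has_read(self, read):
        # checked by query name, so a read whose mate was cached also matches;
        # the `in` test does not create an empty entry in the defaultdict
        return read.query_name in self.cache
```

Because lookups are by name, a mate fetched earlier from another region is found without going back to the file.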
add_read(read)
Parameters: read (pysam.AlignedSegment) – the read to add to the cache
close()

Closes the BAM file handle.

fetch(input_chrom, start, stop, limit=10000, cache_if=<function BamCache.<lambda>>, filter_if=<function BamCache.<lambda>>, stop_on_cached_read=False)
Parameters:
  • input_chrom (str) – chromosome name
  • start (int) – start position
  • stop (int) – end position
  • limit (int) – maximum number of reads to fetch
  • cache_if (function) – if it returns True, the read is added to the cache
  • filter_if (function) – if it returns True, the read is not returned as part of the result
  • stop_on_cached_read (bool) – stop reading at the first read found that is already in the cache

Note

The cache_if and filter_if arguments must be functions that take a read as input and return a boolean.

Returns: a set of reads which overlap the input region
Return type: set of pysam.AlignedSegment
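The interplay of limit, cache_if, filter_if and stop_on_cached_read can be sketched as a single loop over the reads in a region. This is a hypothetical sketch, not the MAVIS code: an in-memory list replaces the BAM iterator, the cache is a dict mapping query name to a set of reads, and the default predicates (cache everything, filter nothing) are assumptions.

```python
from collections import namedtuple

# Hypothetical stand-in for pysam.AlignedSegment
Read = namedtuple("Read", ["query_name", "reference_start", "mapping_quality"])


def fetch_sketch(reads, cache, limit=10000,
                 cache_if=lambda read: True,
                 filter_if=lambda read: False,
                 stop_on_cached_read=False):
    """Filter an iterable of reads the way the fetch parameters describe.

    cache is a dict mapping query_name -> set of reads (assumed layout).
    """
    result = set()
    for count, read in enumerate(reads):
        if count >= limit:  # stop once `limit` reads have been examined
            break
        if stop_on_cached_read and read in cache.get(read.query_name, ()):
            break  # this read was seen before, so stop reading here
        if cache_if(read):
            cache.setdefault(read.query_name, set()).add(read)
        if not filter_if(read):
            result.add(read)
    return result


reads = [Read("a", 10, 60), Read("b", 20, 0), Read("c", 30, 60)]
cache = {}
# drop unmapped-quality reads from the result, but still cache them
kept = fetch_sketch(reads, cache, filter_if=lambda r: r.mapping_quality == 0)
```

Note that caching and filtering are independent: a read can be cached for later mate lookups even when it is excluded from the returned set.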
fetch_from_bins(input_chrom, start, stop, read_limit=10000, cache=False, sample_bins=3, cache_if=<function BamCache.<lambda>>, min_bin_size=10, filter_if=<function BamCache.<lambda>>)

Wrapper around the fetch method. Returns a list to avoid errors caused by changing the file pointer position from within the loop. Also caches reads if requested and can limit the number of reads returned.

Parameters:
  • input_chrom (str) – the chromosome
  • start (int) – the start position
  • stop (int) – the end position
  • read_limit (int) – the maximum number of reads to parse
  • cache (bool) – flag to store reads
  • sample_bins (int) – number of bins to split the region into
  • cache_if (callable) – function to check against a read to determine if it should be cached
  • filter_if (callable) – if it returns True, the read is not returned as part of the result
  • bin_gap_size (int) – gap between the bins for the fetch area
Returns:

set of reads gathered from the region

Return type:

set of pysam.AlignedSegment
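One way to read sample_bins and read_limit is that the region is split into bins and the read quota is shared across them, so a single dense bin cannot consume the whole limit. The sketch below assumes exactly that: equal-width bins over a closed interval, a per-bin quota of read_limit // sample_bins, and reads represented as hypothetical (query_name, position) tuples. It does not model min_bin_size or the bin gap, and the real MAVIS binning may differ.

```python
def split_region(start, stop, sample_bins=3):
    """Split the closed interval [start, stop] into up to sample_bins bins.

    Hypothetical helper: equal-width bins, the last bin absorbs the remainder.
    """
    length = stop - start + 1
    width = max(1, length // sample_bins)
    bins = []
    bin_start = start
    for i in range(sample_bins):
        if bin_start > stop:  # region shorter than the number of bins
            break
        bin_end = stop if i == sample_bins - 1 else min(stop, bin_start + width - 1)
        bins.append((bin_start, bin_end))
        bin_start = bin_end + 1
    return bins


def fetch_from_bins_sketch(reads, start, stop, read_limit=10000, sample_bins=3):
    """Return up to read_limit reads, taking at most an equal share from each bin.

    reads: list of (query_name, position) tuples (iterated once per bin).
    """
    bin_limit = read_limit // sample_bins
    result = set()
    for bin_start, bin_end in split_region(start, stop, sample_bins):
        in_bin = [r for r in reads if bin_start <= r[1] <= bin_end]
        result.update(in_bin[:bin_limit])  # cap each bin at its share
    return result
```

With read_limit=6 and three bins, a pile-up of ten reads in the first bin contributes only two reads, leaving room for reads from the other bins.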

get_mate(read, primary_only=True, allow_file_access=False)
Parameters:
  • read (pysam.AlignedSegment) – the read
  • primary_only (bool) – ignore secondary alignments
  • allow_file_access (bool) – determines whether the BAM file may be accessed to try to find the mate
Returns:

list of mates of the input read

Return type:

list of pysam.AlignedSegment
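A cache-only mate lookup can be sketched by scanning the reads that share the input read's query name. The matching rule here is an assumption for illustration: a mate is the other end of the pair (opposite is_read1) whose position equals the read's next_reference_id and next_reference_start, which are real pysam.AlignedSegment attributes mirrored by the hypothetical Read stand-in. Falling back to the file (allow_file_access) is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Read:
    """Hypothetical stand-in mirroring a few pysam.AlignedSegment attributes."""
    query_name: str
    reference_id: int
    reference_start: int
    next_reference_id: int
    next_reference_start: int
    is_read1: bool
    is_secondary: bool = False


def get_mate_sketch(read, cache, primary_only=True):
    """Return cached mates of `read`; cache maps query_name -> set of reads."""
    mates = []
    for other in cache.get(read.query_name, ()):
        if other.is_read1 == read.is_read1:
            continue  # same end of the pair (including the read itself)
        if primary_only and other.is_secondary:
            continue  # ignore secondary alignments when asked to
        # the mate must sit where `read` says its mate is
        if (other.reference_id, other.reference_start) == (
                read.next_reference_id, read.next_reference_start):
            mates.append(other)
    return mates
```

A list is returned because, with secondary alignments allowed, more than one alignment can satisfy the mate position.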

get_read_reference_name(read)
Parameters: read (pysam.AlignedSegment) – the read we want the chromosome name for
Returns: the name of the chromosome
Return type: str
has_read(read)

Checks whether a read's query name exists in the current cache.

reference_id(chrom)
Parameters: chrom (str) – the chromosome/reference name
Returns: the reference id corresponding to input chromosome name
Return type: int
valid_chr(chrom)

Checks whether a reference name exists in the BAM file header.
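The header lookups (get_read_reference_name, reference_id, valid_chr) all rest on the same mapping: a BAM header lists reference names in order, and a reference id is an index into that list. The sketch below assumes a tuple shaped like pysam.AlignmentFile.references; HeaderLookup and its methods are hypothetical, and raising KeyError for an unknown name is an assumption rather than documented MAVIS behaviour.

```python
class HeaderLookup:
    """Hypothetical sketch of name <-> id lookups over a BAM header's references.

    `references` plays the role of pysam.AlignmentFile.references: a sequence
    of reference names whose index is the reference id.
    """

    def __init__(self, references):
        self.references = tuple(references)
        self._ids = {name: i for i, name in enumerate(self.references)}

    def valid_chr(self, chrom):
        # True only if the name appears in the header
        return chrom in self._ids

    def reference_id(self, chrom):
        try:
            return self._ids[chrom]
        except KeyError:
            raise KeyError("invalid reference name: {}".format(chrom)) from None

    def reference_name(self, reference_id):
        # how a read's integer reference_id maps back to a chromosome name
        return self.references[reference_id]


header = HeaderLookup(["1", "2", "X"])
```

Checking valid_chr first lets callers distinguish an unknown chromosome name (for example "chr1" against a header that uses "1") from other errors.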