stats module

class mavis.bam.stats.BamStats(median_fragment_size, stdev_fragment_size, read_length)

Bases: object

add_stranded_information(strand_hist)

class mavis.bam.stats.Histogram

Bases: dict

__add__(other)

sum two histograms and return the result as a new histogram

Example

>>> x, y = Histogram(), Histogram()
>>> x.add('item')
>>> y.add('item')
>>> x + y
{'item': 2}

add(item, freq=1)

add an item to the histogram, increasing its count by freq (default 1)
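
The behaviour of add() and __add__() described above can be sketched with a small dict subclass. This is a minimal stand-in written for illustration, not the mavis implementation: the class name SketchHistogram is hypothetical, and the assumption that repeated calls to add() accumulate counts is inferred from the `x + y` example.

```python
class SketchHistogram(dict):
    """Hypothetical stand-in for a dict-based histogram; not the mavis class."""

    def add(self, item, freq=1):
        # Increment the stored count for `item`, starting from 0 if unseen.
        self[item] = self.get(item, 0) + freq

    def __add__(self, other):
        # Copy the left operand so that neither input is modified.
        result = SketchHistogram(self)
        for item, freq in other.items():
            result.add(item, freq)
        return result


x, y = SketchHistogram(), SketchHistogram()
x.add('item')
y.add('item', freq=3)
combined = x + y
print(combined)  # {'item': 4}
```

Returning a new object from `__add__` keeps `x` and `y` unchanged, which matches the "return the result as a new histogram" wording.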

distribution_stderr(median, fraction, error_function=<function Histogram.<lambda>>)

median()

flattens the histogram to compute the median value
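
A sketch of what "flattening" could look like: each value is repeated according to its count and the median of the expanded list is taken. The helper name flattened_median is hypothetical, and using statistics.median (which averages the two middle values when the number of observations is even) is an assumption rather than something this page states.

```python
import statistics


def flattened_median(hist):
    """Median of a {value: count} mapping, computed by expanding the counts."""
    values = []
    for value, count in hist.items():
        # Repeat each observed value as many times as it was counted.
        values.extend([value] * count)
    return statistics.median(values)


fragment_sizes = {300: 2, 310: 1, 450: 1}
# Expanded list is [300, 300, 310, 450]; the two middle values are averaged.
middle = flattened_median(fragment_sizes)
print(middle)  # 305.0
```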

mavis.bam.stats.compute_genome_bam_stats(bam_file_handle, sample_bin_size, sample_size, min_mapping_quality=1, sample_cap=10000, distribution_fraction=0.99)

computes various statistical measures relating to the input bam file

Parameters:
  • bam_file_handle (pysam.AlignmentFile) – the input bam file handle
  • sample_bin_size (int) – how large to make the sample bin (in bp)
  • sample_size (int) – the number of bins to compute stats over
  • min_mapping_quality (int) – the minimum mapping quality for a read to be used
  • sample_cap (int) – maximum number of reads to collect for any given sample region
  • distribution_fraction (float) – the proportion of the distribution to use in computing stdev
Returns:

the fragment size median, stdev, and read length in an object

Return type:

BamStats
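
To illustrate why a distribution_fraction below 1 can matter, the sketch below computes a spread estimate from only the observations closest to the median, so that a few extreme fragment sizes do not dominate. This is one plausible reading of "the proportion of the distribution to use in computing stdev", not a description of how mavis computes it; the helper trimmed_stdev and the sample data are hypothetical.

```python
import math


def trimmed_stdev(hist, center, fraction):
    """Hypothetical helper: root-mean-square deviation from `center`, using
    only the `fraction` of observations that lie closest to it."""
    squared_errors = []
    for value, count in hist.items():
        squared_errors.extend([(value - center) ** 2] * count)
    if not squared_errors:
        raise ValueError('cannot compute a spread for an empty histogram')
    # Sort so that the most extreme observations fall at the end and are dropped.
    squared_errors.sort()
    keep = max(1, int(round(len(squared_errors) * fraction)))
    kept = squared_errors[:keep]
    return math.sqrt(sum(kept) / len(kept))


# Four typical fragment sizes plus one extreme outlier (made-up numbers).
sizes = {290: 1, 300: 2, 310: 1, 5000: 1}
all_reads = trimmed_stdev(sizes, center=300, fraction=1.0)
trimmed = trimmed_stdev(sizes, center=300, fraction=0.8)
print(all_reads > 2000, round(trimmed, 2))
```

With fraction=0.8 the single outlier is excluded and the estimate stays close to the spread of the typical fragments.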

mavis.bam.stats.compute_transcriptome_bam_stats(bam_cache, annotations, sample_size, min_mapping_quality=1, stranded=True, sample_cap=10000, distribution_fraction=0.97)

computes various statistical measures relating to the input bam file

Parameters:
  • bam_cache (BamCache) – the input bam file handle
  • annotations (object) – see load_reference_genes()
  • sample_size (int) – the number of genes to compute stats over
  • min_mapping_quality (int) – the minimum mapping quality for a read to be used
  • stranded (bool) – if True then reads must match the gene strand
  • sample_cap (int) – maximum number of reads to collect for any given sample region
  • distribution_fraction (float) – the proportion of the distribution to use in computing stdev
Returns:

the fragment size median, stdev, and read length in an object

Return type:

BamStats
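
The interaction of min_mapping_quality, stranded, and sample_cap described in the parameter list can be sketched as a simple filter over the reads of one sampled gene. Everything here is illustrative: the Read record and collect_reads helper are hypothetical, real reads would come from the bam file, and treating the mapping quality threshold as inclusive is an assumption.

```python
from collections import namedtuple

# Hypothetical read record; real alignments would come from a BAM file.
Read = namedtuple('Read', ['name', 'mapping_quality', 'strand'])


def collect_reads(reads, gene_strand, min_mapping_quality=1, stranded=True, sample_cap=10000):
    """Keep reads that pass the quality and strand checks, up to sample_cap."""
    kept = []
    for read in reads:
        if len(kept) >= sample_cap:
            break  # this sample region already has enough reads
        if read.mapping_quality < min_mapping_quality:
            continue  # assumed inclusive threshold: quality must be >= minimum
        if stranded and read.strand != gene_strand:
            continue  # stranded mode: the read must match the gene strand
        kept.append(read)
    return kept


reads = [
    Read('r1', 60, '+'),
    Read('r2', 0, '+'),
    Read('r3', 60, '-'),
    Read('r4', 30, '+'),
    Read('r5', 60, '+'),
]
capped = [r.name for r in collect_reads(reads, '+', sample_cap=2)]
unstranded = [r.name for r in collect_reads(reads, '+', stranded=False)]
print(capped, unstranded)
```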