mavis/annotate/file_io

module which holds all functions relating to loading reference files

class ReferenceFile

Attributes

LOAD_FUNCTIONS (Dict[str, Optional[Callable]])

ReferenceFile.load()

load (or return) the contents of a reference file and add it to the cache if enabled

def load(self, ignore_cache=False, verbose=True):

Args

ignore_cache
verbose
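The load-or-return behaviour described above can be sketched as follows. This is a minimal illustration, not the MAVIS implementation: `CachedFile`, `_CACHE`, the `loader` callable and the `use_cache` flag are all hypothetical names chosen for the example, and only the `ignore_cache` argument comes from the signature above.

```python
from typing import Any, Callable, Dict

# Hypothetical module-level cache keyed by file path (an assumption, not MAVIS internals).
_CACHE: Dict[str, Any] = {}


class CachedFile:
    """Illustrative stand-in for a reference file with load-or-return semantics."""

    def __init__(self, path: str, loader: Callable[[str], Any], use_cache: bool = True):
        self.path = path
        self.loader = loader
        self.use_cache = use_cache
        self.content = None

    def load(self, ignore_cache: bool = False):
        # Already loaded on this object: return the contents as-is.
        if self.content is not None:
            return self.content
        # Reuse a cached copy unless the caller asked to bypass the cache.
        if self.use_cache and not ignore_cache and self.path in _CACHE:
            self.content = _CACHE[self.path]
            return self.content
        # Otherwise read the file and, if caching is enabled, store the result.
        self.content = self.loader(self.path)
        if self.use_cache:
            _CACHE[self.path] = self.content
        return self.content
```

With this layout, two objects pointing at the same path share one parsed copy, while `ignore_cache=True` forces a fresh read.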

load_masking_regions()

reads a file of regions. The expected input format for the file is tab-delimited, and the header should contain the following columns:

chr: the chromosome
start: start of the region, 1-based inclusive
end: end of the region, 1-based inclusive
name: the name/label of the region

For example:

.. code-block:: text

chr start end name

chr20 25600000 27500000 centromere

def load_masking_regions(*filepaths: str) -> Dict[str, List[BioInterval]]:

Returns

Dict[str, List[BioInterval]]: a dictionary keyed by chromosome name with values of lists of regions on the chromosome
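As a rough sketch of the format described above, the following reads a tab-delimited region table with the standard library and groups the rows by chromosome. `Region` and `read_regions` are hypothetical names used only for illustration; `Region` stands in for `BioInterval`, whose constructor is not shown in this section.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Region(NamedTuple):
    """Hypothetical stand-in for BioInterval."""
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive
    name: str


REQUIRED = ("chr", "start", "end", "name")


def read_regions(handle) -> Dict[str, List[Region]]:
    reader = csv.DictReader(handle, delimiter="\t")
    # Fail early if the header lacks any of the documented columns.
    missing = [col for col in REQUIRED if col not in (reader.fieldnames or [])]
    if missing:
        raise KeyError(f"missing required column(s): {', '.join(missing)}")
    regions: Dict[str, List[Region]] = defaultdict(list)
    for row in reader:
        region = Region(row["chr"], int(row["start"]), int(row["end"]), row["name"])
        regions[region.chrom].append(region)
    return dict(regions)


example = "chr\tstart\tend\tname\nchr20\t25600000\t27500000\tcentromere\n"
masks = read_regions(io.StringIO(example))
```

Here `masks` has the same shape as the documented return value: a dictionary keyed by chromosome name whose values are lists of regions.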

load_known_sv()

loads a standard MAVIS or BED file input into a list of known breakpoints.

Standard BED file requirements: the file is read as a list of regions. The expected input format is tab-delimited, and the header should contain the following columns:

chr: the chromosome
start: start of the region, 1-based inclusive
end: end of the region, 1-based inclusive
name: the name/label of the region

For example:

.. code-block:: text

chr start end name

chr20 25600000 27500000 centromere

def load_known_sv(*filepaths: str) -> Dict[str, List["BreakpointPair"]]:

Returns

Dict[str, List[BreakpointPair]]: a dictionary mapping string keys to lists of BreakpointPair objects

load_annotations()

loads gene models from an input file. Expects a tab-delimited or JSON file.

def load_annotations(
    *filepaths: str,
    reference_genome: Optional[ReferenceGenome] = None,
    best_transcripts_only: bool = False,
) -> Dict[str, List[Gene]]:

Returns

Dict[str, List[Gene]]: lists of genes keyed by chromosome name

parse_annotations_json()

parses a json of annotation information into annotation objects

def parse_annotations_json(
    data,
    reference_genome: Optional[ReferenceGenome] = None,
    best_transcripts_only=False,
) -> ReferenceAnnotations:

Args

data
reference_genome (Optional[ReferenceGenome])
best_transcripts_only

Returns

ReferenceAnnotations

load_templates()

primarily useful if template drawings are required; it is not necessary otherwise. Assumes the input file is 0-indexed with [start,end) style. Columns are expected in the following order, tab-delimited. A header should not be given.

name
start
end
band_name
giemsa_stain

for example

.. code-block:: text

chr1 0 2300000 p36.33 gneg

chr1 2300000 5400000 p36.32 gpos25

def load_templates(*filepaths: str) -> Dict[str, Template]:

Returns

Dict[str, Template]: templates loaded
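The column order and coordinate style described above can be illustrated with a small parser. This is a sketch under stated assumptions: `Band` and `read_bands` are hypothetical names rather than MAVIS classes, and the conversion from 0-indexed `[start,end)` to 1-based inclusive coordinates is shown only as one plausible way to align with the region files documented earlier in this module.

```python
import io
from typing import Dict, List, NamedTuple


class Band(NamedTuple):
    """Hypothetical record for one cytoband; not a MAVIS class."""
    name: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive
    stain: str


def read_bands(handle) -> Dict[str, List[Band]]:
    bands: Dict[str, List[Band]] = {}
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        # No header is expected; columns follow the documented order.
        chrom, start, end, band_name, stain = line.split("\t")
        # 0-indexed [start, end) becomes 1-based inclusive [start + 1, end].
        bands.setdefault(chrom, []).append(
            Band(band_name, int(start) + 1, int(end), stain)
        )
    return bands


example = (
    "chr1\t0\t2300000\tp36.33\tgneg\n"
    "chr1\t2300000\t5400000\tp36.32\tgpos25\n"
)
bands = read_bands(io.StringIO(example))
```

After conversion the two example bands no longer share a boundary coordinate: the first ends at 2300000 and the second starts at 2300001.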