mavis/annotate/file_io
module which holds all functions relating to loading reference files
class ReferenceFile
Attributes
- LOAD_FUNCTIONS (Dict[str, Optional[Callable]])
ReferenceFile.load()
load (or return) the contents of a reference file and add it to the cache if enabled
def load(self, ignore_cache=False, verbose=True):
Args
- ignore_cache
- verbose
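The load-or-return behaviour described above can be sketched as follows. This is a minimal illustration only, not the MAVIS implementation: the `CachedFile` class, the `_CACHE` dictionary, and the `use_cache` flag are hypothetical names introduced for the example.

```python
from typing import Any, Callable, Dict, Optional

# Hypothetical module-level cache keyed by file path (an assumption for this sketch).
_CACHE: Dict[str, Any] = {}


class CachedFile:
    """Hypothetical stand-in for a reference-file wrapper with load-or-return semantics."""

    def __init__(self, path: str, loader: Callable[[str], Any], use_cache: bool = True):
        self.path = path
        self.loader = loader
        self.use_cache = use_cache
        self.content: Optional[Any] = None

    def load(self, ignore_cache: bool = False) -> "CachedFile":
        # Already loaded on this object: return without touching the file again.
        if self.content is not None:
            return self
        # Reuse a cached copy unless the caller asked to bypass the cache.
        if self.use_cache and not ignore_cache and self.path in _CACHE:
            self.content = _CACHE[self.path]
            return self
        # Otherwise load from the source and store the result if caching is enabled.
        self.content = self.loader(self.path)
        if self.use_cache:
            _CACHE[self.path] = self.content
        return self


# Usage: the second object reuses the cached content instead of calling the loader.
first = CachedFile("annotations.json", lambda p: {"loaded_from": p}).load()
second = CachedFile("annotations.json", lambda p: {"loaded_from": p}).load()
```

Passing `ignore_cache=True` in this sketch forces a fresh read even when a cached copy exists.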
load_masking_regions()
reads a file of regions. The expected input format is tab-delimited, and the header should contain the following columns
- chr: the chromosome
- start: start of the region, 1-based inclusive
- end: end of the region, 1-based inclusive
- name: the name/label of the region
For example:
.. code-block:: text
chr start end name
chr20 25600000 27500000 centromere
def load_masking_regions(*filepaths: str) -> Dict[str, List[BioInterval]]:
Returns
- Dict[str, List[BioInterval]]: a dictionary keyed by chromosome name with values of lists of regions on the chromosome
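To make the expected file layout concrete, here is a small sketch that parses the documented columns into a dictionary keyed by chromosome. It does not call MAVIS: `Region` is a hypothetical stand-in for `BioInterval`, and `parse_masking_regions` is an illustrative helper, not a library function.

```python
import csv
import io
from typing import Dict, List, NamedTuple


class Region(NamedTuple):
    """Hypothetical stand-in for BioInterval."""

    chr: str
    start: int  # 1-based inclusive
    end: int  # 1-based inclusive
    name: str


def parse_masking_regions(handle) -> Dict[str, List[Region]]:
    """Read a tab-delimited file with a chr/start/end/name header."""
    regions: Dict[str, List[Region]] = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        region = Region(row["chr"], int(row["start"]), int(row["end"]), row["name"])
        regions.setdefault(region.chr, []).append(region)
    return regions


# The example row from the documentation, written with explicit tabs.
example = "chr\tstart\tend\tname\nchr20\t25600000\t27500000\tcentromere\n"
masks = parse_masking_regions(io.StringIO(example))
```

Because both coordinates are 1-based inclusive, the length of a region is `end - start + 1`.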
load_known_sv()
loads a standard MAVIS or BED file input to a list of known breakpoints.
Standard BED file requirements: the expected input format is tab-delimited, and the header should contain the following columns
- chr: the chromosome
- start: start of the region, 1-based inclusive
- end: end of the region, 1-based inclusive
- name: the name/label of the region
For example:
.. code-block:: text
chr start end name
chr20 25600000 27500000 centromere
def load_known_sv(*filepaths: str) -> Dict[str, List["BreakpointPair"]]:
Returns
- Dict[str, List[BreakpointPair]]: a dictionary mapping string keys to lists of BreakpointPair objects
load_annotations()
loads gene models from an input file. Expects a tab-delimited or JSON file.
def load_annotations(
*filepaths: str,
reference_genome: Optional[ReferenceGenome] = None,
best_transcripts_only: bool = False,
) -> Dict[str, List[Gene]]:
Returns
- Dict[str, List[Gene]]: lists of genes keyed by chromosome name
parse_annotations_json()
parses a json of annotation information into annotation objects
def parse_annotations_json(
data,
reference_genome: Optional[ReferenceGenome] = None,
best_transcripts_only=False,
) -> ReferenceAnnotations:
Args
- data
- reference_genome (Optional[ReferenceGenome])
- best_transcripts_only
Returns
- ReferenceAnnotations: the parsed annotation objects
load_templates()
primarily useful if template drawings are required; not necessary otherwise. Assumes the input file is 0-indexed with [start,end) style. Columns are expected in the following order, tab-delimited. A header should not be given.
- name
- start
- end
- band_name
- giemsa_stain
for example
.. code-block:: text
chr1 0 2300000 p36.33 gneg
chr1 2300000 5400000 p36.32 gpos25
def load_templates(*filepaths: str) -> Dict[str, Template]:
Returns
- Dict[str, Template]: templates loaded
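The headerless, 0-indexed [start,end) layout described above can be read as in the sketch below. It is illustrative only: `Band` and `parse_bands` are hypothetical names, and the conversion to 1-based inclusive coordinates is an assumption made to show how the two conventions relate, not a statement about what `load_templates` does internally.

```python
import csv
import io
from typing import Dict, List, NamedTuple


class Band(NamedTuple):
    """Hypothetical record for one cytoband row."""

    name: str
    start: int  # 1-based inclusive after conversion
    end: int  # 1-based inclusive after conversion
    band_name: str
    giemsa_stain: str


def parse_bands(handle) -> Dict[str, List[Band]]:
    """Read headerless, tab-delimited rows: name, start, end, band_name, giemsa_stain."""
    bands: Dict[str, List[Band]] = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        name, start, end, band_name, stain = row
        # 0-based half-open [start, end) becomes 1-based inclusive [start + 1, end].
        band = Band(name, int(start) + 1, int(end), band_name, stain)
        bands.setdefault(name, []).append(band)
    return bands


# The two example rows from the documentation, written with explicit tabs.
example = "chr1\t0\t2300000\tp36.33\tgneg\nchr1\t2300000\t5400000\tp36.32\tgpos25\n"
bands = parse_bands(io.StringIO(example))
```

In the half-open convention the end of one band equals the start of the next, so adjacent bands do not overlap once converted.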