mavis/constants

Module containing the small utility functions and constants used throughout the structural_variant package.

PROGNAME

PROGNAME: str = 'mavis'

EXIT_OK

EXIT_OK: int = 0

EXIT_ERROR

EXIT_ERROR: int = 1

EXIT_INCOMPLETE

EXIT_INCOMPLETE: int = 2

COMPLETE_STAMP

COMPLETE_STAMP: str = 'MAVIS.COMPLETE'

CODON_SIZE

CODON_SIZE: int = 3

GAP

GAP: str = '-'

NA_MAPPING_QUALITY

NA_MAPPING_QUALITY: int = 255

DNA_ALPHABET

DNA_ALPHABET = alphabet = Gapped(ambiguous_dna, '-')

alphabet

DNA_ALPHABET = alphabet = Gapped(ambiguous_dna, '-')

DNA_ALPHABET.match

DNA_ALPHABET.match = lambda x, y: _match_ambiguous_dna(x, y)
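
The body of `_match_ambiguous_dna` is not shown on this page. As a rough illustration of what ambiguity-aware matching can look like, the sketch below treats two symbols as a match when the sets of bases they stand for overlap. The function name `match_ambiguous_dna_sketch`, the `IUPAC_DNA` table, and the overlap rule itself are assumptions made for this example, not the module's confirmed behaviour.

```python
# IUPAC nucleotide codes mapped to the bases each one can stand for.
IUPAC_DNA = {
    'A': 'A', 'C': 'C', 'G': 'G', 'T': 'T',
    'R': 'AG', 'Y': 'CT', 'S': 'CG', 'W': 'AT', 'K': 'GT', 'M': 'AC',
    'B': 'CGT', 'D': 'AGT', 'H': 'ACT', 'V': 'ACG', 'N': 'ACGT',
}


def match_ambiguous_dna_sketch(x: str, y: str) -> bool:
    """Hypothetical stand-in: True when the two symbols share at least one base."""
    x, y = x.upper(), y.upper()
    # Symbols outside the table (for example the gap '-') only match themselves.
    xset = set(IUPAC_DNA.get(x, x))
    yset = set(IUPAC_DNA.get(y, y))
    return bool(xset & yset)
```

Under this rule `N` matches every base, while `R` (A or G) and `Y` (C or T) never match each other.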

START_AA

START_AA: str = 'M'

STOP_AA

STOP_AA: str = '*'

INTEGER_COLUMNS

INTEGER_COLUMNS = {
    COLUMNS.break1_position_end,
    COLUMNS.break1_position_start,
    COLUMNS.break2_position_end,
    COLUMNS.break2_position_start,

FLOAT_COLUMNS

FLOAT_COLUMNS = {
    COLUMNS.break1_ewindow_count,
    COLUMNS.break1_split_reads_forced,
    COLUMNS.break1_split_reads,
    COLUMNS.break2_ewindow_count,
    COLUMNS.break2_split_reads_forced,
    COLUMNS.break2_split_reads,
    COLUMNS.cluster_size,
    COLUMNS.contig_alignment_query_consumption,
    COLUMNS.contig_alignment_rank,
    COLUMNS.contig_alignment_score,
    COLUMNS.contig_break1_read_depth,
    COLUMNS.contig_break2_read_depth,
    COLUMNS.contig_build_score,
    COLUMNS.contig_read_depth,
    COLUMNS.contig_remap_score,
    COLUMNS.contig_remapped_reads,
    COLUMNS.contigs_assembled,
    COLUMNS.flanking_pairs_compatible,
    COLUMNS.flanking_pairs,
    COLUMNS.linking_split_reads,
    COLUMNS.raw_break1_half_mapped_reads,
    COLUMNS.raw_break1_split_reads,
    COLUMNS.raw_break2_half_mapped_reads,
    COLUMNS.raw_break2_split_reads,
    COLUMNS.raw_flanking_pairs,
    COLUMNS.raw_spanning_reads,
    COLUMNS.repeat_count,
    COLUMNS.spanning_reads,

BOOLEAN_COLUMNS

BOOLEAN_COLUMNS = {COLUMNS.opposing_strands, COLUMNS.stranded, COLUMNS.supplementary_call}
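
Column sets such as these are typically used to decide how each field of a tab-delimited row should be cast after reading. The sketch below shows one way that could work. The set names, the column-name strings, the accepted spellings of "true", and the function `cast_row_sketch` are all hypothetical and chosen only for illustration.

```python
# Hypothetical stand-ins for INTEGER_COLUMNS, FLOAT_COLUMNS and BOOLEAN_COLUMNS.
# Plain strings are used here; the real sets hold COLUMNS attributes.
INTEGER_COLS = {'break1_position_start', 'break1_position_end'}
FLOAT_COLS = {'contig_alignment_score'}
BOOLEAN_COLS = {'stranded', 'opposing_strands'}
TRUE_STRINGS = {'true', 't', 'yes', 'y', '1'}  # assumed spellings of "true"


def cast_row_sketch(row: dict) -> dict:
    """Return a copy of row with values cast according to the column sets."""
    result = {}
    for name, value in row.items():
        if name in INTEGER_COLS:
            result[name] = int(value)
        elif name in FLOAT_COLS:
            result[name] = float(value)
        elif name in BOOLEAN_COLS:
            result[name] = str(value).strip().lower() in TRUE_STRINGS
        else:
            # Columns in none of the sets are left as strings.
            result[name] = value
    return result


row = cast_row_sketch({
    'break1_position_start': '1200',
    'contig_alignment_score': '0.95',
    'stranded': 'True',
    'event_type': 'inversion',
})
```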

SUMMARY_LIST_COLUMNS

SUMMARY_LIST_COLUMNS = {
    COLUMNS.annotation_figure,
    COLUMNS.annotation_id,
    COLUMNS.break1_split_reads,
    COLUMNS.break2_split_reads,
    COLUMNS.call_method,
    COLUMNS.contig_alignment_score,
    COLUMNS.contig_remapped_reads,
    COLUMNS.contig_seq,
    COLUMNS.event_type,
    COLUMNS.flanking_pairs,
    COLUMNS.pairing,
    COLUMNS.product_id,
    COLUMNS.spanning_reads,
    COLUMNS.tools,
    COLUMNS.tools,
    COLUMNS.tracking_id,
    COLUMNS.dgv,
    COLUMNS.known_sv_count,

class SPLICE_TYPE

inherits MavisNamespace

holds controlled vocabulary for allowed splice type classification values

Attributes

  • RETAIN (str): an intron was retained
  • SKIP (str): an exon was skipped
  • NORMAL (str): no exons were skipped and no introns were retained; the normal/expected splicing pattern was followed
  • MULTI_RETAIN (str): multiple introns were retained
  • MULTI_SKIP (str): multiple exons were skipped
  • COMPLEX (str): some combination of exon skipping and intron retention

class ORIENT

inherits MavisNamespace

holds controlled vocabulary for allowed orientation values

Attributes

  • LEFT (str): left with respect to the positive/forward strand
  • RIGHT (str): right with respect to the positive/forward strand
  • NS (str): orientation is not specified

class PROTOCOL

inherits MavisNamespace

holds controlled vocabulary for allowed protocol values

Attributes

  • GENOME (str)
  • TRANS (str)

class DISEASE_STATUS

inherits MavisNamespace

holds controlled vocabulary for allowed disease status

Attributes

  • DISEASED (str)
  • NORMAL (str)

class STRAND

inherits MavisNamespace

holds controlled vocabulary for allowed strand values

Attributes

  • POS (str): the positive/forward strand
  • NEG (str): the negative/reverse strand
  • NS (str): strand is not specified

class SVTYPE

inherits MavisNamespace

holds controlled vocabulary for acceptable structural variant classifications

Attributes

  • ITRANS (str)
  • INV (str)
  • INS (str)
  • DUP (str)

class CIGAR

inherits MavisNamespace

Enum-like namespace giving readable names to CIGAR operation values

Attributes

  • M: alignment match (can be a sequence match or mismatch)
  • I: insertion to the reference
  • D: deletion from the reference
  • N: skipped region from the reference
  • S: soft clipping (clipped sequences present in SEQ)
  • H: hard clipping (clipped sequences NOT present in SEQ)
  • P: padding (silent deletion from padded reference)
  • EQ: sequence match (=)
  • X: sequence mismatch

Note

Descriptions are taken from the SAM file documentation: https://samtools.github.io/hts-specs/SAMv1.pdf
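
To show how named CIGAR operations are commonly used, the sketch below computes how many reference and read bases an alignment consumes. It assumes the attributes carry the numeric operation codes from the SAM specification (M=0 through X=8), which this page does not state. The names `CONSUMES_REFERENCE`, `CONSUMES_QUERY`, `reference_length`, and `query_length` are hypothetical.

```python
# Numeric operation codes as listed in the SAM specification.
# Assumption: the CIGAR namespace uses these same values.
M, I, D, N, S, H, P, EQ, X = range(9)

# Operations that advance the position on the reference.
CONSUMES_REFERENCE = {M, D, N, EQ, X}
# Operations that advance the position in the read sequence (SEQ).
CONSUMES_QUERY = {M, I, S, EQ, X}


def reference_length(cigar):
    """Number of reference bases spanned by a list of (operation, length) tuples."""
    return sum(length for op, length in cigar if op in CONSUMES_REFERENCE)


def query_length(cigar):
    """Number of read bases covered by a list of (operation, length) tuples."""
    return sum(length for op, length in cigar if op in CONSUMES_QUERY)


# 5 soft-clipped, 10 aligned, 2 inserted, 3 aligned, 4 deleted, 6 exact matches
example = [(S, 5), (M, 10), (I, 2), (M, 3), (D, 4), (EQ, 6)]
```

For `example`, the reference span is 10 + 3 + 4 + 6 = 23 bases and the read span is 5 + 10 + 2 + 3 + 6 = 26 bases.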

class PYSAM_READ_FLAGS

inherits MavisNamespace

Enum-like namespace giving readable names to pysam flag constants

Attributes

  • REVERSE (int): SEQ being reverse complemented
  • MATE_REVERSE (int): SEQ of the next segment in the template being reverse complemented
  • UNMAPPED (int): segment unmapped
  • MATE_UNMAPPED (int): next segment in the template unmapped
  • FIRST_IN_PAIR (int): the first segment in the template
  • LAST_IN_PAIR (int): the last segment in the template
  • SECONDARY (int): secondary alignment
  • MULTIMAP (int): template having multiple segments in sequencing
  • SUPPLEMENTARY (int): supplementary alignment
  • TARGETED_ALIGNMENT (str)
  • RECOMPUTED_CIGAR (str)
  • BLAT_RANK (str)
  • BLAT_SCORE (str)
  • BLAT_ALIGNMENTS (str)
  • BLAT_PERCENT_IDENTITY (str)
  • BLAT_PMS (str)

Note

Descriptions are taken from the SAM file documentation: https://samtools.github.io/hts-specs/SAMv1.pdf
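
The integer attributes above correspond to bits of the SAM FLAG field. As a minimal sketch, the code below decodes a flag value into the names of the bits that are set. The bit values come from the SAM specification; the `SAM_FLAGS` dictionary and `decode_flag` function are hypothetical helpers, and the assumption that each attribute holds exactly these values is not confirmed by this page.

```python
# Bit values from the SAM specification, keyed by the attribute names used above.
SAM_FLAGS = {
    'MULTIMAP': 0x1,
    'UNMAPPED': 0x4,
    'MATE_UNMAPPED': 0x8,
    'REVERSE': 0x10,
    'MATE_REVERSE': 0x20,
    'FIRST_IN_PAIR': 0x40,
    'LAST_IN_PAIR': 0x80,
    'SECONDARY': 0x100,
    'SUPPLEMENTARY': 0x800,
}


def decode_flag(flag: int) -> list:
    """Return the names of all bits in SAM_FLAGS that are set in flag."""
    return [name for name, bit in SAM_FLAGS.items() if flag & bit]


# 83 = 0x1 + 0x2 + 0x10 + 0x40; bit 0x2 (properly aligned) has no entry above
names = decode_flag(83)
```

Here `names` is `['MULTIMAP', 'REVERSE', 'FIRST_IN_PAIR']`: a paired read, reverse complemented, first in its pair.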

class FLAGS

inherits MavisNamespace

Attributes

  • LQ (str)

class READ_PAIR_TYPE

inherits MavisNamespace

Attributes

  • RR (str)
  • LL (str)
  • RL (str)
  • LR (str)

class CALL_METHOD

inherits MavisNamespace

holds controlled vocabulary for allowed call methods

Attributes

  • CONTIG (str): a contig was assembled and aligned across the breakpoints
  • SPLIT (str): the event was called by split reads
  • FLANK (str): the event was called by flanking read pairs
  • SPAN (str): the event was called by spanning reads
  • INPUT (str)

class GENE_PRODUCT_TYPE

inherits MavisNamespace

controlled vocabulary for gene products

Attributes

  • SENSE (str): the gene product is a sense fusion
  • ANTI_SENSE (str): the gene product is anti-sense

class PRIME

inherits MavisNamespace

Attributes

  • FIVE (int): five prime
  • THREE (int): three prime

class GIEMSA_STAIN

inherits MavisNamespace

holds controlled vocabulary relating to stains of chromosome bands

Attributes

  • GNEG (str)
  • GPOS33 (str)
  • GPOS50 (str)
  • GPOS66 (str)
  • GPOS75 (str)
  • GPOS25 (str)
  • GPOS100 (str)
  • ACEN (str)
  • GVAR (str)
  • STALK (str)

class COLUMNS

inherits MavisNamespace

Column names for i/o files used throughout the pipeline

see column descriptions

Attributes

  • tracking_id (str)
  • library (str)
  • cluster_id (str)
  • cluster_size (str)
  • dgv (str)
  • validation_id (str)
  • annotation_id (str)
  • product_id (str)
  • event_type (str)
  • pairing (str)
  • inferred_pairing (str)
  • gene1 (str)
  • gene1_direction (str)
  • gene2 (str)
  • gene2_direction (str)
  • gene1_aliases (str)
  • gene2_aliases (str)
  • gene_product_type (str)
  • transcript1 (str)
  • transcript2 (str)
  • fusion_splicing_pattern (str)
  • fusion_cdna_coding_start (str)
  • fusion_cdna_coding_end (str)
  • fusion_mapped_domains (str)
  • fusion_sequence_fasta_id (str)
  • fusion_sequence_fasta_file (str)
  • fusion_protein_hgvs (str)
  • annotation_figure (str)
  • annotation_figure_legend (str)
  • genes_encompassed (str)
  • genes_overlapping_break1 (str)
  • genes_overlapping_break2 (str)
  • genes_proximal_to_break1 (str)
  • genes_proximal_to_break2 (str)
  • break1_chromosome (str)
  • break1_position_start (str)
  • break1_position_end (str)
  • break1_orientation (str)
  • exon_last_5prime (str)
  • exon_first_3prime (str)
  • break1_strand (str)
  • break1_seq (str)
  • break2_chromosome (str)
  • break2_position_start (str)
  • break2_position_end (str)
  • break2_orientation (str)
  • break2_strand (str)
  • break2_seq (str)
  • opposing_strands (str)
  • stranded (str)
  • protocol (str)
  • disease_status (str)
  • tools (str)
  • call_method (str)
  • break1_ewindow (str)
  • break1_ewindow_count (str)
  • break1_homologous_seq (str)
  • break1_split_read_names (str)
  • break1_split_reads (str)
  • break1_split_reads_forced (str)
  • break2_ewindow (str)
  • break2_ewindow_count (str)
  • break2_homologous_seq (str)
  • break2_split_read_names (str)
  • break2_split_reads (str)
  • break2_split_reads_forced (str)
  • contig_alignment_query_consumption (str)
  • contig_alignment_score (str)
  • contig_alignment_query_name (str)
  • contig_read_depth (str)
  • contig_break1_read_depth (str)
  • contig_break2_read_depth (str)
  • contig_alignment_rank (str)
  • contig_build_score (str)
  • contig_remap_score (str)
  • contig_remap_coverage (str)
  • contig_remapped_read_names (str)
  • contig_remapped_reads (str)
  • contig_seq (str)
  • contig_strand_specific (str)
  • contigs_assembled (str)
  • call_sequence_complexity (str)
  • known_sv_count (str)
  • spanning_reads (str)
  • spanning_read_names (str)
  • flanking_median_fragment_size (str)
  • flanking_pairs (str)
  • flanking_pairs_compatible (str)
  • flanking_pairs_read_names (str)
  • flanking_pairs_compatible_read_names (str)
  • flanking_stdev_fragment_size (str)
  • linking_split_read_names (str)
  • linking_split_reads (str)
  • raw_break1_half_mapped_reads (str)
  • raw_break1_split_reads (str)
  • raw_break2_half_mapped_reads (str)
  • raw_break2_split_reads (str)
  • raw_flanking_pairs (str)
  • raw_spanning_reads (str)
  • untemplated_seq (str)
  • filter_comment (str)
  • cdna_synon (str)
  • protein_synon (str)
  • supplementary_call (str)
  • net_size (str)
  • repeat_count (str)
  • assumed_untemplated (str)

float_fraction()

cast input to a float

def float_fraction(num):

Args

  • num: input to cast

Returns

  • float: the input cast to a float

Raises

  • TypeError: if the input cannot be cast to a float or the number is not between 0 and 1
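
The function body is not reproduced on this page. A minimal sketch consistent with the description above might look like the following. The name `float_fraction_sketch`, the error message, and the choice to treat 0 and 1 as valid are assumptions.

```python
def float_fraction_sketch(num):
    """Cast num to a float and check that it lies between 0 and 1.

    Hypothetical re-implementation; both bounds are assumed to be inclusive.
    """
    try:
        value = float(num)
    except (TypeError, ValueError):
        # Report every failure as TypeError, matching the documented behaviour.
        raise TypeError('must be a value between 0 and 1')
    if not 0 <= value <= 1:
        raise TypeError('must be a value between 0 and 1')
    return value
```

Since argparse reports a `TypeError` or `ValueError` raised by a `type=` callable as a usage error, a function of this shape could also serve as an argument type for command-line options.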

reverse_complement()

wrapper for the Bio.Seq reverse_complement method

def reverse_complement(s: str) -> str:

Args

  • s (str): the input DNA sequence

Returns

  • str: the reverse complement of the input sequence

Examples

>>> reverse_complement('ATCCGGT')
'ACCGGAT'

Warning

assumes the input is a DNA sequence
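
For readers without Biopython at hand, the same result for unambiguous bases can be sketched in plain Python. This is not the Bio.Seq wrapper described above but a hypothetical equivalent named `reverse_complement_sketch`. It leaves any character other than A, C, G, or T unchanged, which may differ from the library's handling of ambiguity codes.

```python
# Translation table mapping each base to its complement (case preserved).
_COMPLEMENT = str.maketrans('ACGTacgt', 'TGCAtgca')


def reverse_complement_sketch(s: str) -> str:
    """Complement every base, then reverse the sequence."""
    return s.translate(_COMPLEMENT)[::-1]


result = reverse_complement_sketch('ATCCGGT')
```

`result` is `'ACCGGAT'`, the same value as in the documented example.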

translate()

given a DNA sequence, translates it and returns the protein amino acid sequence

def translate(s: str, reading_frame: int = 0) -> str:

Args

  • s (str): the input DNA sequence
  • reading_frame (int): where to start translating the sequence

Returns

  • str: the amino acid sequence
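
To make the role of `reading_frame` concrete, here is a plain-Python sketch that uses the standard genetic code. The name `translate_sketch` is hypothetical. Dropping a trailing partial codon and emitting `X` for codons that contain ambiguous bases are assumptions, since this page does not say how the real function handles either case.

```python
from itertools import product

CODON_SIZE = 3
BASES = 'TCAG'
# Standard genetic code, listed in TCAG order for each of the three codon positions.
AMINO_ACIDS = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG'
CODON_TABLE = {
    ''.join(codon): aa
    for codon, aa in zip(product(BASES, repeat=CODON_SIZE), AMINO_ACIDS)
}


def translate_sketch(s: str, reading_frame: int = 0) -> str:
    """Translate a DNA sequence starting at the given reading frame offset."""
    seq = s.upper()[reading_frame % CODON_SIZE:]
    # Assumption: bases left over after the last full codon are ignored.
    usable = len(seq) - len(seq) % CODON_SIZE
    return ''.join(
        CODON_TABLE.get(seq[i:i + CODON_SIZE], 'X')  # 'X' for unknown codons
        for i in range(0, usable, CODON_SIZE)
    )
```

With this sketch, `translate_sketch('ATGGCCTAA')` gives `'MA*'` (start codon, alanine, stop), while `translate_sketch('ATGGCCTAA', 1)` reads `TGG` and `CCT`, drops the trailing `AA`, and gives `'WP'`.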