constants module

module providing small utility functions and constants used throughout the mavis package

mavis.constants.CALL_METHOD = MavisNamespace(CONTIG='contig', FLANK='flanking reads', INPUT='input', SPAN='spanning reads', SPLIT='split reads')

holds controlled vocabulary for allowed call methods

Type:MavisNamespace
mavis.constants.CIGAR = MavisNamespace(D=2, EQ=7, H=5, I=1, M=0, N=3, P=6, S=4, X=8)

Enum-like namespace for readable CIGAR operation values

  • M: alignment match (can be a sequence match or mismatch)
  • I: insertion to the reference
  • D: deletion from the reference
  • N: skipped region from the reference
  • S: soft clipping (clipped sequences present in SEQ)
  • H: hard clipping (clipped sequences NOT present in SEQ)
  • P: padding (silent deletion from padded reference)
  • EQ: sequence match (=)
  • X: sequence mismatch

note: descriptions are taken from the samfile documentation

Type:MavisNamespace
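
As an illustration of how the operation codes above are typically used, the following sketch computes how many reference and read bases an alignment spans. It does not import mavis: the `CIGAR` dict is a plain copy of the values listed above, and `reference_length` and `query_length` are hypothetical helper names. The (operation, length) tuple layout and the choice of which operations consume the reference or the read follow the SAM specification.

```python
# CIGAR operation codes copied from the mavis.constants.CIGAR listing above.
CIGAR = {'M': 0, 'I': 1, 'D': 2, 'N': 3, 'S': 4, 'H': 5, 'P': 6, 'EQ': 7, 'X': 8}

# Operations that advance along the reference (per the SAM specification).
REFERENCE_CONSUMING = {CIGAR['M'], CIGAR['D'], CIGAR['N'], CIGAR['EQ'], CIGAR['X']}
# Operations that advance along the read sequence (SEQ).
QUERY_CONSUMING = {CIGAR['M'], CIGAR['I'], CIGAR['S'], CIGAR['EQ'], CIGAR['X']}


def reference_length(cigar_tuples):
    """Number of reference bases spanned by (operation, length) tuples."""
    return sum(length for op, length in cigar_tuples if op in REFERENCE_CONSUMING)


def query_length(cigar_tuples):
    """Number of read bases (as stored in SEQ) covered by the tuples."""
    return sum(length for op, length in cigar_tuples if op in QUERY_CONSUMING)


# The CIGAR string 5S10M2I3D20M expressed as tuples.
example = [
    (CIGAR['S'], 5),
    (CIGAR['M'], 10),
    (CIGAR['I'], 2),
    (CIGAR['D'], 3),
    (CIGAR['M'], 20),
]
print(reference_length(example))  # 33 = 10 + 3 + 20
print(query_length(example))      # 37 = 5 + 10 + 2 + 20
```
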
mavis.constants.CODON_SIZE = 3

the number of bases making up a codon

Type:int
mavis.constants.COLUMNS = MavisNamespace(annotation_figure='annotation_figure', annotation_figure_legend='annotation_figure_legend', annotation_id='annotation_id', assumed_untemplated='assumed_untemplated', break1_chromosome='break1_chromosome', break1_ewindow='break1_ewindow', break1_ewindow_count='break1_ewindow_count', break1_ewindow_practical_coverage='break1_ewindow_practical_coverage', break1_homologous_seq='break1_homologous_seq', break1_orientation='break1_orientation', break1_position_end='break1_position_end', break1_position_start='break1_position_start', break1_seq='break1_seq', break1_split_read_names='break1_split_read_names', break1_split_reads='break1_split_reads', break1_split_reads_forced='break1_split_reads_forced', break1_strand='break1_strand', break2_chromosome='break2_chromosome', break2_ewindow='break2_ewindow', break2_ewindow_count='break2_ewindow_count', break2_ewindow_practical_coverage='break2_ewindow_practical_coverage', break2_homologous_seq='break2_homologous_seq', break2_orientation='break2_orientation', break2_position_end='break2_position_end', break2_position_start='break2_position_start', break2_seq='break2_seq', break2_split_read_names='break2_split_read_names', break2_split_reads='break2_split_reads', break2_split_reads_forced='break2_split_reads_forced', break2_strand='break2_strand', call_method='call_method', call_sequence_complexity='call_sequence_complexity', cdna_synon='cdna_synon', cluster_id='cluster_id', cluster_size='cluster_size', contig_alignment_query_consumption='contig_alignment_query_consumption', contig_alignment_query_name='contig_alignment_query_name', contig_alignment_rank='contig_alignment_rank', contig_alignment_score='contig_alignment_score', contig_break1_read_depth='contig_break1_read_depth', contig_break2_read_depth='contig_break2_read_depth', contig_build_score='contig_build_score', contig_read_depth='contig_read_depth', contig_remap_coverage='contig_remap_coverage', 
contig_remap_score='contig_remap_score', contig_remapped_read_names='contig_remapped_read_names', contig_remapped_reads='contig_remapped_reads', contig_seq='contig_seq', contig_strand_specific='contig_strand_specific', contigs_assembled='contigs_assembled', disease_status='disease_status', event_type='event_type', exon_first_3prime='exon_first_3prime', exon_last_5prime='exon_last_5prime', filter_comment='filter_comment', flanking_median_fragment_size='flanking_median_fragment_size', flanking_pairs='flanking_pairs', flanking_pairs_compatible='flanking_pairs_compatible', flanking_pairs_compatible_read_names='flanking_pairs_compatible_read_names', flanking_pairs_read_names='flanking_pairs_read_names', flanking_stdev_fragment_size='flanking_stdev_fragment_size', fusion_cdna_coding_end='fusion_cdna_coding_end', fusion_cdna_coding_start='fusion_cdna_coding_start', fusion_mapped_domains='fusion_mapped_domains', fusion_protein_hgvs='fusion_protein_hgvs', fusion_sequence_fasta_file='fusion_sequence_fasta_file', fusion_sequence_fasta_id='fusion_sequence_fasta_id', fusion_splicing_pattern='fusion_splicing_pattern', gene1='gene1', gene1_aliases='gene1_aliases', gene1_direction='gene1_direction', gene2='gene2', gene2_aliases='gene2_aliases', gene2_direction='gene2_direction', gene_product_type='gene_product_type', genes_encompassed='genes_encompassed', genes_overlapping_break1='genes_overlapping_break1', genes_overlapping_break2='genes_overlapping_break2', genes_proximal_to_break1='genes_proximal_to_break1', genes_proximal_to_break2='genes_proximal_to_break2', inferred_pairing='inferred_pairing', library='library', linking_split_read_names='linking_split_read_names', linking_split_reads='linking_split_reads', net_size='net_size', opposing_strands='opposing_strands', pairing='pairing', product_id='product_id', protein_synon='protein_synon', protocol='protocol', raw_break1_half_mapped_reads='raw_break1_half_mapped_reads', raw_break1_split_reads='raw_break1_split_reads', 
raw_break2_half_mapped_reads='raw_break2_half_mapped_reads', raw_break2_split_reads='raw_break2_split_reads', raw_flanking_pairs='raw_flanking_pairs', raw_spanning_reads='raw_spanning_reads', repeat_count='repeat_count', spanning_read_names='spanning_read_names', spanning_reads='spanning_reads', stranded='stranded', supplementary_call='supplementary_call', tools='tools', tracking_id='tracking_id', transcript1='transcript1', transcript2='transcript2', untemplated_seq='untemplated_seq', validation_id='validation_id')

Column names for input/output files used throughout the pipeline

Type:MavisNamespace
mavis.constants.COMPLETE_STAMP = 'MAVIS.COMPLETE'

Filename for all complete stamp files

Type:str
mavis.constants.DISEASE_STATUS = MavisNamespace(DISEASED='diseased', NORMAL='normal')

holds controlled vocabulary for allowed disease status

  • DISEASED: diseased
  • NORMAL: normal
Type:MavisNamespace
mavis.constants.GENE_PRODUCT_TYPE = MavisNamespace(ANTI_SENSE='anti-sense', SENSE='sense')

controlled vocabulary for gene products

  • SENSE: the gene product is a sense fusion
  • ANTI_SENSE: the gene product is anti-sense
Type:MavisNamespace
mavis.constants.GIEMSA_STAIN = MavisNamespace(ACEN='acen', GNEG='gneg', GPOS100='gpos100', GPOS25='gpos25', GPOS33='gpos33', GPOS50='gpos50', GPOS66='gpos66', GPOS75='gpos75', GVAR='gvar', STALK='stalk')

holds controlled vocabulary relating to stains of chromosome bands

Type:MavisNamespace
class mavis.constants.MavisNamespace(*pos, **kwargs)[source]

Bases: object

Namespace to hold module constants

Example

>>> nspace = MavisNamespace(thing=1, otherthing=2)
>>> nspace.thing
1
>>> nspace.otherthing
2
DELIM = '[;,\\s]+'

delimiter to use in parsing listable variables from the environment or config file

Type:str
add(attr, value, defn=None, cast_type=None, nullable=False, env_overwritable=False, listable=False)[source]

Add an attribute to the name space

Parameters:
  • attr (str) – name of the attribute being added
  • value – the value of the attribute
  • defn (str) – the definition, will be used in generating documentation and help menus
  • cast_type (callable) – the function to use in casting the value
  • nullable (bool) – True if this attribute can have a None value
  • env_overwritable (bool) – True if this attribute will be overridden by its environment variable equivalent
  • listable (bool) – True if this attribute can have multiple values

Example

>>> nspace = MavisNamespace()
>>> nspace.add('thing', 1, 'I am a thing', int)
>>> nspace = MavisNamespace()
>>> nspace.add('thing', 1, cast_type=int)
>>> nspace = MavisNamespace()
>>> nspace.add('thing', 1)
>>> nspace = MavisNamespace()
>>> nspace.add('thing', value=1, cast_type=int, defn='I am a thing')
copy_from(source, attrs=None)[source]

Copy variables from one namespace onto the current namespace

define(attr, *pos)[source]

Get the definition of a given attribute or return a default (when given) if the attribute does not exist

Returns:definition for the attribute
Return type:str
Raises:KeyError – the attribute does not exist and a default was not given

Example

>>> nspace = MavisNamespace()
>>> nspace.add('thing', 1, defn='I am a thing')
>>> nspace.add('otherthing', 2)
>>> nspace.define('thing')
'I am a thing'
>>> nspace.define('otherthing')
Traceback (most recent call last):
....
>>> nspace.define('otherthing', 'I am some other thing')
'I am some other thing'
discard(attr)[source]

Remove a variable if it exists

enforce(value)[source]

checks that the current namespace has a given value

Returns:the input value
Raises:KeyError – the value did not exist

Example

>>> nspace = MavisNamespace(thing=1, otherthing=2)
>>> nspace.enforce(1)
1
>>> nspace.enforce(3)
Traceback (most recent call last):
....
get(key, *pos)[source]

get an attribute; if the attribute does not exist, return the default (when one is given)

Example

>>> nspace = MavisNamespace(thing=1, otherthing=2)
>>> nspace.get('thing', 2)
1
>>> nspace.get('nonexistent_thing', 2)
2
>>> nspace.get('nonexistent_thing')
Traceback (most recent call last):
....
get_env_name(attr)[source]

Get the name of the corresponding environment variable

Example

>>> nspace = MavisNamespace(a=1)
>>> nspace.get_env_name('a')
'MAVIS_A'
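
The naming rule suggested by this example can be sketched as follows. This is an assumption inferred from the single example above (a `MAVIS_` prefix plus the upper-cased attribute name), not a copy of the MAVIS implementation; `env_name` and `read_override` are hypothetical names, and the environment is passed in as a plain mapping so the behaviour is easy to check.

```python
def env_name(attr, prefix='MAVIS_'):
    """Assumed rule: prefix plus the upper-cased attribute name."""
    return prefix + attr.upper()


def read_override(attr, default, environ, cast_type=str):
    """Return the environment value for attr when present, else the default.

    environ is any mapping of variable names to strings, e.g. os.environ.
    """
    name = env_name(attr)
    if name in environ:
        return cast_type(environ[name])
    return default


fake_environ = {'MAVIS_MIN_CLUSTERS': '5'}  # stand-in for os.environ
print(env_name('a'))                                         # MAVIS_A
print(read_override('min_clusters', 2, fake_environ, int))   # 5
print(read_override('max_clusters', 10, fake_environ, int))  # 10
```
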
get_env_var(attr)[source]

retrieve the environment variable definition of a given attribute

is_env_overwritable(attr)[source]
Returns:True if the variable is overridden by specifying the environment variable equivalent
Return type:bool
is_listable(attr)[source]
Returns:True if the variable should be parsed as a list
Return type:bool
is_nullable(attr)[source]
Returns:True if the variable can be set to None
Return type:bool
items()[source]

Example

>>> MavisNamespace(thing=1, otherthing=2).items()
[('thing', 1), ('otherthing', 2)]
keys()[source]

get the attribute keys as a list

Example

>>> MavisNamespace(thing=1, otherthing=2).keys()
['thing', 'otherthing']
classmethod parse_listable_string(string, cast_type=<class 'str'>, nullable=False)[source]

Given some string, parse it into a list

Example

>>> MavisNamespace.parse_listable_string('1,2,3', int)
[1, 2, 3]
>>> MavisNamespace.parse_listable_string('1;2,None', int, True)
[1, 2, None]
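
A minimal sketch of this kind of parsing, using the same pattern as `DELIM`, is shown below. It reproduces the two examples above but is not the MAVIS implementation: `parse_listable` is a hypothetical name, and treating the literal text `None` (case-insensitively) as a null value is an assumption.

```python
import re

DELIM = r'[;,\s]+'  # same pattern as MavisNamespace.DELIM


def parse_listable(string, cast_type=str, nullable=False):
    """Split on semicolons, commas, or whitespace and cast each item."""
    result = []
    for item in re.split(DELIM, string):
        if not item:
            continue  # empty tokens arise from leading/trailing delimiters
        if nullable and item.lower() == 'none':
            # Assumption: the text 'None' stands for a null value.
            result.append(None)
        else:
            result.append(cast_type(item))
    return result


print(parse_listable('1,2,3', int))           # [1, 2, 3]
print(parse_listable('1;2,None', int, True))  # [1, 2, None]
print(parse_listable('a b; c'))               # ['a', 'b', 'c']
```
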
reverse(value)[source]

for a given value, return the associated key

Parameters:

value – the value to get the key/attribute name for

Raises:

Example

>>> nspace = MavisNamespace(thing=1, otherthing=2)
>>> nspace.reverse(1)
'thing'
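
To show how `enforce` and `reverse` support a controlled vocabulary, here is a small stand-in class. `TinyNamespace` is hypothetical and much simpler than `MavisNamespace`; in particular, raising `KeyError` when a value maps to zero or several keys is an assumption of this sketch.

```python
class TinyNamespace:
    """Hypothetical stand-in for an enum-like namespace (not MAVIS code)."""

    def __init__(self, **kwargs):
        # Store each keyword as an instance attribute.
        self.__dict__.update(kwargs)

    def enforce(self, value):
        """Return value unchanged if it is a member, else raise KeyError."""
        if value not in vars(self).values():
            raise KeyError('value is not a member of the namespace', value)
        return value

    def reverse(self, value):
        """Return the single key holding value, else raise KeyError."""
        keys = [key for key, member in vars(self).items() if member == value]
        if len(keys) != 1:
            # Assumption: a missing or ambiguous value is an error.
            raise KeyError('value does not map to exactly one key', value)
        return keys[0]


STRAND = TinyNamespace(POS='+', NEG='-', NS='?')
print(STRAND.enforce('+'))  # +
print(STRAND.reverse('-'))  # NEG
```

Validating user input through `enforce` at the point of entry means later code can compare against `STRAND.POS` and friends without re-checking for unexpected values.
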
to_dict()[source]
type(attr, *pos)[source]

returns the type of the given attribute

Example

>>> nspace = MavisNamespace(thing=1, otherthing=2)
>>> nspace.type('thing')
<class 'int'>
values()[source]

get the attribute values as a list

Example

>>> MavisNamespace(thing=1, otherthing=2).values()
[1, 2]
mavis.constants.NA_MAPPING_QUALITY = 255

mapping quality value to indicate mapping was not performed/calculated

Type:int
mavis.constants.ORIENT = MavisNamespace(LEFT='L', NS='?', RIGHT='R', compare=<function <lambda>>, expand=<function <lambda>>)

holds controlled vocabulary for allowed orientation values

  • LEFT: left with respect to the positive/forward strand
  • RIGHT: right with respect to the positive/forward strand
  • NS: orientation is not specified
Type:MavisNamespace
mavis.constants.PRIME = MavisNamespace(FIVE=5, THREE=3)

holds controlled vocabulary for five-prime and three-prime values

  • FIVE: five prime
  • THREE: three prime
Type:MavisNamespace
mavis.constants.PROTOCOL = MavisNamespace(GENOME='genome', TRANS='transcriptome')

holds controlled vocabulary for allowed protocol values

  • GENOME: genome
  • TRANS: transcriptome
Type:MavisNamespace
mavis.constants.PYSAM_READ_FLAGS = MavisNamespace(BLAT_ALIGNMENTS='ba', BLAT_PERCENT_IDENTITY='bi', BLAT_PMS='bp', BLAT_RANK='br', BLAT_SCORE='bs', FIRST_IN_PAIR=64, LAST_IN_PAIR=128, MATE_REVERSE=32, MATE_UNMAPPED=8, MULTIMAP=1, RECOMPUTED_CIGAR='rc', REVERSE=16, SECONDARY=256, SUPPLEMENTARY=2048, TARGETED_ALIGNMENT='ta', UNMAPPED=4)

Enum-like. For readable PYSAM flag constants

  • MULTIMAP: template having multiple segments in sequencing
  • UNMAPPED: segment unmapped
  • MATE_UNMAPPED: next segment in the template unmapped
  • REVERSE: SEQ being reverse complemented
  • MATE_REVERSE: SEQ of the next segment in the template being reverse complemented
  • FIRST_IN_PAIR: the first segment in the template
  • LAST_IN_PAIR: the last segment in the template
  • SECONDARY: secondary alignment
  • SUPPLEMENTARY: supplementary alignment

note: descriptions are taken from the samfile documentation

Type:MavisNamespace
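
The integer entries above are bit masks, so a read's flag is tested with a bitwise AND. The sketch below copies the numeric values into a plain dict (no mavis import) and decodes a flag into names; `decode_flag` is a hypothetical helper. The string-valued entries such as `'ba'` or `'rc'` are omitted because they are not bits.

```python
# Numeric SAM flag bits copied from the PYSAM_READ_FLAGS listing above.
READ_FLAGS = {
    'MULTIMAP': 1,
    'UNMAPPED': 4,
    'MATE_UNMAPPED': 8,
    'REVERSE': 16,
    'MATE_REVERSE': 32,
    'FIRST_IN_PAIR': 64,
    'LAST_IN_PAIR': 128,
    'SECONDARY': 256,
    'SUPPLEMENTARY': 2048,
}


def decode_flag(flag):
    """Return the sorted names of all listed bits that are set in flag."""
    return sorted(name for name, bit in READ_FLAGS.items() if flag & bit)


# 81 = 1 + 16 + 64
print(decode_flag(81))  # ['FIRST_IN_PAIR', 'MULTIMAP', 'REVERSE']

# Testing a single bit directly.
is_supplementary = bool(2049 & READ_FLAGS['SUPPLEMENTARY'])
print(is_supplementary)  # True
```
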
mavis.constants.START_AA = 'M'

The amino acid expected to start translation

Type:str
mavis.constants.STOP_AA = '*'

The amino acid expected to end translation

Type:str
mavis.constants.STRAND = MavisNamespace(NEG='-', NS='?', POS='+', compare=<function <lambda>>, expand=<function <lambda>>)

holds controlled vocabulary for allowed strand values

  • POS: the positive/forward strand
  • NEG: the negative/reverse strand
  • NS: strand is not specified
Type:MavisNamespace
mavis.constants.SUBCOMMAND = MavisNamespace(ANNOTATE='annotate', CLUSTER='cluster', CONFIG='config', CONVERT='convert', OVERLAY='overlay', PAIR='pairing', SCHEDULE='schedule', SETUP='setup', SUMMARY='summary', VALIDATE='validate')

holds controlled vocabulary for allowed pipeline stage values

  • annotate
  • cluster
  • config
  • convert
  • overlay
  • pairing
  • schedule
  • setup
  • summary
  • validate
Type:MavisNamespace
mavis.constants.SVTYPE = MavisNamespace(DEL='deletion', DUP='duplication', INS='insertion', INV='inversion', ITRANS='inverted translocation', TRANS='translocation')

holds controlled vocabulary for acceptable structural variant classifications

  • DEL: deletion
  • TRANS: translocation
  • ITRANS: inverted translocation
  • INV: inversion
  • INS: insertion
  • DUP: duplication
Type:MavisNamespace
mavis.constants.float_fraction(num)[source]

cast input to a float and check that it is between 0 and 1

Parameters:num – input to cast
Returns:float
Raises:TypeError – if the input cannot be cast to a float or the number is not between 0 and 1
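
The documented behaviour can be sketched as below. `float_fraction_sketch` is a hypothetical name, and treating the bounds 0 and 1 as inclusive is an assumption, since the description above does not say.

```python
def float_fraction_sketch(num):
    """Cast num to a float and require 0 <= value <= 1 (bounds assumed inclusive)."""
    try:
        value = float(num)
    except (TypeError, ValueError):
        # Report every failure as TypeError, as the documentation describes.
        raise TypeError('must be a value between 0 and 1', num) from None
    if not 0 <= value <= 1:
        raise TypeError('must be a value between 0 and 1', num)
    return value


print(float_fraction_sketch('0.25'))  # 0.25
print(float_fraction_sketch(1))       # 1.0
```

A callable of this shape can be passed as the `type=` argument of an `argparse` option, because `argparse` reports a `TypeError` raised by the type callable as a usage error.
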
mavis.constants.reverse_complement(s)[source]

wrapper for the Bio.Seq reverse_complement method

Parameters:s (str) – the input DNA sequence
Returns:the reverse complement of the input sequence
Return type:str

Warning

assumes the input is a DNA sequence

Example

>>> reverse_complement('ATCCGGT')
'ACCGGAT'
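
For readers without Biopython at hand, the operation itself can be sketched in pure Python. `reverse_complement_sketch` is a hypothetical name; unlike the Bio.Seq method it only handles the four unambiguous bases (upper or lower case) and leaves any other character unchanged.

```python
# Translation table mapping each base to its complement.
_COMPLEMENT = str.maketrans('ACGTacgt', 'TGCAtgca')


def reverse_complement_sketch(s):
    """Complement every base, then reverse the string."""
    return s.translate(_COMPLEMENT)[::-1]


print(reverse_complement_sketch('ATCCGGT'))  # ACCGGAT
```
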
mavis.constants.sort_columns(input_columns)[source]
mavis.constants.translate(s, reading_frame=0)[source]

given a DNA sequence, translates it and returns the protein amino acid sequence

Parameters:
  • s (str) – the input DNA sequence
  • reading_frame (int) – where to start translating the sequence
Returns:

the amino acid sequence

Return type:

str
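
The effect of `reading_frame` can be illustrated with a pure-Python sketch built on the standard genetic code. `translate_sketch` is a hypothetical name and this is not the MAVIS implementation; dropping a trailing partial codon is an assumption of the sketch. The output uses the same symbols as `START_AA` ('M') and `STOP_AA` ('*').

```python
from itertools import product

CODON_SIZE = 3
_BASES = 'TCAG'
# Standard genetic code, ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
_AMINO_ACIDS = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG'
CODON_TABLE = {
    ''.join(codon): amino_acid
    for codon, amino_acid in zip(product(_BASES, repeat=CODON_SIZE), _AMINO_ACIDS)
}


def translate_sketch(s, reading_frame=0):
    """Translate a DNA string, starting reading_frame bases from its start."""
    s = s.upper()[reading_frame:]
    # Assumption: bases left over after the last full codon are ignored.
    usable = len(s) - len(s) % CODON_SIZE
    return ''.join(
        CODON_TABLE[s[i:i + CODON_SIZE]] for i in range(0, usable, CODON_SIZE)
    )


print(translate_sketch('ATGGCCTAA'))     # MA*
print(translate_sketch('ATGGCCTAA', 1))  # WP
```
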