Configurable Settings

aligner

type: mavis.align.SUPPORTED_ALIGNER

environment variable: MAVIS_ALIGNER

default: 'blat'

accepted values: 'bwa mem', 'blat'

The aligner used to map the contigs/reads back to the reference, e.g. blat or bwa
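
Every setting on this page is paired with an environment variable named MAVIS_ followed by the upper-cased setting name. The following is a minimal sketch of that naming convention only. The helper names `env_var_name` and `read_setting` are hypothetical and are not part of the MAVIS API.

```python
import os


def env_var_name(setting: str) -> str:
    """Build the environment variable name listed on this page for a setting."""
    return "MAVIS_" + setting.upper()


def read_setting(setting: str, default, cast=str, environ=None):
    """Return the cast environment value for a setting, or the default if unset.

    `environ` defaults to os.environ; passing a dict makes the lookup testable.
    """
    environ = os.environ if environ is None else environ
    raw = environ.get(env_var_name(setting))
    return default if raw is None else cast(raw)


# Example: an override of cluster_radius supplied through the environment
print(read_setting("cluster_radius", 100, int, {"MAVIS_CLUSTER_RADIUS": "250"}))
```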

aligner_reference

type: filepath

environment variable: MAVIS_ALIGNER_REFERENCE

default: None

Path to the aligner reference file used for aligning the contig sequences

annotation_filters

type: str

environment variable: MAVIS_ANNOTATION_FILTERS

default: 'choose_more_annotated,choose_transcripts_by_priority'

A comma-separated list of filters to apply to putative annotations
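
Because the value is a single comma-separated string, it has to be split into individual filter names before use. A small sketch of that step, assuming surrounding whitespace and empty entries should be ignored. `parse_filter_names` is a hypothetical helper, not a MAVIS function.

```python
DEFAULT_ANNOTATION_FILTERS = "choose_more_annotated,choose_transcripts_by_priority"


def parse_filter_names(value: str) -> list:
    """Split a comma-separated setting into a list of non-empty, stripped names."""
    return [name.strip() for name in value.split(",") if name.strip()]


print(parse_filter_names(DEFAULT_ANNOTATION_FILTERS))
```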

annotation_memory

type: int

environment variable: MAVIS_ANNOTATION_MEMORY

default: 12000

Default memory limit (MB) for the annotation stage

annotations

type: filepath

environment variable: MAVIS_ANNOTATIONS

default: []

Path to the reference annotations of genes, transcripts, exons, domains, etc.

assembly_kmer_size

type: float_fraction

environment variable: MAVIS_ASSEMBLY_KMER_SIZE

default: 0.74

The fraction of the read length to use as the kmer size for assembly
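
As a worked example of this setting, the kmer size can be derived from the read length and the fraction. This is a sketch only: rounding to the nearest integer is an assumption, and `kmer_size` is a hypothetical helper rather than the MAVIS implementation.

```python
def kmer_size(read_length: int, assembly_kmer_size: float = 0.74) -> int:
    """Scale the read length by the configured fraction to get a kmer size.

    Rounding to the nearest integer is an assumption made for this sketch.
    """
    if not 0 < assembly_kmer_size <= 1:
        raise ValueError("assembly_kmer_size must be a fraction in (0, 1]")
    return max(1, round(read_length * assembly_kmer_size))


# With the default fraction, 100 bp reads give kmers of 74 bases
print(kmer_size(100))
```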

assembly_max_paths

type: int

environment variable: MAVIS_ASSEMBLY_MAX_PATHS

default: 8

The maximum number of paths to resolve. This limits the work done when the assembly graph is messy. The assembly will pre-calculate the number of paths (or putative assemblies) and stop if it is greater than this setting

assembly_min_edge_trim_weight

type: int

environment variable: MAVIS_ASSEMBLY_MIN_EDGE_TRIM_WEIGHT

default: 3

Used to simplify the De Bruijn graph before path finding. Edges with a frequency below this value are discarded if they are non-cutting, at a fork, or at the end of a path

assembly_min_exact_match_to_remap

type: int

environment variable: MAVIS_ASSEMBLY_MIN_EXACT_MATCH_TO_REMAP

default: 15

The minimum length of exact matches to initiate remapping a read to a contig

assembly_min_remap_coverage

type: float_fraction

environment variable: MAVIS_ASSEMBLY_MIN_REMAP_COVERAGE

default: 0.9

Minimum fraction of the contig sequence which the remapped sequences must align over

assembly_min_remapped_seq

type: int

environment variable: MAVIS_ASSEMBLY_MIN_REMAPPED_SEQ

default: 3

The minimum number of input sequences that must remap for an assembled contig to be used

assembly_min_uniq

type: float_fraction

environment variable: MAVIS_ASSEMBLY_MIN_UNIQ

default: 0.1

Minimum fraction of unique sequence required to keep assembled contigs separate. If contigs are more similar than this, the lower-scoring (and then the shorter) contig is dropped

assembly_strand_concordance

type: float_fraction

environment variable: MAVIS_ASSEMBLY_STRAND_CONCORDANCE

default: 0.51

When the numbers of remapped reads from each strand are compared, the ratio must be above this value to decide on the strand
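
One way to read this setting is sketched below. The sketch assumes the "ratio" is the fraction of remapped reads on one strand out of all remapped reads; that interpretation, and the function name `decide_strand`, are assumptions and not taken from the MAVIS source.

```python
def decide_strand(pos_reads: int, neg_reads: int, concordance: float = 0.51):
    """Return '+' or '-' if one strand's share of reads exceeds the threshold.

    Returns None when no reads remapped or neither strand is dominant enough.
    """
    total = pos_reads + neg_reads
    if total == 0:
        return None
    if pos_reads / total > concordance:
        return "+"
    if neg_reads / total > concordance:
        return "-"
    return None


print(decide_strand(9, 1))  # most reads support the positive strand
```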

blat_limit_top_aln

type: int

environment variable: MAVIS_BLAT_LIMIT_TOP_ALN

default: 10

Number of results to return from blat (ranking based on score)

blat_min_identity

type: float_fraction

environment variable: MAVIS_BLAT_MIN_IDENTITY

default: 0.9

The minimum percent identity match required for blat results when aligning contigs

breakpoint_color

type: str

environment variable: MAVIS_BREAKPOINT_COLOR

default: '#000000'

Breakpoint outline color

call_error

type: int

environment variable: MAVIS_CALL_ERROR

default: 10

Buffer zone for the evidence window

clean_aligner_files

type: cast_boolean

environment variable: MAVIS_CLEAN_ALIGNER_FILES

default: False

Remove the aligner output files after the validation stage is complete. These files are not required for subsequent steps but can be useful for debugging and deep investigation of events

cluster_initial_size_limit

type: int

environment variable: MAVIS_CLUSTER_INITIAL_SIZE_LIMIT

default: 25

The maximum cumulative size of both breakpoints for breakpoint pairs to be used in the initial clustering phase (combining based on overlap)

cluster_radius

type: int

environment variable: MAVIS_CLUSTER_RADIUS

default: 100

Maximum distance allowed between paired breakpoint pairs

concurrency_limit

type: int

environment variable: MAVIS_CONCURRENCY_LIMIT

default: None

The concurrency limit for tasks in any given job array or the number of concurrent processes allowed for a local run

contig_aln_max_event_size

type: int

environment variable: MAVIS_CONTIG_ALN_MAX_EVENT_SIZE

default: 50

Relates to determining breakpoints when pairing contig alignments. For any given read in a putative pair, the softclipping is extended to include any events larger than this size. The softclipping is added to the side of the alignment indicated by the breakpoint we are assigning pairs to

contig_aln_merge_inner_anchor

type: int

environment variable: MAVIS_CONTIG_ALN_MERGE_INNER_ANCHOR

default: 20

The minimum number of consecutive exact match base pairs to not merge events within a contig alignment

contig_aln_merge_outer_anchor

type: int

environment variable: MAVIS_CONTIG_ALN_MERGE_OUTER_ANCHOR

default: 15

Minimum consecutively aligned exact matches to anchor an end for merging internal events

contig_aln_min_anchor_size

type: int

environment variable: MAVIS_CONTIG_ALN_MIN_ANCHOR_SIZE

default: 50

The minimum number of aligned bases (M or =) for a contig in order to simplify. The bases do not have to be consecutive

contig_aln_min_extend_overlap

type: int

environment variable: MAVIS_CONTIG_ALN_MIN_EXTEND_OVERLAP

default: 10

Minimum number of bases the query coverage interval must be extended by in order to pair alignments as a single split alignment

contig_aln_min_query_consumption

type: float_fraction

environment variable: MAVIS_CONTIG_ALN_MIN_QUERY_CONSUMPTION

default: 0.9

Minimum fraction of the original query sequence that must be used by the read(s) of the alignment

contig_aln_min_score

type: float_fraction

environment variable: MAVIS_CONTIG_ALN_MIN_SCORE

default: 0.9

Minimum score for a contig to be used as evidence in a call by contig

contig_call_distance

type: int

environment variable: MAVIS_CONTIG_CALL_DISTANCE

default: 10

The maximum distance allowed between breakpoint pairs (called by contig) in order for them to pair

dgv_annotation

type: filepath

environment variable: MAVIS_DGV_ANNOTATION

default: []

Path to the DGV reference processed to look like the cytoband file

domain_color

type: str

environment variable: MAVIS_DOMAIN_COLOR

default: '#ccccb3'

Domain fill color

domain_mismatch_color

type: str

environment variable: MAVIS_DOMAIN_MISMATCH_COLOR

default: '#b2182b'

Domain fill color on 0% match

domain_name_regex_filter

type: str

environment variable: MAVIS_DOMAIN_NAME_REGEX_FILTER

default: '^PF\\d+$'

The regular expression used to select domains to be displayed (filtered by name)
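
To illustrate the default pattern: assuming the doubled backslash in the default shown above is string escaping, the expression is `^PF\d+$`, which matches names made of "PF" followed only by digits. The sketch below applies it to a few made-up domain names. `displayed_domains` is a hypothetical helper.

```python
import re

# The default pattern, written as a raw string
DOMAIN_NAME_REGEX_FILTER = r"^PF\d+$"


def displayed_domains(names, pattern: str = DOMAIN_NAME_REGEX_FILTER):
    """Keep only the domain names that match the regular expression."""
    regex = re.compile(pattern)
    return [name for name in names if regex.match(name)]


# Illustrative names only; 'SM00220' and 'PFAM' do not match the pattern
print(displayed_domains(["PF00069", "SM00220", "PF07714", "PFAM"]))
```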

domain_scaffold_color

type: str

environment variable: MAVIS_DOMAIN_SCAFFOLD_COLOR

default: '#000000'

The color of the domain scaffold

draw_fusions_only

type: cast_boolean

environment variable: MAVIS_DRAW_FUSIONS_ONLY

default: True

Flag to indicate if events which do not produce a fusion transcript should produce illustrations

draw_non_synonymous_cdna_only

type: cast_boolean

environment variable: MAVIS_DRAW_NON_SYNONYMOUS_CDNA_ONLY

default: True

Flag to indicate if events which are synonymous at the cDNA level should produce illustrations

drawing_width_iter_increase

type: int

environment variable: MAVIS_DRAWING_WIDTH_ITER_INCREASE

default: 500

The amount (in pixels) by which to increase the drawing width upon failure to fit

exon_min_focus_size

type: int

environment variable: MAVIS_EXON_MIN_FOCUS_SIZE

default: 10

Minimum size of an exon for it to be granted a label or min exon width

fetch_min_bin_size

type: int

environment variable: MAVIS_FETCH_MIN_BIN_SIZE

default: 50

The minimum size of any bin for reading from a BAM file. Increasing this number will result in smaller bins being merged or fewer bins being created (depending on the fetch method)

fetch_reads_bins

type: int

environment variable: MAVIS_FETCH_READS_BINS

default: 5

Number of bins to split an evidence window into to ensure more even sampling of high coverage regions
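
The interaction between fetch_reads_bins and fetch_min_bin_size can be sketched as follows. This shows only the "fewer bins" behaviour under the assumption that bins are equal-sized and inclusive; it is not the MAVIS fetch implementation, and `split_window` is a hypothetical name.

```python
def split_window(start: int, end: int, bins: int = 5, min_bin_size: int = 50):
    """Split the inclusive window [start, end] into at most `bins` bins.

    The number of bins is reduced so that no bin is smaller than
    `min_bin_size`, unless the window itself is smaller than that.
    """
    length = end - start + 1
    count = max(1, min(bins, length // min_bin_size))
    base, extra = divmod(length, count)
    result = []
    pos = start
    for i in range(count):
        size = base + (1 if i < extra else 0)  # spread the remainder evenly
        result.append((pos, pos + size - 1))
        pos += size
    return result


print(split_window(1, 500))  # five bins of 100 bases each
print(split_window(1, 120))  # only two bins fit the minimum size
```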

fetch_reads_limit

type: int

environment variable: MAVIS_FETCH_READS_LIMIT

default: 3000

Maximum number of reads (cap) to loop over for any given evidence window

filter_cdna_synon

type: cast_boolean

environment variable: MAVIS_FILTER_CDNA_SYNON

default: True

Filter all annotations synonymous at the cDNA level

filter_min_complexity

type: float_fraction

environment variable: MAVIS_FILTER_MIN_COMPLEXITY

default: 0.2

Filter event calls based on call sequence complexity

filter_min_flanking_reads

type: int

environment variable: MAVIS_FILTER_MIN_FLANKING_READS

default: 10

Minimum number of flanking pairs for a call by flanking pairs

filter_min_linking_split_reads

type: int

environment variable: MAVIS_FILTER_MIN_LINKING_SPLIT_READS

default: 1

Minimum number of linking split reads for a call by split reads

filter_min_remapped_reads

type: int

environment variable: MAVIS_FILTER_MIN_REMAPPED_READS

default: 5

Minimum number of remapped reads for a call by contig

filter_min_spanning_reads

type: int

environment variable: MAVIS_FILTER_MIN_SPANNING_READS

default: 5

Minimum number of spanning reads for a call by spanning reads

filter_min_split_reads

type: int

environment variable: MAVIS_FILTER_MIN_SPLIT_READS

default: 5

Minimum number of split reads for a call by split reads

filter_protein_synon

type: cast_boolean

environment variable: MAVIS_FILTER_PROTEIN_SYNON

default: False

Filter all annotations synonymous at the protein level

filter_secondary_alignments

type: cast_boolean

environment variable: MAVIS_FILTER_SECONDARY_ALIGNMENTS

default: True

Filter secondary alignments when gathering read evidence

filter_trans_homopolymers

type: cast_boolean

environment variable: MAVIS_FILTER_TRANS_HOMOPOLYMERS

default: True

Filter all single bp ins/del/dup events that are in a homopolymer region of at least 3 bps and are not paired to a genomic event

flanking_call_distance

type: int

environment variable: MAVIS_FLANKING_CALL_DISTANCE

default: 50

The maximum distance allowed between breakpoint pairs (called by flanking pairs) in order for them to pair

fuzzy_mismatch_number

type: int

environment variable: MAVIS_FUZZY_MISMATCH_NUMBER

default: 1

The number of events/mismatches allowed to be considered a fuzzy match

gene1_color

type: str

environment variable: MAVIS_GENE1_COLOR

default: '#657e91'

The color of genes near the first gene

gene1_color_selected

type: str

environment variable: MAVIS_GENE1_COLOR_SELECTED

default: '#518dc5'

The color of the first gene

gene2_color

type: str

environment variable: MAVIS_GENE2_COLOR

default: '#325556'

The color of genes near the second gene

gene2_color_selected

type: str

environment variable: MAVIS_GENE2_COLOR_SELECTED

default: '#4c9677'

The color of the second gene

import_env

type: cast_boolean

environment variable: MAVIS_IMPORT_ENV

default: True

Flag to import environment variables

input_call_distance

type: int

environment variable: MAVIS_INPUT_CALL_DISTANCE

default: 20

The maximum distance allowed between breakpoint pairs (called by input tools, not validated) in order for them to pair

label_color

type: str

environment variable: MAVIS_LABEL_COLOR

default: '#000000'

The label color

limit_to_chr

type: str

environment variable: MAVIS_LIMIT_TO_CHR

default: ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14', '15', '16', '17', '18', '19', '20', '21', '22', 'X', 'Y']

A list of chromosome names to use. Breakpoint pairs on other chromosomes will be filtered out. For example, '1 2 3 4' would filter out events/breakpoint pairs on any chromosomes other than 1, 2, 3, and 4
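
A sketch of the filtering described above, using the '1 2 3 4' example. Breakpoint pairs are represented here as plain `(chromosome1, chromosome2)` tuples, and a pair is kept only if both chromosomes are in the list. Both the representation and that rule are assumptions made for illustration.

```python
def filter_by_chromosome(pairs, limit_to_chr):
    """Keep breakpoint pairs whose two chromosomes are both in the allowed list."""
    allowed = set(limit_to_chr)
    return [pair for pair in pairs if pair[0] in allowed and pair[1] in allowed]


pairs = [("1", "2"), ("1", "5"), ("X", "3")]
# Only the pair on chromosomes 1 and 2 survives
print(filter_by_chromosome(pairs, "1 2 3 4".split()))
```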

mail_type

type: mavis.schedule.constants.MAIL_TYPE

environment variable: MAVIS_MAIL_TYPE

default: 'NONE'

accepted values: 'BEGIN', 'END', 'FAIL', 'ALL', 'NONE'

When to notify the mail_user (if given)

mail_user

type: str

environment variable: MAVIS_MAIL_USER

default: ''

User(s) to send notifications to

mask_fill

type: str

environment variable: MAVIS_MASK_FILL

default: '#ffffff'

Color of mask (for deleted region etc.)

mask_opacity

type: float_fraction

environment variable: MAVIS_MASK_OPACITY

default: 0.7

Opacity of the mask layer

masking

type: filepath

environment variable: MAVIS_MASKING

default: []

File containing regions for which input events overlapping them are dropped prior to validation

max_drawing_retries

type: int

environment variable: MAVIS_MAX_DRAWING_RETRIES

default: 5

The maximum number of retries for attempting a drawing. The width is extended on each iteration. If it is still insufficient after this number of retries, a gene-level only drawing will be output
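
To show how this setting combines with width and drawing_width_iter_increase, the sketch below lists the widths that would be tried with the defaults on this page. It assumes one initial attempt followed by up to max_drawing_retries retries; that counting, and the name `drawing_widths`, are assumptions.

```python
def drawing_widths(width: int = 1000, increase: int = 500, max_retries: int = 5):
    """List the drawing widths tried: the initial width plus one per retry."""
    return [width + attempt * increase for attempt in range(max_retries + 1)]


print(drawing_widths())
```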

max_files

type: int

environment variable: MAVIS_MAX_FILES

default: 200

The maximum number of files to output from clustering/splitting

max_orf_cap

type: int

environment variable: MAVIS_MAX_ORF_CAP

default: 3

The maximum number of ORFs to return (best putative ORFs will be retained)

max_proximity

type: int

environment variable: MAVIS_MAX_PROXIMITY

default: 5000

The maximum distance away from an annotation before the region is considered to be uninformative

max_sc_preceeding_anchor

type: int

environment variable: MAVIS_MAX_SC_PRECEEDING_ANCHOR

default: 6

When remapping a softclipped read, this determines the amount of softclipping allowed on the side opposite of where we expect it. For example, for a softclipped read on a breakpoint with a left orientation, this limits the amount of softclipping that is allowed on the right. If this is set to None, there is no limit on softclipping

memory_limit

type: int

environment variable: MAVIS_MEMORY_LIMIT

default: 16000

The maximum number of megabytes (MB) any given job is allowed

min_anchor_exact

type: int

environment variable: MAVIS_MIN_ANCHOR_EXACT

default: 6

Applies to re-aligning softclipped reads to the opposing breakpoint. The minimum number of consecutive exact matches to anchor a read to initiate targeted realignment

min_anchor_fuzzy

type: int

environment variable: MAVIS_MIN_ANCHOR_FUZZY

default: 10

Applies to re-aligning softclipped reads to the opposing breakpoint. The minimum length of a fuzzy match to anchor a read to initiate targeted realignment

min_anchor_match

type: float_fraction

environment variable: MAVIS_MIN_ANCHOR_MATCH

default: 0.9

Minimum percent match for a read to be kept as evidence

min_call_complexity

type: float_fraction

environment variable: MAVIS_MIN_CALL_COMPLEXITY

default: 0.1

The minimum complexity score for a call sequence. This is an average for non-contig calls. Filters low-complexity contigs before alignment. See contig_complexity

min_clusters_per_file

type: int

environment variable: MAVIS_MIN_CLUSTERS_PER_FILE

default: 50

The minimum number of breakpoint pairs to output to a file

min_domain_mapping_match

type: float_fraction

environment variable: MAVIS_MIN_DOMAIN_MAPPING_MATCH

default: 0.9

A number between 0 and 1 representing the minimum percent match a domain must map to the fusion transcript to be displayed

min_double_aligned_to_estimate_insertion_size

type: int

environment variable: MAVIS_MIN_DOUBLE_ALIGNED_TO_ESTIMATE_INSERTION_SIZE

default: 2

The minimum number of reads which map soft-clipped to both breakpoints to assume the size of the untemplated sequence between the breakpoints is at most the read length - 2 * min_softclipping
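
The bound stated above, read length - 2 * min_softclipping, can be worked through as follows. The function name and the choice to return None when too few reads are available are assumptions for this sketch.

```python
def max_untemplated_size(read_length: int, double_aligned_reads: int,
                         min_softclipping: int = 6, min_double_aligned: int = 2):
    """Upper bound on the untemplated sequence size between the breakpoints.

    Returns None when fewer than `min_double_aligned` reads map soft-clipped
    to both breakpoints, since the bound is then not assumed.
    """
    if double_aligned_reads < min_double_aligned:
        return None
    return read_length - 2 * min_softclipping


# 150 bp reads with the default min_softclipping of 6 give a bound of 138
print(max_untemplated_size(150, double_aligned_reads=3))
```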

min_flanking_pairs_resolution

type: int

environment variable: MAVIS_MIN_FLANKING_PAIRS_RESOLUTION

default: 10

The minimum number of flanking reads required to call a breakpoint by flanking evidence

min_linking_split_reads

type: int

environment variable: MAVIS_MIN_LINKING_SPLIT_READS

default: 2

The minimum number of split reads which aligned to both breakpoints

min_mapping_quality

type: int

environment variable: MAVIS_MIN_MAPPING_QUALITY

default: 5

The minimum mapping quality of reads to be used as evidence

min_non_target_aligned_split_reads

type: int

environment variable: MAVIS_MIN_NON_TARGET_ALIGNED_SPLIT_READS

default: 1

The minimum number of split reads aligned to a breakpoint by the input BAM, and not forced by local alignment to the target region, required to call a breakpoint by split read evidence

min_orf_size

type: int

environment variable: MAVIS_MIN_ORF_SIZE

default: 300

The minimum length (in base pairs) to retain a putative open reading frame (ORF)

min_sample_size_to_apply_percentage

type: int

environment variable: MAVIS_MIN_SAMPLE_SIZE_TO_APPLY_PERCENTAGE

default: 10

Minimum number of aligned bases to compute a match percent. If there are fewer than this number of aligned bases (match or mismatch), the percent comparator is not used
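
A sketch of how a minimum sample size can guard a percent comparison. Pairing this setting with min_anchor_match is an assumption made for illustration, as is treating the read as passing when the comparator is skipped. `passes_match_percent` is a hypothetical name.

```python
def passes_match_percent(matches: int, mismatches: int,
                         min_match: float = 0.9, min_sample_size: int = 10) -> bool:
    """Apply the match-percent threshold only when enough bases are aligned."""
    aligned = matches + mismatches
    if aligned < min_sample_size:
        # Too few aligned bases: the percent comparator is not used
        return True
    return matches / aligned >= min_match


print(passes_match_percent(5, 3))    # 8 aligned bases, comparator skipped
print(passes_match_percent(15, 5))   # 75% match over 20 bases fails
```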

min_softclipping

type: int

environment variable: MAVIS_MIN_SOFTCLIPPING

default: 6

Minimum number of soft-clipped bases required for a read to be used as soft-clipped evidence

min_spanning_reads_resolution

type: int

environment variable: MAVIS_MIN_SPANNING_READS_RESOLUTION

default: 5

Minimum number of spanning reads required to call an event by spanning evidence

min_splits_reads_resolution

type: int

environment variable: MAVIS_MIN_SPLITS_READS_RESOLUTION

default: 3

Minimum number of split reads required to call a breakpoint by split reads

novel_exon_color

type: str

environment variable: MAVIS_NOVEL_EXON_COLOR

default: '#5D3F6A'

Novel exon fill color

outer_window_min_event_size

type: int

environment variable: MAVIS_OUTER_WINDOW_MIN_EVENT_SIZE

default: 125

The minimum size of an event in order for flanking read evidence to be collected

queue

type: str

environment variable: MAVIS_QUEUE

default: ''

The queue jobs are to be submitted to

reference_genome

type: filepath

environment variable: MAVIS_REFERENCE_GENOME

default: []

Path to the human reference genome FASTA file

remote_head_ssh

type: str

environment variable: MAVIS_REMOTE_HEAD_SSH

default: ''

SSH target for remote scheduler commands

scaffold_color

type: str

environment variable: MAVIS_SCAFFOLD_COLOR

default: '#000000'

The color used for the gene/transcripts scaffolds

scheduler

type: mavis.schedule.constants.SCHEDULER

environment variable: MAVIS_SCHEDULER

default: 'SLURM'

accepted values: 'SGE', 'SLURM', 'TORQUE', 'LOCAL'

The scheduler being used

spanning_call_distance

type: int

environment variable: MAVIS_SPANNING_CALL_DISTANCE

default: 20

The maximum distance allowed between breakpoint pairs (called by spanning reads) in order for them to pair

splice_color

type: str

environment variable: MAVIS_SPLICE_COLOR

default: '#000000'

Splicing lines color

split_call_distance

type: int

environment variable: MAVIS_SPLIT_CALL_DISTANCE

default: 20

The maximum distance allowed between breakpoint pairs (called by split reads) in order for them to pair

stdev_count_abnormal

type: float

environment variable: MAVIS_STDEV_COUNT_ABNORMAL

default: 3.0

The number of standard deviations away from the normal within which read pairs are considered expected and therefore do not qualify as flanking reads
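
The threshold can be illustrated with a simple distance check. The sketch assumes "the normal" refers to a typical (for example median) fragment size for the library and that a known standard deviation is available. The parameter names `median` and `stdev` and the function name are hypothetical.

```python
def is_abnormal_insert(insert_size: float, median: float, stdev: float,
                       stdev_count_abnormal: float = 3.0) -> bool:
    """True when the insert size lies more than N standard deviations from the median."""
    return abs(insert_size - median) > stdev_count_abnormal * stdev


# With a median of 400 and stdev of 100, anything outside 100..700 is abnormal
print(is_abnormal_insert(1000, median=400, stdev=100))
print(is_abnormal_insert(600, median=400, stdev=100))
```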

strand_determining_read

type: int

environment variable: MAVIS_STRAND_DETERMINING_READ

default: 2

Either 1 or 2. Determines (assuming a stranded protocol) whether the first or second read in the pair matches the strand sequenced

template_metadata

type: filepath

environment variable: MAVIS_TEMPLATE_METADATA

default: []

File containing the cytoband template information. Used for illustrations only

time_limit

type: int

environment variable: MAVIS_TIME_LIMIT

default: 57600

The time in seconds any given job is allowed
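
For orientation, the default of 57600 seconds is 16 hours. The sketch below converts a limit in seconds to an hours:minutes:seconds string for readability. Whether and how MAVIS formats this value for a scheduler is not stated here, so `format_time_limit` is purely illustrative.

```python
def format_time_limit(seconds: int) -> str:
    """Render a time limit in seconds as H:MM:SS."""
    hours, remainder = divmod(seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


print(format_time_limit(57600))
```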

trans_fetch_reads_limit

type: int

environment variable: MAVIS_TRANS_FETCH_READS_LIMIT

default: 12000

Related to fetch_reads_limit. Overrides fetch_reads_limit for transcriptome libraries when set. If this has a value of None, fetch_reads_limit is used for transcriptome libraries instead

trans_min_mapping_quality

type: int

environment variable: MAVIS_TRANS_MIN_MAPPING_QUALITY

default: 0

Related to min_mapping_quality. Overrides min_mapping_quality if the library is a transcriptome and this is set to any value other than None. If this value is None, min_mapping_quality is used for transcriptomes as well as genomes
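
The override rule shared by trans_fetch_reads_limit and trans_min_mapping_quality can be sketched as a small fallback function. Note that an override of 0 still counts as set, because only None triggers the fallback. `effective_setting` is a hypothetical helper, not a MAVIS function.

```python
def effective_setting(general, trans_override, is_transcriptome: bool):
    """Pick the transcriptome override when it applies, else the general value."""
    if is_transcriptome and trans_override is not None:
        return trans_override
    return general


# min_mapping_quality=5, trans_min_mapping_quality=0 (the defaults on this page)
print(effective_setting(5, 0, is_transcriptome=True))     # override applies
print(effective_setting(5, None, is_transcriptome=True))  # falls back
```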

trans_validation_memory

type: int

environment variable: MAVIS_TRANS_VALIDATION_MEMORY

default: 18000

Default memory limit (MB) for the validation stage (for transcriptomes)

uninformative_filter

type: cast_boolean

environment variable: MAVIS_UNINFORMATIVE_FILTER

default: False

Flag that determines if breakpoint pairs which are not within max_proximity to any annotations are filtered out prior to clustering
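
A sketch of the proximity test implied by this flag and max_proximity. Annotations are reduced to inclusive `(start, end)` intervals on a single chromosome, and a breakpoint is tested as a single position. Both simplifications, and the name `is_informative`, are assumptions for illustration.

```python
def is_informative(position: int, annotations, max_proximity: int = 5000) -> bool:
    """True if the position is within max_proximity of any annotation interval."""
    return any(
        start - max_proximity <= position <= end + max_proximity
        for start, end in annotations
    )


genes = [(12_000, 15_000)]
print(is_informative(10_000, genes))  # 2000 bases upstream, within range
print(is_informative(1_000, genes))   # 11000 bases upstream, out of range
```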

validation_memory

type: int

environment variable: MAVIS_VALIDATION_MEMORY

default: 16000

Default memory limit (MB) for the validation stage

width

type: int

environment variable: MAVIS_WIDTH

default: 1000

The drawing width in pixels

write_evidence_files

type: cast_boolean

environment variable: MAVIS_WRITE_EVIDENCE_FILES

default: True

Write the intermediate BAM and BED files containing the raw evidence collected and contigs aligned. These files are not required for subsequent steps but can be useful for debugging and deep investigation of events