Column Names
List of column names and their definitions. The types indicated here are the expected types in a row for a given column name.
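As a rough illustration of how the listed types might be applied when reading a file, the sketch below casts a few columns of a tab-delimited table. It assumes the file is tab-delimited with a header row, that missing values appear as an empty string or the text None, and that booleans are written as True/False; the helper names (parse_bool, cast_row, read_rows) and the sample row are hypothetical and not part of MAVIS.

```python
import csv
import io


def parse_bool(text):
    """Parse a boolean cell; note that bool("False") would wrongly give True."""
    lowered = text.strip().lower()
    if lowered == "true":
        return True
    if lowered == "false":
        return False
    raise ValueError(f"not a boolean: {text!r}")


# A small subset of the columns described in this list (assumed casters).
COLUMN_TYPES = {
    "cluster_size": int,
    "break1_position_start": int,
    "contig_remap_score": float,
    "stranded": parse_bool,
}


def cast_row(row):
    """Return a copy of row with the known columns cast to their expected type."""
    result = dict(row)
    for column, caster in COLUMN_TYPES.items():
        value = result.get(column)
        if value is None:  # column not present in this file
            continue
        # Assumption: "" and "None" both mean "no value".
        result[column] = None if value in ("", "None") else caster(value)
    return result


def read_rows(handle):
    """Read every row of a tab-delimited table and cast the known columns."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [cast_row(row) for row in reader]


# Made-up example content, for illustration only.
example = (
    "cluster_size\tbreak1_chromosome\tstranded\tcontig_remap_score\n"
    "3\t1\tFalse\tNone\n"
)
rows = read_rows(io.StringIO(example))
print(rows[0])
```

Columns without an entry in COLUMN_TYPES are left as strings, which matches the entries below that do not declare a type.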
library
Identifier for the library/source
cluster_id
Identifier for the merging/clustering step
cluster_size
type: int
The number of breakpoint pair calls that were grouped in creating the cluster
validation_id
Identifier for the validation step
annotation_id
Identifier for the annotation step
product_id
Unique identifier of the final fusion including splicing and ORF decision from the annotation step
event_type
type: mavis.constants.SVTYPE
The classification of the event
inferred_pairing
A semi-colon delimited list of event identifiers, i.e. <annotation_id>_<splicing pattern>_<cds start>_<cds end>, which were paired to the current event based on predicted products
pairing
A semi-colon delimited list of event identifiers, i.e. <annotation_id>_<splicing pattern>_<cds start>_<cds end>, which were paired to the current event based on breakpoint positions
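The identifier layout used by inferred_pairing and pairing can be split back into its parts. The sketch below is one possible way to do so, assuming that the splicing pattern and the two cds fields never contain an underscore (the annotation id may). The function name parse_pairing and the sample identifiers are hypothetical.

```python
def parse_pairing(value):
    """Split a pairing or inferred_pairing cell into its component fields."""
    events = []
    for item in value.split(";"):
        item = item.strip()
        if not item:
            continue
        # Split from the right so that underscores inside annotation_id survive.
        annotation_id, splicing, cds_start, cds_end = item.rsplit("_", 3)
        events.append(
            {
                "annotation_id": annotation_id,
                "splicing_pattern": splicing,
                # Kept as strings: the cds fields may not always be integers.
                "cds_start": cds_start,
                "cds_end": cds_end,
            }
        )
    return events


# Made-up identifiers, for illustration only.
paired = parse_pairing("annot_1_normal_100_400;annot_2_normal_50_350")
print(paired[0]["annotation_id"])
```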
gene1
Gene for the current annotation at the first breakpoint
gene1_direction
type: mavis.constants.PRIME
The direction/prime of the gene
gene2
Gene for the current annotation at the second breakpoint
gene2_direction
type: mavis.constants.PRIME
The direction/prime of the gene. Possible values are those defined by mavis.constants.PRIME
gene1_aliases
Other gene names associated with the current annotation at the first breakpoint
gene2_aliases
Other gene names associated with the current annotation at the second breakpoint
gene_product_type
type: mavis.constants.GENE_PRODUCT_TYPE
Describes if the putative fusion product will be sense or anti-sense
transcript1
Transcript for the current annotation at the first breakpoint
transcript2
Transcript for the current annotation at the second breakpoint
fusion_splicing_pattern
type: mavis.constants.SPLICE_TYPE
Type of splicing pattern used to create the fusion cDNA.
fusion_cdna_coding_start
type: int
Position wrt the 5' end of the fusion transcript where coding begins (the first base of the Met amino acid)
fusion_cdna_coding_end
type: int
Position wrt the 5' end of the fusion transcript where coding ends (the last base of the stop codon)
fusion_mapped_domains
type: JSON
List of domains in JSON format, where each domain's start and end positions are given wrt the fusion transcript and the mapping quality is the number of matching amino acid positions over the total number of amino acids. The sequence is the amino acid sequence of the domain on the reference/original transcript
fusion_sequence_fasta_id
The sequence identifier for the cDNA sequence in the output fasta file
fusion_sequence_fasta_file
type: FILEPATH
Path to the corresponding fasta output file
annotation_figure
type: FILEPATH
File path to the svg drawing representing the annotation
annotation_figure_legend
type: JSON
JSON data for the figure legend
genes_encompassed
Applies to intrachromosomal events only. List of genes which overlap any region that occurs between both breakpoints. For example in a deletion event these would be deleted genes.
genes_overlapping_break1
List of genes which overlap the first breakpoint
genes_overlapping_break2
List of genes which overlap the second breakpoint
genes_proximal_to_break1
List of genes near the first breakpoint and their distance from the breakpoint
genes_proximal_to_break2
List of genes near the second breakpoint and their distance from the breakpoint
break1_chromosome
type: str
The name of the chromosome on which breakpoint 1 is situated
break1_position_start
type: int
Start (integer, inclusive, 1-based) of the range representing breakpoint 1
break1_position_end
type: int
End (integer, inclusive, 1-based) of the range representing breakpoint 1
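Because the breakpoint start and end are both 1-based and inclusive, the width of the range is end - start + 1, and converting to the 0-based half-open convention used by Python slices only shifts the start. The helper names below are hypothetical and the numbers are made up.

```python
def breakpoint_range_length(start, end):
    """Number of positions in a 1-based, inclusive breakpoint range."""
    if end < start:
        raise ValueError("end must not be smaller than start")
    return end - start + 1


def to_zero_based_half_open(start, end):
    """Convert a 1-based inclusive range to a 0-based half-open (start, end)."""
    return start - 1, end


print(breakpoint_range_length(100, 105))
print(to_zero_based_half_open(100, 105))
```

A breakpoint called at a single exact position has start equal to end and therefore a length of 1.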
break1_orientation
type: mavis.constants.ORIENT
The side of the breakpoint wrt the positive/forward strand that is retained.
break1_strand
type: mavis.constants.STRAND
The strand wrt the reference positive/forward strand at this breakpoint.
break1_seq
type: str
The sequence up to and including the breakpoint. Always given wrt the positive/forward strand
break2_chromosome
The name of the chromosome on which breakpoint 2 is situated
break2_position_start
type: int
Start (integer, inclusive, 1-based) of the range representing breakpoint 2
break2_position_end
type: int
End (integer, inclusive, 1-based) of the range representing breakpoint 2
break2_orientation
type: mavis.constants.ORIENT
The side of the breakpoint wrt the positive/forward strand that is retained.
break2_strand
type: mavis.constants.STRAND
The strand wrt the reference positive/forward strand at this breakpoint.
break2_seq
type: str
The sequence up to and including the breakpoint. Always given wrt the positive/forward strand
opposing_strands
type: bool
Specifies if breakpoints are on opposite strands wrt the reference. Expects a boolean
stranded
type: bool
Specifies if the sequencing protocol was strand specific or not. Expects a boolean
protocol
type: mavis.constants.PROTOCOL
Specifies the type of library
tools
The tools that called the event originally from the cluster step.
Should be a semi-colon delimited list of <tool name>_<tool version>
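A minimal sketch of reading the tools column back into (name, version) pairs follows. It assumes the version never contains an underscore, so the last underscore separates name from version; parse_tools and the sample versions are hypothetical.

```python
def parse_tools(value):
    """Split a tools cell into (tool name, tool version) pairs."""
    tools = []
    for item in value.split(";"):
        item = item.strip()
        if not item:
            continue
        # Assumption: the final underscore separates the name from the version.
        name, separator, version = item.rpartition("_")
        if not separator:
            # No underscore at all: treat the whole item as the tool name.
            name, version = item, None
        tools.append((name, version))
    return tools


# Made-up versions, for illustration only.
called_by = parse_tools("delly_v0.7.3;manta_v1.0.0")
print(called_by)
```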
contigs_assembled
type: int
Number of contigs that were built from split read sequences
contigs_aligned
type: int
Number of contigs that were able to align
contig_alignment_query_name
The query name for the contig alignment. Should match the 'read' name(s) in the .contigs.bam output file
contig_seq
type: str
Sequence of the current contig, given wrt the positive/forward strand if not strand specific
contig_remap_score
type: float
Score representing the number of sequences from the set of sequences given to the assembly algorithm that were aligned to the resulting contig with an acceptable scoring based on user-set thresholds. For any sequence its contribution to the score is divided by the number of mappings to give less weight to multimaps
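The weighting described for contig_remap_score can be pictured as follows. This is only a sketch of the idea, assuming each acceptably aligned sequence starts with a contribution of 1 that is then divided by its number of mappings; it is not taken from the MAVIS source, and remap_score is a hypothetical name.

```python
def remap_score(mapping_counts):
    """Sum per-sequence contributions, down-weighting multimapped sequences.

    mapping_counts holds, for each sequence that aligned acceptably to the
    contig, the number of mappings that sequence has.
    """
    # Assumption: a uniquely mapped sequence contributes exactly 1.
    return sum(1 / count for count in mapping_counts if count > 0)


# Two uniquely mapped sequences and one sequence with two mappings.
print(remap_score([1, 1, 2]))
```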
call_sequence_complexity
type: float
The minimum proportion of the call sequence that any two bases account for. An average for non-contig calls
contig_remapped_reads
type: int
The number of reads from the input bam that map to the assembled contig
contig_remapped_read_names
Read query names for the reads that were remapped. A -1 or -2 is appended to the end of each name to indicate whether it is the first or second read in the pair
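To recover the original query name from one of these entries, the appended suffix can be stripped. The sketch below works on a single name, since the delimiter between names is not stated here; split_mate_suffix and the sample names are hypothetical.

```python
def split_mate_suffix(name):
    """Return (query name, mate number) for a name ending in -1 or -2.

    Names without such a suffix are returned unchanged with a mate of None.
    """
    base, separator, suffix = name.rpartition("-")
    if separator and suffix in ("1", "2"):
        return base, int(suffix)
    return name, None


# Made-up read names, for illustration only.
print(split_mate_suffix("HISEQ:1:2-1"))
print(split_mate_suffix("read-with-dash-2"))
```

Only the final dash is treated as the separator, so query names that themselves contain dashes are preserved.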
contig_alignment_score
type: float
A rank, based on the alignment tool (blat, etc.), of the alignment being used. An average if split alignments were used. Lower numbers indicate a better alignment; the best possible alignment has a rank of zero.
contig_alignment_reference_start
The reference start(s), <chr>:<position>, of the contig alignment. Semi-colon delimited
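The <chr>:<position> entries can be split into typed pairs as sketched below. It assumes the position is always an integer and that only the last colon separates it from the chromosome name; parse_reference_starts and the sample value are hypothetical.

```python
def parse_reference_starts(value):
    """Split a contig_alignment_reference_start cell into (chr, position) pairs."""
    starts = []
    for item in value.split(";"):
        item = item.strip()
        if not item:
            continue
        # Split on the last colon in case the chromosome name contains one.
        chromosome, position = item.rsplit(":", 1)
        starts.append((chromosome, int(position)))
    return starts


# Made-up value, for illustration only.
print(parse_reference_starts("1:1000;X:2500"))
```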
contig_alignment_cigar
The cigar string(s) representing the contig alignment. Semi-colon delimited
contig_remap_coverage
type: float
Fraction of the contig sequence which is covered by the remapped reads
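One way to picture this fraction is to merge the positions covered by the remapped reads and divide by the contig length. The sketch below assumes read alignments are given as 1-based inclusive (start, end) positions on the contig; it illustrates the definition only and is not the MAVIS implementation. The name remap_coverage is hypothetical.

```python
def remap_coverage(contig_length, read_intervals):
    """Fraction of contig positions covered by at least one remapped read."""
    if contig_length <= 0:
        raise ValueError("contig_length must be positive")
    covered = 0
    last_counted = 0  # highest contig position already counted
    for start, end in sorted(read_intervals):
        # Skip any part of this read that was already counted.
        start = max(start, last_counted + 1)
        end = min(end, contig_length)
        if end >= start:
            covered += end - start + 1
            last_counted = end
    return covered / contig_length


# Reads covering 1-60 (with overlap) and 81-100 of a 100 bp contig.
print(remap_coverage(100, [(1, 50), (40, 60), (81, 100)]))
```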
contig_build_score
type: int
Score representing the edge weights of all edges used in building the sequence
contig_strand_specific
type: bool
A flag to indicate if it was possible to resolve the strand for this contig
spanning_reads
type: int
The number of spanning reads which support the event
spanning_read_names
Read query names of the spanning reads which support the current event
call_method
type: mavis.constants.CALL_METHOD
The method used to call the breakpoints
flanking_pairs
type: int
Number of read-pairs where one read aligns to the first breakpoint window and the second read aligns to the other. The count here is based on the number of unique query names
flanking_pairs_compatible
type: int
Number of flanking pairs of a compatible orientation type. This applies to insertions and duplications. Flanking pairs supporting an insertion will be compatible to a duplication and flanking pairs supporting a duplication will be compatible to an insertion (possibly indicating an internal translocation)
flanking_median_fragment_size
type: int
The median fragment size of the flanking reads being used as evidence
flanking_stdev_fragment_size
type: float
The standard deviation in fragment size of the flanking reads being used as evidence
break1_split_reads
type: int
Number of split reads that call the exact breakpoint given
break1_split_reads_forced
type: int
Number of split reads which were aligned to the opposite breakpoint window using a targeted alignment
break2_split_reads
type: int
Number of split reads that call the exact breakpoint given
break2_split_reads_forced
type: int
Number of split reads which were aligned to the opposite breakpoint window using a targeted alignment
linking_split_reads
type: int
Number of split reads that align to both breakpoints
untemplated_seq
type: str
The untemplated/novel sequence between the breakpoints
break1_homologous_seq
type: str
Sequence in common at the first breakpoint and the other side of the second breakpoint
break2_homologous_seq
type: str
Sequence in common at the second breakpoint and the other side of the first breakpoint
break1_ewindow
type: int-int
Window where evidence was gathered for the first breakpoint
break1_ewindow_count
type: int
Number of reads processed/looked-at in the first evidence window
break1_ewindow_practical_coverage
type: float
Computed as break1_ewindow_count / len(break1_ewindow). This is not the actual coverage, since bins are sampled within the window and there is a read limit cutoff
break2_ewindow
type: int-int
Window where evidence was gathered for the second breakpoint
break2_ewindow_count
type: int
Number of reads processed/looked-at in the second evidence window
break2_ewindow_practical_coverage
type: float
Computed as break2_ewindow_count / len(break2_ewindow). This is not the actual coverage, since bins are sampled within the window and there is a read limit cutoff
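The practical coverage ratio can be reproduced from the two columns it depends on. The sketch below assumes the evidence window is written as <start>-<end> with non-negative, inclusive positions, so its length is end - start + 1; the function names and sample values are hypothetical.

```python
def parse_window(window):
    """Parse an evidence window written as '<start>-<end>' into two integers."""
    start_text, end_text = window.split("-")
    return int(start_text), int(end_text)


def practical_coverage(ewindow_count, window):
    """Reads processed in the window divided by the window length."""
    start, end = parse_window(window)
    # Assumption: both ends of the window are inclusive.
    length = end - start + 1
    return ewindow_count / length


# 250 reads looked at in a 1000 bp window (made-up numbers).
print(practical_coverage(250, "1000-1999"))
```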
raw_flanking_pairs
type: int
Number of flanking reads before calling the breakpoint. The count here is based on the number of unique query names
raw_spanning_reads
type: int
Number of spanning reads collected during evidence collection before calling the breakpoint
raw_break1_split_reads
type: int
Number of split reads before calling the breakpoint
raw_break2_split_reads
type: int
Number of split reads before calling the breakpoint
cdna_synon
Semi-colon delimited list of transcript ids which have an identical cDNA sequence to the cDNA sequence of the current fusion product
protein_synon
Semi-colon delimited list of transcript ids which produce a translation with an identical amino-acid sequence to the current fusion product
tracking_id
Column used to store input identifiers from the original SV calls. Used to track calls from the input files to the final outputs.
fusion_protein_hgvs
type: str
Describes the fusion protein in HGVS notation. Will be None if the change is not an indel or is synonymous
net_size
type: int-int
The net size of an event. For translocations and inversions this will always be 0. For indels it will be negative for deletions and positive for insertions. It is a range to accommodate non-specific events.
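Since net_size is a range whose ends may be negative, splitting naively on "-" is unreliable. The sketch below uses a regular expression instead, assuming the value is serialized as <min>-<max> with an optional minus sign on each number; that serialization, the function name, and the sample values are assumptions.

```python
import re

# Two optionally negative integers separated by a single dash.
NET_SIZE_PATTERN = re.compile(r"^(-?\d+)-(-?\d+)$")


def parse_net_size(value):
    """Parse a net_size cell into a (minimum, maximum) tuple of integers."""
    match = NET_SIZE_PATTERN.match(value.strip())
    if match is None:
        raise ValueError(f"unexpected net_size value: {value!r}")
    return int(match.group(1)), int(match.group(2))


print(parse_net_size("-10--5"))  # a deletion of between 5 and 10 bases
print(parse_net_size("0-0"))     # e.g. a translocation or inversion
```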
supplementary_call
type: bool
Flag to indicate if the current event was a supplementary call, meaning a call that was found as a result of validating another event.
dgv
type: str
ID(s) of SVs from the DGV database matched to an SV call from the summary step
known_sv_count
type: int
Number of known SVs matched to a call in the summary step