Column Names

List of column names and their definitions. The types indicated here are the expected types in a row for a given column name.
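As a rough illustration of how the listed types might be applied when reading a row, the sketch below casts a few columns from tab-delimited text. The tab delimiter, the `CASTS` table, the `parse_bool` and `cast_row` helpers, and the accepted boolean spellings are assumptions made for this example and are not part of MAVIS.

```python
import csv
import io


def parse_bool(value):
    # Assumption: booleans are written as True/False (case-insensitive)
    lowered = value.strip().lower()
    if lowered in ('true', 't', '1'):
        return True
    if lowered in ('false', 'f', '0'):
        return False
    raise ValueError(f'cannot interpret {value!r} as a boolean')


# Hypothetical subset of columns mapped to the types given in this list
CASTS = {
    'cluster_size': int,
    'break1_position_start': int,
    'contig_remap_score': float,
    'opposing_strands': parse_bool,
}


def cast_row(row):
    # Columns without a listed type are left as strings
    return {col: CASTS.get(col, str)(value) for col, value in row.items()}


example = 'cluster_size\tbreak1_chromosome\topposing_strands\n3\t1\tFalse\n'
rows = [cast_row(r) for r in csv.DictReader(io.StringIO(example), delimiter='\t')]
```

Columns without a `type:` entry are kept as plain strings in this sketch.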

library

Identifier for the library/source

cluster_id

Identifier for the merging/clustering step

cluster_size

type: int

The number of breakpoint pair calls that were grouped in creating the cluster

validation_id

Identifier for the validation step

annotation_id

Identifier for the annotation step

product_id

Unique identifier of the final fusion including splicing and ORF decision from the annotation step

event_type

type: mavis.constants.SVTYPE

The classification of the event

inferred_pairing

A semicolon-delimited list of event identifiers, i.e. <annotation_id>_<splicing pattern>_<cds start>_<cds end>, which were paired to the current event based on predicted products

pairing

A semicolon-delimited list of event identifiers, i.e. <annotation_id>_<splicing pattern>_<cds start>_<cds end>, which were paired to the current event based on breakpoint positions
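The identifier layout used by inferred_pairing and pairing can be split back into its parts. The sketch below is one way to do that; the `parse_event_ids` helper and the example values are hypothetical, and it assumes that the splicing pattern contains no underscore and that both CDS coordinates are integers.

```python
def parse_event_ids(value):
    """Split a pairing/inferred_pairing value into (annotation_id, splicing, cds_start, cds_end)."""
    events = []
    for item in value.split(';'):
        item = item.strip()
        if not item:
            continue
        # Split from the right so underscores inside annotation_id are preserved
        annotation_id, splicing, cds_start, cds_end = item.rsplit('_', 3)
        events.append((annotation_id, splicing, int(cds_start), int(cds_end)))
    return events


pairs = parse_event_ids('lib-a1_normal_10_250;lib-a2_normal_5_95')
```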

gene1

Gene for the current annotation at the first breakpoint

gene1_direction

type: mavis.constants.PRIME

The direction/prime of the gene

gene2

Gene for the current annotation at the second breakpoint

gene2_direction

type: mavis.constants.PRIME

The direction/prime of the gene

gene1_aliases

Other gene names associated with the current annotation at the first breakpoint

gene2_aliases

Other gene names associated with the current annotation at the second breakpoint

gene_product_type

type: mavis.constants.GENE_PRODUCT_TYPE

Describes if the putative fusion product will be sense or anti-sense

transcript1

Transcript for the current annotation at the first breakpoint

transcript2

Transcript for the current annotation at the second breakpoint

fusion_splicing_pattern

type: mavis.constants.SPLICE_TYPE

Type of splicing pattern used to create the fusion cDNA.

fusion_cdna_coding_start

type: int

Position wrt the 5' end of the fusion transcript where coding begins (the first base of the Met codon)

fusion_cdna_coding_end

type: int

Position wrt the 5' end of the fusion transcript where coding ends (the last base of the stop codon)

fusion_mapped_domains

type: JSON

List of domains in JSON format, where each domain's start and end positions are given wrt the fusion transcript and the mapping quality is the number of matching amino acid positions divided by the total number of amino acids. The sequence is the amino acid sequence of the domain on the reference/original transcript

fusion_sequence_fasta_id

The sequence identifier for the cDNA sequence in the output fasta file

fusion_sequence_fasta_file

type: FILEPATH

Path to the corresponding fasta output file

annotation_figure

type: FILEPATH

File path to the svg drawing representing the annotation

annotation_figure_legend

type: JSON

JSON data for the figure legend

genes_encompassed

Applies to intrachromosomal events only. List of genes which overlap any region that occurs between both breakpoints. For example in a deletion event these would be deleted genes.

genes_overlapping_break1

List of genes which overlap the first breakpoint

genes_overlapping_break2

List of genes which overlap the second breakpoint

genes_proximal_to_break1

List of genes near the first breakpoint and their distance from the breakpoint

genes_proximal_to_break2

List of genes near the second breakpoint and their distance from the breakpoint

break1_chromosome

type: str

The name of the chromosome on which breakpoint 1 is situated

break1_position_start

type: int

Start (inclusive, 1-based) of the range representing breakpoint 1

break1_position_end

type: int

End (inclusive, 1-based) of the range representing breakpoint 1

break1_orientation

type: mavis.constants.ORIENT

The side of the breakpoint wrt the positive/forward strand that is retained.

break1_strand

type: mavis.constants.STRAND

The strand wrt the reference positive/forward strand at this breakpoint.

break1_seq

type: str

The sequence up to and including the breakpoint. Always given wrt the positive/forward strand

break2_chromosome

The name of the chromosome on which breakpoint 2 is situated

break2_position_start

type: int

Start (inclusive, 1-based) of the range representing breakpoint 2

break2_position_end

type: int

End (inclusive, 1-based) of the range representing breakpoint 2

break2_orientation

type: mavis.constants.ORIENT

The side of the breakpoint wrt the positive/forward strand that is retained.

break2_strand

type: mavis.constants.STRAND

The strand wrt the reference positive/forward strand at this breakpoint.

break2_seq

type: str

The sequence up to and including the breakpoint. Always given wrt the positive/forward strand

opposing_strands

type: bool

Specifies if the breakpoints are on opposite strands wrt the reference. Expects a boolean

stranded

type: bool

Specifies if the sequencing protocol was strand specific or not. Expects a boolean

protocol

type: mavis.constants.PROTOCOL

Specifies the type of library

tools

The tools that originally called the event, taken from the cluster step. Should be a semi-colon delimited list of <tool name>_<tool version>
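A value in the tools column can be separated into name/version pairs. The following is a minimal sketch; the `parse_tools` helper and the tool names and versions in the example are hypothetical, and it assumes the version itself contains no underscore.

```python
def parse_tools(value):
    """Split '<tool name>_<tool version>;...' into (name, version) tuples."""
    tools = []
    for item in value.split(';'):
        item = item.strip()
        if not item:
            continue
        # Split on the last underscore so tool names may contain underscores
        name, sep, version = item.rpartition('_')
        if not sep:
            # No underscore present: treat the whole item as the name
            name, version = item, ''
        tools.append((name, version))
    return tools


tools = parse_tools('toolA_1.2.0;tool_b_0.9')
```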

contigs_assembled

type: int

Number of contigs that were built from split read sequences

contigs_aligned

type: int

Number of contigs that were able to align

contig_alignment_query_name

The query name for the contig alignment. Should match the 'read' name(s) in the .contigs.bam output file

contig_seq

type: str

Sequence of the current contig, given wrt the positive/forward strand if not strand specific

contig_remap_score

type: float

Score representing the number of sequences from the set of sequences given to the assembly algorithm that were aligned to the resulting contig with an acceptable scoring based on user-set thresholds. For any sequence its contribution to the score is divided by the number of mappings to give less weight to multimaps
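The down-weighting of multimapped sequences described above can be sketched as follows. This is an illustration only: the `remap_score` helper is hypothetical, and it assumes each acceptably aligned sequence contributes 1 before being divided by its number of mappings.

```python
def remap_score(mapping_counts):
    """Sum per-sequence contributions, each divided by that sequence's number of mappings.

    mapping_counts holds, for every input sequence that aligned acceptably
    to the contig, the number of places that sequence mapped.
    """
    return sum(1 / count for count in mapping_counts if count > 0)


# Two uniquely mapped sequences, one mapped twice, one mapped four times
score = remap_score([1, 1, 2, 4])
```

Under these assumptions a uniquely mapped sequence adds 1 to the score, while a sequence that maps to four places adds only 0.25.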

call_sequence_complexity

type: float

The minimum proportion of the call sequence that any two bases account for. An average for non-contig calls
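One possible reading of this definition is sketched below: for every pair of distinct bases, take the share of the sequence made up of those two bases, then report the smallest share. The `sequence_complexity` function here is a hypothetical illustration under that reading and may differ from the actual MAVIS calculation.

```python
from itertools import combinations


def sequence_complexity(seq):
    """Smallest fraction of the sequence accounted for by any pair of distinct bases."""
    seq = seq.upper()
    counts = {base: seq.count(base) for base in 'ACGT'}
    total = sum(counts.values())  # characters other than A, C, G, T are ignored
    if total == 0:
        return 0.0
    return min((counts[a] + counts[b]) / total for a, b in combinations('ACGT', 2))
```

With this reading, a sequence using all four bases evenly scores 0.5, and a sequence made of only one or two distinct bases scores 0.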

contig_remapped_reads

type: int

The number of reads from the input bam that map to the assembled contig

contig_remapped_read_names

Read query names for the reads that were remapped. A -1 or -2 has been appended to the end of each name to indicate whether it is the first or second read in the pair

contig_alignment_score

type: float

A rank, based on the alignment tool (blat, etc.), of the alignment being used. An average if split alignments were used. Lower numbers indicate a better alignment. If it was the best alignment possible then this would be zero.

contig_alignment_reference_start

The reference start(s) <chr>:<position> of the contig alignment. Semi-colon delimited
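The <chr>:<position> entries can be separated into chromosome and position pairs. The sketch below uses a hypothetical `parse_reference_starts` helper and made-up example values; it assumes each position is a plain integer.

```python
def parse_reference_starts(value):
    """Split '<chr>:<position>;...' into (chromosome, position) tuples."""
    starts = []
    for item in value.split(';'):
        item = item.strip()
        if not item:
            continue
        # Split on the last colon so chromosome names containing ':' still work
        chrom, _, position = item.rpartition(':')
        starts.append((chrom, int(position)))
    return starts


starts = parse_reference_starts('1:1000;X:2500')
```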

contig_alignment_cigar

The cigar string(s) representing the contig alignment. Semi-colon delimited

contig_remap_coverage

type: float

Fraction of the contig sequence which is covered by the remapped reads
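A covered fraction of this kind can be computed by merging the spans of the remapped reads along the contig. The sketch below is an assumption-laden illustration: the `remap_coverage` helper is hypothetical, and read spans are taken to be 0-based, half-open (start, end) intervals in contig coordinates.

```python
def remap_coverage(contig_length, read_spans):
    """Fraction of the contig covered by at least one read span."""
    if contig_length <= 0:
        return 0.0
    covered = 0
    current_end = 0  # right edge of the region already counted
    for start, end in sorted(read_spans):
        # Clip to the contig and skip any part that was already counted
        start = max(start, current_end, 0)
        end = min(end, contig_length)
        if end > start:
            covered += end - start
            current_end = end
    return covered / contig_length


coverage = remap_coverage(100, [(0, 40), (30, 60), (80, 100)])
```

Overlapping reads are counted once, so the three spans in the example cover 80 of the 100 contig positions.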

contig_build_score

type: int

Score representing the edge weights of all edges used in building the sequence

contig_strand_specific

type: bool

A flag to indicate if it was possible to resolve the strand for this contig

spanning_reads

type: int

The number of spanning reads which support the event

spanning_read_names

Read query names of the spanning reads which support the current event

call_method

type: mavis.constants.CALL_METHOD

The method used to call the breakpoints

flanking_pairs

type: int

Number of read-pairs where one read aligns to the first breakpoint window and the second read aligns to the other. The count here is based on the number of unique query names

flanking_pairs_compatible

type: int

Number of flanking pairs of a compatible orientation type. This applies to insertions and duplications. Flanking pairs supporting an insertion will be compatible with a duplication, and flanking pairs supporting a duplication will be compatible with an insertion (possibly indicating an internal translocation)

flanking_median_fragment_size

type: int

The median fragment size of the flanking reads being used as evidence

flanking_stdev_fragment_size

type: float

The standard deviation in fragment size of the flanking reads being used as evidence

break1_split_reads

type: int

Number of split reads that call the exact breakpoint given

break1_split_reads_forced

type: int

Number of split reads which were aligned to the opposite breakpoint window using a targeted alignment

break2_split_reads

type: int

Number of split reads that call the exact breakpoint given

break2_split_reads_forced

type: int

Number of split reads which were aligned to the opposite breakpoint window using a targeted alignment

linking_split_reads

type: int

Number of split reads that align to both breakpoints

untemplated_seq

type: str

The untemplated/novel sequence between the breakpoints

break1_homologous_seq

type: str

Sequence in common at the first breakpoint and other side of the second breakpoint

break2_homologous_seq

type: str

Sequence in common at the second breakpoint and other side of the first breakpoint

break1_ewindow

type: int-int

Window where evidence was gathered for the first breakpoint

break1_ewindow_count

type: int

Number of reads processed/looked-at in the first evidence window

break1_ewindow_practical_coverage

type: float

Calculated as break1_ewindow_count / len(break1_ewindow)

Not the actual coverage as bins are sampled within and there is a read limit cutoff
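The ratio of the read count to the window length can be reproduced from the ewindow and ewindow_count columns. In the sketch below the `practical_coverage` helper is hypothetical, and it assumes the window is written as <start>-<end> with non-negative, inclusive coordinates.

```python
def practical_coverage(ewindow, ewindow_count):
    """Reads looked at per base of the evidence window."""
    start, end = (int(part) for part in ewindow.split('-'))
    # Assumption: both ends of the window are inclusive
    length = end - start + 1
    return ewindow_count / length


value = practical_coverage('1000-1499', 250)
```

The same helper applies to the break2 columns.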

break2_ewindow

type: int-int

Window where evidence was gathered for the second breakpoint

break2_ewindow_count

type: int

Number of reads processed/looked-at in the second evidence window

break2_ewindow_practical_coverage

type: float

Calculated as break2_ewindow_count / len(break2_ewindow)

Not the actual coverage as bins are sampled within and there is a read limit cutoff

raw_flanking_pairs

type: int

Number of flanking reads before calling the breakpoint. The count here is based on the number of unique query names

raw_spanning_reads

type: int

Number of spanning reads collected during evidence collection before calling the breakpoint

raw_break1_split_reads

type: int

Number of split reads before calling the breakpoint

raw_break2_split_reads

type: int

Number of split reads before calling the breakpoint

cdna_synon

Semi-colon delimited list of transcript ids which have a cDNA sequence identical to that of the current fusion product

protein_synon

Semi-colon delimited list of transcript ids which produce a translation with an amino-acid sequence identical to that of the current fusion product

tracking_id

Column used to store input identifiers from the original SV calls. Used to track calls from the input files to the final outputs.

fusion_protein_hgvs

type: str

Describes the fusion protein in HGVS notation. Will be None if the change is not an indel or is synonymous

net_size

type: int-int

The net size of an event. For translocations and inversions this will always be 0. For indels it will be negative for deletions and positive for insertions. It is a range to accommodate non-specific events.
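Because deletions give negative sizes, an int-int value here cannot be split naively on the hyphen. The sketch below shows one way to handle that; the `parse_net_size` helper is hypothetical, and it assumes negative bounds are written with a leading minus sign, e.g. -120--100.

```python
import re

# Two optionally negative integers separated by a single hyphen
NET_SIZE_PATTERN = re.compile(r'^(-?\d+)-(-?\d+)$')


def parse_net_size(value):
    """Return the (lower, upper) bounds of a net_size range."""
    match = NET_SIZE_PATTERN.match(value.strip())
    if match is None:
        raise ValueError(f'not a valid int-int range: {value!r}')
    return int(match.group(1)), int(match.group(2))


deletion = parse_net_size('-120--100')
insertion = parse_net_size('5-10')
```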

supplementary_call

type: bool

Flag to indicate if the current event was a supplementary call, meaning a call that was found as a result of validating another event.

dgv

type: str

ID(s) of SVs from the DGV database matched to an SV call from the summary step

known_sv_count

type: int

Number of known SVs matched to a call in the summary step