mavis.annotate.protein

class mavis.annotate.protein.DomainRegion

inherits BioInterval

class mavis.annotate.protein.Domain

mavis.annotate.protein.Domain.translation()

mavis.annotate.Translation: the Translation this domain belongs to

@property
def translation(self):


mavis.annotate.protein.Domain.key()

Tuple: a tuple of the items expected to be unique, used for hashing and comparing

def key(self):

mavis.annotate.protein.Domain.score_region_mapping()

compares the sequence in each DomainRegion to the sequence collected for that domain region from the translation object

def score_region_mapping(self, reference_genome=None):

Args

reference_genome (Dict[str,Bio.SeqRecord]): dict of reference sequence

Returns

tuple of int and int:

- int: the number of matching amino acids
- int: the total number of amino acids
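
The comparison described above can be illustrated with a minimal sketch. It does not use the MAVIS classes: the function name `score_regions` and the plain lists of strings are assumptions made for illustration, standing in for the stored DomainRegion sequences and the sequences collected from the translation.

```python
def score_regions(stored_seqs, translated_seqs):
    """Hypothetical helper: count residues that agree position by position.

    stored_seqs: amino acid sequences recorded on each region
    translated_seqs: sequences for the same regions taken from the translation
    """
    matches = 0
    total = 0
    for stored, translated in zip(stored_seqs, translated_seqs):
        if len(stored) != len(translated):
            raise ValueError('region sequences must be the same length')
        total += len(stored)
        # count the positions where the two sequences carry the same residue
        matches += sum(1 for a, b in zip(stored, translated) if a == b)
    return matches, total


print(score_regions(['MKT', 'LLVA'], ['MKT', 'LIVA']))  # (6, 7)
```

A result where the two numbers are equal would mean every stored residue agrees with the translation.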

mavis.annotate.protein.Domain.get_seqs()

returns the amino acid sequences for each of the domain regions associated with this domain in the order of the regions (sorted by start)

def get_seqs(self, reference_genome=None, ignore_cache=False):

Args

reference_genome (Dict[str,Bio.SeqRecord]): dict of reference sequence
ignore_cache

Returns

List[str]: list of amino acid sequences for each DomainRegion

Raises

AttributeError: if there is not enough sequence information given to determine this
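
As a rough illustration of the behaviour described above, the sketch below slices a protein sequence by region coordinates. It is not the MAVIS implementation: `region_seqs`, the `(start, end)` tuples, and the 1-based inclusive coordinates are assumptions made for this example.

```python
def region_seqs(protein_seq, regions):
    """Hypothetical helper: one amino acid sequence per region, sorted by start.

    regions: list of (start, end) tuples, assumed 1-based and inclusive
    """
    if protein_seq is None:
        # mirrors the documented error when sequence information is missing
        raise AttributeError('no sequence information available')
    return [protein_seq[start - 1:end] for start, end in sorted(regions)]


print(region_seqs('MKTLLVAGHW', [(8, 10), (1, 3)]))  # ['MKT', 'GHW']
```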

mavis.annotate.protein.Domain.align_seq()

aligns each region to the input sequence, starting with the last one, then takes the subset of the sequence that remains to align the second-last region, and so on. Returns a list of intervals for the alignment. If multiple alignments are found, an error is raised

def align_seq(self, input_sequence, reference_genome=None, min_region_match=0.5):

Args

input_sequence (str): the sequence to be aligned to
reference_genome (Dict[str,Bio.SeqRecord]): dict of reference sequence
min_region_match (float): fraction between 0 and 1. Each region must have a score of at least len(seq) * min_region_match

Returns

Tuple[int,int,List[DomainRegion]]:

- the number of matches
- the total number of amino acids to be aligned
- the list of domain regions on the new input sequence

Raises

AttributeError: if sequence information is not available
UserWarning: if a valid alignment could not be found or no best alignment was found
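
The last-to-first strategy described above can be sketched as follows. This is a simplified stand-in, not the MAVIS algorithm: it assumes ungapped matching, takes plain strings instead of DomainRegion objects, and returns 1-based inclusive `(start, end)` tuples. The name `align_regions` is hypothetical.

```python
def align_regions(region_seqs, input_sequence, min_region_match=0.5):
    """Hypothetical sketch: place each region on input_sequence, last region first."""
    limit = len(input_sequence)  # only sequence before this index is still available
    total = sum(len(seq) for seq in region_seqs)
    matches = 0
    placed = []
    for seq in reversed(region_seqs):
        best_score, best_starts = 0, []
        for start in range(0, limit - len(seq) + 1):
            window = input_sequence[start:start + len(seq)]
            score = sum(1 for a, b in zip(seq, window) if a == b)
            if score > best_score:
                best_score, best_starts = score, [start]
            elif score == best_score and score > 0:
                best_starts.append(start)
        if not best_starts or best_score < len(seq) * min_region_match:
            raise UserWarning('no valid alignment found for region', seq)
        if len(best_starts) > 1:
            raise UserWarning('multiple best alignments found for region', seq)
        start = best_starts[0]
        placed.append((start + 1, start + len(seq)))
        matches += best_score
        limit = start  # earlier regions must align before this one
    return matches, total, placed[::-1]


print(align_regions(['MKT', 'GHW'], 'AAMKTCCGHWDD'))
# (6, 6, [(3, 5), (8, 10)])
```

Shrinking the search window after each placement keeps the regions in their original order on the new sequence.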

class mavis.annotate.protein.Translation

inherits BioInterval

mavis.annotate.protein.Translation.__init__()

describes the splicing pattern and cds start and end with reference to a particular transcript

def __init__(self, start, end, transcript=None, domains=None, seq=None, name=None):

Args

start (int): start of the coding sequence (cds) relative to the start of the first exon in the transcript
end (int): end of the coding sequence (cds) relative to the start of the first exon in the transcript
transcript (Transcript): the transcript this is a Translation of
domains (List[Domain]): a list of the domains on this translation
seq
name

mavis.annotate.protein.Translation.transcript()

mavis.annotate.genomic.Transcript: the spliced transcript this translation belongs to

@property
def transcript(self):


mavis.annotate.protein.Translation.convert_genomic_to_cds()

converts a genomic position to its cds (coding sequence) equivalent

def convert_genomic_to_cds(self, pos):

Args

pos (int): the genomic position

Returns

int: the cds position (negative if before the translation start site)

mavis.annotate.protein.Translation.convert_genomic_to_nearest_cds()

converts a genomic position to its cds equivalent or (if intronic) the nearest cds and shift

def convert_genomic_to_nearest_cds(self, pos):

Args

pos (int): the genomic position

Returns

tuple of int and int:

- int: the cds position
- int: the intronic shift
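
To make the position and shift pair concrete, here is a minimal sketch of the conversion. It is not the MAVIS code. It assumes a positive-strand transcript, exons given as sorted 1-based inclusive `(start, end)` genomic tuples, a `cds_start` counted from the first exonic base, and no cds position 0 (the HGVS convention). The function name is hypothetical.

```python
def genomic_to_nearest_cds(pos, exons, cds_start):
    """Hypothetical sketch: return (cds position, intronic shift) for pos."""
    consumed = 0  # exonic bases in the exons that lie fully before pos
    for i, (start, end) in enumerate(exons):
        if pos < start:
            if i == 0:
                raise IndexError('position is before the first exon')
            # intronic: pick the closer of the two flanking exon boundaries
            to_prev = pos - exons[i - 1][1]
            to_next = start - pos
            if to_prev <= to_next:
                cdna, shift = consumed, to_prev
            else:
                cdna, shift = consumed + 1, -to_next
            break
        if pos <= end:
            cdna, shift = consumed + (pos - start + 1), 0
            break
        consumed += end - start + 1
    else:
        raise IndexError('position is after the last exon')
    # positions upstream of the cds start are negative and skip zero
    cds = cdna - cds_start + 1 if cdna >= cds_start else cdna - cds_start
    return cds, shift


exons = [(1001, 1100), (1201, 1300)]
print(genomic_to_nearest_cds(1011, exons, cds_start=11))  # (1, 0)
print(genomic_to_nearest_cds(1110, exons, cds_start=11))  # (90, 10)
print(genomic_to_nearest_cds(1190, exons, cds_start=11))  # (91, -11)
```

A shift of 0 corresponds to an exonic position. A positive shift counts bases past the end of the previous exon and a negative shift counts bases before the start of the next one.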

mavis.annotate.protein.Translation.convert_genomic_to_cds_notation()

converts a genomic position to its cds (coding sequence) equivalent using HGVS (http://www.hgvs.org/mutnomen/recs-DNA.html) cds notation

def convert_genomic_to_cds_notation(self, pos):

Args

pos (int): the genomic position

Returns

str: the cds position notation

Examples

>>> tl = Translation(...)
# a position before the translation start
>>> tl.convert_genomic_to_cds_notation(1010)
'-50'
# a position after the translation end
>>> tl.convert_genomic_to_cds_notation(2031)
'*72'
# an intronic position
>>> tl.convert_genomic_to_cds_notation(1542)
'50+10'
>>> tl.convert_genomic_to_cds_notation(1589)
'51-14'
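
The notation in the examples above can be assembled from a cds position and an intronic shift. The sketch below shows one way to do that. The function `format_cds_notation` and the `cds_length` parameter are assumptions for illustration and are not part of the documented API.

```python
def format_cds_notation(cds, shift, cds_length):
    """Hypothetical sketch: build an HGVS-style cds position string.

    cds: cds position (negative if upstream of the translation start)
    shift: intronic offset from the nearest exonic base (0 if exonic)
    cds_length: length of the coding sequence, assumed known
    """
    if cds > cds_length:
        # positions past the end of the cds are counted from the end with '*'
        base = '*{}'.format(cds - cds_length)
    else:
        base = str(cds)
    if shift > 0:
        base += '+{}'.format(shift)
    elif shift < 0:
        base += str(shift)  # str() of a negative int already carries the '-'
    return base


print(format_cds_notation(-50, 0, 300))   # -50
print(format_cds_notation(372, 0, 300))   # *72
print(format_cds_notation(50, 10, 300))   # 50+10
print(format_cds_notation(51, -14, 300))  # 51-14
```

The cds length of 300 is made up so that the outputs match the strings shown in the examples.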

mavis.annotate.protein.Translation.get_seq()

wrapper for the sequence method

def get_seq(self, reference_genome=None, ignore_cache=False):

Args

reference_genome (Dict[str,Bio.SeqRecord]): dict of reference sequence
ignore_cache

mavis.annotate.protein.Translation.key()

see structural_variant.annotate.base.BioInterval.key

def key(self):

mavis.annotate.protein.calculate_orf()

calculate all possible open reading frames given a spliced cdna sequence (no introns)

def calculate_orf(spliced_cdna_sequence, min_orf_size=None):

Args

spliced_cdna_sequence (str): the sequence
min_orf_size

Returns

List[Interval]: list of open reading frame positions on the input sequence
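
For orientation, a simple open reading frame search might look like the sketch below. It is not the MAVIS implementation: it assumes ATG as the only start codon, reports 1-based inclusive `(start, end)` tuples that include the stop codon, and keeps every start codon that reaches an in-frame stop. The name `find_orfs` is hypothetical.

```python
STOP_CODONS = {'TAA', 'TAG', 'TGA'}


def find_orfs(spliced_cdna_sequence, min_orf_size=None):
    """Hypothetical sketch: list (start, end) for each ATG with an in-frame stop."""
    seq = spliced_cdna_sequence.upper()
    orfs = []
    for start in range(len(seq) - 2):
        if seq[start:start + 3] != 'ATG':
            continue
        # walk codon by codon in the frame set by this start codon
        for pos in range(start + 3, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                length = pos + 3 - start
                if min_orf_size is None or length >= min_orf_size:
                    orfs.append((start + 1, pos + 3))
                break
    return orfs


print(find_orfs('CCATGAAATAGGG'))                   # [(3, 11)]
print(find_orfs('CCATGAAATAGGG', min_orf_size=10))  # []
```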
