mavis.annotate.protein
class mavis.annotate.protein.DomainRegion
inherits BioInterval
class mavis.annotate.protein.Domain
mavis.annotate.protein.Domain.translation()
mavis.annotate.Translation: the Translation this domain belongs to
@property
def translation(self):
Args
- self
mavis.annotate.protein.Domain.key()
Tuple: the items expected to be unique for this domain, used for hashing and comparing
def key(self):
mavis.annotate.protein.Domain.score_region_mapping()
compares the sequence in each DomainRegion to the sequence collected for that domain region from the translation object
def score_region_mapping(self, reference_genome=None):
Args
- reference_genome (Dict[str,Bio.SeqRecord]): dict of reference sequence
Returns
tuple of int and int:
- int: the number of matching amino acids
- int: the total number of amino acids
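The comparison described above can be sketched as follows. This is a minimal illustration, not the MAVIS implementation: the function name `score_regions`, the `(start, end, seq)` tuples standing in for DomainRegion objects, and the 1-based inclusive protein coordinates are all assumptions.

```python
def score_regions(region_seqs, translation_seq):
    """Count amino acids that agree between stored region sequences
    and the corresponding slices of the translated sequence.

    region_seqs: list of (start, end, seq) tuples; hypothetical stand-ins
    for DomainRegion objects, using 1-based inclusive coordinates.
    """
    matches = 0
    total = 0
    for start, end, seq in region_seqs:
        # slice the translation at the region's coordinates
        observed = translation_seq[start - 1:end]
        total += len(seq)
        # compare position by position
        matches += sum(1 for a, b in zip(seq, observed) if a == b)
    return matches, total


regions = [(1, 4, "MKTA"), (6, 8, "LLV")]
result = score_regions(regions, "MKTAYLIV")
```

Here `result` pairs the number of matching amino acids with the total number compared, mirroring the documented return value.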
mavis.annotate.protein.Domain.get_seqs()
returns the amino acid sequences for each of the domain regions associated with this domain in the order of the regions (sorted by start)
def get_seqs(self, reference_genome=None, ignore_cache=False):
Args
- reference_genome (Dict[str,Bio.SeqRecord]): dict of reference sequence
- ignore_cache
Returns
List[str]: list of amino acid sequences for each DomainRegion
Raises
AttributeError: if there is not enough sequence information given to determine this
mavis.annotate.protein.Domain.align_seq()
aligns each region to the input sequence, starting with the last one. The subset of sequence that remains is then used to align the second-last region, and so on. Returns a list of intervals for the alignment. If multiple alignments are found, an error is raised
def align_seq(self, input_sequence, reference_genome=None, min_region_match=0.5):
Args
- input_sequence (str): the sequence to be aligned to
- reference_genome (Dict[str,Bio.SeqRecord]): dict of reference sequence
- min_region_match (float): fraction between 0 and 1. Each region must have a score of at least len(seq) * min_region_match
Returns
Tuple[int,int,List[DomainRegion]]:
- the number of matches
- the total number of amino acids to be aligned
- the list of domain regions on the new input sequence
Raises
AttributeError: if sequence information is not available
UserWarning: if a valid alignment could not be found or no best alignment was found
class mavis.annotate.protein.Translation
inherits BioInterval
mavis.annotate.protein.Translation.__init__()
describes the splicing pattern and cds start and end with reference to a particular transcript
def __init__(self, start, end, transcript=None, domains=None, seq=None, name=None):
Args
- start (int): start of the coding sequence (cds) relative to the start of the first exon in the transcript
- end (int): end of the coding sequence (cds) relative to the start of the first exon in the transcript
- transcript (Transcript): the transcript this is a Translation of
- domains (List[Domain]): a list of the domains on this translation
- seq
- name
mavis.annotate.protein.Translation.transcript()
mavis.annotate.genomic.Transcript: the spliced transcript this translation belongs to
@property
def transcript(self):
Args
- self
mavis.annotate.protein.Translation.convert_genomic_to_cds()
converts a genomic position to its cds (coding sequence) equivalent
def convert_genomic_to_cds(self, pos):
Args
- pos (int): the genomic position
Returns
int: the cds position (negative if before the initiation start site)
mavis.annotate.protein.Translation.convert_genomic_to_nearest_cds()
converts a genomic position to its cds equivalent or (if intronic) the nearest cds and shift
def convert_genomic_to_nearest_cds(self, pos):
Args
- pos (int): the genomic position
Returns
tuple of int and int:
- int: the cds position
- int: the intronic shift
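A rough sketch of how a genomic position might be mapped to the nearest cds position and an intronic shift. This is not the MAVIS code. It assumes a positive-strand transcript, 1-based inclusive exon coordinates, and HGVS-style numbering with no position 0. The function name, the `exons` list, the tie-breaking rule for the middle of an intron, and the example coordinates are all hypothetical.

```python
def genomic_to_nearest_cds(pos, exons, cds_start):
    """Return (cds position, intronic shift) for a genomic position.

    exons: sorted list of (start, end) genomic intervals, 1-based inclusive
           (assumed positive strand).
    cds_start: 1-based position of the first coding base in the spliced
               transcript.
    """
    preceding = 0  # exonic bases in the exons before the current one
    for index, (start, end) in enumerate(exons):
        if pos < start:
            if index == 0:
                raise ValueError("position is upstream of the first exon")
            # intronic: measure the distance to both flanking exons
            dist_prev = pos - exons[index - 1][1]
            dist_next = start - pos
            if dist_prev <= dist_next:  # assumption: ties go to the upstream exon
                cdna, shift = preceding, dist_prev
            else:
                cdna, shift = preceding + 1, -dist_next
            break
        if pos <= end:
            cdna, shift = preceding + (pos - start + 1), 0
            break
        preceding += end - start + 1
    else:
        raise ValueError("position is downstream of the last exon")
    cds = cdna - cds_start + 1
    if cds <= 0:
        cds -= 1  # assumption: skip 0 so the base before the start codon is -1
    return cds, shift


example_exons = [(1483, 1532), (1603, 1700)]
near_upstream = genomic_to_nearest_cds(1542, example_exons, cds_start=1)
near_downstream = genomic_to_nearest_cds(1589, example_exons, cds_start=1)
```

A positive shift counts bases past the end of the upstream exon and a negative shift counts bases before the start of the downstream exon.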
mavis.annotate.protein.Translation.convert_genomic_to_cds_notation()
converts a genomic position to its cds (coding sequence) equivalent using hgvs (http://www.hgvs.org/mutnomen/recs-DNA.html) cds notation
def convert_genomic_to_cds_notation(self, pos):
Args
- pos (int): the genomic position
Returns
str: the cds position notation
Examples
>>> tl = Translation(...)
# a position before the translation start
>>> tl.convert_genomic_to_cds_notation(1010)
'-50'
# a position after the translation end
>>> tl.convert_genomic_to_cds_notation(2031)
'*72'
# an intronic position
>>> tl.convert_genomic_to_cds_notation(1542)
'50+10'
>>> tl.convert_genomic_to_cds_notation(1589)
'51-14'
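The notation strings in the examples above could be built from a cds position and an intronic shift roughly as follows. This is a sketch under assumptions, not the MAVIS implementation: the helper name `format_cds_notation`, its `cds_length` parameter, and the value 300 used below are hypothetical.

```python
def format_cds_notation(cds_pos, shift, cds_length):
    """Format a cds position and intronic shift as an HGVS-style string.

    cds_length: length of the coding sequence (hypothetical parameter),
                used to express positions past its end as '*N'.
    """
    if cds_pos > cds_length:
        # positions after the end of the coding sequence are counted from its end
        base = "*{}".format(cds_pos - cds_length)
    else:
        # negative positions (before the start) already carry their sign
        base = str(cds_pos)
    if shift > 0:
        return "{}+{}".format(base, shift)
    if shift < 0:
        return "{}{}".format(base, shift)
    return base


before_start = format_cds_notation(-50, 0, 300)
after_end = format_cds_notation(372, 0, 300)
intron_after = format_cds_notation(50, 10, 300)
intron_before = format_cds_notation(51, -14, 300)
```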
mavis.annotate.protein.Translation.get_seq()
wrapper for the sequence method
def get_seq(self, reference_genome=None, ignore_cache=False):
Args
- reference_genome (Dict[str,Bio.SeqRecord]): dict of reference sequence
- ignore_cache
mavis.annotate.protein.Translation.key()
see BioInterval.key
def key(self):
mavis.annotate.protein.calculate_orf()
calculate all possible open reading frames given a spliced cdna sequence (no introns)
def calculate_orf(spliced_cdna_sequence, min_orf_size=None):
Args
- spliced_cdna_sequence (str): the sequence
- min_orf_size
Returns
List[Interval]: list of open reading frame positions on the input sequence
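One simple way to enumerate open reading frames on a spliced sequence is sketched below. This is not necessarily how calculate_orf works internally. It assumes an ORF runs from an ATG to the first in-frame stop codon, reports 1-based inclusive positions as plain tuples rather than Interval objects, and treats `min_orf_size` as a length in nucleotides. The function name `find_orfs` is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_orf_size=0):
    """Return (start, end) pairs, 1-based inclusive, for each ATG that
    reaches an in-frame stop codon; the stop codon is included in the ORF."""
    seq = seq.upper()
    orfs = []
    for start in range(len(seq) - 2):
        if seq[start:start + 3] != "ATG":
            continue
        # walk codon by codon in the frame set by this start codon
        for pos in range(start + 3, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                length = pos + 3 - start
                if length >= min_orf_size:
                    orfs.append((start + 1, pos + 3))
                break
    return orfs


orfs = find_orfs("CCATGAAATGGTAGCC")
```

In this input the second ATG has no in-frame stop before the sequence ends, so only the first start codon yields an ORF.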