mavis/annotate/protein

class DomainRegion

inherits BioInterval

class Domain

Domain.translation()

the Translation this domain belongs to

@property
def translation(self) -> Optional['Translation']:

Returns

  • Optional[Translation]: the Translation this domain belongs to

Domain.key()

Tuple: the items expected to be unique, used for hashing and comparing

def key(self):

Domain.score_region_mapping()

compares the sequence in each DomainRegion to the sequence collected for that domain region from the translation object

def score_region_mapping(
    self, reference_genome: Optional[ReferenceGenome] = None
) -> Tuple[int, int]:

Args

  • reference_genome (Optional[ReferenceGenome]): dict of reference sequence by template/chr name

Returns

  • Tuple[int, int]: the number of matching amino acids and the total number of amino acids
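The comparison described above can be sketched as a position-wise count over paired sequences. This is a minimal illustration, not the MAVIS implementation: `score_regions` is a hypothetical helper, and it assumes the stored and translation-derived sequences for each region are already available as equal-length strings.

```python
from typing import List, Tuple


def score_regions(expected: List[str], observed: List[str]) -> Tuple[int, int]:
    """Hypothetical helper: count position-wise amino acid matches.

    expected: the sequence stored on each domain region
    observed: the sequence collected for the same region from the translation
    """
    matches = 0
    total = 0
    for exp_seq, obs_seq in zip(expected, observed):
        if len(exp_seq) != len(obs_seq):
            raise ValueError('paired region sequences must be the same length')
        total += len(exp_seq)
        # count the positions where both sequences carry the same amino acid
        matches += sum(1 for a, b in zip(exp_seq, obs_seq) if a == b)
    return matches, total
```

A result such as `(5, 6)` would mean five of the six amino acids across all regions agree.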

Domain.get_seqs()

returns the amino acid sequences for each of the domain regions associated with this domain in the order of the regions (sorted by start)

def get_seqs(
    self, reference_genome: ReferenceGenome = None, ignore_cache: bool = False
) -> List[str]:

Args

  • reference_genome (ReferenceGenome): dict of reference sequence by template/chr name
  • ignore_cache (bool)

Returns

  • List[str]: list of amino acid sequences for each DomainRegion

Raises

  • AttributeError: if there is not enough sequence information given to determine this

Domain.align_seq()

aligns each region to the input sequence, starting with the last one, then takes the subset of sequence that remains to align the second-last region, and so on. Returns a list of intervals for the alignment. If multiple alignments are found, an error is raised

def align_seq(
    self,
    input_sequence: str,
    reference_genome: Optional[ReferenceGenome] = None,
    min_region_match: float = 0.5,
) -> Tuple[int, int, List[DomainRegion]]:

Args

  • input_sequence (str): the sequence to be aligned to
  • reference_genome (Optional[ReferenceGenome]): dict of reference sequence by template/chr name
  • min_region_match (float): fraction between 0 and 1. Each region must reach a score of at least len(seq) * min_region_match

Returns

  • Tuple[int, int, List[DomainRegion]]: the number of matches, the total number of amino acids to be aligned, and the list of domain regions on the new input sequence

Raises

  • AttributeError: if sequence information is not available
  • UserWarning: if a valid alignment could not be found or no best alignment was found
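The last-to-first strategy can be illustrated with a simplified, ungapped sketch. It is an assumption-laden approximation rather than the library's algorithm: `align_regions` is a hypothetical function, it places each region greedily at its single best-scoring window, and it returns plain 1-based inclusive `(start, end)` tuples instead of DomainRegion objects.

```python
from typing import List, Tuple


def align_regions(
    region_seqs: List[str], input_sequence: str, min_region_match: float = 0.5
) -> Tuple[int, int, List[Tuple[int, int]]]:
    """Hypothetical greedy sketch: place regions from last to first."""
    remaining_end = len(input_sequence)  # exclusive bound of sequence still available
    total = sum(len(seq) for seq in region_seqs)
    total_matches = 0
    placements = []
    for seq in reversed(region_seqs):
        min_score = len(seq) * min_region_match
        best_score = -1
        best_starts: List[int] = []
        # slide the region over every window left of the previously placed region
        for start in range(0, remaining_end - len(seq) + 1):
            window = input_sequence[start:start + len(seq)]
            score = sum(1 for a, b in zip(seq, window) if a == b)
            if score > best_score:
                best_score, best_starts = score, [start]
            elif score == best_score:
                best_starts.append(start)
        if not best_starts or best_score < min_score:
            raise UserWarning('a valid alignment could not be found')
        if len(best_starts) > 1:
            raise UserWarning('no best alignment: multiple equal placements')
        start = best_starts[0]
        placements.append((start + 1, start + len(seq)))  # 1-based inclusive
        total_matches += best_score
        remaining_end = start  # earlier regions must fit to the left
    placements.reverse()
    return total_matches, total, placements
```

Working from the last region backwards shrinks the search space at each step, which is what keeps the regions in their original order on the new sequence.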

class Translation

inherits BioInterval

Translation.__init__()

describes the splicing pattern and cds start and end with reference to a particular transcript

def __init__(
    self,
    start: int,
    end: int,
    transcript: Optional['Transcript'] = None,
    domains: Optional[List[Domain]] = None,
    seq: Optional[str] = None,
    name: Optional[str] = None,
):

Args

  • start (int): start of the coding sequence (cds) relative to the start of the first exon in the transcript
  • end (int): end of the coding sequence (cds) relative to the start of the first exon in the transcript
  • transcript (Optional[Transcript]): the transcript this is a Translation of
  • domains (Optional[List[Domain]]): a list of the domains on this translation
  • seq (Optional[str])
  • name (Optional[str])

Translation.transcript()

the spliced transcript this translation belongs to

@property
def transcript(self) -> 'Transcript':

Returns

  • Transcript: the spliced transcript this translation belongs to

Translation.convert_genomic_to_cds()

converts a genomic position to its cds (coding sequence) equivalent

def convert_genomic_to_cds(self, pos: int) -> int:

Args

  • pos (int): the genomic position

Returns

  • int: the cds position (negative if before the initiation start site)

Translation.convert_genomic_to_nearest_cds()

converts a genomic position to its cds equivalent or (if intronic) the nearest cds and shift

def convert_genomic_to_nearest_cds(self, pos: int) -> Tuple[int, int]:

Args

  • pos (int): the genomic position

Returns

  • Tuple[int, int]: the cds position and the intronic shift
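The idea of a nearest cds position plus an intronic shift can be sketched without the Translation class. Everything below is hypothetical: `genomic_to_nearest_cds` takes a positive-strand exon list with 1-based inclusive coordinates, and `cds_start` is assumed to be the 1-based position of the first coding base in the spliced transcript. It also assumes the HGVS convention that there is no cds position 0.

```python
from typing import List, Tuple


def genomic_to_nearest_cds(
    exons: List[Tuple[int, int]], cds_start: int, pos: int
) -> Tuple[int, int]:
    """Hypothetical sketch: map a genomic position to (cds position, intronic shift)."""

    def to_cds(cdna_pos: int) -> int:
        # positions before the start are negative and position 0 is skipped
        if cdna_pos < cds_start:
            return cdna_pos - cds_start
        return cdna_pos - cds_start + 1

    cdna_offset = 0  # exonic bases in the exons already passed
    for i, (start, end) in enumerate(exons):
        if start <= pos <= end:
            return to_cds(cdna_offset + pos - start + 1), 0
        if pos < start:
            if i == 0:
                raise IndexError('position is upstream of the transcript')
            prev_end = exons[i - 1][1]
            dist_prev = pos - prev_end  # distance to the exon on the left
            dist_next = start - pos  # distance to the exon on the right
            if dist_prev <= dist_next:
                return to_cds(cdna_offset), dist_prev
            return to_cds(cdna_offset + 1), -dist_next
        cdna_offset += end - start + 1
    raise IndexError('position is downstream of the transcript')
```

With made-up exons `[(1001, 1100), (1201, 1300)]` and `cds_start=51`, position 1110 maps to `(50, 10)` and position 1187 maps to `(51, -14)`.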

Translation.convert_genomic_to_cds_notation()

converts a genomic position to its cds (coding sequence) equivalent using HGVS cds notation (http://www.hgvs.org/mutnomen/recs-DNA.html)

def convert_genomic_to_cds_notation(self, pos: int) -> str:

Args

  • pos (int): the genomic position

Returns

  • str: the cds position notation

Examples

>>> tl = Translation(...)
# a position before the translation start
>>> tl.convert_genomic_to_cds_notation(1010)
'-50'
# a position after the translation end
>>> tl.convert_genomic_to_cds_notation(2031)
'*72'
# an intronic position
>>> tl.convert_genomic_to_cds_notation(1542)
'50+10'
>>> tl.convert_genomic_to_cds_notation(1589)
'51-14'
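The string forms in these examples follow a small set of rules, sketched here as a standalone formatter. `format_cds_notation` and its `cds_length` parameter are hypothetical, introduced only for illustration. The sketch assumes the caller already has the cds position and intronic shift, and that positions past the end of the coding sequence are written as an offset prefixed with `*`.

```python
def format_cds_notation(cds_pos: int, shift: int, cds_length: int) -> str:
    """Hypothetical formatter for a cds position and intronic shift."""
    if cds_pos > cds_length:
        # positions after the translation end count from the last coding base
        base = '*{}'.format(cds_pos - cds_length)
    else:
        # positions before the translation start are already negative
        base = str(cds_pos)
    if shift > 0:
        return '{}+{}'.format(base, shift)
    if shift < 0:
        return '{}{}'.format(base, shift)  # the sign supplies the '-'
    return base
```

For a 300-base coding sequence (an invented length), `format_cds_notation(372, 0, 300)` gives `'*72'` and `format_cds_notation(51, -14, 300)` gives `'51-14'`.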

Translation.get_seq()

wrapper for the sequence method

def get_seq(
    self, reference_genome: Optional[ReferenceGenome] = None, ignore_cache: bool = False
):

Args

  • reference_genome (Optional[ReferenceGenome]): dict of reference sequence by template/chr name
  • ignore_cache (bool)

Translation.key()

see structural_variant.annotate.base.BioInterval.key

def key(self):

calculate_orf()

calculate all possible open reading frames given a spliced cdna sequence (no introns)

def calculate_orf(
    spliced_cdna_sequence: str, min_orf_size: Optional[Union[float, int]] = None
) -> List[Interval]:

Args

  • spliced_cdna_sequence (str): the sequence
  • min_orf_size (Optional[Union[float, int]])

Returns

  • List[Interval]: list of open reading frame positions on the input sequence
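A reduced version of open reading frame detection is sketched below. `find_orfs` is a hypothetical stand-in, not the library routine: it scans the three forward frames, treats `ATG` as the only start codon, keeps just the first start before each in-frame stop, and returns 1-based inclusive tuples rather than Interval objects. It also assumes `min_orf_size` is a length in bases.

```python
from typing import List, Optional, Tuple

STOP_CODONS = {'TAA', 'TAG', 'TGA'}


def find_orfs(
    spliced_cdna_sequence: str, min_orf_size: Optional[int] = None
) -> List[Tuple[int, int]]:
    """Hypothetical sketch: start-to-stop intervals in the three forward frames."""
    seq = spliced_cdna_sequence.upper()
    orfs = []
    for frame in range(3):
        start = None  # index of the first ATG since the last stop in this frame
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == 'ATG':
                start = i
            elif start is not None and codon in STOP_CODONS:
                end = i + 3  # exclusive index, so the stop codon is included
                if min_orf_size is None or end - start >= min_orf_size:
                    orfs.append((start + 1, end))  # 1-based inclusive
                start = None
    return sorted(orfs)
```

Reporting every in-frame start codon before a stop, rather than only the first, would enumerate more candidates at the cost of nested, overlapping intervals.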