mavis/annotate/protein
class DomainRegion
inherits BioInterval
class Domain
Domain.translation()
the Translation this domain belongs to
@property
def translation(self) -> Optional['Translation']:
Args
- self
Returns
- Optional[Translation]
Domain.key()
Tuple: the items expected to be unique, used for hashing and comparing
def key(self):
Domain.score_region_mapping()
compares the sequence in each DomainRegion to the sequence collected for that domain region from the translation object
def score_region_mapping(
self, reference_genome: Optional[ReferenceGenome] = None
) -> Tuple[int, int]:
Args
- reference_genome (Optional[ReferenceGenome]): dict of reference sequence by template/chr name
Returns
- Tuple[int, int]: the number of matching amino acids and the total number of amino acids
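The comparison described above can be illustrated with a minimal standalone sketch. It does not use the MAVIS classes: `score_regions` and its two list arguments are hypothetical names, and it assumes the stored and translation-derived sequences for each region have equal length and are compared case-insensitively, position by position.

```python
from typing import List, Tuple


def score_regions(stored: List[str], observed: List[str]) -> Tuple[int, int]:
    """Count position-wise amino acid matches across paired region sequences.

    stored: the sequence recorded for each region (hypothetical input)
    observed: the sequence collected for the same region from the translation
    """
    if len(stored) != len(observed):
        raise ValueError('expected one observed sequence per stored region')
    matches = 0
    total = 0
    for expected, actual in zip(stored, observed):
        if len(expected) != len(actual):
            raise ValueError('paired sequences must have the same length')
        total += len(expected)
        # compare residue by residue, ignoring case
        matches += sum(1 for a, b in zip(expected.upper(), actual.upper()) if a == b)
    return matches, total


result = score_regions(['MKT', 'LLVA'], ['MKT', 'LIVA'])
print(result)  # (6, 7)
```

A result where the two numbers are equal would indicate that every region matched the translation exactly.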
Domain.get_seqs()
returns the amino acid sequences for each of the domain regions associated with this domain in the order of the regions (sorted by start)
def get_seqs(
self, reference_genome: ReferenceGenome = None, ignore_cache: bool = False
) -> List[str]:
Args
- reference_genome (ReferenceGenome): dict of reference sequence by template/chr name
- ignore_cache (bool)
Returns
- List[str]: list of amino acid sequences for each DomainRegion
Raises
- AttributeError: if there is not enough sequence information given to determine this
Domain.align_seq()
aligns each region to the input sequence, starting with the last one. The subset of sequence that remains is then used to align the second-last region, and so on. Returns a list of intervals for the alignment. If multiple alignments are found, an error is raised
def align_seq(
self,
input_sequence: str,
reference_genome: Optional[ReferenceGenome] = None,
min_region_match: float = 0.5,
) -> Tuple[int, int, List[DomainRegion]]:
Args
- input_sequence (str): the sequence to be aligned to
- reference_genome (Optional[ReferenceGenome]): dict of reference sequence by template/chr name
- min_region_match (float): fraction between 0 and 1. Each region must score at least len(seq) * min_region_match
Returns
- Tuple[int, int, List[DomainRegion]]: the number of matches, the total number of amino acids to be aligned, and the list of domain regions on the new input sequence
Raises
- AttributeError: if sequence information is not available
- UserWarning: if a valid alignment could not be found or no best alignment was found
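The last-to-first strategy described for align_seq can be sketched without the MAVIS classes. This is an illustration under stated assumptions, not the library's implementation: `align_regions` is a hypothetical name, regions are plain strings, scoring is an ungapped position-wise match count, and the returned intervals are 1-based inclusive tuples rather than DomainRegion objects.

```python
from typing import List, Tuple


def align_regions(
    input_sequence: str, region_seqs: List[str], min_region_match: float = 0.5
) -> Tuple[int, int, List[Tuple[int, int]]]:
    """Place each region on input_sequence, working from the last region backwards."""
    remaining = len(input_sequence)  # only input_sequence[:remaining] is still available
    matches = 0
    total = 0
    placed = []
    for seq in reversed(region_seqs):
        best_score = 0
        best_starts = []
        # slide the region over the part of the input not yet claimed
        for start in range(0, remaining - len(seq) + 1):
            window = input_sequence[start:start + len(seq)]
            score = sum(1 for a, b in zip(window.upper(), seq.upper()) if a == b)
            if score > best_score:
                best_score, best_starts = score, [start]
            elif score == best_score and score > 0:
                best_starts.append(start)
        if not best_starts or best_score < len(seq) * min_region_match:
            raise UserWarning('no valid alignment for region', seq)
        if len(best_starts) > 1:
            raise UserWarning('multiple best alignments for region', seq)
        start = best_starts[0]
        placed.append((start + 1, start + len(seq)))  # 1-based inclusive interval
        matches += best_score
        total += len(seq)
        remaining = start  # earlier regions must end before this one starts
    placed.reverse()  # report intervals in the original region order
    return matches, total, placed


print(align_regions('AAMKTGGGLLVAPP', ['MKT', 'LLVA']))  # (7, 7, [(3, 5), (9, 12)])
```

Shrinking the searchable window after each placement is what keeps the regions in their original order on the new sequence.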
class Translation
inherits BioInterval
Translation.__init__()
describes the splicing pattern and cds start and end with reference to a particular transcript
def __init__(
self,
start: int,
end: int,
transcript: Optional['Transcript'] = None,
domains: Optional[List[Domain]] = None,
seq: Optional[str] = None,
name: Optional[str] = None,
):
Args
- start (int): start of the coding sequence (cds) relative to the start of the first exon in the transcript
- end (int): end of the coding sequence (cds) relative to the start of the first exon in the transcript
- transcript (Optional[Transcript]): the transcript this is a Translation of
- domains (Optional[List[Domain]]): a list of the domains on this translation
- seq (Optional[str])
- name (Optional[str])
Translation.transcript()
the spliced transcript this translation belongs to
@property
def transcript(self) -> 'Transcript':
Args
- self
Returns
- Transcript
Translation.convert_genomic_to_cds()
converts a genomic position to its cds (coding sequence) equivalent
def convert_genomic_to_cds(self, pos: int) -> int:
Args
- pos (int): the genomic position
Returns
- int: the cds position (negative if before the initiation start site)
Translation.convert_genomic_to_nearest_cds()
converts a genomic position to its cds equivalent or (if intronic) the nearest cds and shift
def convert_genomic_to_nearest_cds(self, pos: int) -> Tuple[int, int]:
Args
- pos (int): the genomic position
Returns
- Tuple[int, int]: the cds position and the intronic shift
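One way to picture the mapping is the following standalone sketch. Everything in it is an assumption made for illustration: `genomic_to_nearest_cds`, `exons`, and `cds_start` are hypothetical names, the transcript is on the forward strand, exons are sorted 1-based inclusive genomic intervals, ties between the two flanking exons go to the upstream one, and cds numbering skips zero (the base before position 1 is -1).

```python
from typing import List, Tuple


def genomic_to_nearest_cds(
    pos: int, exons: List[Tuple[int, int]], cds_start: int
) -> Tuple[int, int]:
    """Map a genomic position to (cds position, intronic shift).

    exons: sorted (start, end) genomic intervals, 1-based inclusive, forward strand
    cds_start: 1-based position of the first coding base on the spliced transcript
    """
    if pos < exons[0][0] or pos > exons[-1][1]:
        raise IndexError('position is outside the transcript')
    spliced = 0  # number of exonic bases in the exons already passed
    prev_end = None
    for start, end in exons:
        if pos < start:
            # intronic: pick the closer of the two flanking exon boundaries
            to_prev = pos - prev_end
            to_next = start - pos
            if to_prev <= to_next:
                cdna, shift = spliced, to_prev
            else:
                cdna, shift = spliced + 1, -to_next
            break
        if pos <= end:
            cdna, shift = spliced + (pos - start + 1), 0
            break
        spliced += end - start + 1
        prev_end = end
    # positions upstream of the cds are negative; there is no position 0
    cds = cdna - cds_start + 1 if cdna >= cds_start else cdna - cds_start
    return cds, shift


exons = [(101, 200), (301, 400)]
print(genomic_to_nearest_cds(151, exons, cds_start=51))  # (1, 0)
print(genomic_to_nearest_cds(210, exons, cds_start=51))  # (50, 10)
print(genomic_to_nearest_cds(287, exons, cds_start=51))  # (51, -14)
```

A shift of zero means the position is exonic; a positive shift counts bases past the end of the upstream exon, and a negative shift counts bases before the start of the downstream exon.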
Translation.convert_genomic_to_cds_notation()
converts a genomic position to its cds (coding sequence) equivalent using HGVS (http://www.hgvs.org/mutnomen/recs-DNA.html) cds notation
def convert_genomic_to_cds_notation(self, pos: int) -> str:
Args
- pos (int): the genomic position
Returns
- str: the cds position notation
Examples
>>> tl = Translation(...)
# a position before the translation start
>>> tl.convert_genomic_to_cds_notation(1010)
'-50'
# a position after the translation end
>>> tl.convert_genomic_to_cds_notation(2031)
'*72'
# an intronic position
>>> tl.convert_genomic_to_cds_notation(1542)
'50+10'
>>> tl.convert_genomic_to_cds_notation(1589)
'51-14'
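The notation patterns in the examples above can be reproduced with a small formatting helper. This is a sketch, not the MAVIS method: `cds_notation` is a hypothetical function that takes an already-computed cds position, an intronic shift, and the cds length (300 is an arbitrary value chosen for the demonstration).

```python
def cds_notation(cds: int, shift: int, cds_length: int) -> str:
    """Format a (cds position, intronic shift) pair in HGVS-style cds notation."""
    if shift > 0:
        suffix = '+{}'.format(shift)  # bases past the end of an exon
    elif shift < 0:
        suffix = str(shift)  # str() of a negative int already carries the '-'
    else:
        suffix = ''
    if cds > cds_length:
        # positions after the translation end are counted from the end, prefixed with '*'
        return '*{}{}'.format(cds - cds_length, suffix)
    return '{}{}'.format(cds, suffix)


print(cds_notation(-50, 0, 300))   # -50
print(cds_notation(372, 0, 300))   # *72
print(cds_notation(50, 10, 300))   # 50+10
print(cds_notation(51, -14, 300))  # 51-14
```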
Translation.get_seq()
wrapper for the sequence method
def get_seq(
self, reference_genome: Optional[ReferenceGenome] = None, ignore_cache: bool = False
):
Args
- reference_genome (Optional[ReferenceGenome]): dict of reference sequence by template/chr name
- ignore_cache (bool)
Translation.key()
see structural_variant.annotate.base.BioInterval.key
def key(self):
calculate_orf()
calculate all possible open reading frames given a spliced cdna sequence (no introns)
def calculate_orf(
spliced_cdna_sequence: str, min_orf_size: Optional[Union[float, int]] = None
) -> List[Interval]:
Args
- spliced_cdna_sequence (str): the sequence
- min_orf_size (Optional[Union[float, int]])
Returns
- List[Interval]: list of open reading frame positions on the input sequence
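As an illustration of what an open reading frame search over a spliced sequence involves, here is a minimal sketch. It is not the MAVIS implementation and makes several assumptions: `find_orfs` is a hypothetical name, each ORF runs from an ATG to the first in-frame stop codon (stop included), only the given strand is scanned, min_orf_size is measured in nucleotides, and intervals are 1-based inclusive tuples rather than Interval objects.

```python
from typing import List, Optional, Tuple

STOP_CODONS = {'TAA', 'TAG', 'TGA'}


def find_orfs(seq: str, min_orf_size: Optional[int] = None) -> List[Tuple[int, int]]:
    """Return (start, end) for every ATG that reaches an in-frame stop codon."""
    seq = seq.upper()
    orfs = []
    for start in range(0, len(seq) - 2):
        if seq[start:start + 3] != 'ATG':
            continue
        # walk codon by codon in the frame set by this start codon
        for pos in range(start + 3, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                end = pos + 3  # exclusive 0-based end, equal to the 1-based inclusive end
                if min_orf_size is None or end - start >= min_orf_size:
                    orfs.append((start + 1, end))
                break
    return orfs


print(find_orfs('CCATGAAATAGCC'))                   # [(3, 11)]
print(find_orfs('CCATGAAATAGCC', min_orf_size=10))  # []
```

Start codons that never reach an in-frame stop are skipped in this sketch; whether such open-ended frames should be reported is a design choice the description above does not settle.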