mavis.assemble
class mavis.assemble.Contig
mavis.assemble.Contig.remap_depth()
the average depth of remapped reads over a given range of the contig sequence
def remap_depth(self, query_range=None):
Args
- query_range (Interval): 1-based inclusive range
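The idea of an average depth over a 1-based inclusive range can be sketched as below. This is an illustration only, not the library's implementation: the function name `average_depth` and the representation of reads as `(start, end)` tuples are assumptions made for the example.

```python
def average_depth(read_ranges, query_range):
    """Average per-base depth of reads over a 1-based inclusive query range.

    read_ranges: list of (start, end) tuples, 1-based inclusive (hypothetical input)
    query_range: (start, end) tuple, 1-based inclusive
    """
    q_start, q_end = query_range
    if q_start > q_end:
        raise ValueError("query range start must not exceed its end")
    covered_bases = 0
    for r_start, r_end in read_ranges:
        # number of bases shared by the read and the query range
        overlap = min(r_end, q_end) - max(r_start, q_start) + 1
        if overlap > 0:
            covered_bases += overlap
    # total covered bases divided by the length of the query range
    return covered_bases / (q_end - q_start + 1)


reads = [(1, 10), (5, 14), (12, 20)]
depth = average_depth(reads, (5, 12))
```

With these three reads, the query range 5 to 12 spans 8 bases and the reads contribute 6, 8, and 1 overlapping bases respectively.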
class mavis.assemble.DeBruijnGraph
inherits nx.DiGraph
wrapper for a basic digraph that enforces edge weights
mavis.assemble.DeBruijnGraph.get_edge_freq()
returns the freq from the data attribute for a specified edge
def get_edge_freq(self, n1, n2):
Args
- n1
- n2
mavis.assemble.DeBruijnGraph.add_edge()
adds a given edge to the graph; if the edge already exists, adds the frequency to the existing frequency count
def add_edge(self, n1, n2, freq=1):
Args
- n1
- n2
- freq
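The accumulate-on-insert behaviour described for add_edge and get_edge_freq can be sketched with a plain dictionary. The class below is a hypothetical stand-in (`FreqDiGraph` is not part of the library and does not inherit from nx.DiGraph); it only illustrates how repeated edges add to a frequency count.

```python
class FreqDiGraph:
    """Toy directed graph that stores a frequency on every edge (illustrative only)."""

    def __init__(self):
        # node -> {successor node: accumulated frequency}
        self.succ = {}

    def add_edge(self, n1, n2, freq=1):
        # make sure both endpoints are registered as nodes
        self.succ.setdefault(n2, {})
        edges = self.succ.setdefault(n1, {})
        # a repeated edge adds to the existing count instead of replacing it
        edges[n2] = edges.get(n2, 0) + freq

    def get_edge_freq(self, n1, n2):
        # raises KeyError if the edge does not exist
        return self.succ[n1][n2]


g = FreqDiGraph()
g.add_edge('AC', 'CG')
g.add_edge('AC', 'CG', freq=2)
g.add_edge('CG', 'GT')
```

After these calls the edge from 'AC' to 'CG' carries a frequency of 3, while the edge from 'CG' to 'GT' carries the default of 1.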
mavis.assemble.DeBruijnGraph.trim_tails_by_freq()
trims any tail paths in which all edges are lower than the minimum weight
def trim_tails_by_freq(self, min_weight):
Args
- min_weight (int): the minimum weight for an edge to be retained
mavis.assemble.DeBruijnGraph.trim_forks_by_freq()
for all nodes in the graph, if the node has an out-degree > 1 and one of the outgoing edges has freq < min_weight, then that outgoing edge is deleted
def trim_forks_by_freq(self, min_weight):
Args
- min_weight
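A minimal sketch of the fork-trimming rule, read literally from the description above. It works on a nested dictionary rather than on a DeBruijnGraph, and it removes every low-frequency edge at a fork; whether the library always keeps at least one outgoing edge is not stated here, so treat that detail as an assumption.

```python
def trim_forks_by_freq(succ, min_weight):
    """Remove low-frequency outgoing edges at forks.

    succ: dict mapping node -> {successor: freq} (hypothetical graph representation)
    """
    for node, edges in succ.items():
        # only nodes with more than one outgoing edge count as forks
        if len(edges) > 1:
            # collect targets first so the dict is not modified while iterating
            low = [target for target, freq in edges.items() if freq < min_weight]
            for target in low:
                del edges[target]


graph = {
    'A': {'B': 5, 'C': 1},  # fork: the edge to C is below the threshold
    'B': {'D': 1},          # not a fork, so the low edge is kept
    'C': {},
    'D': {},
}
trim_forks_by_freq(graph, min_weight=2)
```

Only the edge from 'A' to 'C' is removed; the low-weight edge from 'B' to 'D' survives because 'B' has a single outgoing edge.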
mavis.assemble.DeBruijnGraph.trim_noncutting_paths_by_freq()
trim any low weight edges where another path exists between the source and target of higher weight
def trim_noncutting_paths_by_freq(self, min_weight):
Args
- min_weight
mavis.assemble.DeBruijnGraph.get_sinks()
returns all nodes with an outgoing degree of zero
def get_sinks(self, subgraph=None):
Args
- subgraph
mavis.assemble.DeBruijnGraph.get_sources()
returns all nodes with an incoming degree of zero
def get_sources(self, subgraph=None):
Args
- subgraph
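Sinks and sources follow directly from the degree definitions above. The sketch below uses a node list and an edge list instead of a DeBruijnGraph, and the helper name `sources_and_sinks` is invented for the example.

```python
def sources_and_sinks(nodes, edges):
    """Return (sources, sinks) for a directed graph.

    nodes: iterable of node names
    edges: iterable of (source, target) pairs
    """
    edges = list(edges)
    has_incoming = {dst for _, dst in edges}
    has_outgoing = {src for src, _ in edges}
    # a source has an incoming degree of zero
    sources = {n for n in nodes if n not in has_incoming}
    # a sink has an outgoing degree of zero
    sinks = {n for n in nodes if n not in has_outgoing}
    return sources, sinks


nodes = ['A', 'B', 'C', 'D']
edges = [('A', 'B'), ('B', 'C'), ('A', 'D')]
sources, sinks = sources_and_sinks(nodes, edges)
```

Here 'A' is the only source, and 'C' and 'D' are the sinks.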
mavis.assemble.digraph_connected_components()
the networkx module does not support deriving connected components from digraphs (only simple graphs); this function assumes that connection != reachable, which means there is no difference between connected components in a simple graph and a digraph
def digraph_connected_components(graph, subgraph=None):
Args
- graph (networkx.DiGraph): the input graph to gather components from
- subgraph
Returns
List[List]: a list of components, each of which is a list of node names
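Treating connection as independent of edge direction amounts to finding the components of the underlying undirected graph. The following is a sketch under that reading, not the library's code; it takes a node list and an edge list, and the function name is chosen for the example.

```python
def weakly_connected_components(nodes, edges):
    """Group nodes into components, ignoring edge direction.

    nodes: list of node names (every edge endpoint must appear here)
    edges: iterable of (source, target) pairs
    """
    # build an undirected neighbour map from the directed edges
    neighbours = {n: set() for n in nodes}
    for src, dst in edges:
        neighbours[src].add(dst)
        neighbours[dst].add(src)

    seen = set()
    components = []
    for start in nodes:
        if start in seen:
            continue
        # depth-first traversal from each unvisited node
        seen.add(start)
        stack = [start]
        component = []
        while stack:
            node = stack.pop()
            component.append(node)
            for nxt in neighbours[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        components.append(sorted(component))
    return components


components = weakly_connected_components(
    ['A', 'B', 'C', 'D', 'E'],
    [('A', 'B'), ('C', 'B'), ('D', 'E')],
)
```

Although 'C' is not reachable from 'A' along directed edges, both end up in the same component because they share the neighbour 'B'.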
mavis.assemble.pull_contigs_from_component()
builds contigs from a connected component of the assembly DeBruijn graph
def pull_contigs_from_component(
assembly, component, min_edge_trim_weight, assembly_max_paths, log=DEVNULL
):
Args
- assembly (DeBruijnGraph): the assembly graph
- component (list): list of nodes which make up the connected component
- min_edge_trim_weight (int): the minimum weight to not remove a non cutting edge/path
- assembly_max_paths (int): the maximum number of paths allowed before the graph is further simplified
- log (Callable): the log function
Returns
Dict[str,int]: the paths/contigs and their scores
mavis.assemble.filter_contigs()
given a list of contigs, removes similar contigs so that only the highest-scoring contig of each similar group remains
def filter_contigs(contigs, assembly_min_uniq=0.01):
Args
- contigs
- assembly_min_uniq
mavis.assemble.assemble()
for a set of sequences, creates a DeBruijnGraph, simplifies trailing and leading paths where edges fall below a weight threshold, and then returns all possible unitigs/contigs
drops any sequences too small to fit the kmer size
def assemble(
sequences,
kmer_size,
min_edge_trim_weight=3,
assembly_max_paths=20,
assembly_min_uniq=0.01,
min_complexity=0,
log=lambda *pos, **kwargs: None,
**kwargs
):
Args
- sequences (List[str]): a list of strings/sequences to assemble
- kmer_size: the size of the kmer to use; see assembly_kmer_size
- min_edge_trim_weight: see assembly_min_edge_trim_weight
- assembly_max_paths: see assembly_max_paths
- assembly_min_uniq
- min_complexity
- log (Callable): the log function
Returns
List[Contig]: a list of putative contigs
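The first step described above, turning sequences into a weighted De Bruijn graph, can be sketched as follows. This is an assumption-laden illustration: it uses the common construction in which each kmer becomes an edge from its prefix to its suffix, and the helper `debruijn_edge_counts` is hypothetical rather than part of the library.

```python
from collections import Counter


def debruijn_edge_counts(sequences, kmer_size):
    """Count De Bruijn edges (kmer prefix -> kmer suffix) across sequences."""
    counts = Counter()
    for seq in sequences:
        if len(seq) < kmer_size:
            # too short to yield even one kmer, so the sequence is dropped
            continue
        for i in range(len(seq) - kmer_size + 1):
            kmer = seq[i:i + kmer_size]
            # each kmer links its (k-1)-prefix to its (k-1)-suffix
            counts[(kmer[:-1], kmer[1:])] += 1
    return counts


edge_counts = debruijn_edge_counts(['ACGT', 'CGTA', 'AC'], kmer_size=3)
```

The sequence 'AC' is shorter than the kmer size and contributes nothing, while the edge from 'CG' to 'GT' is seen in both remaining sequences and so has a count of 2.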
mavis.assemble.kmers()
for a sequence, compute and return a list of all kmers of a specified size
def kmers(s, size):
Args
- s (str): the input sequence
- size (int): the size of the kmers
Returns
List[str]: the list of kmers
Examples
>>> kmers('abcdef', 2)
['ab', 'bc', 'cd', 'de', 'ef']
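One way such a function could be written is with a sliding window over the sequence. This is a sketch consistent with the documented example, not necessarily the library's implementation.

```python
def kmers(s, size):
    """Return every substring of length `size`, in order of position."""
    # range() is empty when the sequence is shorter than the kmer size
    return [s[i:i + size] for i in range(len(s) - size + 1)]


result = kmers('abcdef', 2)
```

A sequence of length n yields n - size + 1 kmers, and a sequence shorter than size yields an empty list.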