assemble module
-
class mavis.assemble.Contig(sequence, score)
Bases: object
-
class mavis.assemble.DeBruijnGraph(data=None, **attr)
Bases: networkx.classes.digraph.DiGraph
Wrapper for a basic digraph that enforces edge weights.
Initialize a graph with edges, name, and graph attributes.
Parameters:
- data (input graph) – Data to initialize graph. If data=None (default) an empty graph is created. The data can be an edge list, or any NetworkX graph object. If the corresponding optional Python packages are installed the data can also be a NumPy matrix or 2d ndarray, a SciPy sparse matrix, or a PyGraphviz graph.
- name (string, optional (default='')) – An optional name for the graph.
- attr (keyword arguments, optional (default= no attributes)) – Attributes to add to graph as key=value pairs.
See also
convert
Examples
>>> import networkx as nx
>>> G = nx.Graph()  # or DiGraph, MultiGraph, MultiDiGraph, etc
>>> G = nx.Graph(name='my graph')
>>> e = [(1, 2), (2, 3), (3, 4)]  # list of edges
>>> G = nx.Graph(e)
Arbitrary graph attribute pairs (key=value) may be assigned:
>>> G = nx.Graph(e, day="Friday")
>>> G.graph
{'day': 'Friday'}
-
add_edge(n1, n2, freq=1)
Add a given edge to the graph; if the edge already exists, add freq to its existing frequency count.
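The frequency-accumulating behaviour of add_edge can be illustrated with a minimal sketch. It does not subclass networkx: FreqDiGraph is a hypothetical dict-of-dicts stand-in for DeBruijnGraph, and only the add_edge(n1, n2, freq=1) signature is taken from the docs above.

```python
class FreqDiGraph:
    """Hypothetical stand-in for DeBruijnGraph (plain dicts, not networkx)."""

    def __init__(self):
        # adjacency: source node -> {target node: accumulated frequency}
        self.adj = {}

    def add_edge(self, n1, n2, freq=1):
        """Add the edge n1 -> n2; if it already exists, add freq to its count."""
        self.adj.setdefault(n1, {})
        self.adj.setdefault(n2, {})  # register n2 as a node even with no out-edges
        self.adj[n1][n2] = self.adj[n1].get(n2, 0) + freq

    def freq(self, n1, n2):
        """Return the accumulated frequency of the edge n1 -> n2."""
        return self.adj[n1][n2]


g = FreqDiGraph()
g.add_edge("AC", "CG")
g.add_edge("AC", "CG")
g.add_edge("AC", "CG", freq=3)
print(g.freq("AC", "CG"))  # 5
```

Accumulating counts on the edge, rather than storing parallel edges, keeps the graph a simple digraph while still recording how often each transition was observed.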
-
trim_forks_by_freq(min_weight)
For all nodes in the graph, if the node has an out-degree > 1 and one of its outgoing edges has freq < min_weight, then that outgoing edge is deleted.
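The trimming rule for trim_forks_by_freq can be sketched on a plain {node: {successor: freq}} dict. This is a hypothetical re-implementation of the rule as stated, not the MAVIS source; the dict representation is an assumption made to keep the example dependency-free.

```python
def trim_forks_by_freq(adj, min_weight):
    """Sketch: at every fork (out-degree > 1), delete edges with freq < min_weight.

    adj is a hypothetical {node: {successor: freq}} dict, mutated in place.
    """
    for node, succs in adj.items():
        if len(succs) > 1:
            # collect first so the inner dict is not modified while iterating it
            weak = [target for target, freq in succs.items() if freq < min_weight]
            for target in weak:
                del succs[target]


graph = {
    "AC": {"CG": 10, "CT": 1},  # fork: the freq-1 branch is likely noise
    "CG": {"GT": 9},
    "CT": {"TA": 1},            # out-degree 1, so this rule never trims it
    "GT": {},
    "TA": {},
}
trim_forks_by_freq(graph, min_weight=3)
print(graph["AC"])  # {'CG': 10}
```

Note that only fork edges are removed: the "CT" -> "TA" branch survives as a disconnected fragment, so removing such leftovers would need a separate step not shown here.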
-
mavis.assemble.assemble(sequences, kmer_size, min_edge_trim_weight=3, assembly_max_paths=20, assembly_min_uniq=0.01, min_complexity=0, log=<function <lambda>>, **kwargs)
For a set of sequences, creates a DeBruijnGraph, simplifies trailing and leading paths where edges fall below a weight threshold, and then returns all possible unitigs/contigs.
Drops any sequences too short to fit the kmer size.
Parameters:
- sequences (list of str) – a list of strings/sequences to assemble
- kmer_size – the size of the kmer to use; see assembly_kmer_size
- min_edge_trim_weight – see assembly_min_edge_trim_weight
- remap_min_match – Minimum match percentage of the remapped read (based on the exact matches in the cigar)
- remap_min_overlap – Minimum amount of overlap between the contig and the remapped read; defaults to the kmer size
- min_contig_length – Minimum length of assembled contigs to attempt remapping reads to; shorter contigs are ignored
- remap_min_exact_match – see assembly_min_exact_match_to_remap
- assembly_max_paths – see assembly_max_paths
- log (function) – the log function
Returns: a list of putative contigs
Return type: list
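The graph-construction step that assemble starts from can be sketched as below. It assumes the usual de Bruijn construction, in which each kmer becomes an edge from its (k-1)-length prefix to its (k-1)-length suffix; the docs above do not define the node type, so treat that as an assumption. build_debruijn_edges is hypothetical and returns plain edge counts rather than a DeBruijnGraph.

```python
from collections import Counter


def build_debruijn_edges(sequences, kmer_size):
    """Hypothetical sketch: count de Bruijn edges for a set of sequences.

    Each kmer contributes one edge (prefix (k-1)-mer -> suffix (k-1)-mer);
    a kmer seen again raises that edge's frequency.
    """
    edges = Counter()
    for seq in sequences:
        if len(seq) < kmer_size:
            continue  # too short to contain a single kmer, so it is dropped
        for i in range(len(seq) - kmer_size + 1):
            kmer = seq[i:i + kmer_size]
            edges[(kmer[:-1], kmer[1:])] += 1
    return edges


edges = build_debruijn_edges(["ACGTA", "CGTAC", "AC"], kmer_size=4)
print(edges[("CGT", "GTA")])  # 2 (the kmer CGTA occurs in both long reads)
```

The resulting counts correspond to the freq values that add_edge accumulates; the short read "AC" contributes nothing, matching the note that sequences too short for the kmer size are dropped.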
-
mavis.assemble.digraph_connected_components(graph, subgraph=None)
The networkx module does not support deriving connected components from digraphs (only simple graphs). This function assumes that connection != reachable, which means there is no difference between connected components in a simple graph and a digraph.
Parameters: graph (networkx.DiGraph) – the input graph to gather components from
Returns: a list of components, each of which is a list of node names
Return type: list of list
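Treating connection as distinct from reachability amounts to ignoring edge direction. The sketch below does that with a breadth-first search over an undirected neighbour map; the {node: {successor: freq}} input format is a hypothetical stand-in for a networkx DiGraph, and the subgraph parameter is not modelled.

```python
from collections import deque


def digraph_connected_components(adj):
    """Sketch: connected components of a digraph, ignoring edge direction.

    adj is a hypothetical {node: {successor: freq}} dict.
    """
    # build an undirected neighbour map (each edge is recorded both ways)
    neighbours = {node: set() for node in adj}
    for node, succs in adj.items():
        for succ in succs:
            neighbours.setdefault(succ, set())
            neighbours[node].add(succ)
            neighbours[succ].add(node)

    seen = set()
    components = []
    for start in neighbours:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        component = []
        while queue:  # breadth-first search from start
            node = queue.popleft()
            component.append(node)
            for other in neighbours[node]:
                if other not in seen:
                    seen.add(other)
                    queue.append(other)
        components.append(component)
    return components


# C cannot reach A along directed edges, yet both share a component via B
graph = {"A": {"B": 1}, "C": {"B": 1}, "D": {"E": 2}, "E": {}}
comps = digraph_connected_components(graph)
print(sorted(sorted(c) for c in comps))  # [['A', 'B', 'C'], ['D', 'E']]
```

Newer networkx releases also provide nx.weakly_connected_components for directed graphs, which groups nodes the same way.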
-
mavis.assemble.filter_contigs(contigs, assembly_min_uniq=0.01)
Given a list of contigs, removes similar contigs so that only the highest-scoring contig of each group of similar contigs remains.
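A greedy way to keep only the best-scoring contig among similar ones is sketched below. The similarity test is an assumption: it swaps in difflib.SequenceMatcher and treats two sequences as similar when 1 - ratio() < assembly_min_uniq, which may differ from the measure MAVIS uses (for example, reverse complements are not considered). The Contig named tuple is a stand-in for mavis.assemble.Contig.

```python
from difflib import SequenceMatcher
from typing import List, NamedTuple


class Contig(NamedTuple):
    """Stand-in for mavis.assemble.Contig(sequence, score)."""
    sequence: str
    score: float


def filter_contigs(contigs: List[Contig], assembly_min_uniq: float = 0.01) -> List[Contig]:
    """Sketch: keep one contig per group of similar contigs, preferring higher scores.

    ASSUMED similarity: 1 - SequenceMatcher.ratio() < assembly_min_uniq.
    """
    kept: List[Contig] = []
    # visit the best-scoring contigs first so each kept contig beats its look-alikes
    for contig in sorted(contigs, key=lambda c: c.score, reverse=True):
        is_similar = any(
            1 - SequenceMatcher(None, contig.sequence, other.sequence).ratio() < assembly_min_uniq
            for other in kept
        )
        if not is_similar:
            kept.append(contig)
    return kept


contigs = [Contig("ACGTACGTAT", 3), Contig("ACGTACGTAA", 5), Contig("TTTTGGGGCC", 4)]
result = filter_contigs(contigs, assembly_min_uniq=0.2)
print([c.sequence for c in result])  # ['ACGTACGTAA', 'TTTTGGGGCC']
```

The first two sequences differ only in their last base (ratio 0.9), so the lower-scoring one is dropped; with the default threshold of 0.01 both would be kept.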
-
mavis.assemble.kmers(s, size)
For a sequence, compute and return a list of all kmers of a specified size.
Parameters:
- s (str) – the input sequence
- size (int) – the length of each kmer
Returns: the list of kmers
Return type: list of str
Example
>>> kmers('abcdef', 2)
['ab', 'bc', 'cd', 'de', 'ef']
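A sliding window reproduces the documented example. This is a minimal sketch, not the MAVIS source; in particular, returning an empty list when size exceeds len(s) is an assumption, since the docs do not say what happens in that case.

```python
def kmers(s, size):
    """Sketch: every contiguous substring of length size, in order.

    Returns [] when size > len(s) (assumed behaviour).
    """
    return [s[i:i + size] for i in range(len(s) - size + 1)]


print(kmers("abcdef", 2))  # ['ab', 'bc', 'cd', 'de', 'ef']
print(kmers("abc", 5))     # []
```

A sequence of length n yields n - size + 1 kmers, which is why sequences shorter than the kmer size contribute nothing to the assembly.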
-
mavis.assemble.pull_contigs_from_component(assembly, component, min_edge_trim_weight, assembly_max_paths, log=<mavis.util.Log object>)
Builds contigs from a connected component of the assembly DeBruijn graph.
Parameters: - assembly (DeBruijnGraph) – the assembly graph
- component (list) – list of nodes which make up the connected component
- min_edge_trim_weight (int) – the minimum weight required to keep a non-cutting edge/path
- assembly_max_paths (int) – the maximum number of paths allowed before the graph is further simplified
- log (function) – the log function
Returns: the paths/contigs and their scores
Return type:
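The final step, turning a component's paths into scored contigs, can be sketched as follows. Several parts are assumptions, not taken from the docs: the component is acyclic, a contig's score is the sum of edge frequencies along its path, and exceeding the path limit returns None instead of triggering further simplification. paths_to_contigs and its dict-based input are hypothetical.

```python
def paths_to_contigs(adj, component, max_paths):
    """Hypothetical sketch: spell every source-to-sink path of a component.

    adj maps each (k-1)-mer node to {successor: freq}; the component is
    assumed acyclic. Returns None once more than max_paths paths are found.
    """
    nodes = set(component)
    succs = {n: [m for m in adj.get(n, {}) if m in nodes] for n in nodes}
    has_pred = {m for n in nodes for m in succs[n]}
    sources = sorted(n for n in nodes if n not in has_pred)

    paths = []
    stack = [[s] for s in sources]  # depth-first enumeration of paths
    while stack:
        path = stack.pop()
        nexts = succs[path[-1]]
        if not nexts:  # reached a sink: the path is complete
            paths.append(path)
            if len(paths) > max_paths:
                return None  # too many paths; the graph needs simplifying
        for m in nexts:
            stack.append(path + [m])

    contigs = []
    for path in paths:
        # consecutive (k-1)-mers overlap by k-2 characters, so each
        # step along the path adds only the last character of the next node
        sequence = path[0] + "".join(node[-1] for node in path[1:])
        score = sum(adj[a][b] for a, b in zip(path, path[1:]))  # assumed scoring
        contigs.append((sequence, score))
    return contigs


graph = {"ACG": {"CGT": 2}, "CGT": {"GTA": 5, "GTC": 1}, "GTA": {}, "GTC": {}}
contigs = paths_to_contigs(graph, list(graph), max_paths=20)
print(sorted(contigs))  # [('ACGTA', 7), ('ACGTC', 3)]
```

With max_paths=1 the same component yields None, since the fork at "CGT" produces two paths; in that situation the docs above describe simplifying the graph further rather than giving up.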