mavis/validate/assemble
class Contig
Contig.remap_depth()
the average depth of remapped reads over a given range of the contig sequence
def remap_depth(self, query_range: Optional[Interval] = None):
Args
- query_range (Optional[Interval]): 1-based inclusive range
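To make the depth calculation concrete, here is a minimal sketch of averaging read depth over a 1-based inclusive range. The helper `average_depth`, its `read_ranges` and `contig_length` parameters, and the use of plain tuples in place of `Interval` are all assumptions made for illustration; the actual method may weight or count remapped reads differently.

```python
from typing import List, Optional, Tuple


def average_depth(
    read_ranges: List[Tuple[int, int]],
    contig_length: int,
    query_range: Optional[Tuple[int, int]] = None,
) -> float:
    """Average per-base depth of reads over a 1-based inclusive range (hypothetical helper)."""
    # assumption: with no query range, the whole contig is used
    start, end = query_range if query_range is not None else (1, contig_length)
    if start < 1 or end > contig_length or start > end:
        raise ValueError('query range must lie within the contig')
    covered_bases = 0
    for read_start, read_end in read_ranges:
        # number of bases shared by the read and the query range (0 if disjoint)
        overlap = min(end, read_end) - max(start, read_start) + 1
        covered_bases += max(0, overlap)
    return covered_bases / (end - start + 1)


# two reads that both fully cover positions 6..10
depth = average_depth([(1, 10), (6, 15)], contig_length=20, query_range=(6, 10))
```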
class DeBruijnGraph
inherits nx.DiGraph
wrapper for a basic digraph that enforces edge weights
DeBruijnGraph.get_edge_freq()
returns the freq from the data attribute for a specified edge
def get_edge_freq(self, n1, n2):
Args
- n1
- n2
DeBruijnGraph.add_edge()
adds the given edge to the graph; if the edge already exists, adds freq to its existing frequency count
def add_edge(self, n1, n2, freq=1):
Args
- n1
- n2
- freq
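The accumulate-on-add behaviour described for add_edge and get_edge_freq can be sketched with a small `nx.DiGraph` subclass. `FreqDiGraph` is a hypothetical stand-in, and storing the count under a `'freq'` edge attribute is an assumption based on the descriptions above rather than a copy of the library code.

```python
import networkx as nx


class FreqDiGraph(nx.DiGraph):
    """Hypothetical stand-in for DeBruijnGraph: every edge carries a 'freq' count."""

    def add_edge(self, n1, n2, freq=1):
        # accumulate onto the existing count if the edge is already present
        if self.has_edge(n1, n2):
            freq += self[n1][n2]['freq']
        super().add_edge(n1, n2, freq=freq)

    def get_edge_freq(self, n1, n2):
        # raises KeyError if the edge does not exist
        return self[n1][n2]['freq']


graph = FreqDiGraph()
graph.add_edge('AT', 'TC')          # new edge, freq defaults to 1
graph.add_edge('AT', 'TC', freq=2)  # existing edge, freq becomes 3
```

Overriding add_edge this way means every edge in the graph is guaranteed to have a frequency, which is what the later trimming steps rely on.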
DeBruijnGraph.trim_tails_by_freq()
trims any paths where all edges are lower than the minimum weight
def trim_tails_by_freq(self, min_weight: int):
Args
- min_weight (int): the minimum weight for an edge to be retained
DeBruijnGraph.trim_forks_by_freq()
for all nodes in the graph, if the node has an out-degree > 1 and one of the outgoing edges has freq < min_weight, then that outgoing edge is deleted
def trim_forks_by_freq(self, min_weight):
Args
- min_weight
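The fork-trimming rule can be sketched as a standalone function on a plain `nx.DiGraph`. The function name `trim_forks` and the `'freq'` edge attribute are assumptions; this sketch also removes every outgoing edge below the threshold at a fork, whereas the real method may additionally protect the strongest edge.

```python
import networkx as nx


def trim_forks(graph: nx.DiGraph, min_weight: int) -> None:
    """Delete low-frequency outgoing edges at nodes with out-degree > 1 (in place)."""
    for node in list(graph.nodes):
        if graph.out_degree(node) <= 1:
            continue  # not a fork, leave it alone
        # collect first, then remove, so the graph is never mutated while iterating
        weak = [
            (src, tgt)
            for src, tgt, freq in graph.out_edges(node, data='freq')
            if freq < min_weight
        ]
        graph.remove_edges_from(weak)


fork_graph = nx.DiGraph()
fork_graph.add_edge('A', 'B', freq=5)
fork_graph.add_edge('A', 'C', freq=1)  # weak branch of the fork at A
fork_graph.add_edge('C', 'D', freq=1)  # weak, but C is not a fork
trim_forks(fork_graph, min_weight=3)
```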
DeBruijnGraph.trim_noncutting_paths_by_freq()
trims any low-weight edge where another path of higher weight exists between its source and target
def trim_noncutting_paths_by_freq(self, min_weight):
Args
- min_weight
DeBruijnGraph.get_sinks()
returns all nodes with an outgoing degree of zero
def get_sinks(self, subgraph=None):
Args
- subgraph
DeBruijnGraph.get_sources()
returns all nodes with an incoming degree of zero
def get_sources(self, subgraph=None):
Args
- subgraph
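Sources and sinks follow directly from node degrees. The sketch below uses hypothetical helpers `find_sources` and `find_sinks` on a plain `nx.DiGraph`; treating `subgraph` as an optional collection of candidate nodes, with degrees still measured on the full graph, is an assumption.

```python
from typing import Iterable, Optional, Set

import networkx as nx


def find_sources(graph: nx.DiGraph, subgraph: Optional[Iterable] = None) -> Set:
    """Nodes with no incoming edges, optionally limited to the given candidates."""
    nodes = graph.nodes if subgraph is None else subgraph
    return {n for n in nodes if graph.in_degree(n) == 0}


def find_sinks(graph: nx.DiGraph, subgraph: Optional[Iterable] = None) -> Set:
    """Nodes with no outgoing edges, optionally limited to the given candidates."""
    nodes = graph.nodes if subgraph is None else subgraph
    return {n for n in nodes if graph.out_degree(n) == 0}


degree_graph = nx.DiGraph([('A', 'B'), ('B', 'C'), ('B', 'D')])
sources = find_sources(degree_graph)
sinks = find_sinks(degree_graph)
```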
digraph_connected_components()
the networkx module does not support deriving connected components from digraphs (only simple graphs). This function assumes that connection != reachable, which means there is no difference between connected components in a simple graph and a digraph
def digraph_connected_components(graph: nx.DiGraph, subgraph=None) -> List[List]:
Args
- graph (nx.DiGraph): the input graph to gather components from
- subgraph
Returns
- List[List]: a list of components, each of which is a list of node names
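One way to read "connection != reachable" is that edge direction is ignored when grouping nodes. Under that assumption, the same components can be obtained by converting the digraph to an undirected graph first; this is an illustrative alternative, not the function's own implementation.

```python
import networkx as nx

# 'A' and 'C' cannot reach each other, but both connect to 'B'
directed = nx.DiGraph([('A', 'B'), ('C', 'B'), ('X', 'Y')])

# ignore direction, then gather components as sorted lists of node names
components = sorted(
    sorted(component)
    for component in nx.connected_components(directed.to_undirected())
)
```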
pull_contigs_from_component()
builds contigs from a connected component of the assembly DeBruijn graph
def pull_contigs_from_component(
assembly: DeBruijnGraph, component: List, min_edge_trim_weight: int, assembly_max_paths: int
):
Args
- assembly (DeBruijnGraph): the assembly graph
- component (List): list of nodes which make up the connected component
- min_edge_trim_weight (int): the minimum weight to not remove a non cutting edge/path
- assembly_max_paths (int): the maximum number of paths allowed before the graph is further simplified
Returns
- Dict[str,int]: the paths/contigs and their scores
filter_contigs()
given a list of contigs, removes similar contigs, leaving only the highest-scoring contig from each group of similar contigs
def filter_contigs(contigs, assembly_min_uniq: float = 0.01):
Args
- contigs
- assembly_min_uniq (float)
assemble()
for a set of sequences, creates a DeBruijnGraph, simplifies trailing and leading paths where edges fall below a weight threshold, and then returns all possible unitigs/contigs
drops any sequences too small to fit the kmer size
def assemble(
sequences: List[str],
kmer_size: float,
min_edge_trim_weight: int = 3,
assembly_max_paths: int = 20,
assembly_min_uniq: float = 0.01,
min_complexity: float = 0,
remap_min_exact_match: int = 6,
**kwargs,
) -> List[Contig]:
Args
- sequences (List[str]): a list of strings/sequences to assemble
- kmer_size (float): see assembly_kmer_size; the size of the kmer to use
- min_edge_trim_weight (int): see assembly_min_edge_trim_weight
- assembly_max_paths (int): see assembly_max_paths
- assembly_min_uniq (float)
- min_complexity (float): see min_call_complexity
- remap_min_exact_match (int): see assembly_min_exact_match_to_remap
Returns
- List[Contig]: a list of putative contigs
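The first stage of assembly, turning sequences into weighted graph edges, can be sketched without any graph library. The helper `kmer_edge_counts` is hypothetical, and the convention that each kmer contributes one edge from its (k-1)-length prefix to its (k-1)-length suffix is the usual De Bruijn construction, assumed here rather than taken from the source.

```python
from collections import Counter
from typing import Dict, List, Tuple


def kmer_edge_counts(sequences: List[str], kmer_size: int) -> Dict[Tuple[str, str], int]:
    """Count (prefix, suffix) edges for every kmer in every sequence (hypothetical helper)."""
    edges: Counter = Counter()
    for seq in sequences:
        if len(seq) < kmer_size:
            continue  # drop sequences too small to fit the kmer size
        for i in range(len(seq) - kmer_size + 1):
            kmer = seq[i : i + kmer_size]
            # each kmer links its first k-1 characters to its last k-1 characters
            edges[(kmer[:-1], kmer[1:])] += 1
    return dict(edges)


# 'AT' is shorter than the kmer size and is skipped
counts = kmer_edge_counts(['ATCGT', 'TCGTA', 'AT'], kmer_size=3)
```

Edges seen in several sequences end up with higher counts, which is what lets a threshold such as min_edge_trim_weight separate well-supported paths from likely sequencing noise.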
kmers()
for a sequence, compute and return a list of all kmers of a specified size
def kmers(s: str, size: int) -> List[str]:
Args
- s (str): the input sequence
- size (int): the size of the kmers
Returns
- List[str]: the list of kmers
Examples
>>> kmers('abcdef', 2)
['ab', 'bc', 'cd', 'de', 'ef']
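A sliding-window list comprehension is enough to reproduce the example above. `kmers_sketch` is a hypothetical name, and this is a minimal sketch consistent with the documented behaviour, not necessarily how the library implements it.

```python
from typing import List


def kmers_sketch(s: str, size: int) -> List[str]:
    """Return every substring of length `size`, in order of start position."""
    # the range is empty when the sequence is shorter than size
    return [s[i : i + size] for i in range(len(s) - size + 1)]


pairs = kmers_sketch('abcdef', 2)
```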