
mavis/validate/assemble

class Contig

Contig.remap_depth()

the average depth of remapped reads over a given range of the contig sequence

def remap_depth(self, query_range: Optional[Interval] = None):

Args

  • query_range (Optional[Interval]): 1-based inclusive range
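A minimal sketch of the averaging. It assumes, hypothetically, that a contig keeps one remapped-read depth per sequence position in a plain list. A (start, end) tuple stands in for Interval, and the 1-based inclusive range is converted to a 0-based slice. average_remap_depth is an illustrative name, not part of the module.

```python
from typing import List, Optional, Tuple


def average_remap_depth(depths: List[int], query_range: Optional[Tuple[int, int]] = None) -> float:
    """Average depth over a 1-based inclusive range of per-position depths.

    Hypothetical layout: depths[i] is the number of remapped reads
    covering position i + 1 of the contig sequence.
    """
    if query_range is None:
        start, end = 1, len(depths)  # default: the whole contig
    else:
        start, end = query_range
    if start < 1 or end > len(depths) or start > end:
        raise ValueError(f"invalid range {start}-{end} for contig of length {len(depths)}")
    window = depths[start - 1:end]  # 1-based inclusive -> 0-based slice
    return sum(window) / len(window)


depths = [2, 4, 4, 6, 0]
print(average_remap_depth(depths))          # 16 / 5 = 3.2
print(average_remap_depth(depths, (2, 4)))  # (4 + 4 + 6) / 3 = 14 / 3
```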

class DeBruijnGraph

inherits nx.DiGraph

wrapper for a basic digraph that enforces edge weights

DeBruijnGraph.get_edge_freq()

returns the freq from the data attribute for a specified edge

def get_edge_freq(self, n1, n2):

Args

  • n1
  • n2

DeBruijnGraph.add_edge()

add a given edge to the graph; if the edge already exists, add the frequency to its existing frequency count

def add_edge(self, n1, n2, freq=1):

Args

  • n1
  • n2
  • freq
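A dependency-free sketch of the edge-weight bookkeeping that add_edge and get_edge_freq describe. EdgeFreqGraph is a hypothetical stand-in for the nx.DiGraph subclass. It stores each edge's data as a dict with a freq key and accumulates the count when the same edge is added again.

```python
from typing import Dict, Hashable


class EdgeFreqGraph:
    """Hypothetical stand-in for a DiGraph whose edges carry a 'freq' count."""

    def __init__(self) -> None:
        # adjacency map: source node -> {target node: edge data dict}
        self.succ: Dict[Hashable, Dict[Hashable, Dict[str, int]]] = {}

    def add_edge(self, n1: Hashable, n2: Hashable, freq: int = 1) -> None:
        self.succ.setdefault(n1, {})
        self.succ.setdefault(n2, {})  # register the target as a node as well
        data = self.succ[n1].get(n2)
        if data is None:
            self.succ[n1][n2] = {"freq": freq}  # new edge starts at freq
        else:
            data["freq"] += freq  # existing edge: add to its count

    def get_edge_freq(self, n1: Hashable, n2: Hashable) -> int:
        # raises KeyError if the edge does not exist
        return self.succ[n1][n2]["freq"]


g = EdgeFreqGraph()
g.add_edge("AC", "CG")
g.add_edge("AC", "CG", freq=2)
print(g.get_edge_freq("AC", "CG"))  # 3
```

With networkx itself, the same logic would test self.has_edge(n1, n2) and update self[n1][n2]['freq']; the stand-in only avoids the dependency.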

DeBruijnGraph.trim_tails_by_freq()

trim any paths where all edges are lower than the minimum weight

def trim_tails_by_freq(self, min_weight: int):

Args

  • min_weight (int): the minimum weight for an edge to be retained
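One way to read the tail trimming, sketched over a hypothetical edge map {(source, target): freq} rather than a graph object. The sketch repeatedly deletes any below-threshold edge that starts at a source or ends at a sink until nothing changes, so a run of weak edges is peeled back until a retained edge is reached. Nodes left without edges simply drop out of the map. This is an assumption about the behaviour, not the module's code.

```python
from typing import Dict, Tuple

Edges = Dict[Tuple[str, str], int]  # hypothetical: (source, target) -> freq


def trim_tails_by_freq(edges: Edges, min_weight: int) -> Edges:
    """Peel low-frequency edges off sources and sinks until nothing changes."""
    edges = dict(edges)  # work on a copy
    changed = True
    while changed:
        changed = False
        has_in = {t for _, t in edges}   # nodes with in-degree > 0
        has_out = {s for s, _ in edges}  # nodes with out-degree > 0
        for (s, t), freq in list(edges.items()):
            # the edge starts at a source or ends at a sink
            at_tail = s not in has_in or t not in has_out
            if at_tail and freq < min_weight:
                del edges[(s, t)]
                changed = True
    return edges


graph = {("A", "B"): 1, ("B", "C"): 5, ("C", "D"): 5, ("D", "E"): 2}
print(trim_tails_by_freq(graph, 3))  # {('B', 'C'): 5, ('C', 'D'): 5}
```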

DeBruijnGraph.trim_forks_by_freq()

for all nodes in the graph, if the node has an out-degree > 1 and one of its outgoing edges has freq < min_weight, then that outgoing edge is deleted

def trim_forks_by_freq(self, min_weight):

Args

  • min_weight
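A sketch of the fork rule over the same kind of hypothetical {(source, target): freq} edge map. Out-degree is measured before anything is deleted, and every low-frequency edge leaving a fork is removed. That is one assumption about how the "one of the outgoing edges" case generalizes when several edges are weak.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Edges = Dict[Tuple[str, str], int]  # hypothetical: (source, target) -> freq


def trim_forks_by_freq(edges: Edges, min_weight: int) -> Edges:
    """Drop low-frequency outgoing edges of nodes whose out-degree is > 1."""
    outgoing: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for s, t in edges:
        outgoing[s].append((s, t))
    trimmed = dict(edges)
    for out_edges in outgoing.values():
        if len(out_edges) > 1:  # a fork, judged before any deletions
            for edge in out_edges:
                if edges[edge] < min_weight:
                    del trimmed[edge]
    return trimmed


graph = {("A", "B"): 10, ("A", "C"): 1, ("C", "D"): 1}
print(trim_forks_by_freq(graph, 3))  # {('A', 'B'): 10, ('C', 'D'): 1}
```

The weak edge C -> D survives because C is not a fork.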

DeBruijnGraph.trim_noncutting_paths_by_freq()

trim any low-weight edges where another, higher-weight path exists between the edge's source and target

def trim_noncutting_paths_by_freq(self, min_weight):

Args

  • min_weight
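A sketch of the non-cutting trim under one assumed reading of "another path ... of higher weight": a below-threshold edge is dropped when its target is still reachable from its source through edges that all meet min_weight. Weak edges that are the only route stay, since removing them would cut the graph. The edge-map representation and the _reachable helper are hypothetical.

```python
from collections import defaultdict, deque
from typing import Dict, Set, Tuple

Edges = Dict[Tuple[str, str], int]  # hypothetical: (source, target) -> freq


def _reachable(adj: Dict[str, Set[str]], start: str, goal: str) -> bool:
    """Breadth-first search: can goal be reached from start using adj?"""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def trim_noncutting_paths_by_freq(edges: Edges, min_weight: int) -> Edges:
    """Drop weak edges whose endpoints stay connected through strong edges."""
    strong: Dict[str, Set[str]] = defaultdict(set)
    for (s, t), freq in edges.items():
        if freq >= min_weight:
            strong[s].add(t)  # only strong edges may form the alternative path
    return {
        (s, t): freq
        for (s, t), freq in edges.items()
        # keep strong edges, and weak edges that are the only route from s to t
        if freq >= min_weight or not _reachable(strong, s, t)
    }


graph = {("A", "B"): 5, ("B", "C"): 5, ("A", "C"): 1, ("C", "D"): 1}
print(trim_noncutting_paths_by_freq(graph, 3))
# {('A', 'B'): 5, ('B', 'C'): 5, ('C', 'D'): 1}
```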

DeBruijnGraph.get_sinks()

returns all nodes with an outgoing degree of zero

def get_sinks(self, subgraph=None):

Args

  • subgraph

DeBruijnGraph.get_sources()

returns all nodes with an incoming degree of zero

def get_sources(self, subgraph=None):

Args

  • subgraph
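Both queries reduce to set differences. In this sketch the graph is represented, hypothetically, as a node set plus an edge map, and subgraph only restricts which nodes are reported; degrees are still counted over the full edge set, which is an assumption.

```python
from typing import Dict, Optional, Set, Tuple

Edges = Dict[Tuple[str, str], int]  # hypothetical: (source, target) -> freq


def get_sinks(nodes: Set[str], edges: Edges, subgraph: Optional[Set[str]] = None) -> Set[str]:
    """Nodes (limited to subgraph if given) with an outgoing degree of zero."""
    candidates = nodes if subgraph is None else subgraph
    return set(candidates) - {s for s, _ in edges}


def get_sources(nodes: Set[str], edges: Edges, subgraph: Optional[Set[str]] = None) -> Set[str]:
    """Nodes (limited to subgraph if given) with an incoming degree of zero."""
    candidates = nodes if subgraph is None else subgraph
    return set(candidates) - {t for _, t in edges}


nodes = {"A", "B", "C", "D"}  # D is isolated: both a source and a sink
edges = {("A", "B"): 1, ("B", "C"): 1}
print(sorted(get_sinks(nodes, edges)))    # ['C', 'D']
print(sorted(get_sources(nodes, edges)))  # ['A', 'D']
```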

digraph_connected_components()

the networkx module does not support deriving connected components from digraphs (only from simple graphs). This function assumes that connection != reachable, which means there is no difference between connected components in a simple graph and a digraph

def digraph_connected_components(graph: nx.DiGraph, subgraph=None) -> List[List]:

Args

  • graph (nx.DiGraph): the input graph to gather components from
  • subgraph

Returns

  • List[List]: a list of components, each of which is a list of node names
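Treating every edge as undirected gives what networkx calls weakly connected components (nx.weakly_connected_components). Below is a stdlib-only sketch using an iterative depth-first search. The node and edge inputs are hypothetical, and when subgraph is given only nodes and edges inside it are considered, which is an assumption about that parameter.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Set, Tuple


def digraph_connected_components(
    nodes: Iterable[str],
    edges: Iterable[Tuple[str, str]],
    subgraph: Optional[Set[str]] = None,
) -> List[List[str]]:
    """Components of a digraph when edge direction is ignored."""
    keep = set(nodes) if subgraph is None else set(subgraph)
    neighbours: Dict[str, Set[str]] = defaultdict(set)
    for s, t in edges:
        if s in keep and t in keep:
            neighbours[s].add(t)  # an edge connects its endpoints
            neighbours[t].add(s)  # in both directions
    components: List[List[str]] = []
    unvisited = set(keep)
    while unvisited:
        start = unvisited.pop()
        component, stack = [start], [start]
        while stack:  # iterative depth-first search
            for nxt in neighbours[stack.pop()]:
                if nxt in unvisited:
                    unvisited.remove(nxt)
                    component.append(nxt)
                    stack.append(nxt)
        components.append(component)
    return components


parts = digraph_connected_components("ABCDEF", [("A", "B"), ("C", "B"), ("D", "E")])
print(sorted(sorted(c) for c in parts))  # [['A', 'B', 'C'], ['D', 'E'], ['F']]
```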

pull_contigs_from_component()

builds contigs from a connected component of the assembly DeBruijn graph

def pull_contigs_from_component(
    assembly: DeBruijnGraph, component: List, min_edge_trim_weight: int, assembly_max_paths: int
):

Args

  • assembly (DeBruijnGraph): the assembly graph
  • component (List): list of nodes which make up the connected component
  • min_edge_trim_weight (int): the minimum weight for a non-cutting edge/path not to be removed
  • assembly_max_paths (int): the maximum number of paths allowed before the graph is further simplified

Returns

  • Dict[str,int]: the paths/contigs and their scores
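A sketch of the path-to-contig step. It assumes, as in a classic De Bruijn assembly, that nodes are (k-1)-mers joined by k-mer edges, so consecutive nodes overlap in all but their last character. Every walk from a source to a node with no unvisited successor becomes a contig scored by the sum of its edge frequencies. Returning None once max_paths is exceeded is a hypothetical signal that the graph needs further simplification; the trimming that would precede enumeration is not shown.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Edges = Dict[Tuple[str, str], int]  # hypothetical: ((k-1)-mer, (k-1)-mer) -> freq


def path_to_sequence(path: List[str]) -> str:
    # consecutive nodes overlap in all but one character
    return path[0] + "".join(node[-1] for node in path[1:])


def contigs_from_component(edges: Edges, max_paths: int) -> Optional[Dict[str, int]]:
    """Map each source-to-end path sequence to its summed edge frequency."""
    succ: Dict[str, List[str]] = defaultdict(list)
    for s, t in edges:
        succ[s].append(t)
    sources = {s for s, _ in edges} - {t for _, t in edges}
    contigs: Dict[str, int] = {}
    n_paths = 0
    # each stack entry: (path so far, summed freq of its edges)
    stack: List[Tuple[List[str], int]] = [([src], 0) for src in sorted(sources)]
    while stack:
        path, score = stack.pop()
        nexts = [t for t in succ[path[-1]] if t not in path]  # never revisit a node
        if not nexts:  # sink (or cycle) reached: the path is complete
            n_paths += 1
            if n_paths > max_paths:
                return None  # too many paths: simplify the graph and retry
            seq = path_to_sequence(path)
            contigs[seq] = max(contigs.get(seq, 0), score)
            continue
        for t in nexts:
            stack.append((path + [t], score + edges[(path[-1], t)]))
    return contigs


component = {("AC", "CG"): 2, ("CG", "GT"): 2, ("GT", "TA"): 1, ("GT", "TC"): 1}
print(contigs_from_component(component, max_paths=20))  # {'ACGTC': 5, 'ACGTA': 5}
print(contigs_from_component(component, max_paths=1))   # None
```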

filter_contigs()

given a list of contigs, removes similar contigs so that only the highest-scoring contig of each similar group remains

def filter_contigs(contigs, assembly_min_uniq: float = 0.01):

Args

  • contigs
  • assembly_min_uniq (float)
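A greedy sketch of the filtering: visit contigs from highest to lowest score and keep one only if it differs from every contig already kept by at least assembly_min_uniq. The similarity measure is swapped in for illustration (difflib.SequenceMatcher.ratio from the standard library) and is not necessarily what the module uses. Contigs are represented as plain (sequence, score) tuples.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def filter_contigs(contigs: List[Tuple[str, int]], assembly_min_uniq: float = 0.01) -> List[Tuple[str, int]]:
    """Greedily keep the best-scoring contig out of each group of similar ones."""
    kept: List[Tuple[str, int]] = []
    for seq, score in sorted(contigs, key=lambda c: c[1], reverse=True):
        # 1 - ratio() is the fraction of the pair that difflib does not match
        unique = all(
            1 - SequenceMatcher(None, seq, other).ratio() >= assembly_min_uniq
            for other, _ in kept
        )
        if unique:
            kept.append((seq, score))
    return kept


contigs = [("ACGTACGTAC", 5), ("ACGTACGTAC", 3), ("TTTTGGGGCC", 4)]
print(filter_contigs(contigs, 0.1))  # [('ACGTACGTAC', 5), ('TTTTGGGGCC', 4)]
```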

assemble()

for a set of sequences, creates a DeBruijnGraph, simplifies trailing and leading paths where edges fall below a weight threshold, and then returns all possible unitigs/contigs

drops any sequences too small to fit the kmer size

def assemble(
    sequences: List[str],
    kmer_size: float,
    min_edge_trim_weight: int = 3,
    assembly_max_paths: int = 20,
    assembly_min_uniq: float = 0.01,
    min_complexity: float = 0,
    remap_min_exact_match: int = 6,
    **kwargs,
) -> List[Contig]:

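Args

  • sequences (List[str]): the input sequences to assemble
  • kmer_size (float)
  • min_edge_trim_weight (int)
  • assembly_max_paths (int)
  • assembly_min_uniq (float)
  • min_complexity (float)
  • remap_min_exact_match (int)
  • **kwargs
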
Returns

  • List[Contig]: a list of putative contigs
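The graph-building step at the start of assemble can be sketched as counting k-mer edges. The sketch assumes an integer k-mer size (how the float kmer_size is interpreted is not covered) and that each k-mer becomes an edge from its (k-1)-prefix to its (k-1)-suffix. build_kmer_edges is a hypothetical name; sequences shorter than the k-mer size are dropped, as the description above states.

```python
from collections import Counter
from typing import Dict, List, Tuple


def build_kmer_edges(sequences: List[str], kmer_size: int) -> Dict[Tuple[str, str], int]:
    """Count (k-1)-mer -> (k-1)-mer edges over all k-mers of the input sequences."""
    counts: Counter = Counter()
    for seq in sequences:
        if len(seq) < kmer_size:
            continue  # too short to contain a single k-mer: dropped
        for i in range(len(seq) - kmer_size + 1):
            kmer = seq[i:i + kmer_size]
            counts[(kmer[:-1], kmer[1:])] += 1  # prefix -> suffix edge
    return dict(counts)


edges = build_kmer_edges(["ACGTA", "ACGTC", "AC"], 3)
print(edges)  # {('AC', 'CG'): 2, ('CG', 'GT'): 2, ('GT', 'TA'): 1, ('GT', 'TC'): 1}
```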

kmers()

for a sequence, compute and return a list of all kmers of a specified size

def kmers(s: str, size: int) -> List[str]:

Args

  • s (str): the input sequence
  • size (int): the size of the kmers

Returns

  • List[str]: the list of kmers

Examples

>>> kmers('abcdef', 2)
['ab', 'bc', 'cd', 'de', 'ef']
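A one-line implementation consistent with the doctest above; it is a sketch, not necessarily the module's own code. A sequence of length n yields n - size + 1 windows, and none when size exceeds n.

```python
from typing import List


def kmers(s: str, size: int) -> List[str]:
    # slide a window of length `size` across s; range() is empty if size > len(s)
    return [s[i:i + size] for i in range(len(s) - size + 1)]


print(kmers("abcdef", 2))  # ['ab', 'bc', 'cd', 'de', 'ef']
```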