Skip to content

Sub-package Documentation

The cluster sub-package is responsible for merging variants coming from different inputs (i.e. different tools).

Types of Output Files

expected name/suffix file type/format content text/tabbed
uninformative_clusters.txt text list of cluster ids that were dropped by annotation proximity filter
clusters.bed bed cluster positions
cluster-*.tab text/tabbed computed clusters

Algorithm Overview

  • Collapse any duplicate breakpoint pairs
  • Split breakpoint pairs by type
  • Cluster breakpoint pairs by distance (within a type)

    • Create a graph representation of the distances between pairs
    • Find cliques up to a given input size (cluster_clique_size)
    • Hierarchically cluster the cliques (allows redundant participation)
    • For each input node/pair pick the best cluster(s)
  • Output the clusters and the mapping to the input pairs