Getting Started
An exhaustive list of the various configurable settings can be found here. Alternatively, you can view them through the online schema explorer.
Pipeline Configuration File
The pipeline can be run step by step, or it can be configured using a JSON configuration file and set up in a single step. Scripts will be generated to run all steps following clustering.
The config schema is found in the mavis package under src/mavis/schemas/config.json.
Top level settings follow the pattern <section>.<setting>. The convert and library sections are nested objects.
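As a rough illustration of the <section>.<setting> pattern, the sketch below builds a small configuration in Python and serializes it to JSON. The setting names are the ones mentioned on this page; the values (including the 5000 used for cluster.max_proximity) are placeholders chosen for the example, not recommended defaults. The nested convert and library sections are left out because their structure is not described here.

```python
import json

# Top-level settings use flat "<section>.<setting>" keys.
# All values below are illustrative assumptions, not MAVIS defaults.
config = {
    "cluster.max_files": 10,
    "cluster.min_clusters_per_file": 10,
    "cluster.uninformative_filter": True,
    "cluster.max_proximity": 5000,
}

# Serialize to the JSON text that would go into the configuration file.
config_text = json.dumps(config, indent=2)
print(config_text)
```

Validating the resulting file against the schema in src/mavis/schemas/config.json is the reliable way to confirm that names and value types are accepted.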
Adjusting the Resource Requirements
Choosing the Number of Validation/Annotation Jobs
MAVIS chooses the number of jobs to split validate/annotate stages into based on two settings: cluster.max_files and cluster.min_clusters_per_file.
For example, say you have 1000 clusters, cluster.max_files=10, and cluster.min_clusters_per_file=10. Then MAVIS will set up 10 validation jobs, each with 100 events. However, if cluster.min_clusters_per_file=500, then MAVIS would only set up 2 jobs, each with 500 events. This is because cluster.min_clusters_per_file takes precedence over cluster.max_files.
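The two cases above can be reproduced with a short calculation. The function below is a minimal sketch that is consistent with both worked examples; the name estimate_job_count and the exact rounding are assumptions made for illustration, and MAVIS's own implementation may differ in edge cases.

```python
def estimate_job_count(n_clusters: int, max_files: int, min_clusters_per_file: int) -> int:
    """Estimate how many validation/annotation jobs would be set up.

    Hypothetical helper, not part of the MAVIS API.
    """
    # Largest number of jobs that still gives every job at least
    # min_clusters_per_file clusters.
    jobs_allowed_by_min = n_clusters // min_clusters_per_file
    # max_files is only an upper bound, so the minimum-per-file limit wins
    # whenever it is the stricter of the two. Always keep at least one job.
    return max(1, min(max_files, jobs_allowed_by_min))


for min_per_file in (10, 500):
    jobs = estimate_job_count(1000, 10, min_per_file)
    print(f"min_clusters_per_file={min_per_file}: {jobs} jobs, about {1000 // jobs} events each")
```

Because the per-file minimum acts as a floor on job size, raising it is the more direct way to reduce the number of jobs, while raising cluster.max_files only helps when the minimum is not already the limiting factor.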
Splitting into more jobs will lower the resource requirements per job (see resource requirements). The memory and time requirements for validation are linear with respect to the number of events to be validated.
Uninformative Filter
If the user is only interested in events in genes, for example, the cluster.uninformative_filter setting can be used. It drops all events that are not within a certain distance (cluster.max_proximity) of any annotation in the annotations reference file. These events are dropped prior to the validation stage, which results in a significant speed-up.
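The idea behind the filter can be sketched as a simple proximity check. This is a simplified illustration only: it treats each event and annotation as a single interval on one chromosome, whereas real events have paired breakpoints and chromosome names. The function names are hypothetical and are not MAVIS functions.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end), inclusive, same chromosome assumed


def interval_distance(a: Interval, b: Interval) -> int:
    """Return the gap between two intervals, or 0 if they overlap."""
    if a[1] < b[0]:
        return b[0] - a[1]
    if b[1] < a[0]:
        return a[0] - b[1]
    return 0


def filter_uninformative(
    events: List[Interval], annotations: List[Interval], max_proximity: int
) -> List[Interval]:
    """Keep only events within max_proximity of at least one annotation."""
    return [
        event
        for event in events
        if any(interval_distance(event, ann) <= max_proximity for ann in annotations)
    ]


genes = [(1000, 2000)]
events = [(2100, 2150), (9000, 9100)]
kept = filter_uninformative(events, genes, max_proximity=500)
print(kept)
```

In this toy input the first event lies 100 positions from the gene and is kept, while the second lies 7000 positions away and is dropped before it would reach validation.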