Guidelines for Contributors
Getting Started
If you are new to the project, a good way to get started is to contribute to the documentation or to add unit tests where code coverage is lacking.
Install (for Development)
Clone the repository and switch to the development branch
git clone https://github.com/bcgsc/mavis.git
cd mavis
git checkout develop
Set up a Python virtual environment. Developing inside a virtual environment is strongly recommended because it gives you a clean, isolated install to test against. To create and activate one:
python3 -m venv venv
source venv/bin/activate
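Before installing anything, you can confirm from Python that the virtual environment is actually active. This is a small sketch, not part of MAVIS: the helper name in_virtualenv is hypothetical, and it relies on the standard behaviour of the venv module, which makes sys.prefix point at the environment while sys.base_prefix still points at the base Python installation.

```python
import sys


def in_virtualenv() -> bool:
    """
    Returns:
        True if the running interpreter belongs to a venv-created environment
    """
    # venv sets sys.prefix to the environment directory; sys.base_prefix
    # keeps pointing at the Python installation the venv was created from
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("inside virtual environment:", in_virtualenv())
```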
Install the MAVIS Python package in editable (development) mode, together with its dev extras. Editable mode ensures that your code changes are used whenever you run MAVIS from within that virtual environment.
pip install -e .[dev]
Run the tests and compute code coverage
pytest tests
Build the Documentation
pip install .[docs]
markdown_refdocs mavis -o docs/package --link
mkdocs build
The user manual can then be viewed by opening build-docs/index.html in any web browser (e.g. Google Chrome or Firefox).
Deploy to PyPI
Install deployment dependencies
pip install .[deploy]
Build the distribution files
python setup.py sdist bdist_wheel
Use twine to upload
twine upload -r pypi dist/*
Reporting a Bug
Please make sure to search through the issues before reporting a bug to ensure there isn't already an open issue.
Conventions
Linting
Use black with string normalization disabled (-S) and a line length of 100:
black src/mavis -S -l 100
Docstrings
Docstrings should follow the Google docstring style (as supported by Sphinx).
If you want to be more explicit about nested types, please follow the same format used by Python type annotations, for example:
arg1 (List[str]): a list of strings
However, for new code, proper type annotations are preferred; the docstring should then include only the description of the parameter, not its type:
from typing import List


def some_function(some_arg: List[str]) -> None:
    """
    Args:
        some_arg: this arg does stuff
    """
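A fuller illustration of the preferred style: types live in the signature, the docstring carries only descriptions, and the Google-style Returns and Raises sections are used where they apply. The function filter_by_length is hypothetical and exists only to show the format.

```python
from typing import List, Optional


def filter_by_length(
    sequences: List[str], min_length: int = 0, max_length: Optional[int] = None
) -> List[str]:
    """
    Select the sequences whose length falls within the given bounds.

    Args:
        sequences: the input sequences to filter
        min_length: sequences shorter than this are dropped
        max_length: sequences longer than this are dropped; no upper bound if None

    Returns:
        the sequences that passed the filter, in their original order

    Raises:
        ValueError: if min_length is greater than max_length
    """
    if max_length is not None and min_length > max_length:
        raise ValueError("min_length must not exceed max_length")
    # keep the original order; bounds are inclusive on both ends
    return [
        seq
        for seq in sequences
        if len(seq) >= min_length and (max_length is None or len(seq) <= max_length)
    ]
```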
Output Columns
Any column name that may appear in any of the intermediate or final output files must be defined in mavis.constants.COLUMNS and also added to the columns glossary.
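When adding a new output column, a quick check like the following can catch names that were written to a file but never registered. This is a hypothetical sketch: KNOWN_COLUMNS stands in for mavis.constants.COLUMNS (whose actual structure is not described in this guide), the three example names are only placeholders, and undefined_columns is not a MAVIS function.

```python
import csv
import io
from typing import Iterable, List, Set

# Hypothetical stand-in for the names defined in mavis.constants.COLUMNS;
# the real object is not reproduced here.
KNOWN_COLUMNS: Set[str] = {"break1_chromosome", "break2_chromosome", "event_type"}


def undefined_columns(tsv_text: str, known: Iterable[str] = KNOWN_COLUMNS) -> List[str]:
    """
    Args:
        tsv_text: contents of a tab-delimited output file (only the header row is read)
        known: the allowed column names

    Returns:
        header names that are not in known, sorted alphabetically
    """
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader, [])  # an empty file yields an empty header
    allowed = set(known)
    return sorted(name for name in header if name not in allowed)
```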
Tests
- All new code must have unit tests in the tests subdirectory.
Tests can be run as follows:
pytest tests
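A minimal sketch of what a new unit test might look like. pytest collects functions whose names start with test_ from files named test_*.py, and plain assert statements are enough. The overlaps function and the suggested file name are hypothetical; in a real test the function under test would be imported from the mavis package rather than defined inline.

```python
# Hypothetical test module, e.g. tests/test_interval_example.py.
# The function under test is defined inline only to keep the example self-contained.


def overlaps(start1: int, end1: int, start2: int, end2: int) -> bool:
    """
    Args:
        start1: start of the first closed interval
        end1: end of the first closed interval
        start2: start of the second closed interval
        end2: end of the second closed interval

    Returns:
        True if the two closed intervals share at least one position
    """
    return start1 <= end2 and start2 <= end1


def test_overlapping_intervals():
    assert overlaps(1, 10, 5, 15)


def test_touching_intervals_overlap():
    # closed intervals that share an endpoint do overlap
    assert overlaps(1, 5, 5, 10)


def test_disjoint_intervals():
    assert not overlaps(1, 4, 5, 10)
```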
Branching Model
If you are working on a large feature, create a base branch for the feature off develop. These branches generally follow the naming pattern:
git checkout -b integration/issue-<number>-<short-name>
If you are working on a smaller feature, simply make a feature branch off develop:
git checkout -b feature/issue-<number>-<short-name>
Once ready, a PR should be made to develop and review should be requested from the other developers.
Releases are done by creating a release branch off develop
git checkout -b release/vX.X.X
Update the version number in setup.py on the release branch, then make a PR to master. After the PR has been merged into master, create a tag/release with the release notes and make a PR to merge master back into develop.
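The branch naming conventions above can be checked mechanically, for example in a local pre-push hook. This sketch assumes the <short-name> part consists of lowercase letters, digits, and hyphens, which the conventions do not actually specify; branch_kind is a hypothetical helper, not something MAVIS enforces.

```python
import re

# Illustrative interpretation of the naming conventions; the exact character
# set allowed in <short-name> is an assumption.
BRANCH_PATTERNS = {
    "integration": re.compile(r"integration/issue-\d+-[a-z0-9][a-z0-9-]*"),
    "feature": re.compile(r"feature/issue-\d+-[a-z0-9][a-z0-9-]*"),
    "release": re.compile(r"release/v\d+\.\d+\.\d+"),
}


def branch_kind(name: str) -> str:
    """
    Args:
        name: the git branch name to check

    Returns:
        "integration", "feature", or "release" if the name follows a convention,
        otherwise an empty string
    """
    for kind, pattern in BRANCH_PATTERNS.items():
        # fullmatch so that trailing or leading junk is rejected
        if pattern.fullmatch(name):
            return kind
    return ""
```

The current branch name can be obtained with git rev-parse --abbrev-ref HEAD and passed to branch_kind.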
Major Assumptions
Some assumptions have been made when developing this project. The major ones have been listed here to facilitate debugging/development if any of these are violated in the future.
- The input BAM reads store the sequence with respect to the positive/forward strand, not its reverse complement.
- The distribution of fragment sizes in the BAM file approximately follows a normal distribution.
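The two assumptions can be made concrete with a short sketch. The first helper shows what storing the sequence with respect to the forward strand means in SAM/BAM terms: for a record with the 0x10 (reverse strand) flag set, the read as sequenced is recovered by reverse complementing the stored sequence. The second summarises fragment sizes by mean and standard deviation, a description that is only meaningful if the distribution is roughly normal. Both function names and the 3-standard-deviation default are illustrative assumptions, not MAVIS's implementation.

```python
import statistics
from typing import Dict, Iterable

# SAM/BAM store SEQ on the forward reference strand; when flag 0x10 is set the
# stored sequence is the reverse complement of the read as sequenced.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")
REVERSE_FLAG = 0x10


def original_read_sequence(stored_seq: str, flag: int) -> str:
    """
    Args:
        stored_seq: the SEQ field as stored in the BAM record
        flag: the SAM FLAG value for the record

    Returns:
        the read sequence in its original sequencing orientation
    """
    if flag & REVERSE_FLAG:
        return stored_seq.translate(_COMPLEMENT)[::-1]
    return stored_seq


def fragment_size_summary(sizes: Iterable[int], num_stdev: float = 3.0) -> Dict[str, float]:
    """
    Args:
        sizes: observed fragment (insert) sizes
        num_stdev: how many standard deviations define the expected range (assumed default)

    Returns:
        the mean, the population standard deviation, and the bounds mean +/- num_stdev * stdev
    """
    values = list(sizes)
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    return {
        "mean": mean,
        "stdev": stdev,
        "min_expected": mean - num_stdev * stdev,
        "max_expected": mean + num_stdev * stdev,
    }
```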
Current Limitations
- Contig assembly will always fail for repeat sequences because repeats are not resolved. Unlike traditional assembly, we cannot assume even input coverage, since only a selected subset of the reads is assembled.
- Currently, no attempt is made to group/pair single events into complex events.
- Transcriptome validation uses a collapsed model of all overlapping transcripts and is not isoform-specific. Isoform-specific validation would be computationally expensive but may be considered as an optional setting in future releases.
Computing Code Coverage
Because MAVIS uses multiple processes, computing code coverage is more complicated: running coverage normally will under-report, since work done in subprocesses is not measured by default. To make the coverage module capture information from the subprocesses, do the following.
In your development Python virtual environment, create a coverage.pth file in the site-packages directory (e.g. venv/lib/python3.6/site-packages/coverage.pth) containing the following line:
import coverage; coverage.process_startup()
Additionally, you will need to set the COVERAGE_PROCESS_START environment variable to the path of the coverage configuration file:
export COVERAGE_PROCESS_START=/path/to/mavis/repo/mavis/.coveragerc
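Rather than creating the coverage.pth file by hand, it can be written from inside the activated environment. This is a hypothetical helper, not part of MAVIS. It relies on sysconfig's purelib path, which inside a venv is the environment's site-packages directory, and on the site module executing lines in .pth files that start with import. coverage.process_startup() only starts measuring when COVERAGE_PROCESS_START is set, so the file is harmless otherwise.

```python
import sysconfig
from pathlib import Path
from typing import Optional

PTH_LINE = "import coverage; coverage.process_startup()\n"


def write_coverage_pth(site_packages: Optional[str] = None) -> Path:
    """
    Args:
        site_packages: directory to write into; defaults to the purelib
            site-packages directory of the running interpreter

    Returns:
        the path of the written coverage.pth file
    """
    target_dir = Path(site_packages or sysconfig.get_paths()["purelib"])
    pth_file = target_dir / "coverage.pth"
    pth_file.write_text(PTH_LINE)
    return pth_file


if __name__ == "__main__":
    # run with the virtual environment's python so purelib points into the venv
    print("wrote", write_coverage_pth())
```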