Obtaining P4ward#
The easiest way to obtain and run P4ward is through Docker, where it is possible to obtain and run P4ward through a single command. It is also possible to run the code using a conda environment as well as using Apptainer (former singularity) for running in high performance computational clusters (HPCs).
Docker#
Running using a single command by obtaining it from Docker Hub:
sudo docker run -v .:/home/data paulajlr/p4ward:latest --config_file config.ini
This will obtain the container, if it hasn’t been done already, and directly run the pipeline.
If you have obtained the docker container through a tar file, such as p4ward.tar
, you can build the container with:
sudo docker load -i path/to/p4ward.tar
Then the program can be run using the container with:
sudo docker run -v .:/home/data p4ward --config_file config.ini
You may also choose to build the container yourself. In this case, clone the P4ward git repository to your chosen working directory, navigate into the new p4ward
folder and build:
git clone https://github.com/PaulaJLR/p4ward.git
cd p4ward
sudo docker build -f dockerfiles/Dockerfile -t p4ward
Then run as previously described:
sudo docker run -v .:/home/data p4ward --config_file config.ini
Conda#
It is also possible to easily obtain P4ward’s dependencies and run the program locally without the need for containers. This step assumes you have conda or miniconda installed.
Install Megadock for single node environment, CPU (Instructions here). Make sure to add megadock installation to your system PATH, with:
export PATH=$PATH:path/to/megadock/installation/
Test if this step was successful by navigating to another directory and calling megadock with megadock -h
. If you want this PATH update to be permanent, you can paste it on your ~/.bashrc
file. Otherwise, remember to add megadock to PATH before running P4ward.
Clone the P4ward github repo:
git clone https://github.com/SKTeamLab/P4ward.git
Then build a conda environment to work P4ward:
conda config --add channels conda-forge
conda config --add channels bioconda
conda create -n p4ward python=3.11 --file ./p4ward/dockerfiles/conda_requirements.txt
Next, add P4ward to PYTHONPATH
:
export PYTHONPATH=$PYTHONPATH:path/to/cloned/p4ward/
Tip
More about PYTHONPATH:
When you run python from the command line, there is a way to tell your system where to find additional python packages, so that you can call them without having to worry about their installation path in your system. When you obtain the program, you get the main folder for the repository, which is called “P4ward”. Inside of it, there are many resources. For example, there is a folder called “tutorial”, where you can find all the files for this tutorial, another called “docs”, where this documentation was written, and, importantly, another folder called P4ward, which contains the python package itself. This means that python should find this folder, and so you must add the root P4ward repo folder to PYTHONPATH. This way, python will look in the root p4ward repo folder and in it, it will find the “P4ward” package. Thus, if for example you ran git clone https://github.com/SKTeamLab/P4ward.git
while you were in your Downloads folder, you should add this path to your PYTHONPATH: /home/USER/Downloads/p4ward
When running P4ward with this strategy, always remember to add megadock to your PATH, activate the conda environment, add the program to PYTHONPATH
, and then run the program. Example:
conda activate p4ward
export PATH=$PATH:path/to/megadock/installation/
export PYTHONPATH=$PYTHONPATH:path/to/cloned/p4ward/
python -m p4ward --config_file config.ini
Apptainer#
Usually, HPC clusters do not support Conda or Docker. In this case, it is possible to convert a Docker container into an Apptainer container, which is usually supported by clusters. If you don’t have the tar file of the Docker container, you can make one by running:
sudo docker save -o p4ward.tar p4ward
Next, a Docker tarfile can be converted to an Apptainer file by running:
apptainer build p4ward.sif docker-archive:p4ward.tar
At this point, make sure to do the conversion within a job script. It should not need more than 16GB of RAM and not much more than an hour to run, depending on the node specifications. This was tested with Apptainer version 1.3.4 and above.
After the .sif
file has been generated, P4ward can be run by:
apptainer run -B /[root_mount_path] /path/to/p4ward.sif --config config.ini
The root mount path will be whichever filesystem you are working on in the cluster, for example, it could be -B /scratch
or -B /project
.