Common bioinformatics software maintained by C3G

Tip of the Month

Introduction

A large library of common bioinformatics software is available on Compute Canada resources. It is provided as modules maintained by C3G staff and distributed through the CernVM File System (CVMFS). Despite the breadth of the C3G CVMFS library, there may be times when the provided software isn’t the best fit.

For example:
You might want to use software that we haven’t yet made available via CVMFS, and you don’t want to install it repeatedly at each HPC facility.
You might want to guarantee comparable results by running exactly the same software stack on Compute Canada, on your workstation, or on infrastructure from a cloud provider such as Amazon Web Services or Google Cloud.
A more recent version of the software may already be available in a container. Listings of existing images are available from community efforts such as BioContainers, and images can also be built directly from a project’s source repository.

In circumstances such as these, containers offer an excellent solution. They package your software and its dependencies into a single image that contains everything needed for a particular analysis or workflow.

The process for running containerized software on Compute Canada can be described in three steps:

Ensure singularity is available
Download a container
Run your containerized software

Step 1: Ensure singularity is available

At all Compute Canada facilities, Singularity is available as a module. Loading the module is as simple as running:

module load singularity

If you’re running Singularity on your Linux laptop or workstation, installation instructions are available in the Singularity documentation.
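On a cluster the module is the usual route, while a local install puts singularity directly on your PATH. A job script meant to work in both places could check for the command first and fall back to the module system only when it is missing. The function below is a minimal sketch of that idea. The name ensure_singularity is hypothetical, and the sketch assumes the module is called singularity, as on Compute Canada at the time of writing.

```shell
# Minimal sketch: make a `singularity` command available, preferring one
# already on PATH and falling back to the module system.
# Assumption: the module is named "singularity" (as on Compute Canada).
ensure_singularity() {
    # Already available (e.g. a local install on a laptop or workstation)?
    if command -v singularity >/dev/null 2>&1; then
        return 0
    fi
    # On clusters, `module` is usually a shell function defined by Lmod.
    if command -v module >/dev/null 2>&1; then
        module load singularity
    fi
    # Re-check after the (possible) module load.
    if command -v singularity >/dev/null 2>&1; then
        return 0
    fi
    echo "singularity not found: install it or ask your site administrators" >&2
    return 1
}

# Example: stop a job script early if Singularity cannot be found.
# ensure_singularity || exit 1
```

Calling ensure_singularity near the top of a job script makes a missing Singularity installation fail fast, rather than partway through a long job.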

Step 2: Download a container

Many software stacks are already available as Docker images in registries such as Docker Hub or Quay.io. Unfortunately, running Docker on shared clusters introduces potential security vulnerabilities, because the Docker daemon normally runs with root privileges. Fortunately for us, Singularity can use Docker images to build new Singularity containers. For example, let’s say that we want to run the GenomeTools suite. The BioContainers registry shows that the latest version (1.5.10 at the time of writing) is available as a Docker image at quay.io/repository/biocontainers/genometools-genometools. To download the image to a Compute Canada cluster, I can run “singularity pull”:

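singularity pull docker://quay.io/biocontainers/genometools-genometools:<tag>

Here <tag> stands for the tag listed for version 1.5.10 on the repository page. BioContainers tags typically combine the software version with a build identifier.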

This writes a Singularity image file, with a .sif extension and named after the image and its tag, to the current directory.
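If a workflow script runs the pull step every time, it helps to give the image an explicit filename and skip the download when that file already exists. As far as we know, singularity pull refuses to overwrite an existing file unless --force is given. The sketch below uses the `singularity pull [output file] <URI>` form. The function name pull_once and the filename genometools.sif are hypothetical, and <tag> must be replaced with a real tag from the repository page.

```shell
# Minimal sketch: pull a Docker image into a named SIF file once, and reuse
# the existing file on later runs.
pull_once() {
    local sif="$1"   # output file, e.g. genometools.sif (hypothetical name)
    local uri="$2"   # source image as a docker:// URI
    if [ -f "$sif" ]; then
        echo "Using existing image: $sif"
        return 0
    fi
    # `singularity pull [output file] <URI>` downloads the Docker layers
    # and converts them into a single SIF file.
    singularity pull "$sif" "$uri"
}

# Example (replace <tag> with a real tag from the repository page):
# pull_once genometools.sif docker://quay.io/biocontainers/genometools-genometools:<tag>
```

Choosing the output name yourself also keeps later commands stable when you move to a newer tag: only the URI changes, not every script that refers to the image file.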

Step 3: Run your containerized software

To run a GenomeTools command inside the new container, prefix it with “singularity exec” followed by the image file:

singularity exec genometools-genometools_<tag>.sif gt -version

This runs the gt executable from inside the container; any other gt subcommand works the same way.
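Typing the full singularity exec prefix quickly gets tedious. One option is a small wrapper function, so that the containerized tool can be called almost like a native install. This is a minimal sketch. The function name gt_in_container and the variable GT_SIF are hypothetical; GT_SIF defaults to genometools.sif and should point at whichever .sif file you pulled.

```shell
# Minimal sketch: call GenomeTools inside the container as if it were
# installed natively. GT_SIF is a hypothetical variable naming the image file.
GT_SIF="${GT_SIF:-genometools.sif}"

gt_in_container() {
    # Fail early with a clear message if the image file is missing.
    if [ ! -f "$GT_SIF" ]; then
        echo "image not found: $GT_SIF" >&2
        return 1
    fi
    # `singularity exec IMAGE COMMAND [ARGS...]` runs COMMAND inside IMAGE;
    # here COMMAND is GenomeTools' `gt`, and all arguments are passed through.
    singularity exec "$GT_SIF" gt "$@"
}

# Example:
# gt_in_container -version
```

Because the wrapper passes its arguments through unchanged, every gt subcommand can be called the same way, and switching to a different image only requires changing GT_SIF.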

That’s it! You now have a reproducible software stack running, without needing to worry about installation or dependencies: anyone who runs the same image file gets exactly the same software.

Next Steps and Getting Help

As you might imagine, there are plenty of details we don’t have time to cover in this short blog post. If you’d like to learn more, or if you’re having trouble, there are several ways to find help.

The Compute Canada wiki has an excellent page on running containers on their infrastructure (en/fr)
The Singularity documentation is the definitive guide
The C3G has a weekly open door session to which you are welcome to bring questions about containers and reproducible bioinformatics analyses.