Meet HeDS: A Data Science Platform Powering Precision Health Research

Looking for a platform that helps you turn complex health data into real, reusable knowledge?

The Health Data Science (HeDS) platform was created to bridge critical gaps in the research ecosystem, bringing together expertise in data science, genomics, AI, and mRNA therapeutics under one roof. In this interview, Mathieu Bourgey, HeDS platform director and director of data science at D2R, shares the vision behind HeDS, the problems it aims to solve, and how the platform is supporting researchers and trainees to unlock the full potential of health and RNA data.

In your own words, what is the HeDS platform?

Mathieu Bourgey (MB): HeDS (which stands for Health Data Science) is a data science hub for health research supported by the DNA to RNA (D2R) initiative and, more recently, by the Gates Foundation. It serves as a platform where we develop tools and processes to turn health data into usable knowledge and reusable assets for the scientific community. Essentially, that involves sharing our expertise through consultations, training, collaborations, and services. Our skills range from study design and biostatistics to bioinformatics pipelines, ML/AI integration, and data management and cataloguing.

What are the main priorities or development milestones your team is focused on right now?

MB: The platform is quite recent, founded in 2025, so after its initial launch, we focused on building the best expert team. We are now fully operational and set to pursue various goals. These include providing advanced analytics support for D2R research projects and implementing a FAIR-based data management framework to maximize the value of D2R research outputs by enhancing their discoverability and enabling new collaboration opportunities. In parallel, we are also working to translate genomic discoveries into RNA-based therapeutic applications.

What are HeDS’s unique offerings to the life science research community?

MB: First, we are currently developing a Data Catalogue that will make D2R data easily discoverable and accessible while ensuring that ownership and primary benefits of the data remain under the control of the researcher who initially generated it.

Additionally, we are creating the DOTS-RNA portal, which enables the management, automation, and optimization of mRNA therapeutic sequences, such as mRNA vaccines. To support this, we have developed and integrated a benchmark dataset to facilitate the evaluation and integration of the latest tools for mRNA optimization into DOTS-RNA.

Part of our unique offering is also our ability to provide flexible support for integrating cutting-edge data science methods across diverse health research areas, such as cancer and rare diseases. This includes hands-on collaboration to develop new analytical approaches, as well as training and mentoring for students and research teams. These mentoring sessions offer a valuable opportunity to transfer knowledge from our team to our collaborators’ labs. 

How is the platform leveraging AI/ML to accelerate discovery?  

MB: Being based in Montreal already gives us a unique advantage thanks to its globally recognized AI ecosystem. However, from my experience in this community, I’ve seen a clear gap between AI method developers and health researchers who generate the data that those methods rely on.

At HeDS, we address this gap through a Benchmark–Deploy–Adapt strategy. Rather than replacing AI methodologists, we focus on rigorously evaluating existing tools and selecting the methods best suited for real-world health data. Our priorities are robustness (including generalization and zero-shot performance) and interpretability. We also collaborate closely with AI experts to ensure models are biologically informed from the start, helping bridge the gap between method development and practical health applications.    

In what ways does the HeDS platform complement C3G’s existing tools and services for researchers?

MB: C3G is a bioinformatics platform that develops unique turnkey analyses, for example through GenPipes, and provides bioinformatics services mainly to the genomics community. At HeDS, we focus on broader data science projects, with expertise extending beyond bioinformatics to include biostatistics, AI integration, data management, and more.

Our platform also targets a diverse community of researchers, from pure biologists to chemists and clinicians. We are closely linked with C3G and other specialized McGill platforms. Therefore, if a researcher has a project that falls within the expertise of C3G or another platform and does not need the broader data science component, we direct them to that specialized platform for more targeted support.

How has HeDS’s recent funding from the Gates Foundation influenced the platform’s direction, scale, or scientific ambitions?

MB: This recent funding has prompted us to expand our vision beyond operating solely as a campus or national platform to becoming a global capacity-building hub within PATH’s RNA Cooperative. Specifically, it has sped up and reshaped our plans to ensure that our tools and development efforts do not become too specific to our own community of researchers but can be easily and quickly transferred to our international partners.

One of the main challenges is adjusting to the different internal infrastructure processes of our partners from low- and middle-income countries (LMICs). This means providing targeted tools that do not require extensive computing infrastructure, rather than a single turnkey solution that might not work effectively in the partner environment.

How can trainees or young investigators engage with the HeDS platform, and what opportunities does it open for them?

MB: We’ve designed HeDS to be accessible to trainees and researchers. We have set up an online booking system that allows researchers or trainees to request a free consultation with a member of the HeDS team. The consultations are mainly for quick guidance, problem-solving, or high-level questions.

Another simple way to contact us is by emailing info@hedscenter.ca. Additionally, we regularly offer workshops, training sessions, and community events where everyone is welcome and encouraged to join and talk with us!

Alternatively, you can book a consultation meeting with experts from the HeDS team through this link. 

Research Spotlight: Optimal Sequencing Strategies for Human Genome Variant Detection

Designing a sequencing project? A recent Genome Biology study offers a practical framework for choosing the right technologies based on your budget and research objectives.

Accurately identifying every genetic variation in the human genome is essential for both research and clinical applications. This Genome Biology study brings together expertise from the Canadian Centre for Computational Genomics (C3G) (Robert Eveleigh, Jose Hector Galvez, Mathieu Bourgey, and Guillaume Bourque) and the Advanced Genomic Technologies Laboratory (Sarah Reiling and Jiannis Ragoussis) to benchmark state-of-the-art sequencing platforms and variant detection approaches across both small and large variant classes.

Methodology

Using the Genome in a Bottle (GIAB) HG002 reference sample, the team systematically compared short-read (SRS; Illumina, MGI) and long-read (LRS; PacBio Sequel/Revio, ONT R9/R10) technologies against Telomere-to-Telomere (T2T) and Challenging Medically Relevant Genes (CMRG) benchmarks, spanning a range of sequencing depths, genomic contexts, and bioinformatic pipelines.

Findings

Their findings confirm that platform and workflow selection should be driven by research objectives. SRS excels at identifying small variants in well-mapped regions, while LRS, particularly PacBio Revio, delivers superior accuracy for structural variants and small variants in complex or repetitive regions, achieving accuracy saturation at markedly lower sequencing depths (20–45×) than short reads (>60×).

While SRS remains a practical choice for high-throughput genotyping, LRS provides the resolution required for clinical applications at challenging loci.
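As a rough illustration, the guidance summarized above can be written down as a small lookup. This is only a sketch of the summary given here, not the study's actual decision framework: the function name `suggest_technology`, its arguments, and the category labels are hypothetical, and the depth figures are simply the saturation ranges quoted above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Suggestion:
    technology: str        # "SRS" (short-read) or "LRS" (long-read)
    saturation_depth: str  # depth range quoted in the summary above


# Saturation depths as quoted in the text; they are not recomputed here.
SRS = Suggestion("SRS", ">60x")
LRS = Suggestion("LRS", "20-45x")


def suggest_technology(variant_class: str, region: str) -> Suggestion:
    """Return a first-pass suggestion for one variant class and genomic context.

    variant_class: "small" or "structural" (hypothetical labels)
    region: "well-mapped" or "complex" (hypothetical labels)
    """
    if variant_class not in {"small", "structural"}:
        raise ValueError(f"unknown variant class: {variant_class!r}")
    if region not in {"well-mapped", "complex"}:
        raise ValueError(f"unknown region type: {region!r}")
    # Small variants in well-mapped regions are where short reads excel.
    if variant_class == "small" and region == "well-mapped":
        return SRS
    # Structural variants, and small variants in complex or repetitive
    # regions, are where long reads are reported to be more accurate.
    return LRS
```

For example, `suggest_technology("small", "complex")` returns the long-read suggestion. A real project would also weigh budget and throughput, which the full article discusses.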

Learn More

To stay current with the latest algorithms, technologies, and benchmarking practices, C3G also maintains a live dashboard!

Read the full article

C3G SNV Evaluation Dashboard

Inside GenPipes: Exploring C3G’s Software Solution for Life Science Research

Take a closer look at one of C3G’s software tools: its purpose, its strengths, its newest features, and how it continues to support the life science community.

What is GenPipes?

GenPipes is an open-source (LGPL), Python-based platform for managing -omics workflows. Its pipelines provide high-quality genomic analyses optimized for high-performance computing (HPC) and cloud environments. GenPipes is widely adopted across the life sciences, serving bioinformatics professionals, students, and researchers working on a broad range of genomic analyses. Extensive documentation explains each pipeline and its outputs to users.

What sets GenPipes apart from other analysis platforms or workflow management systems?

GenPipes stands out for its flexibility, scalability, and ease of use. It adapts quickly to new systems, supports multiple job schedulers and deployment types, and includes a broad set of ready-to-use pipelines. Its integration with the Digital Research Alliance of Canada makes it particularly appealing to Canadian researchers, offering a pre-installed solution for several standard -omics analyses.

GenPipes also has a low barrier to entry; users don’t need to install software, manage reference genomes, or configure compute resources, allowing them to start their analyses quickly and confidently!

What are the most used pipelines in GenPipes?

The most widely used pipelines in GenPipes are ChIP‑Seq, RNA‑Seq, and DNA‑Seq. The DNA‑Seq pipeline supports multiple protocols, making it suitable for both standard whole‑genome analyses and paired cancer genomics workflows. The pipelines output multiple reports and standard files, such as BAM and VCF files, peak calls, and expression matrices.

Does GenPipes have any features to help students and early-career researchers?

Yes! To make GenPipes even more accessible to new users, we developed a new tool called the GenPipes Wizard, with the help of an excellent C3G intern, Alexa Li Kim Wa.

The Wizard is an interactive assistant that helps users:

  • Quickly identify the pipeline best suited to their data
  • Automatically generate the correct command to launch their analysis

What is the most recent version of GenPipes?

We continuously enhance GenPipes based on user feedback and internal benchmarking. Last year’s major release was v.6.1.0, which introduced new pipelines and streamlined the interface by retiring tools that are no longer in use.

Our latest release is a minor update, focused on fixing small bugs uncovered through testing or reported by users. Click the link to learn more about our latest version: v.6.1.1.

If you’d like to help shape future improvements, we encourage users to email us:

pipelines@computationalgenomics.ca

*Users should be aware of whether a release is major, medium, or minor. Major releases may break backward compatibility, which is important for long-running projects that require comparable results. All previous GenPipes versions remain available as modules, so analyses can always be repeated with an older version if needed.*
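To make the distinction concrete, here is a minimal sketch that classifies the step between two version strings. It assumes GenPipes version numbers follow a three-part major.medium.minor layout, as v.6.1.0 and v.6.1.1 suggest; the helper names `parse_version` and `release_type` are hypothetical and not part of GenPipes.

```python
import re


def parse_version(version: str) -> tuple[int, int, int]:
    """Parse strings such as 'v.6.1.0', 'v6.1.0' or '6.1.0' into three integers."""
    match = re.fullmatch(r"v?\.?(\d+)\.(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, medium, minor = (int(part) for part in match.groups())
    return major, medium, minor


def release_type(old: str, new: str) -> str:
    """Classify the change from `old` to `new` as major, medium, or minor."""
    old_parts, new_parts = parse_version(old), parse_version(new)
    if new_parts <= old_parts:
        raise ValueError("new version must be later than old version")
    if new_parts[0] != old_parts[0]:
        return "major"   # may break backward compatibility
    if new_parts[1] != old_parts[1]:
        return "medium"
    return "minor"       # assumed to cover small fixes only


print(release_type("v.6.1.0", "v.6.1.1"))  # minor
```

A long-running project could run a check like this before switching modules and stay on its current version whenever the result is "major".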

Subscribe to our newsletter to stay up to date on C3G’s latest software releases and updates! 

Whole Genome STR Analysis

In early 2021, Jeffrey Hyacinthe, another student here in Guillaume Bourque’s lab, wrote about repetitive sequences, focusing on transposable elements (TEs). Here I will discuss another type of repetitive sequence: short tandem repeats (STRs), also known as microsatellites.

Continue reading