Meet HeDS: A Data Science Platform Powering Precision Health Research

Looking for a platform that helps you turn complex health data into real, reusable knowledge?

The Health Data Science (HeDS) platform was created to bridge critical gaps in the research ecosystem, bringing together expertise in data science, genomics, AI, and mRNA therapeutics under one roof. In this interview, Mathieu Bourgey, HeDS platform director and director of data science at D2R, shares the vision behind HeDS, the problems it aims to solve, and how the platform is supporting researchers and trainees to unlock the full potential of health and RNA data.

In your own words, what is the HeDS platform?

Mathieu Bourgey (MB): HeDS (which stands for Health Data Science) is a data science hub for health research supported by the DNA to RNA (D2R) initiative and, more recently, by the Gates Foundation. It serves as a platform where we develop tools and processes to turn health data into usable knowledge and reusable assets for the scientific community. Essentially, that involves sharing our expertise through consultations, training, collaborations, and services. Our skills range from study design and biostatistics to bioinformatics pipelines, ML/AI integration, and data management and cataloguing.

What are the main priorities or development milestones your team is focused on right now?

MB: The platform is quite recent: it was founded in 2025, and after the initial launch we focused on building the best possible team of experts. We are now fully operational and pursuing several goals. These include providing advanced analytics support for D2R research projects and implementing a FAIR-based data management framework to maximize the value of D2R research outputs by enhancing their discoverability and enabling new collaboration opportunities. In parallel, we are also working to translate genomic discoveries into RNA-based therapeutic applications.

What are HeDS’s unique offerings to the life science research community?

MB: First, we are currently developing a Data Catalogue that will make D2R data easily discoverable and accessible while ensuring that ownership and primary benefits of the data remain under the control of the researcher who initially generated it.

Additionally, we are creating the DOTS-RNA portal, which enables the management, automation, and optimization of mRNA therapeutic sequences, such as mRNA vaccines. To support this, we have developed and integrated a benchmark dataset that makes it easier to evaluate the latest mRNA optimization tools and integrate them into DOTS-RNA.

Part of our unique offering is also our ability to provide flexible support for integrating cutting-edge data science methods across diverse health research areas, such as cancer and rare diseases. This includes hands-on collaboration to develop new analytical approaches, as well as training and mentoring for students and research teams. These mentoring sessions offer a valuable opportunity to transfer knowledge from our team to our collaborators’ labs. 

How is the platform leveraging AI/ML to accelerate discovery?  

MB: Being based in Montreal already gives us a unique advantage thanks to its globally recognized AI ecosystem. However, from my experience in this community, I’ve seen a clear gap between AI method developers and health researchers who generate the data that those methods rely on.

At HeDS, we address this gap through a Benchmark–Deploy–Adapt strategy. Rather than replacing AI methodologists, we focus on rigorously evaluating existing tools and selecting the methods best suited for real-world health data. Our priorities are robustness (including generalization and zero-shot performance) and interpretability. We also collaborate closely with AI experts to ensure models are biologically informed from the start, helping bridge the gap between method development and practical health applications.    

In what ways does the HeDS platform complement C3G’s existing tools and services for researchers?

MB: C3G is a bioinformatics platform that develops unique turnkey analyses, such as those available through GenPipes, and provides bioinformatics services mainly to the genomics community. At HeDS, we are focusing on broader data science projects, with expertise extending beyond bioinformatics to include biostatistics, AI integration, data management, and more.

Our platform also targets a diverse community of researchers, from pure biologists to chemists and clinicians. We are closely linked with C3G and other specialized McGill platforms. Therefore, if a researcher's project falls within the expertise of C3G or another platform and does not involve the broader data science component, we direct them to these specialized platforms for more targeted support.

How has HeDS’s recent funding from the Gates Foundation influenced the platform’s direction, scale, or scientific ambitions?

MB: This recent funding has prompted us to expand our vision beyond operating solely as a campus or national platform to becoming a global capacity-building hub within PATH’s RNA Cooperative. Specifically, it has sped up and reshaped our plans to ensure that our tools and development efforts do not become too tailored to our own community of researchers but can be easily and quickly transferred to our international partners.

One of the main challenges is adjusting to the different internal infrastructure and processes of our partners in low- and middle-income countries (LMICs). This means providing targeted tools that do not require extensive computing infrastructure, rather than a single turnkey solution that might not work effectively in the partner environment.

How can trainees or young investigators engage with the HeDS platform, and what opportunities does it open for them?

MB: We’ve designed HeDS to be accessible for trainees and researchers. We have set up an online booking system that allows researchers or trainees to request a free consultation with a member of the HeDS team. The consultations are mainly for quick guidance, problem-solving, or high-level questions. 

Another simple way to contact us is by emailing info@hedscenter.ca. Additionally, we regularly offer workshops, training sessions, and community events where everyone is welcome and encouraged to join and talk with us!

Alternatively, you can book a consultation meeting with experts from the HeDS team through this link. 

Research Spotlight: Optimal Sequencing Strategies for Human Genome Variant Detection

Designing a sequencing project? A recent Genome Biology study offers a practical framework for choosing the right technologies based on your budget and research objectives.

Accurately identifying every genetic variation in the human genome is essential for both research and clinical applications. This Genome Biology study brings together expertise from the Canadian Centre for Computational Genomics (C3G) (Robert Eveleigh, Jose Hector Galvez, Mathieu Bourgey, and Guillaume Bourque) and the Advanced Genomic Technologies Laboratory (Sarah Reiling and Jiannis Ragoussis) to benchmark state-of-the-art sequencing platforms and variant detection approaches across both small and large variant classes.

Methodology

Using the Genome in a Bottle (GIAB) HG002 reference sample, the team systematically compared short-read (SRS; Illumina, MGI) and long-read (LRS; PacBio Sequel/Revio, ONT R9/R10) technologies against Telomere-to-Telomere (T2T) and Challenging Medically Relevant Genes (CMRG) benchmarks, spanning a range of sequencing depths, genomic contexts, and bioinformatic pipelines.
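Benchmark comparisons of this kind are commonly summarized with precision, recall, and F1 against a truth set. The text here does not say which metrics the study reports, so the following is only a minimal sketch of the general idea, assuming exact matching on a (chromosome, position, ref, alt) key. The function name `score_calls` and the toy variants are hypothetical and are not taken from the study.

```python
from typing import NamedTuple, Set, Tuple

# A variant is keyed by (chromosome, position, reference allele, alternate allele).
# Exact-key matching is a simplifying assumption for this sketch.
Variant = Tuple[str, int, str, str]


class Metrics(NamedTuple):
    precision: float
    recall: float
    f1: float


def score_calls(truth: Set[Variant], calls: Set[Variant]) -> Metrics:
    """Compare a call set against a truth set by exact variant match."""
    tp = len(truth & calls)   # called and present in the truth set
    fp = len(calls - truth)   # called but absent from the truth set
    fn = len(truth - calls)   # in the truth set but not called
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return Metrics(precision, recall, f1)


# Toy data, invented purely for illustration.
truth = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"), ("chr2", 50, "G", "GA")}
calls = {("chr1", 100, "A", "G"), ("chr2", 50, "G", "GA"), ("chr3", 10, "T", "C")}

metrics = score_calls(truth, calls)
print(f"precision={metrics.precision:.3f} recall={metrics.recall:.3f} f1={metrics.f1:.3f}")
```

Real benchmarking workflows also reconcile different representations of the same variant and restrict scoring to confident regions, which this exact-match sketch ignores.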

Findings

Their findings confirm that platform and workflow selection should be driven by research objectives. SRS excels at identifying small variants in well-mapped regions, while LRS, particularly PacBio Revio, delivers superior accuracy for structural variants and for small variants in complex or repetitive regions, reaching accuracy saturation at markedly lower sequencing depths (20–45×) than SRS (>60×).

While SRS remains a practical choice for high-throughput genotyping, LRS provides the resolution required for clinical applications at challenging loci.

Learn More

To stay current with the latest algorithms, technologies, and benchmarking practices, C3G also maintains a live dashboard!

Read the full article

C3G SNV Evaluation Dashboard

Gene fusion meta-calling with MetaFusion

The Toronto Node of C3G has developed MetaFusion, a flexible meta-calling tool that amalgamates outputs from any number of fusion callers. Designed to overcome inconsistencies among frequently used fusion callers, MetaFusion is among the first ensemble fusion calling tools currently available.
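To give a rough sense of what meta-calling means, here is a minimal sketch of one possible ensemble rule: keep a fusion only if a minimum number of callers report the same gene pair. This is not MetaFusion's actual algorithm or interface. The function `meta_call`, the caller names, the `min_callers` threshold, and the order-insensitive matching of gene pairs are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Set, Tuple

# A fusion call is represented as a pair of gene symbols.
Fusion = Tuple[str, str]


def meta_call(
    calls_by_caller: Dict[str, List[Fusion]],
    min_callers: int = 2,
) -> Dict[FrozenSet[str], Set[str]]:
    """Return gene pairs reported by at least `min_callers` distinct callers.

    Gene pairs are matched regardless of order (an assumption of this sketch).
    """
    support: Dict[FrozenSet[str], Set[str]] = defaultdict(set)
    for caller, fusions in calls_by_caller.items():
        for gene_a, gene_b in fusions:
            # A set of caller names avoids double-counting repeated calls.
            support[frozenset((gene_a, gene_b))].add(caller)
    return {
        pair: callers
        for pair, callers in support.items()
        if len(callers) >= min_callers
    }


# Hypothetical caller outputs, invented for illustration.
example_calls = {
    "caller_a": [("BCR", "ABL1"), ("EML4", "ALK")],
    "caller_b": [("ABL1", "BCR"), ("TMPRSS2", "ERG")],
    "caller_c": [("BCR", "ABL1"), ("EML4", "ALK"), ("EML4", "ALK")],
}

consensus = meta_call(example_calls, min_callers=2)
for pair, callers in consensus.items():
    print(sorted(pair), "supported by", sorted(callers))
```

A simple vote like this trades sensitivity for specificity: raising `min_callers` removes more caller-specific false positives but also drops true fusions that only one tool detects.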

Continue reading