ReporTree: a surveillance-oriented tool to strengthen the linkage between pathogen genetic clusters and epidemiological data

Verónica Mixão, Miguel Pinto, Daniel Sobral, Adriano Di Pasquale, João Paulo Gomes, Vítor Borges

Research output: Contribution to journal › Article › peer-review

13 Citations (Scopus)

Abstract

Background: Genomics-informed pathogen surveillance strengthens public health decision-making, playing an important role in infectious diseases’ prevention and control. A pivotal outcome of genomics surveillance is the identification of pathogen genetic clusters and their characterization in terms of geotemporal spread or linkage to clinical and demographic data. This task often consists of the visual exploration of (large) phylogenetic trees and associated metadata, being time-consuming and difficult to reproduce.

Results: We developed ReporTree, a flexible bioinformatics pipeline that allows diving into the complexity of pathogen diversity to rapidly identify genetic clusters at any (or all) distance threshold(s) or cluster stability regions and to generate surveillance-oriented reports based on the available metadata, such as timespan, geography, or vaccination/clinical status. ReporTree is able to maintain cluster nomenclature in subsequent analyses and to generate a nomenclature code combining cluster information at different hierarchical levels, thus facilitating the active surveillance of clusters of interest. By handling several input formats and clustering methods, ReporTree is applicable to multiple pathogens, constituting a flexible resource that can be smoothly deployed in routine surveillance bioinformatics workflows with negligible computational and time costs. This is demonstrated through a comprehensive benchmarking of (i) the cg/wgMLST workflow with large datasets of four foodborne bacterial pathogens and (ii) the alignment-based SNP workflow with a large dataset of Mycobacterium tuberculosis. To further validate this tool, we reproduced a previous large-scale study on Neisseria gonorrhoeae, demonstrating how ReporTree is able to rapidly identify the main species genogroups and characterize them with key surveillance metadata, such as antibiotic resistance data. By providing examples for SARS-CoV-2 and the foodborne bacterial pathogen Listeria monocytogenes, we show how this tool is currently a useful asset in genomics-informed routine surveillance and outbreak detection of a wide variety of species.

Conclusions: In summary, ReporTree is a pan-pathogen tool for automated and reproducible identification and characterization of genetic clusters that contributes to a sustainable and efficient public health genomics-informed pathogen surveillance. ReporTree is implemented in Python 3.8 and is freely available at https://github.com/insapathogenomics/ReporTree.

Original language: English
Article number: 43
Journal: Genome Medicine
Volume: 15
Issue number: 1
DOIs
Publication status: Published - Dec 2023
Externally published: Yes

Bibliographical note

Publisher Copyright:
© 2023, The Author(s).

Funding

The authors thank Dr. Holger Brendebach, Dr. Carlus Deneke, and Dr. Simon Tausch from the German Federal Institute for Risk Assessment for their support during the genome assembly of the samples used in ReporTree benchmarking and Dr. João André Carriço for the productive discussions throughout ReporTree development. We would also like to thank the National Distributed Computing Infrastructure of Portugal (INCD) for providing the necessary resources to run the genome assemblies. INCD was funded by FCT and FEDER under the project 22153-01/SAICT/2016. This work was supported by funding from the European Union’s Horizon 2020 Research and Innovation program under grant agreement No 773830: One Health European Joint Programme (2020–2022) and by national funds through FCT—Foundation for Science and Technology, I.P., in the frame of Individual CEEC 2022.00851.CEECIND/CP1748/CT0001 (2023 onwards).

Funders (with funder number, where given):
INCD
National Distributed Computing Infrastructure of Portugal
Horizon 2020 Framework Programme: 773830
Fundação para a Ciência e a Tecnologia: 2022.00851, CEECIND/CP1748/CT0001
European Regional Development Fund: 22153-01/SAICT/2016
Bundesinstitut für Risikobewertung

    Keywords

    • Automated pipeline
    • Genetic clustering
    • Genomic surveillance
    • Public health
    • ReporTree
