Our goal is to unravel the cellular circuits that drive and encode the molecular basis, function, and regulation of complex biological systems, including human disease. Our work illustrates how integrating multiple genomic data sources with studies in model microbial systems led to new approaches that both address fundamental biological questions and catalyze studies in more complex organisms. For example, our analysis of large amounts of publicly available data from the model yeast Saccharomyces cerevisiae led to groundbreaking discoveries in the fundamentals of evolution, in the rare Mendelian kidney disease MCKD1, and in the human malaria parasite Plasmodium falciparum. Taking advantage of recent advances in probing and manipulating cellular circuits on a genomic scale, we use computational strategies to analyze and integrate genomic data, thereby developing powerful toolboxes for application in human disease.

Selected contributions to the development and dissemination of algorithms and software tools, and how they have helped advance biomedical research and overcome its challenges toward better understanding, diagnosis, and treatment of human disease:

  2. Publication: Champion M, Brennan K, Croonenborghs T, Gentles AJ, Pochet N, Gevaert O (2018) Module Analysis Captures Genetically and Epigenetically Deregulated Cancer Driver Genes for Smoking and Antiviral Response. bioRxiv 216754. EBioMedicine, 27:156-166.

    Funding: Supported by NIH NCI ITCR R21 CA209940 (Pochet), NIH NIAID R03 AI131066 (Pochet), and NIH NCI ITCR U01 CA214846 collaborative supplement (Carey/Pochet)

    Summary: The availability of increasing volumes of multi-omics profiles, from model systems to patient studies across many cancers, promises to improve our understanding of the regulatory mechanisms underlying cancer. The main challenges are to integrate these multiple levels of omics data and to translate them across in vitro and in vivo systems. We developed the AMARETTO and Community-AMARETTO frameworks, which learn regulatory networks across biological systems and have a broad range of applications, from diagnostic subtyping to driver and drug discovery in cancer. First, AMARETTO infers regulatory networks within each biological system via multi-omics data fusion. Specifically, AMARETTO identifies potential cancer drivers as genes whose genetic and epigenetic aberrations have a direct functional impact on their own transcriptomic or proteomic expression. AMARETTO then connects these drivers in a regulatory program with modules of co-expressed target genes that they putatively control, defined as regulatory modules or cell circuits. Second, Community-AMARETTO learns communities or subnetworks by connecting the regulatory networks and modules inferred from different systems, using an edge betweenness community detection algorithm to identify drivers shared across diseases or biological systems. Downstream analytic functionalities of AMARETTO include functional annotation of modules, stratification of modules for increasingly specific phenotypes, and ongoing work on automated driver and drug discovery using genetic and chemical perturbations in model systems. AMARETTO also offers tools for systematic assessment and benchmarking of the inferred regulatory networks, so that the models can be tuned for optimal generalization performance. We recently released the source code in R, and we are currently integrating AMARETTO into the Bioconductor, Cloud-scale Bioconductor, GenePattern, GenePattern Notebook, and GenomeSpace platforms.

    See The *AMARETTO framework
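The community-detection step described in the AMARETTO summary can be illustrated with a minimal sketch of edge betweenness community detection in the Girvan-Newman style. This is not the Community-AMARETTO implementation (which is released in R): the module names, the toy edges, the function names, and the stopping rule (a requested number of communities) are all assumptions made for illustration, and how links between modules are actually derived is not covered here.

```python
from collections import defaultdict, deque


def edge_betweenness(adj):
    """Brandes-style edge betweenness for an unweighted, undirected graph."""
    score = defaultdict(float)
    for source in sorted(adj):
        # Breadth-first search that counts shortest paths from the source.
        dist = {source: 0}
        n_paths = defaultdict(float, {source: 1.0})
        preds = defaultdict(list)
        order = []
        queue = deque([source])
        while queue:
            node = queue.popleft()
            order.append(node)
            for nbr in sorted(adj[node]):
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
                if dist[nbr] == dist[node] + 1:
                    n_paths[nbr] += n_paths[node]
                    preds[nbr].append(node)
        # Walk back from the farthest nodes, crediting each edge on a shortest path.
        dep = defaultdict(float)
        for node in reversed(order):
            for pred in preds[node]:
                share = n_paths[pred] / n_paths[node] * (1.0 + dep[node])
                score[frozenset((pred, node))] += share
                dep[pred] += share
    return score  # every undirected edge is counted twice; the ranking is unaffected


def connected_components(adj):
    """Return the connected components of the graph as a list of node sets."""
    seen, groups = set(), []
    for start in sorted(adj):
        if start in seen:
            continue
        group, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node not in group:
                group.add(node)
                stack.extend(adj[node] - group)
        seen |= group
        groups.append(group)
    return groups


def detect_communities(edges, n_communities):
    """Remove the highest-betweenness edge until n_communities components exist."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    groups = connected_components(adj)
    while len(groups) < n_communities:
        score = edge_betweenness(adj)
        if not score:
            break  # no edges left to remove
        # Ties are broken by edge name so the result is deterministic.
        a, b = max(score, key=lambda e: (round(score[e], 9), sorted(e)))
        adj[a].discard(b)
        adj[b].discard(a)
        groups = connected_components(adj)
    return groups


# Hypothetical modules from two systems (tumors and cell lines), linked by toy edges.
module_edges = [
    ("tumor_M1", "tumor_M2"), ("tumor_M1", "cellline_M1"), ("tumor_M2", "cellline_M1"),
    ("tumor_M7", "cellline_M4"), ("tumor_M7", "cellline_M9"), ("cellline_M4", "cellline_M9"),
    ("cellline_M1", "tumor_M7"),  # single bridge between the two groups
]
communities = detect_communities(module_edges, n_communities=2)
```

In this toy graph the bridge edge lies on every shortest path between the two triangles, so it is removed first and each triangle becomes one community that mixes modules from both systems.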

  3. GenomeSpace
  4. Publication: Qu K, Garamszegi S, Wu F, Thorvaldsdóttir H, Liefeld T, Ocana M, Borges-Rivera D, Pochet N, Robinson JT, Demchak B, Hull T, Ben-Artzi G, Blankenberg D, Barber GP, Lee BT, Kuhn RM, Nekrutenko A, Segal E, Ideker T, Reich M, Regev A, Chang HY, Mesirov JP (2016) Integrative genomic analysis by interoperation of bioinformatics tools in GenomeSpace. Nature Methods, 13(3):245-247.

    Summary: Complex biomedical analyses require the use of multiple software tools in concert and remain challenging for much of the biomedical research community. We introduce GenomeSpace, a cloud-based, cooperative community resource that currently supports the streamlined interaction of 20 bioinformatics tools and data resources. To facilitate integrative analysis by non-programmers, it offers a growing set of 'recipes', short workflows to guide investigators through high-utility analysis tasks.

    Reich M, Liefeld T, Ocana M, Jang D, Bistline J, Robinson J, Carr P, Hill B, McLaughlin J, Pochet N, Borges-Rivera D, Tabor T, Thorvaldsdóttir H, Regev A, Mesirov JP (2013) GenomeSpace: an environment for frictionless bioinformatics. F1000Research, Poster 2013, 4:804.

  5. Trinity and Trinity-CTAT (Cancer Transcriptome Analysis Toolkit)
  6. Publication: Haas BJ, Papanicolaou A, Yassour M, Grabherr M, Blood PD, Bowden J, Couger MB, Eccles D, Li B, Lieber M, MacManes MD, Ott M, Orvis J, Pochet N, Strozzi F, Weeks N, Westerman R, William T, Dewey CN, Henschel R, LeDuc RD, Friedman N, Regev A (2013) De novo transcript sequence reconstruction from RNA-Seq using the Trinity platform for reference generation and analysis. Nature Protocols, 8(8):1494-1512.

    Funding: Supported by NIH NCI ITCR U24 CA180922 (Regev)

    Summary: De novo assembly of RNA-seq data enables researchers to study transcriptomes without the need for a genome sequence; this approach can be usefully applied, for instance, in research on 'non-model organisms' of ecological and evolutionary importance, cancer samples or the microbiome. We describe the use of the Trinity platform for genome-independent transcriptome assembly from RNA-seq data in non-model organisms, as well as downstream applications, including transcript abundance estimation, identifying differentially expressed transcripts across samples and approaches to identify protein-coding genes. The Trinity Cancer Transcriptome Analysis Toolkit (CTAT) aims to provide tools for leveraging RNA-Seq to gain insights into the biology of cancer transcriptomes. Bioinformatics tool support is provided for mutation detection, fusion transcript identification, de novo transcript assembly of cancer-specific transcripts, lincRNA classification, and foreign transcript detection (viruses, microbes).

    Best Performer in the DREAM6 challenge on ‘Alternative Splicing Prediction’ with Team Trinity (Manfred Grabherr, Brian Haas, Moran Yassour, Michael Ott, Nathalie Pochet, Nir Friedman and Aviv Regev)
    Haas B, Dobin A, Stransky N, Li B, Yang X, Tickle T, Bankapur A, Ganote C, Doak T, Pochet N, Sun J, Wu C, Gingeras T, Regev A (2017) STAR-Fusion: Fast and Accurate Fusion Transcript Detection from RNA-Seq. bioRxiv 120295.
    Tickle T, Bankapur A, Ganote C, Fulton B, Tirosh I, Chen J, Doak T, Henschel R, Pochet N, Wu C, Haas B, Regev A (2016) Trinity CTAT: a community resource for de novo and reference-based RNA-Seq analysis. F1000Research, Poster 2016, 5:1844.

  7. SERV (Sequence-Based Estimation of Repeat Variability)
  8. Publication: Legendre M*, Pochet N*, Pak T, Verstrepen KJ (2007) Sequence-based estimation of minisatellite and microsatellite repeat variability. Genome Research, 17(12):1787-1796. *contributed equally.

    Summary: Variation in some repeat sequences underlies rapidly evolving traits or certain diseases. We developed a nonlinear model SERV that predicts the variability of a broad range of tandem repeats in a wide range of organisms. SERV outperforms existing models and accurately predicts repeat variability in bacteria and eukaryotes, including plants and humans. SERV allows identification of known and candidate genes involved in repeat-based diseases.

    http://www.igs.cnrs-mrs.fr/SERV/ (previously at: http://hulsweb1.cgr.harvard.edu/SERV/)
    Fine mapping the causal genetic variant for the rare dominant Mendelian kidney disease MCKD1 using SERV: Kirby A, Gnirke A, Jaffe DB, Barešová V, Pochet N, Blumenstiel B, Ye C, Aird D, Stevens C, Robinson JT, Cabili MN, Gat-Viks I, Kelliher E, Daza R, DeFelice M, Hůlková H, Sovová J, Vylet'al P, Antignac C, Guttman M, Handsaker RE, Perrin D, Steelman S, Sigurdsson S, Scheinman SJ, Sougnez C, Cibulskis K, Parkin M, Green T, Rossin E, Zody MC, Xavier RJ, Pollak MR, Alper SL, Lindblad-Toh K, Gabriel S, Hart PS, Regev A, Nusbaum C, Kmoch S, Bleyer AJ, Lander ES, Daly MJ (2013) Mutations causing medullary cystic kidney disease type 1 lie in a large VNTR in MUC1 missed by massively parallel sequencing. Nature Genetics, 45(3):299-303.
    Discoveries in the fundamentals of evolution using SERV: Smukalla S*, Caldara M*, Pochet N*, Beauvais A, Guadagnini S, Yan C, Vinces MD, Jansen A, Prevost MC, Latgé JP, Fink GR, Foster KR, Verstrepen KJ (2008) FLO1 is a variable green beard gene that drives biofilm-like cooperation in budding yeast. Cell, 135(4):726-737. *contributed equally.

  9. M@CBETH (a MicroArray Classification BEnchmarking Tool on a Host server)
  10. Publication: Pochet N, Janssens FAL, De Smet F, Marchal K, Suykens JAK, De Moor BLR (2005) M@CBETH: a microarray classification benchmarking tool. Bioinformatics, 21(14):3185-3186.

    Summary: The M@CBETH web service offers the microarray community a simple tool for making optimal two-class predictions. M@CBETH selects the best-performing model among different classification methods by evaluating them on randomizations of the benchmarking dataset. The web service aims to promote the optimal use of classification on clinical microarray data.

    Tutorial: ftp://ftp.esat.kuleuven.be/sista/npochet/tutorial.pdf (Web service previously at: http://www.esat.kuleuven.be/MACBETH/)
    Assessing the role of non-linearity and dimensionality reduction in microarray data classification: Pochet N, De Smet F, Suykens JAK, De Moor BLR. (2004) Systematic benchmarking of microarray data classification: assessing the role of non-linearity and dimensionality reduction. Bioinformatics, 20(17):3185-3195.
    Conference contribution: Pochet N, Janssens FAL, De Smet F, Marchal K, Vergote IB, Suykens JAK, De Moor BLR (2005) M@CBETH: Optimizing clinical microarray classification. Proceedings of the 2005 IEEE Computational Systems Bioinformatics Conference (CSB2005, Stanford), 89-90.
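The benchmarking idea behind M@CBETH, comparing classification methods over randomizations of a dataset and keeping the best performer, can be sketched as follows. This is only an illustration under stated assumptions: the two classifiers (nearest centroid and one-nearest-neighbour) are simple stand-ins rather than the methods offered by the web service, the synthetic two-class "expression" data are invented, and all function names and parameters are hypothetical.

```python
import random
from statistics import mean


def sq_dist(u, v):
    """Squared Euclidean distance between two expression profiles."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def nearest_centroid(train, sample):
    """Predict the class whose mean profile is closest to the sample."""
    centroids = {}
    for label in sorted({y for _, y in train}):
        rows = [x for x, y in train if y == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return min(centroids, key=lambda c: sq_dist(centroids[c], sample))


def one_nearest_neighbour(train, sample):
    """Predict the class of the single closest training profile."""
    return min(train, key=lambda pair: sq_dist(pair[0], sample))[1]


def benchmark(data, classifiers, n_randomizations=20, test_fraction=0.3, seed=0):
    """Score each classifier on repeated random train/test splits of the data."""
    rng = random.Random(seed)
    accuracies = {name: [] for name in classifiers}
    n_test = max(1, int(len(data) * test_fraction))
    for _ in range(n_randomizations):
        shuffled = data[:]
        rng.shuffle(shuffled)
        test, train = shuffled[:n_test], shuffled[n_test:]
        for name, predict in classifiers.items():
            hits = sum(predict(train, x) == y for x, y in test)
            accuracies[name].append(hits / len(test))
    mean_accuracy = {name: mean(vals) for name, vals in accuracies.items()}
    best = max(mean_accuracy, key=mean_accuracy.get)
    return best, mean_accuracy


def make_toy_data(n_per_class=15, n_genes=5, shift=3.0, seed=1):
    """Synthetic two-class data: class 1 is shifted upward in every gene."""
    rng = random.Random(seed)
    data = []
    for label in (0, 1):
        for _ in range(n_per_class):
            profile = [rng.gauss(label * shift, 1.0) for _ in range(n_genes)]
            data.append((profile, label))
    return data


classifiers = {
    "nearest_centroid": nearest_centroid,
    "one_nearest_neighbour": one_nearest_neighbour,
}
best, mean_accuracy = benchmark(make_toy_data(), classifiers)
```

Averaging accuracy over many random splits, rather than relying on a single split, reduces the chance that a method is selected because of one favorable partition of a small clinical dataset.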