
The *AMARETTO framework: multiscale and multimodal inference of regulatory networks to identify cell circuits and their drivers shared and distinct across biological systems of human disease


The *AMARETTO framework in GenePattern Notebook

The *AMARETTO framework in GenePattern Notebook provides users with a complete analysis pipeline that enables running AMARETTO on one or multiple data cohorts and connecting them using Community-AMARETTO via GenePattern and GenomeSpace.

Mohsen Nabian#, Celine Everaert#, Jayendra Shinde#, Shaimaa Bakr#, Ted Liefeld#, Mikel Hernaez, Thomas Baumert, Michael Reich, Jill Mesirov*, Vincent Carey*, Olivier Gevaert*, Nathalie Pochet*
Preview: https://notebook.genepattern.org/services/sharing/notebooks/334/preview/
https://notebook.genepattern.org/user/amaretto-team/notebooks/The *AMARETTO framework in GenePattern Notebook.ipynb
(** Please note that you should log in to https://notebook.genepattern.org with your own GenePattern user account, and then search for "The *AMARETTO framework in GenePattern Notebook" in the publicly available Notebooks. This Notebook runs directly on the GenePattern Amazon Cloud servers.)

The *AMARETTO framework in R Jupyter Notebook

The *AMARETTO framework in R Jupyter Notebook provides users with a complete analysis pipeline that enables running AMARETTO on one or multiple data cohorts and connecting them using Community-AMARETTO via GitHub and Bioconductor.

Mohsen Nabian#, Jayendra Shinde#, Celine Everaert#, Shaimaa Bakr#, Ted Liefeld, Thorin Tabor, Charles Blatti, Thomas Baumert, Michael Reich, Jill Mesirov, Mikel Hernaez*, Vincent Carey*, Olivier Gevaert*, Nathalie Pochet*
https://colab.research.google.com/drive/1JfnRoNgTVX_7VEGAAmjGjwP_yX2tdDxs
(** Please note that you can run "The *AMARETTO framework in R via GitHub and Bioconductor" directly on Google Colaboratory or your own servers.)

NIH NCI ITCR meeting, Salt Lake City, 2019: Abstract Poster Slides

R/BioC Meetup at Dana-Farber Cancer Institute / Harvard Medical School, Boston, 2019: Slides


Abstract

Computational inference of regulatory networks underlying complex human diseases is one of the fundamental goals of systems biology and has shown great promise for deciphering the regulatory cell circuits driving complex disease biology, including cancer.

The availability of increasing volumes of multimodal data (i.e., from multi-omics - genetic, epigenetic, transcriptomic and proteomic - to imaging - non-invasive and histopathology - and to clinical data) across multiscale systems (i.e., from model systems to patient studies, and from in vitro to in vivo systems) promises to improve our understanding of the regulatory mechanisms underlying complex human diseases. The main challenges are to integrate the multiple levels of multimodal data and to translate them across multiscale biological systems to decipher the underpinnings of human disease.

Here we present the *AMARETTO framework, which addresses some of the remaining challenges in this field by formulating novel approaches to multimodal and multiscale inference of regulatory networks. Specifically, the AMARETTO algorithm reformulates regulatory network inference to fuse multiple data modalities within one biological system, and the Community-AMARETTO algorithm adds a multiscale step that learns how regulatory networks are shared and distinct across biological systems.

The *AMARETTO framework is available as user-friendly tools within GitHub, Bioconductor, GenePattern, GenomeSpace and GenePattern Notebook.

The *AMARETTO framework

We recently developed the *AMARETTO framework as a tool for multiscale and multimodal inference of regulatory networks to identify cell circuits and their drivers that are shared and distinct across biological systems of human disease. Our framework learns how regulatory networks - cell circuits and their drivers - are shared and distinct across biological systems with a broad range of applications, from diagnostic and prognostic disease subtyping, to driver discovery, and to drug discovery in studies of human disease, including cancer.

The *AMARETTO framework is formulated by means of two algorithmic components:

(1) The AMARETTO algorithm that facilitates multimodal inference of regulatory networks within one biological system via multimodal data fusion (i.e., from multi-omics - genetic, epigenetic, transcriptomic, proteomic - to imaging - non-invasive and histopathology - and to clinical data).

(2) The Community-AMARETTO algorithm that enables multiscale inference to learn how these regulatory networks are shared and distinct across biological systems and diseases via multiscale modeling across systems (i.e., from model systems to patient studies, and from in vitro to in vivo systems).
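
Community-AMARETTO connects networks across systems using an edge betweenness community detection algorithm. The sketch below illustrates that general technique (repeatedly removing the edge with the highest betweenness until the graph splits) in plain Python. It is not the Community-AMARETTO implementation. The module names, the meaning of an edge (assumed here to be a significant overlap between two modules from different cohorts) and the stopping rule (a fixed number of communities) are all assumptions made for illustration.

```python
from collections import defaultdict, deque


def edge_betweenness(adj):
    """Brandes-style edge betweenness for an unweighted, undirected graph."""
    bet = defaultdict(float)
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        order = []
        preds = {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0.0)
        dist = dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1.0, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, crediting each edge on a shortest path.
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                credit = sigma[v] / sigma[w] * (1.0 + delta[w])
                bet[frozenset((v, w))] += credit
                delta[v] += credit
    return bet  # every node pair is counted from both ends; the ranking is unaffected


def components(adj):
    """Connected components of the graph as a list of node sets."""
    seen, comps = set(), []
    for s in adj:
        if s in seen:
            continue
        comp, stack = set(), [s]
        while stack:
            v = stack.pop()
            if v not in comp:
                comp.add(v)
                stack.extend(adj[v] - comp)
        seen |= comp
        comps.append(comp)
    return comps


def split_communities(edges, n_communities):
    """Remove highest-betweenness edges until n_communities components exist."""
    adj = defaultdict(set)
    for a, b in edges:  # assumes no self-loops
        adj[a].add(b)
        adj[b].add(a)
    adj = dict(adj)
    while len(components(adj)) < n_communities:
        bet = edge_betweenness(adj)
        if not bet:  # no edges left to remove
            break
        a, b = max(bet, key=bet.get)
        adj[a].discard(b)
        adj[b].discard(a)
    return components(adj)


# Hypothetical modules from three cohorts; an edge stands for an assumed
# significant overlap between two modules.
edges = [
    ("TCGA_1", "CCLE_1"), ("CCLE_1", "HCV_1"), ("HCV_1", "TCGA_1"),
    ("TCGA_2", "CCLE_2"), ("CCLE_2", "HCV_2"), ("HCV_2", "TCGA_2"),
    ("HCV_1", "TCGA_2"),  # a single weak link between the two groups
]
communities = split_communities(edges, n_communities=2)
print([sorted(c) for c in communities])
```

In this toy graph the single link between the two triangles carries every shortest path between the groups, so it is removed first and each triangle becomes one community. A community whose modules come from all cohorts would correspond to a shared cell circuit, and one confined to a single cohort to a distinct one.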

Current *AMARETTO and downstream analytic functionalities

First, AMARETTO infers regulatory networks within each biological system via multi-omics data fusion. Specifically, AMARETTO nominates potential cancer drivers as genes whose genetic and epigenetic cancer aberrations have a direct functional impact on their own transcriptomic or proteomic expression. These (epi)genetic drivers can be augmented, intersected or replaced with predefined candidate drivers with known regulatory function (e.g., transcription factors from TFutils). AMARETTO then learns a penalized regulatory program that connects these drivers to modules of co-expressed target genes that they putatively control; each module together with its drivers is defined as a regulatory module or cell circuit. Next, Community-AMARETTO learns communities or subnetworks by connecting regulatory networks inferred from different systems using an edge betweenness community detection algorithm to identify cell circuits and drivers that are shared and distinct across biological systems and diseases.
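
To make the idea of a penalized regulatory program concrete, the sketch below fits the mean expression of one module as a sparse linear combination of candidate drivers. It assumes an L1 (lasso) penalty solved by cyclic coordinate descent. The text above only says "penalized", so the penalty, the solver, the driver names and the toy data are all assumptions for illustration rather than a description of the AMARETTO code.

```python
def soft_threshold(x, lam):
    """Shrink x towards zero by lam; values within [-lam, lam] become exactly 0."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0


def lasso_program(drivers, module_mean, lam, n_iter=200):
    """Sparse regulatory program for one module (illustrative assumption).

    drivers:     dict mapping driver name -> expression values across samples
                 (assumed centered, non-constant)
    module_mean: mean expression of the module's target genes across samples
    Minimises (1 / 2n) * ||y - Xw||^2 + lam * ||w||_1 by coordinate descent.
    """
    n = len(module_mean)
    weights = dict.fromkeys(drivers, 0.0)
    residual = list(module_mean)
    for _ in range(n_iter):
        for gene, x in drivers.items():
            z = sum(xi * xi for xi in x) / n
            if z == 0.0:
                continue  # a constant driver carries no information
            # Covariance of this driver with the residual, adding back its own term.
            rho = sum(xi * (ri + weights[gene] * xi) for xi, ri in zip(x, residual)) / n
            new_w = soft_threshold(rho, lam) / z
            diff = new_w - weights[gene]
            if diff != 0.0:
                residual = [ri - diff * xi for ri, xi in zip(residual, x)]
                weights[gene] = new_w
    # Keep only drivers with a non-zero weight: these form the program.
    return {gene: w for gene, w in weights.items() if w != 0.0}


# Hypothetical toy data: four samples, three candidate drivers.
drivers = {
    "TF_A": [1.0, 1.0, -1.0, -1.0],
    "TF_B": [1.0, -1.0, 1.0, -1.0],
    "TF_C": [1.0, -1.0, -1.0, 1.0],
}
# Module mean built as 2*TF_A - 1*TF_B + 0.05*TF_C.
module_mean = [1.05, 2.95, -3.05, -0.95]

program = lasso_program(drivers, module_mean, lam=0.1)
print(program)
```

With these orthogonal toy drivers, the penalty shrinks the two strong weights by `lam` and sets the weak contribution of `TF_C` to exactly zero. That is the behaviour a sparse program needs: only a small set of drivers is retained for each module, with a sign indicating putative activation or repression.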

The *AMARETTO framework additionally offers tools for downstream analyses at both the module and community levels:

- functional annotation of modules and communities (e.g., using known functional categories from MSigDB);
- stratification of modules and communities for increasingly specific phenotypes (e.g., patient characteristics such as survival, molecular subclasses, known (epi)genetic cancer aberrations, or features derived from non-invasive or histopathology imaging, as well as in-depth studies of etiologies of cancer via spatiotemporal, i.e., time course and single-cell, studies in model systems);
- validation of predicted drivers (e.g., using genetic perturbation studies in model systems, such as knockdown or overexpression experiments of driver genes);
- discovery of drugs targeting drivers and their predicted target genes (e.g., using chemical perturbation studies in model systems);
- systematic assessment and benchmarking of the generalized prediction performance of the (sub)networks.
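
Functional annotation of a module against a known gene set is commonly done with an over-representation test. The sketch below uses a one-sided hypergeometric test, which is an assumption about how such an annotation could be computed and not a statement of what the *AMARETTO tools run internally. The gene identifiers are placeholders, and multiple-testing correction across gene sets is omitted.

```python
from math import comb


def enrichment_pvalue(module_genes, gene_set, background):
    """One-sided hypergeometric test for over-representation.

    Returns (overlap, p) where p = P(X >= overlap) when drawing
    len(module) genes without replacement from the background.
    """
    background = set(background)
    module = set(module_genes) & background
    gset = set(gene_set) & background
    n_bg, n_set, n_mod = len(background), len(gset), len(module)
    overlap = len(module & gset)
    total = comb(n_bg, n_mod)
    tail = sum(
        comb(n_set, i) * comb(n_bg - n_set, n_mod - i)
        for i in range(overlap, min(n_set, n_mod) + 1)
    )
    return overlap, tail / total


# Hypothetical identifiers: 20 background genes, a 5-gene functional
# category, and a 5-gene module sharing 4 genes with that category.
background = [f"g{i}" for i in range(20)]
category = ["g0", "g1", "g2", "g3", "g4"]
module = ["g0", "g1", "g2", "g3", "g10"]

overlap, p_value = enrichment_pvalue(module, category, background)
print(overlap, p_value)
```

In practice the same test would be repeated for every module (or community) against every functional category, followed by a correction for multiple testing before any category is reported as enriched.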

Beyond our recent applications to studies of cancer, the *AMARETTO software tools are more generally applicable to studies of human disease, including infectious, neurologic and immune-mediated diseases.

Resources

The source code of the *AMARETTO framework and future developments are available via GitHub and disseminated as user-friendly software tools via Bioconductor to enable further algorithm and software development and via GenePattern, GenomeSpace and GenePattern Notebook to reach a broad audience of biomedical researchers.

*AMARETTO in GitHub

- AMARETTO in GitHub: https://github.com/gevaertlab/AMARETTO
- Community-AMARETTO in GitHub: https://github.com/broadinstitute/CommunityAMARETTO

*AMARETTO in Bioconductor

- AMARETTO in Bioconductor: https://bioconductor.org/packages/release/bioc/html/AMARETTO.html
- Community-AMARETTO in Bioconductor: in preparation for submission

*AMARETTO in GenePattern

- AMARETTO in GenePattern: https://cloud.genepattern.org/gp/pages/index.jsf?lsid=urn:lsid:broad.mit.edu:cancer.software.genepattern.module.analysis:00378
- Community-AMARETTO in GenePattern: https://cloud.genepattern.org/gp/pages/index.jsf?lsid=urn:lsid:broad.mit.edu:cancer.software.genepattern.module.analysis:00381

*AMARETTO in GenomeSpace

The AMARETTO and Community-AMARETTO modules in GenePattern are also available within GenomeSpace: http://www.genomespace.org/

*AMARETTO in GenePattern Notebook and R Jupyter Notebook

The *AMARETTO framework in GenePattern Notebook and in R Jupyter Notebook provides users with a complete analysis pipeline that enables running AMARETTO on one or multiple data cohorts and connecting them using Community-AMARETTO. Each AMARETTO and Community-AMARETTO analysis generates a detailed report of genome-wide networks inferred from one cohort and/or shared/distinct across multiple cohorts. These reports include queryable tables and visualizations (heatmaps and network graphs) of shared/distinct cell circuits and their drivers, as well as their functional and phenotypic characterizations.

The GenePattern Notebook runs the *AMARETTO framework directly on the GenePattern Amazon Cloud servers.
https://notebook.genepattern.org/services/sharing/notebooks/334/preview/
https://notebook.genepattern.org/user/amaretto-team/notebooks/The *AMARETTO framework in GenePattern Notebook.ipynb

The R Jupyter Notebook runs the *AMARETTO framework via GitHub or Bioconductor on Google Colaboratory or your own servers.
https://colab.research.google.com/drive/1JfnRoNgTVX_7VEGAAmjGjwP_yX2tdDxs

*AMARETTO example reports

Studying hepatitis C & B virus-induced hepatocellular carcinoma using *AMARETTO:
- An example report that learns regulatory networks from multi-omics - genetic, epigenetic and functional genomics - data of the hepatocellular carcinoma patient cohort from TCGA and integrates them with regulatory networks learned from functional genomics data of liver cancer cell lines from CCLE: Community-AMARETTO Report Liver 2 data sets and also available from NDEx: NDEx Community-AMARETTO Network Liver 2 data sets
- An example report that integrates regulatory networks derived from >6 liver data sources (multi-omics hepatocellular carcinoma patient data from TCGA, ~25 liver cell line models from CCLE, time course hepatitis C virus infection data in Huh7 models, time course hepatitis B virus infection data in HepG2 models, single-cell hepatitis C virus infection data in Huh7 models, single-cell hepatitis B virus infection data in HepG2 models, further augmented with previously published prognostic network models that were derived from hepatocellular carcinoma patient data): Community-AMARETTO Report Liver 6 data sets and also available from NDEx: NDEx Community-AMARETTO Network Liver 6 data sets
- An example of ongoing work on developing gene-level ontology network representations from *AMARETTO modules and communities: Shiny App

Multi-omics & imaging data fusion for glioblastoma multiforme and low grade gliomas using *AMARETTO:
- An example report that integrates imaging data into the multi-omics regulatory networks for glioblastoma multiforme and low grade gliomas based on multi-omics and non-invasive imaging data from TCGA/TCIA (that we will later connect with networks learned from integrating RNA-Seq refined for anatomic structures and stem cells with histopathology imaging data from IvyGAP and that we will subsequently further refine based on single-cell RNA-Seq studies): Community-AMARETTO Report Brain 2 data sets and also available from NDEx: NDEx Community-AMARETTO Network Brain 2 data sets

Questions?

For any questions about the *AMARETTO framework, please contact Nathalie Pochet (npochet@broadinstitute.org) and Olivier Gevaert (ogevaert@stanford.edu). See also gevaertlab.stanford.edu and http://med.stanford.edu/gevaertlab/software.html.

Funding

This work was supported by grants from NIH NCI ITCR R21 CA209940 (Pochet), NIH NCI ITCR U01 CA214846 Collaborative Supplement (Carey/Pochet) and NIH NIAID R03 AI131066 (Pochet).