Human Quantitative Dynamics Workshop

December 1 & 2, 2015

Bethesda MD

This is an abbreviated version of the conference grant application to NIGMS that was funded.

Overview: With the accumulation of vast genomic and epigenomic knowledge, there is a growing realization that we need a detailed and comprehensive quantitative understanding of the components of various cell types and tissues in terms of their concentrations and kinetic interactions. This detailed knowledge is necessary to connect genomic and epigenomic characteristics to biochemical and physiological functions, a prerequisite both for a deep, predictive understanding of our body in health and disease and for drug development for a wide range of complex diseases. This transformative knowledge will be directly relevant to the research missions of most NIH institutes and has the potential to revolutionize our basic understanding of the human body. We propose a set of 3-4 workshops to develop a community-based initial plan for obtaining a comprehensive catalog of the quantitative dynamics of cellular components in the human body. Here, we seek partial support for the first workshop. The long-term goal of the workshops is to evaluate the feasibility of a plan for a distributed systems biology project that would quantify all cellular components and map their molecular connectivity and the dynamic functions that give rise to the emergent physiological functions of major organs in the human body. This could be envisaged as a unique, comprehensive quantitative project rooted in biochemistry and cell physiology that would complement genetic/genomic molecular biology projects such as ENCODE and the 1000 Genomes Project, as well as the recently initiated Phase 2 projects of LINCS (Library of Integrated Network-Based Cellular Signatures).

A substantive difference between genomic technology-based projects such as ENCODE and GTEx and the one we seek to develop is the diversity of expertise in cell types and their physiological responses that will be required. This diversity will have to be complemented by a standardized approach so that the data obtained are valid and reproducible across different cell types. Consequently, it would be prudent to start by evaluating the need for such a project, the successes and failures of the organizational formats of other large-scale data-gathering projects, and how such a future project could be structured to maximize participation of the R01/R21 biochemistry/cell physiology community through a distributed grant funding model for technology development and data gathering. The approaches used by the BRAIN Initiative may provide an operational model. The deliverables from the first workshop will be 1) an assessment of the need for such a large-scale project; if there is consensus on the need, 2) an initial plan for a Human Quantitative Dynamics (HQD) project, which will experimentally catalog the concentrations of all proteins and other key cellular components, and the kinetic parameters that govern their interactions, in cell types from major organs of the human body; and 3) the contours of a pilot project to test the feasibility of the HQD project. This quantitative experimental cataloging of cellular reactions will be anchored in the human genome and transcriptome at one end and in human cell physiological responses at the other.

The HQD experimental data can be organized in a computable database to enable the construction of pathways and networks for graph theory-based analyses, and for building small or large well-constrained dynamical models for predictive simulations of physiological and pathophysiological behaviors. A successful HQD informatics approach will produce fully organized and functionally annotated lists of the biochemical properties of all cellular proteins (and other cellular components), their participation in biochemical reactions, and the ability of these reactions to produce emergent cellular physiology. This knowledge, when integrated with genomic and epigenomic information, will enable the development of computer-based algorithms for personalized and precision medicine. To evaluate the feasibility of an HQD project, we will bring together experimental, bioinformatics and computational experts, along with renowned cell biologists and physiologists with expertise in multiple cell types, tissues and organs, to evaluate the current status and chart a path forward.

ORGANIZATION: This three-day workshop will be widely advertised, free and open to all, held at the NIH, and complemented by web-based discussion. The workshop will focus on:

(a) Analyzing the strengths, weaknesses and cost/benefit ratios of recently completed large-scale projects.

(b) Evaluating the current status of knowledge, i.e., a gap analysis for developing a quantitative dynamic interactome with comprehensive coverage of proteins, lipids and other metabolites, covering the identity and concentrations of components and the kinetic parameters that govern their interactions, in human cells such as neurons, cardiac myocytes, and T and B cells, and in model organisms such as the worm.

(c) Developing mock-ups of pilot projects at the cellular level (human) and the whole-organism level (C. elegans).

(d) Assessing current biochemistry and cell physiology technologies and determining how these can be further developed for high-throughput use.

(e) Identifying the key metrics required to evaluate the pilot projects.

(f) Developing plans for a “Lessons Learnt” analysis of the successes and failures of the pilot project.

Deliverables: Within two months of the first workshop, we will produce two detailed reports: 1) the strengths and weaknesses of large-scale projects and funding models, and 2) the gap analysis, an initial work plan identifying the experimental and computational technologies that need to be developed for a successful HQD project, and the contours of a pilot project. These documents, which will be made freely available, could serve as the basis for additional workshops focused on technology development, on engaging the appropriate research communities, and on detailed work plans for the pilot projects if the community chooses to go forward.

The ability to diagnose and stage progressive complex diseases and to treat them with drugs requires a mechanistic understanding of the underlying physiology and pathophysiology. After a period of exponential growth in genomic knowledge, we may eventually reach a plateau where the qualitative cataloging of various genomic, epigenomic and postgenomic characteristics and their relationships provides some predictive capability but does not allow us to fully understand the mechanisms underlying the origins and progression of complex diseases such as cancers, heart failure, diabetes, chronic kidney disease and aneurysms, or to develop therapeutic strategies or drugs that can halt or reverse their course. Even in cancers, where progress in understanding cell regulatory pathways and in developing targeted therapeutics, including biologics, has been substantial (1), progress remains difficult: new drugs work incrementally, extending survival by months rather than years (2-4). One underlying cause of this slow progress is our lack of a global (genome-wide) mechanistic understanding of the quantitative reactions between proteins and other cellular components, in terms of their interactions and enzymatic reactions, and of how these contribute to normal cell physiology and pathophysiology in the context of all the other ongoing interactions. This gap arises from an incomplete cataloging of the quantitative characteristics of the interactions between gene products in the various cell types, and of how these interactions give rise to emergent physiological functions. The situation is akin to that of human genetics in the eighties and nineties, when many disease genes were being identified and sequenced but this knowledge could not be fully understood or used for lack of context. The sequencing and annotation of the human genome changed this in a fundamental manner.
Similarly, a comprehensive quantitative catalog of the dynamics of interactions of components within different human cells in the context of emergent physiological functions will greatly advance our knowledge of human diseases and enable new therapeutic strategies.
This workshop could be innovative for the following reasons: 1) evaluation of the pros and cons of large-scale projects and of new models of distributed funding; 2) development of a broad, bottom-up, transdisciplinary intellectual framework to evaluate how comprehensively mapping the concentrations of cellular components, their interactions and their enzymatic reactions in various cell types in the human body can be transformative in enabling the use of genomics and epigenomics in personalized and precision medicine; and 3) development of approaches to integrate and empower the large number of R01-type researchers and research projects in biochemistry, cell biology and cell physiology with the “omics” community.

Rationale for the Workshop

Evaluation of Past Large Scale Projects for Future Projects.

Biomedical science in the US has for over 50 years been conducted largely as an entrepreneurial enterprise in which an individual investigator has an idea, writes a grant proposal, gets funded, runs a lab with about 3-4 people working on the project, and publishes peer-reviewed papers. One of us, Ravi Iyengar, has used this model with continuous R01 funding since 1980. This model has been enormously successful in advancing biomedical knowledge. However, as research has progressed, it has become clear that large-scale projects such as the Human Genome Project can greatly enhance our knowledge base. Successful large-scale research projects include the Human Genome Project (5, 6) and the Protein Structure Initiative, which has yielded over 6300 structures in 13 years (7). Nevertheless, the latter project has been criticized for underutilization by the community (8). More recently, the ENCODE project (9, 10) has focused on mapping function to all regions of the human genome (11). Some claims in the ENCODE papers have been challenged (12), and there is an evolving consensus that some of the early claims may have been overstated. However, even serious critics have acknowledged the substantial value of the ENCODE project (13). The completion of the major portion of the ENCODE project and the ongoing Library of Integrated Network-Based Cellular Signatures (LINCS) program, which, as its website states, “aims to catalog the changes in gene expression and other cellular processes that occur when cells are exposed to a variety of perturbing agents,” could provide a base from which to embark on a project to quantitatively measure, in detail, the molecular characteristics of proteins in different types of human cells, along with their dynamic relationships to cellular and, when feasible, tissue/organ phenotypes. In the proposed workshop, we will critically evaluate the value of large-scale projects and how such projects can be integrated with science supported by individual research grants.
Both supporters and critics of large-scale projects will participate and the session will be divided between talks and open discussion forums.

Evaluation: Why a Human Quantitative Dynamics Project Now?

We now have substantial knowledge of the genomic and epigenomic characteristics of the human genome, and computational capabilities that in theory could allow us to move explicitly and comprehensively from genes to proteins to cellular phenotypes, which in turn can lead to organ and organismal phenotypes. However, the ability to translate this vast genomic information into predictive physiology is limited by the lack of quantitative data on the concentrations of cellular components and their reaction rates. Currently such information is sparse and has been gathered episodically for some cellular subsystems. We hypothesize that there is a real need for dynamical models of cellular biochemical systems, constrained by experimental data, to connect genomic differences to altered physiology in a reliable and predictive manner. To do this in a systematic and standardized manner, a well-thought-out large-scale project could be one approach. In a thoughtful essay about ENCODE (13), Eddy lays out some potential benefits of large-scale projects and the issues involved.

Distributed Models of Funding in Large-Scale Projects. One of the enduring criticisms of large-scale projects is that they siphon money from smaller individual research grants. While this may have been partially true in the past, the structuring of funding opportunities for the recent BRAIN Initiative as U01 grants similar in size to a typical R01 potentially provides a mechanism for developing a cohesive large-scale project that distributes resources broadly and draws upon a more diverse set of expertise. We anticipate that this model could be relevant to the HQD project, so we will discuss the pros and cons of distributed funding models.

Scientific Need for Planning. The proposed workshops are critical to assess the feasibility of, and plan for, the HQD project. Several issues need to be discussed: 1) Integrative logic across biomedical disciplines. HQD will need researchers from several disciplines doing small- and large-scale science within an integrated framework. Typically, cellular neuroscientists, immunologists and nephrologists do not go to the same meetings; neither do biochemists, cell biologists and bioinformaticists. Yet we need researchers from all these fields to work together, and the first workshop will bring researchers from these disparate fields together. 2) Scalable technologies. Many of the technologies needed for HQD are currently used for small-scale experiments, including surface plasmon resonance, enzymatic assays and cell physiological assays. We need to figure out how these technologies can be further developed for higher-throughput measurements and standardized across cell types and different kinds of measurements. We will also need metadata specifications for these technologies to determine the reliability and reproducibility of HQD data. 3) Versatility in informatics. Currently, few databases provide the kind of quantitative and functional information that will be gathered in the HQD project; most large databases are suited to statistical and graph theory-based analysis. We need to bring together different types of experts in informatics and database schemas with the experimentalists gathering the functional data and the modelers of dynamical systems, so that data gathering and informatics are seamlessly integrated. If the first workshop is successful, the second workshop could focus on experimental technologies and the third on informatics and computational technologies.

Scientific issues to be evaluated for the development of the HQD Project. Here we describe an initial list of objectives for the HQD project that need discussion and analysis. It is intended to provide an initial framework for the first workshop that will enable the community to develop key requirements and approaches from the bottom up. We anticipate that a plan will arise from the workshops and from the community discussions that follow. To start the discussion, we propose the following:

A. Mapping Experiments – Five types of molecular measurements can be envisaged in the HQD project. Four of these measurements will utilize chemical/biochemical assays. The fifth will utilize a combination of siRNA-based knockdown and measurement of physiological/pathophysiological responses at the whole-cell level to connect cellular components and interactions to emergent cell physiology. Each of these measurements could be carried out with and without an appropriate perturbant for the cell type being studied. The goal will be to achieve a comprehensive dynamic map of adequate quality and reproducibility.

i. Levels (cellular concentrations) of proteins and selected lipids, sugars, nucleotides and ions. Mapping of proteins in each cell type will be comprehensive and rooted in the genome and ENCODE data. Both mass spectrometry and quantitative immunoblotting approaches will be used to measure protein levels. If needed, additional antibodies to human proteins will be produced. Other cellular components such as lipids, sugars, nucleotides and ions will be cataloged using mass spectrometry on an “as needed” basis, guided by relationships between proteins and the emergent phenotype of the cell type being studied.
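Absolute protein levels from mass spectrometry or quantitative immunoblotting are often reported as copies per cell, whereas dynamical models need molar concentrations. As an illustration only, the Python sketch below applies c = N / (N_A · V). It assumes the protein is spread uniformly over a single cell volume (no compartmentalization), and the 2 pL cell volume and copy number in the example are hypothetical values, not HQD measurements.

```python
"""Sketch: converting absolute copy numbers per cell to molar concentrations.

Assumes a uniform distribution over one cell volume; the example values are
hypothetical and not HQD measurements.
"""

AVOGADRO = 6.02214076e23  # molecules per mole (exact SI definition)


def copies_to_molar(copies_per_cell: float, cell_volume_l: float) -> float:
    """Return concentration in mol/L for a copy number and a cell volume in liters."""
    if cell_volume_l <= 0:
        raise ValueError("cell volume must be positive")
    return copies_per_cell / (AVOGADRO * cell_volume_l)


def molar_to_copies(conc_molar: float, cell_volume_l: float) -> float:
    """Inverse conversion: molecules per cell for a concentration in mol/L."""
    if cell_volume_l <= 0:
        raise ValueError("cell volume must be positive")
    return conc_molar * AVOGADRO * cell_volume_l


if __name__ == "__main__":
    volume = 2e-12  # hypothetical 2 pL cell volume
    conc = copies_to_molar(10_000, volume)
    print(f"{conc * 1e9:.1f} nM")  # roughly 8.3 nM
```

For organelle-restricted proteins, the relevant compartment volume, rather than the whole-cell volume, would be the appropriate denominator.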

ii. Macromolecular Protein Complexes. Protein complexes in the cytoplasm and the various organelles will be mapped comprehensively in each cell type. Immunoprecipitation followed by mass spectrometry is a feasible approach that has been used successfully to map human nuclear co-regulator complexes.

iii. Interactions between Cellular Components. Protein-protein interactions, ligand (drug)-protein interactions, and interactions between proteins and other cellular components will be measured. Interactions between proteins will be characterized qualitatively using yeast two-hybrid measurements. Quantitative interactions between proteins and other cellular components will be obtained by multiple approaches, including measurement of association (kon) and dissociation (koff) rate constants by surface plasmon resonance.
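For a simple 1:1 interaction, the kon and koff values measured by surface plasmon resonance also yield the equilibrium dissociation constant, K_D = koff/kon, and predict the shape of the association phase of a sensorgram. The Python sketch below illustrates this under the assumption of a single-site (Langmuir) binding model; real SPR data may require more complex models, and the rate constants in the example are hypothetical.

```python
"""Sketch: relating SPR rate constants to equilibrium binding (1:1 model only).

The rate constants used in the example are hypothetical illustration values.
"""
import math


def dissociation_constant(k_on: float, k_off: float) -> float:
    """Return K_D (M) from k_on (M^-1 s^-1) and k_off (s^-1): K_D = k_off / k_on."""
    if k_on <= 0 or k_off < 0:
        raise ValueError("k_on must be positive and k_off non-negative")
    return k_off / k_on


def fraction_bound(ligand_conc: float, k_d: float) -> float:
    """Equilibrium fraction of sites occupied at a free ligand concentration (M)."""
    return ligand_conc / (ligand_conc + k_d)


def association_response(t: float, ligand_conc: float, k_on: float,
                         k_off: float, r_max: float) -> float:
    """SPR response at time t (s) during the association phase of a 1:1 interaction."""
    k_obs = k_on * ligand_conc + k_off  # observed pseudo-first-order rate constant
    r_eq = r_max * fraction_bound(ligand_conc, dissociation_constant(k_on, k_off))
    return r_eq * (1.0 - math.exp(-k_obs * t))


if __name__ == "__main__":
    k_on, k_off = 1e5, 1e-3  # hypothetical rate constants
    k_d = dissociation_constant(k_on, k_off)
    print(f"K_D = {k_d:.1e} M")
    print(fraction_bound(k_d, k_d))  # half of the sites are occupied at [L] = K_D
```

Reporting kon and koff rather than K_D alone keeps the kinetic information that dynamical models need.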

iv. Enzymatic Functions. Enzyme activities in each cell type can be characterized using standardized fluorescent or colorimetric assays to determine Km, Vmax and kcat for all enzymes in that cell type.

v. Physiological Functions. The effects of siRNA knockdown and overexpression of the protein(s) of interest, in cell populations and at the single-cell level, on whole-cell physiological responses, such as hormone secretion by endocrine cells or action potentials in excitable cells such as myocytes, will be used to connect individual cellular components and interactions to emergent physiological functions. State-of-the-art technologies such as mass spectrometry-based imaging and optogenetics may be adapted for quantitative measurement of whole-cell physiological responses.

B. Gap Analysis of Experiments - Each of the measurements listed above has been carried out in small research projects in some cell types. There are some larger-scale studies, such as those of protein-protein interactions, but none that are comprehensive for any cell type. We need a gap analysis to identify what is available and how these resources can be leveraged to achieve medium- to high-throughput capabilities during the production phase of a pilot project. The pilot project could also be used to develop and evaluate the various methods for their ability to produce high-quality data (and metadata) that are reproducible and reported in well-defined standard formats. Experimental optimization would include sample preparation and treatment under well-defined conditions (captured as metadata) to ensure reproducibility in small-scale follow-up experiments. Optimization should focus on obtaining comprehensive measurements that provide a detailed description of the interactions underlying emergent physiological and pathophysiological behaviors.

C. Gap Analysis for Scalability and Technology Development - Technology development for high-throughput measurements in biochemistry and physiology has lagged behind that in genomics. The first workshop will analyze what exists and what is needed; the second and third workshops can then focus on the technologies that need to be developed. The HQD project could invest in technology development for new methods as well as in the adaptation and optimization of current methods. There are large and vibrant chemical, electrical and mechanical engineering communities capable of such technology development in collaboration with biochemists and cell biologists, and the second and third workshops can serve as forums to engage these communities.

D. Storage and Integration of Data

i. Informatics. The data from the various types of measurements may be integrated into a single HQD database organized by cell type and cellular phenotype. Very few databases provide quantitative information of the type described here for mammalian systems, so developing a suitable database schema for HQD is likely to require extensive discussion and analysis; these workshops can help get this discussion going. We envisage that the gene products in the HQD database will be anchored in the human genome database available through NCBI resources and in UniProtKB (14), and could be linked to other databases such as GeneCards (15). The HQD database should allow users to track the flow of relationships between genes, proteins, other cellular components and phenotypic behaviors, using the framework of functional pathways for each cell type, to make genotype-to-phenotype connections. Because biochemical and functional experiments can give variable results, discussions focused on metadata that define experimental conditions in sufficient depth to enable reproducibility will be important. The HQD database will integrate the data collected within the project with data from the literature and from other data portals such as ENCODE, LINCS and TCGA. The pathway framework for the HQD database can use prior knowledge to organize pathways and networks as a basis for integrating experimental data obtained by HQD with data from the published literature. Since the HQD data will be obtained in standardized formats, they are likely to differ quantitatively, and sometimes qualitatively, from data in the published literature. The data from the literature can be annotated to highlight similarities and differences with the HQD data.
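One way to picture the anchoring and annotation described above is a record that ties each value to a UniProtKB accession and to the metadata defining its experimental conditions. The Python sketch below is a hypothetical illustration, not a proposed HQD schema: the class names, fields, the "EXAMPLE1" placeholder accession and the two-fold consistency rule are all assumptions made for the example.

```python
"""Sketch: a minimal record layout for HQD-style measurements and literature annotation.

All class names, fields and the fold-change rule are hypothetical illustrations.
"""
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Conditions:
    """Experimental metadata needed to reproduce a measurement."""
    cell_type: str
    assay: str
    temperature_c: float
    perturbant: Optional[str] = None


@dataclass(frozen=True)
class Measurement:
    """One quantitative value anchored to a UniProtKB accession."""
    accession: str       # UniProtKB accession of the gene product
    quantity: str        # e.g. "Km", "concentration", "koff"
    value: float
    unit: str
    conditions: Conditions
    source: str = "HQD"  # "HQD" or a literature reference


def _key(m: Measurement):
    """Values are compared only when accession, quantity, unit and cell type agree."""
    return (m.accession, m.quantity, m.unit, m.conditions.cell_type)


def annotate_literature(hqd: List[Measurement], literature: List[Measurement],
                        fold_threshold: float = 2.0) -> List[str]:
    """Label each literature value relative to the matching HQD value."""
    reference = {_key(m): m.value for m in hqd}
    labels = []
    for m in literature:
        ref = reference.get(_key(m))
        if ref is None:
            labels.append("no HQD counterpart")
        elif ref <= 0 or m.value <= 0:
            labels.append("not comparable")  # a fold change needs positive values
        else:
            fold = m.value / ref
            within = 1.0 / fold_threshold <= fold <= fold_threshold
            labels.append("consistent" if within else "discrepant")
    return labels


if __name__ == "__main__":
    myocyte = Conditions(cell_type="cardiac myocyte", assay="colorimetric", temperature_c=37.0)
    hqd = [Measurement("EXAMPLE1", "Km", 2.0, "uM", myocyte)]  # placeholder accession
    literature = [
        Measurement("EXAMPLE1", "Km", 3.0, "uM", myocyte, source="study A"),
        Measurement("EXAMPLE1", "Km", 10.0, "uM", myocyte, source="study B"),
        Measurement("EXAMPLE2", "Km", 1.0, "uM", myocyte, source="study C"),
    ]
    print(annotate_literature(hqd, literature))
```

Keeping conditions as a separate record makes it straightforward to require the same metadata fields for every measurement type.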

ii. Computation. The HQD data will enable users worldwide to build independent computational models. The HQD database schema can be optimized to enable two types of computation: a) Building and analysis of networks using graph theory to map network topology and identify regulatory motifs such as feedback, feed-forward and bi-fan motifs within cellular networks. Because such motifs give rise to emergent physiological and pathophysiological functions, the HQD database should enable a wide range of R21/R01 studies on disease mechanisms. b) Construction of dynamical models for predictive simulations. The HQD database will contain the quantitative information needed to construct both deterministic and stochastic models that can then be used to simulate phenotypic behaviors such as disease progression, and to build enhanced PK/PD models for drug development.
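As a small illustration of the graph theory-based analysis in (a), the Python sketch below enumerates feed-forward loops (X→Y, Y→Z and X→Z) and bi-fans (X and Y both regulating Z and W) in a directed network. It is an assumption-laden sketch rather than a full motif-discovery method: it counts non-induced occurrences (extra edges among the same nodes are allowed) and does not compare counts against randomized networks, as formal motif analysis usually does. The node names are hypothetical.

```python
"""Sketch: enumerating feed-forward loops and bi-fans in a directed network.

Counts non-induced occurrences only; node names in the example are hypothetical.
"""
from itertools import combinations


def _targets_by_source(edges):
    """Build an adjacency map: source node -> set of target nodes."""
    targets = {}
    for src, dst in edges:
        targets.setdefault(src, set()).add(dst)
    return targets


def feed_forward_loops(edges):
    """Return sorted (x, y, z) triples with edges x->y, y->z and x->z."""
    targets = _targets_by_source(edges)
    loops = []
    for x, x_targets in targets.items():
        for y in x_targets:
            for z in targets.get(y, set()):
                if z in x_targets and len({x, y, z}) == 3:
                    loops.append((x, y, z))
    return sorted(loops)


def bi_fans(edges):
    """Return ((x, y), (z, w)) where sources x and y both regulate targets z and w."""
    targets = _targets_by_source(edges)
    motifs = []
    for x, y in combinations(sorted(targets), 2):
        shared = sorted((targets[x] & targets[y]) - {x, y})
        for z, w in combinations(shared, 2):
            motifs.append(((x, y), (z, w)))
    return motifs


if __name__ == "__main__":
    network = [("A", "B"), ("B", "C"), ("A", "C"), ("A", "D"), ("B", "D")]
    print(feed_forward_loops(network))  # [('A', 'B', 'C'), ('A', 'B', 'D')]
    print(bi_fans(network))             # [(('A', 'B'), ('C', 'D'))]
```

With quantitative HQD parameters attached to each edge, the motifs found this way could then serve as starting points for the dynamical models described in (b).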

E. Pilot Projects: i. Worm Quantitative Dynamics. Model organism studies provide a deep understanding of basic genetics and biochemistry and serve as test beds for the development of systems-level projects. For example, whole-genome sequencing was pioneered in yeast, worms and flies. Studies in the worm enabled the development of genome-scale open reading frame cloning (16), yeast two-hybrid protein-protein interaction mapping (17), yeast one-hybrid protein-DNA interaction mapping (18), in vivo expression pattern mapping (19-21) and RNAi screening (22-24). C. elegans is a relatively simple animal with a fixed lineage of fewer than 1000 somatic cells, yet it has many of the complexities of human biology and physiology. Fundamental biological processes such as programmed cell death, regulation by non-coding RNAs such as microRNAs, and Ras signaling were either first described in the worm or substantially advanced through studies in this model organism. Recent work by Bargmann (25) and Walhout (26) has shown how relationships from receptor polymorphisms to cellular regulatory networks to feeding behavior can be analyzed. In a pilot project, we could determine how to characterize 2-3 relevant cells comprehensively and quantitatively and relate their activity to organismal behavior such as feeding.

ii. Human Cell Pilot. The workshop will analyze and discuss the pros and cons of candidate human cell types (primary or established) that could be used to conduct all the mapping experiments, as well as the informatics and computational tools needed to organize the experimental data and connect the biochemical and cell physiological measurements. Several cell types, such as cortical neurons, T or B cells, skin fibroblasts and cardiac myocytes, can be evaluated. The choice will be a bottom-up, community-driven decision, and the workshop and organizers will facilitate the discussion and selection process.

6. Detailed Description of the Workshop. Only the first workshop is described in detail here; plans for the second and third workshops will be developed later, depending on the success of the first. There will be five sessions, each with 4-6 relatively short (20 min) presentations followed by a 30-45 minute discussion forum, for five discussion forums in all. Both the talks and the discussion forums will be structured with suggested topics and questions that will be distributed in advance to all participants to guide the presentations and discussions. Not all participants will give talks, but we hope that all participants will take part in the discussion forums. If we get funding, we plan to hold the workshops in late 2015 or early 2016. The first workshop will provide a forum to critically evaluate the successes and failures of large-scale projects and the need for, and potential of, the HQD project. It will also evaluate the current status of biochemistry and cell physiology, in terms of their strengths and weaknesses, with respect to their ability to drive a genome-wide proteomics and quantitative functional mapping project. Pilot projects, demonstration projects in model organisms, and the current status of experimental and computational technologies will be broadly assessed, and the types of new technologies needed and how they can be developed will be analyzed at a conceptual level. An output of this workshop will be well-articulated goals for workshops 2 and 3.

Organizing Committee: Lead Organizers: Ravi Iyengar and Marian Walhout. Organizing Committee: Ravi Iyengar (Biochemistry, Systems Biology); Marian Walhout (Large-Scale Interaction Mapping, Model Organisms); Garret FitzGerald (Human Biochemistry, Physiology and Disease Mechanisms); Kara Dolinski (Bioinformatics, Quantitative Data Curation); Olga Troyanskaya (Bioinformatics, Statistical and Network Modeling); Marc Vidal (Systems Biology, Genome-wide Protein-Protein Interaction Mapping).

The organizing committee will develop the detailed program for the workshop. This committee will make the final decisions regarding whom to invite to speak and which applicants to select as participants. The selection process will be rooted in scientific excellence and will take multiple criteria into account to ensure adequate representation of women and underrepresented minorities. The committee will be responsible for ensuring that the reports are written and released within two months.

Meeting Reports: White Papers and Publication in Science Signaling

The two reports from the first workshop will be published as white papers that will be freely available to the community. Report 1 will be general and will deal with the pros and cons of large-scale projects and funding models. Report 2 will focus on HQD and will include the gap analysis, the technologies needed, plans for a pilot project and topics for subsequent workshops. We will request permission from NIGMS to post these on the NIGMS website, where the Quantitative and Systems Pharmacology white paper that I co-authored was posted; it has been widely read and has become a very influential document in pharmacology. We will also make the reports freely available through the National Centers for Systems Biology portal (www.systemscenters.org). Both sites are freely accessible without registration. We plan to publish the reports within two months of the workshop. Each report will have an executive summary, and the detailed report will be 5-10 pages including all charts and tables. We plan to submit the HQD report for publication in Science Signaling; please see the letter from the Editor, Dr. Nancy Gough, stating that she will consider such a submission.

Community Engagement during the Planning Process. Despite our best outreach efforts, we anticipate that only a small fraction of interested researchers will be able to attend the workshops. To get broader community input, we will develop a website dedicated to the HQD workshop, hosted within the Systems Biology Center New York website (www.sbcny.org), which gets 1500-2000 unique visitors per month. The HQD website will have a discussion forum where interested researchers can participate in asynchronous discussions and comment on the HQD project as well as the workshops. We will videotape the workshops and make the recordings available on this site. If there is sufficient interest, a Google Hangout can be arranged for interactions with the workshop organizers.

Usefulness of the Conference to the Scientific Community

In the long run, the usefulness of the workshops will be measured by the successful start of the HQD project. In the short run, these workshops will be useful in bringing together multiple research fields, including systems biology researchers who focus on different cell types and physiological functions. Since each human cell type has its own characteristics and distinct cell physiological functions, researchers from different fields, such as neurochemists, immunologists, cardiovascular biologists and endocrine cell biologists, could benefit from the workshop. Three classes of researchers in particular could benefit:

Basic Scientists and support for research grants. Basic research into scalable mechanisms of physiology and pathophysiology is still in its infancy, and the workshops have the potential to be a major enabler of mechanistic research projects funded by the different NIH institutes.

Translational Researchers. At a translational level, the workshops can help identify technologies that provide deep insight into disease mechanisms, catalyzing transformative change in the identification of biomarker sets. Most complex conditions currently lack reliable biomarker sets for disease progression.

Pharmaceutical Industry Researchers. The workshop should be able to help the pharmaceutical/biotech industry in several ways, including by showing how projects such as HQD can inspire a new generation of PK/PD models and help mine the druggable genome.

Diversity of Workshop invitees. We will reach out to researchers from underrepresented communities to encourage participation in the workshops. We have started to reach out to colleagues in biochemistry, cell biology and physiology departments of historically African-American colleges and medical schools to identify potential workshop participants. An initial list of potential women participants has been assembled, and we will continue to add to it. Dr. Terry Krulwich, a close colleague of Ravi Iyengar, has run a highly regarded PREP program for over 12 years. Many alumni of this program have gone on to medical and graduate schools across the country and are now joining leading academic departments and industry. The process essentially involves phone calls or e-mails to identify leads to individuals who might be interested. It takes time, but the approach works, as our efforts over the last six years in early recruitment for careers in biological mathematical modeling show. Through Systems Biology Center New York, we have recruited students from the non-research, minority-focused CUNY colleges for our summer research programs that develop expertise in mathematical modeling. These students, many of them the first in their families to go to college and rarely considering postgraduate degrees, have responded to our contacts with enthusiasm, and over 40% of our summer trainees choose to go to graduate or professional schools that utilize quantitative reasoning. We will use a similar approach, on an ongoing basis, to reach out to early-career basic and translational scientists and involve them in these planning meetings. We are confident that our workshops will be as diverse as New York City and America.

7. Access to Individuals with Disabilities and Day Care Arrangements. We propose to hold the first workshop on the NIH campus, if possible in the Lister Hill Auditorium or in the Natcher Building. To make the workshop accessible to individuals with disabilities, we have begun conversations with officials at the Bethesda Hyatt Regency, which is located directly above the Bethesda stop on the Red Line of the DC Metro. The NIH campus is one stop away on the same line. Both stations have elevators and are ADA compliant, as is the NIH campus. Through its concierge services, the Bethesda Hyatt Regency will allow us to hire licensed babysitters who can provide childcare. If we get three or more requests for childcare, we can obtain a small room at the hotel for daycare; alternatively, the children can be cared for in the participant's hotel room. The cost of room rental and daycare will be borne as part of the workshop organization expenses from support funds provided by the Icahn School of Medicine Dean’s Office.