  • Cell type ontologies of the Human Cell Atlas.
    Osumi-Sutherland D, Xu C, Keays M, Levine AP, Kharchenko PV, Regev A, Lein E, Teichmann SA.
    Nat Cell Biol. 2021 Nov;23(11):1129-1135. doi: 10.1038/s41556-021-00787-7.
    Abstract: Massive single-cell profiling efforts have accelerated our discovery of the cellular composition of the human body while at the same time raising the need to formalize this new knowledge. Here, we discuss current efforts to harmonize and integrate different sources of annotations of cell types and states into a reference cell ontology. We illustrate with examples how a unified ontology can consolidate and advance our understanding of cell types across scientific communities and biological domains.
  • Human prostate cancer bone metastases have an actionable immunosuppressive microenvironment
    Kfoury Y, Baryawno N, Severe N, Mei S, Gustafsson K, Hirz T, Brouse T, Scadden EW, Igolkina AA, Kokkaliaris K, Choi BD, Barkas N, Randolph MA, Shin JH, Saylor PJ, Scadden DT, Sykes DB, Kharchenko PV; as part of the Boston Bone Metastases Consortium.
    Cancer Cell. 2021 Oct 15:S1535-6108(21)00494-3. doi: 10.1016/j.ccell.2021.09.005.
    Abstract: Bone metastases are devastating complications of cancer. They are particularly common in prostate cancer (PCa), represent incurable disease, and are refractory to immunotherapy. We seek to define distinct features of the bone marrow (BM) microenvironment by analyzing single cells from bone metastatic prostate tumors, involved BM, uninvolved BM, and BM from cancer-free, orthopedic patients, and healthy individuals. Metastatic PCa is associated with multifaceted immune distortion, specifically exhaustion of distinct T cell subsets, appearance of macrophages with states specific to PCa bone metastases. The chemokine CCL20 is notably overexpressed by myeloid cells, as is its cognate CCR6 receptor on T cells. Disruption of the CCL20-CCR6 axis in mice with syngeneic PCa bone metastases restores T cell reactivity and significantly prolongs animal survival. Comparative high-resolution analysis of PCa bone metastases shows a targeted approach for relieving local immunosuppression for therapeutic effect.
  • Cell segmentation in imaging-based spatial transcriptomics
    Petukhov V, Xu RJ, Soldatov RA, Cadinu P, Khodosevich K, Moffitt JR, Kharchenko PV.
    Nat Biotechnol. 2021 Oct 14. doi: 10.1038/s41587-021-01044-w.
    Abstract: Single-molecule spatial transcriptomics protocols based on in situ sequencing or multiplexed RNA fluorescent hybridization can reveal detailed tissue organization. However, distinguishing the boundaries of individual cells in such data is challenging and can hamper downstream analysis. Current methods generally approximate cell positions using nuclei stains. We describe a segmentation method, Baysor, that optimizes two-dimensional (2D) or three-dimensional (3D) cell boundaries considering joint likelihood of transcriptional composition and cell morphology. While Baysor can take into account segmentation based on co-stains, it can also perform segmentation based on the detected transcripts alone. To evaluate performance, we extend multiplexed error-robust fluorescence in situ hybridization (MERFISH) to incorporate immunostaining of cell boundaries. Using this and other benchmarks, we show that Baysor segmentation can, in some cases, nearly double the number of cells compared to existing tools while reducing segmentation artifacts. We demonstrate that Baysor performs well on data acquired using five different protocols, making it a useful general tool for analysis of imaging-based spatial transcriptomics.
  • A multimodal cell census and atlas of the mammalian primary motor cortex.
    BRAIN Initiative Cell Census Network (BICCN).
    Nature. 2021 Oct;598(7879):86-102. doi: 10.1038/s41586-021-03950-0.
    Abstract: Here we report the generation of a multimodal cell census and atlas of the mammalian primary motor cortex as the initial product of the BRAIN Initiative Cell Census Network (BICCN). This was achieved by coordinated large-scale analyses of single-cell transcriptomes, chromatin accessibility, DNA methylomes, spatially resolved single-cell transcriptomes, morphological and electrophysiological properties and cellular resolution input-output mapping, integrated through cross-modal computational analysis. Our results advance the collective knowledge and understanding of brain cell-type organization. First, our study reveals a unified molecular genetic landscape of cortical cell types that integrates their transcriptome, open chromatin and DNA methylation maps. Second, cross-species analysis achieves a consensus taxonomy of transcriptomic types and their hierarchical organization that is conserved from mouse to marmoset and human. Third, in situ single-cell transcriptomics provides a spatially resolved cell-type atlas of the motor cortex. Fourth, cross-modal analysis provides compelling evidence for the transcriptomic, epigenomic and gene regulatory basis of neuronal phenotypes such as their physiological and anatomical properties, demonstrating the biological validity and genomic underpinning of neuron types. We further present an extensive genetic toolset for targeting glutamatergic neuron types towards linking their molecular and developmental identity to their circuit function. Together, our results establish a unifying and mechanistic framework of neuronal cell-type organization that integrates multi-layered molecular genetic and spatial information with multi-faceted phenotypic properties.
  • A transcriptomic and epigenomic cell atlas of the mouse primary motor cortex
    Yao Z, Liu H, Xie F, Fischer S, Adkins RS, Aldridge AI, Ament SA, Bartlett A, Behrens MM, Van den Berge K, Bertagnolli D, de Bézieux HR, Biancalani T, Booeshaghi AS, Bravo HC, Casper T, Colantuoni C, Crabtree J, Creasy H, Crichton K, Crow M, Dee N, Dougherty EL, Doyle WI, Dudoit S, Fang R, Felix V, Fong O, Giglio M, Goldy J, Hawrylycz M, Herb BR, Hertzano R, Hou X, Hu Q, Kancherla J, Kroll M, Lathia K, Li YE, Lucero JD, Luo C, Mahurkar A, McMillen D, Nadaf NM, Nery JR, Nguyen TN, Niu SY, Ntranos V, Orvis J, Osteen JK, Pham T, Pinto-Duarte A, Poirion O, Preissl S, Purdom E, Rimorin C, Risso D, Rivkin AC, Smith K, Street K, Sulc J, Svensson V, Tieu M, Torkelson A, Tung H, Vaishnav ED, Vanderburg CR, van Velthoven C, Wang X, White OR, Huang ZJ, Kharchenko PV, Pachter L, Ngai J, Regev A, Tasic B, Welch JD, Gillis J, Macosko EZ, Ren B, Ecker JR, Zeng H, Mukamel EA.
    Nature. 2021 Oct;598(7879):103-110. doi: 10.1038/s41586-021-03500-8
    Abstract: Single-cell transcriptomics can provide quantitative molecular signatures for large, unbiased samples of the diverse cell types in the brain. With the proliferation of multi-omics datasets, a major challenge is to validate and integrate results into a biological understanding of cell-type organization. Here we generated transcriptomes and epigenomes from more than 500,000 individual cells in the mouse primary motor cortex, a structure that has an evolutionarily conserved role in locomotion. We developed computational and statistical methods to integrate multimodal data and quantitatively validate cell-type reproducibility. The resulting reference atlas, containing over 56 neuronal cell types that are highly replicable across analysis methods, sequencing technologies and modalities, is a comprehensive molecular and genomic account of the diverse neuronal and non-neuronal cell types in the mouse primary motor cortex. The atlas includes a population of excitatory neurons that resemble pyramidal cells in layer 4 in other cortical regions. We further discovered thousands of concordant marker genes and gene regulatory elements for these cell types. Our results highlight the complex molecular regulation of cell types in the brain and will directly enable the design of reagents to target specific cell types in the mouse primary motor cortex for functional analysis.
  • Comparative cellular analysis of motor cortex in human, marmoset and mouse
    Bakken TE, Jorstad NL, Hu Q, Lake BB, Tian W, Kalmbach BE, Crow M, Hodge RD, Krienen FM, Sorensen SA, Eggermont J, Yao Z, Aevermann BD, Aldridge AI, Bartlett A, Bertagnolli D, Casper T, Castanon RG, Crichton K, Daigle TL, Dalley R, Dee N, Dembrow N, Diep D, Ding SL, Dong W, Fang R, Fischer S, Goldman M, Goldy J, Graybuck LT, Herb BR, Hou X, Kancherla J, Kroll M, Lathia K, van Lew B, Li YE, Liu CS, Liu H, Lucero JD, Mahurkar A, McMillen D, Miller JA, Moussa M, Nery JR, Nicovich PR, Niu SY, Orvis J, Osteen JK, Owen S, Palmer CR, Pham T, Plongthongkum N, Poirion O, Reed NM, Rimorin C, Rivkin A, Romanow WJ, Sedeño-Cortés AE, Siletti K, Somasundaram S, Sulc J, Tieu M, Torkelson A, Tung H, Wang X, Xie F, Yanny AM, Zhang R, Ament SA, Behrens MM, Bravo HC, Chun J, Dobin A, Gillis J, Hertzano R, Hof PR, Höllt T, Horwitz GD, Keene CD, Kharchenko PV, Ko AL, Lelieveldt BP, Luo C, Mukamel EA, Pinto-Duarte A, Preissl S, Regev A, Ren B, Scheuermann RH, Smith K, Spain WJ, White OR, Koch C, Hawrylycz M, Tasic B, Macosko EZ, McCarroll SA, Ting JT, Zeng H, Zhang K, Feng G, Ecker JR, Linnarsson S, Lein ES.
    Nature. 2021 Oct;598(7879):111-119. doi: 10.1038/s41586-021-03465-8. Epub 2021 Oct 6.
    Abstract: The primary motor cortex (M1) is essential for voluntary fine-motor control and is functionally conserved across mammals. Here, using high-throughput transcriptomic and epigenomic profiling of more than 450,000 single nuclei in humans, marmoset monkeys and mice, we demonstrate a broadly conserved cellular makeup of this region, with similarities that mirror evolutionary distance and are consistent between the transcriptome and epigenome. The core conserved molecular identities of neuronal and non-neuronal cell types allow us to generate a cross-species consensus classification of cell types, and to infer conserved properties of cell types across species. Despite the overall conservation, however, many species-dependent specializations are apparent, including differences in cell-type proportions, gene expression, DNA methylation and chromatin state. Few cell-type marker genes are conserved across species, revealing a short list of candidate genes and regulatory mechanisms that are responsible for conserved features of homologous cell types, such as the GABAergic chandelier cells. This consensus transcriptomic classification allows us to use patch-seq (a combination of whole-cell patch-clamp recordings, RNA sequencing and morphological characterization) to identify corticospinal Betz cells from layer 5 in non-human primates and humans, and to characterize their highly specialized physiology and anatomy. These findings highlight the robust molecular underpinnings of cell-type diversity in M1 across mammals, and point to the genes and regulatory pathways responsible for the functional identity of cell types and their species-specific adaptations.
  • Single-nuclei transcriptomes from human adrenal gland reveal distinct cellular identities of low and high-risk neuroblastoma tumors
    Bedoya-Reina OC, Li W, Arceo M, Plescher M, Bullova P, Pui H, Kaucka M, Kharchenko P, Martinsson T, Holmberg J, Adameyko I, Deng Q, Larsson C, Juhlin CC, Kogner P, Schlisio S.
    Nat Commun. 2021 Sep 7;12(1):5309. doi: 10.1038/s41467-021-24870-7.
    Abstract: Childhood neuroblastoma has a remarkable variability in outcome. Age at diagnosis is one of the most important prognostic factors, with children less than 1 year old having favorable outcomes. Here we study single-cell and single-nuclei transcriptomes of neuroblastoma with different clinical risk groups and stages, including healthy adrenal gland. We compare tumor cell populations with embryonic mouse sympatho-adrenal derivatives, and post-natal human adrenal gland. We provide evidence that low and high-risk neuroblastoma have different cell identities, representing two disease entities. Low-risk neuroblastoma presents a transcriptome that resembles sympatho- and chromaffin cells, whereas malignant cells enriched in high-risk neuroblastoma resemble a subtype of TRKB+ cholinergic progenitor population identified in human post-natal gland. Analyses of these populations reveal different gene expression programs for worst and better survival in correlation with age at diagnosis. Our findings reveal two cellular identities and a composition of human neuroblastoma tumors reflecting clinical heterogeneity and outcome.
  • Rewiring of human neurodevelopmental gene regulatory programs by human accelerated regions
    Girskis KM, Stergachis AB, DeGennaro EM, Doan RN, Qian X, Johnson MB, Wang PP, Sejourne GM, Nagy MA, Pollina EA, Sousa AMM, Shin T, Kenny CJ, Scotellaro JL, Debo BM, Gonzalez DM, Rento LM, Yeh RC, Song JHT, Beaudin M, Fan J, Kharchenko PV, Sestan N, Greenberg ME, Walsh CA.
    Neuron. 2021 Oct 20;109(20):3239-3251.e7. doi: 10.1016/j.neuron.2021.08.005.
    Abstract: Human accelerated regions (HARs) are the fastest-evolving regions of the human genome, and many are hypothesized to function as regulatory elements that drive human-specific gene regulatory programs. We interrogate the in vitro enhancer activity and in vivo epigenetic landscape of more than 3,100 HARs during human neurodevelopment, demonstrating that many HARs appear to act as neurodevelopmental enhancers and that sequence divergence at HARs has largely augmented their neuronal enhancer activity. Furthermore, we demonstrate PPP1R17 to be a putative HAR-regulated gene that has undergone remarkable rewiring of its cell type and developmental expression patterns between non-primates and primates and between non-human primates and humans. Finally, we show that PPP1R17 slows neural progenitor cell cycle progression, paralleling the cell cycle length increase seen predominantly in primate and especially human neurodevelopment. Our findings establish HARs as key components in rewiring human-specific neurodevelopmental gene regulatory programs and provide an integrated resource to study enhancer activity of specific HARs.
  • Diversity of developing peripheral glia revealed by single-cell RNA sequencing.
    Tasdemir-Yilmaz OE, Druckenbrod NR, Olukoya OO, Dong W, Yung AR, Bastille I, Pazyra-Murphy MF, Sitko AA, Hale EB, Vigneau S, Gimelbrant AA, Kharchenko PV, Goodrich LV, Segal RA.
    Dev Cell. 2021 Sep 13;56(17):2516-2535.e8. doi: 10.1016/j.devcel.2021.08.005.
    Abstract: The peripheral nervous system responds to a wide variety of sensory stimuli, a process that requires great neuronal diversity. These diverse neurons are closely associated with glial cells originating from the neural crest. However, the molecular nature and diversity among peripheral glia are not understood. Here, we used single-cell RNA sequencing to profile developing and mature glia from somatosensory dorsal root ganglia and auditory spiral ganglia. We found that glial precursors (GPs) in these two systems differ in their transcriptional profiles. Despite their unique features, somatosensory and auditory GPs undergo convergent differentiation to generate molecularly uniform myelinating and non-myelinating Schwann cells. By contrast, somatosensory and auditory satellite glial cells retain system-specific features. Lastly, we identified a glial signature gene set, providing new insights into commonalities among glia across the nervous system. This survey of gene expression in peripheral glia constitutes a resource for understanding functions of glia across different sensory modalities.
  • RNA velocity: current challenges and future perspectives
    Bergen V, Soldatov RA, Kharchenko PV, Theis FJ.
    Mol Syst Biol. 2021 Aug;17(8):e10282.
    Abstract: RNA velocity has enabled the recovery of directed dynamic information from single-cell transcriptomics by connecting measurements to the underlying kinetics of gene expression. This approach has opened up new ways of studying cellular dynamics. Here, we review the current state of RNA velocity modeling approaches, discuss various examples illustrating limitations and potential pitfalls, and provide guidance on how the ensuing challenges may be addressed. We then outline future directions on how to generalize the concept of RNA velocity to a wider variety of biological systems and modalities.
  • Population sequencing data reveal a compendium of mutational processes in the human germ line
    Seplyarskiy VB, Soldatov RA, Koch E, McGinty RJ, Goldmann JM, Hernandez RD, Barnes K, Correa A, Burchard EG, Ellinor PT, McGarvey ST, Mitchell BD, Vasan RS, Redline S, Silverman E, Weiss ST, Arnett DK, Blangero J, Boerwinkle E, He J, Montgomery C, Rao DC, Rotter JI, Taylor KD, Brody JA, Chen YI, de Las Fuentes L, Hwu CM, Rich SS, Manichaikul AW, Mychaleckyj JC, Palmer ND, Smith JA, Kardia SLR, Peyser PA, Bielak LF, O'Connor TD, Emery LS; NHLBI Trans-Omics for Precision Medicine (TOPMed) Consortium; TOPMed Population Genetics Working Group, Gilissen C, Wong WSW, Kharchenko PV, Sunyaev S.
    Science. 2021 Aug 12:eaba7408. doi: 10.1126/science.aba7408.
    Abstract: Biological mechanisms underlying human germline mutations remain largely unknown. We statistically decompose variation in the rate and spectra of mutations along the genome using volume-regularized nonnegative matrix factorization. The analysis of a sequencing dataset (TOPMed) reveals nine processes that explain the variation in mutation properties between loci. We provide a biological interpretation for seven of these processes. We associate one process with bulky DNA lesions that resolve asymmetrically with respect to transcription and replication. Two processes track direction of replication fork and replication timing, respectively. We identify a mutagenic effect of active demethylation primarily acting in regulatory regions and a mutagenic effect of LINE repeats. We localize a mutagenic process specific to oocytes from population sequencing data. This process appears transcriptionally asymmetric.
  • The triumphs and limitations of computational methods for scRNA-seq
    Kharchenko PV
    Nat Methods. 2021 Jul;18(7):723-732. doi: 10.1038/s41592-021-01171-x
    Abstract: The rapid progress of protocols for sequencing single-cell transcriptomes over the past decade has been accompanied by equally impressive advances in the computational methods for analysis of such data. As capacity and accuracy of the experimental techniques grew, the emerging algorithm developments revealed increasingly complex facets of the underlying biology, from cell type composition to gene regulation to developmental dynamics. At the same time, rapid growth has forced continuous reevaluation of the underlying statistical models, experimental aims, and sheer volumes of data processing that are handled by these computational tools. Here, I review key computational steps of single-cell RNA sequencing (scRNA-seq) analysis, examine assumptions made by different approaches, and highlight successes, remaining ambiguities, and limitations that are important to keep in mind as scRNA-seq becomes a mainstream technique for studying biology.
  • Evolutionary switch in expression of key markers between mouse and human leads to mis-assignment of cell types in developing adrenal medulla
    Kameneva P, Artemov AV, Kastriti ME, Sundström E, Kharchenko PV, Adameyko I
    Cancer Cell. 2021 May 10;39(5):590-591. doi: 10.1016/j.ccell.2021.04.009. Epub 2021 Apr 29.
  • Single-cell transcriptomics of human embryos identifies multiple sympathoblast lineages with potential implications for neuroblastoma origin
    Kameneva P, Artemov AV, Kastriti ME, Faure L, Olsen TK, Otte J, Erickson A, Semsch B, Andersson ER, Ratz M, Frisén J, Tischler AS, de Krijger RR, Bouderlique T, Akkuratova N, Vorontsova M, Gusev O, Fried K, Sundström E, Mei S, Kogner P, Baryawno N, Kharchenko PV*, Adameyko I*
    Nat Genet. 2021 May;53(5):694-706. doi: 10.1038/s41588-021-00818-x
    Abstract: Characterization of the progression of cellular states during human embryogenesis can provide insights into the origin of pediatric diseases. We examined the transcriptional states of neural crest- and mesoderm-derived lineages differentiating into adrenal glands, kidneys, endothelium and hematopoietic tissue between post-conception weeks 6 and 14 of human development. Our results reveal transitions connecting the intermediate mesoderm and progenitors of organ primordia, the hematopoietic system and endothelial subtypes. Unexpectedly, by using a combination of single-cell transcriptomics and lineage tracing, we found that intra-adrenal sympathoblasts at that stage are directly derived from nerve-associated Schwann cell precursors, similarly to local chromaffin cells, whereas the majority of extra-adrenal sympathoblasts arise from the migratory neural crest. In humans, this process persists during several weeks of development within the large intra-adrenal ganglia-like structures, which may also serve as reservoirs of originating cells in neuroblastoma.
  • Single cell transcriptomics of primate sensory neurons identifies cell types associated with chronic pain
    Kupari J, Usoskin D, Parisien M, Lou D, Hu Y, Fatt M, Lönnerberg P, Spångberg M, Eriksson B, Barkas N, Kharchenko PV, Loré K, Khoury S, Diatchenko L, Ernfors P.
    Nat Commun. 2021 Mar 8;12(1):1510. doi: 10.1038/s41467-021-21725-z.
    Abstract: Distinct types of dorsal root ganglion sensory neurons may have unique contributions to chronic pain. Identification of primate sensory neuron types is critical for understanding the cellular origin and heritability of chronic pain. However, molecular insights into the primate sensory neurons are missing. Here we classify non-human primate dorsal root ganglion sensory neurons based on their transcriptome and map human pain heritability to neuronal types. First, we identified cell correlates between two major datasets for mouse sensory neuron types. Machine learning exposes an overall cross-species conservation of somatosensory neurons between primate and mouse, although with differences at individual gene level, highlighting the importance of primate data for clinical translation. We map genomic loci associated with chronic pain in human onto primate sensory neuron types to identify the cellular origin of chronic pain. Genome-wide associations for chronic pain converge on two different neuronal types distributed between pain disorders that display different genetic susceptibilities, suggesting both unique and shared mechanisms between different pain conditions.
  • Single-cell proteomic and transcriptomic analysis of macrophage heterogeneity using SCoPE2
    Specht H, Emmott E, Petelski AA, Huffman RG, Perlman DH, Serra M, Kharchenko P, Koller A, Slavov N.
    Genome Biol. 2021 Jan 27;22(1):50. doi: 10.1186/s13059-021-02267-5.
    Abstract: Background: Macrophages are innate immune cells with diverse functional and molecular phenotypes. This diversity is largely unexplored at the level of single-cell proteomes because of the limitations of quantitative single-cell protein analysis.
    Results: To overcome this limitation, we develop SCoPE2, which substantially increases quantitative accuracy and throughput while lowering cost and hands-on time by introducing automated and miniaturized sample preparation. These advances enable us to analyze the emergence of cellular heterogeneity as homogeneous monocytes differentiate into macrophage-like cells in the absence of polarizing cytokines. SCoPE2 quantifies over 3042 proteins in 1490 single monocytes and macrophages in 10 days of instrument time, and the quantified proteins allow us to discern single cells by cell type. Furthermore, the data uncover a continuous gradient of proteome states for the macrophages, suggesting that macrophage heterogeneity may emerge in the absence of polarizing cytokines. Parallel measurements of transcripts by 10× Genomics suggest that our measurements sample 20-fold more protein copies than RNA copies per gene, and thus, SCoPE2 supports quantification with improved count statistics. This allowed exploring regulatory interactions, such as interactions between the tumor suppressor p53, its transcript, and the transcripts of genes regulated by p53.
    Conclusions: Even in a homogeneous environment, macrophage proteomes are heterogeneous. This heterogeneity correlates to the inflammatory axis of classically and alternatively activated macrophages. Our methodology lays the foundation for automated and quantitative single-cell analysis of proteins by mass spectrometry and demonstrates the potential for inferring transcriptional and post-transcriptional regulation from variability across single cells.
  • Identification of epilepsy-associated neuronal subtypes and gene expression underlying epileptogenesis
    Pfisterer U, Petukhov V, Demharter S, Meichsner J, Thompson JJ, Batiuk MY, Asenjo-Martinez A, Vasistha NA, Thakur A, Mikkelsen J, Adorjan I, Pinborg LH, Pers TH, von Engelhardt J, Kharchenko PV*, Khodosevich K*
    Nat Commun. 2020 Oct 7;11(1):5038. doi: 10.1038/s41467-020-18752-7.
    Abstract: Epilepsy is one of the most common neurological disorders, yet its pathophysiology is poorly understood due to the high complexity of affected neuronal circuits. To identify dysfunctional neuronal subtypes underlying seizure activity in the human brain, we have performed single-nucleus transcriptomics analysis of >110,000 neuronal transcriptomes derived from temporal cortex samples of multiple temporal lobe epilepsy and non-epileptic subjects. We found that the largest transcriptomic changes occur in distinct neuronal subtypes from several families of principal neurons (L5-6_Fezf2 and L2-3_Cux2) and GABAergic interneurons (Sst and Pvalb), whereas other subtypes in the same families were less affected. Furthermore, the subtypes with the largest epilepsy-related transcriptomic changes may belong to the same circuit, since we observed coordinated transcriptomic shifts across these subtypes. Glutamate signaling exhibited one of the strongest dysregulations in epilepsy, highlighted by layer-wise transcriptional changes in multiple glutamate receptor genes and strong upregulation of genes coding for AMPA receptor auxiliary subunits. Overall, our data reveal a neuronal subtype-specific molecular phenotype of epilepsy.
  • Dental cell type atlas reveals stem and differentiated cell types in mouse and human teeth
    Krivanek J, Soldatov RA, Kastriti ME, Chontorotzea T, Herdina AN, Petersen J, Szarowska B, Landova M, Matejova VK, Holla LI, Kuchler U, Zdrilic IV, Vijaykumar A, Balic A, Marangoni P, Klein OD, Neves VCM, Yianni V, Sharpe PT, Harkany T, Metscher BD, Bajénoff M, Mina M, Fried K, Kharchenko PV*, Adameyko I*
    Nat Commun. 2020 Sep 23;11(1):4816. doi: 10.1038/s41467-020-18512-7.
    Abstract: Understanding cell types and mechanisms of dental growth is essential for reconstruction and engineering of teeth. Therefore, we investigated cellular composition of growing and non-growing mouse and human teeth. As a result, we report an unappreciated cellular complexity of the continuously-growing mouse incisor, which suggests a coherent model of cell dynamics enabling unarrested growth. This model relies on spatially-restricted stem, progenitor and differentiated populations in the epithelial and mesenchymal compartments underlying the coordinated expansion of two major branches of pulpal cells and diverse epithelial subtypes. Further comparisons of human and mouse teeth yield both parallelisms and differences in tissue heterogeneity and highlight the specifics behind growing and non-growing modes. Despite being similar at a coarse level, mouse and human teeth reveal molecular differences and species-specific cell subtypes suggesting possible evolutionary divergence. Overall, here we provide an atlas of human and mouse teeth with a focus on growth and differentiation.
  • Distinct evolutionary paths in chronic lymphocytic leukemia during resistance to the graft-versus-leukemia effect
    Bachireddy P, Ennis C, Nguyen VN, Gohil SH, Clement K, Shukla SA, Forman J, Barkas N, Freeman S, Bavli N, Elagina L, Leshchiner I, Mohammad AW, Mathewson ND, Keskin DB, Rassenti LZ, Kipps TJ, Brown JR, Getz G, Ho VT, Gnirke A, Neuberg D, Soiffer RJ, Ritz J, Alyea EP, Kharchenko PV, Wu CJ.
    Sci Transl Med. 2020 Sep 16;12(561):eabb7661. doi: 10.1126/scitranslmed.abb7661.
    Abstract: Leukemic relapse remains a major barrier to successful allogeneic hematopoietic stem cell transplantation (allo-HSCT) for aggressive hematologic malignancies. The basis for relapse of advanced lymphoid malignancies remains incompletely understood and may involve escape from the graft-versus-leukemia (GvL) effect. We hypothesized that for patients with chronic lymphocytic leukemia (CLL) treated with allo-HSCT, leukemic cell-intrinsic features influence transplant outcomes by directing the evolutionary trajectories of CLL cells. Integrated genetic, transcriptomic, and epigenetic analyses of CLL cells from 10 patients revealed that the clinical kinetics of post-HSCT relapse are shaped by distinct molecular dynamics. Early relapses after allo-HSCT exhibited notable genetic stability; single CLL cell transcriptional analysis demonstrated a cellular heterogeneity that was static over time. In contrast, CLL cells relapsing late after allo-HSCT displayed notable genetic evolution and evidence of neoantigen depletion, consistent with marked single-cell transcriptional shifts that were unique to each patient. We observed a greater rate of epigenetic change for late relapses not seen in early relapses or relapses after chemotherapy alone, suggesting that the selection pressures of the GvL bottleneck are unlike those imposed by chemotherapy. No selective advantage for human leukocyte antigen (HLA) loss was observed, even when present in pretransplant subpopulations. Gain of stem cell modules was a common signature associated with leukemia relapse regardless of posttransplant relapse kinetics. These data elucidate the biological pathways that underlie GvL resistance and posttransplant relapse.
  • Joint analysis of heterogeneous single-cell RNA-seq dataset collections
    Barkas N, Petukhov V, Nikolaeva D, Lozinsky Y, Demharter S, Khodosevich K, Kharchenko PV.
    Nat Methods. 2019 Jul 15. doi: 10.1038/s41592-019-0466-z.
    Abstract: Single-cell RNA sequencing is often applied in study designs that include multiple individuals, conditions or tissues. To identify recurrent cell subpopulations in such heterogeneous collections, we developed Conos, an approach that relies on multiple plausible inter-sample mappings to construct a global graph connecting all measured cells. The graph enables identification of recurrent cell clusters and propagation of information between datasets in multi-sample or atlas-scale collections.
  • Spatiotemporal structure of cell fate decisions in murine neural crest
    Soldatov R, Kaucka M, Kastriti ME, Petersen J, Chontorotzea T, Englmaier L, Akkuratova N, Yang Y, Häring M, Dyachuk V, Bock C, Farlik M, Piacentino ML, Boismoreau F, Hilscher MM, Yokota C, Qian X, Nilsson M, Bronner ME, Croci L, Hsiao WY, Guertin DA, Brunet JF, Consalez GG, Ernfors P, Fried K, Kharchenko PV*, Adameyko I*.
    Science. 2019 Jun 7;364(6444). pii: eaas9536. doi: 10.1126/science.aas9536.
    Abstract: Neural crest cells are embryonic progenitors that generate numerous cell types in vertebrates. With single-cell analysis, we show that mouse trunk neural crest cells become biased toward neuronal lineages when they delaminate from the neural tube, whereas cranial neural crest cells acquire ectomesenchyme potential dependent on activation of the transcription factor Twist1. The choices that neural crest cells make to become sensory, glial, autonomic, or mesenchymal cells can be formalized as a series of sequential binary decisions. Each branch of the decision tree involves initial coactivation of bipotential properties followed by gradual shifts toward commitment. Competing fate programs are coactivated before cells acquire fate-specific phenotypic traits. Determination of a specific fate is achieved by increased synchronization of relevant programs and concurrent repression of competing fate programs.
  • RNA velocity of single cells
    La Manno G, Soldatov R, Zeisel A, Braun E, Hochgerner H, Petukhov V, Lidschreiber K, Kastriti ME, Lönnerberg P, Furlan A, Fan J, Borm LE, Liu Z, van Bruggen D, Guo J, He X, Barker R, Sundström E, Castelo-Branco G, Cramer P, Adameyko I, Linnarsson S*, Kharchenko PV*
    Nature. 2018, doi: 10.1038/s41586-018-0414-6
    Abstract: RNA abundance is a powerful indicator of the state of individual cells. Single-cell RNA sequencing can reveal RNA abundance with high quantitative accuracy, sensitivity and throughput. However, this approach captures only a static snapshot at a point in time, posing a challenge for the analysis of time-resolved phenomena such as embryogenesis or tissue regeneration. Here we show that RNA velocity, the time derivative of the gene expression state, can be directly estimated by distinguishing between unspliced and spliced mRNAs in common single-cell RNA sequencing protocols. RNA velocity is a high-dimensional vector that predicts the future state of individual cells on a timescale of hours. We validate its accuracy in the neural crest lineage, demonstrate its use on multiple published datasets and technical platforms, reveal the branching lineage tree of the developing mouse hippocampus, and examine the kinetics of transcription in human embryonic brain. We expect RNA velocity to greatly aid the analysis of developmental lineages and cellular dynamics, particularly in humans.
  • dropEst: pipeline for accurate estimation of molecular counts in droplet-based single-cell RNA-seq experiments
    Petukhov V, Guo J, Baryawno N, Severe N, Scadden DT, Samsonova MG, Kharchenko PV
    Genome Biol. 2018 Jun 19;19(1):78. doi: 10.1186/s13059-018-1449-6
    Abstract: Recent single-cell RNA-seq protocols based on droplet microfluidics use massively multiplexed barcoding to enable simultaneous measurements of transcriptomes for thousands of individual cells. The increasing complexity of such data creates challenges for subsequent computational processing and troubleshooting of these experiments, with few software options currently available. Here, we describe a flexible pipeline for processing droplet-based transcriptome data that implements barcode corrections, classification of cell quality, and diagnostic information about the droplet libraries. We introduce advanced methods for correcting composition bias and sequencing errors affecting cellular and molecular barcodes to provide more accurate estimates of molecular counts in individual cells.
  • Linking transcriptional and genetic tumor heterogeneity through allele analysis of single-cell RNA-seq data
    Fan J, Lee HO, Lee S, Ryu DE, Lee S, Xue C, Kim SJ, Kim K, Barkas N, Park PJ, Park WY, Kharchenko PV
    Genome Res. 2018 Aug;28(8):1217-1227. doi: 10.1101/gr.228080.117
    Abstract: Characterization of intratumoral heterogeneity is critical to cancer therapy, as the presence of phenotypically diverse cell populations commonly fuels relapse and resistance to treatment. Although genetic variation is a well-studied source of intratumoral heterogeneity, the functional impact of most genetic alterations remains unclear. Even less understood is the relative importance of other factors influencing heterogeneity, such as epigenetic state or tumor microenvironment. To investigate the relationship between genetic and transcriptional heterogeneity in a context of cancer progression, we devised a computational approach called HoneyBADGER to identify copy number variation and loss of heterozygosity in individual cells from single-cell RNA-sequencing data. By integrating allele and normalized expression information, HoneyBADGER is able to identify and infer the presence of subclone-specific alterations in individual cells and reconstruct the underlying subclonal architecture. By examining several tumor types, we show that HoneyBADGER is effective at identifying deletions, amplifications, and copy-neutral loss-of-heterozygosity events and is capable of robustly identifying subclonal focal alterations as small as 10 megabases. We further apply HoneyBADGER to analyze single cells from a progressive multiple myeloma patient to identify major genetic subclones that exhibit distinct transcriptional signatures relevant to cancer progression. Other prominent transcriptional subpopulations within these tumors did not line up with the genetic subclonal structure and were likely driven by alternative, nonclonal mechanisms. These results highlight the need for integrative analysis to understand the molecular and phenotypic heterogeneity in cancer.
  • Integrative single-cell analysis of transcriptional and epigenetic states in the human adult brain
    Lake BB, Chen S, Sos BC, Fan J, Kaeser GE, Yung YC, Duong TE, Gao D, Chun J, Kharchenko PV, Zhang K.
    Nat Biotechnol. 2018 Jan;36(1):70-80. doi: 10.1038/nbt.4038
    Abstract: Detailed characterization of the cell types in the human brain requires scalable experimental approaches to examine multiple aspects of the molecular state of individual cells, as well as computational integration of the data to produce unified cell-state annotations. Here we report improved high-throughput methods for single-nucleus droplet-based sequencing (snDrop-seq) and single-cell transposome hypersensitive site sequencing (scTHS-seq). We used each method to acquire nuclear transcriptomic and DNA accessibility maps for >60,000 single cells from human adult visual cortex, frontal cortex, and cerebellum. Integration of these data revealed regulatory elements and transcription factors that underlie cell-type distinctions, providing a basis for the study of complex processes in the brain, such as genetic programs that coordinate adult remyelination. We also mapped disease-associated risk variants to specific cellular populations, which provided insights into normal and pathogenic cellular processes in the human brain. This integrative multi-omics approach permits more detailed single-cell interrogation of complex organs and tissues.
  • Rapid Mobilization Reveals a Highly Engraftable Hematopoietic Stem Cell
    Hoggatt J, Singh P, Tate TA, Chou BK, Datari SR, Fukuda S, Liu L, Kharchenko PV, Schajnovitz A, Baryawno N, Mercier FE, Boyer J, Gardner J, Morrow DM, Scadden DT, Pelus LM.
    Cell. 2018 Jan 11;172(1-2):191-204.e10
    Abstract: Hematopoietic stem cell transplantation is a potential curative therapy for malignant and nonmalignant diseases. Improving the efficiency of stem cell collection and the quality of the cells acquired can broaden the donor pool and improve patient outcomes. We developed a rapid stem cell mobilization regimen utilizing a unique CXCR2 agonist, GROβ, and the CXCR4 antagonist AMD3100. A single injection of both agents resulted in stem cell mobilization peaking within 15 min that was equivalent in magnitude to a standard multi-day regimen of granulocyte colony-stimulating factor (G-CSF). Mechanistic studies determined that rapid mobilization results from synergistic signaling on neutrophils, resulting in enhanced MMP-9 release, and unexpectedly revealed genetic polymorphisms in MMP-9 that alter activity. This mobilization regimen results in preferential trafficking of stem cells that demonstrate a higher engraftment efficiency than those mobilized by G-CSF. Our studies suggest a potential new strategy for the rapid collection of an improved hematopoietic graft.
  • Multipotent peripheral glial cells generate neuroendocrine cells of the adrenal medulla
    Furlan A, Dyachuk V, Kastriti ME, Calvo-Enrique L, Abdo H, Hadjab S, Chontorotzea T, Akkuratova N, Usoskin D, Kamenev D, Petersen J, Sunadome K, Memic F, Marklund U, Fried K, Topilko P, Lallemend F, Kharchenko PV, Ernfors P, Adameyko I.
    Science. 2017 Jul 7;357(6346)
    Abstract: Adrenaline is a fundamental circulating hormone for bodily responses to internal and external stressors. Chromaffin cells of the adrenal medulla (AM) represent the main neuroendocrine adrenergic component and are believed to differentiate from neural crest cells. We demonstrate that large numbers of chromaffin cells arise from peripheral glial stem cells, termed Schwann cell precursors (SCPs). SCPs migrate along the visceral motor nerve to the vicinity of the forming adrenal gland, where they detach from the nerve and form postsynaptic neuroendocrine chromaffin cells. An intricate molecular logic drives two sequential phases of gene expression, one unique for a distinct transient cellular state and another for cell type specification. Subsequently, these programs down-regulate SCP-gene and up-regulate chromaffin cell-gene networks. The AM forms through limited cell expansion and requires the recruitment of numerous SCPs. Thus, peripheral nerves serve as a stem cell niche for neuroendocrine system development.
  • A comparative strategy for single-nucleus and single-cell transcriptomes confirms accuracy in predicted cell-type expression from nuclear RNA
    Lake BB, Codeluppi S, Yung YC, Gao D, Chun J, Kharchenko PV, Linnarsson S, Zhang K.
    Sci Rep. 2017 Jul 20;7(1):6031.
    Abstract: Significant heterogeneities in gene expression among individual cells are typically interrogated using single whole cell approaches. However, tissues that have highly interconnected processes, such as in the brain, present unique challenges. Single-nucleus RNA sequencing (SNS) has emerged as an alternative method of assessing a cell's transcriptome through the use of isolated nuclei. However, studies directly comparing expression data between nuclei and whole cells are lacking. Here, we have characterized nuclear and whole cell transcriptomes in mouse single neurons and provided a normalization strategy to reduce method-specific differences related to the length of genic regions. We confirmed a high concordance between nuclear and whole cell transcriptomes in the expression of cell type and metabolic modeling markers, but less so for a subset of genes associated with mitochondrial respiration. Therefore, our results indicate that single-nucleus transcriptome sequencing provides an effective means to profile cell type expression dynamics in previously inaccessible tissues.
  • Integrated single-cell genetic and transcriptional analysis suggests novel drivers of chronic lymphocytic leukemia
    Wang L, Fan J, Francis JM, Georghiou G, Hergert S, Li S, Gambe R, Zhou CW, Yang C, Xiao S, Cin PD, Bowden M, Kotliar D, Shukla SA, Brown JR, Neuberg D, Alessi DR, Zhang CZ, Kharchenko PV, Livak KJ, Wu CJ.
    Genome Res. 2017 Jul 5.
    Abstract: Intra-tumoral genetic heterogeneity has been characterized across cancers by genome sequencing of bulk tumors, including chronic lymphocytic leukemia (CLL). In order to more accurately identify subclones, define phylogenetic relationships, and probe genotype-phenotype relationships, we developed methods for targeted mutation detection in DNA and RNA isolated from thousands of single cells from five CLL samples. By clearly resolving phylogenic relationships, we uncovered mutated LCP1 and WNK1 as novel CLL drivers, supported by functional evidence demonstrating their impact on CLL pathways. Integrative analysis of somatic mutations with transcriptional states prompts the idea that convergent evolution generates phenotypically similar cells in distinct genetic branches, thus creating a cohesive expression profile in each CLL sample despite the presence of genetic heterogeneity. Our study highlights the potential for single-cell RNA-based targeted analysis to sensitively determine transcriptional and mutational profiles of individual cancer cells, leading to increased understanding of driving events in malignancy.
  • Challenges and emerging directions in single-cell analysis
    Yuan GC, Cai L, Elowitz M, Enver T, Fan G, Guo G, Irizarry R, Kharchenko P, Kim J, Orkin S, Quackenbush J, Saadatpour A, Schroeder T, Shivdasani R, Tirosh I.
    Genome Biol. 2017 May 8;18(1):84.
    Abstract: Single-cell analysis is a rapidly evolving approach to characterize genome-scale molecular information at the individual cell level. Development of single-cell technologies and computational methods has enabled systematic investigation of cellular heterogeneity in a wide range of tissues and cell populations, yielding fresh insights into the composition, dynamics, and regulatory mechanisms of cell states in development and disease. Despite substantial advances, significant challenges remain in the analysis, integration, and interpretation of single-cell omics data. Here, we discuss the state of the field and recent advances and look to future opportunities.
  • Ectopic protein interactions within BRD4-chromatin complexes drive oncogenic megadomain formation in NUT midline carcinoma
    Alekseyenko AA, Walsh EM, Zee BM, Pakozdi T, Hsi P, Lemieux ME, Dal Cin P, Ince TA, Kharchenko PV, Kuroda MI, French CA.
    Proc Natl Acad Sci U S A. 2017 May 23;114(21):E4184-E4192
    Abstract: To investigate the mechanism that drives dramatic mistargeting of active chromatin in NUT midline carcinoma (NMC), we have identified protein interactions unique to the BRD4-NUT fusion oncoprotein compared with wild-type BRD4. Using cross-linking, affinity purification, and mass spectrometry, we identified the EP300 acetyltransferase as uniquely associated with BRD4 through the NUT fusion in both NMC and non-NMC cell types. We also discovered ZNF532 associated with BRD4-NUT in NMC patient cells but not detectable in 293T cells. EP300 and ZNF532 are both implicated in feed-forward regulatory loops leading to propagation of the oncogenic chromatin complex in BRD4-NUT patient cells. Adding key functional significance to our biochemical findings, we independently discovered a ZNF532-NUT translocation fusion in a newly diagnosed NMC patient. ChIP sequencing of the major players NUT, ZNF532, BRD4, EP300, and H3K27ac revealed the formation of ZNF532-NUT-associated hyperacetylated megadomains, distinctly localized but otherwise analogous to those found in BRD4-NUT patient cells. Our results support a model in which NMC is dependent on ectopic NUT-mediated interactions between EP300 and components of BRD4 regulatory complexes, leading to a cascade of misregulation.
  • Epigenetic Memory Underlies Cell-Autonomous Heterogeneous Behavior of Hematopoietic Stem Cells
    Yu VW, Yusuf RZ, Oki T, Wu J, Saez B, Wang X, Cook C, Baryawno N, Ziller MJ, Lee E, Gu H, Meissner A, Lin CP, Kharchenko PV*, Scadden DT*
    Cell. 2016 Nov 17;167(5):1310-1322.e17
    Abstract: Stem cells determine homeostasis and repair of many tissues and are increasingly recognized as functionally heterogeneous. To define the extent of, and molecular basis for, heterogeneity, we overlaid functional, transcriptional, and epigenetic attributes of hematopoietic stem cells (HSCs) at a clonal level using endogenous fluorescent tagging. Endogenous HSC had clone-specific functional attributes over time in vivo. The intra-clonal behaviors were highly stereotypic, conserved under the stress of transplantation, inflammation, and genotoxic injury, and associated with distinctive transcriptional, DNA methylation, and chromatin accessibility patterns. Further, HSC function corresponded to epigenetic configuration but not always to transcriptional state. Therefore, hematopoiesis under homeostatic and stress conditions represents the integrated action of highly heterogeneous clones of HSC with epigenetically scripted behaviors. This high degree of epigenetically driven cell autonomy among HSCs implies that refinement of the concepts of stem cell plasticity and of the stem cell niche is warranted.
  • Transcriptomic Characterization of SF3B1 Mutation Reveals Its Pleiotropic Effects in Chronic Lymphocytic Leukemia
    Wang L, Brooks AN, Fan J, Wan Y, Gambe R, Li S, Hergert S, Yin S, Freeman SS, Levin JZ, Fan L, Seiler M, Buonamici S, Smith PG, Chau KF, Cibulskis CL, Zhang W, Rassenti LZ, Ghia EM, Kipps TJ, Fernandes S, Bloch DB, Kotliar D, Landau DA, Shukla SA, Aster JC, Reed R, DeLuca DS, Brown JR, Neuberg D, Getz G, Livak KJ, Meyerson MM, Kharchenko PV, Wu CJ.
    Cancer Cell. 2016 Nov 14;30(5):750-763.
    Abstract: Mutations in SF3B1, which encodes a spliceosome component, are associated with poor outcome in chronic lymphocytic leukemia (CLL), but how these contribute to CLL progression remains poorly understood. We undertook a transcriptomic characterization of primary human CLL cells to identify transcripts and pathways affected by SF3B1 mutation. Splicing alterations, identified in the analysis of bulk cells, were confirmed in single SF3B1-mutated CLL cells and also found in cell lines ectopically expressing mutant SF3B1. SF3B1 mutation was found to dysregulate multiple cellular functions including DNA damage response, telomere maintenance, and Notch signaling (mediated through KLF8 upregulation, increased TERC and TERT expression, or altered splicing of DVL2 transcript, respectively). SF3B1 mutation leads to diverse changes in CLL-related pathways.
  • Cell-Type-Specific Alternative Splicing Governs Cell Fate in the Developing Cerebral Cortex
    Zhang X, Chen MH, Wu X, Kodani A, Fan J, Doan R, Ozawa M, Ma J, Yoshida N, Reiter JF, Black DL, Kharchenko PV, Sharp PA, Walsh CA.
    Cell. 2016 Aug 25;166(5):1147-1162.e15.
    Abstract: Alternative splicing is prevalent in the mammalian brain. To interrogate the functional role of alternative splicing in neural development, we analyzed purified neural progenitor cells (NPCs) and neurons from developing cerebral cortices, revealing hundreds of differentially spliced exons that preferentially alter key protein domains, especially in cytoskeletal proteins, and can harbor disease-causing mutations. We show that Ptbp1 and Rbfox proteins antagonistically govern the NPC-to-neuron transition by regulating neuron-specific exons. Whereas Ptbp1 maintains apical progenitors partly through suppressing a poison exon of Flna in NPCs, Rbfox proteins promote neuronal differentiation by switching Ninein from a centrosomal splice form in NPCs to a non-centrosomal isoform in neurons. We further uncover an intronic human mutation within a PTBP1-binding site that disrupts normal skipping of the FLNA poison exon in NPCs and causes a brain-specific malformation. Our study indicates that dynamic control of alternative splicing governs cell fate in cerebral cortical development.
  • Proximity-Based Differential Single-Cell Analysis of the Niche to Identify Stem/Progenitor Cell Regulators
    Silberstein L, Goncalves KA, Kharchenko PV, Turcotte R, Kfoury Y, Mercier F, Baryawno N, Severe N, Bachand J, Spencer JA, Papazian A, Lee D, Chitteti BR, Srour EF, Hoggatt J, Tate T, Lo Celso C, Ono N, Nutt S, Heino J, Sipilä K, Shioda T, Osawa M, Lin CP, Hu GF, Scadden DT.
    Cell Stem Cell. 2016 Oct 6;19(4):530-543.
    Abstract: Physiological stem cell function is regulated by secreted factors produced by niche cells. In this study, we describe an unbiased approach based on the differential single-cell gene expression analysis of mesenchymal osteolineage cells close to, and further removed from, hematopoietic stem/progenitor cells (HSPCs) to identify candidate niche factors. Mesenchymal cells displayed distinct molecular profiles based on their relative location. We functionally examined, among the genes that were preferentially expressed in proximal cells, three secreted or cell-surface molecules not previously connected to HSPC biology (the secreted RNase angiogenin, the cytokine IL18, and the adhesion molecule Embigin) and discovered that all of these factors are HSPC quiescence regulators. Therefore, our proximity-based differential single-cell approach reveals molecular heterogeneity within niche cells and can be used to identify novel extrinsic stem/progenitor cell regulators. Similar approaches could also be applied to other stem cell/niche pairs to advance the understanding of microenvironmental regulation of stem cell function.
  • Characterizing transcriptional heterogeneity through pathway and gene set overdispersion analysis
    Fan J, Salathia N, Liu R, Kaeser GE, Yung YC, Herman JL, Kaper F, Fan J-B, Zhang K, Chun J, Kharchenko PV
    Nature Methods 2016 doi:10.1038/nmeth.3734
    Abstract: The transcriptional state of a cell reflects a variety of biological factors, from cell-type-specific features to transient processes such as the cell cycle, all of which may be of interest. However, identifying such aspects from noisy single-cell RNA-seq data remains challenging. We developed pathway and gene set overdispersion analysis (PAGODA) to resolve multiple, potentially overlapping aspects of transcriptional heterogeneity by testing gene sets for coordinated variability among measured cells.
  • Transcription factors LRF and BCL11A independently repress expression of fetal hemoglobin
    Masuda T, Wang X, Maeda M, Canver MC, Sher F, Funnell APW, Fisher C, Suciu M, Martyn GE, Norton LJ, Zhu C, Kurita R, Nakamura Y, Xu J, Higgs DR, Crossley M, Bauer DE, Orkin SH, Kharchenko PV*, Maeda T*
    Science. 2016 Jan 15;351(6270):285-289. doi: 10.1126/science.aad3312
    Abstract: Genes encoding human β-type globin undergo a developmental switch from embryonic to fetal to adult-type expression. Mutations in the adult form cause inherited hemoglobinopathies or globin disorders, including sickle cell disease and thalassemia. Some experimental results have suggested that these diseases could be treated by induction of fetal-type hemoglobin (HbF). However, the mechanisms that repress HbF in adults remain unclear. We found that the LRF/ZBTB7A transcription factor occupies fetal γ-globin genes and maintains the nucleosome density necessary for γ-globin gene silencing in adults, and that LRF confers its repressive activity through a NuRD repressor complex independent of the fetal globin repressor BCL11A. Our study may provide additional opportunities for therapeutic targeting in the treatment of hemoglobinopathies.
  • Pericentromeric satellite repeat expansions through RNA-derived DNA intermediates in cancer
    Bersani F, Lee E, Kharchenko PV, Xu AW, Liu M, Xega K, MacKenzie OC, Brannigan BW, Wittner BS, Jung H, Ramaswamy S, Park PJ, Maheswaran S, Ting DT, Haber DA.
    Proc Natl Acad Sci U S A. 2015 Dec 8;112(49):15148-53. doi: 10.1073/pnas.1518008112.
    Abstract: Aberrant transcription of the pericentromeric human satellite II (HSATII) repeat is present in a wide variety of epithelial cancers. In deriving experimental systems to study its deregulation, we observed that HSATII expression is induced in colon cancer cells cultured as xenografts or under nonadherent conditions in vitro, but it is rapidly lost in standard 2D cultures. Unexpectedly, physiological induction of endogenous HSATII RNA, as well as introduction of synthetic HSATII transcripts, generated cDNA intermediates in the form of DNA/RNA hybrids. Single molecule sequencing of tumor xenografts showed that HSATII RNA-derived DNA (rdDNA) molecules are stably incorporated within pericentromeric loci. Suppression of RT activity using small molecule inhibitors reduced HSATII copy gain. Analysis of whole-genome sequencing data revealed that HSATII copy number gain is a common feature in primary human colon tumors and is associated with a lower overall survival. Together, our observations suggest that cancer-associated derepression of specific repetitive sequences can promote their RNA-driven genomic expansion, with potential implications on pericentromeric architecture.
  • The oncogenic BRD4-NUT chromatin regulator drives aberrant transcription within large topological domains
    Alekseyenko AA, Walsh EM, Wang X, Grayson AR, Hsi PT, Kharchenko PV, Kuroda MI, French CA.
    Genes Dev. 2015 Jul 15;29(14):1507-23. doi: 10.1101/gad.267583.115.
    Abstract: NUT midline carcinoma (NMC), a subtype of squamous cell cancer, is one of the most aggressive human solid malignancies known. NMC is driven by the creation of a translocation oncoprotein, BRD4-NUT, which blocks differentiation and drives growth of NMC cells. BRD4-NUT forms distinctive nuclear foci in patient tumors, which we found correlate with ~100 unprecedented, hyperacetylated expanses of chromatin that reach up to 2 Mb in size. These "megadomains" appear to be the result of aberrant, feed-forward loops of acetylation and binding of acetylated histones that drive transcription of underlying DNA in NMC patient cells and naïve cells induced to express BRD4-NUT. Megadomain locations are typically cell lineage-specific; however, the cMYC and TP63 regions are targeted in all NMCs tested and play functional roles in tumor growth. Megadomains appear to originate from select pre-existing enhancers that progressively broaden but are ultimately delimited by topologically associating domain (TAD) boundaries. Therefore, our findings establish a basis for understanding the powerful role played by large-scale chromatin organization in normal and aberrant lineage-specific gene transcription.
  • DAZL regulates Tet1 translation in murine embryonic stem cells
    Welling M, Chen HH, Munoz J, Musheev MU, Kester L, Junker JP, Mischerikow N, Arbab M, Kuijk E, Silberstein L, Kharchenko PV, Geens M, Niehrs C, van de Velde H, van Oudenaarden A, Heck AJ, Geijsen N.
    EMBO Rep. 2015 Jul;16(7):791-802. doi: 10.15252/embr.201540538. Epub 2015 Jun 15.
    Abstract: Embryonic stem cell (ESC) cultures display a heterogeneous gene expression profile, ranging from a pristine naïve pluripotent state to a primed epiblast state. Addition of inhibitors of GSK3-beta and MEK (so-called 2i conditions) pushes ESC cultures toward a more homogeneous naïve pluripotent state, but the molecular underpinnings of this naïve transition are not completely understood. Here, we demonstrate that DAZL, an RNA-binding protein known to play a key role in germ-cell development, marks a subpopulation of ESCs that is actively transitioning toward naïve pluripotency. Moreover, DAZL plays an essential role in the active reprogramming of cytosine methylation. We demonstrate that DAZL associates with mRNA of Tet1, a catalyst of 5-hydroxylation of methyl-cytosine, and enhances Tet1 mRNA translation. Overexpression of DAZL in heterogeneous ESC cultures results in elevated TET1 protein levels as well as increased global hydroxymethylation. Conversely, null mutation of Dazl severely stunts 2i-mediated TET1 induction and hydroxymethylation. Our results provide insight into the regulation of the acquisition of naïve pluripotency and demonstrate that DAZL enhances TET1-mediated cytosine hydroxymethylation in ESCs that are actively reprogramming to a pluripotent ground state.
  • Epstein-Barr virus oncoprotein super-enhancers control B cell growth
    Zhou H, Schmidt SC, Jiang S, Willox B, Bernhardt K, Liang J, Johannsen EC, Kharchenko P, Gewurz BE, Kieff E, Zhao B.
    Cell Host Microbe. 2015 Feb 11;17(2):205-16. doi: 10.1016/j.chom.2014.12.013.
    Abstract: Super-enhancers are clusters of gene-regulatory sites bound by multiple transcription factors that govern cell transcription, development, phenotype, and oncogenesis. By examining Epstein-Barr virus (EBV)-transformed lymphoblastoid cell lines (LCLs), we identified four EBV oncoproteins and five EBV-activated NF-κB subunits co-occupying ~1,800 enhancer sites. Of these, 187 had markedly higher and broader histone H3K27ac signals, characteristic of super-enhancers, and were designated "EBV super-enhancers." EBV super-enhancer-associated genes included the MYC and BCL2 oncogenes, which enable LCL proliferation and survival. EBV super-enhancers were enriched for B cell transcription factor motifs and had high co-occupancy of STAT5 and NFAT transcription factors (TFs). EBV super-enhancer-associated genes were more highly expressed than other LCL genes. Disrupting EBV super-enhancers by the bromodomain inhibitor JQ1 or conditionally inactivating an EBV oncoprotein or NF-κB decreased MYC or BCL2 expression and arrested LCL growth. These findings provide insight into mechanisms of EBV-induced lymphoproliferation and identify potential therapeutic interventions.
  • BioTAP-XL: Cross-linking/Tandem Affinity Purification to Study DNA Targets, RNA, and Protein Components of Chromatin-Associated Complexes
    Alekseyenko AA, McElroy KA, Kang H, Zee BM, Kharchenko PV, Kuroda MI.
    Curr Protoc Mol Biol. 2015 Jan 5;109:21.30.1-21.30.32. doi: 10.1002/0471142727.mb2130s109.
    Abstract: In order to understand how chromatin complexes function in the nucleus, it is important to obtain a comprehensive picture of their protein, DNA, and RNA components, as well as their mutual interactions. This unit presents a chromatin cross-linking approach (BioTAP-XL) that utilizes a special BioTAP-tagged transgenic protein bait along with mass spectrometry to identify protein complex components, and high-throughput sequencing to identify RNA components and DNA binding sites. Full protocols are provided for Drosophila cells and for human cells in culture, along with an additional protocol for Drosophila embryos as the source material. A key element of the approach in all cases is the generation of control data from input chromatin samples.
  • Epstein-Barr virus nuclear antigen 3A partially coincides with EBNA3C genome-wide and is tethered to DNA through BATF complexes
    Schmidt SC, Jiang S, Zhou H, Willox B, Holthaus AM, Kharchenko PV, Johannsen EC, Kieff E, Zhao B.
    Proc Natl Acad Sci U S A. 2015 Jan 13;112(2):554-9. doi: 10.1073/pnas.1422580112
    Abstract: Epstein-Barr Virus (EBV) conversion of B-lymphocytes to Lymphoblastoid Cell Lines (LCLs) requires four EBV nuclear antigen (EBNA) oncoproteins: EBNA2, EBNALP, EBNA3A, and EBNA3C. EBNA2 and EBNALP associate with EBV and cell enhancers, up-regulate the EBNA promoter, MYC, and EBV Latent infection Membrane Proteins (LMPs), which up-regulate BCL2 to protect EBV-infected B-cells from MYC proliferation-induced cell death. LCL proliferation induces p16(INK4A) and p14(ARF)-mediated cell senescence. EBNA3A and EBNA3C jointly suppress p16(INK4A) and p14(ARF), enabling continuous cell proliferation. Analyses of the EBNA3A human genome-wide ChIP-seq landscape revealed 37% of 10,000 EBNA3A sites to be at strong enhancers; 28% to be at weak enhancers; 4.4% to be at active promoters; and 6.9% to be at weak and poised promoters. EBNA3A colocalized with BATF-IRF4, ETS-IRF4, RUNX3, and other B-cell Transcription Factors (TFs). EBNA3A sites clustered into seven unique groups, with differing B-cell TFs and epigenetic marks. EBNA3A coincidence with BATF-IRF4 or RUNX3 was associated with stronger EBNA3A ChIP-Seq signals. EBNA3A was at MYC, CDKN2A/B, CCND2, CXCL9/10, and BCL2, together with RUNX3, BATF, IRF4, and SPI1. ChIP-re-ChIP revealed complexes of EBNA3A on DNA with BATF. These data strongly support a model in which EBNA3A is tethered to DNA through BATF-containing protein complexes to enable continuous cell proliferation.
  • Locally disordered methylation forms the basis of intratumor methylome variation in chronic lymphocytic leukemia
    Landau DA, Clement K, Ziller MJ, Boyle P, Fan J, Gu H, Stevenson K, Sougnez C, Wang L, Li S, Kotliar D, Zhang W, Ghandi M, Garraway L, Fernandes SM, Livak KJ, Gabriel S, Gnirke A, Lander ES, Brown JR, Neuberg D, Kharchenko PV, Hacohen N, Getz G, Meissner A, Wu CJ.
    Cancer Cell. 2014 Dec 8;26(6):813-25. doi: 10.1016/j.ccell.2014.10.012.
    Abstract: Intratumoral heterogeneity plays a critical role in tumor evolution. To define the contribution of DNA methylation to heterogeneity within tumors, we performed genome-scale bisulfite sequencing of 104 primary chronic lymphocytic leukemias (CLLs). Compared with 26 normal B cell samples, CLLs consistently displayed higher intrasample variability of DNA methylation patterns across the genome, which appears to arise from stochastically disordered methylation in malignant cells. Transcriptome analysis of bulk and single CLL cells revealed that methylation disorder was linked to low-level expression. Disordered methylation was further associated with adverse clinical outcome. We therefore propose that disordered methylation plays a similar role to that of genetic instability, enhancing the ability of cancer cells to search for superior evolutionary trajectories.
  • Unbiased classification of sensory neuron types by large-scale single-cell RNA sequencing
    Usoskin D, Furlan A, Islam S, Abdo H, Lönnerberg P, Lou D, Hjerling-Leffler J, Haeggstrom J, Kharchenko O, Kharchenko PV, Linnarsson S, Ernfors P.
    Nat Neurosci. 2015 Jan;18(1):145-53. doi: 10.1038/nn.3881. Epub 2014 Nov 24.
    Abstract: The primary sensory system requires the integrated function of multiple cell types, although its full complexity remains unclear. We used comprehensive transcriptome analysis of 622 single mouse neurons to classify them in an unbiased manner, independent of any a priori knowledge of sensory subtypes. Our results reveal eleven types: three distinct low-threshold mechanoreceptive neurons, two proprioceptive, and six principal types of thermosensitive, itch sensitive, type C low-threshold mechanosensitive and nociceptive neurons with markedly different molecular and operational properties. Confirming previously anticipated major neuronal types, our results also classify and provide markers for new, functionally distinct subtypes. For example, our results suggest that itching during inflammatory skin diseases such as atopic dermatitis is linked to a distinct itch-generating type. We demonstrate single-cell RNA-seq as an effective strategy for dissecting sensory responsive cells into distinct neuronal types. The resulting catalog illustrates the diversity of sensory types and the cellular complexity underlying somatic sensation.
  • Comparative analysis of metazoan chromatin organization
    Ho JW, Jung YL, Liu T, Alver BH, Lee S, Ikegami K, Sohn KA, Minoda A, Tolstorukov MY, Appert A, Parker SC, Gu T, Kundaje A, Riddle NC, Bishop E, Egelhofer TA, Hu SS, Alekseyenko AA, Rechtsteiner A, Asker D, Belsky JA, Bowman SK, Chen QB, Chen RA, Day DS, Dong Y, Dose AC, Duan X, Epstein CB, Ercan S, Feingold EA, Ferrari F, Garrigues JM, Gehlenborg N, Good PJ, Haseley P, He D, Herrmann M, Hoffman MM, Jeffers TE, Kharchenko PV, Kolasinska-Zwierz P, Kotwaliwale CV, Kumar N, Langley SA, Larschan EN, Latorre I, Libbrecht MW, Lin X, Park R, Pazin MJ, Pham HN, Plachetka A, Qin B, Schwartz YB, Shoresh N, Stempor P, Vielle A, Wang C, Whittle CM, Xue H, Kingston RE, Kim JH, Bernstein BE, Dernburg AF, Pirrotta V, Kuroda MI, Noble WS, Tullius TD, Kellis M, MacAlpine DM, Strome S, Elgin SC, Liu XS, Lieb JD, Ahringer J, Karpen GH, Park PJ.
    Nature. 2014 Aug 28;512(7515):449-52. doi: 10.1038/nature13415.
    Abstract: Genome function is dynamically regulated in part by chromatin, which consists of the histones, non-histone proteins and RNA molecules that package DNA. Studies in Caenorhabditis elegans and Drosophila melanogaster have contributed substantially to our understanding of molecular mechanisms of genome function in humans, and have revealed conservation of chromatin components and mechanisms. Nevertheless, the three organisms have markedly different genome sizes, chromosome architecture and gene organization. On human and fly chromosomes, for example, pericentric heterochromatin flanks single centromeres, whereas worm chromosomes have dispersed heterochromatin-like regions enriched in the distal chromosomal 'arms', and centromeres distributed along their lengths. To systematically investigate chromatin organization and associated gene regulation across species, we generated and analysed a large collection of genome-wide chromatin data sets from cell lines and developmental stages in worm, fly and human. Here we present over 800 new data sets from our ENCODE and modENCODE consortia, bringing the total to over 1,400. Comparison of combinatorial patterns of histone modifications, nuclear lamina-associated domains, organization of large-scale topological domains, chromatin environment at promoters and enhancers, nucleosome positioning, and DNA replication patterns reveals many conserved features of chromatin organization among the three organisms. We also find notable differences in the composition and locations of repressive chromatin. These data sets and analyses provide a rich resource for comparative and species-specific investigations of chromatin composition, organization and function.
  • Heterochromatin-associated interactions of Drosophila HP1a with dADD1, HIPP1, and repetitive RNAs
    Alekseyenko AA, Gorchakov AA, Zee BM, Fuchs SM, Kharchenko PV, Kuroda MI.
    Genes Dev. 2014 Jul 1;28(13):1445-60. doi: 10.1101/gad.241950.114
    Abstract: Heterochromatin protein 1 (HP1a) has conserved roles in gene silencing and heterochromatin and is also implicated in transcription, DNA replication, and repair. Here we identify chromatin-associated protein and RNA interactions of HP1a by BioTAP-XL mass spectrometry and sequencing from Drosophila S2 cells, embryos, larvae, and adults. Our results reveal an extensive list of known and novel HP1a-interacting proteins, of which we selected three for validation. A strong novel interactor, dADD1 (Drosophila ADD1) (CG8290), is highly enriched in heterochromatin, harbors an ADD domain similar to human ATRX, displays selective binding to H3K9me2 and H3K9me3, and is a classic genetic suppressor of position-effect variegation. Unexpectedly, a second hit, HIPP1 (HP1 and insulator partner protein-1) (CG3680), is strongly connected to CP190-related complexes localized at putative insulator sequences throughout the genome in addition to its colocalization with HP1a in heterochromatin. A third interactor, the histone methyltransferase MES-4, is also enriched in heterochromatin. In addition to these protein-protein interactions, we found that HP1a selectively associated with a broad set of RNAs transcribed from repetitive regions. We propose that this rich network of previously undiscovered interactions will define how HP1a complexes perform their diverse functions in cells and developing organisms.
  • Bayesian approach to single-cell differential expression analysis
    Kharchenko PV, Silberstein L, Scadden DT.
    Nat Methods. 2014 Jul;11(7):740-2. doi: 10.1038/nmeth.2967
    Abstract: Single-cell data provide a means to dissect the composition of complex tissues and specialized cellular environments. However, the analysis of such measurements is complicated by high levels of technical noise and intrinsic biological variability. We describe a probabilistic model of expression-magnitude distortions typical of single-cell RNA-sequencing measurements, which enables detection of differential expression signatures and identification of subpopulations of cells in a way that is more tolerant of noise.
  • Reciprocal interactions of human C10orf12 and C17orf96 with PRC2 revealed by BioTAP-XL cross-linking and affinity purification
    Alekseyenko AA, Gorchakov AA, Kharchenko PV, Kuroda MI.
    Proc Natl Acad Sci U S A. 2014 Feb 18;111(7):2488-93. doi: 10.1073/pnas.1400648111
    Abstract: Understanding the composition of epigenetic regulators remains an important challenge in chromatin biology. Traditional biochemical analysis of chromatin-associated complexes requires their release from DNA under conditions that can also disrupt key interactions. Here we develop a complementary approach (BioTAP-XL), in which cross-linking (XL) enhances the preservation of protein interactions and also allows the analysis of DNA targets under the same tandem affinity purification (BioTAP) regimen. We demonstrate the power of BioTAP-XL through analysis of human EZH2, a core subunit of polycomb repressive complex 2 (PRC2). We identify and validate two strong interactors, C10orf12 and C17orf96, which display enrichment with EZH2-BioTAP at levels similar to canonical PRC2 components (SUZ12, EED, MTF2, JARID2, PHF1, and AEBP2). ChIP-seq analysis of BioTAP-tagged C10orf12 or C17orf96 revealed the similarity of each binding pattern with the location of EZH2 and the H3K27me3-silencing mark, validating their physical interaction with PRC2 components. Interestingly, analysis by mass spectrometry of C10orf12 and C17orf96 interactions revealed that these proteins may be mutually exclusive PRC2 subunits that fail to interact with each other or with JARID2 and AEBP2. C10orf12, in addition, shows a strong and unexpected association with components of the EHMT1/2 complex, thus potentially connecting PRC2 to another histone methyltransferase. Similarly, results from CBX4-BioTAP protein pulldowns are consistent with reports of a diversity of PRC1 complexes. Our results highlight the importance of reciprocal analyses of multiple subunits and suggest that iterative use of BioTAP-XL has strong potential to reveal networks of chromatin-based interactions in higher organisms.
  • Chromatin proteins captured by ChIP-mass spectrometry are linked to dosage compensation in Drosophila
    Wang CI, Alekseyenko AA, Leroy G, Elia AE, Gorchakov AA, Britton LM, Elledge SJ, Kharchenko PV, Garcia BA, Kuroda MI.
    Nat Struct Mol Biol. 2013 Jan 6. doi: 10.1038/nsmb.2477.
    Abstract: X-chromosome dosage compensation by the MSL (male-specific lethal) complex is required in Drosophila melanogaster to increase gene expression from the single male X to equal that of both female X chromosomes. Instead of focusing solely on protein complexes released from DNA, here we used chromatin-interacting protein MS (ChIP-MS) to identify MSL interactions on cross-linked chromatin. We identified MSL-enriched histone modifications, including histone H4 Lys16 acetylation and histone H3 Lys36 methylation, and CG4747, a putative Lys36-trimethylated histone H3 (H3K36me3)-binding protein. CG4747 is associated with the bodies of active genes, coincident with H3K36me3, and is mislocalized in the Set2 mutant lacking H3K36me3. CG4747 loss of function in vivo results in partial mislocalization of the MSL complex to autosomes, and RNA interference experiments confirm that CG4747 and Set2 function together to facilitate targeting of the MSL complex to active genes, validating the ChIP-MS approach.
  • Enrichment of HP1a on Drosophila Chromosome 4 Genes Creates an Alternate Chromatin Structure Critical for Regulation in this Heterochromatic Domain
    Riddle NC, Jung YL, Gu T, Alekseyenko AA, Asker D, Gui H, Kharchenko PV, Minoda A, Plachetka A, Schwartz YB, Tolstorukov MY, Kuroda MI, Pirrotta V, Karpen GH, Park PJ, Elgin SC.
    PLoS Genet. 2012 Sep;8(9):e1002954
    Abstract: Chromatin environments differ greatly within a eukaryotic genome, depending on expression state, chromosomal location, and nuclear position. In genomic regions characterized by high repeat content and high gene density, chromatin structure must silence transposable elements but permit expression of embedded genes. We have investigated one such region, chromosome 4 of Drosophila melanogaster. Using chromatin-immunoprecipitation followed by microarray (ChIP-chip) analysis, we examined enrichment patterns of 20 histone modifications and 25 chromosomal proteins in S2 and BG3 cells, as well as the changes in several marks resulting from mutations in key proteins. Active genes on chromosome 4 are distinct from those in euchromatin or pericentric heterochromatin: while there is a depletion of silencing marks at the transcription start sites (TSSs), HP1a and H3K9me3, but not H3K9me2, are enriched strongly over gene bodies. Intriguingly, genes on chromosome 4 are less frequently associated with paused polymerase. However, when the chromatin is altered by depleting HP1a or POF, the RNA pol II enrichment patterns of many chromosome 4 genes shift, showing a significant decrease over gene bodies but not at TSSs, accompanied by lower expression of those genes. Chromosome 4 genes have a low incidence of TRL/GAGA factor binding sites and a low T(m) downstream of the TSS, characteristics that could contribute to a low incidence of RNA polymerase pausing. Our data also indicate that EGG and POF jointly regulate H3K9 methylation and promote HP1a binding over gene bodies, while HP1a targeting and H3K9 methylation are maintained at the repeats by an independent mechanism. The HP1a-enriched, POF-associated chromatin structure over the gene bodies may represent one type of adaptation for genes embedded in repetitive DNA.
  • ChIP-seq guidelines and practices of the ENCODE and modENCODE consortia
    Landt SG, Marinov GK, Kundaje A, Kheradpour P, Pauli F, Batzoglou S, Bernstein BE, Bickel P, Brown JB, Cayting P, Chen Y, DeSalvo G, Epstein C, Fisher-Aylor KI, Euskirchen G, Gerstein M, Gertz J, Hartemink AJ, Hoffman MM, Iyer VR, Jung YL, Karmakar S, Kellis M, Kharchenko PV, Li Q, Liu T, Liu XS, Ma L, Milosavljevic A, Myers RM, Park PJ, Pazin MJ, Perry MD, Raha D, Reddy TE, Rozowsky J, Shoresh N, Sidow A, Slattery M, Stamatoyannopoulos JA, Tolstorukov MY, White KP, Xi S, Farnham PJ, Lieb JD, Wold BJ, Snyder M.
    Genome Res. 2012 Sep;22(9):1813-31.
    Abstract: Chromatin immunoprecipitation (ChIP) followed by high-throughput DNA sequencing (ChIP-seq) has become a valuable and widely used approach for mapping the genomic location of transcription-factor binding and histone modifications in living cells. Despite its widespread use, there are considerable differences in how these experiments are conducted, how the results are scored and evaluated for quality, and how the data and metadata are archived for public use. These practices affect the quality and utility of any global ChIP experiment. Through our experience in performing ChIP-seq experiments, the ENCODE and modENCODE consortia have developed a set of working standards and guidelines for ChIP experiments that are updated routinely. The current guidelines address antibody validation, experimental replication, sequencing depth, data and metadata reporting, and data quality assessment. We discuss how ChIP quality, assessed in these ways, affects different uses of ChIP-seq data. All data sets used in the analysis have been deposited for public viewing and downloading at the ENCODE (http://encodeproject.org/ENCODE/) and modENCODE (http://www.modencode.org/) portals.
  • An integrated encyclopedia of DNA elements in the human genome
    ENCODE consortium
    Nature. 2012 Sep 6;489(7414):57-74.
    Abstract: The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure and histone modification. These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes, providing new insights into the mechanisms of gene regulation. The newly identified elements also show a statistical correspondence to sequence variants linked to human disease, and can thereby guide interpretation of this variation. Overall, the project provides new insights into the organization and regulation of our genes and genome, and is an expansive resource of functional annotations for biomedical research.
  • Nature and function of insulator protein binding sites in the Drosophila genome
    Schwartz YB, Linder-Basso D*, Kharchenko PV*, Tolstorukov MY*, Kim M, Li HB, Gorchakov AA, Minoda A, Shanower G, Alekseyenko AA, Riddle NC, Jung YL, Gu T, Plachetka A, Elgin SC, Kuroda MI, Park PJ, Savitsky M, Karpen GH, Pirrotta V.
    Genome Res. 2012 Jul 5 [Epub ahead of print]
    Abstract: Chromatin insulator elements and associated proteins have been proposed to partition eukaryotic genomes into sets of independently regulated domains. Here we test this hypothesis by quantitative genome-wide analysis of insulator protein binding to Drosophila chromatin. We find distinct combinatorial binding of insulator proteins to different classes of sites and uncover a novel type of insulator element that binds CP190 but not any other known insulator proteins. Functional characterization of different classes of binding sites indicates that only a small fraction act as robust insulators in standard enhancer-blocking assays. We show that insulators restrict the spreading of the H3K27me3 mark but only at a small number of Polycomb target regions and only to prevent repressive histone methylation within adjacent genes that are already transcriptionally inactive. RNAi knockdown of insulator proteins in cultured cells does not lead to major alterations in genome expression. Taken together these observations argue against the concept of a genome partitioned by specialized boundary elements and suggest that insulators are reserved for specific regulation of selected genes.
  • Landscape of somatic retrotransposition in human cancers
    Lee E, Iskow R, Yang L, Gokcumen O, Haseley P, Luquette III LJ, Lohr JG, Harris CC, Ding L, Wilson RK, Wheeler DA, Gibbs RA, Kucherlapati R, Lee C, Kharchenko PV*, Park PJ*, and The Cancer Genome Atlas Research Network
    Science. 2012 Aug 24;337(6097):967-71
    Abstract: Transposable elements (TEs) are abundant in the human genome, and some are capable of generating new insertions through RNA intermediates. In cancer, the disruption of cellular mechanisms that normally suppress TE activity may facilitate mutagenic retrotranspositions. We performed single-nucleotide resolution analysis of TE insertions in 43 high-coverage whole-genome sequencing datasets from five cancer types. We identified 194 high-confidence somatic TE insertions, as well as thousands of polymorphic TE insertions in matched normal genomes. Somatic insertions were present in epithelial tumors but not in blood or brain cancers. Somatic L1 insertions tend to occur in genes that are commonly mutated in cancer, disrupt the expression of the target genes, and are biased toward regions of cancer-specific DNA hypomethylation, highlighting their potential impact in tumorigenesis.
  • Heterogeneity of the transition/transversion ratio in Drosophila and Hominidae genomes
    Seplyarskiy VB, Kharchenko P, Kondrashov AS, Bazykin GA.
    Mol Biol Evol. 2012 Feb 15
    Abstract: Mutation rate varies between sites in the genome. Part of this variation can be explained by well-recognized short nucleotide contexts, but a large component of this variation remains cryptic. We used data on interspecies divergence and intraspecies polymorphism in Drosophila and Hominidae to analyze variation of the average rate of the 12 possible kinds of single-nucleotide mutations and in the transition/transversion ratio k at single-nucleotide resolution. Both the average mutation rate and k vary by a factor of ~3 between nucleotide sites. The characteristic scale of variation in k is up to at least ~30 nucleotides in Drosophila and ~5 nucleotides in Hominidae. Genome segments with locally elevated mutation rates possess lower values of k; however, a substantial fraction of variation in k cannot be directly explained by the local mutation rates.
  • The genomic binding sites of a noncoding RNA
    Simon MD, Wang CI, Kharchenko PV, West JA, Chapman BA, Alekseyenko AA, Borowsky ML, Kuroda MI, Kingston RE.
    Proc Natl Acad Sci USA. 2011 Dec 20;108(51):20497-502
    Abstract: Long noncoding RNAs (lncRNAs) have important regulatory roles and can function at the level of chromatin. To determine where lncRNAs bind to chromatin, we developed capture hybridization analysis of RNA targets (CHART), a hybridization-based technique that specifically enriches endogenous RNAs along with their targets from reversibly cross-linked chromatin extracts. CHART was used to enrich the DNA and protein targets of endogenous lncRNAs from flies and humans. This analysis was extended to genome-wide mapping of roX2, a well-studied ncRNA involved in dosage compensation in Drosophila. CHART revealed that roX2 binds at specific genomic sites that coincide with the binding sites of proteins from the male-specific lethal complex that affects dosage compensation. These results reveal the genomic targets of roX2 and demonstrate how CHART can be used to study RNAs in a manner analogous to chromatin immunoprecipitation for proteins.
  • PI's Postdoctoral & graduate work

  • Evidence for dosage compensation between X and autosomes in mammals
    Kharchenko PV, Xi R, Park PJ.
    Nature Genetics, 2011 Nov 28;43(12):1167-9
  • Comprehensive analysis of the Drosophila melanogaster chromatin landscape differentiates among chromosomes, genes, and regulatory elements
    Kharchenko PV, Alekseyenko AA, Schwartz YB, Minoda A, Riddle NC, Ernst J, Sabo PJ, Larschan E, Gorchakov AA, Gu T, Linder-Basso D, Plachetka A, Shanower G, Tolstorukov MY, Bishop EP, Canfield TP, Sandstrom R, Thurman RE, Stamatoyannopoulos JA, Kellis M, Elgin SC, Kuroda MI, Pirrotta V, Karpen GH*, Park PJ*.
    Nature, 2011 Mar 24;471(7339):480-5
    Abstract: Chromatin is composed of DNA and a variety of modified histones and non-histone proteins, which have an impact on cell differentiation, gene regulation and other key cellular processes. Here we present a genome-wide chromatin landscape for Drosophila melanogaster based on eighteen histone modifications, summarized by nine prevalent combinatorial patterns. Integrative analysis with other data (non-histone chromatin proteins, DNase I hypersensitivity, GRO-Seq reads produced by engaged polymerase, short/long RNA products) reveals discrete characteristics of chromosomes, genes, regulatory elements and other functional domains. We find that active genes display distinct chromatin signatures that are correlated with disparate gene lengths, exon patterns, regulatory functions and genomic contexts. We also demonstrate a diversity of signatures among Polycomb targets that include a subset with paused polymerase. This systematic profiling and integrative analysis of chromatin signatures provides insights into how genomic elements are regulated, and will serve as a resource for future experimental investigations of genome structure and function.
  • X chromosome dosage compensation via enhanced transcriptional elongation in Drosophila males
    Larschan E*, Bishop EP*, Kharchenko PV, Core L, Lis JT, Park PJ, Kuroda MI.
    Nature, 2011 Mar 3;471(7336):115-8.
    Abstract: The evolution of sex chromosomes has resulted in numerous species in which females inherit two X chromosomes but males have a single X, thus requiring dosage compensation. MSL (Male-specific lethal) complex increases transcription on the single X chromosome of Drosophila males to equalize expression of X-linked genes between the sexes. The biochemical mechanisms used for dosage compensation must function over a wide dynamic range of transcription levels and differential expression patterns. It has been proposed that the MSL complex regulates transcriptional elongation to control dosage compensation, a model subsequently supported by mapping of the MSL complex and MSL-dependent histone 4 lysine 16 acetylation to the bodies of X-linked genes in males, with a bias towards 3' ends. However, experimental analysis of MSL function at the mechanistic level has been challenging owing to the small magnitude of the chromosome-wide effect and the lack of an in vitro system for biochemical analysis. Here we use global run-on sequencing (GRO-seq) to examine the specific effect of the MSL complex on RNA Polymerase II (RNAP II) on a genome-wide level. Results indicate that the MSL complex enhances transcription by facilitating the progression of RNAP II across the bodies of active X-linked genes. Improving transcriptional output downstream of typical gene-specific controls may explain how dosage compensation can be imposed on the diverse set of genes along an entire chromosome.
  • ChIP-chip versus ChIP-seq: A systematic comparison of two technologies
    Ho JW, Bishop EP, Kharchenko PV, Nègre N, White KP, Park PJ.
    BMC Genomics, 2011, 12:134
    Abstract: BACKGROUND: Chromatin immunoprecipitation (ChIP) followed by microarray hybridization (ChIP-chip) or high-throughput sequencing (ChIP-seq) allows genome-wide discovery of protein-DNA interactions such as transcription factor bindings and histone modifications. Previous reports only compared a small number of profiles, and little has been done to compare histone modification profiles generated by the two technologies or to assess the impact of input DNA libraries in ChIP-seq analysis. Here, we performed a systematic analysis of a modENCODE dataset consisting of 31 pairs of ChIP-chip/ChIP-seq profiles of the coactivator CBP, RNA polymerase II (RNA PolII), and six histone modifications across four developmental stages of Drosophila melanogaster. RESULTS: Both technologies produce highly reproducible profiles within each platform; ChIP-seq generally produces profiles with a better signal-to-noise ratio and allows detection of more and narrower peaks. The set of peaks identified by the two technologies can be significantly different, but the extent to which they differ varies depending on the factor and the analysis algorithm. Importantly, we found that there is a significant variation among multiple sequencing profiles of input DNA libraries and that this variation most likely arises from both differences in experimental condition and sequencing depth. We further show that using an inappropriate input DNA profile can impact the average signal profiles around genomic features and peak calling results, highlighting the importance of having high quality input DNA data for normalization in ChIP-seq analysis. CONCLUSIONS: Our findings highlight the biases present in each of the platforms, show the variability that can arise from both technology and analysis methods, and emphasize the importance of obtaining high quality and deeply sequenced input DNA libraries for ChIP-seq analysis.
  • Chromatin signatures of the Drosophila replication program
    Eaton ML, Prinz JA, MacAlpine HK, Tretyakov G, Kharchenko PV and MacAlpine DM.
    Genome Research. 2011 Feb;21(2):164-74
    Abstract: DNA replication initiates from thousands of start sites throughout the Drosophila genome and must be coordinated with other ongoing nuclear processes such as transcription to ensure genetic and epigenetic inheritance. Considerable progress has been made toward understanding how chromatin modifications regulate the transcription program; in contrast, we know relatively little about the role of the chromatin landscape in defining how start sites of DNA replication are selected and regulated. Here, we describe the Drosophila replication program in the context of the chromatin and transcription landscape for multiple cell lines using data generated by the modENCODE consortium. We find that while the cell lines exhibit similar replication programs, there are numerous cell line-specific differences that correlate with changes in the chromatin architecture. We identify chromatin features that are associated with replication timing, early origin usage, and ORC binding. Primary sequence, activating chromatin marks, and DNA-binding proteins (including chromatin remodelers) contribute in an additive manner to specify ORC-binding sites. We also generate accurate and predictive models from the chromatin data to describe origin usage and strength between cell lines. Multiple activating chromatin modifications contribute to the function and relative strength of replication origins, suggesting that the chromatin environment does not regulate origins of replication as a simple binary switch, but rather acts as a tunable rheostat to regulate replication initiation events.
  • Plasticity in patterns of histone modifications and chromosomal proteins in the Drosophila heterochromatin
    Riddle NC*, Minoda A*, Kharchenko PV*, Alekseyenko AA, Schwartz YB, Tolstorukov MY, Gorchakov AA, Kennedy C, Linder-Basso D, Jaffe JD, Shanower G, Kuroda MI, Pirrotta V, Park PJ, Elgin SC, Karpen GH.
    Genome Research. 2011 Feb;21(2):147-63
    Abstract: Eukaryotic genomes are packaged in two basic forms, euchromatin and heterochromatin. We have examined the composition and organization of Drosophila melanogaster heterochromatin in different cell types using ChIP-array analysis of histone modifications and chromosomal proteins. As anticipated, the pericentric heterochromatin and chromosome 4 are on average enriched for the "silencing" marks H3K9me2, H3K9me3, HP1a, and SU(VAR)3-9, and are generally depleted for marks associated with active transcription. The locations of the euchromatin-heterochromatin borders identified by these marks are similar in animal tissues and most cell lines, although the amount of heterochromatin is variable in some cell lines. Combinatorial analysis of chromatin patterns reveals distinct profiles for euchromatin, pericentric heterochromatin, and the 4th chromosome. Both silent and active protein-coding genes in heterochromatin display complex patterns of chromosomal proteins and histone modifications; a majority of the active genes exhibit both "activation" marks (e.g., H3K4me3 and H3K36me3) and "silencing" marks (e.g., H3K9me2 and HP1a). The hallmark of active genes in heterochromatic domains appears to be a loss of H3K9 methylation at the transcription start site. We also observe complex epigenomic profiles of intergenic regions, repeated transposable element (TE) sequences, and genes in the heterochromatic extensions. An unexpectedly large fraction of sequences in the euchromatic chromosome arms exhibits a heterochromatic chromatin signature, which differs in size, position, and impact on gene expression among cell types. We conclude that patterns of heterochromatin/euchromatin packaging show greater complexity and plasticity than anticipated. This comprehensive analysis provides a foundation for future studies of gene activity and chromosomal functions that are influenced by or dependent upon heterochromatin.
  • Identification of functional elements and regulatory circuits in Drosophila by large-scale data integration
    modENCODE Consortium, Roy S, Ernst J, Kharchenko PV, Kheradpour P, Negre N, Eaton ML, Landolin JM, Bristow CA, Ma L, Lin MF, Washietl S, Arshinoff BI, Ay F, Meyer PE, Robine N, Washington NL, Di Stefano L, Berezikov E, Brown CD, Candeias R, Carlson JW, Carr A, Jungreis I, Marbach D, Sealfon R, Tolstorukov MY, Will S, Alekseyenko AA, Artieri C, Booth BW, Brooks AN, Dai Q, Davis CA, Duff MO, Feng X, Gorchakov AA, Gu T, Henikoff JG, Kapranov P, Li R, MacAlpine HK, Malone J, Minoda A, Nordman J, Okamura K, Perry M, Powell SK, Riddle NC, Sakai A, Samsonova A, Sandler JE, Schwartz YB, Sher N, Spokony R, Sturgill D, van Baren M, Wan KH, Yang L, Yu C, Feingold E, Good P, Guyer M, Lowdon R, Ahmad K, Andrews J, Berger B, Brenner SE, Brent MR, Cherbas L, Elgin SC, Gingeras TR, Grossman R, Hoskins RA, Kaufman TC, Kent W, Kuroda MI, Orr-Weaver T, Perrimon N, Pirrotta V, Posakony JW, Ren B, Russell S, Cherbas P, Graveley BR, Lewis S, Micklem G, Oliver B, Park PJ, Celniker SE, Henikoff S, Karpen GH, Lai EC, MacAlpine DM, Stein LD, White KP, Kellis M.
    Science. 2010;330(6012):1787-97
    Abstract: To gain insight into how genomic information is translated into cellular and developmental programs, the Drosophila model organism Encyclopedia of DNA Elements (modENCODE) project is comprehensively mapping transcripts, histone modifications, chromosomal proteins, transcription factors, replication proteins and intermediates, and nucleosome properties across a developmental time course and in multiple cell lines. We have generated more than 700 data sets and discovered protein-coding, noncoding, RNA regulatory, replication, and chromatin elements, more than tripling the annotated portion of the Drosophila genome. Correlated activity patterns of these elements reveal a functional regulatory network, which predicts putative new functions for genes, reveals stage- and tissue-specific regulators, and enables gene-expression prediction. Our results provide a foundation for directed experimental and computational studies in Drosophila and related species and also a model for systematic data integration toward comprehensive genomic and functional annotation.
  • Estimating enrichment of repetitive elements from high-throughput sequence data
    Day DS, Luquette LJ, Park PJ, Kharchenko PV.
    Genome Biol. 2010;11(6):R69.
    Abstract: We describe computational methods for analysis of repetitive elements from short-read sequencing data, and apply them to study histone modifications associated with the repetitive elements in human and mouse cells. Our results demonstrate that while accurate enrichment estimates can be obtained for individual repeat types and small sets of repeat instances, there are distinct combinatorial patterns of chromatin marks associated with major annotated repeat families, including H3K27me3/H3K9me3 differences among the endogenous retroviral element classes.
  • A region of human HoxD that confers Polycomb-group responsiveness
    Woo CJ, Kharchenko PV, Daheron L, Park PJ, Kingston RE.
    Cell. 2010 Jan 8;140(1):99-110.
    Abstract: Polycomb group (PcG) proteins are essential for accurate axial body patterning during embryonic development. PcG-mediated repression is conserved in metazoans and is targeted in Drosophila by Polycomb response elements (PREs). However, targeting sequences in humans have not been described. While analyzing chromatin architecture in the context of human embryonic stem cell (hESC) differentiation, we discovered a 1.8kb region between HOXD11 and HOXD12 (D11.12) that is associated with PcG proteins, becomes nuclease hypersensitive, and then shows alteration in nuclease sensitivity as hESCs differentiate. The D11.12 element repressed luciferase expression from a reporter construct and full repression required a highly conserved region and YY1 binding sites. Furthermore, repression was dependent on the PcG proteins BMI1 and EED and a YY1-interacting partner, RYBP. We conclude that D11.12 is a Polycomb-dependent regulatory region with similarities to Drosophila PREs, indicating conservation in the mechanisms that target PcG function in mammals and flies.
  • Long-range dosage compensation in Drosophila captures transcribed autosomal genes inserted on X
    Gorchakov AA, Alekseyenko AA, Kharchenko PV, Park PJ, Kuroda MI.
    Genes & Dev. 2009;23(19).
    Abstract: Dosage compensation in Drosophila melanogaster males is achieved via targeting of male-specific lethal (MSL) complex to X-linked genes. This is proposed to involve sequence-specific recognition of the X at approximately 150-300 chromatin entry sites, and subsequent spreading to active genes. Here we ask whether the spreading step requires transcription and is sequence-independent. We find that MSL complex binds, acetylates, and up-regulates autosomal genes inserted on X, but only if transcriptionally active. We conclude that a long-sought specific DNA sequence within X-linked genes is not obligatory for MSL binding. Instead, linkage and transcription play the pivotal roles in MSL targeting irrespective of gene origin and DNA sequence.
  • Comparative analysis of H2A.Z nucleosome organization in the human and yeast genomes
    Tolstorukov MY, Kharchenko PV, Goldman JA, Kingston RE, Park PJ.
    Genome Res. 2009 Jun;19(6):967-77. Epub 2009 Feb 26.
    Abstract: Eukaryotic DNA is wrapped around a histone protein core to constitute the fundamental repeating units of chromatin, the nucleosomes. The affinity of the histone core for DNA depends on the nucleotide sequence; however, it is unclear to what extent DNA sequence determines nucleosome positioning in vivo, and if the same rules of sequence-directed positioning apply to genomes of varying complexity. Using the data generated by high-throughput DNA sequencing combined with chromatin immunoprecipitation, we have identified positions of nucleosomes containing the H2A.Z histone variant and histone H3 trimethylated at lysine 4 in human CD4(+) T-cells. We find that the 10-bp periodicity observed in nucleosomal sequences in yeast and other organisms is not pronounced in human nucleosomal sequences. This result was confirmed for a broader set of mononucleosomal fragments that were not selected for any specific histone variant or modification. We also find that human H2A.Z nucleosomes protect only approximately 120 bp of DNA from MNase digestion and exhibit specific sequence preferences, suggesting a novel mechanism of nucleosome organization for the H2A.Z variant.
  • Design and analysis of ChIP-seq experiments for DNA-binding proteins
    Kharchenko PV, Tolstorukov MY, Park PJ.
    Nat Biotechnol. 2008 Dec;26(12):1351-9.
    Abstract: Recent progress in massively parallel sequencing platforms has enabled genome-wide characterization of DNA-associated proteins using the combination of chromatin immunoprecipitation and sequencing (ChIP-seq). Although a variety of methods exist for analysis of the established alternative ChIP microarray (ChIP-chip), few approaches have been described for processing ChIP-seq data. To fill this gap, we propose an analysis pipeline specifically designed to detect protein-binding positions with high accuracy. Using previously reported data sets for three transcription factors, we illustrate methods for improving tag alignment and correcting for background signals. We compare the sensitivity and spatial precision of three peak detection algorithms with published methods, demonstrating gains in spatial precision when an asymmetric distribution of tags on positive and negative strands is considered. We also analyze the relationship between the depth of sequencing and characteristics of the detected binding positions, and provide a method for estimating the sequencing depth necessary for a desired coverage of protein binding sites.
  • Nucleosome positioning in human HOX gene clusters
    Kharchenko PV, Woo CJ, Tolstorukov MY, Kingston RE, Park PJ.
    Genome Res. 2008 Oct;18(10):1554-61.
    Abstract: The distribution of nucleosomes along the genome is a significant aspect of chromatin structure and is thought to influence gene regulation through modulation of DNA accessibility. However, properties of nucleosome organization remain poorly understood, particularly in mammalian genomes. Toward this goal we used tiled microarrays to identify stable nucleosome positions along the HOX gene clusters in human cell lines. We show that nucleosome positions exhibit sequence properties and long-range organization that are different from those characterized in other organisms. Despite overall variability of internucleosome distances, specific loci contain regular nucleosomal arrays with 195-bp periodicity. Moreover, such arrays tend to occur preferentially toward the 3' ends of genes. Through comparison of different cell lines, we find that active transcription is correlated with increased positioning of nucleosomes, suggesting an unexpected role for transcription in the establishment of well-positioned nucleosomes.
  • Differential H3K4 methylation identifies developmentally poised hematopoietic genes
    Orford K*, Kharchenko P*, Lai W, Dao MC, Worhunsky DJ, Ferro A, Janzen V, Park PJ, Scadden DT.
    Dev Cell. 2008 May;14(5):798-809.
    Abstract: Throughout development, cell fate decisions are converted into epigenetic information that determines cellular identity. Covalent histone modifications are heritable epigenetic marks and are hypothesized to play a central role in this process. In this report, we assess the concordance of histone H3 lysine 4 dimethylation (H3K4me2) and trimethylation (H3K4me3) on a genome-wide scale in erythroid development by analyzing pluripotent, multipotent, and unipotent cell types. Although H3K4me2 and H3K4me3 are concordant at most genes, multipotential hematopoietic cells have a subset of genes that are differentially methylated (H3K4me2+/me3-). These genes are transcriptionally silent, highly enriched in lineage-specific hematopoietic genes, and uniquely susceptible to differentiation-induced H3K4 demethylation. Self-renewing embryonic stem cells, which restrict H3K4 methylation to genes that contain CpG islands (CGIs), lack H3K4me2+/me3- genes. These data reveal distinct epigenetic regulation of CGI and non-CGI genes during development and indicate an interactive relationship between DNA sequence and differential H3K4 methylation in lineage-specific differentiation.
  • Chromosomal periodicity of evolutionarily conserved gene pairs
    Wright MA, Kharchenko P, Church GM, Segrè D.
    Proc Natl Acad Sci U S A. 2007 Jun 19
    Abstract: Chromosomes are compacted hundreds of times to fit in the cell, packaged into dynamic folds whose structures are largely unknown. Here, we examine patterns in gene locations to infer large-scale features of bacterial chromosomes. Specifically, we analyzed >100 genomes and identified thousands of gene pairs that display two types of evolutionary correlations: a tendency to co-occur and a tendency to be located close together in many genomes. We then analyzed the detailed distribution of these pairs in Escherichia coli and found that genes in a pair tend to be separated by integral multiples of 117 kb along the genome and to be positioned in a 117-kb grid of genomic locations. In addition, the most pair-dense locations coincide with regions of intense transcriptional activity and the positions of top transcribed and conserved genes. These patterns suggest that the E. coli chromosome may be organized into a 117-kb helix-like topology that localizes a subset of the most essential and highly transcribed genes along a specific face of this structure. Our approach indicates an evolutionarily maintained preference in the spacing of genes along the chromosome and offers a general comparative genomics framework for studying chromosome structure, broadly applicable to other organisms.
  • Identifying metabolic enzymes with multiple types of association evidence
    Kharchenko P, Chen L, Freund Y, Vitkup D, Church GM.
    BMC Bioinformatics. 2006 Mar 29;7:177.
    Abstract: BACKGROUND: Existing large-scale metabolic models of sequenced organisms commonly include enzymatic functions which cannot be attributed to any gene in that organism. Existing computational strategies for identifying such missing genes rely primarily on sequence homology to known enzyme-encoding genes. RESULTS: We present a novel method for identifying genes encoding a specific metabolic function based on the local structure of the metabolic network and multiple types of functional association evidence, including clustering of genes on the chromosome, similarity of phylogenetic profiles, gene expression, protein fusion events and others. Using E. coli and S. cerevisiae metabolic networks, we illustrate the predictive ability of each individual type of association evidence and show that significantly better predictions can be obtained based on the combination of all data. In this way our method is able to predict 60% of enzyme-encoding genes of E. coli metabolism within the top 10 (out of 3551) candidates for their enzymatic function, and as the top candidate in 43% of the cases. CONCLUSION: We illustrate that a combination of genome context and other functional association evidence is effective in predicting genes encoding metabolic enzymes. Our approach does not rely on direct sequence homology to known enzyme-encoding genes, and can be used in conjunction with traditional homology-based metabolic reconstruction methods. The method can also be used to target orphan metabolic activities.