  • Epigenetic Memory Underlies Cell-Autonomous Heterogeneous Behavior of Hematopoietic Stem Cells.
    Yu VW, Yusuf RZ, Oki T, Wu J, Saez B, Wang X, Cook C, Baryawno N, Ziller MJ, Lee E, Gu H, Meissner A, Lin CP, Kharchenko PV*, Scadden DT*
    Cell. 2016 Nov 17;167(5):1310-1322.e17
    Abstract: Stem cells determine homeostasis and repair of many tissues and are increasingly recognized as functionally heterogeneous. To define the extent of, and molecular basis for, heterogeneity, we overlaid functional, transcriptional, and epigenetic attributes of hematopoietic stem cells (HSCs) at a clonal level using endogenous fluorescent tagging. Endogenous HSC had clone-specific functional attributes over time in vivo. The intra-clonal behaviors were highly stereotypic, conserved under the stress of transplantation, inflammation, and genotoxic injury, and associated with distinctive transcriptional, DNA methylation, and chromatin accessibility patterns. Further, HSC function corresponded to epigenetic configuration but not always to transcriptional state. Therefore, hematopoiesis under homeostatic and stress conditions represents the integrated action of highly heterogeneous clones of HSC with epigenetically scripted behaviors. This high degree of epigenetically driven cell autonomy among HSCs implies that refinement of the concepts of stem cell plasticity and of the stem cell niche is warranted.
  • Transcriptomic Characterization of SF3B1 Mutation Reveals Its Pleiotropic Effects in Chronic Lymphocytic Leukemia.
    Wang L, Brooks AN, Fan J, Wan Y, Gambe R, Li S, Hergert S, Yin S, Freeman SS, Levin JZ, Fan L, Seiler M, Buonamici S, Smith PG, Chau KF, Cibulskis CL, Zhang W, Rassenti LZ, Ghia EM, Kipps TJ, Fernandes S, Bloch DB, Kotliar D, Landau DA, Shukla SA, Aster JC, Reed R, DeLuca DS, Brown JR, Neuberg D, Getz G, Livak KJ, Meyerson MM, Kharchenko PV, Wu CJ.
    Cancer Cell. 2016 Nov 14;30(5):750-763.
    Abstract: Mutations in SF3B1, which encodes a spliceosome component, are associated with poor outcome in chronic lymphocytic leukemia (CLL), but how these contribute to CLL progression remains poorly understood. We undertook a transcriptomic characterization of primary human CLL cells to identify transcripts and pathways affected by SF3B1 mutation. Splicing alterations, identified in the analysis of bulk cells, were confirmed in single SF3B1-mutated CLL cells and also found in cell lines ectopically expressing mutant SF3B1. SF3B1 mutation was found to dysregulate multiple cellular functions including DNA damage response, telomere maintenance, and Notch signaling (mediated through KLF8 upregulation, increased TERC and TERT expression, or altered splicing of DVL2 transcript, respectively). SF3B1 mutation leads to diverse changes in CLL-related pathways.
  • Cell-Type-Specific Alternative Splicing Governs Cell Fate in the Developing Cerebral Cortex.
    Zhang X, Chen MH, Wu X, Kodani A, Fan J, Doan R, Ozawa M, Ma J, Yoshida N, Reiter JF, Black DL, Kharchenko PV, Sharp PA, Walsh CA.
    Cell. 2016 Aug 25;166(5):1147-1162.e15.
    Abstract: Alternative splicing is prevalent in the mammalian brain. To interrogate the functional role of alternative splicing in neural development, we analyzed purified neural progenitor cells (NPCs) and neurons from developing cerebral cortices, revealing hundreds of differentially spliced exons that preferentially alter key protein domains (especially in cytoskeletal proteins) and can harbor disease-causing mutations. We show that Ptbp1 and Rbfox proteins antagonistically govern the NPC-to-neuron transition by regulating neuron-specific exons. Whereas Ptbp1 maintains apical progenitors partly through suppressing a poison exon of Flna in NPCs, Rbfox proteins promote neuronal differentiation by switching Ninein from a centrosomal splice form in NPCs to a non-centrosomal isoform in neurons. We further uncover an intronic human mutation within a PTBP1-binding site that disrupts normal skipping of the FLNA poison exon in NPCs and causes a brain-specific malformation. Our study indicates that dynamic control of alternative splicing governs cell fate in cerebral cortical development.
  • Proximity-Based Differential Single-Cell Analysis of the Niche to Identify Stem/Progenitor Cell Regulators.
    Silberstein L, Goncalves KA, Kharchenko PV, Turcotte R, Kfoury Y, Mercier F, Baryawno N, Severe N, Bachand J, Spencer JA, Papazian A, Lee D, Chitteti BR, Srour EF, Hoggatt J, Tate T, Lo Celso C, Ono N, Nutt S, Heino J, Sipilä K, Shioda T, Osawa M, Lin CP, Hu GF, Scadden DT.
    Cell Stem Cell. 2016 Oct 6;19(4):530-543.
    Abstract: Physiological stem cell function is regulated by secreted factors produced by niche cells. In this study, we describe an unbiased approach based on the differential single-cell gene expression analysis of mesenchymal osteolineage cells close to, and further removed from, hematopoietic stem/progenitor cells (HSPCs) to identify candidate niche factors. Mesenchymal cells displayed distinct molecular profiles based on their relative location. We functionally examined, among the genes that were preferentially expressed in proximal cells, three secreted or cell-surface molecules not previously connected to HSPC biology (the secreted RNase angiogenin, the cytokine IL18, and the adhesion molecule Embigin) and discovered that all of these factors are HSPC quiescence regulators. Therefore, our proximity-based differential single-cell approach reveals molecular heterogeneity within niche cells and can be used to identify novel extrinsic stem/progenitor cell regulators. Similar approaches could also be applied to other stem cell/niche pairs to advance the understanding of microenvironmental regulation of stem cell function.
  • Characterizing transcriptional heterogeneity through pathway and gene set overdispersion analysis
    Fan J, Salathia N, Liu R, Kaeser GE, Yung YC, Herman JL, Kaper F, Fan J-B, Zhang K, Chun J, Kharchenko PV
    Nature Methods 2016 doi:10.1038/nmeth.3734
    Abstract: The transcriptional state of a cell reflects a variety of biological factors, from cell-type-specific features to transient processes such as the cell cycle, all of which may be of interest. However, identifying such aspects from noisy single-cell RNA-seq data remains challenging. We developed pathway and gene set overdispersion analysis (PAGODA) to resolve multiple, potentially overlapping aspects of transcriptional heterogeneity by testing gene sets for coordinated variability among measured cells.
  • Transcription factors LRF and BCL11A independently repress expression of fetal hemoglobin
    Masuda T, Wang X, Maeda M, Canver MC, Sher F, Funnell APW, Fisher C, Suciu M, Martyn GE, Norton LJ, Zhu C, Kurita R, Nakamura Y, Xu J, Higgs DR, Crossley M, Bauer DE, Orkin SH, Kharchenko PV*, Maeda T*
    Science 2016 Jan 15; 351(6270):285-289. doi: 10.1126/science.aad3312
    Abstract: Genes encoding human β-type globin undergo a developmental switch from embryonic to fetal to adult-type expression. Mutations in the adult form cause inherited hemoglobinopathies or globin disorders, including sickle cell disease and thalassemia. Some experimental results have suggested that these diseases could be treated by induction of fetal-type hemoglobin (HbF). However, the mechanisms that repress HbF in adults remain unclear. We found that the LRF/ZBTB7A transcription factor occupies fetal γ-globin genes and maintains the nucleosome density necessary for γ-globin gene silencing in adults, and that LRF confers its repressive activity through a NuRD repressor complex independent of the fetal globin repressor BCL11A. Our study may provide additional opportunities for therapeutic targeting in the treatment of hemoglobinopathies.
  • Pericentromeric satellite repeat expansions through RNA-derived DNA intermediates in cancer.
    Bersani F, Lee E, Kharchenko PV, Xu AW, Liu M, Xega K, MacKenzie OC, Brannigan BW, Wittner BS, Jung H, Ramaswamy S, Park PJ, Maheswaran S, Ting DT, Haber DA.
    Proc Natl Acad Sci U S A. 2015 Dec 8;112(49):15148-53. doi: 10.1073/pnas.1518008112.
    Abstract: Aberrant transcription of the pericentromeric human satellite II (HSATII) repeat is present in a wide variety of epithelial cancers. In deriving experimental systems to study its deregulation, we observed that HSATII expression is induced in colon cancer cells cultured as xenografts or under nonadherent conditions in vitro, but it is rapidly lost in standard 2D cultures. Unexpectedly, physiological induction of endogenous HSATII RNA, as well as introduction of synthetic HSATII transcripts, generated cDNA intermediates in the form of DNA/RNA hybrids. Single molecule sequencing of tumor xenografts showed that HSATII RNA-derived DNA (rdDNA) molecules are stably incorporated within pericentromeric loci. Suppression of RT activity using small molecule inhibitors reduced HSATII copy gain. Analysis of whole-genome sequencing data revealed that HSATII copy number gain is a common feature in primary human colon tumors and is associated with a lower overall survival. Together, our observations suggest that cancer-associated derepression of specific repetitive sequences can promote their RNA-driven genomic expansion, with potential implications on pericentromeric architecture.
  • The oncogenic BRD4-NUT chromatin regulator drives aberrant transcription within large topological domains.
    Alekseyenko AA, Walsh EM, Wang X, Grayson AR, Hsi PT, Kharchenko PV, Kuroda MI, French CA.
    Genes Dev. 2015 Jul 15;29(14):1507-23. doi: 10.1101/gad.267583.115.
    Abstract: NUT midline carcinoma (NMC), a subtype of squamous cell cancer, is one of the most aggressive human solid malignancies known. NMC is driven by the creation of a translocation oncoprotein, BRD4-NUT, which blocks differentiation and drives growth of NMC cells. BRD4-NUT forms distinctive nuclear foci in patient tumors, which we found correlate with ∼100 unprecedented, hyperacetylated expanses of chromatin that reach up to 2 Mb in size. These "megadomains" appear to be the result of aberrant, feed-forward loops of acetylation and binding of acetylated histones that drive transcription of underlying DNA in NMC patient cells and naïve cells induced to express BRD4-NUT. Megadomain locations are typically cell lineage-specific; however, the cMYC and TP63 regions are targeted in all NMCs tested and play functional roles in tumor growth. Megadomains appear to originate from select pre-existing enhancers that progressively broaden but are ultimately delimited by topologically associating domain (TAD) boundaries. Therefore, our findings establish a basis for understanding the powerful role played by large-scale chromatin organization in normal and aberrant lineage-specific gene transcription.
  • DAZL regulates Tet1 translation in murine embryonic stem cells.
    Welling M, Chen HH, Munoz J, Musheev MU, Kester L, Junker JP, Mischerikow N, Arbab M, Kuijk E, Silberstein L, Kharchenko PV, Geens M, Niehrs C, van de Velde H, van Oudenaarden A, Heck AJ, Geijsen N.
    EMBO Rep. 2015 Jul;16(7):791-802. doi: 10.15252/embr.201540538. Epub 2015 Jun 15.
    Abstract: Embryonic stem cell (ESC) cultures display a heterogeneous gene expression profile, ranging from a pristine naïve pluripotent state to a primed epiblast state. Addition of inhibitors of GSK3-beta and MEK (so-called 2i conditions) pushes ESC cultures toward a more homogeneous naïve pluripotent state, but the molecular underpinnings of this naïve transition are not completely understood. Here, we demonstrate that DAZL, an RNA-binding protein known to play a key role in germ-cell development, marks a subpopulation of ESCs that is actively transitioning toward naïve pluripotency. Moreover, DAZL plays an essential role in the active reprogramming of cytosine methylation. We demonstrate that DAZL associates with mRNA of Tet1, a catalyst of 5-hydroxylation of methyl-cytosine, and enhances Tet1 mRNA translation. Overexpression of DAZL in heterogeneous ESC cultures results in elevated TET1 protein levels as well as increased global hydroxymethylation. Conversely, null mutation of Dazl severely stunts 2i-mediated TET1 induction and hydroxymethylation. Our results provide insight into the regulation of the acquisition of naïve pluripotency and demonstrate that DAZL enhances TET1-mediated cytosine hydroxymethylation in ESCs that are actively reprogramming to a pluripotent ground state.
  • Epstein-Barr virus oncoprotein super-enhancers control B cell growth.
    Zhou H, Schmidt SC, Jiang S, Willox B, Bernhardt K, Liang J, Johannsen EC, Kharchenko P, Gewurz BE, Kieff E, Zhao B.
    Cell Host Microbe. 2015 Feb 11;17(2):205-16. doi: 10.1016/j.chom.2014.12.013.
    Abstract: Super-enhancers are clusters of gene-regulatory sites bound by multiple transcription factors that govern cell transcription, development, phenotype, and oncogenesis. By examining Epstein-Barr virus (EBV)-transformed lymphoblastoid cell lines (LCLs), we identified four EBV oncoproteins and five EBV-activated NF-κB subunits co-occupying ∼1,800 enhancer sites. Of these, 187 had markedly higher and broader histone H3K27ac signals, characteristic of super-enhancers, and were designated "EBV super-enhancers." EBV super-enhancer-associated genes included the MYC and BCL2 oncogenes, which enable LCL proliferation and survival. EBV super-enhancers were enriched for B cell transcription factor motifs and had high co-occupancy of STAT5 and NFAT transcription factors (TFs). EBV super-enhancer-associated genes were more highly expressed than other LCL genes. Disrupting EBV super-enhancers by the bromodomain inhibitor JQ1 or conditionally inactivating an EBV oncoprotein or NF-κB decreased MYC or BCL2 expression and arrested LCL growth. These findings provide insight into mechanisms of EBV-induced lymphoproliferation and identify potential therapeutic interventions.
  • BioTAP-XL: Cross-linking/Tandem Affinity Purification to Study DNA Targets, RNA, and Protein Components of Chromatin-Associated Complexes.
    Alekseyenko AA, McElroy KA, Kang H, Zee BM, Kharchenko PV, Kuroda MI.
    Curr Protoc Mol Biol. 2015 Jan 5;109:21.30.1-21.30.32. doi: 10.1002/0471142727.mb2130s109.
    Abstract: In order to understand how chromatin complexes function in the nucleus, it is important to obtain a comprehensive picture of their protein, DNA, and RNA components, as well as their mutual interactions. This unit presents a chromatin cross-linking approach (BioTAP-XL) that utilizes a special BioTAP-tagged transgenic protein bait along with mass spectrometry to identify protein complex components, and high-throughput sequencing to identify RNA components and DNA binding sites. Full protocols are provided for Drosophila cells and for human cells in culture, along with an additional protocol for Drosophila embryos as the source material. A key element of the approach in all cases is the generation of control data from input chromatin samples.
  • Epstein-Barr virus nuclear antigen 3A partially coincides with EBNA3C genome-wide and is tethered to DNA through BATF complexes.
    Schmidt SC, Jiang S, Zhou H, Willox B, Holthaus AM, Kharchenko PV, Johannsen EC, Kieff E, Zhao B.
    Proc Natl Acad Sci U S A. 2015 Jan 13;112(2):554-9. doi: 10.1073/pnas.1422580112
    Abstract: Epstein-Barr Virus (EBV) conversion of B-lymphocytes to Lymphoblastoid Cell Lines (LCLs) requires four EBV nuclear antigen (EBNA) oncoproteins: EBNA2, EBNALP, EBNA3A, and EBNA3C. EBNA2 and EBNALP associate with EBV and cell enhancers, up-regulate the EBNA promoter, MYC, and EBV Latent infection Membrane Proteins (LMPs), which up-regulate BCL2 to protect EBV-infected B-cells from MYC proliferation-induced cell death. LCL proliferation induces p16(INK4A) and p14(ARF)-mediated cell senescence. EBNA3A and EBNA3C jointly suppress p16(INK4A) and p14(ARF), enabling continuous cell proliferation. Analyses of the EBNA3A human genome-wide ChIP-seq landscape revealed 37% of 10,000 EBNA3A sites to be at strong enhancers; 28% to be at weak enhancers; 4.4% to be at active promoters; and 6.9% to be at weak and poised promoters. EBNA3A colocalized with BATF-IRF4, ETS-IRF4, RUNX3, and other B-cell Transcription Factors (TFs). EBNA3A sites clustered into seven unique groups, with differing B-cell TFs and epigenetic marks. EBNA3A coincidence with BATF-IRF4 or RUNX3 was associated with stronger EBNA3A ChIP-Seq signals. EBNA3A was at MYC, CDKN2A/B, CCND2, CXCL9/10, and BCL2, together with RUNX3, BATF, IRF4, and SPI1. ChIP-re-ChIP revealed complexes of EBNA3A on DNA with BATF. These data strongly support a model in which EBNA3A is tethered to DNA through BATF-containing protein complexes to enable continuous cell proliferation.
  • Locally disordered methylation forms the basis of intratumor methylome variation in chronic lymphocytic leukemia.
    Landau DA, Clement K, Ziller MJ, Boyle P, Fan J, Gu H, Stevenson K, Sougnez C, Wang L, Li S, Kotliar D, Zhang W, Ghandi M, Garraway L, Fernandes SM, Livak KJ, Gabriel S, Gnirke A, Lander ES, Brown JR, Neuberg D, Kharchenko PV, Hacohen N, Getz G, Meissner A, Wu CJ.
    Cancer Cell. 2014 Dec 8;26(6):813-25. doi: 10.1016/j.ccell.2014.10.012.
    Abstract: Intratumoral heterogeneity plays a critical role in tumor evolution. To define the contribution of DNA methylation to heterogeneity within tumors, we performed genome-scale bisulfite sequencing of 104 primary chronic lymphocytic leukemias (CLLs). Compared with 26 normal B cell samples, CLLs consistently displayed higher intrasample variability of DNA methylation patterns across the genome, which appears to arise from stochastically disordered methylation in malignant cells. Transcriptome analysis of bulk and single CLL cells revealed that methylation disorder was linked to low-level expression. Disordered methylation was further associated with adverse clinical outcome. We therefore propose that disordered methylation plays a similar role to that of genetic instability, enhancing the ability of cancer cells to search for superior evolutionary trajectories.
  • Unbiased classification of sensory neuron types by large-scale single-cell RNA sequencing.
    Usoskin D, Furlan A, Islam S, Abdo H, Lonnerberg P, Lou D, Hjerling-Leffler J, Haeggstrom J, Kharchenko O, Kharchenko PV, Linnarsson S, Ernfors P.
    Nature Neurosci. 2015 Jan;18(1):145-53. doi: 10.1038/nn.3881. Epub 2014 Nov 24.
    Abstract: The primary sensory system requires the integrated function of multiple cell types, although its full complexity remains unclear. We used comprehensive transcriptome analysis of 622 single mouse neurons to classify them in an unbiased manner, independent of any a priori knowledge of sensory subtypes. Our results reveal eleven types: three distinct low-threshold mechanoreceptive neurons, two proprioceptive, and six principal types of thermosensitive, itch sensitive, type C low-threshold mechanosensitive and nociceptive neurons with markedly different molecular and operational properties. Confirming previously anticipated major neuronal types, our results also classify and provide markers for new, functionally distinct subtypes. For example, our results suggest that itching during inflammatory skin diseases such as atopic dermatitis is linked to a distinct itch-generating type. We demonstrate single-cell RNA-seq as an effective strategy for dissecting sensory responsive cells into distinct neuronal types. The resulting catalog illustrates the diversity of sensory types and the cellular complexity underlying somatic sensation.
  • Comparative analysis of metazoan chromatin organization.
    Ho JW, Jung YL, Liu T, Alver BH, Lee S, Ikegami K, Sohn KA, Minoda A, Tolstorukov MY, Appert A, Parker SC, Gu T, Kundaje A, Riddle NC, Bishop E, Egelhofer TA, Hu SS, Alekseyenko AA, Rechtsteiner A, Asker D, Belsky JA, Bowman SK, Chen QB, Chen RA, Day DS, Dong Y, Dose AC, Duan X, Epstein CB, Ercan S, Feingold EA, Ferrari F, Garrigues JM, Gehlenborg N, Good PJ, Haseley P, He D, Herrmann M, Hoffman MM, Jeffers TE, Kharchenko PV, Kolasinska-Zwierz P, Kotwaliwale CV, Kumar N, Langley SA, Larschan EN, Latorre I, Libbrecht MW, Lin X, Park R, Pazin MJ, Pham HN, Plachetka A, Qin B, Schwartz YB, Shoresh N, Stempor P, Vielle A, Wang C, Whittle CM, Xue H, Kingston RE, Kim JH, Bernstein BE, Dernburg AF, Pirrotta V, Kuroda MI, Noble WS, Tullius TD, Kellis M, MacAlpine DM, Strome S, Elgin SC, Liu XS, Lieb JD, Ahringer J, Karpen GH, Park PJ.
    Nature. 2014 Aug 28;512(7515):449-52. doi: 10.1038/nature13415.
    Abstract: Genome function is dynamically regulated in part by chromatin, which consists of the histones, non-histone proteins and RNA molecules that package DNA. Studies in Caenorhabditis elegans and Drosophila melanogaster have contributed substantially to our understanding of molecular mechanisms of genome function in humans, and have revealed conservation of chromatin components and mechanisms. Nevertheless, the three organisms have markedly different genome sizes, chromosome architecture and gene organization. On human and fly chromosomes, for example, pericentric heterochromatin flanks single centromeres, whereas worm chromosomes have dispersed heterochromatin-like regions enriched in the distal chromosomal 'arms', and centromeres distributed along their lengths. To systematically investigate chromatin organization and associated gene regulation across species, we generated and analysed a large collection of genome-wide chromatin data sets from cell lines and developmental stages in worm, fly and human. Here we present over 800 new data sets from our ENCODE and modENCODE consortia, bringing the total to over 1,400. Comparison of combinatorial patterns of histone modifications, nuclear lamina-associated domains, organization of large-scale topological domains, chromatin environment at promoters and enhancers, nucleosome positioning, and DNA replication patterns reveals many conserved features of chromatin organization among the three organisms. We also find notable differences in the composition and locations of repressive chromatin. These data sets and analyses provide a rich resource for comparative and species-specific investigations of chromatin composition, organization and function.
  • Heterochromatin-associated interactions of Drosophila HP1a with dADD1, HIPP1, and repetitive RNAs.
    Alekseyenko AA, Gorchakov AA, Zee BM, Fuchs SM, Kharchenko PV, Kuroda MI.
    Genes Dev. 2014 Jul 1;28(13):1445-60. doi: 10.1101/gad.241950.114
    Abstract: Heterochromatin protein 1 (HP1a) has conserved roles in gene silencing and heterochromatin and is also implicated in transcription, DNA replication, and repair. Here we identify chromatin-associated protein and RNA interactions of HP1a by BioTAP-XL mass spectrometry and sequencing from Drosophila S2 cells, embryos, larvae, and adults. Our results reveal an extensive list of known and novel HP1a-interacting proteins, of which we selected three for validation. A strong novel interactor, dADD1 (Drosophila ADD1) (CG8290), is highly enriched in heterochromatin, harbors an ADD domain similar to human ATRX, displays selective binding to H3K9me2 and H3K9me3, and is a classic genetic suppressor of position-effect variegation. Unexpectedly, a second hit, HIPP1 (HP1 and insulator partner protein-1) (CG3680), is strongly connected to CP190-related complexes localized at putative insulator sequences throughout the genome in addition to its colocalization with HP1a in heterochromatin. A third interactor, the histone methyltransferase MES-4, is also enriched in heterochromatin. In addition to these protein-protein interactions, we found that HP1a selectively associated with a broad set of RNAs transcribed from repetitive regions. We propose that this rich network of previously undiscovered interactions will define how HP1a complexes perform their diverse functions in cells and developing organisms.
  • Bayesian approach to single-cell differential expression analysis.
    Kharchenko PV, Silberstein L, Scadden DT.
    Nat Methods. 2014 Jul;11(7):740-2. doi: 10.1038/nmeth.2967
    Abstract: Single-cell data provide a means to dissect the composition of complex tissues and specialized cellular environments. However, the analysis of such measurements is complicated by high levels of technical noise and intrinsic biological variability. We describe a probabilistic model of expression-magnitude distortions typical of single-cell RNA-sequencing measurements, which enables detection of differential expression signatures and identification of subpopulations of cells in a way that is more tolerant of noise.
  • Reciprocal interactions of human C10orf12 and C17orf96 with PRC2 revealed by BioTAP-XL cross-linking and affinity purification.
    Alekseyenko AA, Gorchakov AA, Kharchenko PV, Kuroda MI.
    Proc Natl Acad Sci U S A. 2014 Feb 18;111(7):2488-93. doi: 10.1073/pnas.1400648111
    Abstract: Understanding the composition of epigenetic regulators remains an important challenge in chromatin biology. Traditional biochemical analysis of chromatin-associated complexes requires their release from DNA under conditions that can also disrupt key interactions. Here we develop a complementary approach (BioTAP-XL), in which cross-linking (XL) enhances the preservation of protein interactions and also allows the analysis of DNA targets under the same tandem affinity purification (BioTAP) regimen. We demonstrate the power of BioTAP-XL through analysis of human EZH2, a core subunit of polycomb repressive complex 2 (PRC2). We identify and validate two strong interactors, C10orf12 and C17orf96, which display enrichment with EZH2-BioTAP at levels similar to canonical PRC2 components (SUZ12, EED, MTF2, JARID2, PHF1, and AEBP2). ChIP-seq analysis of BioTAP-tagged C10orf12 or C17orf96 revealed the similarity of each binding pattern with the location of EZH2 and the H3K27me3-silencing mark, validating their physical interaction with PRC2 components. Interestingly, analysis by mass spectrometry of C10orf12 and C17orf96 interactions revealed that these proteins may be mutually exclusive PRC2 subunits that fail to interact with each other or with JARID2 and AEBP2. C10orf12, in addition, shows a strong and unexpected association with components of the EHMT1/2 complex, thus potentially connecting PRC2 to another histone methyltransferase. Similarly, results from CBX4-BioTAP protein pulldowns are consistent with reports of a diversity of PRC1 complexes. Our results highlight the importance of reciprocal analyses of multiple subunits and suggest that iterative use of BioTAP-XL has strong potential to reveal networks of chromatin-based interactions in higher organisms.
  • Chromatin proteins captured by ChIP-mass spectrometry are linked to dosage compensation in Drosophila.
    Wang CI, Alekseyenko AA, Leroy G, Elia AE, Gorchakov AA, Britton LM, Elledge SJ, Kharchenko PV, Garcia BA, Kuroda MI.
    Nature Struct Mol Biol. 2013 Jan 6. doi: 10.1038/nsmb.2477.
    Abstract: X-chromosome dosage compensation by the MSL (male-specific lethal) complex is required in Drosophila melanogaster to increase gene expression from the single male X to equal that of both female X chromosomes. Instead of focusing solely on protein complexes released from DNA, here we used chromatin-interacting protein MS (ChIP-MS) to identify MSL interactions on cross-linked chromatin. We identified MSL-enriched histone modifications, including histone H4 Lys16 acetylation and histone H3 Lys36 methylation, and CG4747, a putative Lys36-trimethylated histone H3 (H3K36me3)-binding protein. CG4747 is associated with the bodies of active genes, coincident with H3K36me3, and is mislocalized in the Set2 mutant lacking H3K36me3. CG4747 loss of function in vivo results in partial mislocalization of the MSL complex to autosomes, and RNA interference experiments confirm that CG4747 and Set2 function together to facilitate targeting of the MSL complex to active genes, validating the ChIP-MS approach.
  • Enrichment of HP1a on Drosophila Chromosome 4 Genes Creates an Alternate Chromatin Structure Critical for Regulation in this Heterochromatic Domain.
    Riddle NC, Jung YL, Gu T, Alekseyenko AA, Asker D, Gui H, Kharchenko PV, Minoda A, Plachetka A, Schwartz YB, Tolstorukov MY, Kuroda MI, Pirrotta V, Karpen GH, Park PJ, Elgin SC.
    PLoS Genet. 2012 Sep;8(9):e1002954
    Abstract: Chromatin environments differ greatly within a eukaryotic genome, depending on expression state, chromosomal location, and nuclear position. In genomic regions characterized by high repeat content and high gene density, chromatin structure must silence transposable elements but permit expression of embedded genes. We have investigated one such region, chromosome 4 of Drosophila melanogaster. Using chromatin-immunoprecipitation followed by microarray (ChIP-chip) analysis, we examined enrichment patterns of 20 histone modifications and 25 chromosomal proteins in S2 and BG3 cells, as well as the changes in several marks resulting from mutations in key proteins. Active genes on chromosome 4 are distinct from those in euchromatin or pericentric heterochromatin: while there is a depletion of silencing marks at the transcription start sites (TSSs), HP1a and H3K9me3, but not H3K9me2, are enriched strongly over gene bodies. Intriguingly, genes on chromosome 4 are less frequently associated with paused polymerase. However, when the chromatin is altered by depleting HP1a or POF, the RNA pol II enrichment patterns of many chromosome 4 genes shift, showing a significant decrease over gene bodies but not at TSSs, accompanied by lower expression of those genes. Chromosome 4 genes have a low incidence of TRL/GAGA factor binding sites and a low T(m) downstream of the TSS, characteristics that could contribute to a low incidence of RNA polymerase pausing. Our data also indicate that EGG and POF jointly regulate H3K9 methylation and promote HP1a binding over gene bodies, while HP1a targeting and H3K9 methylation are maintained at the repeats by an independent mechanism. The HP1a-enriched, POF-associated chromatin structure over the gene bodies may represent one type of adaptation for genes embedded in repetitive DNA.
  • ChIP-seq guidelines and practices of the ENCODE and modENCODE consortia.
    Landt SG, Marinov GK, Kundaje A, Kheradpour P, Pauli F, Batzoglou S, Bernstein BE, Bickel P, Brown JB, Cayting P, Chen Y, DeSalvo G, Epstein C, Fisher-Aylor KI, Euskirchen G, Gerstein M, Gertz J, Hartemink AJ, Hoffman MM, Iyer VR, Jung YL, Karmakar S, Kellis M, Kharchenko PV, Li Q, Liu T, Liu XS, Ma L, Milosavljevic A, Myers RM, Park PJ, Pazin MJ, Perry MD, Raha D, Reddy TE, Rozowsky J, Shoresh N, Sidow A, Slattery M, Stamatoyannopoulos JA, Tolstorukov MY, White KP, Xi S, Farnham PJ, Lieb JD, Wold BJ, Snyder M.
    Genome Res. 2012 Sep;22(9):1813-31.
    Abstract: Chromatin immunoprecipitation (ChIP) followed by high-throughput DNA sequencing (ChIP-seq) has become a valuable and widely used approach for mapping the genomic location of transcription-factor binding and histone modifications in living cells. Despite its widespread use, there are considerable differences in how these experiments are conducted, how the results are scored and evaluated for quality, and how the data and metadata are archived for public use. These practices affect the quality and utility of any global ChIP experiment. Through our experience in performing ChIP-seq experiments, the ENCODE and modENCODE consortia have developed a set of working standards and guidelines for ChIP experiments that are updated routinely. The current guidelines address antibody validation, experimental replication, sequencing depth, data and metadata reporting, and data quality assessment. We discuss how ChIP quality, assessed in these ways, affects different uses of ChIP-seq data. All data sets used in the analysis have been deposited for public viewing and downloading at the ENCODE (http://encodeproject.org/ENCODE/) and modENCODE (http://www.modencode.org/) portals.
  • An integrated encyclopedia of DNA elements in the human genome.
    ENCODE consortium
    Nature. 2012 Sep 6;489(7414):57-74.
    Abstract: The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure and histone modification. These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes, providing new insights into the mechanisms of gene regulation. The newly identified elements also show a statistical correspondence to sequence variants linked to human disease, and can thereby guide interpretation of this variation. Overall, the project provides new insights into the organization and regulation of our genes and genome, and is an expansive resource of functional annotations for biomedical research.
  • Nature and function of insulator protein binding sites in the Drosophila genome.
    Schwartz YB, Linder-Basso D*, Kharchenko PV*, Tolstorukov MY*, Kim M, Li HB, Gorchakov AA, Minoda A, Shanower G, Alekseyenko AA, Riddle NC, Jung YL, Gu T, Plachetka A, Elgin SC, Kuroda MI, Park PJ, Savitsky M, Karpen GH, Pirrotta V.
    Genome Res. 2012 Jul 5 [Epub ahead of print]
    Abstract: Chromatin insulator elements and associated proteins have been proposed to partition eukaryotic genomes into sets of independently regulated domains. Here we test this hypothesis by quantitative genome-wide analysis of insulator protein binding to Drosophila chromatin. We find distinct combinatorial binding of insulator proteins to different classes of sites and uncover a novel type of insulator element that binds CP190 but not any other known insulator proteins. Functional characterization of different classes of binding sites indicates that only a small fraction act as robust insulators in standard enhancer-blocking assays. We show that insulators restrict the spreading of the H3K27me3 mark but only at a small number of Polycomb target regions and only to prevent repressive histone methylation within adjacent genes that are already transcriptionally inactive. RNAi knockdown of insulator proteins in cultured cells does not lead to major alterations in genome expression. Taken together these observations argue against the concept of a genome partitioned by specialized boundary elements and suggest that insulators are reserved for specific regulation of selected genes.
  • Landscape of somatic retrotransposition in human cancers.
    Lee E, Iskow R, Yang L, Gokcumen O, Haseley P, Luquette III LJ, Lohr JG, Harris CC, Ding L, Wilson RK, Wheeler DA, Gibbs RA, Kucherlapati R, Lee C, Kharchenko PV*, Park PJ*, and The Cancer Genome Atlas Research Network
    Science. 2012 Aug 24;337(6097):967-71
    Abstract: Transposable elements (TEs) are abundant in the human genome, and some are capable of generating new insertions through RNA intermediates. In cancer, the disruption of cellular mechanisms that normally suppress TE activity may facilitate mutagenic retrotranspositions. We performed single-nucleotide resolution analysis of TE insertions in 43 high-coverage whole-genome sequencing datasets from five cancer types. We identified 194 high-confidence somatic TE insertions, as well as thousands of polymorphic TE insertions in matched normal genomes. Somatic insertions were present in epithelial tumors but not in blood or brain cancers. Somatic L1 insertions tend to occur in genes that are commonly mutated in cancer, disrupt the expression of the target genes, and are biased toward regions of cancer-specific DNA hypomethylation, highlighting their potential impact in tumorigenesis.
  • Heterogeneity of the transition/transversion ratio in Drosophila and Hominidae genomes.
    Seplyarskiy VB, Kharchenko P, Kondrashov AS, Bazykin GA.
    Mol Biol Evol. 2012 Feb 15
    Abstract: Mutation rate varies between sites in the genome. Part of this variation can be explained by well-recognized short nucleotide contexts, but a large component of this variation remains cryptic. We used data on interspecies divergence and intraspecies polymorphism in Drosophila and Hominidae to analyze variation of the average rate of the 12 possible kinds of single-nucleotide mutations and in the transition/transversion ratio k at single-nucleotide resolution. Both the average mutation rate and k vary by a factor of ~3 between nucleotide sites. The characteristic scale of variation in k is up to at least ~30 nucleotides in Drosophila and ~5 nucleotides in Hominidae. Genome segments with locally elevated mutation rates possess lower values of k; however, a substantial fraction of variation in k cannot be directly explained by the local mutation rates.
  • The genomic binding sites of a noncoding RNA.
    Simon MD, Wang CI, Kharchenko PV, West JA, Chapman BA, Alekseyenko AA, Borowsky ML, Kuroda MI, Kingston RE.
    Proc Natl Acad Sci USA. 2011 Dec 20;108(51):20497-502
    Abstract: Long noncoding RNAs (lncRNAs) have important regulatory roles and can function at the level of chromatin. To determine where lncRNAs bind to chromatin, we developed capture hybridization analysis of RNA targets (CHART), a hybridization-based technique that specifically enriches endogenous RNAs along with their targets from reversibly cross-linked chromatin extracts. CHART was used to enrich the DNA and protein targets of endogenous lncRNAs from flies and humans. This analysis was extended to genome-wide mapping of roX2, a well-studied ncRNA involved in dosage compensation in Drosophila. CHART revealed that roX2 binds at specific genomic sites that coincide with the binding sites of proteins from the male-specific lethal complex that affects dosage compensation. These results reveal the genomic targets of roX2 and demonstrate how CHART can be used to study RNAs in a manner analogous to chromatin immunoprecipitation for proteins.
  • PI's Postdoctoral & graduate work

  • Evidence for dosage compensation between X and autosomes in mammals.
    Kharchenko PV, Xi R, Park PJ.
    Nature Genetics, 2011 Nov 28;43(12):1167-9
  • Comprehensive analysis of the Drosophila melanogaster chromatin landscape differentiates among chromosomes, genes, and regulatory elements.
    Kharchenko PV, Alekseyenko AA, Schwartz YB, Minoda A, Riddle NC, Ernst J, Sabo PJ, Larschan E, Gorchakov AA, Gu T, Linder-Basso D, Plachetka A, Shanower G, Tolstorukov MY, Bishop EP, Canfield TP, Sandstrom R, Thurman RE, Stamatoyannopoulos JA, Kellis M, Elgin SC, Kuroda MI, Pirrotta V, Karpen GH*, Park PJ*.
    Nature, 2011 Mar 24;471(7339):480-5
    Abstract: Chromatin is composed of DNA and a variety of modified histones and non-histone proteins, which have an impact on cell differentiation, gene regulation and other key cellular processes. Here we present a genome-wide chromatin landscape for Drosophila melanogaster based on eighteen histone modifications, summarized by nine prevalent combinatorial patterns. Integrative analysis with other data (non-histone chromatin proteins, DNase I hypersensitivity, GRO-Seq reads produced by engaged polymerase, short/long RNA products) reveals discrete characteristics of chromosomes, genes, regulatory elements and other functional domains. We find that active genes display distinct chromatin signatures that are correlated with disparate gene lengths, exon patterns, regulatory functions and genomic contexts. We also demonstrate a diversity of signatures among Polycomb targets that include a subset with paused polymerase. This systematic profiling and integrative analysis of chromatin signatures provides insights into how genomic elements are regulated, and will serve as a resource for future experimental investigations of genome structure and function.
  • X chromosome dosage compensation via enhanced transcriptional elongation in Drosophila males.
    Larschan E*, Bishop EP*, Kharchenko PV, Core L, Lis JT, Park PJ, Kuroda MI.
    Nature, 2011 Mar 3;471(7336):115-8.
    Abstract: The evolution of sex chromosomes has resulted in numerous species in which females inherit two X chromosomes but males have a single X, thus requiring dosage compensation. MSL (Male-specific lethal) complex increases transcription on the single X chromosome of Drosophila males to equalize expression of X-linked genes between the sexes. The biochemical mechanisms used for dosage compensation must function over a wide dynamic range of transcription levels and differential expression patterns. It has been proposed that the MSL complex regulates transcriptional elongation to control dosage compensation, a model subsequently supported by mapping of the MSL complex and MSL-dependent histone 4 lysine 16 acetylation to the bodies of X-linked genes in males, with a bias towards 3' ends. However, experimental analysis of MSL function at the mechanistic level has been challenging owing to the small magnitude of the chromosome-wide effect and the lack of an in vitro system for biochemical analysis. Here we use global run-on sequencing (GRO-seq) to examine the specific effect of the MSL complex on RNA Polymerase II (RNAP II) on a genome-wide level. Results indicate that the MSL complex enhances transcription by facilitating the progression of RNAP II across the bodies of active X-linked genes. Improving transcriptional output downstream of typical gene-specific controls may explain how dosage compensation can be imposed on the diverse set of genes along an entire chromosome.
  • ChIP-chip versus ChIP-seq: A systematic comparison of two technologies.
    Ho JW, Bishop EP, Kharchenko PV, Nègre N, White KP, Park PJ.
    BMC Genomics, 2011, 12:134
    Abstract: BACKGROUND: Chromatin immunoprecipitation (ChIP) followed by microarray hybridization (ChIP-chip) or high-throughput sequencing (ChIP-seq) allows genome-wide discovery of protein-DNA interactions such as transcription factor bindings and histone modifications. Previous reports only compared a small number of profiles, and little has been done to compare histone modification profiles generated by the two technologies or to assess the impact of input DNA libraries in ChIP-seq analysis. Here, we performed a systematic analysis of a modENCODE dataset consisting of 31 pairs of ChIP-chip/ChIP-seq profiles of the coactivator CBP, RNA polymerase II (RNA PolII), and six histone modifications across four developmental stages of Drosophila melanogaster. RESULTS: Both technologies produce highly reproducible profiles within each platform, ChIP-seq generally produces profiles with a better signal-to-noise ratio, and allows detection of more peaks and narrower peaks. The set of peaks identified by the two technologies can be significantly different, but the extent to which they differ varies depending on the factor and the analysis algorithm. Importantly, we found that there is a significant variation among multiple sequencing profiles of input DNA libraries and that this variation most likely arises from both differences in experimental condition and sequencing depth. We further show that using an inappropriate input DNA profile can impact the average signal profiles around genomic features and peak calling results, highlighting the importance of having high quality input DNA data for normalization in ChIP-seq analysis. CONCLUSIONS: Our findings highlight the biases present in each of the platforms, show the variability that can arise from both technology and analysis methods, and emphasize the importance of obtaining high quality and deeply sequenced input DNA libraries for ChIP-seq analysis.
  • Chromatin signatures of the Drosophila replication program.
    Eaton ML, Prinz JA, MacAlpine HK, Tretyakov G, Kharchenko PV, MacAlpine DM.
    Genome Research. 2011 Feb;21(2):164-74
    Abstract: DNA replication initiates from thousands of start sites throughout the Drosophila genome and must be coordinated with other ongoing nuclear processes such as transcription to ensure genetic and epigenetic inheritance. Considerable progress has been made toward understanding how chromatin modifications regulate the transcription program; in contrast, we know relatively little about the role of the chromatin landscape in defining how start sites of DNA replication are selected and regulated. Here, we describe the Drosophila replication program in the context of the chromatin and transcription landscape for multiple cell lines using data generated by the modENCODE consortium. We find that while the cell lines exhibit similar replication programs, there are numerous cell line-specific differences that correlate with changes in the chromatin architecture. We identify chromatin features that are associated with replication timing, early origin usage, and ORC binding. Primary sequence, activating chromatin marks, and DNA-binding proteins (including chromatin remodelers) contribute in an additive manner to specify ORC-binding sites. We also generate accurate and predictive models from the chromatin data to describe origin usage and strength between cell lines. Multiple activating chromatin modifications contribute to the function and relative strength of replication origins, suggesting that the chromatin environment does not regulate origins of replication as a simple binary switch, but rather acts as a tunable rheostat to regulate replication initiation events.
  • Plasticity in patterns of histone modifications and chromosomal proteins in the Drosophila heterochromatin.
    Riddle NC*, Minoda A*, Kharchenko PV*, Alekseyenko AA, Schwartz YB, Tolstorukov MY, Gorchakov AA, Kennedy C, Linder-Basso D, Jaffe JD, Shanower G, Kuroda MI, Pirrotta V, Park PJ, Elgin SC, Karpen GH.
    Genome Research. 2011 Feb;21(2):147-63
    Abstract: Eukaryotic genomes are packaged in two basic forms, euchromatin and heterochromatin. We have examined the composition and organization of Drosophila melanogaster heterochromatin in different cell types using ChIP-array analysis of histone modifications and chromosomal proteins. As anticipated, the pericentric heterochromatin and chromosome 4 are on average enriched for the "silencing" marks H3K9me2, H3K9me3, HP1a, and SU(VAR)3-9, and are generally depleted for marks associated with active transcription. The locations of the euchromatin-heterochromatin borders identified by these marks are similar in animal tissues and most cell lines, although the amount of heterochromatin is variable in some cell lines. Combinatorial analysis of chromatin patterns reveals distinct profiles for euchromatin, pericentric heterochromatin, and the 4th chromosome. Both silent and active protein-coding genes in heterochromatin display complex patterns of chromosomal proteins and histone modifications; a majority of the active genes exhibit both "activation" marks (e.g., H3K4me3 and H3K36me3) and "silencing" marks (e.g., H3K9me2 and HP1a). The hallmark of active genes in heterochromatic domains appears to be a loss of H3K9 methylation at the transcription start site. We also observe complex epigenomic profiles of intergenic regions, repeated transposable element (TE) sequences, and genes in the heterochromatic extensions. An unexpectedly large fraction of sequences in the euchromatic chromosome arms exhibits a heterochromatic chromatin signature, which differs in size, position, and impact on gene expression among cell types. We conclude that patterns of heterochromatin/euchromatin packaging show greater complexity and plasticity than anticipated. This comprehensive analysis provides a foundation for future studies of gene activity and chromosomal functions that are influenced by or dependent upon heterochromatin.
  • Identification of functional elements and regulatory circuits in Drosophila by large-scale data integration.
    modENCODE Consortium, Roy S, Ernst J, Kharchenko PV, Kheradpour P, Negre N, Eaton ML, Landolin JM, Bristow CA, Ma L, Lin MF, Washietl S, Arshinoff BI, Ay F, Meyer PE, Robine N, Washington NL, Di Stefano L, Berezikov E, Brown CD, Candeias R, Carlson JW, Carr A, Jungreis I, Marbach D, Sealfon R, Tolstorukov MY, Will S, Alekseyenko AA, Artieri C, Booth BW, Brooks AN, Dai Q, Davis CA, Duff MO, Feng X, Gorchakov AA, Gu T, Henikoff JG, Kapranov P, Li R, MacAlpine HK, Malone J, Minoda A, Nordman J, Okamura K, Perry M, Powell SK, Riddle NC, Sakai A, Samsonova A, Sandler JE, Schwartz YB, Sher N, Spokony R, Sturgill D, van Baren M, Wan KH, Yang L, Yu C, Feingold E, Good P, Guyer M, Lowdon R, Ahmad K, Andrews J, Berger B, Brenner SE, Brent MR, Cherbas L, Elgin SC, Gingeras TR, Grossman R, Hoskins RA, Kaufman TC, Kent W, Kuroda MI, Orr-Weaver T, Perrimon N, Pirrotta V, Posakony JW, Ren B, Russell S, Cherbas P, Graveley BR, Lewis S, Micklem G, Oliver B, Park PJ, Celniker SE, Henikoff S, Karpen GH, Lai EC, MacAlpine DM, Stein LD, White KP, Kellis M.
    Science. 2010;330(6012):1787-97
    Abstract: To gain insight into how genomic information is translated into cellular and developmental programs, the Drosophila model organism Encyclopedia of DNA Elements (modENCODE) project is comprehensively mapping transcripts, histone modifications, chromosomal proteins, transcription factors, replication proteins and intermediates, and nucleosome properties across a developmental time course and in multiple cell lines. We have generated more than 700 data sets and discovered protein-coding, noncoding, RNA regulatory, replication, and chromatin elements, more than tripling the annotated portion of the Drosophila genome. Correlated activity patterns of these elements reveal a functional regulatory network, which predicts putative new functions for genes, reveals stage- and tissue-specific regulators, and enables gene-expression prediction. Our results provide a foundation for directed experimental and computational studies in Drosophila and related species and also a model for systematic data integration toward comprehensive genomic and functional annotation.
  • Estimating enrichment of repetitive elements from high-throughput sequence data.
    Day DS, Luquette LJ, Park PJ, Kharchenko PV.
    Genome Biol. 2010;11(6):R69.
    Abstract: We describe computational methods for analysis of repetitive elements from short-read sequencing data, and apply them to study histone modifications associated with the repetitive elements in human and mouse cells. Our results demonstrate that while accurate enrichment estimates can be obtained for individual repeat types and small sets of repeat instances, there are distinct combinatorial patterns of chromatin marks associated with major annotated repeat families, including H3K27me3/H3K9me3 differences among the endogenous retroviral element classes.
  • A region of human HoxD that confers Polycomb-group responsiveness.
    Woo CJ, Kharchenko PV, Daheron L, Park PJ, Kingston RE.
    Cell. 2010 Jan 8;140(1):99-110
    Abstract: Polycomb group (PcG) proteins are essential for accurate axial body patterning during embryonic development. PcG-mediated repression is conserved in metazoans and is targeted in Drosophila by Polycomb response elements (PREs). However, targeting sequences in humans have not been described. While analyzing chromatin architecture in the context of human embryonic stem cell (hESC) differentiation, we discovered a 1.8kb region between HOXD11 and HOXD12 (D11.12) that is associated with PcG proteins, becomes nuclease hypersensitive, and then shows alteration in nuclease sensitivity as hESCs differentiate. The D11.12 element repressed luciferase expression from a reporter construct and full repression required a highly conserved region and YY1 binding sites. Furthermore, repression was dependent on the PcG proteins BMI1 and EED and a YY1-interacting partner, RYBP. We conclude that D11.12 is a Polycomb-dependent regulatory region with similarities to Drosophila PREs, indicating conservation in the mechanisms that target PcG function in mammals and flies.
  • Long-range dosage compensation in Drosophila captures transcribed autosomal genes inserted on X.
    Gorchakov AA, Alekseyenko AA, Kharchenko PV, Park PJ, Kuroda MI.
    Genes & Dev. 2009;23(19).
    Abstract: Dosage compensation in Drosophila melanogaster males is achieved via targeting of male-specific lethal (MSL) complex to X-linked genes. This is proposed to involve sequence-specific recognition of the X at approximately 150-300 chromatin entry sites, and subsequent spreading to active genes. Here we ask whether the spreading step requires transcription and is sequence-independent. We find that MSL complex binds, acetylates, and up-regulates autosomal genes inserted on X, but only if transcriptionally active. We conclude that a long-sought specific DNA sequence within X-linked genes is not obligatory for MSL binding. Instead, linkage and transcription play the pivotal roles in MSL targeting irrespective of gene origin and DNA sequence.
  • Comparative analysis of H2A.Z nucleosome organization in the human and yeast genomes.
    Tolstorukov MY, Kharchenko PV, Goldman JA, Kingston RE, Park PJ.
    Genome Res. 2009 Jun;19(6):967-77. Epub 2009 Feb 26.
    Abstract: Eukaryotic DNA is wrapped around a histone protein core to constitute the fundamental repeating units of chromatin, the nucleosomes. The affinity of the histone core for DNA depends on the nucleotide sequence; however, it is unclear to what extent DNA sequence determines nucleosome positioning in vivo, and if the same rules of sequence-directed positioning apply to genomes of varying complexity. Using the data generated by high-throughput DNA sequencing combined with chromatin immunoprecipitation, we have identified positions of nucleosomes containing the H2A.Z histone variant and histone H3 trimethylated at lysine 4 in human CD4(+) T-cells. We find that the 10-bp periodicity observed in nucleosomal sequences in yeast and other organisms is not pronounced in human nucleosomal sequences. This result was confirmed for a broader set of mononucleosomal fragments that were not selected for any specific histone variant or modification. We also find that human H2A.Z nucleosomes protect only approximately 120 bp of DNA from MNase digestion and exhibit specific sequence preferences, suggesting a novel mechanism of nucleosome organization for the H2A.Z variant.
  • Design and analysis of ChIP-seq experiments for DNA-binding proteins.
    Kharchenko PV, Tolstorukov MY, Park PJ.
    Nat Biotechnol. 2008 Dec;26(12):1351-9
    Abstract: Recent progress in massively parallel sequencing platforms has enabled genome-wide characterization of DNA-associated proteins using the combination of chromatin immunoprecipitation and sequencing (ChIP-seq). Although a variety of methods exist for analysis of the established alternative ChIP microarray (ChIP-chip), few approaches have been described for processing ChIP-seq data. To fill this gap, we propose an analysis pipeline specifically designed to detect protein-binding positions with high accuracy. Using previously reported data sets for three transcription factors, we illustrate methods for improving tag alignment and correcting for background signals. We compare the sensitivity and spatial precision of three peak detection algorithms with published methods, demonstrating gains in spatial precision when an asymmetric distribution of tags on positive and negative strands is considered. We also analyze the relationship between the depth of sequencing and characteristics of the detected binding positions, and provide a method for estimating the sequencing depth necessary for a desired coverage of protein binding sites.
  • Nucleosome positioning in human HOX gene clusters.
    Kharchenko PV, Woo CJ, Tolstorukov MY, Kingston RE, Park PJ.
    Genome Res. 2008 Oct;18(10):1554-61.
    Abstract: The distribution of nucleosomes along the genome is a significant aspect of chromatin structure and is thought to influence gene regulation through modulation of DNA accessibility. However, properties of nucleosome organization remain poorly understood, particularly in mammalian genomes. Toward this goal we used tiled microarrays to identify stable nucleosome positions along the HOX gene clusters in human cell lines. We show that nucleosome positions exhibit sequence properties and long-range organization that are different from those characterized in other organisms. Despite overall variability of internucleosome distances, specific loci contain regular nucleosomal arrays with 195-bp periodicity. Moreover, such arrays tend to occur preferentially toward the 3' ends of genes. Through comparison of different cell lines, we find that active transcription is correlated with increased positioning of nucleosomes, suggesting an unexpected role for transcription in the establishment of well-positioned nucleosomes.
  • Differential H3K4 methylation identifies developmentally poised hematopoietic genes.
    Orford K*, Kharchenko P*, Lai W, Dao MC, Worhunsky DJ, Ferro A, Janzen V, Park PJ, Scadden DT.
    Dev Cell. 2008 May;14(5):798-809
    Abstract: Throughout development, cell fate decisions are converted into epigenetic information that determines cellular identity. Covalent histone modifications are heritable epigenetic marks and are hypothesized to play a central role in this process. In this report, we assess the concordance of histone H3 lysine 4 dimethylation (H3K4me2) and trimethylation (H3K4me3) on a genome-wide scale in erythroid development by analyzing pluripotent, multipotent, and unipotent cell types. Although H3K4me2 and H3K4me3 are concordant at most genes, multipotential hematopoietic cells have a subset of genes that are differentially methylated (H3K4me2+/me3-). These genes are transcriptionally silent, highly enriched in lineage-specific hematopoietic genes, and uniquely susceptible to differentiation-induced H3K4 demethylation. Self-renewing embryonic stem cells, which restrict H3K4 methylation to genes that contain CpG islands (CGIs), lack H3K4me2+/me3- genes. These data reveal distinct epigenetic regulation of CGI and non-CGI genes during development and indicate an interactive relationship between DNA sequence and differential H3K4 methylation in lineage-specific differentiation.
  • Chromosomal periodicity of evolutionarily conserved gene pairs.
    Wright MA, Kharchenko P, Church GM, Segrè D.
    Proc Natl Acad Sci U S A. 2007 Jun 19
    Abstract: Chromosomes are compacted hundreds of times to fit in the cell, packaged into dynamic folds whose structures are largely unknown. Here, we examine patterns in gene locations to infer large-scale features of bacterial chromosomes. Specifically, we analyzed >100 genomes and identified thousands of gene pairs that display two types of evolutionary correlations: a tendency to co-occur and a tendency to be located close together in many genomes. We then analyzed the detailed distribution of these pairs in Escherichia coli and found that genes in a pair tend to be separated by integral multiples of 117 kb along the genome and to be positioned in a 117-kb grid of genomic locations. In addition, the most pair-dense locations coincide with regions of intense transcriptional activity and the positions of top transcribed and conserved genes. These patterns suggest that the E. coli chromosome may be organized into a 117-kb helix-like topology that localizes a subset of the most essential and highly transcribed genes along a specific face of this structure. Our approach indicates an evolutionarily maintained preference in the spacing of genes along the chromosome and offers a general comparative genomics framework for studying chromosome structure, broadly applicable to other organisms.
  • Identifying metabolic enzymes with multiple types of association evidence.
    Kharchenko P, Chen L, Freund Y, Vitkup D, Church GM.
    BMC Bioinformatics. 2006 Mar 29;7:177
    Abstract: BACKGROUND: Existing large-scale metabolic models of sequenced organisms commonly include enzymatic functions which can not be attributed to any gene in that organism. Existing computational strategies for identifying such missing genes rely primarily on sequence homology to known enzyme-encoding genes. RESULTS: We present a novel method for identifying genes encoding for a specific metabolic function based on a local structure of metabolic network and multiple types of functional association evidence, including clustering of genes on the chromosome, similarity of phylogenetic profiles, gene expression, protein fusion events and others. Using E. coli and S. cerevisiae metabolic networks, we illustrate predictive ability of each individual type of association evidence and show that significantly better predictions can be obtained based on the combination of all data. In this way our method is able to predict 60% of enzyme-encoding genes of E. coli metabolism within the top 10 (out of 3551) candidates for their enzymatic function, and as a top candidate within 43% of the cases. CONCLUSION: We illustrate that a combination of genome context and other functional association evidence is effective in predicting genes encoding metabolic enzymes. Our approach does not rely on direct sequence homology to known enzyme-encoding genes, and can be used in conjunction with traditional homology-based metabolic reconstruction methods. The method can also be used to target orphan metabolic activities.