Our lab uses computational methods to tackle fundamental problems in biomedicine. We study gene regulatory mechanisms by means of computational modeling. To facilitate our data-centric approach, we develop novel methods for analyzing large amounts of biological data, including those produced by cutting-edge high-throughput experiments. Our computational models provide a systematic way to investigate the functional effects of different types of perturbations to regulatory mechanisms, which creates testable hypotheses for studying human diseases and facilitates translational research.

Construction of quantitative models of gene regulatory mechanisms

Gene expression is regulated by a variety of mechanisms. Most existing knowledge about gene regulation is qualitative in nature. For example, promoter DNA methylation is well recognized to be associated with transcriptional repression, but it would be much more informative to know how much expression changes for a given change in the promoter methylation level. This type of quantitative understanding is key to linking molecular-level events to high-level phenotypes.

To develop quantitative models of gene regulatory mechanisms, we must first determine the key players and their relationships. In our research, we have invented methods to perform genome-wide identification of functional sequence elements in humans and model organisms. We have developed machine learning methods to reconstruct networks that connect these functional elements, such as finding the target genes of transcriptional enhancers. We have also modeled the detailed quantitative relationships between gene expression and chromatin accessibility, histone modifications, transcription factor binding and DNA methylation. Building on these foundations, we have recently started modeling the joint effects of many gene regulatory mechanisms, involving heterogeneous data types, on gene expression levels.

Our current goal is to infer the functional consequences of every genetic variant in the human genome in a cell-type-specific manner. This requires improving and integrating many components that we have developed over the years, such as identification of active enhancers in each cell type, accurate reconstruction of various types of biological networks, and prediction of both the direct effects of genetic variants on the chromatin and epigenetic features of individual elements and their indirect downstream effects propagated over the networks.

Representative publications:

Development of analysis methods for new experimental technologies

Our lab has a deep interest in developing computational methods for analyzing data from emerging experimental technologies, such as single-cell sequencing and long-read sequencing. One technology that we have championed in the past few years is nano-channel-based optical DNA mapping. In this technology, DNA molecules are fluorescently labeled at particular sites, linearized in nanometer-scale channels, and subsequently imaged using high-resolution microscopy. The resulting data contain the locations of fluorescent labels along DNA molecules up to one megabase long. Because the molecules are so long, optical mapping data can provide useful information for applications such as sequence assembly, structural variation (SV) calling and haplotype phasing.
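To make the data format concrete, here is a minimal Python sketch of how label locations along a molecule can be represented. It is an illustration only, not one of our analysis tools: the function names, the toy sequence and the motif are all hypothetical, the motif is an arbitrary choice for the example, and only the forward strand is scanned. Real optical maps also carry measurement error and missing or spurious labels, which this sketch ignores.

```python
from typing import List


def label_positions(sequence: str, motif: str) -> List[int]:
    """Return 0-based start positions of every motif occurrence (forward strand only)."""
    sequence = sequence.upper()
    motif = motif.upper()
    positions = []
    start = sequence.find(motif)
    while start != -1:
        positions.append(start)
        # Continue searching one base further so overlapping sites are also found.
        start = sequence.find(motif, start + 1)
    return positions


def inter_label_distances(positions: List[int]) -> List[int]:
    """Distances between consecutive labels, a typical unit compared during alignment."""
    return [b - a for a, b in zip(positions, positions[1:])]


# Hypothetical toy molecule and motif, chosen only for illustration.
molecule = "AAGCTCTTCAAAAAGCTCTTCTTGCTCTTC"
motif = "GCTCTTC"

positions = label_positions(molecule, motif)
distances = inter_label_distances(positions)
print(positions)   # [2, 14, 23]
print(distances)   # [12, 9]
```

Representing each molecule as an ordered list of label positions, or equivalently of inter-label distances, is what allows molecules to be compared with one another or with a reference without knowing the full base sequence.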

We are a leading group in developing analysis methods for optical mapping data. We have developed pairwise and multiple alignment methods, SV callers, and tools for processing and visualizing optical mapping data. Using our methods, we first demonstrated these applications on data produced from a family trio, and then carried out an extended study of genomes from 26 populations. In the latter study, in addition to a comprehensive analysis of SVs, we also studied overall genome structures, genomic content not present in the human reference sequence, and regions that are difficult to investigate by short-read sequencing, such as sub-telomeric regions.

Our current goal is to develop analysis methods for single-cell sequencing and spatial transcriptomics data.

Representative publications:

Studying disease mechanisms, drug efficacy, and new therapies

Our lab has extensive collaborations with local and international research groups in the study of various human diseases. We are familiar with standard analysis pipelines for various types of data, while at the same time we have been contributing new analysis approaches based on methods we developed.

We have studied various types of human cancer. For example, we have studied the epigenetic landscape of hepatocellular carcinoma (HCC) to identify disrupted gene regulatory elements, with follow-up validation and functional experiments delineating the molecular mechanisms and demonstrating the medical relevance of our findings. Nasopharyngeal carcinoma (NPC) is another cancer type that we focus on. We have contributed to the genomics and transcriptomics of this cancer, and have also studied the genome and transcriptome of the Epstein-Barr virus associated with NPC. Beyond cancer, we have been studying other human diseases, including diabetes, cardiovascular diseases, intervertebral disc degeneration and Hirschsprung disease. For example, we have recently developed a new method that identifies non-coding genetic variants that differ between patients yet have convergent functional effects on the same genes and pathways, and applied it to identify novel genes associated with Hirschsprung disease.

We are currently focusing on developing methods that can predict the efficacy of immune checkpoint inhibition (ICI) treatment in cancer patients.

Representative publications: