Understanding gene regulation at the transcriptional level is critical to understanding complex biological systems and human disease. In virtually all organisms, gene regulation is mediated by a “regulatory code” in which distinct combinations of specific transcription factors (TFs) collaborate to regulate the expression of individual genes. This code is complex and not readily apparent from sequence alone. It likely involves many cis-regulatory modules (CRMs) located both upstream of and within genes. Data from the ENCODE and modENCODE projects suggest that the amount of cis-regulatory sequence may exceed that of the genes themselves. In addition, mounting evidence suggests that major differences between individuals and between species lie at the level of gene regulation, and that changes in cis-regulatory sequences are responsible for these differences. It is therefore important to map sequence variation across individuals and to understand how it drives differences in gene expression and their phenotypic consequences. The goal of my research is to understand the biological mechanisms underlying transcriptional regulation and how human variation in regulatory regions affects this process.


Targeted characterization of tandem repeats

Known tandem repeats (TRs) make up 3% of the human genome and are highly variable across individuals. Tandem repeats are intrinsically unstable, and their expansion is known to cause over 50 human diseases, including ALS, ataxia, and Huntington’s disease. Although these disorders have a high combined prevalence, the loci involved have proven difficult to discover and characterize with short-read sequencing because of their repetitive nature. Long-read sequencing technologies such as Oxford Nanopore Technologies produce reads of up to 2 Mb, which allows repeat elements to be sequenced in their entirety. However, these technologies have relatively high error rates, which pose another obstacle to accurate quantification of TR copy number. We aim to characterize tandem repeat regions more accurately, at both healthy and pathogenic lengths, by combining targeted Nanopore sequencing with improved computational methods.

Nanopore sequencing for variant characterization

Filler text

The impact of genetic variation on gene regulation

Filler text

Mobile element-derived chromatin looping variability in the human population

Filler text

High-throughput inverted reporter assay for characterization of silencers and enhancer blockers

Filler text

Improving breast cancer patient prognosis with targeted therapies

Breast cancer is the most commonly diagnosed cancer in the world and remains the deadliest cancer for women. Radiation therapy is a mainstay of clinical management, with upwards of 85% of women receiving it as part of their treatment regimen after breast-conserving surgery. Although radiation therapy is effective, over 10% of women develop a local recurrence despite it. Unfortunately, the molecular mechanisms that underlie radiation response and intrinsic radioresistance or radiosensitivity are poorly understood. We are leveraging multi-omics data to interrogate these mechanisms.
Additionally, metastasis is involved in over 90% of cancer-related deaths. Triple-negative breast cancer is known for earlier disease onset and a higher propensity to metastasize than other breast cancer subtypes, which makes it more likely to have metastasized before it is diagnosed. Identifying the phenotypic states of cells that can successfully metastasize, and the transcriptional regulatory networks that govern them, is therefore invaluable for preventing and reversing metastasis in individuals with this disease and, ultimately, for improving patient prognosis.