From Locus Association to Mechanism of Gene Causality
The Devil Is in the Details
Arguably, one of the greatest advances in biomedical research in the past decades has been the association of human allelic variation with complex human diseases and phenotypes. These findings were initially met with skepticism, given the large number of underpowered and misleading candidate gene association studies that preceded the genome-wide association (GWA) era. There was also considerable incredulity when it became clear that the variation identified in reasonably powered GWA studies (GWAS) could not account for the bulk of attributable genetic risk for the majority of diseases. However, scientists in academia and industry are now increasingly recognizing the importance of studying these loci for better understanding of disease pathways and developing new therapeutics. This is perhaps most significant for atherosclerotic coronary artery disease (CAD), the leading cause of mortality and morbidity worldwide, for which no single drug has yet been developed to target the primary disease process in the vessel wall.
See accompanying article on page 2207
GWAS have identified common variation throughout the genome that associates with specific diseases, but these single-nucleotide polymorphisms (SNPs) provide little information about the mechanism for this association. Lead variants reported in GWAS are commonly tag SNPs chosen to represent regions of linkage disequilibrium (LD) that can be hundreds of thousands of nucleotides in length. The next era of GWAS will be focused on finding the causal variation in these loci, using this information to identify the causal gene, and then elucidating the mechanisms behind disease risk susceptibility. Deciphering precise molecular mechanisms will involve both state-of-the-art computational analysis and in vitro and in vivo experimental validation (Figure). Our future understanding of CAD and development of novel treatments for patients will ultimately depend on how we approach this daunting task.
In an article published in the current volume of ATVB, Braenne et al1 have taken advantage of an extensive array of existing data sets to develop comprehensive annotation for 159 CAD loci.2 Importantly, in this approach they have incorporated several layers of selection and filtering to prioritize candidate genes based on the identification of variants in linkage disequilibrium with lead SNPs that are located in the coding regions of exons, represent expression quantitative trait loci (eQTL), or reside in regulatory regions (epigenetic features of transcriptional activity). The 3 main criteria for the filtering were (1) nonsynonymous amino acid change, (2) eQTL effect, and (3) overlap with a regulatory region. From the initial 159 lead CAD SNPs, only 33 were exonic (22 nonsynonymous variants associated with an amino acid change), whereas the majority of variants reside within regulatory elements or heterochromatic regions. Sixty-six CAD lead SNPs had eQTL associations with genes located <1 million base pairs (1 Mb) away. These findings bolster previous studies suggesting that the principal mechanism for the majority of noncoding variants at CAD loci involves regulating local gene expression. Consistent with this hypothesis, promoter SNPs had ≤3 additional nearby genes as eQTLs. Moreover, this study identifies CAD SNPs as eQTLs for genes located ≤0.5 Mb away. For instance, the variant rs2895811 is located in an intron of the HHIPL1 gene but is instead associated with variation in YY1 expression levels. They also note the complexity of deleterious protein-coding variants such as the lead SNP, rs867187, in the PROCR gene, which is in high linkage disequilibrium with another deleterious variant in the MYH7B gene.
Although we have commonly annotated lead variants in relation to their nearest coding gene, this type of analysis highlights the structural complexity of the genome and the need for more systematic approaches that may disentangle the interacting regulatory architecture in regions of disease-associated loci (Figure).
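To make the 3 filtering criteria described above concrete, the following is a minimal sketch of how proxy variants might be screened and grouped by lead SNP. It is not the authors' pipeline: the `Variant` record, its field names, the `prioritize` function, and the r² threshold of 0.8 are all assumptions chosen for illustration, and the identifiers in any input would be placeholders rather than real annotations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Hypothetical annotation record for a variant in LD with a lead SNP."""
    rsid: str
    lead_snp: str
    r2: float                           # LD (r^2) with the lead SNP
    nonsynonymous: bool                 # criterion 1: amino acid change
    eqtl_genes: tuple = ()              # criterion 2: genes with an eQTL effect
    in_regulatory_region: bool = False  # criterion 3: regulatory overlap


def criteria_met(v):
    """Return the names of the filtering criteria this variant satisfies."""
    hits = []
    if v.nonsynonymous:
        hits.append("nonsynonymous")
    if v.eqtl_genes:
        hits.append("eQTL")
    if v.in_regulatory_region:
        hits.append("regulatory")
    return hits


def prioritize(variants, min_r2=0.8):
    """Group qualifying variants by lead SNP.

    A variant qualifies if its LD with the lead SNP is at least min_r2
    (an assumed threshold) and it meets at least one criterion. Within
    each locus, variants meeting more criteria are listed first.
    """
    by_lead = {}
    for v in variants:
        if v.r2 < min_r2:
            continue
        hits = criteria_met(v)
        if hits:
            by_lead.setdefault(v.lead_snp, []).append((v.rsid, hits))
    for entries in by_lead.values():
        entries.sort(key=lambda item: len(item[1]), reverse=True)
    return by_lead
```

Ranking by the number of criteria met is only one possible choice; a real analysis would likely weight the evidence types differently and consider the tissue in which each eQTL was observed.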
Although the majority of post-GWAS efforts have focused on transcriptional regulation by candidate regulatory variants, this study also emphasizes the importance of miRNA regulation of causal gene expression as another mechanism of CAD associations. Regulation of TCF21 has been previously linked to miR-224–mediated interaction with the 3′ untranslated region lead SNP,3 but this putative causal mechanism has not been systematically investigated. Here, the authors reveal that 55 CAD SNPs that reside in the 3′ untranslated region of 33 genes are predicted to disrupt binding of 254 unique miRNA core-binding sequences. Not surprisingly, they note that 23 of these miRNAs are predicted to target multiple CAD genes, and the miR-SNPs are also in high linkage disequilibrium with promoter SNPs. These predicted interactions at the transcriptional and post-transcriptional level may explain causal regulatory mechanisms for multiple disease associations. Importantly, several CAD SNP-eQTL associations were highlighted as likely resulting from disrupted miRNA binding. It will be critical to validate these associations with changes in the endogenous upstream transcription factors and miRNAs in the appropriate context.
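The idea of a 3′ untranslated region SNP disrupting a miRNA core-binding sequence can be illustrated with a deliberately simplified sketch. It assumes that binding requires an exact match between the mRNA and the reverse complement of miRNA nucleotides 2 to 8 (the seed); established prediction tools use richer models, and this is not the method used in the study. The function names and any sequences passed to them are hypothetical.

```python
# RNA base pairing (Watson-Crick only; G:U wobble is ignored in this sketch).
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def seed_site(mirna):
    """Return the mRNA 7-mer (5'->3') that pairs with miRNA nucleotides 2-8."""
    seed = mirna[1:8]                        # positions 2..8, 1-based
    return seed.translate(COMPLEMENT)[::-1]  # reverse complement


def has_site(utr_window, mirna):
    """True if the UTR window contains an exact seed-complementary site."""
    return seed_site(mirna) in utr_window


def allele_effect(window_ref, window_alt, mirna):
    """Compare seed-site presence between reference and alternate alleles.

    window_ref and window_alt are the same stretch of 3' UTR RNA sequence
    carrying the reference and alternate allele, respectively.
    """
    ref = has_site(window_ref, mirna)
    alt = has_site(window_alt, mirna)
    if ref and not alt:
        return "disrupted"
    if alt and not ref:
        return "created"
    return "unchanged"
```

A predicted disruption of this kind is only a hypothesis; as noted above, it would still need validation against endogenous miRNA levels in the relevant cell type.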
By considering primarily eQTLs and nonsynonymous amino acid changes, the authors provisionally identify 151 candidate CAD genes from 159 SNPs, among which 98 represent genes not previously linked to the pathology of CAD. A literature-based approach to prioritization of SNPs from this list yielded only a few genes with maximum scores, which have been extensively described in CAD-related publications, whereas 31% of CAD SNPs could be linked to CAD solely using a data-driven approach. One example from the latter group involves the REST gene, a strong phenotypic modulator of vascular smooth muscle cells.
The primary limitation of this study is the lack of additional data sets that would support the identification of causal variants and point to new mechanisms of disease association. New approaches promise to provide insights into the native chromatin architecture in disease-relevant cell types, and the underlying cis-regulatory mechanisms of the associations. For instance, the recently developed Assay for Transposase-Accessible Chromatin method provides access to critical information about locus anatomy and can be conducted on primary cultured individual cells or individual cells that can be harvested from human disease lesions by microdissection, tissue dissolution, and single-cell capture.4 Analogous to eQTL approaches, allele-specific expression data derived from limited numbers of primary cultured cells or resident lesion cells would be highly informative for identification of causal variants and causal genes.5 These studies would require fewer individuals to detect significant expression changes using heterozygous exonic SNPs as a surrogate, and may unravel the dynamics of intercellular heterogeneity. These types of data would also provide much needed insights into the mechanisms by which causal variants located outside known transcription factor motifs or miRNA-binding sites mediate changes in gene expression. Estimates from this study show that ≈50% of CAD SNPs reside outside both the ENCODE regulatory elements and gene-coding regions, and the mechanisms behind these associations remain elusive.
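As a rough illustration of how allele-specific expression might be assessed at a heterozygous exonic SNP, the sketch below tests whether RNA reads carrying the reference and alternate alleles depart from the 50:50 ratio expected without a cis-regulatory effect. It assumes a simple exact binomial model with no correction for reference mapping bias or overdispersion, both of which matter in practice; the function names are hypothetical and the approach is not taken from the study.

```python
from math import comb


def binom_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value.

    Sums the probabilities of all outcomes that are no more likely than
    the observed count k. Intended for modest read depths (up to roughly
    a thousand reads), since p**i underflows for very large n.
    """
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    threshold = pmf[k] * (1 + 1e-9)  # tolerance for floating-point ties
    return min(1.0, sum(q for q in pmf if q <= threshold))


def allelic_imbalance(ref_reads, alt_reads):
    """Return (reference allele fraction, p-value) for one heterozygous SNP."""
    n = ref_reads + alt_reads
    if n == 0:
        raise ValueError("no reads cover this SNP")
    return ref_reads / n, binom_two_sided_p(ref_reads, n)
```

Because both alleles are measured within the same individual and cell population, each heterozygous sample acts as its own control, which is one reason such designs can work with fewer individuals than a conventional eQTL study.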
The findings from the integrative approach reviewed here have honed the pool of candidate genes to be further validated and studied with in vitro and in vivo functional studies to investigate the mechanism of their association (Figure). Through continued development of multiomic data sets from relevant cells and tissues, and painstaking identification of the causal genes and their biologically meaningful functions, we can significantly advance our understanding of disease-linked pathways to ultimately develop therapeutics targeted to the vessel wall. To paraphrase Admiral Hyman G. Rickover, “The devil is in the details, but so is the solution.”
Sources of Funding
This work has been supported by National Institutes of Health grants HL103635 (T.Q.), U01HL107388 (T.Q.), HL109512 (T.Q.), R21HL120757 (T.Q.), and HL125912 (C.L.M.) and a grant from the LeDucq Foundation.
- © 2015 American Heart Association, Inc.
- Braenne I, Civelek M, Vilne B, et al.
- Miller CL, Haas U, Diaz R, Leeper NJ, Kundu RK, Patlolla B, Assimes TL, Kaiser FJ, Perisic L, Hedin U, Maegdefessel L, Schunkert H, Erdmann J, Quertermous T, Sczakiel G.