Molecular Characterization of the Lipid Genome-Wide Association Study Signal on Chromosome 18q11.2 Implicates HNF4A-Mediated Regulation of the TMEM241 Gene
Objective—We recently identified a locus on chromosome 18q11.2 for high serum triglycerides in Mexicans. We hypothesize that the lead genome-wide association study single-nucleotide polymorphism rs9949617, or its linkage disequilibrium proxies, regulates 1 of the 5 genes in the triglyceride-associated region.
Approach and Results—We performed a linkage disequilibrium analysis and found 9 additional variants in linkage disequilibrium (r²>0.7) with the lead single-nucleotide polymorphism. To select the variants for functional analyses, we annotated the 10 variants using DNase I hypersensitive sites, transcription factor binding, and chromatin states and identified rs17259126 as the lead candidate variant for functional in vitro validation. Using a luciferase transcriptional reporter assay in liver HepG2 cells, we found that the minor G allele drives significantly lower transcription than the major A allele (P<0.05). The electrophoretic mobility shift and ChIPqPCR (chromatin immunoprecipitation coupled with quantitative polymerase chain reaction) assays confirmed that the minor G allele of rs17259126 disrupts a hepatocyte nuclear factor 4 α–binding site. To find the regional candidate gene, we performed a local expression quantitative trait locus analysis and found that rs17259126 and its linkage disequilibrium proxies alter expression of the regional transmembrane protein 241 (TMEM241) gene in 795 adipose RNAs from the Metabolic Syndrome In Men (METSIM) cohort (P=6.11×10⁻⁷ to 5.80×10⁻⁴). These results were replicated in expression profiles of TMEM241 from the Multiple Tissue Human Expression Resource (MuTHER; n=856).
Conclusions—The Mexican genome-wide association study signal for high serum triglycerides on chromosome 18q11.2 harbors a regulatory single-nucleotide polymorphism, rs17259126, which disrupts normal hepatocyte nuclear factor 4 α binding and decreases the expression of the regional TMEM241 gene. Our data suggest that decreased transcript levels of TMEM241 contribute to increased triglyceride levels in Mexicans.
- functional genomics
- gene expression and regulation
- genome-wide association study
Serum triglyceride levels are a heritable and environmentally modifiable risk factor for cardiovascular disease.1 Several groups have successfully used genome-wide association studies (GWAS) to identify signals for triglycerides and other lipid traits, including high-density lipoprotein cholesterol, low-density lipoprotein cholesterol, and total cholesterol.2 However, the lead GWAS signals may not themselves be functional but may instead be in linkage disequilibrium (LD) with the actual underlying susceptibility variant. This limitation arises because GWAS screen the human genome only superficially, using common tag single-nucleotide polymorphisms (SNPs). Furthermore, the functional variant often acts through a regional gene. Therefore, GWAS are only a starting point and require subsequent fine mapping and functional validation studies to identify the actual susceptibility variants and genes.
According to a recent survey, both US Hispanic men and women have higher levels of serum triglycerides than non-Hispanic whites or blacks,3 a result consistently reported for the past 2 decades.4 Recent studies using Latino cohorts have successfully narrowed European lipid loci.5 Moreover, because of the higher incidence of metabolic disease in the Amerindian origin populations, the investigation of their admixed genomes provides an opportunity to identify Amerindian-specific susceptibility variants for complex cardiovascular traits.6 Despite their high predisposition to dyslipidemias, Hispanics remain underinvestigated at the discovery stage of genomic cardiovascular studies. Previously, we identified a locus on chromosome 18q11.2 associated with high serum triglycerides in Mexicans using GWAS.5 However, similar to other GWAS, the functional variants and the underlying gene(s) through which these variants exert their effects on the triglyceride phenotype remain to be elucidated. To find the actual functional risk variant(s), we systematically annotated the SNPs in the triglyceride-associated LD block with chromatin state marks and transcription factor–binding events, which nominated rs17259126 as the top candidate functional variant. Its genomic landscape harbors regulatory sites, and the variant is predicted to disrupt a hepatocyte nuclear factor 4 α (HNF4A)–binding site. We show that the G allele of rs17259126 reduces expression of the luciferase reporter gene in a human liver cell line. Consistent with this result, the mobility shift and ChIPqPCR (chromatin immunoprecipitation coupled with quantitative polymerase chain reaction) assays confirmed that the same allele disrupts an HNF4A-binding site. Replicated cis-expression quantitative trait locus (cis-eQTL) analyses also implicate the minor G allele of rs17259126 in reduced expression of transmembrane protein 241 (TMEM241), suggesting TMEM241 as the regional candidate gene.
Taken together, we found that the triglyceride locus on chromosome 18q11.2 harbors at least one functional variant, rs17259126, associated with a decreased expression of the regional TMEM241 gene, a novel gene for triglycerides in the rapidly growing Hispanic population with a high predisposition to dyslipidemias.
Materials and Methods
Materials and Methods are available in the online-only Data Supplement.
Pairwise LD Analysis to Identify LD Proxies
In our original GWAS,5 conditional association analyses at the top 12 genotyped loci did not reveal additional independent SNPs with P≤2.5×10⁻³. To identify the full set of variants in LD with the lead GWAS SNP rs9949617, we first performed a regional LD analysis in the triglyceride-associated LD block. The LD block was determined in our previous study as the region spanning SNPs in LD of r²≥0.5 with the lead SNP rs9949617.5 For the LD analysis, we used our genotyped and imputed GWAS data,5 and we also verified using the 1000 Genomes Project data that no additional SNP(s) inside or outside this LD block (±500 kb from the block borders) have emerged to be in LD with the lead SNP rs9949617 since our previous study.5 We found 3 genotyped and 6 imputed SNPs in LD (r²≥0.7) with the lead SNP rs9949617 (Table). Two of these 10 SNPs (rs9949617 and rs4800467) were genotyped in stages 1 and 2 of our original GWAS scan,5 both resulting in P values <5×10⁻⁸. Because any of these 10 SNPs in LD can be the functional variant underlying the triglyceride association on chromosome 18q11.2, we first performed functional annotation followed by hypothesis-driven functional assays to uncover the functional variant in the triglyceride-associated LD block. We also tested the candidate variant and its LD proxies for regional effects on gene expression among the 5 genes in the triglyceride-associated LD block using a cis-eQTL analysis to investigate whether the variant changes expression of a particular regional gene.
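The r² statistic used above to define LD proxies can be illustrated with a minimal Python sketch. It assumes phased haplotypes coded 0/1, which is a simplification: the study itself worked from genotyped and imputed GWAS data and 1000 Genomes Project data. The function names `ld_r2` and `find_proxies` and the toy haplotypes in the usage note are hypothetical and only serve the illustration.

```python
def ld_r2(hap_a, hap_b):
    """Return r^2 between two biallelic SNPs from phased haplotypes coded 0/1."""
    if len(hap_a) != len(hap_b) or len(hap_a) == 0:
        raise ValueError("haplotype vectors must be non-empty and of equal length")
    n = len(hap_a)
    p_a = sum(hap_a) / n  # frequency of allele 1 at SNP A
    p_b = sum(hap_b) / n  # frequency of allele 1 at SNP B
    # Frequency of haplotypes carrying allele 1 at both SNPs
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # LD coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined for a monomorphic SNP")
    return d * d / denom


def find_proxies(lead, panel, threshold=0.7):
    """Return names of SNPs in `panel` whose r^2 with `lead` is >= threshold."""
    return [name for name, haps in panel.items() if ld_r2(lead, haps) >= threshold]
```

For example, `find_proxies([0, 0, 1, 1], {"snp_x": [0, 0, 1, 1], "snp_y": [0, 1, 0, 1]})` keeps only `snp_x`, because `snp_y` is uncorrelated with the lead haplotypes in this toy panel.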
Functional Genomics Analysis Using Encyclopedia of DNA Elements Data
Cis-eQTL variants often reside in regulatory elements such as transcription factor–binding sites (TFBS) and interrupt transcription factor occupancy, leading to transcriptional changes. However, functional variants may also act through multiple other mechanisms, making functional validation studies challenging. To facilitate the identification of suitable functional assays, we used the Encyclopedia of DNA Elements (ENCODE) data sets to give biological interpretation to the variants, and based on their predicted functionality, we conducted hypothesis-driven functional assays. TFBS often coincide with regions of open chromatin; hence, we annotated the chromatin state using ENCODE DNase I hypersensitive sites and histone marks in disease-relevant cell lines and control cell lines. In addition to the ENCODE biochemical annotations, we looked for transcription factor motif disruptions using HaploReg. We hypothesized that variants with the greatest amount of regulatory evidence from experimental data sets and bioinformatic predictions are more likely to be functional. Using this approach, we screened all 10 SNPs (the lead SNP and its 9 LD proxies) and selected rs17259126 as a top candidate for functional validation because it resides in a TFBS and a likely regulatory element defined by the co-occurrence of H3K27ac and H3K4me1. The G allele of rs17259126 is also predicted to disrupt an HNF4A regulatory motif (Figure I in the online-only Data Supplement). HNF4A is a known regulator of several metabolic genes.7 On the basis of these annotations, we hypothesized that rs17259126 resides in a TFBS and regulates expression of one of the regional genes on chromosome 18q11.2.
Functional Validation of Candidate Variants
We sought to validate our predicted functional variant rs17259126. We performed luciferase reporter assays using engineered vectors containing a 600-bp sequence around the SNP. At 48 hours post transfection of HepG2 cells, we found that the minor allele G displays a 1.5-fold decreased reporter expression (P<0.05) compared with the major A allele in 3 biological replicates (Figure 1). These results are consistent with the observed direction of the cis-eQTL effect (β=−0.149; Table). Similar assays for the lead SNP rs9949617 and rs4800467 did not reveal significant expression changes in the luciferase assay.
To further investigate whether the variant disrupts an HNF4A motif, we performed electrophoretic mobility shift assays (EMSAs) using isolated HNF4A protein (Figure 2A) or HepG2 cell nuclear extracts (Figure 2B) and found evidence that HNF4A preferentially binds the major A allele of rs17259126 in 4 biological replicates. We also performed EMSAs for the 9 other variants in the LD set. No allele-specific shifts were observed (Figure II in the online-only Data Supplement). Together, the luciferase (Figure 1) and EMSA (Figure 2) assays suggest that HNF4A may regulate expression of a target gene by directly binding to the rs17259126 regulatory site.
To confirm that HNF4A interacts with the variant site in HepG2 cells, we performed chromatin immunoprecipitation followed by qPCR targeting a 71-bp (site 1) or 151-bp (site 2) sequence surrounding rs17259126 (Figure 3). We found an average enrichment of 4.23 and 2.29 for the sequences, respectively, when compared with an unbound control site. Our functional studies provide converging evidence that the sequence underlying rs17259126 is an HNF4A-binding site and that the G minor allele significantly inhibits this interaction in vitro.
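The text above reports enrichment relative to an unbound control site but does not state how it was quantified. One commonly used calculation is sketched below in Python as an illustration only: it assumes a ΔΔCt-style comparison with input normalization and 100% amplification efficiency, and the function name `fold_enrichment` and any Ct values passed to it are hypothetical rather than taken from the study.

```python
def fold_enrichment(ct_ip_target, ct_input_target, ct_ip_control, ct_input_control):
    """Fold enrichment of a target site over a control site from qPCR Ct values.

    Assumes each PCR cycle doubles the product (100% primer efficiency).
    """
    # Normalize each immunoprecipitated sample to its own input DNA
    delta_target = ct_ip_target - ct_input_target
    delta_control = ct_ip_control - ct_input_control
    # A lower normalized Ct at the target site means more bound DNA
    return 2.0 ** (delta_control - delta_target)
```

With this convention, a target site whose normalized Ct is 2 cycles lower than that of the control site corresponds to a 4-fold enrichment, and a value of 1 means no enrichment over the control.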
GWAS variants residing in regulatory elements such as TFBS can lead to gene expression changes and contribute to disease susceptibility. We investigated whether the lead GWAS SNP may affect expression of the regional genes in the ≈300-kb region defining the triglyceride-associated window on chromosome 18q11.2 (LD r²>0.5 with the lead SNP).5 We performed a cis-eQTL analysis for the 5 genes within this triglyceride-associated LD block using adipose RNA-seq samples (n=795) from the Metabolic Syndrome In Men (METSIM) cohort and discovered that the lead SNP rs9949617 (ie, the SNP with the strongest triglyceride association signal4) and its LD proxies act as a cis-eQTL, regulating the expression of one regional gene, TMEM241 (P=6.11×10⁻⁷ to 5.80×10⁻⁴; Table). These results pass the Bonferroni correction for the 50 performed tests (10 SNPs tested for 5 regional genes; P<0.001; Table), and the 10 triglyceride-associated SNPs did not regulate expression of any of the 4 other genes within the LD block (Bonferroni corrected P>0.05).
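The Bonferroni threshold quoted above follows directly from the number of tests. A minimal Python sketch of the arithmetic, assuming a family-wise α of 0.05 (implied by the P<0.001 threshold for 50 tests); the variable and function names are illustrative only:

```python
ALPHA = 0.05   # assumed family-wise significance level
N_SNPS = 10    # lead SNP plus its 9 LD proxies
N_GENES = 5    # genes in the triglyceride-associated LD block


def bonferroni_threshold(alpha, n_tests):
    """Per-test P value threshold under the Bonferroni correction."""
    return alpha / n_tests


n_tests = N_SNPS * N_GENES                        # 50 SNP-gene tests
threshold = bonferroni_threshold(ALPHA, n_tests)  # 0.05 / 50 = 0.001

# Range of TMEM241 cis-eQTL P values reported for the METSIM cohort
reported_range = (6.11e-07, 5.80e-04)
all_pass = all(p < threshold for p in reported_range)
```

Both ends of the reported P value range fall below the 0.001 threshold, so `all_pass` evaluates to `True`.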
To validate and replicate these regional cis-eQTL analysis results, we used expression data from 856 publicly available human adipose, skin, and lymphocyte RNA microarray samples from the Multiple Tissue Human Expression Resource (MuTHER)8 and similarly discovered that the lead SNP rs9949617 is a cis-eQTL (Figure III in the online-only Data Supplement), regulating the expression of TMEM241 (P<1×10⁻⁵ across all 3 tissues; β=−0.107 for adipose). These replication data are consistent, including the direction of the effect, with our cis-eQTL signal in Finns and our luciferase assays in which the minor G allele results in a decreased expression (Table and Figure 1). We also found comparable cis-eQTL results for the lead SNP rs9949617 in the HapMap3 data sets for the CEU (Utah residents with Northern and Western European ancestry from the CEPH collection; P=0.0010), CHB (Han Chinese in Beijing, China; P=0.0019), and JPT (Japanese in Tokyo, Japan; P=6.0×10⁻⁴) samples in lymphoblastoid cells. Although there was a trend toward significance, this relationship did not hold for the MEX HapMap sample (P=0.20), perhaps because of the low number of Mexican-American samples (n=45) included in the HapMap project. These results implicate TMEM241 as a likely regional gene underlying the GWAS association because the lead SNP and its LD proxies robustly regulate TMEM241 expression across multiple cohorts. Taken together, these data suggest that rs17259126 is at least one of the functional SNPs underlying the original triglyceride GWAS signal5 on chromosome 18q11.2 in Amerindian origin populations.
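The direction of effect (β) compared across cohorts above is the slope of expression on allele dosage. The following Python sketch shows that slope for a simple additive model with no covariates, which is an assumption: the actual cis-eQTL analyses very likely included normalization and covariate adjustment that are not reproduced here. The function name `eqtl_slope` is hypothetical.

```python
def eqtl_slope(dosages, expression):
    """Least-squares slope of expression on minor-allele dosage (0, 1, or 2).

    A negative slope means each additional minor allele is associated with
    lower expression, as reported for TMEM241.
    """
    if len(dosages) != len(expression) or len(dosages) < 2:
        raise ValueError("need at least two paired observations")
    n = len(dosages)
    mean_x = sum(dosages) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    if sxx == 0:
        raise ValueError("all samples have the same genotype")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, expression))
    return sxy / sxx
```

Called with dosages `[0, 1, 2, 0, 1, 2]` and made-up expression values that drop by 0.5 per minor allele, the function returns a slope of −0.5, the same sign convention as the negative β values reported in the text.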
We recently identified a locus on chromosome 18q11.2 associated with high serum triglycerides in Mexicans using GWAS.5 However, GWAS typically do not conclusively identify a functional regulatory variant and candidate gene; rather, they require statistical and biochemical follow-up studies.9,10 We used statistical fine mapping to first identify variants in the triglyceride-associated LD block. Because all variants represent 3′UTR (untranslated region) or noncoding variants, we annotated their biological function using available regulatory data sets and bioinformatic tools and subsequently validated our recorded annotations using appropriate molecular assays.
Our LD analyses uncovered 9 variants in LD with the lead GWAS SNP rs9949617. Functional annotations using HaploReg11 found that rs17259126 is predicted to disrupt an HNF4A-binding site, with the minor G allele exhibiting a lower enrichment score. Furthermore, the ENCODE TF ChIP-seq data in HepG2 showed evidence of HNF4A enrichment around rs17259126. These findings prompted us to nominate rs17259126 as the lead candidate for molecular validation. We performed HNF4A ChIPqPCR targeting the SNP region and confirmed that HNF4A indeed binds the SNP site. HNF4A is a well-known, central regulator of hepatocyte development, differentiation, and gene expression7,12 and is associated with type 2 diabetes mellitus, consistent with the triglyceride association. In line with our bioinformatics prediction, we also show that the G allele of rs17259126 reduces transcription of the luciferase reporter and significantly inhibits HNF4A binding in mobility shift assays. It is worth noting that Amerindian origin populations have >3-fold higher frequency of the minor allele G of rs17259126 when compared with Europeans (minor allele frequencies: admixed American=0.22, European=0.06, African=0.08, and Asian=0.20).
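The >3-fold difference can be checked from the minor allele frequencies quoted above. A short Python sketch of that arithmetic follows; the dictionary and function names are illustrative only.

```python
# Minor allele (G) frequencies of rs17259126 as quoted in the text
maf_g = {
    "admixed American": 0.22,
    "European": 0.06,
    "African": 0.08,
    "Asian": 0.20,
}


def fold_vs(reference, mafs):
    """Ratio of each population's minor allele frequency to the reference's."""
    return {pop: freq / mafs[reference] for pop, freq in mafs.items() if pop != reference}


ratios = fold_vs("European", maf_g)
```

Here `ratios["admixed American"]` is 0.22/0.06, or about 3.7, which agrees with the >3-fold statement.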
To identify the regional gene, we performed cis-eQTL analyses using expression data from multiple cohorts, tissues, and platforms. We provide replicated evidence that the minor G allele of rs17259126 and its LD proxies are a robust cis-eQTL decreasing expression of the regional TMEM241 gene across many cohorts. Our results suggest that HNF4A binds the A allele at the rs17259126 site and increases expression of the TMEM241 gene, 1 of the 5 regional genes in the LD block. We hypothesize that individuals with the G allele have decreased TMEM241 expression, which affects the normal triglyceride synthesis or secretory pathways through an unknown mechanism.
The TMEM241 gene is a yeast VRG4 homolog, a Golgi-localized GDP (guanosine diphosphate)-mannose transporter. Yeast VRG4 is pleiotropically required for a range of Golgi functions, including N-linked glycosylation, secretion, protein sorting, and the maintenance of a normal endomembrane system.13,14 In the mammalian Golgi, carbohydrate processing is a highly diverse process. Carbohydrate chains may contain galactose, sialic acid, fucose, xylose, N-acetylglucosamine, and N-acetylgalactosamine, unlike in the yeast Saccharomyces cerevisiae, where glycosylation is restricted to mannosylation. Thus, human TMEM241 may function in the transport of other nucleotide sugars required in mammalian systems. In addition to glycoproteins, sphingolipids are also modified in the Golgi and have been implicated in metabolic disease.15 TMEM241 is believed to function as a nucleotide sugar transporter and, when defective, may lead to underglycosylation of glycoproteins and sphingolipids, potentially resulting in dysregulation of triglyceride synthesis.
Together, our results provide converging evidence suggesting rs17259126 as one of the functional variants underlying the GWAS association signal on 18q11.2,5 and TMEM241 as the underlying gene for triglycerides in Amerindian origin populations. However, because not all individuals of Mexican ancestry share the same composition of Amerindian DNA, additional cohorts may or may not replicate this particular association.
Future studies focusing on characterizing the role of TMEM241 in triglyceride metabolism could include CRISPR/Cas9,16 an emerging technology for targeted genomic modification. This technology allows site-specific genetic engineering in disease-relevant cell lines to interrogate the function of specific genes and single-nucleotide variants in their native chromatin state. Elucidation of the role of TMEM241 in triglyceride metabolism may help guide future research and development of new therapies for effective triglyceride management and prevention of heart disease in the rapidly growing Hispanic populations, currently underinvestigated in genomic cardiovascular studies despite their high predisposition to dyslipidemias.
We thank the Mexican and Finnish individuals who participated in this study. We also thank Saúl Cano-Colín for laboratory technical assistance. Michael Boehnke and Francis Collins are thanked for providing the METSIM genotype data.
Sources of Funding
This study was funded by the National Institutes of Health (NIH) grants HL-095056, HL-28481, and DK093757. A. Rodríguez was supported by the National Science Foundation Graduate Research Fellowship Program NSF grant number DGE-1144087 and A. Ko by NIH grants F31HL127921 and T32HG002536. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the article. Genotyping services for the METSIM cohort were supported by NIH grants DK072193, DK093757, DK062370, and Z01HG000024 and provided by the Center for Inherited Disease Research (CIDR). CIDR is fully funded through a federal contract from the NIH to The Johns Hopkins University, contract number HHSN268201200008I.
The online-only Data Supplement is available with this article at http://atvb.ahajournals.org/lookup/suppl/doi:10.1161/ATVBAHA.116.307182/-/DC1.
Nonstandard Abbreviations and Acronyms
- cis-eQTL: cis-expression quantitative trait locus
- ENCODE: Encyclopedia of DNA Elements
- GWAS: genome-wide association study
- HNF4A: hepatocyte nuclear factor 4 α
- LD: linkage disequilibrium
- SNP: single-nucleotide polymorphism
- TFBS: transcription factor–binding sites
- TMEM241: transmembrane protein 241
- Received November 20, 2014.
- Accepted May 9, 2016.
- © 2016 American Heart Association, Inc.
References
- Go AS, Mozaffarian D, Roger VL, et al.
- Aguilar-Salinas CA, Canizales-Quinteros S, Rojas-Martínez R, Mehta R, Villarreal-Molina MT, Arellano-Campos O, Riba L, Gómez-Pérez FJ, Tusié-Luna MT.
- Weissglas-Volkov D, Aguilar-Salinas CA, Nikkola E, et al.
- Hayhurst GP, Lee YH, Lambert G, Ward JM, Gonzalez FJ.
- Schaub MA, Boyle AP, Kundaje A, Batzoglou S, Snyder M.
- Ward LD, Kellis M.
- Hansen HG, Schmidt JD, Søltoft CL, Ramming T, Geertz-Hansen HM, Christensen B, Sørensen ES, Juncker AS, Appenzeller-Herzog C, Ellgaard L.
- Dean N, Zhang YB, Poster JB.
Highlights
- The triglyceride locus on chromosome 18q11.2 harbors at least one functional variant, rs17259126, associated with a decreased expression of the regional TMEM241 gene, a novel gene for triglycerides in the Hispanic population.
- HNF4A may regulate the expression of the TMEM241 gene by directly binding the rs17259126 regulatory site.
- Our findings suggest that decreased transcript levels of TMEM241 contribute to increased triglyceride levels in Mexicans.