Loss of a Splice Donor Site at a ‘Skipped Exon’ in a Gene Homologous to Apolipoprotein(a) Leads to an mRNA Encoding a Protein Consisting of a Single Kringle Domain
Abstract Apolipoprotein(a) [apo(a)] and plasminogen are located in a gene cluster on chromosome 6 together with two other genes that share highly homologous 5′ flanking regions. We have isolated the human liver transcript derived from one of these genes, designated apo(a)-related gene C, that encodes a polypeptide of 132 amino acids composed of a secretion signal and a single kringle domain. Although the gene encodes several additional kringle domains, sequence analysis shows that the second kringle is incomplete in the derived mRNA because it lacks an apparent exon present in the gene. Analysis of genomic sequence shows that the predicted exon at this site lacks a canonical splice donor site. This results in “exon skipping” during maturation of the mRNA, causing a coding frame shift and the presence of premature stop codons.
Reprint requests to Richard M. Lawn, Falk Cardiovascular Research Center, Stanford University School of Medicine, 300 Pasteur Dr, Stanford, CA 94305-5246.
- Received August 9, 1994.
- Accepted October 18, 1994.
Apolipoprotein(a) [apo(a)] is the distinguishing protein moiety of lipoprotein(a) [Lp(a)], and elevated plasma Lp(a) levels are correlated with an increased risk of atherosclerosis (see References 1 through 3 for reviews). Apo(a) and the protease zymogen plasminogen are close homologues. Plasminogen is composed of five kringle domains and one trypsinlike protease domain, while apo(a) is composed of between 15 and 40 copies of a homologue of plasminogen kringle 4, a kringle 5–like domain, and an inactive protease domain. These regions contain between 75% and 94% DNA sequence identity to plasminogen.4 It is likely that apo(a) evolved from the plasminogen gene by initial gene duplication and subsequent deletion and internal duplication events. This included multiplication in the apo(a) gene of kringle domains, each of which is encoded by two exons. Recombination events have produced the range of number of kringle repeats now found in individual alleles of the apo(a) gene. The apo(a) and plasminogen genes are adjacent on human chromosome 6 and are flanked by two other highly related genes and/or pseudogenes.5 6 7 We recently presented preliminary evidence that one of these genes, the apo(a)-related gene C [apo(a)rg-C], is transcribed in human liver.6 We now report the cloning and sequencing of two alternatively spliced mRNAs transcribed from the apo(a)rg-C gene that are predicted to encode a protein containing a single kringle domain.
PCR Amplification and Sequencing of Apo(a)rg-C cDNA Isolated From a Human Liver cDNA Library
A human liver cDNA library from Clontech Laboratories, Palo Alto, Calif, was screened by polymerase chain reaction (PCR). Briefly, a first round of PCR with coding orientation primer KC and reverse complement primer λ gt 10 forward was followed by a second round of PCR using 1:200 of the original PCR amplification product, nested at the 5′ end with coding orientation primer KC 3′ and reverse complement primer λ gt 10 forward. PCR amplification used 0.25 μmol/L primer, 2 mmol/L MgCl2, and 200 μmol/L dNTPs in 50 mmol/L KCl and 10 mmol/L tris(hydroxymethyl)aminomethane HCl, pH 8.3 (95°C, 1 minute; 55°C, 1 minute; 72°C, 5 minutes; 30 cycles). PCR products were gel electrophoresed, blotted, and hybridized to λ6, an apo(a) partial clone containing only kringle 4–like sequences,4 in 5× saline/sodium phosphate/EDTA (SSPE), 5× Denhardt’s solution, and 0.5% sodium dodecyl sulfate (SDS) at 42°C for 16 hours and washed at 42°C in a solution containing 0.1× saline/sodium citrate (SSC) and 0.5% SDS.
Bands positively identified by hybridization were excised from a preparative agarose gel, cloned into a TA cloning vector (Invitrogen), and sequenced by the Sanger dideoxy method with use of the Sequenase kit (United States Biochemical). Unique sequence at the 3′ end of each clone was used to design additional PCR primers in the coding orientation that were used in two nested rounds of amplification with λ gt 10 forward.
Characterization of Reverse Transcriptase PCR Product From Apo(a)rg-C cDNA
RNA was prepared from human liver,8 and 1 μg of total RNA was reverse transcribed as described.9 Nested PCR was used to amplify the reverse-transcribed products. Different sets of primers were used to amplify two alternatively transcribed PCR products that were unique at their 3′ ends. cDNA was amplified by PCR with primers 64 and 3RC (with the same PCR amplification conditions described above), and 1:200 of this PCR product was reamplified with primers KC and 4RC. An 1890-bp product was identified and cloned into the TA vector. A second, 1460-bp PCR product was also isolated after amplification of cDNA by PCR (with the same PCR amplification conditions described above) with primers 64 and K5RC, followed by reamplification of 1:200 of these PCR products with primers KC and K5RC2.
Cloning and Partial Sequencing of Apo(a)rg-C Genomic Fragments From Apo(a)rg-C Containing P1 Clone
A bacteriophage P1 clone10 containing part of the apo(a)rg-C gene was identified and characterized as described.6 Selected restriction digest fragments were subcloned into pBluescript (Stratagene) and sequenced as above.
Northern Blot Analyses
RNA was isolated from human liver,8 and mRNA was prepared by passage of 1.2 mg of total RNA through an oligo dT column (Pharmacia). Then 10 μg of poly (A)+ RNA was fractionated together with a 0.24- to 9.5-kb RNA ladder (Gibco BRL) on a 1% agarose denaturing gel containing formaldehyde (2.2 mol/L) and blotted. Filters were hybridized with both the 1890-bp and 1460-bp reverse transcriptase PCR products derived from both apo(a)rg-C transcripts. Both cDNAs were random primed labeled with [α-32P]dCTP with use of the Amersham Megaprime kit according to the manufacturer’s instructions, hybridized at 42°C in 30% formamide, 5× SSPE, 5× Denhardt’s solution, and 0.5% SDS and washed at 68°C in a solution containing 0.1× SSC and 0.5% SDS. Multiple tissue Northern blots (Clontech) were also hybridized and washed under the same conditions.
Confirmation That Two Alternatively Spliced mRNAs Originate From the Apo(a)rg-C Gene
Yeast artificial chromosome (YAC) clones were identified from the library of the Centre d’Etude du Polymorphisme Humain (CEPH), Paris, France,11 as previously described.6 DNA suitable for analysis by PCR was prepared as described.12
YAC clone 146 (CEPH designation 366H2) containing the apo(a)rg-C gene and clone 19 (CEPH designation 431B1) were used as positive and negative controls for YACs in which the apo(a)rg-C gene was either present or absent. Two sets of primers were identified from one transcript containing the kringle 5 domain (R and K5RC2 and R and S), and two additional sets of primers were identified from the other transcript containing the protease domain (V and 4RC and V and 3RC). We then amplified DNA by PCR from each of the two YAC clones using each of the four primer combinations with 0.25 μmol/L primers, 2 mmol/L MgCl2, and 200 μmol/L dNTPs in 50 mmol/L KCl and 10 mmol/L tris(hydroxymethyl)aminomethane HCl, pH 8.3 (95°C, 1 minute; 60°C, 1 minute; 72°C, 5 minutes; 35 cycles).
PCR Primers for Amplification of Library Clones and Reverse Transcriptase PCR
PCR primers for amplification of library clones and reverse transcriptase PCR were as follows: λ gt 10 forward, AGCAAGTTCAGCCTGGTTAAG (reverse complement orientation); KC, CTGAAGTCAGCACCGACTGAGACAGGGCCTTCT (coding orientation); KC 3′, TGCAGGAGTGCTACCACAGTAA (coding orientation); 64, CTGAGCCAGTGGCATGGGTCTC (coding orientation); 4RC, GGGACACTGAAATAACCTATTTTA (reverse complement orientation); 3RC, CATGTTAGAATAGGAATTTGACTG (reverse complement orientation); K5RC, TTTCCAGACCTGCCCGTGGAC (reverse complement orientation); K5RC2, CGTGGACTTGTCCTGGAGT (reverse complement orientation); R, ACTCTATGTTTGGGAATGG (coding orientation); S, TTTGTCCTTGCGATAGTTTGC (reverse complement orientation); and V, GCTTGGACAACGGGATGAAA (coding orientation).
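As an illustrative aside that is not part of the original methods, basic properties of the primers listed above can be checked computationally. The Python sketch below uses four of the published sequences; the helper names `reverse_complement`, `gc_fraction`, and `wallace_tm` are hypothetical, and the Wallace rule is only a rough rule of thumb for short oligonucleotides, not the criterion used to design these primers.

```python
# Primer sequences copied from the list above (5' to 3').
PRIMERS = {
    "KC": "CTGAAGTCAGCACCGACTGAGACAGGGCCTTCT",
    "K5RC2": "CGTGGACTTGTCCTGGAGT",
    "R": "ACTCTATGTTTGGGAATGG",
    "V": "GCTTGGACAACGGGATGAAA",
}

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def gc_fraction(seq):
    """Fraction of bases that are G or C."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq):
    """Rough melting temperature (deg C) by the Wallace rule:
    2 x (A + T) + 4 x (G + C). Only an approximation."""
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(name, len(seq), round(gc_fraction(seq), 2),
              wallace_tm(seq), reverse_complement(seq))
```

A reverse-orientation primer such as K5RC2 anneals to the coding strand, so its reverse complement is the sequence expected to appear in the cDNA.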
Identification of Apo(a)rg-C cDNA Clones
We have previously described a 292-bp PCR product originating from the apo(a)rg-C gene that was isolated from RNA from two human liver samples.6 It constituted a region of 98% identity to the 5′ untranslated region of apo(a), followed by a region of 88% DNA identity to the initial coding region of apo(a). The initial coding region of the apo(a)rg-C gene was also homologous to the secretion prepeptide and kringle 4 of plasminogen. An initial attempt to isolate full-length cDNA clones from a human liver cDNA library by hybridization with a specific oligonucleotide (KC) proved unsuccessful, and hybridization with less specific cDNA probes yielded only plasminogen clones. A PCR strategy was designed to obtain the 3′ end of the cDNA, taking advantage of a unique region of sequence at the 5′ end of the first kringle 4–like domain (PCR primer KC6). An aliquot of the cDNA library was amplified with the gene-specific KC primer and a primer in the λ vector sequence. The PCR products were analyzed by electrophoresis and Southern blot hybridization to a kringle-encoding probe (λ6) derived from an apo(a) cDNA clone.4 It had previously been shown that λ6 hybridized to apo(a)rg-C kringle 4–like domains contained within a partial genomic clone in a P1 bacteriophage vector.6 Nested PCR was performed with a primer that was 3′ to the KC primer together with the same λ arm primer. With this technique we identified several clones of approximately 1 kb and generated sufficient product to allow subcloning and sequencing. Sequence analysis confirmed that these clones were unique and shared considerable homology with kringle 4–like domains from both apo(a) and plasminogen.
To isolate clones containing further 3′ sequence, the same general strategy was used. Unique sequence was identified at the 3′ end of the longest clone, and coding orientation primers were designed to allow amplification by nested PCR of clones containing additional 3′ sequence. Overlapping clones were obtained, giving full-length cDNA sequence. Two distinct cDNAs containing several copies of kringle 4 homologues were identified whose sequences are identical for 1504 bp from their 5′ ends, after which their sequences diverge (Fig 1A and 1B). The apo(a)rg-C cDNA sequence from one transcript contains homologues of four complete and one incomplete kringle 4–like domains, whereas the other transcript encodes homologues of three complete and two incomplete kringle 4–like domains (Fig 2). After the point of divergence, one clone contains the second exon of a kringle 4–like domain [plasminogen and apo(a) kringles are encoded by two exons each] followed by a protease-like domain, with regions of 50% DNA identity to apo(a), and a polyadenylation signal AATAAA 25 bases from the beginning of a poly(A) tail. By comparison, 3′ to base 1504, the other clone contains the first exon of a kringle 5–like domain, with 93% DNA identity to apo(a), instead of the second exon of a kringle 4 domain, followed by 60 bp of unrecognizable sequence. Although this clone ends with AAAAA, no obvious polyadenylation signal is present, suggesting that it may not be complete. However, the size of RNA transcripts detected by blot hybridization (see below) indicates that little if any further sequence is present in the message. The presence of these two cDNA forms was confirmed by use of reverse transcriptase PCR with primers based on sequence of the cDNA library–derived clones to isolate and sequence both forms from human liver RNA.
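The check for a polyadenylation signal described above can be sketched in Python. This is only an illustration under stated assumptions: the function name `polya_signal_distance`, the minimum tail length, and both test sequences are hypothetical toy constructs rather than the actual apo(a)rg-C cDNA ends, and the distance is taken here as the number of bases between the end of the AATAAA hexamer and the first A of the terminal poly(A) run.

```python
def polya_signal_distance(cdna, signal="AATAAA", min_tail=5):
    """Return the number of bases between the last upstream copy of
    `signal` and the start of the terminal poly(A) run, or None if the
    sequence has no tail of at least `min_tail` A's or no signal."""
    tail_start = len(cdna.rstrip("A"))       # index of first A in the tail
    if len(cdna) - tail_start < min_tail:
        return None
    pos = cdna.rfind(signal, 0, tail_start)  # signal must lie before the tail
    if pos == -1:
        return None
    return tail_start - (pos + len(signal))


# Toy 3' ends (hypothetical sequences, for illustration only).
SPACER = "GCTTGCCTGTGCTCGTGCTTGCCTG"            # 25 bases, ends in G
with_signal = "GCGC" + "AATAAA" + SPACER + "A" * 15
without_signal = "GCGCTTGCCTG" + "AAAAA"

if __name__ == "__main__":
    print(polya_signal_distance(with_signal))
    print(polya_signal_distance(without_signal))
```

The second toy sequence mimics the situation of the clone that ends in AAAAA without an obvious upstream signal, for which the function reports no signal.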
Predicted Apo(a)rg-C Protein Sequence
The apo(a)rg-C cDNA sequence contains several kringle domain homologues, but its translation predicts that it encodes a secreted protein containing a single kringle domain. If secretion signal cleavage occurs at an analogous location to apo(a), preceding a glutamic acid residue, apo(a)rg-C would contain a secretion signal peptide of 19 amino acids followed by a kringle domain of 72% amino acid identity and 83% DNA identity to the first kringle of apo(a). In contrast to apo(a), the apo(a)rg-C transcript contains several kringle domains that are not identical tandem repeats but instead have approximately 90% pairwise DNA conservation. Some of the encoded kringle domains are incomplete, and the second kringle domain that is present in both transcripts lacks the portion that would be encoded by the first of its two exons.
Deletion of part of the second kringle domain of apo(a)rg-C has profound implications for its predicted protein product. Mapping and sequence analysis of the apo(a)rg-C genomic clone revealed the presence of an expected exon at this location that lacks a canonical splice donor site and is therefore present in the gene but absent in the mature message. Confirmation of the position of the “skipped exon” within the gene was obtained by complete sequence identity of the preceding exon in the genomic clone with the second exon of the first kringle domain in the mRNA. Sequence obtained from the genomic clone also allowed localization of introns surrounding the skipped exon, which is consistent with the location and sequence of analogous splice sites in the plasminogen gene.13 There is exact conservation of the last six nucleotides of the homologous apo(a)rg-C and plasminogen exons, but a nucleotide replacement then destroys the GT dinucleotide that invariably marks the beginning of introns (Fig 3A). Hence, exon skipping during the maturation of RNA, rather than a deletion in the gene, may be the reason that these sequences are absent from the mature apo(a)rg-C mRNA (Fig 3B). Absence of this exon results in a coding frame shift and the presence of a stop codon shortly thereafter. Therefore, the predicted protein product of the apo(a)rg-C gene contains a signal peptide of 19 amino acids and a single kringle retaining the six canonical cysteine residues and an overall amino acid identity of 72% to the first kringle of apo(a) and 60% identity to plasminogen kringle 4.
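The consequence of losing a splice donor site can be illustrated with a deliberately simplified Python sketch. Everything in it is a toy assumption: the exon and intron sequences are invented (they are not apo(a)rg-C sequence), and the `splice` function applies a crude rule, loosely inspired by exon definition, in which an internal exon is retained only if the intron following it begins with GT. It is meant only to show how skipping an exon whose length is not a multiple of three shifts the reading frame and exposes an early stop codon.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(seq):
    """Translate from the first base until the first stop codon."""
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def splice(exons, introns):
    """Toy model: introns[i] follows exons[i]. An internal exon is kept
    only if its downstream intron starts with the GT donor dinucleotide."""
    kept = [exons[0]]
    for exon, downstream in zip(exons[1:-1], introns[1:]):
        if downstream.startswith("GT"):
            kept.append(exon)
    kept.append(exons[-1])
    return "".join(kept)


# Hypothetical exons; the middle one is 7 nt long (not a multiple of 3).
EXONS = ["ATGGCTGAA", "TGTCCAG", "GTCTGACGGAAACC"]
NORMAL_INTRONS = ["GTAAGTCCTTAG", "GTAAGTCCTTAG"]
MUTANT_INTRONS = ["GTAAGTCCTTAG", "ATAAGTCCTTAG"]  # GT -> AT at second donor

if __name__ == "__main__":
    print(translate(splice(EXONS, NORMAL_INTRONS)))
    print(translate(splice(EXONS, MUTANT_INTRONS)))
```

In the mutant case the middle exon is omitted, the downstream sequence is read in a different frame, and translation terminates after only a few codons.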
Two mRNA Species Result From Alternative Splicing of the Apo(a)rg-C Gene
We have previously shown that the cloned genomic sequence of the apo(a)rg-C gene matched the sequence of a 292-bp product derived from reverse transcriptase PCR of human liver RNA.6 Further sequence analysis of the partial genomic clone revealed additional correspondence to RNA sequence derived from both cDNA libraries and PCR of liver RNA from two individuals. Both reverse transcriptase PCR and cDNA library screening procedures yielded two related species of transcript. Since both RNAs are identical for 1504 nucleotides and diverge at a predicted exon boundary, it is likely that they originate from alternative splicing of the same primary transcript. Confirmation of this was obtained by PCR analysis of a YAC human genomic clone containing the entire apo(a)rg-C gene plus a portion of the 3′ end of the apo(a) gene, but excluding other members of this gene family.6 PCR primer pairs were designed to amplify either kringle 5– or protease-like sequences from each species of apo(a)rg-C transcript. One primer of each pair contained at least 60% mismatch to apo(a) cDNA sequence. Fig 4A shows that each of the primer pairs yields PCR products of the predicted size from the YAC clone containing the apo(a)rg-C gene but not from the clone containing only the apo(a) gene.
Apo(a)rg-C Gene Expression
The 1890-bp and 1460-bp reverse transcriptase PCR products derived from both RNAs were purified and used as hybridization probes. The probes hybridized to plasminogen mRNA, which is present in substantial amounts in liver. On blots containing RNA from liver, the band with by far the greatest intensity originated from plasminogen at 2.9 kb, but also present were two additional bands at 2.1 kb and at 1.8 kb that were compatible with apo(a)rg-C transcripts (Fig 4B). At the level of Northern blot detection, apo(a)rg-C transcripts were not found in kidney, heart, brain, placenta, lung, skeletal muscle, pancreas, spleen, thymus, prostate, testis, ovary, small intestine, colon, or peripheral blood leukocytes. We have not eliminated the possibility that apo(a)rg-C may be more highly expressed in other tissues or in certain pathological states. In addition, currently available antibodies to apo(a) and plasminogen may not offer the sensitivity to detect low levels of this predicted protein in human plasma.
The apo(a)rg-C gene is located approximately 40 kb from the apo(a) gene within the apo(a)/plasminogen gene cluster on human chromosome 6 (6q26-27). We have shown previously by characterization of genomic YAC and P1 clones that this gene contains several kringle 4–like domains and, like apo(a), lacks homologues of the preactivation peptide and first three kringles of plasminogen.6 Here we show that the apo(a)rg-C gene is transcribed in human liver. Interestingly, the apo(a)rg-C mRNA departs from homologous gene sequences at a splice donor site, which may result in the presence of a “pseudo-exon” that is skipped during maturation of its RNA. This is the predicted cause of the deletion of part of its second kringle domain and the frame shift that results in stop codons shortly thereafter. Although the mRNA contains a premature in-frame stop codon, it still retains some stability and is not completely degraded. Two forms of apo(a)rg-C mRNA are produced, one containing a kringle 5–like domain and the other a protease-like domain. Since the alternative splicing occurs 3′ to the stop codon, both forms of message encode an identical polypeptide containing a secretion prepeptide and single kringle domain of 113 amino acids, containing each of the six cysteine residues that form the three disulfide bonds that stabilize these structures plus one additional cysteine.
It is likely that members of the closely linked apo(a)/plasminogen gene family arose by duplication of an ancestral plasminogen gene, followed by internal deletions and duplications of sequence blocks and single base changes, creating the four genes and/or pseudogenes that are clustered on human chromosome 6. Plasminogen contains five kringles and one protease domain, while individual alleles of apo(a) contain between 14 and 40 kringles, many of which are exact tandem repeats.4 14 Analysis of genomic DNA by hybridization suggested that apo(a)rg-C contains several kringle 4–like domains.6 Despite their presence in the gene, we now show that partial deletion of one kringle leads to a frame shift that reduces the coding potential of apo(a)rg-C to a single kringle domain. Interestingly, it is predicted that this occurs because of the presence of a splice site mutation rather than the loss of the corresponding exon from the gene. Pre-mRNA splicing of eukaryotic nuclear transcripts entails cleavage at the 5′ GU dinucleotide of an intron followed by formation of a lariat intermediate and subsequent cleavage and ligation at the AG dinucleotide at the 3′ end of the intron (see Reference 15 for review). Base substitution results in a GU to AU replacement at the splice donor site after the first predicted exon of the second kringle of apo(a)rg-C RNA (Fig 3A). It has been shown that in other genes, mutations of the GU dinucleotide allow cleavage at the splice donor site but result in formation of a lariat intermediate with an abnormal RNA branch that inhibits cleavage at the splice acceptor site.15 To bypass the abnormal splice donor site, the properly formed lariat from the preceding intron may be used, resulting in omission of an exon and its adjacent introns from the mature mRNA (Fig 3B). Not all of the details of RNA splicing are known, and several models have been proposed. In particular, the “exon definition” hypothesis16 predicts that mutation at a splice donor site, as seen here, would adversely affect the assembly of splicing intermediates and result in exon skipping. A similar type of G to A mutation in a splice donor site resulting in exon skipping has recently been described in the ferrochelatase gene of a patient with erythropoietic protoporphyria and fatal liver failure.17
The DNA identity between apo(a)rg-C and apo(a) kringles varies between 82% and 93%, and the close homology between the two genes warrants caution concerning the design of PCR primers for amplification of kringle domains from either gene from genomic DNA. This observation is significant in studies of polymorphisms of the apo(a) gene in which choice of primer sequence and PCR conditions is more important than was previously appreciated, since it is likely that most primer pairs situated in kringle domains will amplify sequence from both apo(a) and apo(a)rg-C genes.
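The caution about primer design can be made concrete with a small Python sketch. It is an illustration only: the sequences are short hypothetical stand-ins rather than real kringle sequence, the function names are invented for this example, and the mismatch threshold of two is an arbitrary assumption, not an experimentally validated cutoff. The sketch computes percent identity between two pre-aligned sequences of equal length and flags a primer that matches the corresponding site in both genes closely enough that coamplification is plausible.

```python
def percent_identity(seq_a, seq_b):
    """Percent identical positions between two pre-aligned,
    equal-length sequences (no gap handling)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


def count_mismatches(primer, site):
    """Number of mismatched positions between a primer and a binding
    site written in the same orientation."""
    if len(primer) != len(site):
        raise ValueError("primer and site must have equal length")
    return sum(p != s for p, s in zip(primer, site))


def may_coamplify(primer, sites, max_mismatches=2):
    """True if the primer is within `max_mismatches` of every site,
    i.e. it plausibly primes on all of the templates."""
    return all(count_mismatches(primer, s) <= max_mismatches for s in sites)


# Hypothetical toy sequences standing in for homologous kringle regions.
site_gene_a = "ACGTTGCAAC"
site_gene_b = "ACGTTGCTAC"
primer = "ACGTTGCAAC"

if __name__ == "__main__":
    print(percent_identity(site_gene_a, site_gene_b))
    print(may_coamplify(primer, [site_gene_a, site_gene_b]))
```

A primer intended to be gene specific would instead be placed where the two templates differ at several positions, ideally near the 3′ end of the primer.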
Apo(a) plasma concentration varies widely in the human population. Posttranslational maturation of the protein and its assembly into the Lp(a) particle, as well as factors influencing apo(a) gene transcription, appear to account for much of this variation. It is interesting to speculate that splice site alterations of the type we describe for apo(a)rg-C might occur in some alleles of the apo(a) gene. We are currently examining the possibility that such mutations may account for some of the transcript positive or negative “null” alleles of apo(a) that exist.
Although no protein has been described that is composed of only a single kringle domain, a number of proteins contain one or more kringles plus other domains. These include prothrombin, tissue- and urokinase-type plasminogen activators, factor XII, hepatocyte growth factor, plasminogen, and apo(a). It is now well established that kringles correspond to autonomous structural/functional units that mediate protein-protein interactions.18 19 20 It has been proposed that the fibronectin type II domain is related to kringles and may also function as a protein-binding site.21 22 The bovine seminal fluid protein PDC-109 is made up of only two such domains.21 22 Its physiological role is uncertain, although it has been shown to stimulate the release of gonadotropins from pituitary cells.23 The kringles of apo(a) have been found to mediate binding to fibrin, fibronectin, tetranectin, apolipoprotein B-100, and undefined components of cell surfaces and extracellular matrix, mediating the activity of the Lp(a) lipoprotein particle.1 2 3 One could speculate that a single kringle might bind to similar substrates and possess signaling or competitive functions. The recent finding of Folkman and colleagues24 of the profound antiangiogenic and antimetastatic activity of the kringle 1-4 fragment of plasminogen serves notice that kringles may possess a wide variety of functions. In light of this activity of tumor-associated kringle domains, we have embarked on a study of the effect of the apo(a)rg-C protein on angiogenesis. With colleagues David Grainger, Paul Kemp, and James Metcalfe, University of Cambridge, we have recently detected the apo(a)rg-C transcript in 6 of 13 breast tumor samples.
This research was supported by National Institutes of Health Program Project grant HL-48638 and University of California/Tobacco-Related Disease Research grant 2RT0339 to Dr Lawn. Dr Byrne is supported by traveling fellowships from the Peel Medical Trust and Parke-Davis/Cambridge University, UK.
Utermann G. The mysteries of lipoprotein (a). Science. 1989;246:904-910.
Scanu AM, Fless GM. Lipoprotein (a): heterogeneity and biological significance. J Clin Invest. 1990;85:1709-1715.
Scanu AM, Lawn RM, Berg K. Lipoprotein(a) and atherosclerosis. Ann Intern Med. 1991;115:209-218.
McLean JW, Tomlinson JE, Kuang WJ, Eaton DL, Chen EY, Fless GM, Scanu AM, Lawn RM. cDNA sequence of human apolipoprotein (a) is homologous to plasminogen. Nature. 1987;330:132-137.
Malgaretti N, Acquati F, Magnaghi P, Bruno L, Pontoglio M, Rocchi M, Saccone S, Della Valle G, D’Urso M, LePaslier D, et al. Characterization by yeast artificial chromosome cloning of the linked apolipoprotein(a) and plasminogen genes and identification of the apolipoprotein(a) 5′ flanking region. Proc Natl Acad Sci U S A. 1992;89:11584-11588.
Byrne CD, Schwartz K, Meer K, Cheng J-F, Lawn RM. The human apolipoprotein (a)/plasminogen gene cluster contains a novel homologue transcribed in liver. Arterioscler Thromb. 1994;14:534-541.
Magnaghi P, Citterio E, Malgaretti N, Acquati F, Ottolenghi S, Taramelli R. Molecular characterization of the human apo(a)-plasminogen gene family clustered on the telomeric region of chromosome 6 (6q26-27). Hum Mol Genet. 1994;3:437-442.
Kawasaki ES. Amplification of RNA. In: Innis MA, Gelfand DH, Sninsky JJ, White TJ, eds. PCR Protocols: A Guide to Methods and Applications. New York, NY: Academic Press, Inc; 1991:21-27.
Sternberg N. Bacteriophage P1 cloning system for the isolation, amplification and recovery of DNA fragments as large as 100 kilobase pairs. Proc Natl Acad Sci U S A. 1990;87:103-107.
Albertsen HM, Abderrahim H, Cann H, Dausset J, Le Paslier D, Cohen D. Construction and characterization of a yeast artificial chromosome library containing seven haploid human genome equivalents. Proc Natl Acad Sci U S A. 1990;87:4256-4260.
Maniatis T, Fritsch E, Sambrook J. Molecular Cloning: A Laboratory Manual. Cold Spring Harbor, NY: Cold Spring Harbor Laboratory; 1989.
Petersen T, Martzen MR, Ichinose A, Davie EW. Characterization of the gene for human plasminogen, a key proenzyme in the fibrinolytic system. J Biol Chem. 1990;265:6104-6111.
Lackner C, Boerwinkle E, Leffert CC, Rahmig T, Hobbs HH. Molecular basis of apolipoprotein(a) isoform size heterogeneity as revealed by pulsed-field gel electrophoresis. J Clin Invest. 1991;87:2153-2161.
Robberson BL, Cote GJ, Berget SM. Exon definition may facilitate splice site selection in RNAs with multiple exons. Mol Cell Biol. 1990;10:84-94.
Trexler M, Patthy L. Folding autonomy of the kringle 4 fragment of human plasminogen. Proc Natl Acad Sci U S A. 1983;80:2457-2461.
Van Zonneveld AJ, Veerman H, Pannekoek H. Autonomous functions of structural domains on human tissue-type plasminogen activator. Proc Natl Acad Sci U S A. 1986;83:4670-4674.
Manjunath P, Sairam MR. Purification and biochemical characterization of three major acidic proteins (BSP-A1, BSP-A2 and BSP-A3) from bovine seminal plasma. Biochem J. 1987;241:685-692.