Brief Reviews
From the Department of Pathology (A.P.J.J.B., E.L., M.J.A.P.D.), Cardiovascular Research Institute Maastricht (CARIM), University of Maastricht; the Department of Population Genetics (T.A.), Cardiovascular Research Institute Maastricht (CARIM), University of Maastricht; the Division of Biopharmaceutics (J.K.), Leiden/Amsterdam Center for Drug Research, Gorlaeus Laboratories, Leiden; and the Department of Medical Biochemistry (A.J.H.), Academic Medical Center, University of Amsterdam, The Netherlands.
Correspondence to Ann-Pascale Bijnens, Department of Pathology, P. Debeyelaan 25, 6229 HX Maastricht, The Netherlands. E-mail ann-pascal.bijnens{at}path.unimaas.nl
| Abstract |
|---|
This review discusses some critical issues in the methodology, analysis, and interpretation of gene expression studies using vascular specimens from animals and humans. Our analysis demonstrates that future studies may benefit from recent developments in statistical and bioinformatical analysis methods to exploit the full potential of transcriptomics data.
Key Words: atherosclerosis; gene expression; genetically altered mice; pathology; vascular biology
| Introduction |
|---|
Gene expression experiments produce complex and large data sets, and many investigators are not experienced in the analytical steps needed to convert tens of thousands of data points into reliable and interpretable biologic information. It is important to realize that several prerequisites need to be fulfilled to select the appropriate genes to meet the objectives described above. The experimental design needs to fulfill some minimal criteria to obtain meaningful results,3 such as a clear description of analyzed samples, and sample sizes representative of the population and its intrinsic variance. Furthermore, the analysis strategy for a genome-wide experiment should be determined in light of the overall objective of the study.1 For instance, cluster analysis (ie, a method for partitioning samples into groups on the basis of the similarities and differences among their gene expression profiles) can help in generating hypotheses but does not provide statistically valid quantitative information about the degree of differential expression between classes.1,2
In this review, we give an overview and critical analysis of the design, analytical approaches, and outcome of published data sets of atherosclerosis. Focusing on murine and human atherosclerosis, we discuss the possibility of integrating information obtained from different expression studies. We illustrate shortcomings in the available data by comparing published expression data of 7 CC chemokine family members in murine and human atherosclerosis. Our analysis reveals the methodological limitations in the published studies and highlights challenges for future genomic profiling.
| Approaches for Gene Expression Profiling of Atherosclerosis Using Vascular Samples |
|---|
Most studies have been performed on small numbers of biological replicates, resulting in low statistical power for detecting differentially expressed genes. Although the value of studies with small sample sizes should not be underestimated, recent studies8 clearly demonstrate that increasing the sample size increases the statistical power and decreases the error rate. If the number of samples is not a true representation of the population and its intrinsic variance, the distribution of parameters will be biased toward those specific for the type of samples collected. For the same statistical power, fewer individuals are needed when using inbred animals than when using outbred human subjects. One approach to using fewer human samples while maintaining the same statistical power is to select samples with lower variability in the gene expression profile (eg, samples from the same site in the vascular bed, or from donors of the same gender, similar age, or similar clinical record). Finding consistent data within an outbred population can be among the most convincing kinds of evidence. Stronger arguments can be made against the validity of studies performed in inbred animals, which are theoretically equivalent to only one individual within an outbred population. This is exemplified by the variations in expression levels of inbred strains with different genetic backgrounds.9
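The relation between sample size, variability, and statistical power described above can be made concrete with a rough calculation. The sketch below is not taken from any of the reviewed studies. It assumes a two-group comparison of a single gene with equal group sizes, normally distributed (log) expression values with a common standard deviation, and a normal approximation to the t test; the function names and the numbers in the usage lines are hypothetical.

```python
from statistics import NormalDist


def approx_power(effect, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-group comparison (z approximation)."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Standard error of the difference between two group means.
    se = sd * (2 / n_per_group) ** 0.5
    # The contribution of the opposite tail is negligible and is ignored here.
    return z.cdf(effect / se - z_crit)


def samples_needed(effect, sd, target_power=0.8, alpha=0.05, max_n=100000):
    """Smallest group size that reaches the target power under the same assumptions."""
    for n in range(2, max_n + 1):
        if approx_power(effect, sd, n, alpha) >= target_power:
            return n
    raise ValueError("target power not reached within max_n samples per group")


if __name__ == "__main__":
    # Hypothetical effect of 1.0 log2 unit; sd values chosen only for illustration.
    for sd in (1.0, 0.5):
        print("sd =", sd, "-> samples per group:", samples_needed(1.0, sd))
```

Under these assumptions, halving the standard deviation reduces the required group size roughly fourfold, which is the rationale for selecting homogeneous samples when few human specimens are available.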
In the majority of gene expression profiling studies, the analysis has been performed on entire vessel segments10–25 to get information about all the molecular processes involved in the development of atherosclerosis. A limitation of this approach is the lack of insight into the underlying reason for observed differences in gene expression levels. These differences may reflect the changed composition of the vessel wall during atherosclerosis (eg, thinning of the medial smooth muscle cell layer), result from the presence of different cell types (eg, infiltration of T-lymphocytes), or reflect a change in the gene expression profile of cells or a subpopulation of cells caused by a pro-atherosclerotic environment (eg, differentiation of macrophages to foam cells).
To circumvent the problem of analyzing complex tissues made up of multiple cell types, 3 different approaches have been used that correct for differences in plaque composition by the isolation of relatively pure cell populations. In 3 studies,7,26,27 macrodissection was used to separate the smooth muscle cell (SMC)-rich fibrous cap,26,27 media,7,26 and nonatherosclerotic intima.26 To this end, the adventitia was trimmed, the endothelium was scraped off, and the fibrous cap was dissected from the necrotic core and shoulder regions. In 2 studies,10,28 laser capture microdissection was used to dissect SMCs or macrophages from whole mount specimens for subsequent cell-specific RNA isolation. A drawback of both macrodissection and laser capture microdissection is the low tissue and RNA yield. This necessitates pooling of samples and/or (several rounds of) amplification. Pooling of samples limits the possibilities for downstream statistical analysis (see "Key Features of Gene Expression Studies of Atherosclerosis"). Pros and cons of amplification have been discussed in several recent articles and include issues such as reproducibility29 and effects on the magnitude of differential expression30 caused by amplification. An alternative approach to obtain relatively pure cell populations and to remove cell products is to culture cells after isolation from entire vessel wall samples. An advantage of this approach is that cell type-specific transcripts are amplified in culture, obviating the need for pooling or amplification procedures. However, a disadvantage is that the in vitro culture may cause a shift in the transcriptome. As a result, the expression profiles of cultured cells may not be entirely representative of the in vivo situation. In the current review, we do not aim to provide an overview of articles using in vitro cultures to study atherosclerosis.
| Platforms Applied for Expression Profiling of Atherosclerosis |
|---|
| Key Features of Gene Expression Studies of Atherosclerosis |
|---|
Design
To assess the design, we scored the published studies based on the following parameters: (1) morphological data of the vessel samples; (2) general donor data (gender and age); (3) clinical data (eg, diabetes, blood pressure, medication); (4) comparison of samples with similar cellular composition; and (5) profiling of individual plaques. This analysis shows that gene expression profiling studies over the past 6 years have evolved from experiments assessing differences in expression levels of several hundreds of genes between pools of poorly characterized samples, to profiling of thousands of genes in well-characterized individual samples from donors with detailed information about age, gender, and, to a lesser extent, clinical data. Human plaques have been classified in different ways: based on the American Heart Association guidelines,13,18,19,21,23,25,28 based on pulse wave velocity measurements,20 or after macroscopic inspection.16,22 However, a detailed microscopic characterization of human samples has only been given in 2 studies.18,28 Consequently, this lack of transparency in the morphology of analyzed samples hinders extrapolation and comparison of gene expression results. In only a minority of studies,10,25–28 the expression profiles of samples with a uniform cell content have been compared. However, one of these studies compares the transcriptome of SMC-enriched and macrophage-enriched cell populations28 and does not exploit the advantage of isolating relatively pure cell populations. Last, but not least, we observed an evolution from analyzing pools of samples toward individual samples, which has a positive effect on the possibilities for downstream statistical analysis.
Statistical and Bioinformatical Analysis
In evaluating analytical procedures, we have taken into account various statistical and bioinformatical issues, as well as the need for validation of array results. To identify genes that are differentially expressed in the studied conditions, the degree of differential expression was evaluated in different ways (Table 1). Most studies ranked the magnitude of differential expression based on fold-change in expression level.10–12,14–16,18,19,27,28 Although this method is simple and intuitive, fold-change does not address the reproducibility of the observed difference and cannot be used to determine the statistical significance (for a review, see Draghici et al36). To address this, comparison statistics (eg, t test,19,21,13,15 ANOVA,17 Wilcoxon ANOVA19) need to be used to assign a confidence level to the differential expression. These statistics require replicates and use the variability within the replicates to assign a probability value that indicates the probability of incorrectly classifying a gene as differentially regulated. It needs to be considered that expression levels of thousands of genes are analyzed simultaneously in a microarray. This implies that a subset of genes will always be classified as "significantly different" simply by chance. To reduce the chance of false-positives, the probability value needs to be adjusted for multiple testing error (for a review, see Draghici et al36). At present, however, there is no consensus about the best method to correct for multiple testing in microarray experiments. Five recent studies have used permutation-based methods for this purpose: significance analysis of microarrays37 (SAM),13,21 custom-made analysis algorithms including permutations to study multiple sets of time-course microarray data,9,13 or comparison with randomized data sets to quantify the false-positive frequency in the data sets.26
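The difference between ranking by fold-change and testing with replicates, and the effect of a multiple testing adjustment, can be illustrated with a small sketch. It is not the SAM algorithm or any method used in the reviewed studies: the permutation test is a generic exact test on the difference in group means, and the Benjamini-Hochberg adjustment is shown only as one widely used correction, chosen here as an assumption because the text notes that no consensus method exists. All expression values and gene names are invented for illustration.

```python
from itertools import combinations
from statistics import mean


def fold_change(control, treated):
    """Ratio of group means; carries no information about replicate variability."""
    return mean(treated) / mean(control)


def exact_permutation_p(control, treated):
    """Two-sided exact permutation p-value for the difference in group means."""
    pooled = list(control) + list(treated)
    n_c, n_t = len(control), len(treated)
    total = sum(pooled)
    observed = abs(mean(treated) - mean(control))
    hits = count = 0
    # Enumerate every way of relabelling n_c of the pooled values as "control".
    for idx in combinations(range(len(pooled)), n_c):
        sum_c = sum(pooled[i] for i in idx)
        diff = abs((total - sum_c) / n_t - sum_c / n_c)
        count += 1
        if diff >= observed - 1e-12:  # small tolerance for floating-point ties
            hits += 1
    return hits / count


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping the adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Two hypothetical genes with the same 2-fold change but different variability.
    gene_x = ([1.0, 1.1, 0.9, 1.0], [2.0, 2.1, 1.9, 2.0])
    gene_y = ([0.2, 1.8, 0.1, 1.9], [0.5, 3.5, 0.4, 3.6])
    for name, (control, treated) in (("gene_x", gene_x), ("gene_y", gene_y)):
        print(name, fold_change(control, treated), exact_permutation_p(control, treated))
    print(benjamini_hochberg([0.005, 0.01, 0.03, 0.04]))
```

Both hypothetical genes show the same fold-change, but only the gene with tight replicates yields a small permutation p-value. With 4 replicates per group the smallest attainable two-sided p-value is 2/70, which also shows why very small sample sizes limit what any correction for multiple testing can declare significant.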
The statistical techniques discussed above aim to identify genes that are differentially expressed on a gene-by-gene basis. An alternative method of analyzing microarray data sets is to exploit the correlation in expression patterns between genes that perform similar functions or belong to the same biological pathway. To this end, various multivariate analysis methods have been developed to identify patterns of gene expression in microarray data.38 In atherosclerosis research, hierarchical clustering,17 K-means clustering,22 self-organizing maps,11 principal component analysis,13 or custom-made methods12 have been used. This "unsupervised" clustering38 is performed without previous knowledge of groups and may lead to the identification of genes that share previously unknown common expression patterns (identification of novel subsets of genes) or the discovery of new subtypes of disease (identification of subsets in experimental groups). This type of analysis has not yet led to follow-up studies to confirm the proposed coregulation of genes or novel disease subsets in atherosclerosis. Clustering procedures have also been performed on differentially expressed genes (eg, on genes with P<0.05 using Student t test19) instead of on complete expression data sets. When clustering is used on a selected data set, the added value of cluster analysis is limited because it will yield the same results as in the previous selection procedure and will not provide additional information about differences in expression patterns.19
Apart from these studies, 2 studies13,22 classified vessel samples according to location22 or disease state13,22 based on the expression levels of classifier genes. Classifier genes are a fixed subset of "informative genes" chosen based on the correlation of their expression level with a class distinction and used to make a prediction about a new sample on the basis of the expression level of these genes. The statistics associated with sample classification differ considerably from the statistics associated with the comparison of gene expression levels. A critical overview2 and a typical, well-documented example from the cancer field1 are added as references.
In atherosclerosis, classifier genes have been identified after analysis of a large series of samples (7 to 32 individual samples of the same artery per experimental group; Table 3). The validation has been performed using out-of-sample cross validation22 or independent (murine and human) test sets.13 In contrast to comparable studies in cancer research (eg, predicting the survival from breast or prostate cancer),39,40 these studies have no direct clinical application. However, they indicate the existence of expression profiles in different stages or sites of atherosclerotic plaque development, which are conserved despite inter-individual22 or even inter-species13 variations.
To unveil the biological relevance of gene expression profiles, biological information from differentially expressed genes needs to be obtained. To this end, literature mining (eg, using NCBI, Gene Ontology [GO], Kyoto Encyclopedia of Genes and Genomes [KEGG], or Biocarta databases) can be performed. Several atherosclerosis studies with a limited number of differentially expressed genes10,11,14–17,23,25,27,31 have been analyzed at the individual gene level (eventually followed by grouping of functionally related genes19). To overcome the enormous efforts that are needed to perform literature mining on a gene-by-gene basis, especially across a data set of several thousands of genes, tools have been developed to link genomics data to literature data in an automated way (for a review see Curtis et al41). As demonstrated in Table 1, GO-based analyses are widespread. GO is a manually curated database using a standardized vocabulary of terms describing biological processes, cellular components, and molecular functions of genes. In addition, it provides a hierarchical structure for organizing genes into biologically relevant groups. In 7 studies,12,13,18,20–22,28 differentially expressed genes were categorized based on GO terms. However, to identify genes that warrant further study, GO-based enrichment procedures are more appropriate to identify GO terms that are overrepresented in a set of differentially expressed genes (eg, in King et al21). In these enrichment procedures several statistical methods41 can be used to calculate whether more genes from a particular pathway or classification are differentially expressed than would be expected by chance (expressed as probability value, z-score, or odds ratio41).
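As an illustration of what such an enrichment calculation involves, the sketch below computes a one-sided hypergeometric probability, which is one common choice among the statistical methods mentioned above. It is not the procedure of any particular study or tool. The function name and the gene counts in the usage lines are hypothetical, and the calculation assumes that every gene on the array is equally likely to be selected under the null hypothesis.

```python
from math import comb


def enrichment_p(n_array, n_term, n_selected, n_overlap):
    """P(overlap >= n_overlap) under the hypergeometric null.

    n_array    : annotated genes represented on the array
    n_term     : of those, genes annotated with the GO term of interest
    n_selected : differentially expressed genes
    n_overlap  : differentially expressed genes that carry the GO term
    """
    upper = min(n_term, n_selected)
    total = comb(n_array, n_selected)
    tail = sum(
        comb(n_term, i) * comb(n_array - n_term, n_selected - i)
        for i in range(n_overlap, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical numbers: 20000 annotated genes, 200 carrying the term,
    # 500 differentially expressed, 15 of which carry the term.
    expected = 500 * 200 / 20000
    print("expected overlap by chance:", expected)
    print("enrichment p-value:", enrichment_p(20000, 200, 500, 15))
```

Because one such probability is computed for every GO term tested, these values are themselves subject to the multiple testing problem and would normally be adjusted accordingly.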
Important drawbacks of GO-based procedures are: (1) results are restricted to the genes for which information is available in the databases (sometimes only half of the genes that are represented in a microarray); and (2) abundantly studied genes may be associated with a variety of GO terms, including terms unlikely to be relevant to the studied process. As an example, we compared the GO terms of human, mouse, and rat CCL2 using the "AmiGO" tool (http://www.godatabase.org/). The data in Table I (please see http://atvb.ahajournals.org) exemplify the bias that can be introduced when categorizing genes based on GO terms. Whereas the GO database indicates that human CCL2 is involved in 11 biological functions (including "humoral immune response"), the function of mouse CCL2 is not described at all and the function of rat CCL2 is limited to "inflammatory response." Even more striking is the discrepancy in cellular components of CCL2 according to the GO database (extracellular space for human and murine CCL2, but cytoplasmic for rat CCL2) and the low degree of similarity in molecular function of CCL2 among the various species. This comparison illustrates that GO-based analysis procedures need to be carried out and interpreted with caution.
Moreover, it should be obvious that GO analysis can only be meaningful when performed on a statistically valid data set. For instance, the results of one particular study describing mouse strain-specific differences in vascular wall gene expression9 need to be interpreted with caution because GO analysis was performed on genes that were determined to be differentially expressed using a false discovery rate of nearly 50%.
As statistical and bioinformatical analysis techniques improve, detection of smaller changes in expression is becoming feasible. However, it is still unclear how statistical and bioinformatical significance relates to biological relevance. Subtle but consistent changes in expression of a group of genes with a related function may result in significant changes at the pathway level. However, experimental validation is especially important for small changes in expression levels before concluding that these changes contribute to biologically relevant differences. Taking this into account, validation of "in silico" pathways consisting of genes with slight differences in expression level is required for a valid interpretation of the biological significance of biostatistical results. In this respect, one should be aware that differences in RNA transcript levels do not necessarily reflect differences at the protein or functional level.
Moreover, one should be aware that clustering methods and literature network analysis may be biased because they are based on coregulation and copublication, respectively, and that a biological interaction can only be demonstrated using "wet laboratory" experiments (eg, by coimmunoprecipitation to prove physical interactions, or by reporter assays for transcriptional activity). However, as summarized in Table 1, recent articles that apply state-of-the-art statistical analysis to identify subtle differences in expression level do not validate these results on the same or independent samples. Nor do they validate results at the RNA level using independent methods, or at the protein or functional level. Thus, they have failed to show the biological relevance of the observed differences at the RNA level.
Availability of Data Sets
Finally, our analysis reveals that data from only 3 expression studies are available in a public repository for microarray data (GEO data sets GSE1560 [ref 9], GSE420 [ref 20], and GSE2143 [ref 21]). Even more striking, GSE1560 contains the expression data of only two-thirds of the experimental groups (ie, for all C3H/HeJ and C57BL/6 groups, but not of the apolipoprotein E/tm1Unc groups). Moreover, GSE2143 does not include a clear description of the samples, which makes the expression data impossible to interpret. As is illustrated in the section entitled "Integration of Genome-Wide Expression Data Sets," this lack of public data compromises our ability to gain insight into the molecular basis of atherosclerosis.
Together, based on our evaluation of gene expression studies in atherosclerosis, we consider 5 critical issues important in determining the quality of gene expression experiments using whole vessel samples: (1) transparent and detailed description of analyzed samples; (2) use of individual samples representative of the population and its intrinsic variance; (3) comparison statistics correcting for multiple testing error; (4) experimental validation of the biological significance of differentially expressed genes; and (5) public availability of data sets.
| Gene Expression Profiling of Murine Atherosclerosis |
|---|
A limiting factor for gene expression studies in mice is the small amount of RNA that can be extracted from murine vessels. Therefore, 3 to 30 samples per experimental condition are pooled. RNA is extracted from entire aortas or aortic arches (thus including diseased and nondiseased areas) rather than from individual plaques, or T7 polymerase-based linear amplification protocols are used before microarray analysis (Table 2).
Array studies analyzed the gene expression profiles of arterial wall samples of mice of different ages or those fed various diets,9,11–13 after cytomegalovirus infection,14 or after maternal hypercholesterolemia15 (Table 2). Remarkably, the primary objective of genome-wide studies in mice has been to identify individual differentially expressed genes whose functions are well-described, as opposed to unraveling novel genes/pathways. Genes that were repeatedly reported to be differentially expressed were chemokines,11–15 chemokine receptors,11,13,15 and cathepsins.11,15 Based on information obtained from GO and other databases, differentially expressed genes represented biological themes such as inflammation,12,13 matrix degradation,12,13 and ossification13 (which is in agreement with our current knowledge that these processes have an important role in the development of this disease), but also less expected pathways such as carbohydrate metabolism.13 Only one study used cDNA representational difference analysis (RDA)24 to analyze the expression in the ascending aorta and arch in apoE−/− and LDLR−/− mice. This yielded mostly ESTs and uncharacterized genes, which suggests a major contribution of genes that were novel at the time when the arrays were performed.
| Gene Expression Profiling of Human Atherosclerosis |
|---|
Four studies used "selected arrays" to evaluate vascular expression levels of a few hundred genes,10,16,17,27 selected, for example, because of their involvement in apoptosis.10,16,27 These arrays do not provide a genome-wide view but give the expression profile of the selected subset of genes. Therefore, they are suited to identifying individual genes differentially expressed in atherosclerosis (eg, death-associated protein kinase,10 Egr-1 [early growth response gene]10,16,27 and Egr-1-inducible genes10,16,27), but not to providing an unbiased expression profile.
Six articles18–22,28 used genome-wide arrays to screen the expression of genes in human atherosclerosis (Table 3). Because these studies assay tens of thousands of genes simultaneously, they provide an unbiased profile of expressed genes, which may lead to insights into previously unknown molecular interactions. However, most of these studies aimed to identify the expression level of individual genes, and not pathways.18–20 In these studies, GO analysis suggested that the differentially expressed genes were involved in inflammation,19,21,22 cell turnover,19,21,22 matrix degradation,18,19 lipid metabolism,18,19 coding for matrix proteins,20 or originated from SMC proliferation and dedifferentiation.21 Thus, to date, gene expression studies have merely validated the differential expression of genes and pathways known to be involved in atherosclerosis,42–45 and have yet to fully exploit the power and possibilities of identifying novel players (and eventually novel pathways) underlying atherosclerosis.
To compare gene expression patterns from different parts of the atherosclerotic vessel wall, laser capture microdissection10,28 and macrodissection26,27 have been used to isolate relatively pure cell populations from whole mount specimens (Table 3). As mentioned, these techniques help to overcome one of the shortcomings of the whole mount approach because they correct for the cellular composition of a sample. Using a "selected array" of genes involved in apoptosis, Martinet et al10 compared the expression of apoptosis-related genes between microdissected medial SMCs from atherosclerotic plaques and nondiseased mammary arteries, which revealed no differences between these SMC subpopulations. Yet the expression profile of genes involved in apoptosis in medial SMCs was shown to differ from the profile of the SMCs in the fibrous cap.10,16,27 A cluster of stress-responding genes was shown to be upregulated in the fibrous cap. Adams et al26 showed more consistent differences in gene expression between the intimal and medial SMCs than between the cap and the media, because the adjacent nonatherosclerotic intima appears much more media-like than does the cap.
Expression profiling of vascular intimal SMC-rich and macrophage-rich shoulder regions28 revealed differences in the expression level of genes involved in a variety of biological processes, such as cell signaling, structure, and metabolism. Several, but not all, of these differences were also present in PMA-stimulated versus nonstimulated THP-1 macrophages in vitro. Most remarkable was the 16-fold increased expression level of 3-hydroxy-3-methylglutaryl (HMG) coenzyme A (CoA) reductase in macrophage-rich areas compared with vascular intimal SMCs. This suggests that the plaque stabilizing effect of statins may result from the direct effect of these drugs on HMG CoA reductase expression in plaque macrophages compared with intimal SMCs.
Apart from arrays, other techniques have been used to analyze human atherosclerosis. One study used RDA to examine differential gene expression between normal and atherosclerotic human vessels.25,31 In these studies, a high proportion (33% to 55%) of differentially expressed sequences represented genes that have not yet been annotated or functionally characterized. The article by Faber et al23 is the only one to our knowledge that reports the gene expression profile of atherosclerotic plaques with a thrombus (Table 3). In this study, suppression subtractive hybridization (SSH) was used to make an inventory of genes that are differentially expressed in whole-mount stable versus thrombus-containing plaques. Several genes that had not previously been linked to atherosclerosis (including perilipin23 and a previously unidentified gene named vasculin46) were identified as being differentially expressed.
In conclusion, over the past 6 years gene expression profiling of human atherosclerosis has led to the identification of individual genes involved in atherosclerosis and has, to a lesser extent, allowed the relative significance of pathways involved in atherosclerosis to be evaluated. Pathways associated with the differentially expressed genes were inflammation, (smooth muscle) cell proliferation, cell death, matrix formation and degradation, and lipid metabolism. Although genome-wide analysis offers the opportunity to obtain an unbiased picture of the genes expressed during atherosclerosis, the majority of studies have focused on known genes and have not exploited the full potential of genome-wide screening to identify novel players in atherosclerosis. The high proportion of uncharacterized ESTs at the time when the microarray, SSH, and RDA studies were performed suggests that the molecular basis of atherosclerosis is far more complicated than our current knowledge of genes and pathways. Therefore, complete insight into the molecular mechanisms of atherosclerosis may only be reached if gene expression data are combined with research that focuses on the characterization of currently unknown genes.
| Integration of Genome-Wide Expression Data Sets |
|---|
To evaluate the possibility of integrating published information from various genome-wide expression studies, we analyzed the expression data of 7 chemokines with a cysteine-cysteine (CC) motif (CCLs), which were recently demonstrated by our12 and other47–49 groups to be differentially expressed during murine atherosclerosis. CCLs are members of a family of small secreted proteins (8 to 16 kDa) that mediate migration and activation of monocytes and T-lymphocytes into the tissue. An important role for CC chemokines in the pathogenesis of atherosclerosis has been demonstrated in murine atherosclerosis models, for example, using anti-CCL2 (anti-MCP-1) therapy,47,48 antagonists of cysteine-cysteine chemokine receptor 5 (CCR5, RANTES receptor),49 or mice lacking functional CCR2.50,51
A major bottleneck in analyzing these studies was the lack of publicly available data sets. Expression data from only 3 of 20 gene expression studies9,20,21 are deposited in a public repository for microarray data (GEO data sets GSE1560, GSE420, and GSE2143). As mentioned in the section entitled "Key Features of Gene Expression Studies of Atherosclerosis," GSE1560 is incomplete and GSE2143 does not include a clear description of the samples. Other studies include incomplete lists of differentially expressed genes, with or without quantitative details. Altogether, this makes an adequate comparison impossible.
Comparison of our own data set (7 CCLs)12 with 6 other murine atherosclerosis studies11,15,24 yielded little information about the differential expression of CCLs (Table I). CCL2 levels were higher after murine cytomegalovirus infection of the aorta of apoE−/− mice, coinciding with an increase in atherosclerotic plaque area,14 and were higher in apoE−/− mice fed a high-fat diet compared with those on a chow diet, as well as compared with C57Bl/6 and C3H/HeJ mice.9,13 CCL4 expression was higher in the atherosclerotic aorta of apoE−/− mice than in the normal aorta of C57Bl/6 mice.11 Comparison of gene expression profiles of nonatherosclerotic aortas of C57Bl/6 mice with those after 40 weeks of a high-fat diet (lipid deposits)9 revealed that CCL2, 3, 4, and 8 are only marginally expressed in C57Bl/6 aortas even after high-fat feeding, and that CCL12 was significantly downregulated after lipid accumulation. Although it is difficult to draw a conclusion with such a limited data set, our analysis suggests that the available gene expression data on murine atherosclerosis are in line with each other.
Only 3 of 13 studies on gene expression profiles of human vascular specimens reported expression levels of selected CCLs (Table I). In total, 9 hits were found, which corresponds to approximately 10% of the query of 7 genes in 13 data sets (91 potential hits). In contrast to the murine data set that indicates an upregulation of CCL2 throughout plaque progression, CCL2 was downregulated in coronary arteries from patients with unstable angina versus stable angina17 and was not differentially expressed according to the degree of aortic stiffness.20 However, this comparison is hampered by the fact that plaque instability and aortic stiffness in humans do not necessarily reflect the same molecular processes as plaque progression in mice. CCL3 was mentioned in the list of genes associated with disease burden,22 but neither quantitative nor qualitative details were given. CCL4 and CCL5 were expressed in 2 of 4 stiff and 1 of 4 distensible aortic biopsies,20 suggesting higher CCL4 and CCL5 expression levels in more advanced stages of atherosclerosis. If so, the expression pattern of CCL5 in humans may differ from its expression in mice. Unfortunately, the CCL5 data reported by Seo et al22 cannot be used to give a conclusive answer, because the latter study does not specify whether CCL5 is upregulated or downregulated according to disease burden. CCL8 (MCP-2) levels were lower in patients with unstable compared with stable angina17 and were higher in stiff compared with distensible aortas.20 For CCL12 and CCL21, we did not find any evidence for differential expression in human atherosclerosis. Therefore, our analysis indicates that the available information is not sufficient to draw a definitive conclusion as to whether data from mouse studies can be translated to the human situation.
Our analysis revealed that the incompleteness of published data sets is the major hindrance for the integration of information. Unexpectedly, our comparison showed that in a few studies differences in gene expression levels were insignificant for genes that are already considered to play an important role in atherosclerosis in terms of function or protein expression (eg, CCL2 and CCL5; Table II, available online at http://atvb.ahajournals.org). A possible explanation is that RNA levels are a poor representation of the protein levels and functional activity of these chemokines. Additional bottlenecks are the different ways in which the degree of atherosclerosis can be classified, and the use of different models, platforms, statistical approaches, and reference tissues. As indicated in Tables 2 and 3, there is no consensus about a reference tissue in atherosclerosis, and the reference varies with the research question addressed. One way to deal with this is to use a general common reference.52 Initiatives have also been taken to investigate the differences in array data gathered on different platforms53 and to agree on a reference standard to facilitate the comparison of data sets.52 Apart from this, it is important to note that the dissimilarities identified are the results of comparisons made at the level of individual genes. Because subtle changes in single gene expression levels may act in concert to result in significant changes at the pathway level, it may be worthwhile to move beyond the differential expression of individual genes to the identification of important biological processes underlying atherosclerosis. Recent insights54,55 have demonstrated that single gene-based analyses (all atherosclerosis studies so far) may miss important changes at the pathway level. As an example, pathway analysis has been shown to reveal differences in gene expression at the pathway level in type 2 diabetes, whereas it was not possible to detect significant differences at the level of individual genes.56
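The argument that subtle changes in single genes can add up to a significant change at the pathway level can be illustrated numerically. The sketch below combines per-gene z-scores with Stouffer's method. This is a deliberately simple stand-in and not the pathway analysis used in the cited studies, and it assumes that the genes in the set are independent, which coregulated genes generally are not. The z-scores and the size of the gene set are hypothetical.

```python
from statistics import NormalDist


def two_sided_p(z):
    """Two-sided p-value of a standard normal z-score."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


def stouffer_z(z_scores):
    """Combined z-score of a gene set, assuming independent per-gene z-scores."""
    return sum(z_scores) / len(z_scores) ** 0.5


if __name__ == "__main__":
    # Hypothetical pathway of 20 genes, each shifted slightly in the same direction.
    gene_z = [0.8] * 20
    print("single-gene p-value:", two_sided_p(gene_z[0]))
    print("pathway-level z:", stouffer_z(gene_z))
    print("pathway-level p-value:", two_sided_p(stouffer_z(gene_z)))
```

In this toy example no single gene would pass a conventional significance threshold, whereas the set as a whole does; correlation between genes would weaken the combined evidence, so such a result still requires the experimental validation discussed earlier.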
| Conclusions |
|---|
| Acknowledgments |
|---|
Received July 4, 2005; accepted February 23, 2006.
| References |
|---|