Year: 2018 Volume: 40 Issue: 1 Page Range: 20 - 25 Text Language: English DOI: 10.20515/otd.371882 Index Date: 28-12-2018

Supervised Principal Component Analysis Approach Based on Artificial Neural Networks in Gene Expression Data

Abstract:
The aim of this study is to reduce the dimension of multidimensional gene expression data using supervised principal component analysis (S-PCA) and, proposed as a new approach, supervised principal component analysis with artificial neural networks (S-ANN-PCA), and to compare the performance of the two methods using random survival forests (RSF). In the simulation application, 5000 genes were generated from a multivariate normal distribution, and survival times correlated with these gene data were then generated for 100 units. The simulation step was carried out with 1000 repetitions. In addition, gene expression data for 240 individuals with diffuse large B-cell lymphoma (DLBCL) were used. Dimension reduction was performed using the Wald statistic to select important genes. The new data sets obtained from the two methods were analyzed using RSF. In the simulation application, the explanatory power of S-PCA differed significantly from that of S-ANN-PCA (p<0.001). In the DLBCL data application, RSF gave an error rate of 36.78% for S-PCA and 43% for S-ANN-PCA. The importance value of S-PCA was higher, and its error rate lower, than those of S-ANN-PCA. S-PCA performed better than S-ANN-PCA in analyzing gene expression data affected by the high-dimensionality problem.
Keywords:

Subjects: General and Internal Medicine
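The S-PCA steps named in the abstract (simulate genes and correlated survival times, rank genes by a Wald statistic, then extract principal components from the selected genes) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation. Several details are assumptions that the abstract does not state: an identity covariance for the multivariate normal genes, an exponential baseline hazard with rate 0.1, 5 informative genes with coefficient 1.0, independent exponential censoring, univariate Cox models fitted by Newton-Raphson as the source of the Wald statistic, and the choice of 20 genes and 2 components. All function names are hypothetical. The S-ANN-PCA variant and the RSF comparison are not covered.

```python
import numpy as np


def simulate(n=100, p=5000, n_informative=5, seed=0):
    """Simulate genes and right-censored survival times (assumed design)."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))        # assumption: identity covariance
    beta = np.zeros(p)
    beta[:n_informative] = 1.0             # assumption: first genes carry signal
    # Exponential baseline hazard: T = -log(U) / (rate * exp(x'beta))
    t = -np.log(rng.uniform(size=n)) / (0.1 * np.exp(X @ beta))
    cens = rng.exponential(scale=3 * np.median(t), size=n)  # assumed censoring
    return X, np.minimum(t, cens), t <= cens


def cox_wald(X, time, event, n_iter=15):
    """Wald statistic of a univariate Cox model for every column of X."""
    order = np.argsort(-time)              # descending time: row i's risk set is rows 0..i
    Xs = X[order]
    d = event[order].astype(float)[:, None]
    b = np.zeros(X.shape[1])

    def score_and_info(b):
        w = np.exp(Xs * b)
        s0 = np.cumsum(w, axis=0)                  # risk-set sums of weights
        m1 = np.cumsum(w * Xs, axis=0) / s0        # weighted mean of x in risk set
        m2 = np.cumsum(w * Xs**2, axis=0) / s0     # weighted mean of x^2 in risk set
        score = np.sum(d * (Xs - m1), axis=0)
        info = np.sum(d * (m2 - m1**2), axis=0)
        return score, info

    for _ in range(n_iter):                # Newton-Raphson with a capped step
        score, info = score_and_info(b)
        b = b + np.clip(score / info, -1.0, 1.0)
    _, info = score_and_info(b)
    return b**2 * info                     # (b / se)^2 with se = 1 / sqrt(info)


def supervised_pca(X, wald, n_genes=20, n_components=2):
    """Keep the genes with the largest Wald statistics, then run PCA on them."""
    keep = np.argsort(wald)[::-1][:n_genes]
    Z = X[:, keep] - X[:, keep].mean(axis=0)       # center the selected genes
    U, s, _ = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    explained = s[:n_components] ** 2 / np.sum(s**2)
    return keep, scores, explained


X, time, event = simulate()
wald = cox_wald(X, time, event)
keep, scores, explained = supervised_pca(X, wald)
print("selected genes:", np.sort(keep))
print("explained variance ratio:", np.round(explained, 3))
```

The component scores in `scores` would then serve as the reduced data set passed to a survival model. In the study that model is RSF, which this sketch leaves out.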

Gen Ekspresyon Verilerinde Yapay Sinir Ağlarına Dayalı Denetimli Temel Bileşenler Analizi Yaklaşımı

Abstract (translated from Turkish):
This study aimed to reduce the dimension of multidimensional gene expression data using supervised principal component analysis (S-PCA; Turkish abbreviation D-TBA) and supervised principal component analysis with artificial neural networks (S-ANN-PCA; Turkish abbreviation D-YSA-TBA), proposed as a new approach, and to compare their performance using random survival forests (RSF) analysis. In the simulation application, 5000 genes and survival time data associated with these gene data were generated from a multivariate normal distribution for 100 units. The simulation step was carried out with 1000 repetitions. In addition, gene expression data for 240 individuals with diffuse large B-cell lymphoma (DLBCL) were used. Dimension reduction was performed using the Wald statistic to select important genes. The new data sets obtained from the methods were analyzed using RSF analysis. In the simulation application, a significant difference was observed between the explanatory power of the S-PCA and S-ANN-PCA methods (p<0.001). In the application with the DLBCL data, the RSF error was found to be 36.78% for the S-PCA method and 43% for the S-ANN-PCA method. The importance value of the S-PCA method was higher, and its error lower, than those of the other method. In the analysis of gene expression data affected by the high-dimensionality problem, S-PCA performed better than S-ANN-PCA.
Keywords:

Subjects: General and Internal Medicine
Document Type: Article Article Type: Research Article Access Type: Open Access
APA TÜRE M, KURT ÖMÜRLÜ İ (2018). Supervised Principal Component Analysis Approach Based on Artificial Neural Networks in Gene Expression Data. Osmangazi Tıp Dergisi, 40(1), 20 - 25. 10.20515/otd.371882
Chicago TÜRE Mevlüt, KURT ÖMÜRLÜ İMRAN Supervised Principal Component Analysis Approach Based on Artificial Neural Networks in Gene Expression Data. Osmangazi Tıp Dergisi 40, no.1 (2018): 20 - 25. 10.20515/otd.371882
MLA TÜRE Mevlüt, KURT ÖMÜRLÜ İMRAN Supervised Principal Component Analysis Approach Based on Artificial Neural Networks in Gene Expression Data. Osmangazi Tıp Dergisi, vol.40, no.1, 2018, pp.20 - 25. 10.20515/otd.371882
AMA TÜRE M, KURT ÖMÜRLÜ İ Supervised Principal Component Analysis Approach Based on Artificial Neural Networks in Gene Expression Data. Osmangazi Tıp Dergisi. 2018; 40(1): 20 - 25. 10.20515/otd.371882
Vancouver TÜRE M, KURT ÖMÜRLÜ İ Supervised Principal Component Analysis Approach Based on Artificial Neural Networks in Gene Expression Data. Osmangazi Tıp Dergisi. 2018; 40(1): 20 - 25. 10.20515/otd.371882
IEEE TÜRE M, KURT ÖMÜRLÜ İ "Supervised Principal Component Analysis Approach Based on Artificial Neural Networks in Gene Expression Data." Osmangazi Tıp Dergisi, 40, pp.20 - 25, 2018. 10.20515/otd.371882
ISNAD TÜRE, Mevlüt - KURT ÖMÜRLÜ, İMRAN. "Supervised Principal Component Analysis Approach Based on Artificial Neural Networks in Gene Expression Data". Osmangazi Tıp Dergisi 40/1 (2018), 20-25. https://doi.org/10.20515/otd.371882