Gene selection and classification for cancer microarray data based on machine learning and similarity measures

dc.contributor.author: Liu, Qingzhong
dc.contributor.author: Sung, Andrew H.
dc.contributor.author: Chen, Zhongxue
dc.contributor.author: Liu, Jianzhong
dc.contributor.author: Chen, Lei
dc.contributor.author: Deng, Youpin
dc.contributor.author: Wang, Zhaohui
dc.contributor.author: Huang, Xudong
dc.contributor.author: Qiao, Mengyu
dc.date.accessioned: 2022-01-25T16:18:14Z
dc.date.available: 2022-01-25T16:18:14Z
dc.date.issued: 2011
dc.description: This article was originally published in BMC Genomics. doi:10.1186/1471-2164-12-S5-S1
dc.description.abstract: Background: Microarray data have a high dimension of variables and a small sample size. In microarray data analyses, two important issues are how to choose genes, which provide reliable and good prediction for disease status, and how to determine the final gene set that is best for classification. Associations among genetic markers mean one can exploit information redundancy to potentially reduce classification cost in terms of time and money. Results: To deal with redundant information and improve classification, we propose a gene selection method, Recursive Feature Addition, which combines supervised learning and statistical similarity measures. To determine the final optimal gene set for prediction and classification, we propose an algorithm, Lagging Prediction Peephole Optimization. By using six benchmark microarray gene expression data sets, we compared Recursive Feature Addition with recently developed gene selection methods: Support Vector Machine Recursive Feature Elimination, Leave-One-Out Calculation Sequential Forward Selection and several others. Conclusions: On average, with the use of popular learning machines including Nearest Mean Scaled Classifier, Support Vector Machine, Naive Bayes Classifier and Random Forest, Recursive Feature Addition outperformed other methods. Our studies also showed that Lagging Prediction Peephole Optimization is superior to random strategy; Recursive Feature Addition with Lagging Prediction Peephole Optimization obtained better testing accuracies than the gene selection method varSelRF.
dc.description.sponsorship: The Institute for Complex Additive Systems Analysis, a division of New Mexico Tech, and from Sam Houston State University
dc.description.subject: testing accuracies
dc.description.subject: gene selection method
dc.identifier.citation: Liu et al.: Gene selection and classification for cancer microarray data based on machine learning and similarity measures. BMC Genomics 2011 12(Suppl 5):S1. doi:10.1186/1471-2164-12-S5-S1
dc.identifier.uri: https://hdl.handle.net/20.500.11875/3258
dc.language.iso: en
dc.publisher: BMC Genomics
dc.subject: Microarray data
dc.subject: information redundancy
dc.subject: Recursive Feature Addition
dc.subject: Lagging Prediction Peephole Optimization algorithm
dc.subject: learning machines
dc.title: Gene selection and classification for cancer microarray data based on machine learning and similarity measures
dc.type: Article
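
The abstract describes Recursive Feature Addition only at a high level: a supervised learner is combined with a statistical similarity measure to add genes one at a time while avoiding redundancy. The sketch below is one possible reading of that idea, not the authors' implementation. Everything specific in it is an assumption made for illustration: the leave-one-out nearest-class-mean scorer, the use of absolute Pearson correlation as the similarity measure, the `top_k` shortlist size, and the function names `nearest_mean_accuracy` and `recursive_feature_addition`. Lagging Prediction Peephole Optimization is not sketched, because the abstract gives no detail about how it works.

```python
import numpy as np

def nearest_mean_accuracy(X, y):
    """Leave-one-out accuracy of a nearest-class-mean classifier.

    Assumption: a simple stand-in for the learning machines named in the abstract.
    """
    n = len(y)
    correct = 0
    for i in range(n):
        mask = np.ones(n, dtype=bool)
        mask[i] = False                      # hold out sample i
        X_train, y_train = X[mask], y[mask]
        classes = np.unique(y_train)
        means = np.array([X_train[y_train == c].mean(axis=0) for c in classes])
        dists = np.linalg.norm(means - X[i], axis=1)
        correct += int(classes[np.argmin(dists)] == y[i])
    return correct / n

def recursive_feature_addition(X, y, n_genes, top_k=5):
    """Greedy forward selection (hypothetical reading of the abstract).

    Each round: score every remaining gene by the accuracy obtained when it is
    added to the selected set, shortlist the top_k scorers, then keep the
    shortlisted gene least correlated with the genes already selected.
    """
    selected = []
    remaining = list(range(X.shape[1]))
    while len(selected) < n_genes and remaining:
        scores = np.array(
            [nearest_mean_accuracy(X[:, selected + [j]], y) for j in remaining]
        )
        order = np.argsort(-scores, kind="stable")[:top_k]
        shortlist = [remaining[i] for i in order]
        if selected:
            def redundancy(j):
                # Largest |Pearson r| between gene j and any selected gene.
                return max(abs(np.corrcoef(X[:, j], X[:, s])[0, 1]) for s in selected)
            best = min(shortlist, key=redundancy)
        else:
            best = shortlist[0]              # first gene: best single-gene score
        selected.append(best)
        remaining.remove(best)
    return selected

# Usage on synthetic data (not from the paper): 40 samples, 8 "genes",
# where gene 0 carries a strong class signal.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 20)
X = rng.normal(size=(40, 8))
X[:, 0] += 10.0 * y
selected = recursive_feature_addition(X, y, n_genes=3)
print(selected)
```

The shortlist-then-decorrelate step is the design choice that reflects the abstract's point about redundancy: among genes that predict about equally well, the one adding the least overlapping information is preferred.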

Files

Original bundle

Name: Gene selection and classification for cancer_OCR.pdf
Size: 1.5 MB
Format: Adobe Portable Document Format
Description: Article

License bundle

Name: license.txt
Size: 1.63 KB
Format: Item-specific license agreed upon to submission