A Distribution-free Convolution Model for Background Correction of Oligonucleotide Microarray Data

dc.contributor.author: Chen, Zhongxue
dc.contributor.author: McGee, Monnie
dc.contributor.author: Liu, Qingzhong
dc.contributor.author: Kong, Megan
dc.contributor.author: Deng, Youpin
dc.contributor.author: Scheuermann, Richard H
dc.date.accessioned: 2022-03-25T18:32:38Z
dc.date.available: 2022-03-25T18:32:38Z
dc.date.issued: 2009-07-07
dc.description: This article was originally published in BMC Genomics in 2009.
dc.description.abstract: Introduction: Affymetrix GeneChip® high-density oligonucleotide arrays are widely used in biological and medical research because of production reproducibility, which facilitates the comparison of results between experiment runs. To obtain high-level classification and cluster analyses that can be trusted, it is important to perform various pre-processing steps on the probe-level data to control for variability in sample processing and array hybridization. Many proposed pre-processing methods are parametric, in that they assume that the background noise generated by microarray data is a random sample from a statistical distribution, typically a normal distribution. The quality of the final results depends on the validity of such assumptions. Results: We propose a Distribution Free Convolution Model (DFCM) to circumvent observed deficiencies in meeting and validating the distribution assumptions of parametric methods. Knowledge of array structure and of the biological function of the probes indicates that the intensities of mismatch (MM) probes that correspond to the smallest perfect match (PM) intensities can be used to estimate the background noise. Specifically, we obtain the smallest q2 percent of the MM intensities that are associated with the lowest q1 percent of the PM intensities, and use these intensities to estimate background. Conclusion: Using the Affymetrix Latin Square spike-in experiments, we show that the background noise generated by microarray experiments typically is not well modeled by a single overall normal distribution. We further show that the signal is not exponentially distributed, as is also commonly assumed. Therefore, DFCM has better sensitivity and specificity, as measured by ROC curves and area under the curve (AUC), than MAS 5.0, RMA, RMA with no background correction (RMA-noBG), GCRMA, PLIER, and dChip (MBEI) for pre-processing of Affymetrix microarray data. These results hold for the two spike-in data sets and the one real data set analyzed, indicating that our nonparametric method is a superior alternative for background correction of Affymetrix data.
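The probe-selection step described in the abstract (take the lowest q1 percent of PM intensities, then the smallest q2 percent of their paired MM intensities) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the rounding rule for the percent cut-offs, and the use of the median as the background summary are all assumptions made for illustration, since the abstract does not specify them, and no default values for q1 and q2 are assumed.

```python
import math
from statistics import median


def select_background_probes(pm, mm, q1, q2):
    """Return the MM intensities selected for background estimation.

    pm, mm: equal-length sequences of paired PM/MM probe intensities.
    q1, q2: percentages in (0, 100]. No defaults are given here because
            the abstract does not state the values used.
    """
    if len(pm) != len(mm):
        raise ValueError("pm and mm must be paired (same length)")
    if not pm:
        raise ValueError("need at least one probe pair")
    if not (0 < q1 <= 100 and 0 < q2 <= 100):
        raise ValueError("q1 and q2 must be in (0, 100]")

    # Step 1: probe pairs whose PM intensity is in the lowest q1 percent.
    # Assumption: round the count down, but keep at least one probe.
    n_pm = max(1, math.floor(len(pm) * q1 / 100))
    order = sorted(range(len(pm)), key=lambda i: pm[i])
    low_pm_idx = order[:n_pm]

    # Step 2: among the MM partners of those probes, keep the
    # smallest q2 percent (same rounding assumption).
    mm_candidates = sorted(mm[i] for i in low_pm_idx)
    n_mm = max(1, math.floor(len(mm_candidates) * q2 / 100))
    return mm_candidates[:n_mm]


def estimate_background(pm, mm, q1, q2):
    """Summarize the selected MM intensities into one background level.

    Assumption: the median is used purely for illustration; the abstract
    does not say which summary statistic DFCM applies.
    """
    return median(select_background_probes(pm, mm, q1, q2))


# Toy data (hypothetical intensities, not taken from the paper).
pm = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
mm = [5, 8, 3, 50, 60, 70, 80, 90, 100, 110]
selected = select_background_probes(pm, mm, q1=40, q2=50)
background = estimate_background(pm, mm, q1=40, q2=50)
```

With the toy data, the four lowest PM values select the MM intensities 5, 8, 3, and 50, and the smallest half of those is kept for the background estimate. Because the selection relies only on ranks, it makes no assumption about the shape of the noise distribution, which is the point of the distribution-free approach.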
dc.description.sponsorship: National Institutes of Health contracts N01-AI40076 and N01-AI40041 to RHS and grant R15-AG16192 to Monnie McGee.
dc.identifier.citation: Chen Z, McGee M, Liu Q, Kong M, Deng Y, Scheuermann RH (2009). A distribution-free convolution model for background correction of oligonucleotide microarray data. BMC Genomics, 10(Suppl 1):S19.
dc.identifier.uri: https://hdl.handle.net/20.500.11875/3352
dc.language.iso: en
dc.publisher: BMC Genomics
dc.subject: medical research
dc.subject: Affymetrix GeneChip® high-density oligonucleotide arrays
dc.subject: microarray data
dc.subject: control for variability
dc.subject: pre-processing steps on arrays
dc.subject: parametric
dc.subject: Distribution Free Convolution Model (DFCM)
dc.subject: estimate background noise
dc.subject: exponentially distributed
dc.title: A Distribution-free Convolution Model for Background Correction of Oligonucleotide Microarray Data
dc.type: Article

Files

Original bundle

Name:
Distruibution-free convolution model for background._OCR.pdf
Size:
891.65 KB
Format:
Adobe Portable Document Format
Description:
Article

License bundle

Name:
license.txt
Size:
1.63 KB
Format:
Item-specific license agreed upon to submission
Description: