Academic Colleges and Programs
Permanent URI for this community: https://hdl.handle.net/20.500.11875/4530
Browsing Academic Colleges and Programs by Department "Computer Science"
Now showing 1 - 20 of 41
Item: A Distribution-free Convolution Model for Background Correction of Oligonucleotide Microarray Data (BMC Genomics, 2009-07-07)
Authors: Chen, Zhongxue; McGee, Monnie; Liu, Qingzhong; Kong, Megan; Deng, Youping; Scheuermann, Richard H.
Introduction: Affymetrix GeneChip® high-density oligonucleotide arrays are widely used in biological and medical research because of production reproducibility, which facilitates the comparison of results between experiment runs. In order to obtain high-level classification and cluster analyses that can be trusted, it is important to perform various pre-processing steps on the probe-level data to control for variability in sample processing and array hybridization. Many proposed preprocessing methods are parametric, in that they assume the background noise generated by microarray data is a random sample from a statistical distribution, typically a normal distribution. The quality of the final results depends on the validity of such assumptions.
Results: We propose a Distribution Free Convolution Model (DFCM) to circumvent observed deficiencies in meeting and validating the distribution assumptions of parametric methods. Knowledge of array structure and the biological function of the probes indicates that the intensities of mismatch (MM) probes that correspond to the smallest perfect match (PM) intensities can be used to estimate the background noise. Specifically, we obtain the smallest q2 percent of the MM intensities that are associated with the lowest q1 percent of PM intensities, and use these intensities to estimate background.
Conclusion: Using the Affymetrix Latin Square spike-in experiments, we show that the background noise generated by microarray experiments is typically not well modeled by a single overall normal distribution. We further show that the signal is not exponentially distributed, as is also commonly assumed. Therefore, DFCM has better sensitivity and specificity, as measured by ROC curves and area under the curve (AUC), than MAS 5.0, RMA, RMA with no background correction (RMA-noBG), GCRMA, PLIER, and dChip (MBEI) for preprocessing of Affymetrix microarray data. These results hold for the two spike-in data sets and one real data set analyzed, showing that our nonparametric method is a superior alternative for background correction of Affymetrix data.
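To make the background-estimation step above concrete, here is a minimal sketch in Python, assuming paired arrays of PM and MM intensities; the function name and the default percentages are illustrative, not taken from the paper's software.

```python
import numpy as np

def dfcm_background(pm, mm, q1=5.0, q2=50.0):
    """Estimate background from MM probes paired with the dimmest PM probes.

    pm, mm : arrays of perfect-match / mismatch intensities, paired by probe.
    q1     : percentage of lowest PM intensities to consider (illustrative default).
    q2     : percentage of the smallest associated MM intensities to keep.
    """
    # MM intensities whose PM partners fall in the lowest q1 percent
    low_pm_cut = np.percentile(pm, q1)
    candidate_mm = mm[pm <= low_pm_cut]
    # keep the smallest q2 percent of those MM intensities
    mm_cut = np.percentile(candidate_mm, q2)
    background = candidate_mm[candidate_mm <= mm_cut]
    # summarize, e.g., by mean and SD, for downstream signal correction
    return background.mean(), background.std(ddof=1)
```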
Item: A gene selection method for GeneChip array data with small sample sizes (2010-07)
Authors: Chen, Zhongxue; Liu, Qingzhong; McGee, Monnie; Kong, Megan; Huang, Xudong; Deng, Youping; Scheuermann, Richard H.
In microarray experiments with small sample sizes, it is a challenge to estimate p-values accurately and to choose appropriate cutoff p-values for gene selection. Although permutation-based methods have proved to have greater sensitivity and specificity than the regular t-test, their p-values are highly discrete due to the limited number of permutations available with very small sample sizes. Furthermore, estimated permutation-based p-values for true nulls are highly correlated and not uniformly distributed between zero and one, making it difficult to use current false discovery rate (FDR)-controlling methods.
Results: We propose a model-based information sharing method (MBIS) that, after an appropriate data transformation, utilizes information shared among genes. We use a normal distribution to model the mean differences of true nulls across two experimental conditions. The parameters of the model are then estimated using all data in hand. Based on this model, p-values, which are uniformly distributed for true nulls, are calculated. Then, since FDR-controlling methods are generally not well suited to microarray data with very small sample sizes, we select genes for a given cutoff p-value and then estimate the false discovery rate.
Conclusion: Simulation studies and analysis using real microarray data show that the proposed method, MBIS, is more powerful and reliable than current methods. It has wide application to a variety of situations.

Item: A Logic Approach to Granular Computing (International Journal of Cognitive Informatics and Natural Intelligence, 2008-04)
Authors: Zhou, Bing; Yao, Yiyu
Granular computing is an emerging field of study that attempts to formalize and explore methods and heuristics of human problem solving with multiple levels of granularity and abstraction. A fundamental issue in granular computing is the representation and utilization of granular structures. The main objective of this article is to examine a logic approach to addressing this issue. Following the classical interpretation of a concept as a pair of intension and extension, we interpret a granule as a pair consisting of a set of objects and a logic formula describing the granule. The building blocks of granular structures are basic granules, each representing an elementary concept or a piece of knowledge. They are treated as atomic formulas of a logic language. Different types of granular structures can be constructed by using logic connectives. Within this logic framework, we show that rough set analysis (RSA) and formal concept analysis (FCA) can be interpreted uniformly. The two theories use multilevel granular structures but differ in their choices of definable granules and granular structures.

Item: A new statistical approach to combining p-values using gamma distribution and its application to genome-wide association study (BMC Bioinformatics, 2014-12-16)
Authors: Liu, Qingzhong; Chen, Zhongxue; Yang, William; Yang, Jack Y.; Li, Jing; Yang, Mary Qu
Background: Combining information from different studies is an important and useful practice in bioinformatics, including in genome-wide association studies, rare variant data analysis, and other set-based analyses. Many statistical methods have been proposed to combine p-values from independent studies. However, it is known that there is no uniformly most powerful test under all conditions; therefore, finding a powerful test for a specific situation is important and desirable.
Results: In this paper, we propose a new statistical approach to combining p-values based on the gamma distribution, which uses the inverse of the p-value as the shape parameter in the gamma distribution.
Conclusions: Simulation studies and a real data application demonstrate that the proposed method performs well in certain situations.
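The abstract above specifies only the key ingredient of the method: the inverse of each p-value serves as the shape parameter of a gamma distribution. The sketch below fills in the remaining details with assumptions (scoring each study by an upper-tail gamma quantile and calibrating the summed score by Monte Carlo simulation under the null); it is one plausible reading, not the paper's exact construction.

```python
import numpy as np
from scipy import stats

def gamma_score(p):
    # Score a p-value as the upper-tail quantile of a Gamma distribution
    # whose shape is 1/p (the paper's key ingredient); scale fixed at 1.
    return stats.gamma.ppf(1.0 - np.asarray(p), a=1.0 / np.asarray(p))

def combine_pvalues_gamma(pvals, n_sim=20_000, seed=0):
    """Combine independent p-values; null distribution by simulation
    (an assumed calibration, since the exact null form is not given here)."""
    rng = np.random.default_rng(seed)
    pvals = np.asarray(pvals)
    observed = gamma_score(pvals).sum()
    null_p = rng.uniform(size=(n_sim, pvals.size))
    null = gamma_score(null_p).sum(axis=1)
    return (null >= observed).mean()   # empirical combined p-value

print(combine_pvalues_gamma([0.01, 0.20, 0.03]))
```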
Item: Android System Partition to Traffic Data? (International Journal of Knowledge Engineering, 2017-12)
Authors: Zhou, Bing; Liu, Qingzhong; Byrd, Brittany
The familiarity and prevalence of mobile devices inflate their use as instruments of crime. Law enforcement personnel and mobile forensics investigators are constantly battling to gain the upper hand in developing a standardized system able to comprehensively identify and resolve the vulnerabilities present within the mobile device platform. The Android mobile platform can be perceived as an antagonist to this objective, as its open nature provides attackers direct insight into the internal workings and security features of the most popular platform presently in the consumer market. This paper identifies and demonstrates the system partition in an Android smartphone as a viable attack vector for covert data trafficking. An implementation strategy (comprising four experimental phases) is developed to exploit the internal memory of a non-activated, rooted Android HTC Desire 510 4G smartphone. A set of mobile forensics tools was used for the extraction and analysis process: AccessData Mobile Phone Examiner Plus (MPE+ v5.5.6), Oxygen Forensic Suite 2015 Standard, and the Google Android Debug Bridge (adb). The data analysis found the proposed approach to be a persistent and minimally detectable method to exchange data.

Item: Assessment of gene order computing methods for Alzheimer's disease (BMC Medical Genomics, 2013-01-23)
Authors: Liu, Qingzhong; Hu, Benqiong; Pang, Chaoyang; Wang, Shipend; Chen, Zhongxue; Vanderburg, Charles R.; Rogers, Jack T.; Deng, Youping; Huang, Xudong; Jiang, Gang
Background: Computational genomics of Alzheimer's disease (AD), the most common form of senile dementia, is a nascent field in AD research. The field includes AD gene clustering by computing gene order, which generates higher-quality gene clustering patterns than most other clustering methods. However, only a few gene order computing methods are available, such as the Genetic Algorithm (GA) and Ant Colony Optimization (ACO), and their performance in gene order computation using AD microarray data is not known. We thus set forth to evaluate the performance of current gene order computing methods with different distance formulas, and to identify additional features associated with gene order computation.
Methods: Using different distance formulas (Pearson distance, Euclidean distance, and the squared Euclidean distance) and other conditions, gene orders were calculated with the ACO and GA (both standard and improved GA) methods.
Results: Compared to the GA methods tested in this study, ACO fit the AD microarray data best when calculating gene order. In addition, the following features were revealed: different distance formulas generated gene orders of different quality, and the commonly used Pearson distance was not the best distance formula for AD microarray data with either the GA or ACO methods.
Conclusion: Compared with the Pearson distance and Euclidean distance, the squared Euclidean distance generated the best-quality gene orders computed by the GA and ACO methods.
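For reference, the three distance formulas compared in the item above are commonly defined as below; this is a generic sketch of those definitions, and the paper's exact formulations may differ in detail (e.g., in normalization).

```python
import numpy as np

def pearson_distance(x, y):
    # 1 - Pearson correlation: 0 for identical profiles, up to 2 for anti-correlated
    return 1.0 - np.corrcoef(x, y)[0, 1]

def euclidean_distance(x, y):
    return np.linalg.norm(x - y)

def squared_euclidean_distance(x, y):
    # the distance the study found to yield the best gene orders
    return np.sum((x - y) ** 2)
```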
Item: Attack Modeling and Mitigation Strategies for Risk-Based Analysis of Networked Medical Devices (Proceedings of the 53rd Hawaii International Conference on System Sciences, 2020-01)
Authors: Hodges, Bronwyn J.; McDonald, J. Todd; Glisson, William Bradley; Jacobs, Michael; Van Devender, Maureen; Pardue, J. Harold
The escalating integration of network-enabled medical devices raises concerns for both practitioners and academics in terms of introducing new vulnerabilities and attack vectors. This prompts the idea that combining medical device data, security vulnerability enumerations, and attack-modeling data into a single database could enable security analysts to proactively identify potential security weaknesses in medical devices and formulate appropriate mitigation and remediation plans. This study introduces a novel extension to a relational database risk assessment framework: the open-source tool OVAL is used to capture device states and compare them to security advisories that warn of threats and vulnerabilities and, where such threats and vulnerabilities exist, to provide mitigation recommendations. The contribution of this research is a proof-of-concept evaluation that demonstrates the integration of OVAL and CAPEC attack patterns for analysis using a database-driven risk assessment framework.

Item: Attack-Graph Threat Modeling Assessment of Ambulatory Medical Devices (Proceedings of the 50th Hawaii International Conference on System Sciences, 2017-01)
Authors: Luckett, Patrick; McDonald, J. Todd; Glisson, William Bradley
The continued integration of technology into all aspects of society stresses the need to identify and understand the risk associated with assimilating new technologies. This necessity is heightened when technology is used for medical purposes, like ambulatory devices that monitor a patient's vital signs. This integration creates environments that are conducive to malicious activities, and the potential impact presents new challenges for the medical community. Hence, this research presents attack graph modeling as a viable solution for identifying vulnerabilities, assessing risk, and forming mitigation strategies to defend ambulatory medical devices from attackers. Common and frequent vulnerabilities and attack strategies related to the various aspects of ambulatory devices, including Bluetooth-enabled sensors and Android applications, are identified in the literature. Based on this analysis, this research presents an attack graph modeling example on a theoretical device that highlights vulnerabilities and mitigation strategies to consider when designing ambulatory devices with similar components.
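As an illustration of the attack-graph approach described above, the sketch below builds a toy graph for a hypothetical ambulatory device and enumerates attack paths; every node name and exploit label is invented for illustration and is not taken from the paper.

```python
import networkx as nx

# A toy attack graph: nodes are attacker states, edges are exploit steps
# (all labels are hypothetical examples, not the paper's device model).
G = nx.DiGraph()
G.add_edge("internet", "phone_app", exploit="malicious APK sideload")
G.add_edge("phone_app", "ble_link", exploit="weak Bluetooth pairing")
G.add_edge("internet", "ble_link", exploit="relay attack in radio range")
G.add_edge("ble_link", "sensor", exploit="unauthenticated GATT write")
G.add_edge("sensor", "patient_data", exploit="read vitals buffer")

# Enumerate every attack path from the entry point to the asset; each path
# is a candidate exploit chain to break with a mitigation strategy.
for path in nx.all_simple_paths(G, "internet", "patient_data"):
    steps = [G.edges[u, v]["exploit"] for u, v in zip(path, path[1:])]
    print(" -> ".join(path), "|", "; ".join(steps))
```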
Item: A Bleeding Digital Heart: Identifying Residual Data Generation from Smartphone Applications Interacting with Medical Devices (Proceedings of the 52nd Hawaii International Conference on System Sciences, 2019-01)
Authors: Grispos, George; Glisson, William Bradley; Cooper, Peter
The integration of medical devices in everyday life prompts the idea that these devices will increasingly have evidential value in civil and criminal proceedings. However, the investigation of these devices presents new challenges for the digital forensics community. Previous research has shown that mobile devices provide investigators with a wealth of information. Hence, mobile devices that are used within medical environments potentially provide an avenue for investigating and analyzing digital evidence from such devices. The research contribution of this paper is twofold. First, it provides an empirical analysis of the viability of using, as digital evidence, information from smartphone applications developed to complement a medical device. Second, it documents the artifacts that are potentially useful in a digital forensics investigation of smartphone applications that interact with medical devices.

Item: Calm Before the Storm: The Challenges of Cloud Computing in Digital Forensics (International Journal of Digital Crime and Forensics, 2012-04)
Authors: Grispos, George; Storer, Tim; Glisson, William Bradley
Cloud computing is a rapidly evolving information technology (IT) phenomenon. Rather than procure, deploy, and manage a physical IT infrastructure to host their software applications, organizations are increasingly deploying their infrastructure into remote, virtualized environments, often hosted and managed by third parties. This development has significant implications for digital forensic investigators, equipment vendors, law enforcement, and corporate compliance and audit departments, amongst other organizations. Much of digital forensic practice assumes careful control and management of IT assets (particularly data storage) during the conduct of an investigation. This paper summarises the key aspects of cloud computing and analyses how established digital forensic procedures will be invalidated in this new environment, as well as identifying several new research challenges in this changing context.

Item: CDBFIP: Common Database Forensic Investigation Processes for Internet of Things (IEEE Access, 2017-10)
Authors: Al-Dhaqm, Arafat; Razak, Shukor; Othman, Siti Hajar; Choo, Kim-Kwang Raymond; Glisson, William Bradley; Ali, Abulalem; Abrar, Mohammad
Database forensics is a domain that uses database content and metadata to reveal malicious activities on database systems in an Internet of Things environment. Although the concept of database forensics has been around for a while, the investigation of cybercrime activities and cyber breaches in an Internet of Things environment would benefit from the development of a common investigative standard that unifies the knowledge in the domain. Therefore, this paper proposes common database forensic investigation processes using a design science research approach. The proposed process comprises four phases: 1) identification; 2) artefact collection; 3) artefact analysis; and 4) documentation and presentation. It allows the reconciliation of the concepts and terminologies of all common database forensic investigation processes; hence, it facilitates the sharing of knowledge on database forensic investigation among domain newcomers, users, and practitioners.
Item: Cloud Forecasting: Legal Visibility Issues in Saturated Environments (Computer Law & Security Review, 2018)
Authors: Brown, Adam J.; Glisson, William Bradley; Andel, Todd R.; Choo, Kim-Kwang Raymond
The advent of cloud computing has brought the computing power of corporate data processing and storage centers to lightweight devices. Software-as-a-service cloud subscribers enjoy the convenience of personal devices along with the power and capability of a service. Using logical as opposed to physical partitions across cloud servers, providers supply flexible and scalable resources. Furthermore, the possibility of multitenant accounts promises considerable freedom when establishing access controls for cloud content. For forensic analysts conducting data acquisition, cloud resources present unique challenges. Inherent properties such as dynamic content, multiple sources, and nonlocal content make it difficult to develop a standard for evidence gathering that satisfies United States federal evidentiary standards in criminal litigation. Development of such standards, while essential for reliable production of evidence at trial, may not be entirely possible given the guarantees of privacy granted by the Fourth Amendment and the Electronic Communications Privacy Act. Privacy of information on a cloud is complicated because the data is stored on resources owned by a third-party provider, accessible by users of an account group, and monitored according to a service level agreement. This research constructs a balancing test for the competing considerations of a forensic investigator acquiring information from a cloud.

Item: Comparison of feature selection and classification for MALDI-MS data (BMC Genomics, 2009-07-07)
Authors: Liu, Qingzhong; Sung, Andrew H.; Chen, Zhongxue; Yang, Jack Y.; Qiao, Mengyu; Yang, Mary Qu; Huang, Xudong; Deng, Youping
Introduction: In the classification of Mass Spectrometry (MS) proteomics data, peak detection, feature selection, and learning classifiers are critical to classification accuracy. To better understand which methods are more accurate when classifying data, some publicly available peak detection algorithms for Matrix-Assisted Laser Desorption Ionization Mass Spectrometry (MALDI-MS) data were recently compared; however, the issue of how different feature selection methods and different classification models relate to classification performance has not been addressed. With the application of intelligent computing, much progress has been made in the development of feature selection methods and learning classifiers for the analysis of high-throughput biological data. The main objective of this paper is to compare feature selection methods and learning classifiers when applied to MALDI-MS data, and to provide a reference for the subsequent analysis of MS proteomics data.
Results: We compared a well-known method of feature selection, Support Vector Machine Recursive Feature Elimination (SVMRFE), and a recently developed method, Gradient-based Leave-one-out Gene Selection (GLGS), which performs microarray data analysis effectively. We also compared several learning classifiers, including the K-Nearest Neighbor Classifier (KNNC), Naive Bayes Classifier (NBC), Nearest Mean Scaled Classifier (NMSC), the uncorrelated normal-based quadratic Bayes Classifier (UDC), Support Vector Machines (SVMs), and a distance metric learning classifier, Large Margin Nearest Neighbor (LMNN), based on Mahalanobis distance. For the comparison, we conducted a comprehensive experimental study using three types of MALDI-MS data.
Conclusion: Regarding feature selection, SVMRFE outperformed GLGS in classification. As for the learning classifiers, when classification models derived from the best training were compared, SVMs performed best with respect to expected testing accuracy. However, the distance metric learning classifier LMNN outperformed SVMs and the other classifiers when evaluated on the best testing results. In such cases, the optimal classification model based on LMNN is worth investigating in future studies.
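SVMRFE, the best-performing feature selector in the item above, is available off the shelf in scikit-learn; below is a minimal sketch on synthetic stand-in data, with all sizes and parameters chosen for illustration rather than taken from the study.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.svm import SVC

# Toy stand-in for MALDI-MS peak intensities: 60 spectra x 200 peaks
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 200))
y = rng.integers(0, 2, size=60)        # two diagnostic classes

# SVM-RFE: repeatedly fit a linear SVM and drop the lowest-weight peaks
svm_rfe = RFE(
    estimator=SVC(kernel="linear"),    # feature weights require a linear kernel
    n_features_to_select=20,           # retain the 20 highest-ranked peaks
    step=0.1,                          # remove 10% of features per iteration
)
svm_rfe.fit(X, y)
print(np.flatnonzero(svm_rfe.support_))   # indices of the selected peaks
```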
Item: A Comparison Study Using StegExpose for Steganalysis (International Journal of Knowledge Engineering, 2017-06)
Authors: Olson, Eric; Carter, Larry; Liu, Qingzhong
Steganography is the art of hiding secret messages in innocent-looking digital data files. Steganalysis aims to expose the existence of steganograms. While internet applications and social media have grown tremendously in recent years, social media is increasingly used by cybercriminals as well as terrorists as a means of command-and-control communication, including taking advantage of steganography for covert communication. In this paper, we investigate open-source steganography and steganalysis software and test StegExpose for steganalysis. Our experimental results show that the capability of StegExpose is very limited.

Item: Detecting Deception Using Machine Learning (Proceedings of the 54th Hawaii International Conference on System Sciences, 2021-01)
Authors: Ceballos Delgado, Alberto Alejandro; Glisson, William Bradley; Shashidhar, Narasimha; McDonald, J. Todd; Grispos, George; Benton, Ryan
Today's digital society creates an environment potentially conducive to the exchange of deceptive information. The dissemination of misleading information can have severe consequences on society. This research investigates the possibility of using shared characteristics among reviews, news articles, and emails to detect deception in text-based communication using machine learning techniques. The experiment discussed in this paper examines the use of Bag of Words and Part of Speech tag features to detect deception in the aforementioned types of communication using Neural Networks, Support Vector Machines, Naïve Bayes, Random Forest, Logistic Regression, and Decision Tree classifiers. The contribution of this paper is twofold. First, it provides initial insight into the identification of text communication cues useful in detecting deception across different types of text-based communication. Second, it provides a foundation for future research involving the application of machine learning algorithms to detect deception in different types of text communication.

Item: Detecting Differentially Methylated Loci for Multiple Treatments Based on High-Throughput Methylation Data (BMC Bioinformatics, 2014-05-15)
Authors: Chen, Zhongxue; Huang, Hanwen; Liu, Qingzhong
Background: Because of its important effects, as an epigenetic factor, on gene expression and disease development, DNA methylation has drawn much attention from researchers. Detecting differentially methylated loci is an important but challenging step in studying the regulatory roles of DNA methylation in a broad range of biological processes and diseases. Several statistical approaches have been proposed to detect significantly methylated loci; however, most of them were designed specifically for case-control studies.
Results: Noticing that age is associated with methylation level and that methylation data are not normally distributed, in this paper we propose a nonparametric method to detect differentially methylated loci under multiple conditions with trend for Illumina Array Methylation data. The nonparametric Cuzick test is used to detect differences with trend among treatment groups within each age group; an overall p-value is then calculated by combining the independent p-values, one from each age group.
Conclusions: We compare the new approach with other methods using simulated and real data. Our study shows that the proposed method outperforms the other methods considered in this paper in terms of power: it detected more biologically meaningful differentially methylated loci than the others.
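Here is a minimal sketch of the two-step procedure just described: a Cuzick-type trend test within each age group, then combination of the per-group p-values. The rank statistic follows the usual Cuzick formulation with tie corrections ignored, and Fisher's method is assumed for the combining step since the abstract does not name one; the example data are invented.

```python
import numpy as np
from scipy import stats

def cuzick_trend_pvalue(groups):
    """Cuzick's Wilcoxon-type trend test across ordered treatment groups.

    groups : list of 1-D arrays of methylation values, in treatment order.
    Returns a two-sided normal-approximation p-value (ties ignored).
    """
    scores = np.concatenate([np.full(len(g), i) for i, g in enumerate(groups)])
    values = np.concatenate(groups)
    n = len(values)
    ranks = stats.rankdata(values)
    t = np.sum(scores * ranks)
    mean_t = (n + 1) / 2 * scores.sum()
    var_t = (n + 1) / 12 * (n * np.sum(scores**2) - scores.sum() ** 2)
    z = (t - mean_t) / np.sqrt(var_t)
    return 2 * stats.norm.sf(abs(z))

# Hypothetical data for one locus: 3 age groups x 3 treatment conditions
rng = np.random.default_rng(1)
age_group_data = [
    [rng.beta(2, 5, 15), rng.beta(2.5, 5, 15), rng.beta(3, 5, 15)]
    for _ in range(3)
]
p_by_age = [cuzick_trend_pvalue(g) for g in age_group_data]
_, overall = stats.combine_pvalues(p_by_age, method="fisher")
print(p_by_age, overall)
```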
Item: Detecting Repackaged Android Applications Using Perceptual Hashing (Proceedings of the 53rd Hawaii International Conference on System Sciences, 2020-01)
Authors: Nguyen, Thanh; McDonald, J. Todd; Glisson, William Bradley; Andel, Todd R.
The last decade has shown a steady rate of Android device dominance in market share and the emergence of hundreds of thousands of apps available to the public. Because of the ease of reverse engineering Android applications, repackaged malicious apps that clone existing code have become a severe problem in the marketplace. This research proposes a novel repackaged-app detection system based on perceptual hashes of vetted Android apps and their associated dynamic user interface (UI) behavior. Results show that an average-hash approach produces 88% accuracy (indicating low false negative and false positive rates) on a sample set of 4,878 Android apps, including 2,151 repackaged apps. The approach is the first dynamic method proposed in the research community that uses image-based hashing techniques, with performance comparable to other known dynamic approaches and the possibility of practical implementation at scale for new applications entering the Android market.
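The "average hash" named above is a standard perceptual-hashing technique; the sketch below shows it applied to UI screenshots, with the 8x8 hash size, file paths, and Hamming threshold as illustrative choices rather than the paper's tuned values.

```python
from PIL import Image

def average_hash(image_path, hash_size=8):
    """Standard average hash: downscale, grayscale, threshold at the mean."""
    img = Image.open(image_path).convert("L").resize(
        (hash_size, hash_size), Image.LANCZOS)
    pixels = list(img.getdata())
    mean = sum(pixels) / len(pixels)
    # one bit per pixel: brighter than the mean or not
    return sum(1 << i for i, p in enumerate(pixels) if p > mean)

def hamming(h1, h2):
    return bin(h1 ^ h2).count("1")

# Screenshots of the same UI screen from a vetted app and a candidate app;
# a small Hamming distance suggests the candidate may be a repackaged clone
# (paths and the threshold of 10 are illustrative).
if hamming(average_hash("vetted.png"), average_hash("candidate.png")) <= 10:
    print("candidate UI matches a vetted app: flag for repackaging review")
```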
Item: Exploitation and Detection of a Malicious Mobile Application (Proceedings of the 50th Hawaii International Conference on System Sciences, 2017-01)
Authors: Nguyen, Thanh; McDonald, J. Todd; Glisson, William Bradley
Mobile devices are increasingly being embraced by both organizations and individuals in today's society. Specifically, Android devices have been the prominent mobile device OS for several years. This continued amalgamation creates an environment that is an attractive attack target. The heightened integration of these devices prompts an investigation into the viability of maintaining non-compromised devices. Hence, this research presents a preliminary investigation into the effectiveness of current commercial anti-virus, static code analysis, and dynamic code analysis engines in detecting unknown repackaged malware piggybacking on popular applications with excessive permissions. The contribution of this paper is twofold. First, it provides an initial assessment of the effectiveness of anti-virus and analysis tools in detecting malicious applications and behavior on Android devices. Second, it provides a process for inserting code-injection attacks to simulate zero-day repackaged malware that can be used in future research efforts.

Item: Feature Selection and Classification of MAQC-II Breast Cancer and Multiple Myeloma Microarray Gene Expression Data (PLoS ONE, 2009-12-11)
Authors: Liu, Qingzhong; Sung, Andrew H.; Chen, Zhongxue; Huang, Xudong; Deng, Youping; Liu, Jianzhong
Microarray data have a high dimension of variables, but available datasets usually have only a small number of samples, making the study of such datasets interesting and challenging. In the task of analyzing microarray data for the purpose of, e.g., predicting gene-disease associations, feature selection is very important because it provides a way to handle the high dimensionality by exploiting information redundancy induced by associations among genetic markers. Judicious feature selection in microarray data analysis can result in significant reduction of cost while maintaining or improving the classification or prediction accuracy of the learning machines employed to sort out the datasets. In this paper, we propose a gene selection method called Recursive Feature Addition (RFA), which combines supervised learning and statistical similarity measures. We compare our method with the following gene selection methods: Support Vector Machine Recursive Feature Elimination (SVMRFE), Leave-One-Out Calculation Sequential Forward Selection (LOOCSFS), and Gradient-based Leave-one-out Gene Selection (GLGS). To evaluate the performance of these gene selection methods, we employ several popular learning classifiers on the MicroArray Quality Control phase II on predictive modeling (MAQC-II) breast cancer dataset and the MAQC-II multiple myeloma dataset. Experimental results show that gene selection is tightly paired with the learning classifier. Overall, our approach outperforms the other compared methods. The biological functional analysis based on the MAQC-II breast cancer dataset supports applying our method to phenotype prediction. Additionally, learning classifiers also play important roles in the classification of microarray data, and our experimental results indicate that the Nearest Mean Scaled Classifier (NMSC) is a good choice due to its prediction reliability and its stability across the three performance measurements: testing accuracy, MCC values, and AUC errors.

Item: Gene selection and classification for cancer microarray data based on machine learning and similarity measures (BMC Genomics, 2011)
Authors: Liu, Qingzhong; Sung, Andrew H.; Chen, Zhongxue; Liu, Jianzhong; Chen, Lei; Deng, Youping; Wang, Zhaohui; Huang, Xudong; Qiao, Mengyu
Background: Microarray data have a high dimension of variables and a small sample size. In microarray data analyses, two important issues are how to choose genes that provide reliable and good prediction for disease status, and how to determine the final gene set that is best for classification. Associations among genetic markers mean one can exploit information redundancy to potentially reduce classification cost in terms of time and money.
Results: To deal with redundant information and improve classification, we propose a gene selection method, Recursive Feature Addition, which combines supervised learning and statistical similarity measures. To determine the final optimal gene set for prediction and classification, we propose an algorithm, Lagging Prediction Peephole Optimization. Using six benchmark microarray gene expression data sets, we compared Recursive Feature Addition with recently developed gene selection methods: Support Vector Machine Recursive Feature Elimination, Leave-One-Out Calculation Sequential Forward Selection, and several others.
Conclusions: On average, with the use of popular learning machines including the Nearest Mean Scaled Classifier, Support Vector Machine, Naive Bayes Classifier, and Random Forest, Recursive Feature Addition outperformed the other methods. Our studies also showed that Lagging Prediction Peephole Optimization is superior to a random strategy; Recursive Feature Addition with Lagging Prediction Peephole Optimization obtained better testing accuracies than the gene selection method varSelRF.
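The two items above describe Recursive Feature Addition only at a high level (supervised learning combined with statistical similarity measures). The sketch below is one plausible reading, greedy forward selection scored by cross-validated accuracy with a correlation-based tie-breaker; the scoring classifier, tie-breaking rule, and stopping criterion are assumptions, not the published algorithm.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

def recursive_feature_addition(X, y, n_features=10, cv=5):
    """Greedy forward gene selection guided by classifier accuracy, with a
    correlation-based tie-breaker (one plausible reading of RFA)."""
    remaining = list(range(X.shape[1]))
    chosen = []
    while remaining and len(chosen) < n_features:
        best_score, candidates = -np.inf, []
        for j in remaining:
            score = cross_val_score(
                GaussianNB(), X[:, chosen + [j]], y, cv=cv).mean()
            if score > best_score + 1e-12:
                best_score, candidates = score, [j]
            elif abs(score - best_score) <= 1e-12:
                candidates.append(j)
        # among equally good candidates, prefer the gene least correlated
        # with those already selected (the "statistical similarity" step)
        if chosen:
            max_corr = lambda j: max(abs(np.corrcoef(X[:, j], X[:, k])[0, 1])
                                     for k in chosen)
            pick = min(candidates, key=max_corr)
        else:
            pick = candidates[0]
        chosen.append(pick)
        remaining.remove(pick)
    return chosen
```

As a usage note, calling recursive_feature_addition(X, y, n_features=10) on an expression matrix X (samples x genes) returns the indices of the selected genes, which can then feed any of the classifiers named above.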