Electronic Theses and Dissertations
Permanent URI for this collection: https://hdl.handle.net/20.500.11875/18
Browsing Electronic Theses and Dissertations by Department "Computer Science"
Now showing 1 - 15 of 15
Item APPAREL RECOMMENDATION WITH TRANSFER LEARNING AND LOCALITY SENSITIVE HASHING (2022-12-01)
Gundogan, Kubra; Cho, Hyuk; Liu, Qingzhong; An, Min Kyung; Islam, ABM R

The textile and apparel industries have grown substantially, and the variety of clothing available worldwide is constantly renewed or changed. Given the abundance of options, we developed a system that takes an image a user provides and offers recommendations matching the query image. This study developed a clothing recommendation system that employs transfer learning with a pre-trained deep learning model (VGG16) followed by locality sensitive hashing with random projection. The dataset originated from the H&M company and was released through a Kaggle competition. It contains 105K images in total, covering 130 categories in five (5) main groups. From the Ladieswear group, which accounts for about 37.7% of the dataset, 7,000 images were drawn and split evenly across seven (7) clothing groups to obtain a balanced dataset. These groups are labeled dress, trousers, sweater, blouse, skirt, t-shirt, and vest top. Specifically, we extracted embedded image features using transfer learning and achieved fast recommendation using locality sensitive hashing. We demonstrated the effectiveness of the proposed system by comparing the average cosine similarity of the top 6 recommendations before and after locality sensitive hashing, and we qualitatively visualized the quality of the recommendations.

Item Data Collection Scheduling in Directional Wireless Sensor Networks (2019-04-17)
Simsek, Ecem; An, Min Kyung

This thesis studies the Minimum Latency Collection Scheduling (MLCS) problem in Wireless Sensor Networks (WSNs), whose objective is to obtain collision-free data collection schedules with minimum latency.
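The apparel recommendation entry above pairs CNN feature embeddings with locality sensitive hashing via random projection. A minimal pure-Python sketch of the hashing-and-ranking step (feature vectors fabricated; the VGG16 extraction itself is omitted, and all names here are illustrative, not the thesis's code):

```python
import math
import random

def random_hyperplanes(dim, n_bits, seed=0):
    """One random Gaussian hyperplane per hash bit."""
    rng = random.Random(seed)
    return [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_bits)]

def lsh_signature(vec, planes):
    """Random-projection LSH: bit i is the sign of vec . plane_i."""
    return tuple(int(sum(v * p for v, p in zip(vec, plane)) >= 0)
                 for plane in planes)

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Toy "embeddings" standing in for VGG16 features.
items = {"dress_1": [1.0, 0.9, 0.1],
         "dress_2": [0.9, 1.0, 0.2],
         "vest_1": [-1.0, 0.1, 0.9]}

planes = random_hyperplanes(dim=3, n_bits=8)
buckets = {}
for name, vec in items.items():
    buckets.setdefault(lsh_signature(vec, planes), []).append(name)

# At query time, only the bucket sharing the query's signature is
# re-ranked by exact cosine similarity -- that is the speedup.
query = [0.95, 0.95, 0.15]
candidates = buckets.get(lsh_signature(query, planes), [])
ranked = sorted(candidates, key=lambda n: cosine(query, items[n]), reverse=True)
```

Because the signature depends only on the sign of each projection, it is invariant to positive scaling of the vector, which is why cosine (angular) similarity is the natural re-ranking metric.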
Unlike most existing works, which explored the problem with the uniform power model in omnidirectional WSNs, this thesis studies the problem with a non-uniform power model in directional WSNs. Power control, where the power levels of sensor nodes must be controlled, is also considered. The thesis proposes an algorithm, named the Hierarchical Streaming Collection Scheduling Algorithm (HSCS), that produces collision-free data collection schedules with appropriate power levels assigned, and validates its latency performance on simulated networks.

Item Detect forgery video by performing transfer learning on Deep Neural Network (2019-04-17)
Zhang, Zhaohe; Liu, Qingzhong

Nowadays, verifying the authenticity of digital images and videos has become difficult as forgery techniques grow more advanced. Given recent progress in Generative Neural Networks (GNNs) that can generate realistic images and videos, detecting the authenticity of digital photographs is harder than ever. In this thesis, we expose a popular open-source video forgery library called "DeepFaceLab" by making use of deep learning. We retrain existing state-of-the-art image classification neural networks to capture the features of manipulated video frames. After passing various sets of forged video frames through a well-trained neural network, a bottleneck file is created for each image; it contains the features and artifacts in forged video that the human eye cannot capture. Our testing accuracy is over 99% on DeepFake videos. We also evaluated our method on the FaceForensics dataset and achieved good detection results on both the testing and validation sets.
Experiments under different data sizes confirm the effectiveness and efficiency of the proposed method.

Item DETECTION OF RED-COCKADED WOODPECKER HABITATS USING YOLO ALGORITHMS (2022-08-01)
de Lemmus, Emerson; Cho, Hyuk; Zhou, Bing; An, Min Kyung; Liu, Qingzhong

Habitat and population monitoring are crucial for the preservation of endangered species. However, gathering habitat data can be a hazardous and laborious task, so wildlife ecologists increasingly turn to remote sensing and automation to collect large-scale ecological data on a given species. In particular, the red-cockaded woodpecker (RCW) is a species endemic to the southeastern United States. The species has been listed as endangered since 1973, and wildlife biologists have performed pedestrian surveys to assess its status. Through close interdisciplinary collaboration with ecologists, this work conducts a pilot study that automatically detects potential RCW habitats. The dataset of 978 images was collected by a team of wildlife ecologists from Raven Environmental Inc. using unmanned aerial vehicles (UAVs). RCW habitat imagery is unique and unavailable in the public domain, and is thus considered novel image data. The primary goal of this research is to assess RCW habitat detection performance with You Only Look Once (YOLO) object detection algorithms. Due to the demanding computing requirements of YOLO algorithms, only two small models, YOLOv4-tiny and YOLOv5n, are employed and assessed for this study. The best hyperparameter values are identified for each model to maximize accuracy. YOLOv4-tiny reached a training mAP (mean Average Precision) of 0.96 (96%) and a testing accuracy of 0.85 (85%), while YOLOv5n achieved a training mAP of 0.78 (78%) and a testing accuracy of 0.82 (82%). Overall, combining the inference results of both models achieved 100% detection of de facto habitats.
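The mAP figures reported in the detection entry above are built on intersection over union (IoU) between predicted and ground-truth boxes. A minimal sketch of that standard computation (box coordinates fabricated, not from the RCW dataset):

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    # Overlap rectangle, clipped to zero width/height when boxes are disjoint.
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

# A prediction usually counts as a true positive when IoU >= 0.5;
# mAP then averages precision over recall levels and over classes.
pred, truth = (10, 10, 50, 50), (20, 20, 60, 60)
overlap = iou(pred, truth)   # 900 / 2300, below the 0.5 threshold
```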
This study realizes a real-time platform that integrates computer vision with domain knowledge and identifies potential habitats from large-scale image data. Deploying it in wildlife ecosystems will therefore significantly assist wildlife biologists in saving personnel hours through real-time detection of potential habitats and accelerate proactive field validation for the preservation of the RCW.

Item GENDER AND ETHNICITY BIAS IN DEEP LEARNING (December 2023)
Islam, Ahsan Ul; Islam, ABM R; Liang, Fan; An, Min Kyung

People's opinions and actions in everyday life are increasingly influenced by artificial intelligence. However, bias in the design of these technologies has the potential to undo decades of progress toward gender and ethnic equality, casting a shadow over that progress. Concerns surrounding gender and ethnicity biases pervade numerous fields, none more prominently than artificial intelligence, especially pre-trained deep learning models. These models, celebrated for their capacity to extract knowledge from extensive datasets, hold immense potential to transform society and decision-making. However, they are not impervious to the biases embedded in the data on which they are trained, raising the possibility of unintentionally perpetuating and amplifying societal biases linked to gender or ethnicity. The issue of gender bias in deep learning models has gained significant traction: as these models have become ubiquitous across applications, it has become evident that they often perpetuate and exacerbate long-standing gender biases inherent in the training data. This paper embarks on a methodical and empirically rigorous exploration of gender and ethnicity bias within a diverse array of pre-trained deep learning models.
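One simple way to quantify the kind of bias the entry above studies is to compare a model's accuracy across demographic groups. A minimal sketch with fabricated predictions (not the paper's actual models, data, or metric):

```python
def group_accuracy_gap(records):
    """records: (group, true_label, predicted_label) triples.
    Returns per-group accuracy and the max-min gap, a crude disparity measure."""
    correct, total = {}, {}
    for group, truth, pred in records:
        total[group] = total.get(group, 0) + 1
        correct[group] = correct.get(group, 0) + (truth == pred)
    acc = {g: correct[g] / total[g] for g in total}
    return acc, max(acc.values()) - min(acc.values())

# Fabricated example: the model is right 3/4 times for group A, 1/2 for group B.
records = [("A", 1, 1), ("A", 0, 0), ("A", 1, 1), ("A", 1, 0),
           ("B", 1, 1), ("B", 0, 1)]
acc, gap = group_accuracy_gap(records)   # acc = {"A": 0.75, "B": 0.5}, gap = 0.25
```

A gap of zero would mean the model performs equally well for every group; in practice, fairness audits use several such metrics (accuracy gap, equalized odds, demographic parity) side by side.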
Through meticulous scrutiny of these models' performance on gender- and ethnicity-based predictions, we aim to unearth valuable insights into the presence, intricacies, and magnitude of bias. This research thus offers a comprehensive and empirically grounded examination of gender and ethnicity bias within a diverse range of pre-trained deep-learning models. Furthermore, this work introduces an innovative, holistic solution to mitigate such bias: we present CNN models strategically crafted to address and rectify gender and ethnicity biases effectively, a pioneering step towards combating bias on multiple fronts within AI systems. The research thereby contributes significantly to the broader understanding of bias within AI technologies, simultaneously addressing gender and ethnicity bias and proposing a practical remedy, paving the way for more equitable and unbiased advances in artificial intelligence. Through rigorous analysis and innovative solutions, we seek to ensure that AI systems respect and uphold the principles of fairness, inclusivity, and diversity, thereby fostering a more just technological landscape for all.

Item HEAT-MAP BASED EMOTION AND FACE RECOGNITION FROM THERMAL IMAGES (2019-04-17)
Ilikci, Burak; Liu, Qingzhong

Nowadays, emotion recognition has become a feasible problem with the implementation of Convolutional Neural Networks in the computer vision domain. However, the credibility of emotion recognition from everyday images or videos is limited: because people can easily mimic one emotion after another and fool trained models, a different approach should be considered.
Thermal cameras are a suitable way to develop more credible emotion recognition models. Facial heat maps have previously been shown to hint at emotions, and models trained on thermal heat maps are hard to fool because the maps visualize the state of the body's heat. In this research, a method is adapted for training a model that recognizes emotions from thermal heat-map cameras using a fast detection algorithm (YOLOv3). The main aim is to detect emotions in a given picture taken with a thermal camera.

Item Investigation of IndexedDB Persistent Storage for Digital Forensics (2022-08-01)
Paligu, Furkan; Varol, Cihan; Cho, Hyuk; Shashidhar, Narasimha K; Rasheed, Amar A; Saliah-Hassane, Hamadou

Our dependency on electronic services is increasing rapidly in every aspect of daily life. While the Covid-19 virus remolded how we conduct business through remote collaboration applications, social media is rooting its grasp ever deeper in our day-to-day activities. Every day, a substantial amount of data is left behind in both desktop and web-based applications. As the size and sophistication of stored data increase, so does the complexity of the technology that handles it. Consequently, forensic investigators face the challenge of constantly adapting to emerging technologies, which form the base for handling the vast volume of data in the modern era of information technology. In the scope of this dissertation, the efficacy of an emerging client-side technology, IndexedDB, is scrutinized for forensic value and for practices of extraction, processing, presentation, and verification. Accordingly, a series of single-case pretest-posttest quasi-experiments is conducted to populate artifacts in the underlying storage technologies of IndexedDB. The populated artifacts are then extracted and processed based on signature patterns and evaluated for their significance.
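Signature-based artifact extraction of the kind described above can be sketched as scanning raw bytes for known record markers. A toy illustration with a fabricated signature and blob (real IndexedDB/LevelDB record layouts are considerably more involved; the `IDB!` marker here is invented for illustration):

```python
import re

# Hypothetical record marker: a 4-byte signature followed by a one-byte
# payload length. Real IndexedDB (LevelDB) records differ from this.
SIGNATURE = re.compile(rb"IDB!(?P<length>.)", re.DOTALL)

def carve_records(blob):
    """Return the payload found after each signature in a raw byte blob."""
    records = []
    for match in SIGNATURE.finditer(blob):
        length = match.group("length")[0]       # length byte as an int
        start = match.end()
        records.append(blob[start:start + length])
    return records

# Fabricated blob: two records surrounded by unrelated bytes.
blob = b"\x00\x00IDB!\x05hello\xffjunkIDB!\x03abc\x00"
found = carve_records(blob)   # [b"hello", b"abc"]
```

The same pattern, a compiled byte regex plus offset arithmetic, generalizes to carving any fixed-signature structure out of unallocated space or database files.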
Additionally, the artifacts are characterized, verified, and presented with the help of cornerstone tools implemented in this scope. Furthermore, a time-frame analysis is constructed that can display ordered sequences of events to investigators in a suitable format.

Item NNMF IN GOOGLE TENSORFLOW AND APACHE SPARK: A COMPARISON STUDY (2019-07-12)
Li, Qizhao; Cho, Hyuk

Data mining is no longer a new term, as it has become pervasive in all aspects of our lives, and new computing platforms for specific usages are proposed continuously. Awareness of the characteristics and capacity of existing and newly proposed platforms is therefore a critical task for researchers and practitioners who want to use existing algorithms, and develop new ones, on recent platforms. This thesis implements and compares a set of popular matrix factorization algorithms on recent computing platforms. Specifically, three matrix factorization algorithms, classic Non-negative Matrix Factorization (NNMF), CUR Matrix Decomposition, and Compact Matrix Decomposition (CMD), are implemented on two computing platforms, Apache Spark and Google TensorFlow. As rank-k approximation with Singular Value Decomposition (SVD) is the optimal baseline, both the CUR and CMD approximations are less accurate than the SVD approximation. The experimental results show that CMD in TensorFlow approximates the matrix better than the other two algorithms (NNMF and CUR) in the same experimental setup.
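All the factorizations compared in the NNMF entry above approximate a matrix V by a low-rank product. A minimal pure-Python sketch of classic NNMF via the Lee-Seung multiplicative updates for the Frobenius objective (the thesis's TensorFlow/Spark implementations are not shown; toy matrix fabricated):

```python
import random

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def frob_error(V, W, H):
    """Squared Frobenius norm of V - W H."""
    WH = matmul(W, H)
    return sum((V[i][j] - WH[i][j]) ** 2
               for i in range(len(V)) for j in range(len(V[0])))

def nnmf(V, k, iters=50, seed=0):
    """Lee-Seung multiplicative updates for V ~= W H with nonnegative factors."""
    rng = random.Random(seed)
    m, n = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(m)]
    H = [[rng.random() + 0.1 for _ in range(n)] for _ in range(k)]
    eps = 1e-12   # guards against division by zero
    for _ in range(iters):
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)]
             for i in range(k)]
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(m)]
    return W, H

V = [[5.0, 3.0, 0.0], [4.0, 0.0, 0.0], [1.0, 1.0, 5.0], [0.0, 0.0, 4.0]]
W, H = nnmf(V, k=2)
```

The multiplicative form keeps W and H nonnegative by construction and never increases the objective, which is the monotonicity property such studies verify experimentally.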
Also, as the number of rows or columns selected for CUR and CMD increases, the approximation error decreases.

Item NVMe-Assist: A Novel Theoretical Framework for Digital Forensics: A Case Study on NVMe Storage Devices and Related Artifacts on Windows 10 (2022-08-01)
Neyaz, Ashar; Shashidhar, Narasimha K; Varol, Cihan; Rasheed, Amar A

With ever-advancing changes in technology come implications for the digital forensics community. In this document, we use the term digital forensics to denote the scientific investigatory procedure for digital crimes and attacks. Digital forensics examiners often find it challenging when new devices are used for nefarious activities. Examiners gather evidence from these devices based on supporting literature, and multiple factors contribute to a lack of research on a particular device or technology. Most commonly, the technology is new to the market and there has not been time to conduct sufficient research; it is also possible that the technology is not popular enough to garner research attention. An examiner who encounters such a device is often required to develop impromptu solutions to investigate the case. Sometimes, examiners must review their examination processes on model devices that labs are required to purchase, to see whether existing methods suffice. This ad-hoc approach adds time and expense before actual analysis can commence. In this research, we investigate a new storage technology called Non-Volatile Memory Express (NVMe), which operates over the Peripheral Component Interconnect Express (PCIe) bus. Since this storage technology is relatively new, it lacks a substantial digital forensics foundation to draw upon for a forensic investigation, and, to the best of our knowledge, there is an insufficient body of work for conducting sound forensic research on such devices.
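Storage forensics work like the NVMe study above routinely decodes on-disk structures such as boot-sector partition tables. A minimal sketch of parsing one 16-byte MBR partition-table entry with Python's `struct` (sample bytes fabricated; field layout per the standard MBR format, where the table itself starts at byte offset 446 of the boot sector):

```python
import struct

def parse_mbr_entry(entry):
    """Decode one 16-byte MBR partition-table entry."""
    boot, _chs_start, ptype, _chs_end, lba_start, sectors = struct.unpack(
        "<B3sB3sII", entry)   # little-endian: flag, CHS, type, CHS, two u32s
    return {
        "bootable": boot == 0x80,
        "type": ptype,            # e.g. 0x07 = NTFS/exFAT, 0x83 = Linux
        "lba_start": lba_start,   # first sector of the partition
        "sectors": sectors,       # partition length in sectors
    }

# Fabricated entry: bootable, type 0x07, starting at LBA 2048, 409600 sectors.
entry = struct.pack("<B3sB3sII", 0x80, b"\x00\x00\x00", 0x07,
                    b"\x00\x00\x00", 2048, 409600)
info = parse_mbr_entry(entry)
```

Tools in the mmls family repeat this decoding for each of the four primary entries (and recurse into extended partitions); GPT parsing follows the same pattern with a different on-disk layout.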
To this end, our framework, NVMe-Assist, puts forth a strong theoretical foundation that empowers digital forensics examiners in conducting analysis on NVMe devices, including wear-leveling, TRIM, Prefetch files, Shellbag, and BootPerfDiagLogger.etl. Lastly, we created the NVMe-Assist tool in Python. It parses the partition tables in the boot sector and is an upgrade of the mmls tool from The Sleuth Kit command-line tools. The tool currently supports E01 and RAW files from physical acquisitions of hard-disk drives (HDDs), solid-state drives (SSDs), NVMe SSDs, and USB flash drives as data sources, and it works on both MBR (Master Boot Record) and GPT (GUID Partition Table) style partitions.

Item PHONETIC MATCHING TOOLKIT WITH STATE-OF-THE-ART META-SOUNDEX ALGORITHM (ENGLISH AND SPANISH) (2016-10-27)
Koneru, Keerthi; Varol, Cihan; Karpoor, Shashidhar; Zhou, Bing

Researchers confront major problems when searching large, imprecise databases, as entries are often not spelled correctly or in the way they were expected to be spelled; as a result, the sought word cannot be found. Over years of struggle, matching words by pronunciation came to be considered an effective way to solve this problem. The technique of retrieving words based on sound is known as "phonetic matching". Soundex was the first such algorithm, and others such as Metaphone, Caverphone, DMetaphone, and Phonex are also used for information retrieval in different environments. This project deals with the analysis and implementation of the newly proposed Meta-Soundex algorithm for English and Spanish, which retrieves suggestions for misspelled words. Meta-Soundex addresses the limitations of the Metaphone and Soundex algorithms; specifically, it is more accurate than both.
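For reference, the classic American Soundex encoding that Meta-Soundex builds on can be sketched as follows (this is the standard algorithm, not the Meta-Soundex variant proposed in the entry above):

```python
# Consonant-to-digit table of classic American Soundex.
CODES = {}
for letters, digit in [("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")]:
    for ch in letters:
        CODES[ch] = digit

def soundex(word):
    """Classic American Soundex: first letter plus three digits."""
    word = "".join(ch for ch in word.lower() if ch.isalpha())
    if not word:
        return ""
    first = word[0].upper()
    digits = []
    prev = CODES.get(word[0], "")
    for ch in word[1:]:
        code = CODES.get(ch, "")
        if code and code != prev:     # collapse adjacent identical codes
            digits.append(code)
        if ch not in "hw":            # h and w do not reset the previous code
            prev = code
    return (first + "".join(digits) + "000")[:4]
```

Words that sound alike map to the same code, so `soundex("Robert")` and `soundex("Rupert")` both yield `R163`; that collision behavior is exactly what makes Soundex useful for dirty-data lookup and what refinements like Metaphone and Meta-Soundex tighten up.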
The new algorithm also has higher precision than Soundex, thus reducing noise in the considered arena. A phonetic matching toolkit is also developed, enclosing the different phonetic matching algorithms along with the state-of-the-art Meta-Soundex algorithm for both Spanish and English.

Item QUALITATIVE CLEANING METHODS ON DISTRIBUTED IOT DATASETS (2019-04-08)
Ogungbemile, George; Zhou, Bing

Data analysis encompasses a set of steps that allow a typically large data set to be remodeled so that actionable information can be extracted from it to support decision-making. Data generated from multiple distributed sources is usually dirty by default, and dirty data often leads to inaccurate or incomplete analysis; without data cleaning, wrong or fatally flawed business decisions are inevitable. IoT describes a network of physical and virtual objects containing software, electrical components, and sensors that exchange data with other connected devices over the internet. The data generated from these sensors is distributed by design, and my aim in this thesis is to explore qualitative data cleaning methods, such as integrity constraints and functional dependency violations, to perform error detection and in-place error repair on the distributed data sets these devices generate. This approach is relatively new, since most prior data cleaning research in this domain has focused on quantitative techniques such as outlier detection. A further goal of the thesis is to perform exploratory data analysis on data sets from these IoT sources, using data wrangling tools on open-source frameworks such as Optimus under Apache Spark to handle the unstructured and semi-structured formats of the generated data.
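A functional dependency X → Y, one of the qualitative cleaning signals named above, is violated when two rows agree on the X attributes but differ on Y. A minimal detection sketch over fabricated sensor rows (column names and data invented for illustration):

```python
def fd_violations(rows, lhs, rhs):
    """Return groups of rows that agree on the lhs columns but disagree on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[col] for col in lhs)
        seen.setdefault(key, []).append(row)
    return {key: group for key, group in seen.items()
            if len({tuple(r[col] for col in rhs) for r in group}) > 1}

# Fabricated IoT readings: sensor_id should functionally determine location.
rows = [
    {"sensor_id": "s1", "location": "lab",   "temp": 21.5},
    {"sensor_id": "s1", "location": "lab",   "temp": 22.0},
    {"sensor_id": "s2", "location": "roof",  "temp": 18.1},
    {"sensor_id": "s2", "location": "attic", "temp": 18.3},  # conflicts with "roof"
]
bad = fd_violations(rows, lhs=["sensor_id"], rhs=["location"])
```

Detection is only half the job; in-place repair then has to decide which conflicting value to keep, for example by majority vote within the group or by trusting the most recent reading.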
The end goal is to generate clean data from these sources so that insights can be gained to support decision-making for product improvement.

Item SELECTED FORENSIC DATA ACQUISITION FROM ANDROID DEVICES (2018-11-12)
Rathi, Khushboo; Karabiyik, Umit

In recent times, the amount of data stored in smartphones has increased phenomenally. A smartphone is as powerful as a laptop or desktop, and people store personal data and conduct daily activities on it; as a result, it can serve as important evidence for law enforcement in cases such as accidents, malicious exchanges of text messages, or photos and videos taken during a mass-shooting incident, making it of significant forensic interest to investigators. Some people may be willing to give their phones to an investigator, but they want assurance that their privacy and the privacy of their data are taken into consideration, meaning that only data relevant to the case under investigation should be analyzed and collected; the Supreme Court has even ruled to preserve user and data privacy. In this research study, a new forensic tool is developed that performs selective extraction of data from an Android device. The input to the tool is based on a consent form filled out by the witness or victim who voluntarily hands over their phone, and the investigator extracts data within those limits. The tool performs metadata- and content-based filtering and exports the extracted data, along with hash values, to a bootable drive in a forensically sound manner. State-of-the-art machine learning models are used for the content-based filtering. The result is a robust and efficient tool for solving real cases while preserving user and data privacy.

Item SENTIMENT AND BEHAVIORAL ANALYSIS IN EDISCOVERY (2022-08-01)
Krishnan, Sundar; Narasimha K.
Shashidhar, PhD; Cihan Varol, PhD; ABM Rezbaul Islam, PhD

A suspect or person-of-interest in a legal case review or forensic evidence review can exhibit signs of their individual personality through the digital evidence collected for the case. Such personality traits can be analytically harvested for case investigators and reviewers, but manual review of evidence for such flags takes time and increases costs. This study focuses on use-case scenarios in which behavior and sentiment analysis is a critical requirement for a legal case's success; it aims to speed the review and analysis phase and offers a software prototype as a proof of concept. The study starts with the build and storage of Electronically Stored Information (ESI) datasets for three fictitious legal cases, using publicly available data such as emails, Facebook posts, tweets, and text messages, plus a few custom MS Word documents. It then leverages statistical algorithms and automation to propose approaches for identifying human sentiments and behavior, such as evidence of financial fraud or sexual harassment by a suspect or person-of-interest, from the case ESI. The last stage automates these approaches in custom software and presents a user interface for eDiscovery teams and digital forensic investigators.

Item SPHERICAL AND STOCHASTIC CO-CLUSTERING ALGORITHMS (2019-04-17)
Sariboz, Emrah; Cho, Hyuk

Clustering is, without a doubt, a dominant area in the data mining and machine learning fields. Because clustering algorithms are so widely needed, they have many applications in real-life problems, ranging from bioinformatics to personalized information delivery, and the characteristics of newly generated data call for new approaches to explore its nature. General single-sided (i.e., one-way) clustering algorithms such as K-means cluster either the rows or the columns of a data matrix. A coclustering algorithm clusters both the instances and the features of the data matrix simultaneously and is thus better suited to discovering patterns hidden in both the row and column dimensions. Most existing coclustering algorithms include inexplicit clustering steps for each dimension separately. In this study, we developed two novel coclustering algorithms, named Spherical Coclustering and Stochastic Coclustering, which utilize the existing K-means framework; a specific data construction and two data normalizations are included as pre-processing steps. The framework resembles one existing coclustering algorithm, Spectral Coclustering, in that it first applies feature selection using singular value decomposition and then uses one-way clustering to achieve coclustering. Furthermore, we partially address several well-known practical problems in clustering algorithms, including cluster initialization, the degeneracy problem, local minima, and NaN (not-a-number) conditions in the Kullback-Leibler divergence. The correctness and efficiency of the two algorithms were validated on publicly available benchmark datasets in terms of the monotonicity of objective function value changes and clustering accuracy. Specifically, we compared the accuracy of the Euclidean K-means, Stochastic K-means, Spherical K-means, Stochastic Coclustering, and Spherical Coclustering algorithms.

Item WHALE AND DOLPHIN CLASSIFICATION USING ENSEMBLE TRANSFER LEARNING (2022-12-01)
Kose, Nuri Alperen; Cho, Hyuk; Liu, Qingzhong; An, Min Kyung; Islam, ABM R

Although whales and dolphins are endangered for reasons such as global warming and improper hunting, they play an important role in the lives of other living things by providing almost 50% of the world's oxygen, which motivated this study.
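Spherical K-means, one of the baselines in the coclustering entry above, replaces Euclidean distance with cosine similarity on unit-normalized vectors. A minimal sketch of its assign-and-update loop (toy data fabricated; the thesis's coclustering extensions are not shown):

```python
import math

def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else v

def spherical_kmeans(points, centroids, iters=10):
    """Assign each unit vector to the centroid with the highest dot product
    (cosine similarity), then recompute centroids as normalized means."""
    points = [normalize(p) for p in points]
    centroids = [normalize(c) for c in centroids]
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for p in points:
            best = max(range(len(centroids)),
                       key=lambda i: sum(a * b for a, b in zip(p, centroids[i])))
            clusters[best].append(p)
        # Empty clusters keep their old centroid (one of the degeneracy fixes
        # such studies must handle).
        centroids = [normalize([sum(col) for col in zip(*c)]) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

points = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.8]]
centroids, clusters = spherical_kmeans(points, [[1.0, 0.0], [0.0, 1.0]])
```

Because every vector lies on the unit sphere, maximizing the dot product is equivalent to maximizing cosine similarity, which suits direction-dominated data such as term frequencies.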
The datasets in this study were provided by Happywhale through Kaggle and comprise about 48 thousand images, including 31 thousand whale and 16 thousand dolphin images. Three major species each of whales and dolphins were selected from the original dataset. This study developed a novel classification model that may help marine mammal scientists monitor endangered whales and dolphins. We implemented an ensemble transfer learning model to improve classification performance, combining five pre-trained CNN models on the selected datasets. The performance of the classification model was measured with four metrics: accuracy, precision, recall, and F1 score. The proposed ensemble transfer learning model performs better overall than the individual models on the selected dataset. Although we encountered hardware limitations and challenges in executing the ensemble transfer learning model with large datasets, we gained experience with other pre-trained CNN models that we could investigate further in future work.
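The ensemble above combines five pre-trained CNNs; one common way to do so, averaging each model's per-class probabilities and taking the argmax (soft voting), can be sketched as follows (model outputs fabricated; the thesis's actual CNNs and combination rule are not shown):

```python
def soft_vote(prob_lists):
    """Average per-class probabilities from several models; return the
    averaged distribution and the winning class index."""
    n_models = len(prob_lists)
    n_classes = len(prob_lists[0])
    avg = [sum(p[i] for p in prob_lists) / n_models for i in range(n_classes)]
    return avg, max(range(n_classes), key=avg.__getitem__)

# Three fabricated models scoring one image over three species classes.
model_outputs = [
    [0.6, 0.3, 0.1],   # model 1 favors class 0
    [0.2, 0.5, 0.3],   # model 2 favors class 1
    [0.5, 0.4, 0.1],   # model 3 favors class 0
]
avg, label = soft_vote(model_outputs)   # avg ~ [0.433, 0.400, 0.167] -> class 0
```

Soft voting lets a confident minority outweigh a narrow majority, which is why it often beats hard (majority-label) voting when the member models are well calibrated.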