Date of Award


Document Type

Thesis (Ph.D.)

Department or Program

Department of Computer Science

First Advisor

V.S. Subrahmanian


Google’s Android operating system was first announced to the public in 2007 and was installed on more than three billion mobile devices by 2019. As Android has spread, so has Android malware: malicious software designed to exploit the Android operating system running on smart devices. Some variants can disable or lock the device, allow a malicious actor to control it remotely, or track the user’s activity. Moreover, the obfuscation and detection-evasion techniques used by modern Android malware have grown considerably more sophisticated in recent years, rendering many traditional detection methods (e.g., signature-based detection) obsolete. Meanwhile, malware samples from the same family may disguise themselves with different functionality. The features of a family may remain relatively stable over time to preserve its purpose, or may evolve to cope with emerging detection techniques. To tackle malware proliferation, we need a scalable Android malware detection approach that can easily and reliably identify malicious applications. Although numerous malware detection tools have been developed, including system-level and network-level approaches, scaling effective, lightweight detection to large collections of apps remains challenging.

In this thesis, I propose data-driven methods to detect and analyze malicious Android applications using static, dynamic, and custom-designed advanced features. First, I propose advanced features that improve both Android malware detection results and the robustness of the detection system, focusing specifically on Android spyware, banking Trojans, and rooting malware. Second, I analyze feature stability in 122 general Android malware families and 120 Android goodware families. The analysis rests on the definition of a tau-Homogeneous Partition and on a stability score for feature vectors over a given period, and it reveals the most stable and most unstable features as general Android malware evolves over time. This thesis gives researchers a detailed picture of Android malware detection based on data-driven analysis with basic and advanced features. It can also serve as a framework for subsequent studies and help guide research in the area more broadly.