Machine learning based solutions have been successfully employed for automatic detection of malware on Android. However, machine learning models lack robustness against adversarial examples, which are crafted by adding carefully chosen perturbations to normal inputs. So far, adversarial examples could only deceive detectors that rely on syntactic features (e.g., requested permissions, API calls, etc.), and the perturbations could only be implemented by simply modifying the application's manifest. Since recent Android malware detectors rely more on semantic features extracted from Dalvik bytecode than on the manifest, existing attack and defense methods are no longer effective.
In this paper, we introduce a new attack that generates adversarial examples of Android malware which evade detection by current models. To this end, we propose a method for applying optimal perturbations to an Android APK that successfully deceive machine learning detectors. We develop an automated tool that generates the adversarial examples without human intervention. In contrast to existing works, the adversarial examples crafted by our method can also deceive recent machine learning based detectors that rely on semantic features such as the control flow graph. The perturbations are implemented directly in the APK's Dalvik bytecode rather than the Android manifest, so as to evade recent detectors. We demonstrate our attack on two state-of-the-art Android malware detection schemes, MaMaDroid and Drebin. Our results show that the malware detection rate decreased from 96% to 0% in MaMaDroid and from 97% to 0% in Drebin, with only a small amount of code inserted into the APK.
With the growth of mobile applications and their users, security has increasingly become a great concern for various stakeholders. According to McAfee's report, the number of mobile malware samples increased to 22 million in the third quarter of 2017. Symantec further reported that, on the Android platform, one in every five mobile applications is actually malware. Hence, it is not surprising that the demand for automated tools for detecting and analyzing mobile malware has also risen. Most researchers and practitioners in this area target the Android platform, which dominates the mobile OS market. To date, there has been a growing body of research on malware detection for Android. Among the proposed methods, machine learning based solutions have been increasingly adopted by anti-malware companies due to their anti-obfuscation nature and their capability of detecting malware variants as well as zero-day samples. Despite the benefits of machine learning based detectors, it has been revealed that such detectors are vulnerable to adversarial examples. Such adversarial examples are crafted by adding carefully designed perturbations to legitimate inputs that force machine learning models to output false predictions. Analogously, adversarial examples against machine learning based detection act much like HIV, which progressively disables the human immune system. We chose malware detection on the Android platform to assess the feasibility of adversarial examples against a core security problem.
In contrast to the same issue in other areas such as image classification, the span of acceptable perturbations is greatly reduced: an image is represented by pixel values in the feature space, and the adversary can modify the feature vector almost arbitrarily as long as the modified image remains visually indistinguishable; in the context of crafting adversarial examples for Android malware, however, a successful attack must comply with the following restrictions, which make the problem much more challenging than image classification: 1) the perturbation must not jeopardize the malware's original functions, and 2) the perturbation to the feature space must be practically implementable in the Android Package (APK), meaning that the perturbation can be realized in the program code of an unpacked malware sample and the result can be repacked/rebuilt into a working APK. So far, there have been a few attempts at crafting and defending against adversarial examples targeting machine learning based malware detection on the Android platform. However, the validity of these works is usually questionable due to their impracticality. For example, Chen et al. proposed to inject crafted adversarial examples into the training dataset so as to reduce detection accuracy. This method is impractical because attackers cannot easily gain access to the training dataset in most use cases. Grosse et al. explored the feasibility of crafting adversarial examples on the Android platform, but their malware-detecting classifier was limited to a Deep Neural Network (DNN); they could not guarantee the success of adversarial examples against traditional machine learning detectors such as Random Forest (RF) and Support Vector Machine (SVM). Demontis et al. proposed a theoretically sound learning algorithm to train linear classifiers with more evenly distributed feature weights, which improves system security without significantly affecting computational efficiency. Chen et al.
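To make restriction 1) concrete, consider a binary feature space in which each entry marks the presence of an API call or permission: the adversary may add features (e.g., insert dead code or extra declarations) but must not remove any, or the malware's original functionality may break. The following is a minimal, hypothetical sketch of selecting one add-only perturbation from a detector's input gradient; it illustrates the constraint, not the paper's exact algorithm:

```python
def choose_additive_perturbation(x, gradient):
    """Pick the absent feature whose addition most decreases the malware
    score, i.e., the most negative gradient among zero-valued entries.
    Returns the feature index to flip, or None if no addition helps."""
    best_idx, best_grad = None, 0.0
    for i, (xi, gi) in enumerate(zip(x, gradient)):
        if xi == 0 and gi < best_grad:  # only 0 -> 1 flips are allowed
            best_idx, best_grad = i, gi
    return best_idx

# Toy example with hypothetical gradients of the detector's malware score.
x = [1, 0, 0, 1]             # current binary feature vector
g = [-0.9, 0.2, -0.5, 0.1]   # d(score)/d(feature)
idx = choose_additive_perturbation(x, g)
# Feature 0 has the most negative gradient but is already present, so it
# cannot be "added"; feature 2 is chosen instead.
print(idx)  # -> 2
```

In an image-classification attack, feature 0 would be the natural choice; the add-only constraint is precisely what makes the malware setting harder.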
also developed an ensemble learning method against adversarial examples. Yang et al. constructed new malware variants for malware detectors to test and strengthen their detection signatures/models. According to our research, all these ideas can only be applied to malware detectors that adopt syntactic features (e.g., permissions requested in the manifest or specific APIs in the source code). However, almost all recent machine learning based detection methods rely more on semantic features collected from the Dalvik bytecode (i.e., classes.dex). This disables existing methods of crafting and defending against adversarial examples on the Android platform. Moreover, it is usually simple for existing methods to modify the manifest to generate adversarial examples; when the features are collected from the bytecode, however, it becomes very challenging to modify the bytecode without changing the original functionality, due to its programmatic complexity. Therefore, existing works are of little value in providing proactive solutions to the ever-evolving adversarial examples in the form of Android malware variants. In this paper, we propose and study a highly effective attack that generates adversarial malware examples on the Android platform, which evade detection by current machine learning based detectors. In the real world, defenders and attackers are engaged in a never-ending war. To increase the robustness of Android malware detectors against malware variants, we need to be proactive and take potential adversarial scenarios into account while designing the detectors. The work in this paper envisions an advanced method to craft adversarial examples of Android malware. The results can be used by Android malware detectors to identify malware variants with manipulated features. For convenience of description, we selected two typical Android malware detectors, MaMaDroid and Drebin.
These two detectors model malware behaviors with semantic and syntactic features, respectively.
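For context on what "semantic features" means here: MaMaDroid abstracts an app's API calls (e.g., to the family or package level) and uses the transition probabilities of a Markov chain built over call sequences as its feature vector. The sketch below is a simplified illustration of that idea, with hypothetical state names, not MaMaDroid's exact pipeline:

```python
from collections import defaultdict

def markov_features(call_sequence, states):
    """Build a MaMaDroid-style feature vector: transition probabilities
    between abstracted API-call states along an execution path."""
    counts = defaultdict(float)
    for src, dst in zip(call_sequence, call_sequence[1:]):
        counts[(src, dst)] += 1.0
    features = []
    for s in states:
        total = sum(counts[(s, d)] for d in states)
        for d in states:
            features.append(counts[(s, d)] / total if total else 0.0)
    return features

# Toy call sequence abstracted to the "family" level (hypothetical states).
seq = ["android", "java", "android", "self", "android"]
states = ["android", "java", "self"]
fv = markov_features(seq, states)
print(len(fv))  # 3 x 3 = 9 transition probabilities
```

Because such features are derived from the bytecode's call structure, evading them requires changing the call graph itself, which is exactly why manifest-only perturbations fail against this class of detector.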
We summarize the key contributions of this paper as follows:
- Technically, we propose an innovative method of crafting adversarial examples against recent machine learning based detectors for Android malware (e.g., Drebin and MaMaDroid). These detectors mainly collect features (either syntactic or semantic) from the Dalvik bytecode to capture the behaviors of Android malware. This contribution is distinguishable from existing works, which can only target/protect detectors relying on syntactic features.
- Practically, we designed an automated tool to apply the method to real-world malware samples. The tool calculates the perturbations, modifies the source files, and rebuilds the modified APK. A key contribution is that the tool adds the perturbations directly to the APK's classes.dex, in contrast to existing works that simply apply perturbations to AndroidManifest.xml. Although manifest-based perturbation is easy to implement, it cannot target/protect recent Android malware detectors, which do not extract features from the manifest.
- We evaluated the proposed adversarial example manipulation methods using the same datasets that Drebin and MaMaDroid used (5,879 malware samples). Our results show that the malware detection rate decreased from 96% to 0% in MaMaDroid and from 97% to 0% in Drebin, with only a small distortion generated by our adversarial example manipulation method.
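The decode-modify-rebuild workflow behind the second contribution can be pictured as a thin wrapper around the standard apktool CLI. This is a hedged sketch, not the paper's actual tool: `apply_perturbations` is a placeholder for the perturbation step, and error handling and APK signing are omitted.

```python
import pathlib
import subprocess

def repack_with_perturbations(apk_path, out_apk, apply_perturbations):
    """Decode an APK to smali, let `apply_perturbations` edit the decoded
    sources, then rebuild. Assumes `apktool` is on the PATH; the rebuilt
    APK must still be signed before it can be installed."""
    workdir = pathlib.Path(apk_path).stem + "_decoded"
    # apktool d: disassemble classes.dex into editable smali files.
    subprocess.run(["apktool", "d", "-f", apk_path, "-o", workdir], check=True)
    # Placeholder: e.g., insert no-op API calls into selected smali files.
    apply_perturbations(pathlib.Path(workdir) / "smali")
    # apktool b: reassemble the modified sources into a new APK.
    subprocess.run(["apktool", "b", workdir, "-o", out_apk], check=True)
```

Working at the smali level is what lets perturbations land in classes.dex, where semantic-feature detectors look, rather than only in the manifest.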
Recent studies in adversarial machine learning and computer security have shown that, due to its weakness in battling adversarial examples, machine learning can be a weak point of a security system. This vulnerability may further result in the compromise of the overall security system. The underlying reason is that machine learning techniques were not originally designed to cope with intelligent and adaptive adversaries, who can manipulate input data to mislead the learning system. The goal of this work has been, more specifically, to show that adversarial examples can be very effective against Android malware detectors. To this end, we first introduced a DNN-based substitute model to calculate optimal perturbations that also comply with the APK feature interdependence. We then developed an automated tool to implement the perturbations onto the source files (e.g., smali code) of a targeted malware sample. According to the evaluation results, the Android malware detection rate decreased from 96% to 0% in MaMaDroid (a typical detector that uses semantic features). We also tested Drebin (a typical detector that uses syntactic features but also collects some features from classes.dex) and found that its detection rate decreased from 97% to 0%. To the best of our knowledge, our work is the first to overcome the challenge of targeting recent Android malware detectors, which mainly collect semantic features from the APK's classes.dex rather than syntactic features from AndroidManifest.xml. Our future work will focus on two areas: defense mechanisms against such attacks, and attack modifications to cope with such mechanisms. In this paper, we only present in Section VI-D a brief discussion of the feasibility and effectiveness of adversarial training and an ensemble learning defense. In the next stage, we plan to conduct in-depth analyses of various defense mechanisms.
We will also compare the effectiveness of different substitute model architectures.
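The substitute-model step mentioned in the conclusion can be illustrated as follows. The paper trains a DNN substitute on the target detector's outputs and uses its input gradients to choose perturbations; for brevity, this hedged sketch stands in a logistic-regression substitute and synthetic labels in place of a real black-box detector:

```python
import numpy as np

# Synthetic stand-in for the black-box detector: binary app features and
# labels produced by an unknown linear rule.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 10)).astype(float)
w_true = rng.normal(size=10)
y = (X @ w_true > 0).astype(float)   # "detector" labels used for training

# Train the substitute (logistic regression via plain gradient descent;
# the paper's substitute is a DNN, but the gradient-transfer idea is the same).
w = np.zeros(10)
for _ in range(500):
    p = 1 / (1 + np.exp(-(X @ w)))
    w -= 0.1 * X.T @ (p - y) / len(y)

def input_gradient(x):
    """Gradient of the substitute's malware score w.r.t. the input
    features; the most negative entries mark features worth adding."""
    p = 1 / (1 + np.exp(-(x @ w)))
    return p * (1 - p) * w

g = input_gradient(X[0])
print(g.shape)  # -> (10,)
```

Because adversarial examples tend to transfer between models, perturbations chosen from the substitute's gradients can evade the original detector even though its internals were never accessed.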