Hibridna metoda otkrivanja zlonamjernih programa

Gržinić, Toni



doctoral thesis

Varaždin: University of Zagreb, Faculty of Organization and Informatics Varaždin, 2017. urn:nbn:hr:211:664288

University of Zagreb
Faculty of Organization and Informatics

Institutional repository: Faculty of Organization and Informatics - Digital Repository

Cite this document

APA 6th Edition

Gržinić, T. (2017). Hibridna metoda otkrivanja zlonamjernih programa (Doctoral thesis). Varaždin: University of Zagreb, Faculty of Organization and Informatics. Retrieved from https://urn.nsk.hr/urn:nbn:hr:211:664288

MLA 8th Edition

Gržinić, Toni. "Hibridna metoda otkrivanja zlonamjernih programa." Doctoral thesis, University of Zagreb, Faculty of Organization and Informatics, 2017. https://urn.nsk.hr/urn:nbn:hr:211:664288

Chicago 17th Edition

Gržinić, Toni. "Hibridna metoda otkrivanja zlonamjernih programa." Doctoral thesis, University of Zagreb, Faculty of Organization and Informatics, 2017. https://urn.nsk.hr/urn:nbn:hr:211:664288

Harvard

Gržinić, T. (2017). 'Hibridna metoda otkrivanja zlonamjernih programa', Doctoral thesis, University of Zagreb, Faculty of Organization and Informatics, accessed 25 December 2024, https://urn.nsk.hr/urn:nbn:hr:211:664288

Vancouver

Gržinić T. Hibridna metoda otkrivanja zlonamjernih programa [Doctoral thesis]. Varaždin: University of Zagreb, Faculty of Organization and Informatics; 2017 [cited 2024 December 25] Available at: https://urn.nsk.hr/urn:nbn:hr:211:664288

IEEE

T. Gržinić, "Hibridna metoda otkrivanja zlonamjernih programa", Doctoral thesis, University of Zagreb, Faculty of Organization and Informatics, Varaždin, 2017. Available at: https://urn.nsk.hr/urn:nbn:hr:211:664288

Cite this item: https://urn.nsk.hr/urn:nbn:hr:211:664288

Metadata

Title	Hibridna metoda otkrivanja zlonamjernih programa
Title (english)	Hybrid method of detecting malware
Author	Toni Gržinić
Mentor	Mirko Čubrilo (mentor) MBZ: 135963
Mentor	Tonimir Kišasondi (co-mentor) MBZ: 304134
Committee member	Marin Golub (committee chair) MBZ: 206824
Committee member	Željko Hutinski (committee member) MBZ: 85631
Committee member	Mirko Maleković (committee member) MBZ: 169522
Granter	University of Zagreb Faculty of Organization and Informatics Varaždin
Defense date and country	2017-12-13, Croatia
Scientific / art field, discipline and subdiscipline	SOCIAL SCIENCES Information and Communication Sciences
Universal decimal classification (UDC)	004 - Computer science and technology. Computing. Data processing
Abstract	Malicious programs (malware) are today the greatest threat to business organizations worldwide. They are developed primarily to steal data and enable the theft of data and money from their victims. There is hardly a business organization or home user in the world that has not encountered malicious code, and recently cases from Croatia have joined the many incidents of damage caused by malware. In recent years, ransomware has encrypted data on computers and demanded a ransom for its recovery, while banking trojans inject themselves into benign programs, intercept users' communication with e-banking services and steal funds from bank accounts. Unlike targeted attacks, which require a large investment to discover unknown vulnerabilities and to plan the attack itself, the logic of common malware is very simple: it is developed to efficiently infect as many reachable computers as possible. For the last twenty years or so, the best-known protection against malicious content has been antivirus software, whose detection mechanism is based on signatures and heuristics. Analysts at security companies derive signatures, i.e. patterns that recognize malware, by analyzing malicious programs. Heuristic methods work similarly: they are rules for detecting generic variants of malware. The problem with current detection methods is that they depend heavily on the work and insight of analysts who examine previously unknown malware variants. Moreover, as the number of malware variants grows, signature databases grow with it, and this rapid growth makes them an unsustainable model for recognizing new malware. The goal of this research is to build a new method that recognizes malware without using signatures.
The new method described in this research uses static and dynamic program analysis, two complementary methods that give different views of the functionality and intent of the analyzed program. The new hybrid method uses three basic classifiers, CS, CD1 and CD2, for the final classification of a program's nature. The basic classifiers were trained on TRAIN-1, a stratified sample of malware that appeared between 2011 and 2016, together with popular benign programs. The full set of programs from which the stratified sample was drawn was collected from open sources on the internet, and malware used in APT (advanced persistent threat) attacks was added manually. The stratified set TRAIN-1 contains 2064 malicious and 980 benign programs. To test the effectiveness of the hybrid classifier, an independent set TEST-1 was also collected; it contains malware that appeared while the research was being carried out and that popular antivirus tools were unable to detect. To select suitable machine learning methods for the basic classifiers, the following methods were evaluated: logistic regression, naive Bayes, support vector machines, decision trees and random forests. The methods were compared to identify the standard algorithm that achieves the highest classification accuracy on the given dataset. Classifier CS uses features obtained by static analysis of the Portable Executable file, classifier CD1 uses features describing the characteristics of calls to the operating system, and classifier CD2 uses the order of calls to the operating system.
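The abstract states only that CD2 works on the order of calls to the operating system. One common way to turn an ordered call trace into a fixed-length feature vector is to count n-grams of consecutive API names. The sketch below assumes that representation, with hypothetical traces and n = 2; it is an illustration, not necessarily the encoding used in the thesis.

```python
from collections import Counter
from typing import Iterable


def call_ngrams(calls: list[str], n: int = 2) -> Counter:
    """Count overlapping n-grams of consecutive API call names."""
    return Counter(tuple(calls[i:i + n]) for i in range(len(calls) - n + 1))


def vectorize(traces: Iterable[list[str]], n: int = 2) -> tuple[list[tuple[str, ...]], list[list[int]]]:
    """Build a shared, sorted n-gram vocabulary and one count vector per trace."""
    counts = [call_ngrams(t, n) for t in traces]
    vocab = sorted(set().union(*counts))
    return vocab, [[c[g] for g in vocab] for c in counts]


if __name__ == "__main__":
    # Hypothetical traces; the names are real Windows native API functions used for illustration only.
    traces = [
        ["NtCreateFile", "NtWriteFile", "NtWriteFile", "NtClose"],
        ["NtOpenKey", "NtQueryValueKey", "NtClose"],
    ]
    vocab, X = vectorize(traces)
    for gram, column in zip(vocab, zip(*X)):
        print(gram, column)
```

Bigrams keep local ordering information (for example a write followed by a close) while producing vectors that any of the evaluated learners can consume; larger n captures longer patterns at the cost of a much larger vocabulary.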
The hybrid method was built following good data mining practice: the features of the basic classifiers were transformed, and the accuracy of the machine learning methods on the full feature sets was compared with their accuracy on reduced feature sets obtained by recursive feature elimination (RFE) and principal component analysis (PCA). The hybrid classifier CH makes the final classification from the class-membership probabilities produced by each basic classifier, i.e. the probabilities that a program is malicious or benign. Stacking the classifiers in this way allows the hybrid method to be extended with new classifiers for specific purposes. Unlike similar research, the hybrid method does not combine the features of static and dynamic analysis but separate classifiers for each analysis. Stacking yields better results because the basic classifiers use mutually different features, and it recognizes new and previously unknown malware variants better than currently used and reference methods. The hybrid classifier outperforms each of the basic classifiers CS, CD1 and CD2 because it draws on different features and different classifiers, which ultimately yields a more accurate decision. The assumption that an ensemble, or the choice of a crowd, is more accurate than an individual decision dates back to Galton's essay Vox populi [30] and has been confirmed in many studies; however, the performance of ensembles relative to individual classifiers varies widely and has not been sufficiently explored in the malware detection domain. On the independent set, the hybrid method achieved higher accuracy than antivirus tools, 98% versus 86%, as well as better results than similar reference studies.
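The "vox populi" argument can be made concrete with a simple worked example. Assume, purely for illustration, three basic classifiers that are each correct with probability p and whose errors are independent. A majority vote is then correct whenever at least two of the three are correct:

```latex
P_{\text{maj}} = \binom{3}{3}p^{3} + \binom{3}{2}p^{2}(1-p) = 3p^{2} - 2p^{3},
\qquad
P_{\text{maj}}\big|_{p=0.9} = 0.729 + 0.243 = 0.972 .

P_{\text{maj}} - p = -p\,(2p-1)(p-1) > 0 \iff \tfrac{1}{2} < p < 1 .
```

The gain depends entirely on the independence assumption. Classifiers whose errors are correlated gain less, which is consistent with the observation that ensemble performance varies widely. The hybrid classifier CH stacks class probabilities rather than taking a plain majority vote, so this calculation only motivates the idea.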
In addition, the feature sets used in the basic classifiers differ from those in similar research and are extended with new features. For example, the static features S check the integrity of the contained addresses, i.e. whether there are inconsistencies in the alignment of the executable in memory, and the presence of suspicious functions that malware uses to hide from analysis. The dynamic features D1 describe categories of calls to the operating system, such as the success ratio of the calls, their duration, statistics on the call arguments and similar. The social value of the hybrid method lies in processing a larger volume of suspicious programs and content, which relieves analysts' workload and enables early recognition of dangerous programs that can cause significant damage in energy facilities, the banking industry and similar special-purpose systems.
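The abstract does not list the exact address-integrity checks. As a hedged illustration, the sketch below applies a few consistency rules taken from the PE format specification (FileAlignment a power of two between 512 and 64K, SectionAlignment not smaller than FileAlignment, section addresses and raw-data offsets aligned, sections not overlapping in memory, entry point inside a section) to header values assumed to be already parsed. The `Section` dataclass and the function names are hypothetical, and a rule firing does not by itself prove a file is malicious.

```python
from dataclasses import dataclass


@dataclass
class Section:
    """Hypothetical, already-parsed fields of one PE section header."""
    name: str
    virtual_address: int
    virtual_size: int
    pointer_to_raw_data: int
    size_of_raw_data: int


def mapped_size(s: Section) -> int:
    # Use VirtualSize, falling back to SizeOfRawData when VirtualSize is 0.
    return s.virtual_size or s.size_of_raw_data


def address_anomalies(entry_point: int, section_alignment: int,
                      file_alignment: int, sections: list[Section]) -> list[str]:
    """Return address-integrity anomalies; an empty list means no rule fired."""
    issues: list[str] = []
    if file_alignment <= 0 or file_alignment & (file_alignment - 1) or not 512 <= file_alignment <= 0x10000:
        issues.append("FileAlignment is not a power of two between 512 and 64K")
    if section_alignment < file_alignment:
        issues.append("SectionAlignment is smaller than FileAlignment")
    for s in sections:
        if section_alignment > 0 and s.virtual_address % section_alignment:
            issues.append(f"{s.name}: VirtualAddress not aligned to SectionAlignment")
        if file_alignment > 0 and s.pointer_to_raw_data % file_alignment:
            issues.append(f"{s.name}: PointerToRawData not aligned to FileAlignment")
    # Sections sorted by virtual address must not overlap once mapped into memory.
    ordered = sorted(sections, key=lambda s: s.virtual_address)
    for a, b in zip(ordered, ordered[1:]):
        if a.virtual_address + mapped_size(a) > b.virtual_address:
            issues.append(f"{a.name} overlaps {b.name} in memory")
    # A non-zero entry point should fall inside some section (DLLs may use 0).
    if entry_point and not any(
            s.virtual_address <= entry_point < s.virtual_address + mapped_size(s) for s in sections):
        issues.append("AddressOfEntryPoint lies outside every section")
    return issues
```

Each fired rule could become one binary feature for a static classifier such as CS, or the list length could be used as a single anomaly count.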
Abstract (english)	Today, malware is the biggest threat to the information security and business continuity of organizations around the world. Criminals develop malware mainly to steal victims' money or sensitive data, or simply to cause damage. There is probably no company or home user worldwide that has not been infected by some variant of malware. Recently, many organizations have suffered significant financial losses from malware infections, and Croatian companies have been no exception. The most prominent malware families include ransomware, which encrypts victims' data and demands a ransom for its recovery, and banking trojans, which intercept sensitive data in order to steal funds from bank accounts. Unlike advanced persistent threats, which are expensive because they require zero-day exploits and careful planning to infect their targets, generic malware is developed to infect as many machines running commonly used operating systems as possible. For the last twenty years, computer systems have been protected against malicious content primarily by antivirus software, which relies on signatures and heuristics to detect malware. When analyzing new malware samples, malware analysts identify specific patterns, known as signatures, which are added to the signature database so that the antivirus can detect that specific malware. Similarly, heuristics are sets of generic rules used to detect variants of similar malware more efficiently. The detection process is therefore costly, because it depends heavily on the manual work of malware analysts. Given this cost and the exponential growth in new malware samples in recent years, signature databases are becoming an unsustainable model for detecting new malware.
The main goal of this research is to develop a method that detects malware in the Portable Executable format, commonly used on Microsoft Windows, without using signatures. The proposed hybrid method combines the complementary results of static and dynamic analysis, which give different insights into the functionality and behavior of the analyzed program. Cuckoo Sandbox, an open-source malware analysis sandbox, is used for dynamic analysis of the collected executables. The hybrid method uses three basic classifiers, CS, CD1 and CD2, to classify programs. Each classifier is trained individually on the dataset TRAIN-1, which consists of malware samples from 2011–2016 stratified by malware family, together with known benign programs such as executables from current Microsoft Windows versions and utility applications. Stratification reduced the initial dataset of 19 877 malicious programs to TRAIN-1, which contains 2064 malicious and 980 benign programs. Several machine learning methods were compared, and the method that yielded the best accuracy on the given dataset was used as the basic classifier. The following methods were evaluated when choosing the basic classifiers CS, CD1 and CD2: Logistic Regression, Naïve Bayes, Decision Tree (C4.5), Support Vector Machine and Random Forest. Classifier CS uses features extracted from the Portable Executable format, classifier CD1 uses features describing the system calls intercepted during dynamic analysis, and classifier CD2 uses the sequence of system calls. Best practices were followed during the development of the hybrid method; for example, data transformations and feature selection procedures were performed to identify appropriate feature subsets. Principal Component Analysis and Recursive Feature Elimination were used for dimensionality reduction and feature selection. The final hybrid classifier CH classifies a program using the class probabilities of the basic classifiers, i.e. the probabilities, given by each basic classifier, that the program is malware or benign.
Combining the classifiers by stacking (stacked generalization) allows the hybrid method to be extended with new classifiers designed for special purposes, for example basic classifiers for detecting malware that targets specific banking systems or industrial control systems. The hybrid classifier achieves better results than each individual basic classifier CS, CD1 or CD2. An independent dataset, TEST-1, consisting of previously unseen malware was collected to test the effectiveness of the hybrid classifier in real-world scenarios. On TEST-1, the hybrid method achieved higher accuracy than state-of-the-art antivirus tools, 98% compared to 86% (Kaspersky AV), and also outperformed similar benchmark research. Combining the basic classifiers by stacked generalization, rather than combining static and dynamic features as similar previous research primarily does, is a novel contribution of this research. In addition, the features used in the basic classifiers differ from those of similar research and are extended with novel features. For example, the static analysis features S include address integrity checks, which detect incorrect addressing when a program is loaded into memory, as well as the presence of suspicious system calls commonly used to thwart malware analysis (e.g. anti-debugging, anti-VM, packing and others). The dynamic analysis features D1 describe system call categories, for example the ratio of successful system calls, their duration, statistics on system call arguments and similar.
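The D1 features are described only at the level of call categories: success ratio, duration and argument statistics. The sketch below aggregates such statistics per category from a list of call records. The `CallRecord` fields are a hypothetical simplification of a sandbox behavior log (for example one produced by Cuckoo Sandbox), not Cuckoo's actual report schema, and the feature names are illustrative.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class CallRecord:
    """Hypothetical simplified record of one intercepted system call."""
    category: str       # e.g. "file", "registry", "network"
    success: bool       # whether the call returned a success status
    duration_ms: float  # time spent in the call
    n_args: int         # number of arguments passed to the call


def d1_features(calls: list[CallRecord]) -> dict[str, float]:
    """Aggregate per-category statistics into one flat feature dictionary."""
    groups: dict[str, list[CallRecord]] = defaultdict(list)
    for c in calls:
        groups[c.category].append(c)
    features: dict[str, float] = {}
    for cat, items in sorted(groups.items()):
        features[f"{cat}_count"] = len(items)
        features[f"{cat}_success_ratio"] = sum(c.success for c in items) / len(items)
        features[f"{cat}_mean_duration_ms"] = mean(c.duration_ms for c in items)
        features[f"{cat}_mean_args"] = mean(c.n_args for c in items)
    return features


if __name__ == "__main__":
    calls = [CallRecord("file", True, 2.0, 4), CallRecord("file", False, 4.0, 2),
             CallRecord("registry", True, 1.0, 3)]
    print(d1_features(calls))
```

Categories absent from a trace produce no keys here; a real pipeline would fill them with defaults so every program yields a vector of the same length.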
Using stacked generalization to combine classifiers in the hybrid method yields better accuracy because the basic classifiers use diverse features to build the final model. This also confirms the hypothesis that a combination of basic classifiers produces better results than a single classifier, and as a result the proposed hybrid classifier can detect new malware variants more efficiently than currently used methods.
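The stacking step can be sketched as a meta-learner trained on the malware probabilities emitted by the basic classifiers. The example below assumes a logistic regression meta-learner fitted by plain gradient descent on the triple (P_CS, P_CD1, P_CD2); the abstract does not state which learner CH uses, and the probabilities are made up. In practice the training probabilities should come from out-of-fold (cross-validated) predictions of the basic classifiers, so that CH does not learn from overfitted scores.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_meta(P: list[list[float]], y: list[int], lr: float = 0.5, epochs: int = 2000) -> list[float]:
    """Fit logistic-regression weights [bias, w_CS, w_CD1, w_CD2] by batch gradient descent."""
    w = [0.0] * (len(P[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for p, label in zip(P, y):
            # Prediction error of the current model on one training example.
            err = sigmoid(w[0] + sum(wi * pi for wi, pi in zip(w[1:], p))) - label
            grad[0] += err
            for j, pj in enumerate(p):
                grad[j + 1] += err * pj
        w = [wi - lr * g / len(P) for wi, g in zip(w, grad)]
    return w


def predict_malware_proba(w: list[float], p: list[float]) -> float:
    """Probability that a sample is malware, given the basic classifiers' outputs."""
    return sigmoid(w[0] + sum(wi * pi for wi, pi in zip(w[1:], p)))


if __name__ == "__main__":
    # Made-up out-of-fold malware probabilities from CS, CD1, CD2 (label 1 = malware).
    P = [[0.9, 0.8, 0.7], [0.6, 0.9, 0.95], [0.85, 0.6, 0.9],
         [0.2, 0.1, 0.3], [0.4, 0.2, 0.1], [0.1, 0.3, 0.2]]
    y = [1, 1, 1, 0, 0, 0]
    w = fit_meta(P, y)
    print([round(x, 2) for x in w])
    print(round(predict_malware_proba(w, [0.7, 0.4, 0.8]), 3))
```

Because CH sees only class probabilities, adding a new special-purpose basic classifier amounts to appending one more column to `P` and refitting the meta-learner, which is the extensibility property the abstract attributes to stacking.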
Keywords
Keywords (english)
Language	Croatian
URN:NBN	urn:nbn:hr:211:664288
Study programme	Title: Postgraduate doctoral study in Information Science; Study programme type: university; Study level: postgraduate; Academic / professional title: Doctor of Science, area of social sciences, field of information and communication sciences
Type of resource	Text
File origin	Born digital
Access conditions	Open access
Terms of use
Created on	2018-01-15 16:34:10
