Learning on User Behavior for Novel Worm Detection

  • Published on
    19-Dec-2015

Transcript

<ul><li> Slide 1 </li>
<li> Learning on User Behavior for Novel Worm Detection </li>
<li> Slide 2 </li>
<li> Steve Martin, Anil Sewani, Blaine Nelson, Karl Chen, and Anthony Joseph. {steve0, anil, nelsonb, quarl, adj}@cs.berkeley.edu. University of California at Berkeley. </li>
<li> Slide 3 </li>
<li> The Problem: Email Worms. Email worms cause billions of dollars of damage yearly. Nearly all of the most virulent worms of 2004 spread by email (source: http://www.sophos.com). </li>
<li> Slide 4 </li>
<li> Current Solutions: Signature-based methods are effective against known worms only. 25 new Windows viruses a day were released during 2004! The human element slows reaction times. Signature generation can take hours to days. Signature acquisition and application can take hours to never. Signature methods are mired in an arms race. MyDoom.m and Netsky.b got through EECS mail scanners. </li>
<li> Slide 5 </li>
<li> Statistical Approaches: Unsupervised learning on network behavior. Leverage a behavioral invariant: a worm seeks to propagate itself over a network. Previous work: novelty detection by itself is not enough. Many false negatives = worm attack will succeed. Many false positives = irritated network admins. Common solution: make the novelty detector model very sensitive. Tradeoff: this introduces additional false positives, which can render a detection system useless. </li>
<li> Slide 6 </li>
<li> Our Approach: Use a two-layer approach to filter novelty detector results. The novelty detector minimizes false negatives. A secondary classifier filters out false positives. Leverage human reactions and existing methods to improve the secondary classifier. Use supervisor feedback to partially label the data corpus. Correct and retrain as signatures become available. Filter novelty detection results with a per-user classifier trained on semi-supervised data. </li>
<li> Slide 7 </li>
<li> Per-User Detection Pipeline </li>
<li> Slide 8 </li>
<li> Pipeline Details: Both per-email and per-user features are used.
User features capture elements of behavior over a window of time. Email features examine individual snapshots of behavior. Any novelty detector can be inserted. These results use a Support Vector Machine. One SVM is trained on all users' normal email. A parametric classifier leverages distinct feature distributions via a generative graphical model. A separate model is fit for each user. The classifier retrains over semi-supervised data. </li>
<li> Slide 9 </li>
<li> System Deployment </li>
<li> Slide 10 </li>
<li> Using Feedback: Use existing virus scanners to update the corpus. For each email within the last d days: if the scanner returns virus, we label it virus; if the scanner returns clean, we leave the current label. Outside the previous d days, the scanner labels directly. Threshold the number of emails classified as virus to detect user infection. The machine is quarantined and infected emails are queued. If the infection is confirmed, i random messages from the queue are labeled by the supervisor. The model is retrained. Labels are retained until the virus scanner corrects them. </li>
<li> Slide 11 </li>
<li> Feedback Utilization Process </li>
<li> Slide 12 </li>
<li> Evaluation: Examined feature distributions on real email. Live study with an augmented mail server and 20 users. Used the Enron data set for further evaluation. Collected virus data for six email worms using virtual machines and a real address book: BubbleBoy, MyDoom.u, MyDoom.m, Netsky.d, Sobig.f, Bagle.f. Constructed training/test sets of real email traffic artificially infected with viruses. Infections were interleaved while preserving intervals between worm emails. </li>
<li> Slide 13 </li>
<li> Results I. Average accuracy: 79.45%. Training set: 1000 infected emails from 5 different worms, 400 clean emails. Test set: 200 infected emails, 1200 clean emails. Table 1. 
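Results using only SVM
<table>
<tr><th>Virus Name</th><th>False Positives</th><th>False Negatives</th><th>Accuracy</th></tr>
<tr><td>BubbleBoy</td><td>23.56%</td><td>1.01%</td><td>79.64%</td></tr>
<tr><td>Bagle.F</td><td>23.90%</td><td>0.00%</td><td>79.50%</td></tr>
<tr><td>Netsky.D</td><td>24.06%</td><td>0.00%</td><td>79.36%</td></tr>
<tr><td>Mydoom.U</td><td>23.98%</td><td>0.00%</td><td>79.43%</td></tr>
<tr><td>Mydoom.M</td><td>23.61%</td><td>0.00%</td><td>79.71%</td></tr>
<tr><td>Sobig.F</td><td>24.14%</td><td>1.51%</td><td>79.07%</td></tr>
</table> </li>
<li> Slide 14 </li>
<li> Results II. Average accuracy: 99.69%. Training set: 1000 infected emails from 5 different worms, 400 clean emails. Test set: 200 infected emails, 1200 clean emails. Table 2. Results using SVM and Semi-Sup Classifier
<table>
<tr><th>Virus Name</th><th>False Positives</th><th>False Negatives</th><th>Accuracy</th></tr>
<tr><td>BubbleBoy</td><td>0.00%</td><td>1.51%</td><td>99.79%</td></tr>
<tr><td>Bagle.F</td><td>0.00%</td><td>2.01%</td><td>99.71%</td></tr>
<tr><td>Netsky.D</td><td>0.00%</td><td>2.01%</td><td>99.71%</td></tr>
<tr><td>Mydoom.U</td><td>0.00%</td><td>2.01%</td><td>99.64%</td></tr>
<tr><td>Mydoom.M</td><td>0.00%</td><td>2.03%</td><td>99.64%</td></tr>
<tr><td>Sobig.F</td><td>0.00%</td><td>2.01%</td><td>99.64%</td></tr>
</table> </li> </ul>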
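
The labeling and infection-threshold rules from the "Using Feedback" slide can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Email` record, the `scan` callable standing in for a virus scanner, and the `d_days` and `threshold` parameters are hypothetical names, and the slide does not say whether the threshold comparison is inclusive, so the sketch assumes "at least `threshold`".

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, List

VIRUS, CLEAN = "virus", "clean"


@dataclass
class Email:
    sent_at: datetime
    label: str  # current label held in the corpus


def update_labels(corpus: List[Email],
                  scan: Callable[[Email], str],
                  now: datetime,
                  d_days: int) -> None:
    """Apply scanner feedback to the corpus in place.

    Within the last d_days the scanner may not yet have a signature,
    so only a positive verdict overrides the current label. Older
    emails take the scanner's verdict directly.
    """
    cutoff = now - timedelta(days=d_days)
    for email in corpus:
        verdict = scan(email)
        if email.sent_at >= cutoff:
            if verdict == VIRUS:
                email.label = VIRUS
            # verdict == CLEAN: leave the current label untouched
        else:
            email.label = verdict


def user_infected(predicted: List[str], threshold: int) -> bool:
    """Flag a user once enough emails are classified as virus.

    Assumption: 'enough' means at least `threshold` emails.
    """
    return sum(1 for p in predicted if p == VIRUS) >= threshold


# Usage: a scanner with no signature yet reports everything as clean.
now = datetime(2005, 1, 10)
corpus = [
    Email(now - timedelta(days=1), VIRUS),   # recent, flagged by the classifier
    Email(now - timedelta(days=1), CLEAN),   # recent, not flagged
    Email(now - timedelta(days=30), VIRUS),  # old, flagged earlier
]
update_labels(corpus, scan=lambda e: CLEAN, now=now, d_days=7)
labels = [e.label for e in corpus]
```

In this usage the recent flagged email keeps its virus label while the old one is corrected to clean, which matches the slide's statement that labels are retained until the virus scanner corrects them.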