
REMOS CST (Research Seminar)

Faiz Zaki
October 27, 2021


Research seminar presentation at the monthly REMOS session of the Department of Computer System and Technology, Faculty of Computer Science and Information Technology, Universiti Malaya.


Transcript

  1. Muhammad Faiz bin Mohd Zaki – 17021637/3

    Department of Computer System and Technology, Faculty of Computer Science and Information Technology, Universiti Malaya
    Supervisor: Associate Professor Dr. Nor Badrul Anuar bin Juma’at
    Co-Supervisor: Professor Dr. Abdullah Gani
    Research Seminar - 29 October 2021
  2. • According to Cisco, 66% of the global population would

    have Internet access by 2023, a 15% surge from 2018 (Cisco, 2020).
    • The same report also forecasted the number of networked devices to rise from 18.4 billion to 29.3 billion by 2023.
    • Consequences?
    • Network traffic volume gets larger.
    • The network gets busier.
    • Network management tasks become more complex.
    • (Partial) Solution? Network traffic classification.
  3. • A process to map network traffic (packets or flows)

    to the applications that generated them.
    • Although it seems trivial, network packets arrive without name tags! Figure shows a sample packet capture with almost unreadable contents.
  4. • Therefore, there is a need for novel techniques to

    classify the traffic. Figure shows a typical classification framework consisting of a model capable of outputting various classification classes.
    • Pioneer works include packet filtering firewalls in the late 1980s and differentiated services (DiffServ) in the late 1990s.
  5. • Throughout the decades, network traffic classification techniques have evolved

    from port-based to the more advanced deep learning.
    • At a high level, there are five techniques to classify network traffic:
    • Port
    • Signature
    • Behavioural
    • Statistical
    • Machine learning
  6. • Port-based techniques are considered obsolete as a stand-alone. •

    Signature-based techniques are still widely used in commercial products (e.g., DPI, Next-Gen Firewall). • Machine learning is the current state-of-the-art.
  7. • Various techniques also correspond to various classification classes. •

    There are four classification classes, with varying granularities: application protocol, type, name and service.
  8. • Granular network traffic classification lacks attention, specifically at

    the most granular level: application service. Lacks attention!
  9. • This study digs deeper into the granularity hierarchy by

    dividing Application Service into inter-application and intra-application services. • Even less attention at the intra-application service level. Least attention!
  10. • This study aims to classify network traffic at all

    three levels of granular classification: application name, inter-application and intra-application services.
    • To achieve this aim, the study proposes an alternative method for granular network traffic classification through the Granular Multi-label Network Traffic Classification (GIANT) framework.
    • GIANT framework: an end-to-end classification pipeline, including ground truth preparation and multi-label classifiers using a chained adaptive random forest algorithm.
  11. • Prepares the raw input traffic for classification. • The

    most important ingredient for optimal classification is a high-quality ground truth. • Ground truth: a pre-labelled dataset that goes through a reliable labelling process, serving as the gold standard or benchmark to evaluate machine learning models. • This study proposes Grano-GT, a granular ground truth collection tool for encrypted browser-based Internet traffic.
  12. • Builds on four main engines: browser isolator, packet capture,

    application isolator and service isolator. Figure shows Grano-GT’s overall architecture.
  13. • Table describes each engine briefly:

    Browser Isolator: Isolates traffic from a target browser tab using the Chrome DevTools Protocol, including a complete IP address log and packet arrival times.
    Packet Capture: Captures all traffic from a target browser tab using Tshark.
    Application Isolator: Isolates application traffic (e.g., Facebook) using the IP address log from the Browser Isolator engine.
    Service Isolator: Extracts granular service traffic using string signatures and arrival times.
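The Application Isolator step can be sketched as a simple filter over captured packets, keeping only those whose endpoints appear in the Browser Isolator's IP address log. This is a minimal sketch: the packet records and field names below are illustrative, not Grano-GT's actual data structures.

```python
def isolate_application(packets, app_ip_log):
    """Keep packets whose source or destination IP was logged
    for the target application (e.g., Facebook)."""
    app_ips = set(app_ip_log)
    return [p for p in packets
            if p["src"] in app_ips or p["dst"] in app_ips]

# Toy capture: two packets belong to the logged application IP.
packets = [
    {"src": "10.0.0.2", "dst": "157.240.1.35", "len": 1200},  # application edge
    {"src": "10.0.0.2", "dst": "8.8.8.8", "len": 60},         # unrelated DNS
    {"src": "157.240.1.35", "dst": "10.0.0.2", "len": 1400},
]
app_packets = isolate_application(packets, ["157.240.1.35"])
```

The same idea extends to the Service Isolator, which narrows the filter further using string signatures and arrival times.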
  14. • Extracts the most discriminative features based on payload length

    to classify traffic at all three granularity levels. Payload-length-based features are non-intrusive and remove reliance on time-related features (e.g., inter-arrival time) that are affected by network conditions.
    • Computes statistical properties (e.g., average, standard deviation, variance) of payload length.
    • Computes moving statistics (e.g., moving averages) to capture the dynamics of a traffic flow.
    • Applies Pearson's correlation for feature selection.
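The statistical and moving-statistics computations can be illustrated with Python's standard library. This is a sketch on synthetic payload lengths; the window size and values are illustrative.

```python
import statistics

def moving_average(payload_lengths, window):
    """Sliding-window mean over payload lengths (e.g., a 5-packet window)."""
    return [statistics.mean(payload_lengths[i:i + window])
            for i in range(len(payload_lengths) - window + 1)]

# Synthetic payload lengths for one flow.
lengths = [1460, 1460, 52, 1460, 200, 1460, 1460]

features = {
    "avg": statistics.mean(lengths),      # statistical properties of payload length
    "std": statistics.stdev(lengths),
    "var": statistics.variance(lengths),
    "ma_5": moving_average(lengths, 5),   # moving statistics capture flow dynamics
}
```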
  15. • Initially extracts 52 statistical features based on payload length

    before reducing them to 7 features as the base feature set:
    1. protocol: Layer 4 protocol, i.e., TCP or UDP
    2. max_avg_payload: the maximum of the average payload length in either direction, i.e., source to destination or vice versa
    3. mss_count (mss_count_100): the count of packets among the first 100 whose payload length equals the maximum segment size
    4. range (range_10): the range of payload length for the first ten packets, i.e., maximum minus minimum
    5. payload_first_stat (std_10): the standard deviation of payload length for the first ten packets
    6. payload_mov_stat (ma_5): the five-packet moving average of payload length
    7. ma_40_avg_5: the average of the first five entries of the 40-packet moving average of payload length
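A minimal sketch of extracting this base feature set from one flow's payload lengths. The packet ordering, direction handling, and the fixed MSS constant are simplifying assumptions for illustration, not the exact Grano-GT/GIANT implementation.

```python
import statistics

MSS = 1460  # assumed TCP maximum segment size; the real value comes from the flow

def base_features(protocol, fwd_lens, bwd_lens):
    """Compute the 7 base features for one flow (illustrative only)."""
    lens = fwd_lens + bwd_lens  # simplification: real code keeps arrival order
    first10 = lens[:10]
    ma_5 = [statistics.mean(lens[i:i + 5]) for i in range(len(lens) - 4)]
    ma_40 = [statistics.mean(lens[i:i + 40]) for i in range(len(lens) - 39)]
    return {
        "protocol": protocol,
        "max_avg_payload": max(statistics.mean(fwd_lens),
                               statistics.mean(bwd_lens)),
        "mss_count_100": sum(1 for l in lens[:100] if l == MSS),
        "range_10": max(first10) - min(first10),
        "std_10": statistics.stdev(first10),
        "ma_5": ma_5,
        "ma_40_avg_5": statistics.mean(ma_40[:5]),
    }

# Synthetic flow: 30 full-MSS packets one way, 30 small packets the other.
feats = base_features("TCP", [1460] * 30, [52] * 30)
```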
  16. • Implements the Adaptive Random Forest algorithm: a Random Forest variation

    adapted for data stream classification.
    • Proposes App-Classifier and Service-Classifier.
    • Combines the two classifiers using a modified classifier chain.
    • Original implementation: a series of binary classifiers C_0, ..., C_n, where n is the number of classes and the feature space of C_i is extended by including the output from C_(i-1) for i > 0.
    • GIANT implementation: only two Adaptive Random Forest classifiers, where the Service-Classifier extends its feature space to include the output from the App-Classifier (extended feature set containing 8 features).
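The two-classifier chain can be sketched with scikit-learn's RandomForestClassifier standing in for the streaming Adaptive Random Forest. This is an offline stand-in on synthetic data; the shapes, labels, and hyperparameters are illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Toy flows: 7 base features per flow (values are synthetic).
X = rng.random((200, 7))
y_app = rng.integers(0, 3, 200)                      # application-name label
y_service = (y_app + rng.integers(0, 2, 200)) % 3    # service label

app_clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y_app)

# Chain: extend the Service-Classifier's feature space with the
# App-Classifier's output, giving 8 features in total.
X_ext = np.column_stack([X, app_clf.predict(X)])
svc_clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_ext, y_service)

# Inference follows the same chain: predict the app first, then the service.
x_new = rng.random((1, 7))
app_pred = app_clf.predict(x_new)
svc_pred = svc_clf.predict(np.column_stack([x_new, app_pred]))
```

Chaining the app prediction into the service features is what lets the second classifier condition on the first, rather than treating the two labels as independent.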
  17. • Output chaining allows the classifier to maintain context and

    label dependencies. Figure illustrates the classifier chain implementation.
  18. • Three outputs from two classifiers:

    • App-Classifier: application name output
    • Service-Classifier: inter- and intra-application service output
    • Output evaluation based on:
    • Public dataset (ISCX VPN-nonVPN)
    • Internal GIANT dataset
    • Baseline classifiers:
    Bakh_CART: k-means + CART (Bakhshi & Ghita, 2016)
    Shbr_RF: Multi-level Random Forest (Shbair et al., 2016)
    Dong_KNN: Multi-level k-NN (Dong et al., 2017)
    Flat_RF: Random Forest (Scikit-Learn implementation)
    This Study: Chained Random Forest
  19. • Ground truth acquired from Grano-GT was evaluated using nDPI, a

    state-of-the-art open-source DPI software based on the well-known OpenDPI. Table shows the bytes accuracy when compared with nDPI:
    Facebook: 99.8%
    Twitter: 97.2%
    Youtube: 90.4%
    Netflix: 98.5%
    Telegram: 100%
    Web-Whatsapp: 100%
    Reddit: 70.5%
    • Limitation of nDPI: outdated or unavailable signatures.
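Bytes accuracy weights each flow by its size, so a mislabelled large flow hurts the score more than a mislabelled small one. A minimal sketch, with flow tuples and counts invented for illustration:

```python
def bytes_accuracy(flows, truth_label):
    """Share of bytes that a reference labeller (e.g., nDPI) assigns to the
    ground-truth application. Each flow is (predicted_label, byte_count)."""
    total = sum(b for _, b in flows)
    correct = sum(b for lbl, b in flows if lbl == truth_label)
    return correct / total if total else 0.0

# Toy example: one flow's bytes went unrecognised.
flows = [("Facebook", 9_000), ("Facebook", 500), ("Unknown", 500)]
acc = bytes_accuracy(flows, "Facebook")  # 9500 / 10000
```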
  20. • Evaluation at the application service level using the Kolmogorov-Smirnov

    statistical test. Figures show the output for inter- and intra-application services.
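The two-sample Kolmogorov-Smirnov test compares the empirical distributions of two samples. A sketch using SciPy on synthetic payload-length samples; the distribution parameters are invented for illustration.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(1)
service_a = rng.normal(1200, 100, 500)  # synthetic payload lengths, service A
service_b = rng.normal(400, 100, 500)   # synthetic payload lengths, service B

stat, p_value = ks_2samp(service_a, service_b)
# A large statistic and tiny p-value indicate the two services produce
# clearly distinguishable traffic distributions.
```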
  21. • Feature correlation analysis using Pearson’s Correlation. Figures show the

    highly correlated initial feature set and the final base feature set.
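Pearson-based pruning can be sketched as a greedy pass that drops any feature too correlated with one already kept. This is a simplification of the actual selection procedure, and the feature names are illustrative.

```python
import numpy as np

def drop_correlated(X, names, threshold=0.9):
    """Greedily keep features whose |Pearson r| with every kept feature
    stays below the threshold."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    keep = []
    for i in range(len(names)):
        if all(corr[i, j] < threshold for j in keep):
            keep.append(i)
    return [names[i] for i in keep]

rng = np.random.default_rng(2)
a = rng.random(100)
# Column 1 is a linear copy of column 0, so it should be dropped.
X = np.column_stack([a, a * 2 + 0.01, rng.random(100)])
kept = drop_correlated(X, ["avg", "avg_x2", "std"])
```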
  22. • The base feature set captures the differences between applications.

    Figure shows the moving average between application names.
  23. • Three evaluation modes:

    • Single training setup: the classifier is evaluated using a subset of data with a single round of training.
    • Prequential evaluation (interleaved test-train): the classifier first attempts to classify data in batches (e.g., 1000 or 10000 packets) before reusing the same batch to train.
    • Complete training setup: the classifier trains using the conventional approach, where a complete dataset is available.
    • For simplicity, this research seminar presents the evaluation using only the complete training setup.
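Prequential (test-then-train) evaluation can be sketched with a toy incremental model. The MajorityClass learner below is a stand-in invented for illustration, not the Adaptive Random Forest.

```python
class MajorityClass:
    """Toy incremental model: always predicts the most frequent label seen."""
    def __init__(self):
        self.counts = {}
    def predict(self, X):
        if not self.counts:
            return [None] * len(X)
        major = max(self.counts, key=self.counts.get)
        return [major] * len(X)
    def partial_fit(self, X, y):
        for label in y:
            self.counts[label] = self.counts.get(label, 0) + 1

def prequential_accuracy(model, batches):
    """Each batch is classified first, then used to train the model."""
    correct = tested = 0
    for X, y in batches:
        preds = model.predict(X)   # test first...
        correct += sum(p == t for p, t in zip(preds, y))
        model.partial_fit(X, y)    # ...then train on the same batch
        tested += len(y)
    return correct / tested

batches = [([0] * 4, ["web"] * 4), ([0] * 4, ["web"] * 3 + ["video"])]
acc = prequential_accuracy(MajorityClass(), batches)
```

The first batch is always misclassified (the model has seen nothing), which is why prequential accuracy typically starts low and rises as the stream progresses.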
  24. Classifier performance by metric (application name / application service):

    Bakh_CART: Precision - / 0.82; Recall - / 0.93; F-measure - / 0.87
    Shbr_RF: Precision 1.00 / 0.92; Recall 0.99 / 0.88; F-measure 0.99 / 0.90
    Dong_KNN: Precision 0.99 / 0.81; Recall 0.98 / 0.79; F-measure 0.99 / 0.80
    Flat_RF: Precision - / 0.85; Recall - / 0.80; F-measure - / 0.82
    GIANT: Precision 1.00 / 0.89; Recall 0.99 / 0.88; F-measure 0.99 / 0.88
  25. Per-application results (Precision / Recall / F-measure):

    Facebook: 1.00 / 1.00 / 1.00
    Web-Whatsapp: 1.00 / 1.00 / 1.00
    Telegram: 0.99 / 0.99 / 0.99
    Lazada: 1.00 / 1.00 / 1.00
    Shopee: 1.00 / 0.91 / 0.95
    Twitter: 1.00 / 1.00 / 1.00
    Youtube: 1.00 / 1.00 / 1.00
    Medium: 1.00 / 1.00 / 1.00
    Reddit: 1.00 / 1.00 / 1.00
    Netflix: 1.00 / 1.00 / 1.00
    Macro Average: 1.00 / 0.99 / 0.99
  26. Per-application service results (Precision / Recall / F-measure):

    Facebook: 0.85 / 0.83 / 0.83
    Web-Whatsapp: 0.99 / 0.98 / 0.98
    Telegram: 0.96 / 0.95 / 0.95
    Lazada: 1.00 / 0.98 / 0.99
    Shopee: 0.77 / 0.73 / 0.75
    Twitter: 0.92 / 0.91 / 0.91
    Youtube: 0.79 / 0.79 / 0.79
    Medium: 0.82 / 0.78 / 0.79
    Reddit: 0.88 / 0.91 / 0.89
    Netflix: 1.00 / 1.00 / 1.00
    Overall Macro Average: 0.89 / 0.88 / 0.88
  27. Per-service results by application (Precision / Recall / F-measure):

    video:
    Facebook: 1.00 / 1.00 / 1.00
    Youtube: 1.00 / 1.00 / 1.00
    Twitter: 1.00 / 1.00 / 1.00
    Web-Whatsapp: 1.00 / 1.00 / 1.00
    Telegram: 1.00 / 1.00 / 1.00
    Netflix: 1.00 / 1.00 / 1.00
    Macro Average: 1.00 / 1.00 / 1.00
    react:
    Facebook: 1.00 / 1.00 / 1.00
    Youtube: 1.00 / 0.22 / 0.36
    Twitter: 1.00 / 1.00 / 1.00
    Shopee: 1.00 / 1.00 / 1.00
    Lazada: 1.00 / 1.00 / 1.00
    Medium: 0.84 / 0.75 / 0.79
    Reddit: 0.83 / 0.89 / 0.86
    Netflix: 1.00 / 1.00 / 1.00
    Macro Average: 0.96 / 0.86 / 0.91
    Overall Macro Average: 0.98 / 0.92 / 0.93
  28. • Evaluation on the ISCX VPN-nonVPN public dataset against the best

    baseline classifier and a flat classifier (application name / application service):
    Shbr_RF: Precision 1.00 / 1.00; Recall 0.99 / 1.00; F-measure 0.99 / 1.00
    Flat_RF: Precision - / 0.82; Recall - / 0.85; F-measure - / 0.83
    GIANT: Precision 1.00 / 1.00; Recall 0.99 / 0.93; F-measure 0.99 / 0.94
  29. • Evaluates the App- and Service-Classifiers in a streaming

    environment.
    • Utilizes Apache Kafka to manage streaming data for its efficiency and scalability.
    • Apache Kafka separates network flows into distinct partitions.
  30. 1. An updated classification taxonomy in the domain.

    2. A framework for granular network traffic classification.
    3. A reliable granular network traffic ground truth tool.
    4. A robust feature set.
    5. A context-aware multi-label incremental learner using a classifier chain.
    Publications:
    Zaki, F., Gani, A., Tahaei, H., Furnell, S., & Anuar, N. B. (2021). Grano-GT: A granular ground truth collection tool for encrypted browser-based Internet traffic. Computer Networks, 184, 107617. https://doi.org/10.1016/j.comnet.2020.107617
    Zaki, F., Gani, A., & Anuar, N. B. (2020). Applications and use cases of multilevel granularity for network traffic classification. 2020 16th IEEE International Colloquium on Signal Processing & Its Applications (CSPA), Langkawi, Malaysia.
    Tahaei, H., Afifi, F., Asemi, A., Zaki, F., & Anuar, N. B. (2020). The rise of traffic classification in IoT networks: A survey. Journal of Network and Computer Applications, 154, 102538. https://doi.org/10.1016/j.jnca.2020.102538 (secondary contribution)
    Zaki, F., Afifi, F., Abd Razak, S., Gani, A., & Anuar, N. B. GRAIN: Granular multi-label encrypted traffic classification using classifier chain. Computer Networks (under review).
  31. 1. Traffic type coverage (current: browser-based only)

    2. Application name and service coverage (current: 43 services across 10 applications)
    3. Feature complexity for real-time classification
  32. • Explored granular network traffic classification, particularly at the application

    name, inter- and intra-application service levels. • Proposed the GIANT framework, an end-to-end classification pipeline for granular network traffic classification. • Recorded comparable results with the best baseline classifier while utilizing lower feature complexity. • Evaluated in a simulated streaming environment and demonstrated marginal impact on latency and classification performance.