The Challenge of Only One Flow Problem for Traffic Classification in Identity Obfuscation Environments
Journal
IEEE Access
Journal Volume
9
Pages
84110-84121
Date Issued
2021
Author(s)
Chen H.-Y
Abstract
As encrypted traffic grows, network flow classification has become a significant issue because the payload of an encrypted packet cannot be parsed. For organizations, a practical place to sniff packets and inspect network traffic is a gateway under their control between the intranet and the Internet. However, when an intranet user adopts an identity obfuscation protocol such as VPN or TOR, the packet IP addresses and ports are rewritten to preserve user privacy. As a result, all of a user's packets sniffed between the user and the TOR entry node or VPN proxy share the same 5-tuple (a flow being defined as packets with the same source IP, destination IP, source port, destination port, and IP protocol). Thus, the 5-tuple rule cannot be used to split this traffic into flows. This challenge, called the 'only one flow problem', is an obstacle to flow classification. A previous solution addresses it by using a timeout value to determine flow separation points. However, a predefined static time threshold cannot fit all user habits, which leads to separation errors. To overcome the limitations of timeouts, we propose a flexible method called AI-FlowDet, which leverages the scene change concept and a CNN model to find behavior change points in traffic based on learning data. AI-FlowDet can be applied to the traffic of identity obfuscation protocols. Next, we propose 294 size-based and direction-based features that can be used with AI-FlowDet to evaluate flow type classification performance. The experiments are run with several different machine learning algorithms. The results show that AI-FlowDet with the proposed features achieves 98.5% weighted accuracy, an increase of 12.6% over the previous timeout method with baseline features. The good results obtained on both the VPN and TOR datasets show that the proposed splitting methods for the only one flow problem and the proposed features for flow type classification are effective. © 2013 IEEE.
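The abstract contrasts 5-tuple flow grouping with a static-timeout splitting baseline. The Python sketch below illustrates, under stated assumptions, why a trace between one client and one VPN proxy collapses into a single 5-tuple flow, how a timeout splitter segments it, and how a few size- and direction-based statistics could be computed per segment. Everything here is hypothetical: the `Packet` record, the bidirectional (endpoint-sorted) flow key, the 5.0 s threshold, the addresses, and the feature names are illustrative choices. It is not AI-FlowDet (which uses a CNN to detect behavior change points) and not the paper's 294-feature set.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    """Hypothetical minimal packet record (header fields only; payload is encrypted)."""
    timestamp: float  # capture time in seconds
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str     # e.g. "UDP" or "TCP"
    size: int         # packet length in bytes


def flow_key(p: Packet) -> tuple:
    """5-tuple key; endpoints are sorted so both directions map to one flow (assumption)."""
    lo, hi = sorted([(p.src_ip, p.src_port), (p.dst_ip, p.dst_port)])
    return (lo, hi, p.protocol)


def group_by_five_tuple(packets):
    """Classic 5-tuple flow grouping: one list of packets per distinct key."""
    flows = defaultdict(list)
    for p in packets:
        flows[flow_key(p)].append(p)
    return dict(flows)


def split_by_timeout(packets, timeout):
    """Timeout baseline: start a new segment when the inter-arrival gap exceeds `timeout` seconds."""
    segments, current, last_ts = [], [], None
    for p in sorted(packets, key=lambda q: q.timestamp):
        if last_ts is not None and p.timestamp - last_ts > timeout:
            segments.append(current)
            current = []
        current.append(p)
        last_ts = p.timestamp
    if current:
        segments.append(current)
    return segments


def size_direction_features(segment, client_ip):
    """A few illustrative size/direction statistics (hypothetical, not the paper's features)."""
    out = [p.size for p in segment if p.src_ip == client_ip]  # client -> proxy
    inc = [p.size for p in segment if p.src_ip != client_ip]  # proxy -> client
    return {
        "n_out": len(out),
        "n_in": len(inc),
        "bytes_out": sum(out),
        "bytes_in": sum(inc),
        "mean_out": sum(out) / len(out) if out else 0.0,
        "mean_in": sum(inc) / len(inc) if inc else 0.0,
    }


# Hypothetical trace: one client talking only to one VPN proxy
# (private and documentation-range addresses).
CLIENT = ("10.0.0.5", 51000)
PROXY = ("203.0.113.10", 1194)


def pkt(t, outgoing, size):
    """Build a packet in the client->proxy (outgoing) or proxy->client direction."""
    src, dst = (CLIENT, PROXY) if outgoing else (PROXY, CLIENT)
    return Packet(t, src[0], dst[0], src[1], dst[1], "UDP", size)


trace = [
    pkt(0.0, True, 120), pkt(0.1, False, 1400), pkt(0.2, False, 1400),  # activity A
    pkt(10.0, True, 80), pkt(10.05, False, 90),                          # activity B
]

print(len(group_by_five_tuple(trace)))             # 1: the "only one flow problem"
segments = split_by_timeout(trace, timeout=5.0)     # hypothetical static threshold
print([len(s) for s in segments])                  # [3, 2]
print(len(split_by_timeout(trace, timeout=20.0)))  # 1: a larger threshold merges A and B
print(size_direction_features(segments[0], CLIENT[0]))
```

The last two runs show the sensitivity the abstract attributes to the timeout method: the same trace yields two segments at 5.0 s but one at 20.0 s, so a single fixed threshold cannot suit every traffic pattern. Learning change points from data, as AI-FlowDet does with a CNN, is not attempted in this sketch.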
Subjects
AI-FlowDet
flow classification
only one flow problem
TOR
VPN
Classification (of information)
Cryptography
Flow separation
Gateways (computer networks)
Learning algorithms
Machine learning
Privacy by design
Virtual private networks
Behavior change
Encrypted traffic
Flow classification
Flow separation point
Intranet users
Network traffic
Splitting method
Traffic classification
Internet protocols
Type
journal article