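Talk title: Automatic Network Traffic Classification
Date and time: 9:30 a.m., November 12, 2015
Venue: E406
Speaker: Dr. Yu Wang (王宇)
Speaker's affiliation: Deakin University, Australia
Speaker biography: Yu Wang received a PhD in computer science in 2013 from the Network Security and Computing Laboratory at Deakin University, Australia, and currently holds a postdoctoral research position there. Dr. Wang's main research interests include network traffic modelling and classification, social network security, network and system security, and machine learning. Dr. Wang has published a number of papers in leading international journals such as IEEE Transactions on Parallel and Distributed Systems.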
Abstract: Network traffic classification is the process of associating network traffic flows with their underlying network protocols or applications, and it is a fundamental technique of broad interest. Classification decisions can be based on a variety of information carried in the network traffic, such as the port number fields in packet headers, the application-layer payload content, and the statistical properties of the traffic flows. Nonetheless, state-of-the-art approaches all rely on some form of a priori knowledge, such as the list of well-known and registered ports, protocol specifications, protocol signatures, or pre-labelled training data sets. As a result, labour-intensive pre-processing is required, and the ability to deal with previously unknown applications is limited.
In this talk, we will review some of our recent research towards automatic network traffic classification. First, we will look at unsupervised learning (i.e. clustering), which is a useful and important tool in practice, where the training data usually come without class labels and unknown patterns are always emerging. Although previous studies have reported promising results from applying classic clustering algorithms such as K-Means and EM to the task, the quality of the resulting traffic clusters has been far from satisfactory. To address this problem, we have proposed a constrained traffic clustering scheme that takes background information into account in addition to the observed traffic statistics. Specifically, we make use of equivalence set constraints, each indicating that a particular set of flows uses the same application-layer protocol; such constraints can be inferred efficiently from packet headers using background knowledge of TCP/IP networking. We model the observed data and the constraints using a Gaussian mixture density and adapt an approximate algorithm for the maximum likelihood estimation of the model parameters. Next, we will cover another work that proposes to make use of unlabelled background data in the process of supervised learning, with the aim of enhancing the ability of statistics-based traffic classifiers to distinguish novel traffic patterns that are unknown at training time.
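To make the idea of equivalence set constraints concrete, the following is a minimal sketch and not the speaker's implementation. It assumes one plausible heuristic, namely that flows addressed to the same destination IP address, destination port and transport protocol carry the same application-layer protocol. The `Flow` record, its field names and the `equivalence_sets` function are hypothetical and introduced only for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class Flow(NamedTuple):
    """Hypothetical flow record; the field names are assumptions."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    proto: str


def equivalence_sets(flows):
    """Group flow indices by (dst_ip, dst_port, proto).

    Assumed heuristic: flows sent to the same server endpoint are
    likely to use the same application-layer protocol.
    """
    groups = defaultdict(list)
    for index, flow in enumerate(flows):
        groups[(flow.dst_ip, flow.dst_port, flow.proto)].append(index)
    # Only sets with two or more flows impose a constraint on clustering.
    return [indices for indices in groups.values() if len(indices) > 1]


# Made-up example flows using documentation address ranges.
flows = [
    Flow("10.0.0.1", 50001, "192.0.2.10", 443, "tcp"),
    Flow("10.0.0.2", 50002, "192.0.2.10", 443, "tcp"),
    Flow("10.0.0.3", 50003, "192.0.2.20", 53, "udp"),
    Flow("10.0.0.1", 50004, "192.0.2.10", 443, "tcp"),
]
sets = equivalence_sets(flows)
```

Each returned list holds the indices of flows that a constrained clustering scheme would be required to place in the same cluster; flows that share an endpoint with no other flow impose no constraint and are omitted.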
Host: Associate Professor Debiao He (何德彪)