Abstract
Application service providers (ASPs) commonly and proactively collect non-critical personally identifiable information (PII) from users' devices to the cloud through mobile applications in order to provide precise, recommendation-based services. Meanwhile, Internet service providers (ISPs) and local network providers also have strong requirements to collect PII for finer-grained traffic control and security services. However, locating PII accurately in massive volumes of network traffic is challenging, much like looking for a needle in a haystack. In this paper, we address this challenge by presenting an efficient and lightweight approach, named TPII, which locates and tracks PII in the HTTP layer rebuilt from raw network traffic. The approach collects only three features from HTTP fields as users' behaviors and then establishes a tree-based decision model to mine PII efficiently and accurately. Without any prior knowledge, TPII can identify any type of PII from any mobile application, which gives it broad applicability. We evaluate the proposed approach on a real dataset collected from a campus network with more than 13,000 users. The experimental results show that the precision and recall of TPII are 91.72% and 94.51%, respectively, and that a parallel implementation of TPII can mine and label 213 million records within one hour, which approaches support for 1 Gbps wire-speed inspection in practice. Our approach provides network service providers with a practical way to collect PII for better services.
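The throughput figure above can be sanity-checked with simple arithmetic. The minimal sketch below converts 213 million records per hour into records per second and then, assuming a fully loaded 1 Gbps link, computes how many bytes of traffic would pass per labelled record. That second interpretation is our assumption for illustration, not a figure reported in the paper, and the function names are hypothetical.

```python
# Figures quoted in the abstract.
RECORDS_PER_HOUR = 213_000_000
LINK_BITS_PER_SECOND = 1_000_000_000  # 1 Gbps


def records_per_second(records_per_hour: int) -> float:
    """Convert an hourly record count into a per-second rate."""
    return records_per_hour / 3600


def bytes_per_record_at_line_rate(link_bps: int, rec_per_s: float) -> float:
    """Bytes of traffic per record if the link were saturated (assumption)."""
    return (link_bps / 8) / rec_per_s


rate = records_per_second(RECORDS_PER_HOUR)
budget = bytes_per_record_at_line_rate(LINK_BITS_PER_SECOND, rate)

print(f"{rate:,.0f} records/s")
print(f"{budget:,.0f} bytes of traffic per record at 1 Gbps")
```

Under these assumptions the reported rate is roughly 59,000 records per second, or about one labelled record for every 2.1 KB of traffic on a saturated 1 Gbps link.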
Funding
Supported by the National Natural Science Foundation of China (Grant Nos. 61672101, U1636119, 6186603S, 61962059) and the 2018 College Students' Innovation and Entrepreneurship Training Program (D2018127).
About the authors
Yi LIU is a PhD candidate in Computer Science at Beijing Institute of Technology, China. He received an MS degree from Xidian University, China in 2010. He is now working at the network information center of Yan'an University, China. His research interests include network information security, network traffic analysis, and privacy protection on networks. E-mail: songtian@bit.edu.cn

Tian SONG is an associate professor of Computer Science at Beijing Institute of Technology, China. He obtained his PhD degree from Tsinghua University, China in 2008. His research interests include network content security, next generation internet, and computer architecture.

Lejian LIAO is a professor in the School of Computer Sciences, Beijing Institute of Technology, China. He received his MS degree in 1988 and his PhD degree in 1994, both from the Institute of Computing Technology, Chinese Academy of Sciences, China. His current academic interests include Web intelligence, semantic computing, ontology engineering, and constraint-based technologies. He has published more than 100 academic papers as first author or co-author.