Title: Mining Maximal Clique Summary with Effective Sampling
Speaker: Dr Rui Zhou
Time: 10:00-11:30, November 12
Place: 1-310, FIT Building
Organizer: Research Institute of Information Technology (RIIT), Tsinghua University
Dr Rui Zhou is an IEEE Member and serves as Deputy Course Director of the Bachelor of Computer Science (Professional) at Swinburne University of Technology, Australia. His research interests include databases, data mining, and health informatics. He focuses mainly on designing algorithms for data analytics problems, including query, search, storage, indexing, and mining, over a variety of data types such as graph data, XML data, text data, streaming data, trajectory data, web service data, and blockchain data. His work has been published in reputable conferences and journals such as VLDB, ICDE, EDBT, CIKM, TKDE, and TSE. He has won two competitive Australian Research Council Discovery Projects and a Data61 industry-based project. He currently serves as Assistant Editor-in-Chief of the World Wide Web Journal.
Maximal clique enumeration (MCE) is a fundamental problem in graph theory with many applications, such as social network analysis, bioinformatics, intelligent agent systems, and cyber security. Most existing MCE algorithms focus on improving efficiency rather than reducing the size of the output, which can consist of a large number of maximal cliques. In this talk, we will discuss how to report a summary of less-overlapping maximal cliques. The problem has been studied before; however, after examining the pioneering approach, we consider it still unsatisfactory. To advance research along this line, this work makes two contributions: (a) we propose a more effective sampling strategy, which produces a much smaller summary while still ensuring that the summary can witness all the maximal cliques and that the expectation of each maximal clique being witnessed by the summary is above a predefined threshold; (b) to verify this experimentally, we tested the strategy on ten real benchmark datasets with a variety of graph characteristics. The results show that the new sampling strategy consistently outperforms the state-of-the-art method, producing smaller summaries and running faster on all the datasets.
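For readers unfamiliar with MCE, the following is a minimal sketch of the classic Bron-Kerbosch algorithm with pivoting, a textbook enumeration method. It is not the sampling strategy presented in the talk. The function name `maximal_cliques` and the small `example_graph` are illustrative assumptions. The sketch shows how even a tiny graph yields maximal cliques that overlap heavily, which is the kind of redundancy a summary aims to reduce.

```python
from typing import Dict, List, Set

Graph = Dict[int, Set[int]]  # adjacency sets; assumed undirected, no self-loops


def maximal_cliques(graph: Graph) -> List[Set[int]]:
    """Enumerate all maximal cliques (Bron-Kerbosch with pivoting)."""
    cliques: List[Set[int]] = []

    def expand(r: Set[int], p: Set[int], x: Set[int]) -> None:
        # r: current clique, p: candidates to extend r, x: already-explored vertices
        if not p and not x:
            cliques.append(set(r))  # r cannot be extended, so it is maximal
            return
        # Choose the pivot with the most neighbours among the candidates;
        # only non-neighbours of the pivot need to be branched on.
        pivot = max(p | x, key=lambda v: len(graph[v] & p))
        for v in list(p - graph[pivot]):
            expand(r | {v}, p & graph[v], x & graph[v])
            p.remove(v)
            x.add(v)

    expand(set(), set(graph), set())
    return cliques


# Hypothetical example: two triangles sharing the edge 2-3, plus a pendant edge 4-5.
example_graph: Graph = {
    1: {2, 3},
    2: {1, 3, 4},
    3: {1, 2, 4},
    4: {2, 3, 5},
    5: {4},
}

cliques = maximal_cliques(example_graph)
for clique in sorted(map(sorted, cliques)):
    print(clique)
```

In this toy graph the cliques {1, 2, 3} and {2, 3, 4} share two of their three vertices. On real graphs such overlaps multiply quickly, which is why reporting a smaller, less redundant summary can be more useful than listing every maximal clique.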