Title:Approximate Analytic Queries over Compressed Time Series with Tight Deterministic Error Guarantees
Speaker: Chunbin Lin,American Amazon Research Institute
Time: 10：00-11：30，Novomber 4
Place: 1-310, FIT Building
Organizer: Research Institute of Information Technology (RIIT), Tsinghua University
Chunbin Lin is a R&D at Amazon Web Services (AWS) and he is working on Amazon Redshift data warehouse. He received his Ph.D. in computer science from the University of California, San Diego (UCSD) in 2018 and received his B.Sc and M.Sc from Renmin University of China in 2010 and 2013 respectively. His research interests are in the field of data management, including data integration, big data analytics, and approximate query processing. He has more than 20 journals and conference papers published in SIGMOD, VLDB, VLDB J, and TODS, etc.
We provide sound and tight deterministic error guarantees for approximate queries over compressed data. The query expressions are compositions of well-known linear algebra operators over vectors, along with arithmetic operators. Such queries can express most common statistics, including statistics (such as correlation and cross-correlation) that combine multiple time series. The queries are evaluated over classic time segment-based compressions of IoT time series. The error guarantees computations use common and straightforward precomputed error measures for each segment. Overall, the error guarantees are applicable (as an add-on) to most prior works on compression. In particular, the compression may use either fixed-length segmentation or (the far more effective in practice) variable-length segmentation. The estimation functions used to approximate the actual values may come from any function family.
This work identifies two broad estimation function families (namely, the already existing Vector Space family (VS) and the presently defined Linear Scalable Family (LSF)) that lead to theoretically and practically high-quality guarantees, even for queries that combine multiple time series. Well known function families (e.g., the polynomials) belong to LSF. The theoretical aspect of "high quality" is crisply captured by the Amplitude Independence (AI) property: An AI guarantee does not depend on the amplitude of the involved time series, even when we combine multiple time series. The experiments on four real-life datasets validated the importance of the Amplitude Independent (AI) error guarantees: When the novel AI guarantees were applicable, the guarantees could ensure that the approximate query results were very close (typically 1%) to the true results.