Cardinality and frequency estimation - CS 591 K1: Data Stream Processing and Analytics Spring 2020
... Vasiliki (Vasia) Kalavri, vkalavri@bu.edu, Spring 2020. 4/23: Cardinality and frequency estimation ... Vasiliki Kalavri | Boston University 2020 ... Counting distinct elements ... LogLog algorithm. Input: stream S, array of m counters, hash function h. Output: cardinality of S. for j = 0 to m-1 do: COUNT[j] = 0; for x in S do: i = h(x); j = getLeftBits(i, p); r = ... Query approximation error; error probability. Guarantee: the estimation error for frequencies will not exceed [...] with probability [...]. A higher number of hash functions ...
0 points | 69 pages | 630.01 KB | 1 year ago | 3
Graph streaming algorithms - CS 591 K1: Data Stream Processing and Analytics Spring 2020
... subgraph of G with fewer edges and the same set of vertices: E(H) ⊆ E(G), V(H) = V(G). Distance estimation ... Vasiliki Kalavri | Boston University 2020 ... A k-spanner is a graph synopsis that preserves ...
0 points | 72 pages | 7.77 MB | 1 year ago | 3
Filtering and sampling streams - CS 591 K1: Data Stream Processing and Analytics Spring 2020
... tuples from the dataset. Providing an estimate via a sample can be much more expensive than estimation via other methods: evaluating a query over a 5% sample of a dataset may take 5% of the time ...
0 points | 74 pages | 1.06 MB | 1 year ago | 3
3 results in total