Cardinality and frequency estimation - CS 591 K1: Data Stream Processing and Analytics, Spring 2020
Vasiliki Kalavri | Boston University 2020
The probability of not seeing a tail with at least $r$ 0s among $k$ elements is $(1 - 2^{-r})^k$. For small $\epsilon$ we know that $(1 - \epsilon)^{1/\epsilon} \approx 1/e$, so for $\epsilon = 2^{-r}$: $(1 - 2^{-r})^k = \left((1 - 2^{-r})^{2^r}\right)^{k 2^{-r}} \approx e^{-k 2^{-r}}$.
• If $k \gg 2^r$: $k / 2^r \to \infty$ and $e^{-k 2^{-r}} \to 0$
• If $k \ll 2^r$: $k / 2^r \to 0$ and $e^{-k 2^{-r}} \to 1$
0 points | 69 pages | 630.01 KB | 1 year ago
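The approximation $(1 - 2^{-r})^k \approx e^{-k 2^{-r}}$ from the snippet above can be checked numerically. The following is a minimal sketch, not taken from the slides; the function names `prob_no_tail` and `prob_no_tail_approx` and the choice $r = 10$ are assumptions made for illustration.

```python
import math


def prob_no_tail(r: int, k: int) -> float:
    """Exact probability that none of k hashed elements ends in at least r zeros."""
    return (1.0 - 2.0 ** -r) ** k


def prob_no_tail_approx(r: int, k: int) -> float:
    """Approximation e^(-k * 2^-r), valid when 2^-r is small."""
    return math.exp(-k * 2.0 ** -r)


r = 10  # illustrative tail length, so 2**r = 1024
for k in (1, 32, 1024, 32768):
    exact = prob_no_tail(r, k)
    approx = prob_no_tail_approx(r, k)
    print(f"k={k:6d}  exact={exact:.6g}  approx={approx:.6g}")
```

With few distinct elements relative to $2^r$ the probability of not seeing such a tail stays close to 1, and with many distinct elements it falls toward 0, which is why the longest observed tail can serve as an estimate of $\log_2$ of the number of distinct elements.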
Streaming optimizations - CS 591 K1: Data Stream Processing and Analytics, Spring 2020
Vasiliki Kalavri | Boston University 2020
Emit(key, AsString(result));
MapReduce combiners example: URL access frequency
map(): (k1, v1) → list(k2, v2)
reduce(): (k2, list(v2)) → list(v2)
0 points | 54 pages | 2.83 MB | 1 year ago
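The snippet above only names the combiner example and gives the map() and reduce() signatures. The following is a minimal single-process sketch of the idea, not the slides' code: it assumes each log line starts with the URL, and the helpers `map_fn`, `combine`, `reduce_fn`, and `run` are hypothetical names introduced for illustration.

```python
from collections import Counter, defaultdict


def map_fn(_key, log_line):
    """(k1, v1) -> list(k2, v2): emit (url, 1) for each access record."""
    url = log_line.split()[0]  # assumption: the URL is the first field
    yield url, 1


def combine(pairs):
    """Pre-aggregate counts on the mapper side before the shuffle."""
    partial = Counter()
    for url, count in pairs:
        partial[url] += count
    return list(partial.items())


def reduce_fn(url, counts):
    """(k2, list(v2)) -> v2: total accesses for one URL."""
    return url, sum(counts)


def run(partitions, use_combiner=True):
    """Simulate map, optional combine, shuffle, and reduce in one process."""
    shuffled = defaultdict(list)
    shuffled_pairs = 0  # number of (url, count) pairs crossing the shuffle
    for partition in partitions:
        pairs = [kv for i, line in enumerate(partition) for kv in map_fn(i, line)]
        if use_combiner:
            pairs = combine(pairs)
        shuffled_pairs += len(pairs)
        for url, count in pairs:
            shuffled[url].append(count)
    result = dict(reduce_fn(url, counts) for url, counts in shuffled.items())
    return result, shuffled_pairs


partitions = [["/a GET", "/a GET", "/b GET"], ["/a GET", "/b GET", "/b GET"]]
with_combiner, n_with = run(partitions, use_combiner=True)
without_combiner, n_without = run(partitions, use_combiner=False)
print(with_combiner, n_with, n_without)
```

Both runs produce the same counts because addition is associative and commutative; the combiner only reduces how many pairs are sent to the reducers.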
2 results in total