Filtering and sampling streams - CS 591 K1: Data Stream Processing and Analytics Spring 2020 … 1/10th sample, S? Each of the s unique queries has a probability Ps = 1/10 to be selected: • an expected number of s/10 of those queries will be in S. … Vasiliki Kalavri | Boston University 2020 … • A bit array of size n, where n is generally higher than the expected number of elements in the input • k independent and uniformly distributed hash functions, where … The probability of false positives depends on the choice of k and n: • Let m be the number of expected elements: • If the allocated bits per element, n/m, is too small, the filter will fill up too … | 74 pages | 1.06 MB | 1 year ago
Flow control and load shedding - CS 591 K1: Data Stream Processing and Analytics Spring 2020 … plans ordered by how much load shedding they will cause. • Each row contains a plan with • expected cycle savings • locations for drop operations • drop amounts • QoS effects (provided that … | 43 pages | 2.42 MB | 1 year ago
Streaming languages and operator semantics - CS 591 K1: Data Stream Processing and Analytics Spring 2020 … operations that process input streams and produce output streams. • Declarative languages specify the expected results of the computation rather than the execution flow. • Imperative languages are used to … | 53 pages | 532.37 KB | 1 year ago
Cardinality and frequency estimation - CS 591 K1: Data Stream Processing and Analytics Spring 2020 … Vasiliki Kalavri | Boston University 2020 … Combining estimates • Average won’t work: The expected value of 2^R is too large. • Median won’t work: it is always a power of 2, thus, if the correct … | 69 pages | 630.01 KB | 1 year ago













