Exactly-once fault-tolerance in Apache Flink - CS 591 K1: Data Stream Processing and Analytics Spring 2020
Vasiliki Kalavri | Boston University 2020 … Freeze the world (naive algorithm): 1. Pause the ingestion of all input streams 2. Wait for all in-flight data to be completely processed … ingestion … "The distributed snapshot algorithm described here came about when I visited Chandy, who was then at the University of Texas in Austin" (Leslie Lamport) … Obtain a valid system configuration. A full system configuration is eventually captured. A snapshot algorithm attempts to capture a coherent global state of a distributed system …
0 码力 (site credits) | 81 pages | 13.18 MB | 1 year ago
Cardinality and frequency estimation - CS 591 K1: Data Stream Processing and Analytics Spring 2020
• x6=1, h5(1) = 11010 … Vasiliki Kalavri | Boston University 2020 … LogLog algorithm. Input: stream S, array of m counters, hash function h. Output: cardinality of S. for j=0 to m-1 do: … 2^(−COUNT[j])) … The standard error of the LogLog algorithm is inversely related to the number of counters m: standard error δ ≈ 1.3/√m. For m = 256, the … Flajolet, Philippe, et al. Hyperloglog: the analysis of a near-optimal cardinality estimation algorithm. 2007. https://hal.archives-ouvertes.fr/file/index/docid/406166/filename/FlFuGaMe07.pdf • Cormode …
0 码力 (site credits) | 69 pages | 630.01 KB | 1 year ago
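The error relation quoted in the snippet above can be checked with a little arithmetic. The sketch below assumes the garbled formula reads δ ≈ 1.3/√m (the usual statement for LogLog); the function name `loglog_standard_error` is hypothetical and not taken from the slides.

```python
import math


def loglog_standard_error(m: int) -> float:
    """Approximate LogLog standard error, assuming delta ~= 1.3 / sqrt(m)."""
    if m <= 0:
        raise ValueError("m must be a positive number of counters")
    return 1.3 / math.sqrt(m)


if __name__ == "__main__":
    # More counters give a smaller relative error, at the cost of more memory.
    for m in (64, 256, 1024):
        print(f"m = {m:5d}  standard error ~ {loglog_standard_error(m):.4f}")
```

Under this assumption, m = 256 counters give an error of about 0.081, that is, roughly 8%, and quadrupling the number of counters halves the error.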
Skew mitigation - CS 591 K1: Data Stream Processing and Analytics Spring 2020
… we remove infrequent elements. … Vasiliki Kalavri | Boston University 2020 … Lossy counting algorithm: D = {} // empty list; wcur = 1 // first window id; N = 0 // elements seen so far. Insert step … [Figure: input stream split into windows w1 to w4, ε = 0.2, with per-item entries (f, ε)] … When the algorithm terminates, D contains an item x if its actual frequency is fx > ε*N. Worst case: O(1/ε * …
0 码力 (site credits) | 31 pages | 1.47 MB | 1 year ago
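The lossy counting snippet above only shows the initial state (D, wcur, N) and the final guarantee. As a rough illustration, here is a minimal sketch of the standard lossy counting procedure (Manku and Motwani), which may differ in detail from the slides. The function name `lossy_count`, the window width of ceil(1/ε), and the example stream are all assumptions made for this sketch.

```python
import math
from typing import Dict, Hashable, Iterable, Tuple


def lossy_count(stream: Iterable[Hashable], epsilon: float) -> Dict[Hashable, Tuple[int, int]]:
    """Return {item: (f, delta)}: f is the counted frequency since insertion,
    delta is the maximum number of occurrences missed before insertion."""
    width = math.ceil(1 / epsilon)  # assumed window width: ceil(1/epsilon) elements
    D: Dict[Hashable, Tuple[int, int]] = {}  # tracked entries
    w_cur = 1  # current window id
    N = 0      # elements seen so far

    for x in stream:
        N += 1
        # Insert step: bump an existing counter or start a new entry.
        if x in D:
            f, delta = D[x]
            D[x] = (f + 1, delta)
        else:
            D[x] = (1, w_cur - 1)
        # Delete step at each window boundary: remove infrequent elements.
        if N % width == 0:
            D = {item: (f, d) for item, (f, d) in D.items() if f + d > w_cur}
            w_cur += 1
    return D


if __name__ == "__main__":
    # Illustrative stream chosen for this sketch, with epsilon = 0.2 (window of 5).
    example = [3, 3, 3, 3, 1, 2, 0, 1, 1, 3, 5]
    print(lossy_count(example, epsilon=0.2))
```

With this stream, N = 11 and ε*N = 2.2, so the items whose true frequency exceeds 2.2 (item 3 with 5 occurrences and item 1 with 3) are both still tracked when the stream ends, in line with the guarantee stated in the snippet.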
Streaming optimizations - CS 591 K1: Data Stream Processing and Analytics Spring 2020
Algebraic re-orderings … Vasiliki Kalavri | Boston University 2020 … Safety • Ensure same algorithm: the redundant operators must perform an equivalent computation • Ensure mergeable state: even …
0 码力 (site credits) | 54 pages | 2.83 MB | 1 year ago
4 results in total













