High-availability, recovery semantics, and guarantees - CS 591 K1: Data Stream Processing and Analytics, Spring 2020
… local buffer and possibly update state 3. produce output
What can go wrong: • lost events • duplicate or lost state updates • wrong result
Was mi fully processed? Was mo delivered downstream?
Rollback recovery allows duplicate tuples downstream:
• repeating: duplicate tuples are identical to those produced by the primary
• convergent: duplicate tuples are different, but eliminating them leads to output identical to an output without failure
• divergent: duplicate tuples are different, and eliminating them produces different output
Vasiliki Kalavri | Boston University 2020
49 pages | 2.08 MB | 1 year ago
Filtering and sampling streams - CS 591 K1: Data Stream Processing and Analytics, Spring 2020
… • Filter out all URLs that contain malware? • Filter out all compromised passwords? • Remove duplicate tuples on recovery when using upstream backup?
The membership problem: A hash table requires O(logn) …
74 pages | 1.06 MB | 1 year ago
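The duplicate-removal question in the snippet above can be read as a membership test: has this tuple already been seen? A minimal sketch in Python follows, assuming every tuple carries a unique identifier and that an exact in-memory set is acceptable. The names `DuplicateFilter`, `admit`, `deliver`, and `demo` are hypothetical and do not come from the slides.

```python
from typing import Hashable, Iterable, List, Set, Tuple


class DuplicateFilter:
    """Exact membership test over tuple IDs (hypothetical helper).

    Assumption: each tuple has a unique, hashable ID. Memory grows with
    the number of distinct IDs seen.
    """

    def __init__(self) -> None:
        self._seen: Set[Hashable] = set()

    def admit(self, tuple_id: Hashable) -> bool:
        """Return True the first time an ID is seen, False on repeats."""
        if tuple_id in self._seen:
            return False
        self._seen.add(tuple_id)
        return True


def deliver(stream: Iterable[Tuple[Hashable, str]],
            dedup: DuplicateFilter) -> List[Tuple[Hashable, str]]:
    """Forward only tuples whose ID has not been admitted before."""
    return [(tid, payload) for tid, payload in stream if dedup.admit(tid)]


def demo() -> List[Tuple[Hashable, str]]:
    """Simulate normal delivery followed by a replay after recovery."""
    dedup = DuplicateFilter()
    before_failure = [(1, "a"), (2, "b"), (3, "c")]
    # Assumed scenario: the upstream node replays its buffered tuples,
    # some of which were already delivered before the failure.
    replayed = [(2, "b"), (3, "c"), (4, "d")]
    return deliver(before_failure, dedup) + deliver(replayed, dedup)
```

Calling `demo()` forwards tuples 1 through 4 once each: the replayed copies of 2 and 3 are dropped. An exact set never drops a new tuple, but it must keep every ID it has seen.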
Streaming languages and operator semantics - CS 591 K1: Data Stream Processing and Analytics, Spring 2020
… over a window
Flow Management Operators (II) • Duplicate/Copy Operator replicates a stream, commonly to be used as input to multiple downstream operators
53 pages | 532.37 KB | 1 year ago
Introduction to Apache Flink and Apache Kafka - CS 591 K1: Data Stream Processing and Analytics, Spring 2020
… fault-tolerant publish-subscribe messaging system and serves as the ingestion, storage, and messaging layer for large production streaming pipelines. Kafka is commonly deployed on a cluster of one or more …
26 pages | 3.33 MB | 1 year ago
Exactly-once fault-tolerance in Apache Flink - CS 591 K1: Data Stream Processing and Analytics, Spring 2020
System model: • No failures during snapshotting • FIFO reliable channels: no lost or duplicate messages • Strongly connected execution graph: each process can reach every other process in …
81 pages | 13.18 MB | 1 year ago
5 results in total