Fault-tolerance demo & reconfiguration - CS 591 K1: Data Stream Processing and Analytics Spring 2020
... automatic if there is a ResourceManager, e.g. in a YARN setup • A manual TaskManager restart or a backup is required in standalone mode • The restart strategy determines how often the JobManager tries ... University 2020 • State is mapped into key-groups • Key-groups are mapped to subtasks as ranges • On restore, reads are sequential within each key-group, and often across multiple key-groups • The metadata ...
0 credits | 41 pages | 4.09 MB | 1 year ago

High-availability, recovery semantics, and guarantees - CS 591 K1: Data Stream Processing and Analytics Spring 2020
... recovery (at-least-once) • It avoids information loss • The output may contain duplicates • A backup needs to rebuild state of the failed node ... Vasiliki Kalavri | Boston University 2020 ... Recovery ... • Gap recovery (at-most-once) • It drops data during failure • The backup starts from most recent information ... Can you see any disadvantage in this approach? ... Upstream Backup: Upstream nodes act as backups for their downstream operators by logging tuples in their output ...
0 credits | 49 pages | 2.08 MB | 1 year ago

Scalable Stream Processing - Spark Streaming and Flink
... unique IDs. • Operators send acks when a record has been processed. • Records are dropped from the backup when they have been fully acknowledged. ▶ Fault tolerance in Flink • More coarse-grained approach ...
0 credits | 113 pages | 1.22 MB | 1 year ago

Filtering and sampling streams - CS 591 K1: Data Stream Processing and Analytics Spring 2020
... Filter out all compromised passwords? • Remove duplicate tuples on recovery when using upstream backup? The membership problem ??? Vasiliki Kalavri | Boston University 2020 ... What data structure ... A hash table requires O(log n) bits per element which might still be ...
0 credits | 74 pages | 1.06 MB | 1 year ago

State management - CS 591 K1: Data Stream Processing and Analytics Spring 2020
... checkpoint it, restore it, rescale it ... Unmanaged / Managed ... What are the advantages and disadvantages of each approach? Vasiliki Kalavri | Boston University 2020 • Copy, checkpoint, restore, merge, split ...
0 credits | 24 pages | 914.13 KB | 1 year ago

PyFlink 1.15 Documentation
... restoreInternal(StreamTask.java:687) at org.apache.flink.streaming.runtime.tasks.StreamTask.restore(StreamTask.java:654) at org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task ...
0 credits | 36 pages | 266.77 KB | 1 year ago

PyFlink 1.16 Documentation
... restoreInternal(StreamTask.java:687) at org.apache.flink.streaming.runtime.tasks.StreamTask.restore(StreamTask.java:654) at org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task ...
0 credits | 36 pages | 266.80 KB | 1 year ago

Exactly-once fault-tolerance in Apache Flink - CS 591 K1: Data Stream Processing and Analytics Spring 2020
... Fault-tolerance approaches recap ... Vasiliki Kalavri | Boston University 2020 ... Upstream Backup: Upstream nodes act as backups for their downstream operators by logging tuples in their output ...
0 credits | 81 pages | 13.18 MB | 1 year ago

8 results in total