Fault-tolerance demo & reconfiguration - CS 591 K1: Data Stream Processing and Analytics Spring 20202020 • To recover from failures, the system needs to • restart failed processes • restart the application and recover its state 2 Checkpointing guards the state from failures, but what about process tries to restart the application and how long it waits between restart attempts. 4 TaskManager failures ??? Vasiliki Kalavri | Boston University 2020 • The JobManager is a single point of failure Flink persistent storage system • Zookeeper also holds state handles and checkpoint locations 5 JobManager failures ??? Vasiliki Kalavri | Boston University 2020 When the JobManager fails all tasks are automatically0 码力 | 41 页 | 4.09 MB | 1 年前3
High-availability, recovery semantics, and guarantees - CS 591 K1: Data Stream Processing and Analytics Spring 2020Boston University 2020 4 Distributed streaming systems will fail • how can we guard state against failures and guarantee correct results after recovery? • how can we ensure minimal downtime and fast order-preserving, reliable message transport, e.g. TCP. • Failures are single-node and fail- stop, i.e. no network partitions or multiple simultaneous failures are considered. • The secondary node uses keep- keep- alive requests to detect primary failures. 7 Vasiliki Kalavri | Boston University 2020 Recovery types 8 Vasiliki Kalavri | Boston University 2020 Recovery types • Precise recovery (exactly-once)0 码力 | 49 页 | 2.08 MB | 1 年前3
Introduction to Apache Flink and Apache Kafka - CS 591 K1: Data Stream Processing and Analytics Spring 2020stored in the log. • For a topic with replication factor N, we will tolerate up to N-1 server failures without losing any records committed to the log. Vasiliki Kalavri | Boston University 2020 Resources0 码力 | 26 页 | 3.33 MB | 1 年前3
Stream processing fundamentals - CS 591 K1: Data Stream Processing and Analytics Spring 2020recovery) 5. Combine batch (historical) and stream processing 6. Ensure availability despite failures 7. Support distribution and automatic elasticity 8. Offer low-latency 7 2005 Vasiliki Kalavri0 码力 | 45 页 | 1.22 MB | 1 年前3
Streaming optimizations - CS 591 K1: Data Stream Processing and Analytics Spring 2020in memory • Use Spark's RDDs instead of replication • Parallel recovery mechanism in case of failures 44 input stream time-based micro-batches D-Streams • During an interval, input data received0 码力 | 54 页 | 2.83 MB | 1 年前3
Stream ingestion and pub/sub systems - CS 591 K1: Data Stream Processing and Analytics Spring 2020brokers Where do stream processors read data from? 2 Challenges • can be distributed • out-of-sync sources may produce out-of-order streams • can be connected to the network • latency and unpredictable0 码力 | 33 页 | 700.14 KB | 1 年前3
Scalable Stream Processing - Spark Streaming and Flinkevery 5 minutes. // streaming DataFrame of schema {time: Timestamp, word: String} val calls = ... val actionHours = calls.groupBy(col("action"), window(col("time"), "1 hour", "5 minutes")) 64 / 79 Late Data0 码力 | 113 页 | 1.22 MB | 1 年前3
监控Apache Flink应用程序(入门)message queue, before it is processed by Apache Flink, which then writes the results to a database or calls a downstream system. In such a pipeline, latency can be introduced at each stage and for various0 码力 | 23 页 | 148.62 KB | 1 年前3
Cardinality and frequency estimation - CS 591 K1: Data Stream Processing and Analytics Spring 2020whereas if two 0s is the maximum we’ve seen, that indicates 4 distinct elements, … It takes 2r hash calls before we encounter a result with r 0s. 6 ??? Vasiliki Kalavri | Boston University 2020 Is this0 码力 | 69 页 | 630.01 KB | 1 年前3
Exactly-once fault-tolerance in Apache Flink - CS 591 K1: Data Stream Processing and Analytics Spring 2020p1 p2 p3 m m’ C’ A B ??? Vasiliki Kalavri | Boston University 2020 System model: • No failures during snapshotting • FIFO reliable channels: no lost or duplicate messages • Strongly connected0 码力 | 81 页 | 13.18 MB | 1 年前3
共 10 条
- 1
相关搜索词













