Fault-tolerance demo & reconfiguration - CS 591 K1: Data Stream Processing and Analytics Spring 2020
University 2020 • Flink requires a sufficient number of processing slots in order to execute all tasks of an application. • The JobManager cannot restart the application until enough slots become available JobManager failures ??? Vasiliki Kalavri | Boston University 2020 When the JobManager fails all tasks are automatically cancelled. The new JobManager performs the following steps: 1. It requests It requests processing slots. 3. It restarts the application and resets the state of all its tasks to the last completed checkpoint. Highly available Flink setup ??? Vasiliki Kalavri | Boston University0 码力 | 41 页 | 4.09 MB | 1 年前3Streaming optimizations - CS 591 K1: Data Stream Processing and Analytics Spring 2020
TaskManager can execute several tasks at the same time. • It is statically configured with a certain number of processing slots that defines the maximum number of concurrent tasks it can execute. • A processing for each receiving task that any of its tasks need to send data to. Batching in Apache Flink • The TaskManagers ship data from sending tasks to receiving tasks. • The network component of a TaskManager0 码力 | 54 页 | 2.83 MB | 1 年前3PyFlink 1.15 Documentation
apache.flink.streaming.runtime.tasks.RegularOperatorChain. ˓→initializeStateAndOpenOperators(RegularOperatorChain.java:110) at org.apache.flink.streaming.runtime.tasks.StreamTask.restoreGates(StreamTask ˓→711) at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1. ˓→call(StreamTaskActionExecutor.java:55) at org.apache.flink.streaming.runtime.tasks.StreamTask.restoreInternal(StreamTask. ˓→java:687) at org.apache.flink.streaming.runtime.tasks.StreamTask.restore(StreamTask.java:654) at org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java: ˓→958) at org.apache0 码力 | 36 页 | 266.77 KB | 1 年前3PyFlink 1.16 Documentation
apache.flink.streaming.runtime.tasks.RegularOperatorChain. ˓→initializeStateAndOpenOperators(RegularOperatorChain.java:110) at org.apache.flink.streaming.runtime.tasks.StreamTask.restoreGates(StreamTask ˓→711) at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1. ˓→call(StreamTaskActionExecutor.java:55) at org.apache.flink.streaming.runtime.tasks.StreamTask.restoreInternal(StreamTask. ˓→java:687) at org.apache.flink.streaming.runtime.tasks.StreamTask.restore(StreamTask.java:654) at org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java: ˓→958) at org.apache0 码力 | 36 页 | 266.80 KB | 1 年前3Flow control and load shedding - CS 591 K1: Data Stream Processing and Analytics Spring 2020
producer slows down according to the rate the consumer recycles buffers. Remote exchange: If tasks run on different worker nodes, the buffer can be recycled as soon as it is on the TCP channel. • The maximum throughput is limited by the processing rate of the slowest task. • Parallel tasks are connected via virtual channels multiplexed over TCP connections: • In the presence of skew 29 Remarks on CFC • Bakcpressure is inflicted on pairs of communicating tasks only • it does not interfere with other tasks sharing the same TCP connection. • CFC maximizes network utilization and0 码力 | 43 页 | 2.42 MB | 1 年前3Notions of time and progress - CS 591 K1: Data Stream Processing and Analytics Spring 2020
watermark captures the progress of upstream stages • minimum of output watermarks of all upstream tasks • The output watermark captures the progress of the stage itself • minimum of input watermarks 1. Watermarks must be monotonically increasing in order to ensure that the event time clocks of tasks are progressing and not going backwards. 2. A watermark with a timestamp T indicates that all subsequent0 码力 | 22 页 | 2.22 MB | 1 年前3Stream ingestion and pub/sub systems - CS 591 K1: Data Stream Processing and Analytics Spring 2020
or shared subscription • A logical producer/consumer can be implemented by multiple physical tasks running in parallel • Ιf a producer generates events with high rate, we can balance the load by Publishers and Subscribers are applications. 23 Use-cases • Balancing workloads in network clusters • tasks can be efficiently distributed among multiple workers, such as Google Compute Engine instances.0 码力 | 33 页 | 700.14 KB | 1 年前3State management - CS 591 K1: Data Stream Processing and Analytics Spring 2020
the same parallel task have access to the same state • It cannot be accessed by other parallel tasks of the same or different operators Keyed state is scoped to a key defined in the operator’s input one state instance. • The keyed state instances of a function are distributed across all parallel tasks of the function’s operator. Keyed state can only be used by functions that are applied on a KeyedStream:0 码力 | 24 页 | 914.13 KB | 1 年前3Exactly-once fault-tolerance in Apache Flink - CS 591 K1: Data Stream Processing and Analytics Spring 2020
completely processed 3. Copy the state of each task to a remote, persistent storage 4. Wait until all tasks have finished their copies 5. Resume processing and stream ingestion 12 ??? Vasiliki Kalavri | post-snapshot events (order maintained by FIFO channels) Termination is satisfied if initiator can reach all tasks (possible in DAGs via multiple initiators, e.g., sources.) p1 p2 p3 p4 p5 p6 p7 p7 p5 p6 p1 p1 p2 p3 p4 34 ??? Vasiliki Kalavri | Boston University 2020 • Assumptions: • DAG of tasks • Epoch change events triggered on each source task (⟨ep1⟩,⟨ep2⟩,…) • Issued by a coordinator or generated0 码力 | 81 页 | 13.18 MB | 1 年前3Course introduction - CS 591 K1: Data Stream Processing and Analytics Spring 2020
patterns • Raise alerts for abnormal system metrics • Detect invariant violations • Identify outlier tasks Inspired by this paper : “SAQL: A Stream-based Query System for Real- Time Abnormal System Behavior0 码力 | 34 页 | 2.53 MB | 1 年前3
共 11 条
- 1
- 2