State management - CS 591 K1: Data Stream Processing and Analytics Spring 2020Vasiliki (Vasia) Kalavri vkalavri@bu.edu Spring 2020 2/25: State Management Vasiliki Kalavri | Boston University 2020 Logic State<#Brexit, 520> <#WorldCup, 480> <#StarWars, 300> <#Brexit> maintains state: • rolling aggregations • window contents • input offsets • machine learning models State in dataflow computations 2 Vasiliki Kalavri | Boston University 2020 • No explicit state primitives • Users define state using arbitrary types • The system is unaware of which parts of an operator constitute state Streaming state 3 • Explicit state primitives including state types and interfaces 0 码力 | 24 页 | 914.13 KB | 1 年前3
Elasticity and state migration: Part I - CS 591 K1: Data Stream Processing and Analytics Spring 2020Elasticity policies and state migration ??? Vasiliki Kalavri | Boston University 2020 Streaming applications are long-running • Workload will change • Conditions might change • State is accumulated over terminate processes • Adjust dataflow channels and network connections • Re-partition and migrate state in a consistent manner • Block and unblock computations to ensure result correctness ??? Vasiliki Re-configuration requires state migration with correctness guarantees. ??? Vasiliki Kalavri | Boston University 2020 State migration 29 ??? Vasiliki Kalavri | Boston University 2020 State migration strategies0 码力 | 93 页 | 2.42 MB | 1 年前3
Scalable Stream Processing - Spark Streaming and FlinkawaitTermination() 42 / 79 State and DStream 43 / 79 What is State? ▶ Accumulate and aggregate the results from the start of the streaming job. ▶ Need to check the previous state of the RDD in order to supports stateful streams. 44 / 79 What is State? ▶ Accumulate and aggregate the results from the start of the streaming job. ▶ Need to check the previous state of the RDD in order to do something with range of keys in DStream. • The performance of these operation is proportional to the size of the state. ▶ mapWithState • It is executed only on set of keys that are available in the last micro batch0 码力 | 113 页 | 1.22 MB | 1 年前3
Fault-tolerance demo & reconfiguration - CS 591 K1: Data Stream Processing and Analytics Spring 2020needs to • restart failed processes • restart the application and recover its state 2 Checkpointing guards the state from failures, but what about process failure? High-availability ??? Vasiliki as the application’s JAR file, into a remote persistent storage system • Zookeeper also holds state handles and checkpoint locations 5 JobManager failures ??? Vasiliki Kalavri | Boston University JobGraph, the JAR file, and the state handles of the last checkpoint from remote storage. 2. It requests processing slots. 3. It restarts the application and resets the state of all its tasks to the last0 码力 | 41 页 | 4.09 MB | 1 年前3
High-availability, recovery semantics, and guarantees - CS 591 K1: Data Stream Processing and Analytics Spring 2020Kalavri | Boston University 2020 Logic State<#Brexit, 520> <#WorldCup, 480> <#StarWars, 300> Any non-trivial streaming computation maintains state: • rolling aggregations • window contents contents • input offsets • machine learning models State in dataflow computations 3 Vasiliki Kalavri | Boston University 2020 Logic State <#Brexit, 520> <#WorldCup, 480> <#StarWars, 300> maintains state: • rolling aggregations • window contents • input offsets • machine learning models State in dataflow computations 3 Vasiliki Kalavri | Boston University 2020 Logic State 0 码力 | 49 页 | 2.08 MB | 1 年前3
Streaming languages and operator semantics - CS 591 K1: Data Stream Processing and Analytics Spring 2020initialized local state. • ITERATE: update state based on new element and current state. • TERMINATE: produce the result. Note that it is allowed to define and maintain local tables as state. 36 Vasiliki myavg(Next Int): Real { TABLE state(tsum Int, cnt Int); INITIALIZE: { INSERT INTO state VALUES(Next, 1); } ITERATE: { UPDATE state SET tsum=tsum+Next cnt=cnt+1; } TERMINATE: { INSERT INTO RETURN SELECT tsum/cnt FROM state; } } Allocated just before INITIALIAZE and deallocated just after TERMINATE. 370 码力 | 53 页 | 532.37 KB | 1 年前3
Streaming optimizations - CS 591 K1: Data Stream Processing and Analytics Spring 2020operator execution • state, parallelism, selectivity • Dataflow optimizations • plan translation alternatives • Runtime optimizations • load management, scheduling, state management • Optimization University 2020 Logic State<#Brexit, 521> <#WorldCup, 480> <#StarWars, 300> <#Brexit> <#Brexit, 521> Stateful operators 5 • Stateful operators maintain state that reflect part of the history they have seen • windows, continuous aggregations, distinct… • State is commonly partitioned by key • State can be cleared based on watermarks or punctuations • window fires, post 0 码力 | 54 页 | 2.83 MB | 1 年前3
Graph streaming algorithms - CS 591 K1: Data Stream Processing and Analytics Spring 2020mutate the graph state 11 ??? Vasiliki Kalavri | Boston University 2020 1. Load: read the graph from disk and partition it in memory 2. Compute: read and mutate the graph state 11 ??? Vasiliki from disk and partition it in memory 2. Compute: read and mutate the graph state 3. Store: write the final graph state back to disk 12 ??? Vasiliki Kalavri | Boston University 2020 13 • We express 4 3 2 5 6 7 8 ??? Vasiliki Kalavri | Boston University 2020 Batch Connected Components • State: the graph and a component ID per vertex • initially equal to vertex ID • Iterative step: For0 码力 | 72 页 | 7.77 MB | 1 年前3
Stream processing fundamentals - CS 591 K1: Data Stream Processing and Analytics Spring 2020the same t(A) value to form a new relation state. • as a sliding window with length k in which each subsequence of k tuples represents a relation state in the sequence. 26 Vasiliki Kalavri | Boston Model Vasiliki Kalavri | Boston University 2020 Dataflow Systems Distributed execution Partitioned state Exact results Out-of-order support Single-node execution Synopses and sketches Approximate results Acyclic Graphs (DAGs) • nodes are operators and edges are data channels • operators can accumulate state, have multiple inputs, express event- time custom window-based logic • some systems, like Timely0 码力 | 45 页 | 1.22 MB | 1 年前3
Exactly-once fault-tolerance in Apache Flink - CS 591 K1: Data Stream Processing and Analytics Spring 2020• Each primary periodically checkpoints its state and sends it to the secondary 6 Ni primary secondary I1 O1 N’i update checkpoint send state ??? Vasiliki Kalavri | Boston University 2020 ingestion of all input streams 2. Wait for all in-flight data to be completely processed 3. Copy the state of each task to a remote, persistent storage 4. Wait until all tasks have finished their copies configuration is eventually captured A snapshot algorithm attempts to capture a coherent global state of a distributed system ??? Vasiliki Kalavri | Boston University 2020 Snapshotting Protocols p10 码力 | 81 页 | 13.18 MB | 1 年前3
共 20 条
- 1
- 2
相关搜索词













