Elasticity and state migration: Part I - CS 591 K1: Data Stream Processing and Analytics Spring 2020
Vasiliki (Vasia) Kalavri, vkalavri@bu.edu
Spring 2020, 4/02: Elasticity policies and state migration
Vasiliki Kalavri | Boston University 2020
Streaming applications are long-running
• Workload requires state migration with correctness guarantees.
State migration
State migration strategies
• Stop-and-restart during migration if the state is large
• Progressive
• move state to be migrated in smaller pieces, e.g. key-by-key
• can be used to interleave state transfer with processing
• migration duration
0 points | 93 pages | 2.42 MB | 1 year ago

Fault-tolerance demo & reconfiguration - CS 591 K1: Data Stream Processing and Analytics Spring 2020
reconfiguration mechanism often relies on fault-tolerance mechanism
• State re-partitioning and migration
• minimize communication
• keep duration short
• minimize performance disruption, e.g. latency
• Partitioning function performance
• space required to implement routing
• lookup cost
• Migration performance
• re-assignment computation cost
• state movement cost
State redistribution
• Evenly distributes keys across parallel tasks
• Fast to compute, no routing state
• High migration cost
• When a new node is added, state is shuffled across existing and new nodes
• Random I/O
0 points | 41 pages | 4.09 MB | 1 year ago

Stream processing fundamentals - CS 591 K1: Data Stream Processing and Analytics Spring 2020
Challenges: computation progress, fault-tolerance and result guarantees, automatic scaling and state migration, out-of-order processing
Vasiliki Kalavri | Boston University 2020
• No particular basic stream
0 points | 45 pages | 1.22 MB | 1 year ago

Streaming optimizations - CS 591 K1: Data Stream Processing and Analytics Spring 2020
• Ensure security constraints: what are the trusted hosts for each operator?
• Ensure state migration: if placement is dynamic and the operator is stateful, its state must be moved in a consistent
0 points | 54 pages | 2.83 MB | 1 year ago

PyFlink 1.15 Documentation
QuickStart: DataStream API (p. 19)
1.2 User Guide (p. 22)
execute() [8]:
1.2 User Guide
1.2.1 RealTime Feature
1.2.1.1 Coming Soon.
1.2.2 PyFlink + Flink ML
1.2.2.1 Coming Soon.
1.3
0 points | 36 pages | 266.77 KB | 1 year ago

PyFlink 1.16 Documentation
QuickStart: DataStream API (p. 19)
1.2 User Guide (p. 22)
execute() [8]:
1.2 User Guide
1.2.1 RealTime Feature
1.2.1.1 Coming Soon.
1.2.2 PyFlink + Flink ML
1.2.2.1 Coming Soon.
1.3
0 points | 36 pages | 266.80 KB | 1 year ago

Scalable Stream Processing - Spark Streaming and Flink
Asynchronous barriers
Summary
References
▶ M. Zaharia et al., “Spark: The Definitive Guide”, O’Reilly Media, 2018 - Chapters 20-23.
▶ M. Zaharia et al., “Discretized Streams: An Efficient
0 points | 113 pages | 1.22 MB | 1 year ago
7 results in total