project management - IT文库: free downloads of programming, IT, and Internet e-books and documents for programmers, to boost your coding power!


State management - CS 591 K1: Data Stream Processing and Analytics Spring 2020

…Processing and Analytics. Vasiliki (Vasia) Kalavri, vkalavri@bu.edu, Spring 2020. 2/25: State Management. Vasiliki Kalavri | Boston University 2020 … Logic, State: <#Brexit, 520>, <#WorldCup, 480> … key of the current record, so that all records with the same key access the same state … State management in Apache Flink: operator state, keyed state … state is stored, accessed, and maintained. State backends are responsible for: • local state management • checkpointing state to remote and persistent storage, e.g. a distributed filesystem or a database

0 credits | 24 pages | 914.13 KB | 1 year ago
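The State Management snippet describes keyed state: state is scoped to the key of the current record, so all records with the same key read and update the same state, while a state backend keeps that state locally and checkpoints it to persistent storage. The pure-Python sketch below illustrates only that idea; `KeyedCounter`, `checkpoint`, and `restore` are hypothetical names (not Flink's or PyFlink's API), an in-memory dict stands in for local state, and a local JSON file stands in for a distributed filesystem or database.

```python
import json
import os
import tempfile
from typing import Dict, Tuple


class KeyedCounter:
    """Hypothetical keyed operator: keeps one running count per key."""

    def __init__(self) -> None:
        # Local state: an in-memory dict plays the role of the state backend.
        self.state: Dict[str, int] = {}

    def process(self, record: Tuple[str, int]) -> Tuple[str, int]:
        key, increment = record
        # State access is scoped to the key of the current record: records
        # with the same key read and update the same entry.
        count = self.state.get(key, 0) + increment
        self.state[key] = count
        return key, count

    def checkpoint(self, path: str) -> None:
        # Persist a snapshot of all keyed state to "durable" storage
        # (a local JSON file stands in for a distributed filesystem).
        with open(path, "w") as f:
            json.dump(self.state, f)

    @classmethod
    def restore(cls, path: str) -> "KeyedCounter":
        # Rebuild the operator from the last checkpoint, e.g. after a failure.
        operator = cls()
        with open(path) as f:
            operator.state = json.load(f)
        return operator


records = [("#Brexit", 1), ("#WorldCup", 1), ("#Brexit", 1)]
op = KeyedCounter()
outputs = [op.process(r) for r in records]

checkpoint_path = os.path.join(tempfile.mkdtemp(), "checkpoint.json")
op.checkpoint(checkpoint_path)
recovered = KeyedCounter.restore(checkpoint_path)
```

In Flink itself, keyed state is accessed through state primitives such as `ValueState` inside a function applied to a keyed stream; this sketch mirrors only the per-key scoping and the checkpoint/restore cycle.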
Course introduction - CS 591 K1: Data Stream Processing and Analytics Spring 2020

…Algorithms, architecture and design, scheduling and load management, scalability and elasticity, fault tolerance and guarantees, state management, operator semantics, window optimizations, filtering … Assignment #3 contributes 20% … Grading Scheme (2): Final Project (50%): • a real-time monitoring and anomaly detection framework • to be implemented individually … experts with decades of hands-on experience in building and using distributed systems and data management platforms • Have fun! … Important dates: Deliverable

0 credits | 34 pages | 2.53 MB | 1 year ago
PyFlink 1.15 Documentation

…isolate the Python dependencies of different projects by creating a separate environment for each project. It is a directory tree that contains its own Python executable files and the installed Python packages. Python virtual environments are supported in PyFlink jobs; see PyFlink Dependency Management for more details. Create a virtual environment using virtualenv: To create a virtual environment … Submitting PyFlink jobs for more details. 1.1.1.4 YARN: Apache Hadoop YARN is a cluster resource management framework for managing the resources and scheduling jobs in a Hadoop cluster. It is supported to…

0 credits | 36 pages | 266.77 KB | 1 year ago
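The PyFlink snippet defines a virtual environment as a directory tree with its own Python executable and installed packages. The documentation it quotes uses the third-party `virtualenv` tool; the sketch below swaps in the standard library's `venv` module to show the same idea. `create_isolated_env` and `env_python` are hypothetical helper names, and `with_pip=False` is an assumption made to keep the example offline.

```python
import os
import sys
import tempfile
import venv


def create_isolated_env(parent_dir: str, name: str = "pyflink_env") -> str:
    """Hypothetical helper: create a virtual environment and return its directory."""
    env_dir = os.path.join(parent_dir, name)
    # with_pip=False keeps the sketch fast and offline; a real project would
    # then install its own dependencies (e.g. PyFlink) into this environment.
    venv.create(env_dir, with_pip=False)
    return env_dir


def env_python(env_dir: str) -> str:
    """Path of the environment's own Python interpreter."""
    if sys.platform == "win32":
        return os.path.join(env_dir, "Scripts", "python.exe")
    return os.path.join(env_dir, "bin", "python")


env_dir = create_isolated_env(tempfile.mkdtemp())
```

Each project gets its own `env_dir`, so packages installed with that environment's interpreter do not affect other projects.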
PyFlink 1.16 Documentation

…isolate the Python dependencies of different projects by creating a separate environment for each project. It is a directory tree that contains its own Python executable files and the installed Python packages. Python virtual environments are supported in PyFlink jobs; see PyFlink Dependency Management for more details. Create a virtual environment using virtualenv: To create a virtual environment … Submitting PyFlink jobs for more details. 1.1.1.4 YARN: Apache Hadoop YARN is a cluster resource management framework for managing the resources and scheduling jobs in a Hadoop cluster. It is supported to…

0 credits | 36 pages | 266.80 KB | 1 year ago
Stream processing fundamentals - CS 591 K1: Data Stream Processing and Analytics Spring 2020

…DBMS, SDW, DSMS. Database Management System: • ad-hoc queries, data manipulation tasks • insertions, updates, deletions of single rows or groups of rows. Data Stream Management System: • continuous materialized view updates • pre-aggregated, pre-processed streams and historical data. Data Management Approaches: storage, analytics, static data, streaming data … data. State: limited, in-memory / partitioned, virtually unlimited, persisted to backends. Load management: shedding / backpressure, elasticity. Fault tolerance: limited support, high availability / full support.

0 credits | 45 pages | 1.22 MB | 1 year ago
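The fundamentals snippet contrasts a DBMS, which answers ad-hoc queries over stored rows, with a DSMS, which continuously updates materialized views as data arrives. A minimal sketch of that contrast, assuming a hypothetical "count rows per key" query; the function names are illustrative only.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, List, Tuple

Row = Tuple[str, int]  # (key, value)


def adhoc_count(table: List[Row]) -> Dict[str, int]:
    """DBMS style: an ad-hoc query evaluated once over rows already stored."""
    return dict(Counter(key for key, _ in table))


def continuous_count(stream: Iterable[Row]) -> Iterator[Dict[str, int]]:
    """DSMS style: a continuously maintained view, updated as each record arrives."""
    view: Counter = Counter()
    for key, _ in stream:
        view[key] += 1  # incremental update instead of recomputing from scratch
        yield dict(view)  # emit the current contents of the materialized view


rows = [("a", 1), ("b", 2), ("a", 3)]
snapshots = list(continuous_count(rows))
```

After the last record, the continuously maintained view holds the same answer the ad-hoc query computes over the full table, but it was available (and kept current) at every step.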
Elasticity and state migration: Part I - CS 591 K1: Data Stream Processing and Analytics Spring 2020

…State is scoped to a single task • Each stateful task is responsible for processing and state management … Pause-and-restart state migration: • snapshot • block channels and upstream operators • buffer incoming records

0 credits | 93 pages | 2.42 MB | 1 year ago
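The elasticity snippet names the ingredients of pause-and-restart state migration: state is scoped to a single task, upstream channels are blocked, incoming records are buffered, and state is snapshotted and moved. The single-process sketch below puts these steps in one plausible order (pause, move state, resume and replay); `Task`, `Router`, `move_key`, and the step order are assumptions for illustration, not a description of any specific system.

```python
from collections import deque
from typing import Deque, Dict, Tuple

Record = Tuple[str, int]  # (key, value)


class Task:
    """A stateful task: owns and updates the state of the keys routed to it."""

    def __init__(self) -> None:
        self.state: Dict[str, int] = {}

    def process(self, record: Record) -> None:
        key, value = record
        self.state[key] = self.state.get(key, 0) + value


class Router:
    """Hypothetical upstream side: routes records by key and can be paused."""

    def __init__(self, tasks: Dict[str, Task], assignment: Dict[str, str]) -> None:
        self.tasks = tasks
        self.assignment = assignment  # key -> name of the task that owns it
        self.paused = False
        self.buffer: Deque[Record] = deque()

    def send(self, record: Record) -> None:
        if self.paused:
            # Channels are blocked: buffer incoming records instead of processing.
            self.buffer.append(record)
        else:
            self.tasks[self.assignment[record[0]]].process(record)

    def pause(self) -> None:
        self.paused = True

    def move_key(self, key: str, target: str) -> None:
        # Only safe while paused, when no task is updating the key's state.
        if not self.paused:
            raise RuntimeError("pause() before moving state")
        source = self.tasks[self.assignment[key]]
        # Snapshot the key's state at the source and install it at the target.
        if key in source.state:
            self.tasks[target].state[key] = source.state.pop(key)
        self.assignment[key] = target

    def resume(self) -> None:
        # Restart: unblock and replay buffered records under the new routing.
        self.paused = False
        while self.buffer:
            self.send(self.buffer.popleft())


tasks = {"t1": Task(), "t2": Task()}
router = Router(tasks, {"a": "t1", "b": "t1"})
router.send(("a", 1))
router.send(("b", 5))
router.pause()
router.send(("a", 2))  # arrives mid-migration, so it is buffered
router.move_key("a", "t2")
router.resume()
```

The cost of this approach is visible in the sketch: every record that arrives during migration waits in the buffer until `resume()` runs.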
Streaming languages and operator semantics - CS 591 K1: Data Stream Processing and Analytics Spring 2020

…in a period during which a user was active … Flow Management Operators (I): • Join operators merge two streams by matching elements satisfying a condition … blocking and must be defined over a window … Flow Management Operators (II): • Duplicate/Copy Operator replicates a stream, commonly to be used as input to … Article 15 (June 2012). • Minos Garofalakis, Johannes Gehrke, and Rajeev Rastogi. Data Stream Management: Processing High-Speed Data Streams. Springer-Verlag, Berlin, Heidelberg. • David Maier, Jin…

0 credits | 53 pages | 532.37 KB | 1 year ago
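The operator-semantics snippet states that join operators merge two streams by matching elements that satisfy a condition, and that joins are blocking and must be defined over a window. A minimal sketch, assuming the condition is key equality and the window is a fixed-size tumbling time window; it processes finite lists for clarity, whereas a real streaming join would emit each window's matches incrementally as that window closes. The function name is hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, List, Tuple

Event = Tuple[int, str, str]  # (timestamp, key, payload)
Match = Tuple[int, str, str, str]  # (window index, key, left payload, right payload)


def tumbling_window_join(left: List[Event], right: List[Event], window_size: int) -> List[Match]:
    """Equi-join two streams on key, restricted to elements in the same tumbling window."""
    # A join must hold elements of one input until matching elements of the
    # other input arrive; the window bounds how much it has to keep.
    buffered: DefaultDict[Tuple[int, str], List[Event]] = defaultdict(list)
    for ts, key, payload in right:
        buffered[(ts // window_size, key)].append((ts, key, payload))

    matches: List[Match] = []
    for ts, key, payload in left:
        window = ts // window_size
        for _, _, right_payload in buffered.get((window, key), []):
            matches.append((window, key, payload, right_payload))
    return matches


clicks = [(1, "u1", "click-A"), (7, "u2", "click-B"), (12, "u1", "click-C")]
views = [(3, "u1", "view-X"), (11, "u1", "view-Y"), (8, "u3", "view-Z")]
joined = tumbling_window_join(clicks, views, window_size=10)
```

Here `click-A` (t=1) matches `view-X` (t=3) in window 0, and `click-C` (t=12) matches `view-Y` (t=11) in window 1; `click-B` and `view-Z` have no partner with the same key in their window.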
Flow control and load shedding - CS 591 K1: Data Stream Processing and Analytics Spring 2020

…producer (back-pressure, flow control) … Load management approaches: (a) load shedding, (b) back-pressure, (c) elasticity … Selectively … Credit-based flow control: • This classic networking technique turns out to be very useful for load management in modern, highly parallel stream processors and is implemented in Apache Flink. • Each task…

0 credits | 43 pages | 2.42 MB | 1 year ago
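The flow-control snippet mentions credit-based flow control as the technique Apache Flink uses for load management. In its general form, a receiver grants the sender one credit per free input buffer, and the sender transmits only as many items as it holds credits for; when credits run out, data waits upstream (back-pressure) instead of overflowing the receiver. The sketch below is a simplified, single-process illustration with hypothetical `Sender`/`Receiver` classes, not Flink's network stack.

```python
from collections import deque
from typing import Deque, List


class Receiver:
    """Downstream task with a fixed number of input buffers."""

    def __init__(self, num_buffers: int) -> None:
        self.num_buffers = num_buffers
        self.inbox: Deque[int] = deque()

    def accept(self, item: int) -> None:
        # The sender only transmits against granted credit, so a buffer is free.
        if len(self.inbox) >= self.num_buffers:
            raise RuntimeError("sender exceeded its credit")
        self.inbox.append(item)

    def consume(self, n: int) -> List[int]:
        # Processing items frees buffers, which can be announced as new credit.
        return [self.inbox.popleft() for _ in range(min(n, len(self.inbox)))]


class Sender:
    """Upstream task that may only send as many items as it holds credits for."""

    def __init__(self) -> None:
        self.credit = 0
        self.backlog: Deque[int] = deque()

    def grant(self, credits: int) -> None:
        self.credit += credits

    def flush(self, receiver: Receiver) -> int:
        sent = 0
        while self.backlog and self.credit > 0:
            receiver.accept(self.backlog.popleft())
            self.credit -= 1
            sent += 1
        return sent


receiver = Receiver(num_buffers=2)
sender = Sender()
sender.backlog.extend(range(5))
sender.grant(receiver.num_buffers)  # initial credit: every buffer is free
first = sender.flush(receiver)      # fills both buffers
stalled = sender.flush(receiver)    # no credit left: data waits at the sender
consumed = receiver.consume(1)      # receiver frees one buffer...
sender.grant(len(consumed))         # ...and announces it as one new credit
second = sender.flush(receiver)
```

Because the sender never holds more credit than the receiver has free buffers, `accept` never overflows, and a slow receiver throttles the sender simply by granting credit more slowly.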
Streaming optimizations - CS 591 K1: Data Stream Processing and Analytics Spring 2020

optimizations • plan translation alternatives • Runtime optimizations • load management, scheduling, state management • Optimization semantics, correctness, profitability Topics covered in this lecture

0 credits | 54 pages | 2.83 MB | 1 year ago
Apache Flink: Past, Present, and Future

…Order, Inventory, Payment, Shipping … Flow-Control, Async Call, Auto Scale, State Management, Event Driven … The future of Flink: offline, real-time; batch processing, continuous processing & streaming

0 credits | 33 pages | 3.36 MB | 1 year ago
