Flow control and load shedding - CS 591 K1: Data Stream Processing and Analytics Spring 2020
  Maintenance Synopsis for S1 Synopsis for Sr … Fast approximate answers … S1 S2 Sr Input Manager Scheduler QoS Monitor Load Shedder Query Execution Engine Qm Q2 Q1 Ad-hoc or continuous degradation! • Load shedding components rely on statistics gathered during execution: • A statistics manager module monitors processing and input rates and periodically estimates operator selectivities. cost, ci, in cycles per tuple, and a selectivity, si, to each operator i. • The statistics manager collects metrics and estimates those parameters either continuously or by running the system for
  0 points | 43 pages | 2.42 MB | 1 year ago
Streaming optimizations - CS 591 K1: Data Stream Processing and Analytics Spring 2020
  (SOSP ’13). • Fabian Hueske and Vasiliki Kalavri. Stream Processing with Apache Flink. (O’Reilly Media ’19). Lecture references Vasiliki Kalavri | Boston University 2020 54 • Re-ordering • Shivnath Data Stream Systems. SIGMOD 2003. • Donald Carney et al. Operator Scheduling in a Data Stream Manager. VLDB 2003. • Load balancing and skew mitigation • Muhammad Anis Uddin Nasir et al. The power
  0 points | 54 pages | 2.83 MB | 1 year ago
PyFlink 1.15 Documentation
  1.3.2.1 O1: How to prepare Python Virtual Environment … 1.3.2.2 O2: How to add Python Files … following: python3 --version Create a Python virtual environment Virtual environment gives you the ability to isolate the Python dependencies of different projects supported to use Python virtual environment in your PyFlink jobs, see PyFlink Dependency Management for more details. Create a virtual environment using virtualenv To create a virtual environment using virtualenv
  0 points | 36 pages | 266.77 KB | 1 year ago
PyFlink 1.16 Documentation
  1.3.2.1 O1: How to prepare Python Virtual Environment … 1.3.2.2 O2: How to add Python Files … following: python3 --version Create a Python virtual environment Virtual environment gives you the ability to isolate the Python dependencies of different projects supported to use Python virtual environment in your PyFlink jobs, see PyFlink Dependency Management for more details. Create a virtual environment using virtualenv To create a virtual environment using virtualenv
  0 points | 36 pages | 266.80 KB | 1 year ago
Stream ingestion and pub/sub systems - CS 591 K1: Data Stream Processing and Analytics Spring 2020
  from the following sources: • Martin Kleppmann. Designing data-intensive applications (O’Reilly Media) • Patrick Th. Eugster, Pascal A. Felber, Rachid Guerraoui, and Anne-Marie Kermarrec. The many
  0 points | 33 pages | 700.14 KB | 1 year ago
Scalable Stream Processing - Spark Streaming and Flink
  Summary References ▶ M. Zaharia et al., “Spark: The Definitive Guide”, O’Reilly Media, 2018 - Chapters 20-23. ▶ M. Zaharia et al., “Discretized Streams: An Efficient and Fault-Tolerant
  0 points | 113 pages | 1.22 MB | 1 year ago
Course introduction - CS 591 K1: Data Stream Processing and Analytics Spring 2020
  are a Windows user, you are advised to use Windows Subsystem for Linux (WSL), Cygwin, or a Linux virtual machine to run Flink in a UNIX environment. • A Java 8.x installation. To develop Flink applications
  0 points | 34 pages | 2.53 MB | 1 year ago
Streaming languages and operator semantics - CS 591 K1: Data Stream Processing and Analytics Spring 2020
  database tables • Continuous queries on data streams • New streams (derived) are defined as virtual views in SQL • Semantics are equivalent to having an append-only table to which new tuples are
  0 points | 53 pages | 532.37 KB | 1 year ago
The Past, Present, and Future of Apache Flink (Apache Flink的过去、现在和未来)
  Client Dispatcher Job Manager Task Manager Resource Manager Cluster Manager Task Manager 1. Submit job 2. Start job 3. Request slots 4. Allocate Container 5. Start Task Manager 6. Schedule Task YARN TopN Efficient streaming deduplication Complete batch processing support Batch error recovery (1) Batch error recovery (2) Batch error recovery (3) Batch error recovery (4) Batch error recovery (5) Pluggable Shuffle Manager Ecosystem Flink Hive Flink Zeppelin Chinese community Flink today offline Real-time Batch Processing Continuous
  0 points | 33 pages | 3.36 MB | 1 year ago
Elasticity and state migration: Part I - CS 591 K1: Data Stream Processing and Analytics Spring 2020
  minimizes error DS2 model properties 24 Vasiliki Kalavri | Boston University 2020 25 Scaling Manager Scaling Policy Metrics Repository invoke re-scale job report metrics monitor pull metrics
  0 points | 93 pages | 2.42 MB | 1 year ago













