Fault-tolerance demo & reconfiguration - CS 591 K1: Data Stream Processing and Analytics Spring 2020
keyed state are scaled by repartitioning keys • Operators with operator list state are scaled by redistributing the list entries. • Operators with operator broadcast state are scaled up by copying the The number of key groups limits the maximum number of parallel tasks to which keyed state can be scaled. • Trade-off between flexibility in rescaling and the maximum overhead involved in indexing and
41 pages | 4.09 MB | 1 year ago
Flow control and load shedding - CS 591 K1: Data Stream Processing and Analytics Spring 2020
throughput matches the data input rate • In the case of known aggregation functions, results can be scaled using approximate query processing techniques, where accuracy is measured in terms of relative error
43 pages | 2.42 MB | 1 year ago
Elasticity and state migration: Part I - CS 591 K1: Data Stream Processing and Analytics Spring 2020
restart • only temporarily block the affected dataflow subgraph • usually the operator to be scaled and upstream channels • All-at-once • move state to be migrated in one operation • high latency
93 pages | 2.42 MB | 1 year ago
Course introduction - CS 591 K1: Data Stream Processing and Analytics Spring 2020
2020 Grading Scheme (2) Final Project (50%): • A real-time monitoring and anomaly detection framework • To be implemented individually Deliverables • One (1) written report of maximum 5 pages Apache Flink and Kafka to build a real-time monitoring and anomaly detection framework for datacenters. Your framework will: • Detect “suspicious” event patterns • Raise alerts for abnormal system
34 pages | 2.53 MB | 1 year ago
Introduction to Apache Flink and Apache Kafka - CS 591 K1: Data Stream Processing and Analytics Spring 2020
Vasiliki Kalavri | Boston University 2020 Apache Flink • An open-source, distributed data analysis framework • True streaming at its core • Streaming & Batch API Historic data Kafka, RabbitMQ, ... HDFS
26 pages | 3.33 MB | 1 year ago
Monitoring Apache Flink Applications (Getting Started)
(e.g. in a time window) for functional reasons. 4. Each computation in your Flink topology (framework or user code), as well as each network shuffle, takes time and adds to latency. 5. If the application
23 pages | 148.62 KB | 1 year ago
PyFlink 1.15 Documentation
PyFlink jobs for more details. 1.1.1.4 YARN Apache Hadoop YARN is a cluster resource management framework for managing the resources and scheduling jobs in a Hadoop cluster. It’s supported to submit PyFlink
36 pages | 266.77 KB | 1 year ago
PyFlink 1.16 Documentation
PyFlink jobs for more details. 1.1.1.4 YARN Apache Hadoop YARN is a cluster resource management framework for managing the resources and scheduling jobs in a Hadoop cluster. It’s supported to submit PyFlink
36 pages | 266.80 KB | 1 year ago
8 results in total