Streaming in Apache Flinkup an environment to develop Flink programs • Implement streaming data processing pipelines • Flink managed state • Event time Streaming in Apache Flink • Streams are natural • Events of any type0 码力 | 45 页 | 3.00 MB | 1 年前3
Scalable Stream Processing - Spark Streaming and FlinkScalable Stream Processing - Spark Streaming and Flink Amir H. Payberah payberah@kth.se 05/10/2018 The Course Web Page https://id2221kth.github.io 1 / 79 Where Are We? 2 / 79 Stream Processing Systems Spark streaming ▶ Flink 4 / 79 Spark Streaming 5 / 79 Contribution ▶ Design issues • Continuous vs. micro-batch processing • Record-at-a-Time vs. declarative APIs 6 / 79 Spark Streaming ▶ Run Run a streaming computation as a series of very small, deterministic batch jobs. • Chops up the live stream into batches of X seconds. • Treats each batch as RDDs and processes them using RDD operations0 码力 | 113 页 | 1.22 MB | 1 年前3
Streaming optimizations - CS 591 K1: Data Stream Processing and Analytics Spring 20204/14: Stream processing optimizations ??? Vasiliki Kalavri | Boston University 2020 2 • Costs of streaming operator execution • state, parallelism, selectivity • Dataflow optimizations • plan translation Pipeline: A || B Task: B || C Data: A || A ??? Vasiliki Kalavri | Boston University 2020 8 Distributed execution in Flink ??? Vasiliki Kalavri | Boston University 2020 9 Identify the most efficient ??? Vasiliki Kalavri | Boston University 2020 12 • What does efficient mean in the context of streaming? • queries run continuously • streams are unbounded • In traditional ad-hoc database queries0 码力 | 54 页 | 2.83 MB | 1 年前3
Graph streaming algorithms - CS 591 K1: Data Stream Processing and Analytics Spring 2020Processing and Analytics Vasiliki (Vasia) Kalavri vkalavri@bu.edu Spring 2020 4/28: Graph Streaming ??? Vasiliki Kalavri | Boston University 2020 Modeling the world as a graph 2 Social networks a vertex and all of its neighbors. Although this model can enable a theoretical analysis of streaming algorithms, it cannot adequately model real-world unbounded streams, as the neighbors cannot be continuously generated as a stream of edges? • How can we perform iterative computation in a streaming dataflow engine? How can we propagate watermarks? • Do we need to run the computation from scratch0 码力 | 72 页 | 7.77 MB | 1 年前3
Streaming languages and operator semantics - CS 591 K1: Data Stream Processing and Analytics Spring 2020Kalavri vkalavri@bu.edu CS 591 K1: Data Stream Processing and Analytics Spring 2020 2/04: Streaming languages and operator semantics Vasiliki Kalavri | Boston University 2020 Vasiliki Kalavri | Boston interval of 5–15 s) by an item of type C with Z < 5. 8 Vasiliki Kalavri | Boston University 2020 Streaming Operators 9 Vasiliki Kalavri | Boston University 2020 Operator types (I) • Single-Item Operators println!("seen: {:?}", x)) .connect_loop(handle); }); t (t, l1) (t, (l1, l2)) Streaming Iteration Example Terminate after 100 iterations Create the feedback loop 13 Vasiliki Kalavri0 码力 | 53 页 | 532.37 KB | 1 年前3
Course introduction - CS 591 K1: Data Stream Processing and Analytics Spring 2020Boston University 2020 What is this course about? The design and architecture of modern distributed streaming 4 Fundamental for representing, summarizing, and analyzing data streams Systems Algorithms Graph streaming algorithms Vasiliki Kalavri | Boston University 2020 Tools Apache Flink: flink.apache.org Apache Kafka: kafka.apache.org Apache Beam: beam.apache.org Google Cloud Platform: cloud compare features and processing guarantees of streaming systems • be proficient in using Apache Flink and Kafka to build end-to-end, scalable, and reliable streaming applications • have a solid understanding0 码力 | 34 页 | 2.53 MB | 1 年前3
Stream processing fundamentals - CS 591 K1: Data Stream Processing and Analytics Spring 2020relatively static and historical data • batched updates during downtimes, e.g. every night Streaming Data Warehouse • low-latency materialized view updates • pre-aggregated, pre-processed streams streams and historical data Data Management Approaches 4 storage analytics static data streaming data Vasiliki Kalavri | Boston University 2020 DBMS vs. DSMS DBMS DSMS Data persistent relations stream can be viewed as a massive, dynamic, one-dimensional vector A[1…N]. The size N of the streaming vector is defined as the product of the attribute domain size(s). Note that N might be unknown0 码力 | 45 页 | 1.22 MB | 1 年前3
PyFlink 1.15 Documentationrelease-1.15 PyFlink is a Python API for Apache Flink that allows you to build scalable batch and streaming workloads, such as real-time data processing pipelines, large-scale exploratory data analysis, Machine environment during submitting PyFlink jobs. In this way, the Python virtual environment will be distributed to the cluster nodes where PyFlink jobs are running on during job starting up. This is more flexible the above example, the Python virtual environment is specified via option -pyarch. It will be distributed to the cluster nodes during job execution. It should be noted that option -pyexec is also required0 码力 | 36 页 | 266.77 KB | 1 年前3
PyFlink 1.16 Documentationrelease-1.16 PyFlink is a Python API for Apache Flink that allows you to build scalable batch and streaming workloads, such as real-time data processing pipelines, large-scale exploratory data analysis, Machine environment during submitting PyFlink jobs. In this way, the Python virtual environment will be distributed to the cluster nodes where PyFlink jobs are running on during job starting up. This is more flexible the above example, the Python virtual environment is specified via option -pyarch. It will be distributed to the cluster nodes during job execution. It should be noted that option -pyexec is also required0 码力 | 36 页 | 266.80 KB | 1 年前3
监控Apache Flink应用程序(入门)message queue until they are processed by Flink (see previous section). 3. Some operators in a streaming topology need to buffer events for some time (e.g. in a time window) for functional reasons. org/projects/flink/flink-docs-release-1.7/ops/config.html#configuring-the-network-buffers 8 https://www.da-platform.com/blog/manage-rocksdb-memory-size-apache-flink? __hstc=216506377.c9dc814ddd168ffc714fc8d2bf20623f data, which can be mitigated by pre-aggregating before the shuffle or keying on a more evenly distributed key. 4.13.2.1 Key Metrics Metrics Scope Description Status.JVM.CPU.Load job-/0 码力 | 23 页 | 148.62 KB | 1 年前3
共 24 条
- 1
- 2
- 3













