Streaming in Apache Flink
… up an environment to develop Flink programs • Implement streaming data processing pipelines • Flink managed state • Event time … Streaming in Apache Flink • Streams are natural • Events of any type …
0 credits | 45 pages | 3.00 MB | 1 year ago

Scalable Stream Processing - Spark Streaming and Flink
Amir H. Payberah, payberah@kth.se, 05/10/2018 … The Course Web Page: https://id2221kth.github.io … Where Are We? … Stream Processing Systems ▶ Spark Streaming ▶ Flink … Spark Streaming … Contribution ▶ Design issues • Continuous vs. micro-batch processing • Record-at-a-time vs. declarative APIs … Spark Streaming ▶ Run a streaming computation as a series of very small, deterministic batch jobs. • Chops up the live stream into batches of X seconds. • Treats each batch as RDDs and processes them using RDD operations …
0 credits | 113 pages | 1.22 MB | 1 year ago

Streaming optimizations - CS 591 K1: Data Stream Processing and Analytics Spring 2020
4/14: Stream processing optimizations … Vasiliki Kalavri | Boston University 2020 … • Costs of streaming operator execution • state, parallelism, selectivity • Dataflow optimizations • plan translation … • What does efficient mean in the context of streaming? • queries run continuously • streams are unbounded • In traditional ad-hoc database queries … the-fly. Different plans can be used for two consecutive executions of the same query. • A streaming dataflow is generated once and then scheduled for execution. • Changing execution strategy while …
0 credits | 54 pages | 2.83 MB | 1 year ago

Graph streaming algorithms - CS 591 K1: Data Stream Processing and Analytics Spring 2020
… Processing and Analytics, Vasiliki (Vasia) Kalavri, vkalavri@bu.edu, Spring 2020 … 4/28: Graph Streaming … Vasiliki Kalavri | Boston University 2020 … Modeling the world as a graph … Social networks … a vertex and all of its neighbors. Although this model can enable a theoretical analysis of streaming algorithms, it cannot adequately model real-world unbounded streams, as the neighbors cannot be … continuously generated as a stream of edges? • How can we perform iterative computation in a streaming dataflow engine? How can we propagate watermarks? • Do we need to run the computation from scratch …
0 credits | 72 pages | 7.77 MB | 1 year ago

Streaming languages and operator semantics - CS 591 K1: Data Stream Processing and Analytics Spring 2020
… Kalavri, vkalavri@bu.edu … CS 591 K1: Data Stream Processing and Analytics Spring 2020 … 2/04: Streaming languages and operator semantics … Vasiliki Kalavri | Boston University 2020 … interval of 5–15 s) by an item of type C with Z < 5. … Streaming Operators … Operator types (I) • Single-Item Operators … println!("seen: {:?}", x)) .connect_loop(handle); }); … t (t, l1) (t, (l1, l2)) … Streaming Iteration Example … Terminate after 100 iterations … Create the feedback loop …
0 credits | 53 pages | 532.37 KB | 1 year ago

Elasticity and state migration: Part I - CS 591 K1: Data Stream Processing and Analytics Spring 2020
4/02: Elasticity policies and state migration … Vasiliki Kalavri | Boston University 2020 … Streaming applications are long-running • Workload will change • Conditions might change • State is … input or on output • amounts to the time an operator instance runs for if executed in an ideal setting • when there is no waiting the useful time is equal to the observed time … Useful time Wu … Roscoe. Three steps is all you need: fast, accurate, automatic scaling decisions for distributed streaming dataflows. (OSDI’18). • Moritz Hoffmann, Andrea Lattuada, Frank McSherry, Vasiliki Kalavri, John …
0 credits | 93 pages | 2.42 MB | 1 year ago

Fault-tolerance demo & reconfiguration - CS 591 K1: Data Stream Processing and Analytics Spring 2020
… cluster or software version … Reconfiguration cases … Vasiliki Kalavri | Boston University 2020 … Streaming applications are long-running • Workload will change • Conditions might change • State is … TemperatureAlertFunction(1.1)) // set the maximum parallelism for this operator .setMaxParallelism(1024) … Setting the max parallelism … • A Deep Dive into Rescalable …
0 credits | 41 pages | 4.09 MB | 1 year ago

Monitoring Apache Flink Applications (Introduction)
… message queue until they are processed by Flink (see previous section). 3. Some operators in a streaming topology need to buffer events for some time (e.g. in a time window) for functional reasons. … this operator will be reported. The granularity of these histograms can be further controlled by setting metrics.latency.granularity as desired. Due to the potentially high number of histograms (in particular …
0 credits | 23 pages | 148.62 KB | 1 year ago

Skew mitigation - CS 591 K1: Data Stream Processing and Analytics Spring 2020
… Boston University 2020 … The power of both choices • Applying the power of two choices in a streaming setting and preserving key semantics would require remembering the choices made previously • Partial …
0 credits | 31 pages | 1.47 MB | 1 year ago

PyFlink 1.15 Documentation
release-1.15 … PyFlink is a Python API for Apache Flink that allows you to build scalable batch and streaming workloads, such as real-time data processing pipelines, large-scale exploratory data analysis, Machine … context for creating Table and SQL API programs. Flink is a unified streaming and batch computing engine, which provides a unified streaming and batch API to create a TableEnvironment. TableEnvironment is responsible … table_environment.TableEnvironment at 0x7fcd16342ac8> … [2]: # Create a streaming TableEnvironment env_settings = EnvironmentSettings.in_streaming_mode() table_env = TableEnvironment.create(env_settings) table_env …
0 credits | 36 pages | 266.77 KB | 1 year ago

22 results in total













