Introduction to Apache Flink and Apache Kafka - CS 591 K1: Data Stream Processing and Analytics, Spring 2020 (Vasiliki Kalavri, Boston University)
Apache Flink is an open-source, distributed data analysis framework with true streaming at its core, offering both streaming and batch APIs (aggregations, ...). Basic API concept: a source produces a data stream (or data set), operators transform it, and a sink consumes it; writing a Flink program starts with a bootstrap step. Resources: documentation at https://flink.apache.org/, community mailing lists at https://flink.apache.org/community.html#mailing-lists, and the Flink Forward conference at http://flink-forward.org/.
26 pages | 3.33 MB | 1 year ago
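The source-operator-sink concept from this entry can be illustrated with a plain-Python sketch (generators standing in for streams; this is not the Flink API, and all names are illustrative):

```python
# Conceptual sketch of a dataflow: Source -> Operator -> Sink,
# modeled here with ordinary Python generators.
def source():
    """Source: emits a stream of records."""
    yield from [1, 2, 3, 4, 5]

def square_operator(stream):
    """Operator: transforms each record (here, squares it)."""
    for record in stream:
        yield record * record

def sink(stream):
    """Sink: consumes the stream and materializes a result."""
    return list(stream)

result = sink(square_operator(source()))
```

In Flink the same shape appears as `env.addSource(...).map(...).addSink(...)`, with the runtime handling distribution and fault tolerance.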
PyFlink 1.15 Documentation
Building the docs: install pandoc (conda install pandoc), build with python3 setup.py build_sphinx, then open pyflink-docs/build/sphinx/html/index.html in a browser. The Getting Started and Preparation pages show how to install PyFlink using pip, conda, or from source, and list the Python versions each PyFlink version supports. A virtual environment must be activated before use: run source venv/bin/activate, i.e. execute the activate script under the bin directory of the virtual environment.
36 pages | 266.77 KB | 1 year ago
PyFlink 1.16 Documentation
Building the docs: install pandoc (conda install pandoc), build with python3 setup.py build_sphinx, then open pyflink-docs/build/sphinx/html/index.html in a browser. The Getting Started and Preparation pages show how to install PyFlink using pip, conda, or from source, and list the Python versions each PyFlink version supports. A virtual environment must be activated before use: run source venv/bin/activate.
36 pages | 266.80 KB | 1 year ago
State management - CS 591 K1: Data Stream Processing and Analytics, Spring 2020 (Vasiliki Kalavri, Boston University)
Keyed state is used in two steps: 1. declare the state handle as a field of the operator (FlatMap) class; 2. assign a name and get the state handle in the open() method. Scala example: a TemperatureAlertFunction(val threshold: ...) holds private var lastTempState: ValueState[Double], and open(parameters: Configuration) creates a ValueStateDescriptor to obtain the handle. Java example: private ValueState rideState and fareState fields are initialized in open(Configuration config) via rideState = getRuntimeContext().getState(...), with the state descriptors created there.
24 pages | 914.13 KB | 1 year ago
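The temperature-alert pattern in this entry can be sketched in plain Python (a dict stands in for Flink's keyed ValueState[Double]; the threshold semantics are an assumption based on the snippet, and this is not the Flink API):

```python
# Sketch of per-key state: remember the last temperature per sensor
# and emit an alert when the jump exceeds a threshold.
class TemperatureAlert:
    def __init__(self, threshold):
        self.threshold = threshold
        self.last_temp = {}  # stands in for keyed ValueState[Double]

    def flat_map(self, sensor_id, temp):
        alerts = []
        last = self.last_temp.get(sensor_id)
        if last is not None and abs(temp - last) > self.threshold:
            alerts.append((sensor_id, temp, temp - last))
        self.last_temp[sensor_id] = temp  # state update
        return alerts
```

In Flink the dict is replaced by a ValueState handle, so the runtime can checkpoint and redistribute the per-key values.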
Scalable Stream Processing - Spark Streaming and Flink
Input operations: every input DStream is associated with a Receiver object, which receives data from a source and stores it in Spark's memory for processing. Streaming sources fall into three categories: basic sources, advanced sources, and custom sources. To create a custom source, extend the Receiver class, implement onStart() and onStop(), and call store(data) to store received records.
113 pages | 1.22 MB | 1 year ago
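The custom-source contract (onStart/onStop/store) can be sketched in plain Python (this models the lifecycle only; it is not Spark's actual Receiver class, and the class name is illustrative):

```python
# Sketch of the receiver lifecycle: start, push records with store(),
# stop cleanly; records arriving after on_stop() are ignored.
class ToyReceiver:
    def __init__(self):
        self.buffer = []      # stands in for Spark's block storage
        self.running = False

    def on_start(self):
        # In Spark, this would start a non-blocking thread that reads
        # from the external system and calls store() per record.
        self.running = True

    def on_stop(self):
        self.running = False

    def store(self, record):
        if self.running:
            self.buffer.append(record)
```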
Streaming in Apache Flink
Flink state is redistributed as the cluster grows and shrinks, and is queryable: Flink state can be queried via a REST API. Rich functions provide open(Configuration c), close(), and getRuntimeContext(). Example: an operator over a DataStream of Tuple2 keeps a private ValueState averageState field, initialized in open(Configuration conf) with a ValueStateDescriptor; a RichCoFlatMapFunction keeps a private ValueState blocked field, initialized in open(Configuration config) as blocked = getRuntimeContext().getState(new ValueStateDescriptor<>("blocked", ...)).
45 pages | 3.00 MB | 1 year ago
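The RichCoFlatMapFunction with a "blocked" ValueState suggests the classic connected-streams filtering pattern; a plain-Python sketch under that assumption (a set stands in for the per-key Boolean state; not the Flink API):

```python
# Sketch of connected streams: a control stream marks keys as blocked,
# and the data stream drops records for blocked keys.
class BlockingFilter:
    def __init__(self):
        self.blocked = set()  # stands in for per-key ValueState "blocked"

    def flat_map1(self, key):
        """Control stream: block this key."""
        self.blocked.add(key)

    def flat_map2(self, key, value):
        """Data stream: forward unless the key is blocked."""
        return [] if key in self.blocked else [value]
```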
Stream processing fundamentals - CS 591 K1: Data Stream Processing and Analytics, Spring 2020 (Vasiliki Kalavri, Boston University)
A frequency vector is sized by the attribute domain size(s); note that the domain size N might be unknown. Example: up-to-date frequencies for specific (source, destination) pairs observed in IP connections that are currently active, with the vector updated as packets arrive. Types of streams: a base stream is produced by an external source (e.g. a TCP packet stream), while a derived stream is produced by a continuous query and its operators (e.g. total traffic from a source every minute, computed from packet generation time and bytes per packet).
45 pages | 1.22 MB | 1 year ago
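When the domain size N is unknown or too large for an exact frequency vector, a fixed-size synopsis is a common substitute. A minimal count-min sketch in Python (width/depth values are illustrative; estimates never undercount but may overcount):

```python
import hashlib

class CountMin:
    """Fixed-size synopsis for approximate per-item frequencies."""
    def __init__(self, width=64, depth=4):
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]

    def _bucket(self, row, item):
        h = hashlib.sha256(f"{row}:{item}".encode()).hexdigest()
        return int(h, 16) % self.width

    def update(self, item, count=1):
        for row in range(self.depth):
            self.table[row][self._bucket(row, item)] += count

    def estimate(self, item):
        # Minimum over rows bounds the overcount from hash collisions.
        return min(self.table[row][self._bucket(row, item)]
                   for row in range(self.depth))
```

This could track (source, destination) pair frequencies as in the entry, using O(width × depth) space regardless of how many distinct pairs appear.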
Monitoring Apache Flink Applications (Getting Started) - caolei
Possible alert conditions include recordsOutPerSecond = 0 for a non-sink operator. Note: because the metrics system currently only considers Flink's internal communication, the input record count of source operators is 0, and the output record count of sink operators is also 0. When consuming from a message queue, a straightforward way to monitor whether the application is healthy is to use connector-specific metrics to track how far the consumer group lags behind the head of the queue; Flink exposes basic metrics for most sources. Key metric: records-lag-max (user scope, applies to FlinkKafkaConsumer). Flink also emits latency markers periodically at all sources; for each sub-task, a latency distribution from each source to the operator is reported, and the granularity of these histograms can be further controlled.
23 pages | 148.62 KB | 1 year ago
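What records-lag-max measures can be sketched with a toy computation (the offset maps and function name are hypothetical; the real metric comes from the Kafka consumer, not user code):

```python
# Sketch: consumer lag per partition is the distance between the head
# of the partition and the consumer's committed position;
# records-lag-max is the maximum over all assigned partitions.
def max_lag(head_offsets, consumer_offsets):
    return max(head_offsets[p] - consumer_offsets[p] for p in head_offsets)
```

A steadily growing value is the alert condition: the application is falling behind the queue.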
Flow control and load shedding - CS 591 K1: Data Stream Processing and Analytics, Spring 2020 (Vasiliki Kalavri, Boston University)
Buffering stores excess data for later processing, once input rates stabilize; it requires a persistent input source and is suitable for transient load increases. Scaling resource allocation addresses the case of a sustained increase in load. Load shedding can drop tuples at any location in the query plan: dropping near the source avoids wasting work, but it might affect the results of multiple queries if the source is connected to multiple queries. Flow control instead durably buffers events in a channel or source and adjusts the processing rate of all operators to that of the slowest part of the pipeline.
43 pages | 2.42 MB | 1 year ago
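A common shedding policy (one option among several; the function and rates here are illustrative, not from the slides) drops each tuple with a probability chosen so the retained rate matches downstream capacity:

```python
import random

def shed(stream, input_rate, capacity, rng=random.Random(42)):
    """Probabilistic load shedding: keep each tuple with
    probability capacity / input_rate (capped at 1)."""
    keep_prob = min(1.0, capacity / input_rate)
    return [t for t in stream if rng.random() < keep_prob]
```

Random dropping degrades aggregate results gracefully; semantic shedding (dropping the least valuable tuples first) is the usual refinement.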
Elasticity and state migration: Part I - CS 591 K1: Data Stream Processing and Analytics, Spring 2020 (Vasiliki Kalavri, Boston University)
The goal is to identify the minimum parallelism πi per operator i such that the physical dataflow can sustain all source rates: a logical dataflow with sources S1, S2 and input rates λ1, λ2 is mapped to a physical dataflow in which each operator gets a parallelism (e.g. π=2, π=3). The required rates are recursively computed for all operators by traversing the dataflow from left to right once, starting from the true output rate of each source j (e.g. λ1o = 2000 r/s in the worked example with operators i=1..4).
93 pages | 2.42 MB | 1 year ago
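The sizing rule implied by this entry can be sketched as follows (a simplification assuming a single chain of operators and known per-instance processing rates; symbols λ and μ are from the snippet's notation, the function names are illustrative):

```python
import math

def min_parallelism(input_rate, per_instance_rate):
    """Smallest instance count whose aggregate capacity covers the
    operator's true input rate: ceil(lambda_i / mu_i)."""
    return math.ceil(input_rate / per_instance_rate)

def size_pipeline(source_rate, per_instance_rates, selectivities):
    """Traverse a chain left to right once: each operator's input rate
    is the previous operator's output rate (input * selectivity)."""
    plan, rate = [], source_rate
    for mu, sel in zip(per_instance_rates, selectivities):
        plan.append(min_parallelism(rate, mu))
        rate *= sel  # true output rate feeding the next operator
    return plan
```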
17 results in total













