Streaming in Apache Flink
develop Flink programs • Implement streaming data processing pipelines • Flink managed state • Event time Streaming in Apache Flink • Streams are natural • Events of any type like sensors, click subset of stream processing Processing Data Dataflows Let's Talk About Time • Processing Time • Event Time • Events may arrive out of order! What Can Be Streamed? • Anything (if you write a serializer/deserializer none exists for this key if (average == null) average = new MovingAverage(2); // add this event to the moving average average.add(item.f1); averageState.update(average); // return0 码力 | 45 页 | 3.00 MB | 1 年前3Notions of time and progress - CS 591 K1: Data Stream Processing and Analytics Spring 2020
Vasiliki Kalavri | Boston University 2020 • Processing time • the time of the local clock where an event is being processed • a processing-time window wouldn’t account for game activity while the train results depend on the processing speed and aren’t deterministic • Event time • the time when an event actually happened • an event-time window would give you the extra life • results are deterministic Clones Episode III: Revenge of the Sith Episode VII: The Force Awakens This is called event time This is called processing time Vasiliki Kalavri | Boston University 2020 • What if you were0 码力 | 22 页 | 2.22 MB | 1 年前3Stream ingestion and pub/sub systems - CS 591 K1: Data Stream Processing and Analytics Spring 2020
• might fail (or seem as if they failed) Streaming sources… 3 Producers and consumers • An event is typically generated by a producer (or publisher or sender) and processed by one or multiple consumers consumer • Event retrieval is not defined by content / structure but its order • FIFO, priority producer consumer queue 6 Message brokers Message broker: a system that connects event producers with with event consumers. • It receives messages from the producers and pushes them to the consumers. • A TCP connection is a simple messaging system which connects one sender with one recipient.0 码力 | 33 页 | 700.14 KB | 1 年前3Windows and triggers - CS 591 K1: Data Stream Processing and Analytics Spring 2020
execution environment val env = StreamExecutionEnvironment.getExecutionEnvironment // use event time for the application env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime) // window assigners for the most common windowing use cases: • They assign an element based on its event-time timestamp or the current processing time to windows. • Time windows have a start and an assigners provide a default trigger that triggers the evaluation of a window once the (processing or event) time passes the end of the window. • A window is created when the first element is assigned to0 码力 | 35 页 | 444.84 KB | 1 年前3Streaming languages and operator semantics - CS 591 K1: Data Stream Processing and Analytics Spring 2020
bound advances for every new event • all events since 1/1/2019 • Sliding windows have fixed size but both their bounds advance for new events • last 10 events or event in the last minute • Tumble • Group by / Partition Operators split a stream into sub-streams according to a function or the event contents. • one stream per customer Id • round-robin assignment 19 Vasiliki Kalavri | Boston order, ask for a refund immediately, and then cancel the order webevents(CustomerID, ItemID, Event, Amount, Time) 0 3 2 1 order refund cancel 42 Vasiliki Kalavri | Boston University 20200 码力 | 53 页 | 532.37 KB | 1 年前3监控Apache Flink应用程序(入门)
this operator has emitted caolei – 监控Apache Flink应用程序(入门) 进度和吞吐量监控 – 13 4.7 仪表盘示例 Figure 4: Event Time Lag per Subtask of a single operator in the topology. In this case, the watermark is lagging speaking, latency is the delay between the creation of an event and the time at which results based on this event become visible. Once the event is created it is usually stored in a persistent message queue for growing state are very application-specific. Typically, an increasing number of keys, a large event-time skew between different input streams or simply missing state cleanup may cause growing state0 码力 | 23 页 | 148.62 KB | 1 年前3Stream processing fundamentals - CS 591 K1: Data Stream Processing and Analytics Spring 2020
streams update relation tables and derived streams update materialized views. • An operator outputs event streams that describe the changing view computed over the input stream according to the relational streams update relation tables and derived streams update materialized views. • An operator outputs event streams that describe the changing view computed over the input stream according to the relational streams update relation tables and derived streams update materialized views. • An operator outputs event streams that describe the changing view computed over the input stream according to the relational0 码力 | 45 页 | 1.22 MB | 1 年前3High-availability, recovery semantics, and guarantees - CS 591 K1: Data Stream Processing and Analytics Spring 2020
1. receive an event 2. store in local buffer and possibly update state 3. produce output 5 mi mo Vasiliki Kalavri | Boston University 2020 What is a failure? op 1. receive an event 2. store in output 5 mi mo Vasiliki Kalavri | Boston University 2020 What is a failure? op 1. receive an event 2. store in local buffer and possibly update state 3. produce output 5 mi mo Was mi fully processed delivered downstream? Vasiliki Kalavri | Boston University 2020 What is a failure? op 1. receive an event 2. store in local buffer and possibly update state 3. produce output What can go wrong: • lost0 码力 | 49 页 | 2.08 MB | 1 年前3Scalable Stream Processing - Spark Streaming and Flink
has been consumed or not. ▶ No built-in timeouts • Think what would happen in our example, if the event signaling the end of the user session was lost, or had not arrived for some reason. 48 / 79 mapWithState returns another streaming DF 63 / 79 Window Operation ▶ Aggregations over a sliding event-time window. • Event-time is the time embedded in the data, not the time Spark receives them. ▶ Use groupBy() streaming uses watermarks to measure progress in event time. ▶ Watermarks flow as part of the data stream and carry a timestamp t. ▶ A W(t) declares that event time has reached time t in that stream • There0 码力 | 113 页 | 1.22 MB | 1 年前3Apache Flink的过去、现在和未来
17亿/秒 Flink 的过去 offline Real-time Batch Processing Continuous Processing & Streaming Analytics Event-driven Applications ✔ 现在 Flink 1.9 的架构变化 Runtime Distributed Streaming Dataflow Query Processor 中文社区 Flink 的现在 offline Real-time Batch Processing Continuous Processing & Streaming Analytics Event-driven Applications ✔ ✔ 未来 Micro Services O_0 O_1 I_0 I_1 I_2 P_0 P_1 P_2 S_0 S_1 Order Async Call Auto Scale State Management Event Driven Flink 的未来 offline Real-time Batch Processing Continuous Processing & Streaming Analytics Event-driven Applications ✔ ✔ ✔ 扫码加入社群 与志同道合的码友一起0 码力 | 33 页 | 3.36 MB | 1 年前3
共 18 条
- 1
- 2