Windows and triggers - CS 591 K1: Data Stream Processing and Analytics Spring 2020different things • last 5 sec • last 10 events • last 1h every 10 min • last user session Window operators 2 Vasiliki Kalavri | Boston University 2020 object MaxSensorReadings { def main(args: 0))) .keyBy(_.id) .timeWindow(Time.minutes(1)) .max("temp") } } 3 Example: Window sensor readings Vasiliki Kalavri | Boston University 2020 In the DataStream API, you can use the or IngestionTime Vasiliki Kalavri | Boston University 2020 Window operators can be applied on a keyed or a non-keyed stream: • Window operators on keyed windows are evaluated in parallel • Non-keyed0 码力 | 35 页 | 444.84 KB | 1 年前3
Notions of time and progress - CS 591 K1: Data Stream Processing and Analytics Spring 2020Processing time • the time of the local clock where an event is being processed • a processing-time window wouldn’t account for game activity while the train is in the tunnel • results depend on the processing and aren’t deterministic • Event time • the time when an event actually happened • an event-time window would give you the extra life • results are deterministic and independent of the processing event-times of non-late data Watermark propagation 12 Vasiliki Kalavri | Boston University 2020 13 Event-time update Vasiliki Kalavri | Boston University 2020 1. Watermarks must be monotonically increasing0 码力 | 22 页 | 2.22 MB | 1 年前3
Streaming in Apache Flinkkey MovingAverage average = averageState.value(); // create a new MovingAverage (with window size 2) if none exists for this key if (average == null) average = new MovingAverage(2); keyBy() .window(<window assigner>) .reduce|aggregate|process(<window function>) stream. .windowAll(<window assigner>) .reduce|aggregate|process(<window function>) ◦TumblingEventTimeWindows s.withGap(Time.minutes(30)) DataStream input = ... input .keyBy(“key”) .window(TumblingEventTimeWindows.of(Time.minutes(1))) .process(new MyWastefulMax()); public static class 0 码力 | 45 页 | 3.00 MB | 1 年前3
监控Apache Flink应用程序(入门)FlinkKafkaConsumer The maximum lag in terms of the number of records for any partition in this window. An increasing value over time is your best indication that the consumer group is not keeping up section). 3. Some operators in a streaming topology need to buffer events for some time (e.g. in a time window) for functional reasons. 4. Each computation in your Flink topology (framework or user code), growing state are very application-specific. Typically, an increasing number of keys, a large event-time skew between different input streams or simply missing state cleanup may cause growing state.0 码力 | 23 页 | 148.62 KB | 1 年前3
Scalable Stream Processing - Spark Streaming and FlinkDStream. 23 / 79 Window Operations (1/3) ▶ Spark provides a set of transformations that apply to a over a sliding window of data. ▶ A window is defined by two parameters: window length and slide interval interval. ▶ A tumbling window effect can be achieved by making slide interval = window length 24 / 79 Window Operations (2/3) ▶ window(windowLength, slideInterval) • Returns a new DStream which is computed based on windowed batches. ▶ countByWindow(windowLength, slideInterval) • Returns a sliding window count of elements in the stream. ▶ reduceByWindow(func, windowLength, slideInterval) • Returns0 码力 | 113 页 | 1.22 MB | 1 年前3
Flow control and load shedding - CS 591 K1: Data Stream Processing and Analytics Spring 2020drop? • Window-aware load shedding applies shedding to entire windows instead of individual tuples • When discarding tuples at the sources or another point in a query with multiple window aggregations aggregations, it is unclear how shedding will affect the correctness of downstream window operators. • This approach preserves window integrity and guarantees that the results under shedding will not be approximations shedding measures tuple utility • The method selects tuples to discard by relying on the notion of a window-based concept drift. • The metric is defined by computing a similarity metric across windows.0 码力 | 43 页 | 2.42 MB | 1 年前3
Streaming languages and operator semantics - CS 591 K1: Data Stream Processing and Analytics Spring 2020SQuAl Queries are represented in graphical representation using boxes and arrows Tumble Window Tumble Window Join(S1.A = S2.A) S1 S2 7 Vasiliki Kalavri | Boston University 2020 Composite subscription records arrive. • projection, selection, union 14 Vasiliki Kalavri | Boston University 2020 Window Operators • Probably the most important operators in stream processing systems • Almost universally the stream on which computations can be performed 15 Vasiliki Kalavri | Boston University 2020 Window types (I) • Time-based (logical) windows define their contents as a function of time. • average0 码力 | 53 页 | 532.37 KB | 1 年前3
Stream processing fundamentals - CS 591 K1: Data Stream Processing and Analytics Spring 2020replaces any existing tuple with the same t(A) value to form a new relation state. • as a sliding window with length k in which each subsequence of k tuples represents a relation state in the sequence data channels • operators can accumulate state, have multiple inputs, express event- time custom window-based logic • some systems, like Timely Dataflow support cyclic dataflows and iterations on streams continuously along edges Operators • receive one or more input streams • perform tuple-at-a-time, window, logic, pattern matching transformations • output one or more streams of possibly different0 码力 | 45 页 | 1.22 MB | 1 年前3
Skew mitigation - CS 591 K1: Data Stream Processing and Analytics Spring 2020g., if ε=0.2, w=5 (5 items per window) • wcur: the current window id • We keep a list D of element frequencies and their maximum associated error. • Once a window fills up, we remove infrequent Kalavri | Boston University 2020 Lossy counting algorithm D = {} // empty list wcur = 1 // first window id N = 0 // elements seen so far Insert step For each element x in wcur: if x ∈ D, increase0 码力 | 31 页 | 1.47 MB | 1 年前3
Elasticity and state migration: Part I - CS 591 K1: Data Stream Processing and Analytics Spring 2020100 rec 100 recs Observation Window W 0.5s ??? Vasiliki Kalavri | Boston University 2020 16 src o1 o2 10 recs 10 recs 1 2 3 4 100 rec 100 recs Observation Window W 0.5s Instrumentation Metrics Vasiliki Kalavri | Boston University 2020 The DS2 model • Collect metrics per configurable observation window W • activity durations per worker • records processed Rprc and records pushed to output Rpsd Vasiliki Kalavri | Boston University 2020 The DS2 model • Collect metrics per configurable observation window W • activity durations per worker • records processed Rprc and records pushed to output Rpsd0 码力 | 93 页 | 2.42 MB | 1 年前3
共 15 条
- 1
- 2













