Streaming optimizations - CS 591 K1: Data Stream Processing and Analytics Spring 2020applying B and then A. • holds if both operators are stateless Re-ordering split and merge split merge merge split merge split When might this be beneficial? ??? Vasiliki Kalavri | Boston University 2020 24 • Cost of Merge = 0.5 • Cost of A = 0.5 • Splitting A allows a pre-aggregation similar to what combiners do in MapReduce Operator separation merge X merge A A X merge A1 merge A2 A2 A1 X data because one channel is full and merge cannot receive data because another channel is empty Operator fission Data parallelism, replication A A A split merge ??? Vasiliki Kalavri | Boston University0 码力 | 54 页 | 2.83 MB | 1 年前3
Windows and triggers - CS 591 K1: Data Stream Processing and Analytics Spring 2020accumulator and return it. OUT getResult(ACC accumulator); // merge two accumulators and return the result. ACC merge(ACC a, ACC b); } 16 AggregateFunction interface Vasiliki Kalavri | override def getResult(acc: (String, Double, Int)) = { (acc._1, acc._2 / acc._3) } override def merge(acc1: (String, Double, Int), acc2: (String, Double, Int)) = { (acc1._1, acc1._2 + acc2._2 type Accumulator type Output type Initialization Accumulate one element Compute the result Merge two partial accumulators Vasiliki Kalavri | Boston University 2020 Use the ProcessWindowFunction0 码力 | 35 页 | 444.84 KB | 1 年前3
Graph streaming algorithms - CS 591 K1: Data Stream Processing and Analytics Spring 2020the 1st time, create a component with ID the min of the vertex IDs • if in different components, merge them and update the component ID to the min of the component IDs • if only one of the endpoints the edge stream, e.g. by source Id 2. maintain a disjoint set in each partition 3. periodically merge the partial disjoint sets into a global one ??? Vasiliki Kalavri | Boston University 2020 Connected do that for every incoming edge? Can we compute the distances in separate partitions and then merge them? Data-parallel streaming spanners on Flink? ??? Vasiliki Kalavri | Boston University 20200 码力 | 72 页 | 7.77 MB | 1 年前3
State management - CS 591 K1: Data Stream Processing and Analytics Spring 2020disadvantages of each approach? Vasiliki Kalavri | Boston University 2020 • Copy, checkpoint, restore, merge, split, query, subscribe, … State operations and types 4 Consider you are designing a state interface Iterator/RangeScan: seek to a specified key and then scan one key at a time from that point (keys are sorted) • Merge: a lazy read-modify-write RocksDB 11 Vasiliki Kalavri | Boston University 2020 In conf/flink.conf0 码力 | 24 页 | 914.13 KB | 1 年前3
Streaming languages and operator semantics - CS 591 K1: Data Stream Processing and Analytics Spring 2020active 17 Vasiliki Kalavri | Boston University 2020 Flow Management Operators (I) • Join operators merge two streams by matching elements satisfying a condition • commonly applied on windows • Union0 码力 | 53 页 | 532.37 KB | 1 年前3
Flink如何实时分析Iceberg数据湖的CDC数据。 4、不支持增量SF。 h点 直接D入CDC到Hi2+分析 、流程能E作 2、Hi2+存量数据不受增量数据H响。 方案评估 优点 、数据不是CR写入; 2、每次数据D致都要 MERGE 存量数据 。T+ 方GT新3R效性差。 3、不M持CR1ps+rt。 缺点 SCaDk + )=AFa IL()(数据 MER,E .NTO GE=DE US.N, chan>=E ON0 码力 | 36 页 | 781.69 KB | 1 年前3
共 6 条
- 1













