Stream ingestion and pub/sub systems - CS 591 K1: Data Stream Processing and Analytics Spring 2020
subscription language. • Filters define constraints in the form of name-value pairs and basic comparison operators. • Constraints can be logically combined to form complex event patterns. • company
0 码力 | 33 pages | 700.14 KB | 1 year ago
Streaming optimizations - CS 591 K1: Data Stream Processing and Analytics Spring 2020
performance depend on? • input data, intermediate data • operator properties • How can we estimate the cost of different strategies? • before execution or during runtime Query optimization (I) Vasiliki Kalavri | Boston University 2020 Cost-based optimization: parsed program representation, optimizer, statistics, input, plan A, plan B, output, lowest-cost plan. parallelism pays off Safety Profitability • Cost of Merge = 0.5 • Cost of A = 0.5 • Splitting A allows a pre-aggregation similar to what combiners do in
0 码力 | 54 pages | 2.83 MB | 1 year ago
Fault-tolerance demo & reconfiguration - CS 591 K1: Data Stream Processing and Analytics Spring 2020
performance • space required to implement routing • lookup cost • Migration performance • re-assignment computation cost • state movement cost State redistribution objectives Evenly distributes keys across parallel tasks • Fast to compute, no routing state • High migration cost • When a new node is added, state is shuffled across existing and new nodes • Random I/O and high
0 码力 | 41 pages | 4.09 MB | 1 year ago
Flow control and load shedding - CS 591 K1: Data Stream Processing and Analytics Spring 2020
and input rates and periodically estimates operator selectivities. • The load shedder assigns a cost, c_i, in cycles per tuple, and a selectivity, s_i, to each operator i. • The statistics manager prior to regular query execution. Estimating cost and selectivity • Selectivity: how many records does the operator produce per record in its input? • map: 1 in, 1 out • filter: 1 in, 1 or 0 out • flatMap, join: 1 in, 0, 1, or more out • Cost: how many records can an operator process in a unit of time? #records_in #records_out
0 码力 | 43 pages | 2.42 MB | 1 year ago
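The selectivity and cost definitions quoted in the last result can be illustrated with a few counters. This is a minimal sketch, not code from the slides: `OperatorStats` and `run_operator` are hypothetical names, each operator is modeled as a function returning a list of outputs, and cost is approximated as records processed per second of measured time, following the snippet's "records per unit of time" wording.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable, Iterable, List


@dataclass
class OperatorStats:
    """Running counters for one operator (hypothetical helper)."""
    records_in: int = 0
    records_out: int = 0
    busy_seconds: float = 0.0

    @property
    def selectivity(self) -> float:
        # Records produced per input record: #records_out / #records_in.
        return self.records_out / self.records_in if self.records_in else 0.0

    @property
    def records_per_second(self) -> float:
        # Records the operator processed per second of measured time.
        return self.records_in / self.busy_seconds if self.busy_seconds else 0.0


def run_operator(op: Callable[[Any], List[Any]],
                 records: Iterable[Any],
                 stats: OperatorStats) -> List[Any]:
    """Apply op to every record, updating the counters as a side effect."""
    out: List[Any] = []
    for record in records:
        start = time.perf_counter()
        produced = op(record)
        stats.busy_seconds += time.perf_counter() - start
        stats.records_in += 1
        stats.records_out += len(produced)
        out.extend(produced)
    return out


map_stats, filter_stats, flat_stats = OperatorStats(), OperatorStats(), OperatorStats()
run_operator(lambda x: [x * 2], range(10), map_stats)                        # map: 1 in, 1 out
run_operator(lambda x: [x] if x % 2 == 0 else [], range(10), filter_stats)  # filter: 1 in, 1 or 0 out
run_operator(lambda s: s.split(), ["a b", "c d e", ""], flat_stats)         # flatMap: 1 in, 0 or more out

print(map_stats.selectivity, filter_stats.selectivity, flat_stats.selectivity)
```

The map operator always has selectivity 1, the filter's selectivity is the fraction of records that pass, and the flatMap's can exceed 1. The timing figures depend on the machine, so only the counters are meaningful in a toy run like this.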
4 results in total