Stream processing fundamentals - CS 591 K1: Data Stream Processing and Analytics, Spring 2020 (Vasiliki Kalavri | Boston University 2020; 45 pages, 1.22 MB)
… analytics … Building a stream processor … Basic Stream Models: A stream can be viewed as a massive, dynamic, one-dimensional … at any point in the stream. It is the most general model. It is hard to develop space-efficient and time-efficient algorithms. Relational Streaming Model: … previously emitted items. [Figure: a base stream and a derived stream, timestamps 12:00 to 12:02.] Which basic models do base and derived streams correspond to? … Results as …
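The excerpt names "Basic Stream Models" and calls one of them "the most general model" without listing them. Assuming these are the standard time-series, cash-register, and turnstile models from the streaming literature (an assumption; those names do not appear in the excerpt), a minimal Python sketch of how each model maps stream items onto the underlying one-dimensional vector A could look like this. Function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def apply_time_series(values: Iterable[float]) -> Dict[int, float]:
    """Time-series model: the i-th stream item is the value A[i] itself."""
    return {i: v for i, v in enumerate(values)}


def apply_cash_register(updates: Iterable[Tuple[int, float]]) -> Dict[int, float]:
    """Cash-register model: each item (j, c) increments A[j] by c, with c >= 0."""
    a: Dict[int, float] = defaultdict(float)
    for j, c in updates:
        if c < 0:
            raise ValueError("cash-register updates must be non-negative")
        a[j] += c
    return dict(a)


def apply_turnstile(updates: Iterable[Tuple[int, float]]) -> Dict[int, float]:
    """Turnstile model: each item (j, c) adds c to A[j]; c may be negative."""
    a: Dict[int, float] = defaultdict(float)
    for j, c in updates:
        a[j] += c
    return dict(a)


if __name__ == "__main__":
    print(apply_time_series([5, 7, 3]))                    # {0: 5, 1: 7, 2: 3}
    print(apply_cash_register([(1, 2), (1, 3), (4, 1)]))   # {1: 5.0, 4: 1.0}
    print(apply_turnstile([(1, 2), (1, -2), (4, 1)]))      # {1: 0.0, 4: 1.0}
```

In the turnstile sketch an entry can fall back to zero, so an algorithm for that model has to tolerate deletions as well as insertions, which is one way to see why efficient algorithms are harder to design for the most general model.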
Filtering and sampling streams - CS 591 K1: Data Stream Processing and Analytics, Spring 2020 (Vasiliki Kalavri | Boston University 2020; 74 pages, 1.06 MB)
… user queries … approximate results … A simple and efficient synopsis: Suppose that our data consists of a large numeric time series. What summary would let … this series? … observations … $\mathrm{var} = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$ …
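The excerpt stops before answering its question. One common answer, offered here as an assumption, is to keep only the count N, the sum of the x_i, and the sum of their squares: the variance formula expands to (sum of x_i^2)/N minus mu^2, so these three numbers suffice. A minimal Python sketch of that synopsis (class and method names are hypothetical):

```python
from dataclasses import dataclass


@dataclass
class MomentSynopsis:
    """Constant-size summary of a numeric stream: count, sum and sum of squares."""
    n: int = 0
    total: float = 0.0
    total_sq: float = 0.0

    def update(self, x: float) -> None:
        # Each arriving value touches only three counters: O(1) time and space.
        self.n += 1
        self.total += x
        self.total_sq += x * x

    def mean(self) -> float:
        return self.total / self.n

    def variance(self) -> float:
        # Population variance: (1/N) * sum (x_i - mu)^2 = (sum x_i^2)/N - mu^2
        mu = self.mean()
        return self.total_sq / self.n - mu * mu

    def merge(self, other: "MomentSynopsis") -> "MomentSynopsis":
        # Synopses of two sub-streams combine by adding their counters.
        return MomentSynopsis(self.n + other.n,
                              self.total + other.total,
                              self.total_sq + other.total_sq)


if __name__ == "__main__":
    s = MomentSynopsis()
    for x in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
        s.update(x)
    print(s.mean(), s.variance())  # 5.0 4.0
```

Because synopses of different partitions merge by simple addition, the summary also suits parallel ingestion. For long streams with large values, the subtraction in variance() can lose precision; Welford's online algorithm is a numerically safer alternative.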
Elasticity and state migration: Part I - CS 591 K1: Data Stream Processing and Analytics, Spring 2020 (Vasiliki Kalavri | Boston University 2020; 93 pages, 2.42 MB)
… • Queuing theory models: for latency objectives • Control theory models: e.g., PID controller • Rule-based models: e.g., if CPU utilization > 70% => scale out • Analytical dataflow-based models … Action … Predictive: at-once for all operators … Queuing theory models. Metrics: • service time and waiting time per tuple and per task • total time spent processing …
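The rule-based model quoted above (if CPU utilization > 70% => scale out) can be sketched as a small decision function. Only the 70% threshold comes from the excerpt; the scale-in rule, its 30% threshold, the one-instance step, and the parallelism bounds are hypothetical additions for illustration.

```python
from enum import Enum
from typing import Tuple


class Action(Enum):
    SCALE_OUT = "scale out"
    SCALE_IN = "scale in"
    NO_CHANGE = "no change"


# The 70% scale-out threshold is the rule quoted in the slides; the 30% scale-in
# threshold and the parallelism bounds are hypothetical values for illustration.
SCALE_OUT_CPU = 0.70
SCALE_IN_CPU = 0.30


def rule_based_decision(cpu_utilization: float, parallelism: int,
                        min_parallelism: int = 1,
                        max_parallelism: int = 32) -> Tuple[Action, int]:
    """Reactive, per-operator rule: compare one metric against fixed thresholds
    and change parallelism by a single instance at a time."""
    if cpu_utilization > SCALE_OUT_CPU and parallelism < max_parallelism:
        return Action.SCALE_OUT, parallelism + 1
    if cpu_utilization < SCALE_IN_CPU and parallelism > min_parallelism:
        return Action.SCALE_IN, parallelism - 1
    return Action.NO_CHANGE, parallelism


if __name__ == "__main__":
    for cpu in (0.85, 0.50, 0.10):
        print(cpu, rule_based_decision(cpu, parallelism=4))
```

Because a rule like this reacts to one operator and one metric at a time, it contrasts with the predictive policies the excerpt mentions, which decide at once for all operators.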
Streaming optimizations - CS 591 K1: Data Stream Processing and Analytics, Spring 2020 (Vasiliki Kalavri | Boston University 2020; 54 pages, 2.83 MB)
… Distributed execution in Flink … Identify the most efficient way to execute a query • There may exist several ways to execute a computation • query plans … plan B … output … Lowest-cost plan … What does efficient mean in the context of streaming? • queries run continuously • streams are unbounded • In …
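To illustrate choosing the lowest-cost plan among several equivalent ones, the Python sketch below assumes a simple rate-based cost model (an assumption, not necessarily the model used in the slides): each operator pays a hypothetical per-tuple cost for every tuple it receives, and its selectivity scales the rate seen by the next operator. Since streaming queries run continuously over unbounded input, cost is expressed per unit of time rather than per execution. Operator names and numbers are made up.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Operator:
    name: str
    cost_per_tuple: float  # hypothetical processing cost per input tuple
    selectivity: float     # fraction of input tuples the operator emits


def plan_cost(plan: List[Operator], input_rate: float) -> float:
    """Rate-based cost: each operator pays cost_per_tuple for every tuple it receives per second."""
    rate, cost = input_rate, 0.0
    for op in plan:
        cost += rate * op.cost_per_tuple
        rate *= op.selectivity  # output rate of this operator feeds the next one
    return cost


def lowest_cost_plan(plans: Dict[str, List[Operator]], input_rate: float) -> str:
    """Return the name of the plan with the smallest estimated cost."""
    return min(plans, key=lambda name: plan_cost(plans[name], input_rate))


if __name__ == "__main__":
    enrich = Operator("enrich", cost_per_tuple=5.0, selectivity=1.0)
    keep_few = Operator("filter", cost_per_tuple=1.0, selectivity=0.1)
    plans = {
        "plan A": [enrich, keep_few],  # enrich every tuple, then filter
        "plan B": [keep_few, enrich],  # filter first, enrich only the survivors
    }
    for name, plan in plans.items():
        print(name, plan_cost(plan, input_rate=1000.0))
    print("lowest cost:", lowest_cost_plan(plans, input_rate=1000.0))
```

With these numbers plan A costs 6000 units per second and plan B 1500, because the selective filter runs before the expensive operator. Such a reordering is only valid when the two operators commute, for example when the enrichment does not change the attributes the filter reads.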
High-availability, recovery semantics, and guarantees - CS 591 K1: Data Stream Processing and Analytics, Spring 2020 (Vasiliki Kalavri | Boston University 2020; 49 pages, 2.08 MB)
… maintains state: • rolling aggregations • window contents • input offsets • machine learning models … State in dataflow computations … Logic … State … Distributed streaming …
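The state listed above (rolling aggregations, input offsets) is what a recovery mechanism must restore. The Python sketch below is a simplified, single-operator illustration under stated assumptions, not the API of Flink or any other system: the input is a replayable log, and a snapshot stores the aggregates together with the offset of the next record to read. Restoring the snapshot and replaying from that offset applies every record to the state exactly once. All names are hypothetical.

```python
import copy
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Record = Tuple[str, int]  # (key, value)


@dataclass
class CountingOperator:
    """Keyed rolling sum whose state includes the offset of the next input record."""
    sums: Dict[str, int] = field(default_factory=dict)
    offset: int = 0  # position in the replayable input log

    def process(self, record: Record) -> None:
        key, value = record
        self.sums[key] = self.sums.get(key, 0) + value
        self.offset += 1

    def snapshot(self) -> "CountingOperator":
        # Aggregates and input offset are copied together so they stay consistent.
        return copy.deepcopy(self)


def run(log: List[Record], op: CountingOperator, stop_at: int) -> None:
    """Consume the log from the operator's stored offset up to (not including) stop_at."""
    while op.offset < stop_at:
        op.process(log[op.offset])


if __name__ == "__main__":
    log = [("a", 1), ("b", 2), ("a", 3), ("b", 4), ("a", 5)]
    op = CountingOperator()
    run(log, op, stop_at=2)
    checkpoint = op.snapshot()             # snapshot taken after 2 records
    run(log, op, stop_at=4)                # progress lost in the simulated failure
    recovered = checkpoint.snapshot()      # restore from the last snapshot
    run(log, recovered, stop_at=len(log))  # replay from the saved offset
    print(recovered.sums)                  # {'a': 9, 'b': 6}
```

This covers state only: outputs emitted between the snapshot and the failure would be emitted again during replay unless the sink is idempotent or transactional.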
Stream ingestion and pub/sub systems - CS 591 K1: Data Stream Processing and Analytics, Spring 2020 (33 pages, 700.14 KB)
… working set. If consumers are slow, throughput might degrade. • DBs support secondary indexes for efficient search, while MBs (message brokers) only offer topic-based subscription. • DB query results depend on a snapshot …
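The contrast above between databases and message brokers can be made concrete with a toy, in-memory, log-based broker. This is a hypothetical sketch (class and method names are invented, and it is not the API of Kafka or any other broker): each topic is an append-only list, consumers subscribe by topic name only, and each consumer group tracks its own read offset.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class ToyLogBroker:
    """In-memory, log-based broker: one append-only list per topic and
    one read offset per (consumer group, topic) pair."""

    def __init__(self) -> None:
        self.topics: Dict[str, List[str]] = defaultdict(list)
        self.offsets: Dict[Tuple[str, str], int] = defaultdict(int)

    def publish(self, topic: str, message: str) -> None:
        self.topics[topic].append(message)

    def poll(self, group: str, topic: str, max_records: int = 10) -> List[str]:
        # Subscription is by topic name only: no predicates, no secondary indexes.
        start = self.offsets[(group, topic)]
        batch = self.topics[topic][start:start + max_records]
        self.offsets[(group, topic)] = start + len(batch)
        return batch


if __name__ == "__main__":
    broker = ToyLogBroker()
    for m in ("c1", "c2", "c3"):
        broker.publish("clicks", m)
    print(broker.poll("analytics", "clicks", max_records=2))  # ['c1', 'c2']
    print(broker.poll("billing", "clicks"))                   # ['c1', 'c2', 'c3']
```

Unlike a database query over a secondary index, this broker cannot return only the messages matching a predicate. In this log-based design a slow group simply falls further behind the end of the log; brokers that instead buffer undelivered messages in memory are the ones more likely to suffer when consumers are slow.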
Flow control and load shedding - CS 591 K1: Data Stream Processing and Analytics, Spring 2020 (43 pages, 2.42 MB)
… queries over data streams. (VLDB’06) • N. Tatbul, U. Çetintemel, and S. Zdonik. Staying fit: Efficient load shedding techniques for distributed stream processing. (VLDB’07) • N. R. Katsipoulakis …
Cardinality and frequency estimation - CS 591 K1: Data Stream Processing and Analytics, Spring 2020 (Vasiliki Kalavri | Boston University 2020; 69 pages, 630.01 KB)
… high-frequency elements … Counting Bloom Filter … • A space-efficient probabilistic data structure that can be used to estimate frequencies and heavy hitters in data …
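A counting Bloom filter replaces each bit of a Bloom filter with a counter. Below is a minimal Python sketch of using one to estimate frequencies, assuming an insert-only stream and the common estimator that takes the minimum of an item's k counters. The sizes m and k and the salted SHA-256 hashing are illustrative choices, not details taken from the slides.

```python
import hashlib
from typing import List


class CountingBloomFilter:
    """Array of m counters shared by k hash functions.

    add() increments the k counters of an item; estimate() returns their minimum.
    With insertions only, collisions can only add to counters, so the estimate
    never falls below the item's true count."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m
        self.k = k
        self.counters: List[int] = [0] * m

    def _positions(self, item: str) -> List[int]:
        # Derive k positions by salting one hash function with the index i.
        return [
            int.from_bytes(hashlib.sha256(f"{i}:{item}".encode()).digest()[:8], "big") % self.m
            for i in range(self.k)
        ]

    def add(self, item: str, count: int = 1) -> None:
        for p in self._positions(item):
            self.counters[p] += count

    def estimate(self, item: str) -> int:
        return min(self.counters[p] for p in self._positions(item))


if __name__ == "__main__":
    cbf = CountingBloomFilter(m=1024, k=4)
    for word in ["a"] * 50 + ["b"] * 5 + ["c"]:
        cbf.add(word)
    print(cbf.estimate("a"), cbf.estimate("b"), cbf.estimate("c"))
```

A larger m lowers the chance of overestimation. Reporting heavy hitters additionally needs a set of candidate items to query, because the counters alone do not record which items were inserted.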
Scalable Stream Processing - Spark Streaming and Flink (113 pages, 1.22 MB)
… Definitive Guide”, O’Reilly Media, 2018 - Chapters 20-23. ▶ M. Zaharia et al., “Discretized Streams: An Efficient and Fault-Tolerant Model for Stream Processing on Large Clusters”, HotCloud’12. ▶ P. Carbone et …
State management - CS 591 K1: Data Stream Processing and Analytics, Spring 2020 (Vasiliki Kalavri | Boston University 2020; 24 pages, 914.13 KB)
… maintains state: • rolling aggregations • window contents • input offsets • machine learning models … State in dataflow computations … • No explicit state …