State management - CS 591 K1: Data Stream Processing and Analytics, Spring 2020
[...] operations and types [...] Suppose you are designing a state interface. What operations should state support? What state types can you think of? • Count, sum, list, map, … Vasiliki Kalavri | Boston University [...] output and state update atomic [...] • Working with State: https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/stream/state/state.html • Managing State [...]
24 pages | 914.13 KB | 1 year ago
Monitoring Apache Flink Applications (Introduction)
caolei [...] [1] https://ci.apache.org/projects/flink/flink-docs-release-1.7/monitoring/metrics.html#registering-metrics [2] https://ci.apache.org/projects/flink/flink-docs-release-1 [...] numberOfFailedCheckpoints > threshold [...] Progress and throughput monitoring [...] [3] https://ci.apache.org/projects/flink/flink-docs-release-1.7/dev/stream/operators/#task-chaining-and-resource-groups [...] events are persisted in the message queue. [...] [4] https://ci.apache.org/projects/flink/flink-docs-release-1.7/monitoring/metrics.html#latency-tracking [...] 2. During [...]
23 pages | 148.62 KB | 1 year ago
Cardinality and frequency estimation - CS 591 K1: Data Stream Processing and Analytics, Spring 2020
[...] range {1, 2, …, m} [...] Vasiliki Kalavri | Boston University 2020 [...] Adding an element to the sketch (stream elements x): for j = 1 to p do: i = h_j(x); c_{i,j}++. All counters are initialized to 0. [...] average of all counters, but the minimum. let f: array of length p; for j = 1 to p do: i = h_j(x); f[j] = c_{i,j}; return min(f[1], f[2], …, f[p]) [...] Computing top-k [...]
69 pages | 630.01 KB | 1 year ago
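The update and query loops in the entry above appear to describe a Count-Min sketch. The following is a minimal Python sketch of that idea, not code from the slides: the class name, the choice of salted BLAKE2b as the hash family h_j, and the sizes p=4 and m=64 are all assumptions made for illustration.

```python
import hashlib


class CountMinSketch:
    """Illustrative Count-Min sketch: p hash functions, m counters per function."""

    def __init__(self, p: int, m: int) -> None:
        self.p = p
        self.m = m
        # All counters start at 0; counters[j][i] plays the role of c_{i,j}.
        self.counters = [[0] * m for _ in range(p)]

    def _h(self, j: int, x: str) -> int:
        # Assumed hash family: BLAKE2b salted with the row index j,
        # reduced to a counter index in the range 0..m-1.
        digest = hashlib.blake2b(
            x.encode("utf-8"), digest_size=8, salt=j.to_bytes(8, "little")
        ).digest()
        return int.from_bytes(digest, "little") % self.m

    def add(self, x: str) -> None:
        # For each hash function j, increment the counter that x maps to.
        for j in range(self.p):
            self.counters[j][self._h(j, x)] += 1

    def estimate(self, x: str) -> int:
        # Return the minimum, not the average, of the p counters for x.
        return min(self.counters[j][self._h(j, x)] for j in range(self.p))


sketch = CountMinSketch(p=4, m=64)
for item in ["a", "b", "a", "c", "a"]:
    sketch.add(item)
```

Because collisions can only inflate a counter, `sketch.estimate("a")` is never below the true count of 3; taking the minimum across rows limits how far above it the estimate can drift.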
Flow control and load shedding - CS 591 K1: Data Stream Processing and Analytics, Spring 2020
[...] input rates and periodically estimates operator selectivities. • The load shedder assigns a cost, c_i, in cycles per tuple, and a selectivity, s_i, to each operator i. • The statistics manager collects [...]
43 pages | 2.42 MB | 1 year ago
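How Flink Analyzes CDC Data in an Iceberg Data Lake in Real Time
[...] Future plans (#4) [...] Common CDC analysis approaches (#1) [...] Offline HBase cluster for analyzing CDC data: 1. CDC records are written to HBase in real time, with high throughput and low latency. 2. Small [...] queries have low latency. 3. The cluster is scalable. [...] Drawbacks: 1. Row-store indexes are not suited to analytical workloads. 2. HBase cluster maintenance costs are relatively high. 3. HFiles are located through the RegionServer, [...] cannot be used at all. 4. The data format is tied to HFile and cannot easily be extended to [...]
36 pages | 781.69 KB | 1 year ago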
Stream processing fundamentals - CS 591 K1: Data Stream Processing and Analytics, Spring 2020
[...] Vasiliki Kalavri | Boston University 2020 [...] 1. Process events online without storing them 2. Support a high-level language (e.g. StreamSQL) 3. Handle missing, out-of-order, delayed data 4. Guarantee [...] Combine batch (historical) and stream processing 6. Ensure availability despite failures 7. Support distribution and automatic elasticity 8. Offer low-latency [...] 2005 [...] Dataflow Systems: distributed execution, partitioned state, exact results, out-of-order support [...] Single-node execution, synopses and sketches, approximate results, in-order data processing [...] Stream [...]
45 pages | 1.22 MB | 1 year ago
Stream ingestion and pub/sub systems - CS 591 K1: Data Stream Processing and Analytics, Spring 2020
[...] • MBs assume a small working set. If consumers are slow, throughput might degrade. • DBs support secondary indexes for efficient search while MBs only offer topic-based subscription. • DB query [...]
33 pages | 700.14 KB | 1 year ago
Windows and triggers - CS 591 K1: Data Stream Processing and Analytics, Spring 2020
[...] iterate over the list of all collected elements when evaluated: • They require more space but support more complex logic. • ProcessWindowFunction [...] Window functions [...] Vasiliki Kalavri | Boston University [...]
35 pages | 444.84 KB | 1 year ago
Streaming languages and operator semantics - CS 591 K1: Data Stream Processing and Analytics, Spring 2020
[...] value of B [...] Vasiliki Kalavri | Boston University 2020 [...] What kind of queries can we express and support on data streams? [...] Non-blocking (monotonic) queries [...]
53 pages | 532.37 KB | 1 year ago
Scalable Stream Processing - Spark Streaming and Flink
[...] ▶ Now instead, computation is kicked off explicitly by a call to the start() method. ▶ DStreams support many of the transformations available on normal Spark RDDs. [...] Transformations (2/4) ▶ map [...]
113 pages | 1.22 MB | 1 year ago













