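Apache Flink: Past, Present, and Future
    Table API enhancements; unified Catalog API; Blink Planner; What's new in Blink Planner; binary data structures; richer built-in functions; minibatch aggregate functions; multiple techniques for mitigating hot spots; dimension-table join support; TopN; efficient streaming deduplication; complete batch processing support; batch error recovery (1); batch error recovery (2); batch error recovery (3); batch error recovery (4)
    0 credits | 33 pages | 3.36 MB | 1 year ago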
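Monitoring Apache Flink Applications (Introduction)
    Flink provides a comprehensive set of built-in metrics: • JVM heap/non-heap/direct memory usage (task granularity) • number of job restarts (job granularity) • amount of data processed per second (operator granularity) • ... As a user, you can and should add application-specific metrics to your functions. These include counters for invalid records or for records temporarily buffered in managed state. Besides counters, Flink also provides other metric types, such as gauges and ... operations with timestamps less than t will be triggered. For example, when the watermark passes 30, the event-time window ending at t = 30 is closed and evaluated. You should therefore monitor watermarks on the event-time-sensitive operators in your application (such as process functions and windows). If the difference between the current processing time and the watermark, known as event-time skew, is very high, it usually points to one of two situations. First, it may mean that you are simply ...
    0 credits | 23 pages | 148.62 KB | 1 year ago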
Elasticity and state migration: Part I - CS 591 K1: Data Stream Processing and Analytics Spring 2020
    State is accumulated over time. [Figure: three panels (rate decrease, throughput degradation, rate increase) plotting events/s over time; legend: input rate, throughput] Vasiliki Kalavri | Boston University 2020 ... Queuing theory networks • Action • predictive, at-once for all operators. Too fine-grained, impractical for high-rate streams. Sampling degrades accuracy. Simplified models make strong assumptions. Unsuitable for ...
    0 credits | 93 pages | 2.42 MB | 1 year ago
Flow control and load shedding - CS 591 K1: Data Stream Processing and Analytics Spring 2020
    Keeping up with the producers • Producers can generate events at a higher rate than the rate at which consumers can process them. • What happens if consumers cannot keep up with the event rate? • drop messages ... Vasiliki Kalavri | Boston University 2020
    0 credits | 43 pages | 2.42 MB | 1 year ago
Fault-tolerance demo & reconfiguration - CS 591 K1: Data Stream Processing and Analytics Spring 2020
    ... between two restart attempts. • The failure-rate strategy restarts an application as long as a configurable failure rate is not exceeded. The failure rate is specified as the maximum number of failures ... State is accumulated over time. [Figure: three panels (rate decrease, throughput degradation, rate increase) plotting events/s over time; legend: input rate, throughput] Why is it necessary? Vasiliki Kalavri
    0 credits | 41 pages | 4.09 MB | 1 year ago
Filtering and sampling streams - CS 591 K1: Data Stream Processing and Analytics Spring 2020
    Parameter tuning example: Assume we expect around ... fixed memory budget of 512 MB • How many hash functions to use? • What would be the false positive rate? k ≈ 3, Pfp ≈ 0.14. What if we had 1 GB of memory instead? Vasiliki Kalavri | Boston University 2020
    0 credits | 74 pages | 1.06 MB | 1 year ago
Stream processing fundamentals - CS 591 K1: Data Stream Processing and Analytics Spring 2020
    ... produced by external sources, i.e. the DSMS has no control over their arrival order or the data rate. • They have unknown, possibly unbounded length, i.e. the DSMS does not know when the stream ends ... groups of rows. Data Stream Management System • continuous queries • sequential data access, high-rate append-only updates. Data Warehouse • complex, offline analysis • large and relatively static ... Query processing challenges • Memory requirements: we cannot store the whole stream history. • Data rate: we cannot afford to continuously update indexes and materialized views for high rates. • Incremental ...
    0 credits | 45 pages | 1.22 MB | 1 year ago
Streaming optimizations - CS 591 K1: Data Stream Processing and Analytics Spring 2020
    ... al. A holistic view of stream partitioning costs. VLDB 2017. • Rate-based optimization • Stratis Viglas and Jeffrey Naughton. Rate-based Query Optimization for Streaming Information Sources. SIGMOD ...
    0 credits | 54 pages | 2.83 MB | 1 year ago
Stream ingestion and pub/sub systems - CS 591 K1: Data Stream Processing and Analytics Spring 2020
    ... implemented by multiple physical tasks running in parallel • If a producer generates events at a high rate, we can balance the load by spawning several consumer processes • The broker can choose to send ...
    0 credits | 33 pages | 700.14 KB | 1 year ago
Course introduction - CS 591 K1: Data Stream Processing and Analytics Spring 2020
    [Figure: the median of the stream [1, 4, 5, 23, 8, 0, 7] is 5] ‣ We cannot store the entire stream ‣ No control over arrival rate or order ... Continuously arriving, possibly unbounded data ... Complete data ...
    0 credits | 34 pages | 2.53 MB | 1 year ago
11 results in total