Fault-tolerance demo & reconfiguration - CS 591 K1: Data Stream Processing and Analytics Spring 2020
• minimize performance disruption, e.g. latency spikes • avoid introducing load imbalance • Resource management • utilization, isolation • Automation • continuous monitoring • bottleneck detection routed to the same parallel instance • Some kind of hashing is typically used • Maintaining routing tables or an index for all key mappings is usually impractical • Skewed load is challenging to channel of each parallel task • Partitioning function performance • space required to implement routing • lookup cost • Migration performance • re-assignment computation cost • state movement cost0 码力 | 41 页 | 4.09 MB | 1 年前3Skew mitigation - CS 591 K1: Data Stream Processing and Analytics Spring 2020
partitioning 2 w2 w1 w3 round-robin hash-based • Items are perfectly balanced among workers • No routing table required • Key semantics are not preserved: values of the same key might be routed to to different workers • Workers are responsible for roughly the same amount of keys • No routing table is required • Key semantics preserved: values of the same key are always processed by the same Θ(ln n/ln ln n), with high probability ??? Vasiliki Kalavri | Boston University 2020 Dynamic resource allocation • Choose one among n workers • check the load of each worker and send the item to0 码力 | 31 页 | 1.47 MB | 1 年前3Streaming optimizations - CS 591 K1: Data Stream Processing and Analytics Spring 2020
hashing • indexing, pre-fetching • minimize disk access • scheduling Objectives • optimize resource utilization or minimize resources • decrease latency, increase throughput • minimize monetary Kalavri | Boston University 2020 28 Safety • Ensure resource kinds: all resources required by a fused operator should remain available. • Ensure resource amounts: the total amount of resources required by 2020 33 • if operator is costly enough to bring benefit when parallelized • split incurs a routing overhead • merge might incur overhead if ordering is required • p/s/o: parallel/sequential/overhead0 码力 | 54 页 | 2.83 MB | 1 年前3Cardinality and frequency estimation - CS 591 K1: Data Stream Processing and Analytics Spring 2020
log2(230) = 30 bits. • We need 1024 counters, so m = 210 and we need p = log2m = 10 bits for routing. Space requirements ??? Vasiliki Kalavri | Boston University 2020 16 As we read the stream, it log2(230) = 30 bits. • We need 1024 counters, so m = 210 and we need p = log2m = 10 bits for routing. • Each counter needs to be able to count up to 20 0s, so we need to allocate log220 = 4.32 bits log2(230) = 30 bits. • We need 1024 counters, so m = 210 and we need p = log2m = 10 bits for routing. • Each counter needs to be able to count up to 20 0s, so we need to allocate log220 = 4.32 bits0 码力 | 69 页 | 630.01 KB | 1 年前3监控Apache Flink应用程序(入门)
https://ci.apache.org/projects/flink/flink-docs-release-1.7/dev/stream/operators/#task-chaining-and-resource-groups 4 进度和吞吐量监控 知道您的应用程序正在运行并且检查点正常工作是件好事,但是它并不能告诉您应用程序是否正在实际取得进 展并与上游系统保持同步。 4.1 吞吐量 Fli overall memory consumption of the Job- and TaskManager containers to ensure they don’t exceed their resource limits. This is particularly important, when using the RocksDB statebackend, since RocksDB allocates Flink processes alone. System resource monitoring is disabled by default and requires additional dependencies on the classpath. Please check out the Flink system resource metrics documentation9 for additional0 码力 | 23 页 | 148.62 KB | 1 年前3Apache Flink的过去、现在和未来
Time Window 2015 年阿里巴巴开始使用 Flink 并持续贡献社区 重构分布式架构 Client Dispatcher Job Manager Task Manager Resource Manager Cluster Manager Task Manager 1. Submit job 2. Start job 3. Request slots 4. Allocate0 码力 | 33 页 | 3.36 MB | 1 年前3Flow control and load shedding - CS 591 K1: Data Stream Processing and Analytics Spring 2020
stabilize. • Requires a persistent input source. • Suitable for transient load increase. Scale resource allocation: • Addresses the case of increased load and additionally ensures no resources are0 码力 | 43 页 | 2.42 MB | 1 年前3Scalable Stream Processing - Spark Streaming and Flink
Output Operations (3/4) ▶ What’s wrong with this code? ▶ Creating a connection object has time and resource overheads. ▶ Creating and destroying a connection object for each record can incur unnecessarily0 码力 | 113 页 | 1.22 MB | 1 年前3PyFlink 1.15 Documentation
py See Submitting PyFlink jobs for more details. 1.1.1.4 YARN Apache Hadoop YARN is a cluster resource management framework for managing the resources and scheduling jobs in a Hadoop cluster. It’s supported0 码力 | 36 页 | 266.77 KB | 1 年前3PyFlink 1.16 Documentation
py See Submitting PyFlink jobs for more details. 1.1.1.4 YARN Apache Hadoop YARN is a cluster resource management framework for managing the resources and scheduling jobs in a Hadoop cluster. It’s supported0 码力 | 36 页 | 266.80 KB | 1 年前3
共 10 条
- 1