Skew mitigation - CS 591 K1: Data Stream Processing and Analytics Spring 2020
perfectly balanced among workers • No routing table required • Key semantics are not preserved: values of the same key might be routed to different workers • Workers are responsible for roughly • Consider the problem of throwing n balls to n bins sequentially (balls -> records, bins -> workers) • Bins are selected uniformly at random • At the end of the process, the maximum load is Θ(ln • Choose one among n workers • check the load of each worker and send the item to the least loaded one • load checking for every item can be expensive • Choose two workers at random and send the
0 points | 31 pages | 1.47 MB | 1 year ago

Fault-tolerance demo & reconfiguration - CS 591 K1: Data Stream Processing and Analytics Spring 2020
changes: external workload and system performance • Identify bottleneck operators, straggler workers, skew • Enumerate scaling actions, predict their effects, and decide which and when to apply
0 points | 41 pages | 4.09 MB | 1 year ago

Stream ingestion and pub/sub systems - CS 591 K1: Data Stream Processing and Analytics Spring 2020
Balancing workloads in network clusters • tasks can be efficiently distributed among multiple workers, such as Google Compute Engine instances. • Distributing event notifications • a service that
0 points | 33 pages | 700.14 KB | 1 year ago

Introduction to Apache Flink and Apache Kafka - CS 591 K1: Data Stream Processing and Analytics Spring 2020
size for the JobManager (coordinator). taskmanager.heap.size: JVM heap size for the TaskManagers (workers). parallelism.default: Default parallelism for jobs. You can override this option by using env
0 points | 26 pages | 3.33 MB | 1 year ago

Stream processing fundamentals - CS 591 K1: Data Stream Processing and Analytics Spring 2020
support cyclic dataflows and iterations on streams • Operators are data-parallel • distributed workers (threads) execute one parallel instance of one or more operators on disjoint data partitions
0 points | 45 pages | 1.22 MB | 1 year ago

Elasticity and state migration: Part I - CS 591 K1: Data Stream Processing and Analytics Spring 2020
changes: external workload and system performance • Identify bottleneck operators, straggler workers, skew • Enumerate scaling actions, predict their effects, and decide which and when to apply
0 points | 93 pages | 2.42 MB | 1 year ago
6 results in total