Fault-tolerance demo & reconfiguration - CS 591 K1: Data Stream Processing and Analytics Spring 2020
routed to the same parallel instance • Some kind of hashing is typically used • Maintaining routing tables or an index for all key mappings is usually impractical • Skewed load is challenging to channel of each parallel task • Partitioning function performance • space required to implement routing • lookup cost • Migration performance • re-assignment computation cost • state movement cost Boston University 2020 • Evenly distributes keys across parallel tasks • Fast to compute, no routing state • High migration cost • When a new node is added, state is shuffled across existing and0 码力 | 41 页 | 4.09 MB | 1 年前3Skew mitigation - CS 591 K1: Data Stream Processing and Analytics Spring 2020
partitioning 2 w2 w1 w3 round-robin hash-based • Items are perfectly balanced among workers • No routing table required • Key semantics are not preserved: values of the same key might be routed to to different workers • Workers are responsible for roughly the same amount of keys • No routing table is required • Key semantics preserved: values of the same key are always processed by the same both choices: the partitioner sends the item to the worker with the currently lowest load • no routing history required • state needs to be merged to produce the final result: the computation must consist0 码力 | 31 页 | 1.47 MB | 1 年前3Cardinality and frequency estimation - CS 591 K1: Data Stream Processing and Analytics Spring 2020
log2(230) = 30 bits. • We need 1024 counters, so m = 210 and we need p = log2m = 10 bits for routing. Space requirements ??? Vasiliki Kalavri | Boston University 2020 16 As we read the stream, it log2(230) = 30 bits. • We need 1024 counters, so m = 210 and we need p = log2m = 10 bits for routing. • Each counter needs to be able to count up to 20 0s, so we need to allocate log220 = 4.32 bits log2(230) = 30 bits. • We need 1024 counters, so m = 210 and we need p = log2m = 10 bits for routing. • Each counter needs to be able to count up to 20 0s, so we need to allocate log220 = 4.32 bits0 码力 | 69 页 | 630.01 KB | 1 年前3Streaming optimizations - CS 591 K1: Data Stream Processing and Analytics Spring 2020
2020 33 • if operator is costly enough to bring benefit when parallelized • split incurs a routing overhead • merge might incur overhead if ordering is required • p/s/o: parallel/sequential/overhead0 码力 | 54 页 | 2.83 MB | 1 年前3
共 4 条
- 1