Flow control and load shedding - CS 591 K1: Data Stream Processing and Analytics, Spring 2020. Vasiliki (Vasia) Kalavri, vkalavri@bu.edu, Boston University. Lecture 4/09: Flow control and load shedding. Keeping up with the producers: back-pressure and flow control. Load management approaches: (a) load shedding, where a load shedder selectively drops records; suitable for transient load increases. (b) Back-pressure. (c) Elasticity, which scales the resource allocation; it addresses the case of increased load and additionally ensures no resources are left idle when the input load decreases.
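The load-shedding idea in this excerpt (selectively dropping records during a transient overload) can be sketched as a small queue wrapper. This is a minimal, illustrative sketch; the class name, watermark, and drop probability are invented here, not taken from the course material:

```python
import random
from collections import deque

class LoadShedder:
    """Threshold-based shedder: once the input queue grows past
    `high_watermark`, drop each incoming record with probability
    `drop_prob`. All names and thresholds are illustrative."""

    def __init__(self, high_watermark=100, drop_prob=0.5, seed=7):
        self.queue = deque()
        self.high_watermark = high_watermark
        self.drop_prob = drop_prob
        self.dropped = 0
        self.rng = random.Random(seed)

    def offer(self, record):
        if len(self.queue) > self.high_watermark and self.rng.random() < self.drop_prob:
            self.dropped += 1          # shed: the record is lost on purpose
            return False
        self.queue.append(record)
        return True

    def poll(self):
        return self.queue.popleft() if self.queue else None

# Simulate a transient spike: the producer is faster than the consumer.
shedder = LoadShedder(high_watermark=10, drop_prob=0.5)
for i in range(100):
    shedder.offer(i)
    if i % 2 == 0:                     # consumer drains at half the producer's rate
        shedder.poll()
print(shedder.dropped, len(shedder.queue))
```

Once the backlog crosses the watermark, roughly half of the incoming records are dropped, which keeps the queue length bounded instead of letting it grow without limit.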
Streaming optimizations - CS 591 K1: Data Stream Processing and Analytics, Spring 2020. Topics: operator selectivity; dataflow optimizations (plan translation alternatives); runtime optimizations (load management, scheduling, state management); optimization semantics, correctness, and profitability. Fission might be preferable to pipeline and task parallelism because it balances load more evenly; data-parallel streaming languages enable fission by construction. Elastic scaling must be qualified: if load balancing is applied after fission, each instance must be capable of processing each item and must have access to the necessary state. Establish placement safety: if load balancing while performing…
Stream ingestion and pub/sub systems - CS 591 K1: Data Stream Processing and Analytics, Spring 2020. Producers and consumers are decoupled and subject to unpredictable delays: a producer might be producing too fast, in which case the stream processor needs to keep up and not shed load, or too slowly (or become idle), in which case the stream processor should still be able to make progress. Communication is asynchronous, i.e. the producer only needs to receive an ack from the broker. Communication patterns (I): load balancing, or shared subscription. A logical producer/consumer can be implemented by multiple physical tasks running in parallel; if a producer generates events at a high rate, we can balance the load by spawning several consumer processes. The broker can choose to send messages to consumers in…
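A shared (load-balancing) subscription, where the broker dispatches each message to exactly one member of a consumer group, might be sketched as follows. All names are hypothetical; this is not the API of any real broker:

```python
import itertools
from collections import defaultdict

class Broker:
    """Sketch of a shared subscription: several consumer instances
    subscribe under one group name and the broker round-robins
    messages among them. Purely illustrative."""

    def __init__(self):
        self.groups = defaultdict(list)   # group name -> consumer callbacks
        self.cursors = {}                 # group name -> round-robin iterator

    def subscribe(self, group, consumer):
        self.groups[group].append(consumer)
        self.cursors[group] = itertools.cycle(self.groups[group])

    def publish(self, group, message):
        # Exactly one member of the group receives each message.
        next(self.cursors[group])(message)

received = defaultdict(list)
broker = Broker()
for name in ("c0", "c1", "c2"):
    broker.subscribe("g", lambda m, name=name: received[name].append(m))
for i in range(9):
    broker.publish("g", i)
print({k: len(v) for k, v in received.items()})
```

Round-robin is only one possible dispatch policy; a real broker could also pick the consumer with the shortest outstanding-ack queue.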
Skew mitigation - CS 591 K1: Data Stream Processing and Analytics, Spring 2020. Balls-into-bins: place each ball at the least full of d bins chosen uniformly at random. When d=1 (a single random choice), the maximum load is Θ(ln n / ln ln n), with high probability. When d=2, the maximum load drops to ln ln n / ln 2 + O(1), with high probability. When d>2, the maximum load keeps decreasing, but only by a constant factor. Dynamic resource allocation: to choose one among n workers, we could check the load of each worker and send the item to the least loaded one, but load checking for every item can be expensive. Instead, choose two workers at random and send the item to the least loaded of those two; the system uses two…
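The power-of-two-choices rule from this excerpt can be simulated directly. A small, illustrative Python sketch (function name and parameters are mine, not from the slides):

```python
import random

def place_balls(m, n, d, rng):
    """Throw m balls into n bins; each ball goes to the least full of
    d bins sampled uniformly at random ("power of d choices")."""
    bins = [0] * n
    for _ in range(m):
        choices = [rng.randrange(n) for _ in range(d)]
        bins[min(choices, key=lambda b: bins[b])] += 1
    return bins

rng = random.Random(42)
one_choice = max(place_balls(10_000, 1_000, d=1, rng=rng))
two_choice = max(place_balls(10_000, 1_000, d=2, rng=rng))
print(one_choice, two_choice)
```

With a fixed seed, the d=2 maximum load comes out visibly smaller than the d=1 maximum, matching the exponential improvement the theory predicts.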
Monitoring Apache Flink applications (getting started). Latency tracking: …org/projects/flink/flink-docs-release-1.7/monitoring/metrics.html#latency-tracking. During periods of high load or during recovery, events might spend some time in the message queue until they are processed. When performance is starting to degrade, among the first metrics you want to look at are the memory consumption and CPU load of your Task- and JobManager JVMs. Memory: Flink reports the usage of Heap, NonHeap, and Direct memory. CPU: besides memory, you should also monitor the CPU load of the TaskManagers. If your TaskManagers are constantly under very high load, you might be able to improve the overall performance by decreasing…
Fault-tolerance demo & reconfiguration - CS 591 K1: Data Stream Processing and Analytics, Spring 2020. Reasons to reconfigure: change parallelism (scale out to process increased load, scale in to save resources); fix bugs or change business logic; optimize the execution plan. Requirements: keep the duration short; minimize performance disruption, e.g. latency spikes; avoid introducing load imbalance. Further concerns: resource management (utilization, isolation) and automation (continuous monitoring). Sequential read pattern: tasks read unnecessary data and the distributed file system receives a high load of read requests; instead, track the state location for each key in the checkpoint, so that tasks can locate…
Graph streaming algorithms - CS 591 K1: Data Stream Processing and Analytics, Spring 2020. 1. Load: read the graph from disk and partition it in memory. 2. Compute: read and mutate the graph state.
Elasticity and state migration: Part I - CS 591 K1: Data Stream Processing and Analytics, Spring 2020. Pause-and-restart state migration (diagram labels: block channels and upstream operators; buffer incoming records; re-configure; load state). State is scoped to a… All affected operators…
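The pause-and-restart steps named in the diagram (block, buffer, re-configure, load state) might look roughly like this toy sketch; the `Operator` and `migrate` names are invented for illustration and greatly simplify the real protocol:

```python
class Operator:
    """Toy stateful operator used to sketch pause-and-restart migration:
    block the input, buffer in-flight records, move the state to a new
    instance, then replay the buffer. Purely illustrative."""

    def __init__(self, state=None):
        self.state = state if state is not None else {}
        self.blocked = False
        self.buffer = []

    def process(self, key, value):
        if self.blocked:
            self.buffer.append((key, value))   # in-flight records pile up
            return
        self.state[key] = self.state.get(key, 0) + value

def migrate(old, new):
    old.blocked = True                 # 1. block channels / upstream operators
    new.state = dict(old.state)        # 2. move (here: copy) the state
    for key, value in old.buffer:      # 3. replay the buffered records
        new.process(key, value)
    old.buffer.clear()
    return new

a = Operator()
a.process("x", 1)
a.blocked = True
a.process("x", 2)                      # arrives mid-migration, gets buffered
b = migrate(a, Operator())
print(b.state)                         # {'x': 3}
```

The point of the buffering step is that no record is lost or applied twice: everything that arrived while the operator was blocked is replayed against the new instance before it goes live.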
Course introduction - CS 591 K1: Data Stream Processing and Analytics, Spring 2020. Topics: summarizing and analyzing data streams; systems; algorithms; architecture and design; scheduling and load management; scalability and elasticity; fault-tolerance and guarantees; state management; operator… Motivating scenario: mobile users and their first and last cell towers. Examples: location-based services; monitoring cell tower load; continuously maintaining call signatures for fraud detection (call frequency, top-K cell towers).
Stream processing fundamentals - CS 591 K1: Data Stream Processing and Analytics, Spring 2020. Data: historical vs. recent and historical. ETL process: complex vs. fast and light-weight (ETL: Extract-Transform-Load, e.g. unzipping compressed files, data cleaning and standardization). Parallelism: pipeline, task, data. State: limited, in-memory vs. partitioned, virtually unlimited, persisted to backends. Load management: shedding vs. back-pressure and elasticity. Fault tolerance: limited support, high availability vs. full…