Flow control and load shedding - CS 591 K1: Data Stream Processing and Analytics Spring 2020Stream Processing and Analytics Vasiliki (Vasia) Kalavri vkalavri@bu.edu Spring 2020 4/09: Flow control and load shedding ??? Vasiliki Kalavri | Boston University 2020 Keeping up with the producers what if the queue grows larger than available memory? • block the producer (back-pressure, flow control) 2 ??? Vasiliki Kalavri | Boston University 2020 Load management approaches 3 ! Load shedder runtime and selectively drops tuples according to a QoS specification. • Similar to congestion control or video streaming in a lower quality. 4 ??? Vasiliki Kalavri | Boston University 2020 https://commons0 码力 | 43 页 | 2.42 MB | 1 年前3
Elasticity and state migration: Part I - CS 591 K1: Data Stream Processing and Analytics Spring 2020time rate increase : input rate : throughput ??? Vasiliki Kalavri | Boston University 2020 Control: When and how much to adapt? Mechanism: How to apply the re-configuration? 3 • Detect environment to ensure result correctness ??? Vasiliki Kalavri | Boston University 2020 Automatic Scaling Control 4 ??? Vasiliki Kalavri | Boston University 2020 The automatic scaling problem 5 Given a logical congestion, back pressure, throughput Policy • Queuing theory models: for latency objectives • Control theory models: e.g., PID controller • Rule-based models, e.g. if CPU utilization > 70% => scale0 码力 | 93 页 | 2.42 MB | 1 年前3
Streaming optimizations - CS 591 K1: Data Stream Processing and Analytics Spring 2020Alternatives • data structures • sorting vs hashing • indexing, pre-fetching • minimize disk access • scheduling Objectives • optimize resource utilization or minimize resources • decrease example: URL access frequency (k2, list(v2)) → list(v2) (k1, v1) → list(k2, v2) map() reduce() 25 ??? Vasiliki Kalavri | Boston University 2020 MapReduce combiners example: URL access frequency google.com, 1 ??? Vasiliki Kalavri | Boston University 2020 MapReduce combiners example: URL access frequency 27 map() reduce() GET /dumprequest HTTP/1.1 Host: rve.org.uk Connection: keep-alive0 码力 | 54 页 | 2.83 MB | 1 年前3
Streaming in Apache Flink@Override public Tuple2map (Tuple2 item) throws Exception { // access the state for this key MovingAverage average = averageState.value(); // create a new MovingAverage DataStream control = env.fromElements("DROP", "IGNORE").keyBy(x -> x); DataStream streamOfWords = env.fromElements("data", "DROP", "artisans", "IGNORE") .keyBy(x -> x); control ValueStateDescriptor<>("blocked", Boolean.class)); } @Override public void flatMap1(String control_value, Collector out) throws Exception { blocked.update(Boolean.TRUE); } @Override 0 码力 | 45 页 | 3.00 MB | 1 年前3
Stream processing fundamentals - CS 591 K1: Data Stream Processing and Analytics Spring 2020arrival and/or a generation timestamp. • They are produced by external sources, i.e. the DSMS has no control over their arrival order or the data rate. • They have unknown, possibly unbounded length, i single row or groups of rows Data Stream Management System • continuous queries • sequential data access, high-rate append-only updates Data Warehouse • complex, offline analysis • large and relatively Kalavri | Boston University 2020 DBMS vs. DSMS DBMS DSMS Data persistent relations streams Data Access random sequential, single-pass Updates arbitrary append-only Update rates relatively low high,0 码力 | 45 页 | 1.22 MB | 1 年前3
Scalable Stream Processing - Spark Streaming and Flinkautomatically converts this batch-like query to a streaming execution plan. ▶ 2. Specify triggers to control when to update the results. • Each time a trigger fires, Spark checks for new data (new row in the automatically converts this batch-like query to a streaming execution plan. ▶ 2. Specify triggers to control when to update the results. • Each time a trigger fires, Spark checks for new data (new row in the automatically converts this batch-like query to a streaming execution plan. ▶ 2. Specify triggers to control when to update the results. • Each time a trigger fires, Spark checks for new data (new row in the0 码力 | 113 页 | 1.22 MB | 1 年前3
State management - CS 591 K1: Data Stream Processing and Analytics Spring 2020Operator state is scoped to an operator task, i.e. records processed by the same parallel task have access to the same state • It cannot be accessed by other parallel tasks of the same or different operators that maintains the state for this key • State access is automatically scoped to the key of the current record so that all records with the same key access the same state State management in Apache Flink • Keys and values are arbitrary byte arrays: serialization and deserialization is required to access the state via a Flink program. • The keys are ordered according to a user-specified comparator0 码力 | 24 页 | 914.13 KB | 1 年前3
Fault-tolerance demo & reconfiguration - CS 591 K1: Data Stream Processing and Analytics Spring 2020unblock computations to ensure result correctness ??? Vasiliki Kalavri | Boston University 2020 Control: When and how much to adapt? 12 • Detect environment changes: external workload and system performance unblock computations to ensure result correctness ??? Vasiliki Kalavri | Boston University 2020 Control: When and how much to adapt? Mechanism: How to apply the re-configuration? 12 • Detect environment0 码力 | 41 页 | 4.09 MB | 1 年前3
Windows and triggers - CS 591 K1: Data Stream Processing and Analytics Spring 2020called with the key of the window, an Iterable to access the elements of the window, and a Collector to emit results. • A Context gives access to the metadata of the window (start and end timestamps custom logic for which predefined windows and transformations might not be suitable: • they provide access to record timestamps and watermarks • they can register timers that trigger at a specific time in the stream. Result records are emitted by passing them to the Collector. The Context object gives access to the timestamp and the key of the current record and to a TimerService. • onTimer(timestamp:0 码力 | 35 页 | 444.84 KB | 1 年前3
Apache Flink的过去、现在和未来Services O_0 O_1 I_0 I_1 I_2 P_0 P_1 P_2 S_0 S_1 Order Inventory Payment Shipping Flow-Control Async Call Auto Scale State Management Event Driven Flink 的未来 offline Real-time Batch Processing0 码力 | 33 页 | 3.36 MB | 1 年前3
共 13 条
- 1
- 2













