Scalable Stream Processing - Spark Streaming and Flinkappended. ▶ Built on the Spark SQL engine. ▶ Perform database-like query optimizations. 56 / 79 Programming Model (1/2) ▶ Two main steps to develop a Spark stuctured streaming: ▶ 1. Defines a query on checks for new data (new row in the input table), and incrementally updates the result. 57 / 79 Programming Model (1/2) ▶ Two main steps to develop a Spark stuctured streaming: ▶ 1. Defines a query on checks for new data (new row in the input table), and incrementally updates the result. 57 / 79 Programming Model (1/2) ▶ Two main steps to develop a Spark stuctured streaming: ▶ 1. Defines a query on0 码力 | 113 页 | 1.22 MB | 1 年前3
Stream ingestion and pub/sub systems - CS 591 K1: Data Stream Processing and Analytics Spring 2020needs to be aware of message loss, producers and consumers always online 5 Message queues • Asynchronous point-to-point communication • Lightweight buffer for temporary storage • Messages stored on Effective failure handling, crashes or disconnects • Broker responsible for message durability • Asynchronous communication, i.e. producer only needs to receive ack from broker 9 Communication patterns Space Decoupling Time Decoupling Synchronization Decoupling Message-passing RPC/RMI Asynchronous RPC Futures Message Queues Pub/Sub Yes Yes Yes Can you fill this in? 19 Pub/Sub vs. other0 码力 | 33 页 | 700.14 KB | 1 年前3
Stream processing fundamentals - CS 591 K1: Data Stream Processing and Analytics Spring 2020Traditional DW vs. SDW Traditional DW SDW Update Frequency low high Update propagation synchronized asynchronous Data historical recent and historical ETL process complex fast and light-weight ETL: Extract-Transform-Load0 码力 | 45 页 | 1.22 MB | 1 年前3
Exactly-once fault-tolerance in Apache Flink - CS 591 K1: Data Stream Processing and Analytics Spring 2020snapshot algorithm that is used in distributed systems for recording a consistent global state of an asynchronous system ??? Vasiliki Kalavri | Boston University 2020 Requirements: • Taking a snapshot does Committed The Epoch Commit Protocol Output Logs 38 ??? Vasiliki Kalavri | Boston University 2020 Asynchronous checkpoints in Apache Flink 39 ??? Vasiliki Kalavri | Boston University 2020 40 • A source need to checkpoint the complete application state in every checkpoint? • RocksDB supports both asynchronous and incremental checkpoints: • take a local snapshot and use a background thread to copy the0 码力 | 81 页 | 13.18 MB | 1 年前3
Course introduction - CS 591 K1: Data Stream Processing and Analytics Spring 2020processing challenging? 28 Vasiliki Kalavri | Boston University 2020 Using pseudocode (or the programming language of your choice), write a program that reads a stream of integers and computes: 29 10 码力 | 34 页 | 2.53 MB | 1 年前3
共 5 条
- 1













