Scalable Stream Processing - Spark Streaming and Flink
... object. • It receives the data from a source and stores it in Spark’s memory for processing. ▶ Three categories of streaming sources: 1. Basic sources directly available in the StreamingContext API ... and incrementally updates the result. Programming Model (2/2). Output Modes ▶ Three output modes: 1. Append: only the new rows appended to the result table since the last trigger will ...
0 credits | 113 pages | 1.22 MB | 1 year ago
Streaming languages and operator semantics - CS 591 K1: Data Stream Processing and Analytics Spring 2020
... produce results from the matches. Language Types. Vasiliki Kalavri | Boston University 2020. Three classes of operators: • relation-to-relation: similar to standard SQL and define queries over ... User-Defined Aggregates (UDAs): constructs that allow the definition of custom aggregations using three statement groups: • INITIALIZE: initialize local state. • ITERATE: update state based on new single ... Stream. Every monotonic function F on an input data stream can be computed by a UDA that uses three local tables, IN, TAPE, and OUT, and performs the following operations for each new arriving tuple: ...
0 credits | 53 pages | 532.37 KB | 1 year ago
Filtering and sampling streams - CS 591 K1: Data Stream Processing and Analytics Spring 2020
... values • the sum of the squares of the values • the number of observations. We can compute the three summary values in a single pass through the data. • μ = sum / count • var = (sum of squares / count) ...
0 credits | 74 pages | 1.06 MB | 1 year ago
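The single-pass computation described in this snippet can be sketched as follows. The snippet's variance formula is cut off, so this sketch assumes the standard population-variance identity var = (sum of squares / count) - μ²; the function name `summarize` is hypothetical and does not come from the slides.

```python
def summarize(values):
    """Return (mean, variance) of an iterable using one pass over the data.

    Assumption: population variance via var = E[x^2] - mean^2.
    """
    count = 0        # number of observations
    total = 0.0      # sum of the values
    total_sq = 0.0   # sum of the squares of the values

    for x in values:
        count += 1
        total += x
        total_sq += x * x

    if count == 0:
        raise ValueError("cannot summarize an empty stream")

    mean = total / count
    variance = total_sq / count - mean * mean
    return mean, variance
```

Only three running values are kept, so memory use is constant regardless of stream length. Note that this identity can lose precision when the mean is large relative to the spread of the data; Welford's algorithm is a common, more numerically stable alternative.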
Stream processing fundamentals - CS 591 K1: Data Stream Processing and Analytics Spring 2020
... Such a relation sequence could be represented in various ways: • as the concatenation of serializations of the relations. • as a list of tuple-index pairs, ...
0 credits | 45 pages | 1.22 MB | 1 year ago
Streaming optimizations - CS 591 K1: Data Stream Processing and Analytics Spring 2020
... Identify the most efficient way to execute a query • There may exist several ways to execute a computation • query plans, e.g. order of operators • scheduling and placement decisions ...
0 credits | 54 pages | 2.83 MB | 1 year ago
Fault-tolerance demo & reconfiguration - CS 591 K1: Data Stream Processing and Analytics Spring 2020
... • e.g. you can configure that an application be restarted as long as it did not fail more than three times in the last ten minutes. • The no-restart strategy does not restart an application, but fails ...
0 credits | 41 pages | 4.09 MB | 1 year ago
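The rule in this snippet (restart as long as the application did not fail more than three times in the last ten minutes) can be illustrated with a small sliding-window counter. This is a minimal sketch of the idea only, not Flink's implementation: the class `FailureRateRestartPolicy`, its parameters, and its handling of the window boundary are all assumptions made for illustration.

```python
from collections import deque


class FailureRateRestartPolicy:
    """Hypothetical policy: allow a restart while the number of failures
    inside a sliding time window stays at or below a maximum."""

    def __init__(self, max_failures=3, interval_seconds=600.0):
        self.max_failures = max_failures          # "three times"
        self.interval_seconds = interval_seconds  # "last ten minutes"
        self._failures = deque()                  # timestamps of recent failures

    def should_restart(self, now):
        """Record a failure at time `now` (in seconds) and decide on a restart."""
        self._failures.append(now)
        # Drop failures that have fallen out of the sliding window.
        while now - self._failures[0] > self.interval_seconds:
            self._failures.popleft()
        return len(self._failures) <= self.max_failures


policy = FailureRateRestartPolicy()
decisions = [policy.should_restart(t) for t in (0.0, 100.0, 200.0, 300.0)]
```

In this usage, the first three failures are within the limit, while the fourth failure inside the same ten-minute window exceeds it, so the policy stops restarting.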
Streaming in Apache Flink
... Stateful Enrichment of Rides and Fares. Time and Analytics. Event Time • Flink explicitly supports three different notions of time: • Event time • Ingestion time • Processing time (default) final ...
0 credits | 45 pages | 3.00 MB | 1 year ago
Elasticity and state migration: Part I - CS 591 K1: Data Stream Processing and Analytics Spring 2020
... Kalavri, John Liagouris, Moritz Hoffmann, Desislava Dimitrova, Matthew Forshaw, and Timothy Roscoe. Three steps is all you need: fast, accurate, automatic scaling decisions for distributed streaming dataflows ...
0 credits | 93 pages | 2.42 MB | 1 year ago
Exactly-once fault-tolerance in Apache Flink - CS 591 K1: Data Stream Processing and Analytics Spring 2020
// ... process at least 30s without checkpointing
cpConfig.setMinPauseBetweenCheckpoints(30000);
// allow three checkpoints to be in progress at the same time
cpConfig.setMaxConcurrentCheckpoints(3);
// checkpoints ...
0 credits | 81 pages | 13.18 MB | 1 year ago
9 results in total













