Streaming languages and operator semantics - CS 591 K1: Data Stream Processing and Analytics Spring 2020filtering, projection, renaming. • Logic Operators define rules for complex pattern detection without order constraints. • conjunction of items I1, I2, …, In is satisfied when all items have been detected pre-sequence of S of length k, denoted by Sk . [ ] is the zero-length pre-sequence of S. Partial Order: Let S and L be two sequences. Then, if for some k, Lk = S we say that S is a pre-sequence of L and and write S ⊂ L. 24 Vasiliki Kalavri | Boston University 2020 Given a relation R, ⊆ is a partial order on sequences of tuples from R. Streaming operators take sequences (streams) as input and return0 码力 | 53 页 | 532.37 KB | 1 年前3
Stream processing fundamentals - CS 591 K1: Data Stream Processing and Analytics Spring 2020timestamp. • They are produced by external sources, i.e. the DSMS has no control over their arrival order or the data rate. • They have unknown, possibly unbounded length, i.e. the DSMS does not know without storing them 2. Support a high-level language (e.g. StreamSQL) 3. Handle missing, out-of-order, delayed data 4. Guarantee deterministic (on replay) and correct results (on recovery) 5. Combine University 2020 Time-Series Model: The jth update is (j, A[j]) and updates arrive in increasing order of j, i.e. we observe the entries of A by increasing index. This can model time-series data streams:0 码力 | 45 页 | 1.22 MB | 1 年前3
Filtering and sampling streams - CS 591 K1: Data Stream Processing and Analytics Spring 2020an expected number of elements and a fixed memory budget, how many hash functions do we need in order to minimize Pfp? Optimal number of hash functions 1 0 1 1 1 0 0 1 1 1 1 0 1 1 n bits h1 h2 hk an expected number of elements and a fixed memory budget, how many hash functions do we need in order to minimize Pfp? Optimal number of hash functions After m elements have been inserted to the filter an expected number of elements and a fixed memory budget, how many hash functions do we need in order to minimize Pfp? Optimal number of hash functions After m elements have been inserted to the filter0 码力 | 74 页 | 1.06 MB | 1 年前3
Notions of time and progress - CS 591 K1: Data Stream Processing and Analytics Spring 2020update Vasiliki Kalavri | Boston University 2020 1. Watermarks must be monotonically increasing in order to ensure that the event time clocks of tasks are progressing and not going backwards. 2. A watermark University 2020 Watermarks are essential to both event-time windows and operators handling out-of-order events: • When an operator receives a watermark with time T, it can assume that no further events events with timestamp less than T will be received. • It can then either trigger computation or order received events. 15 Evaluation of event-time windows Vasiliki Kalavri | Boston University 20200 码力 | 22 页 | 2.22 MB | 1 年前3
Stream ingestion and pub/sub systems - CS 591 K1: Data Stream Processing and Analytics Spring 2020processors read data from? 2 Challenges • can be distributed • out-of-sync sources may produce out-of-order streams • can be connected to the network • latency and unpredictable delays • might be producing only once, by a single consumer • Event retrieval is not defined by content / structure but its order • FIFO, priority producer consumer queue 6 Message brokers Message broker: a system that connects Re-delivery complicates stream processing and fault-tolerance • might process a message out-of-order or twice 14 How can we avoid this? 15 Publish/Subscribe Systems publisher publisher publisher0 码力 | 33 页 | 700.14 KB | 1 年前3
Course introduction - CS 591 K1: Data Stream Processing and Analytics Spring 202023, 8, 0, 7] 5 median ‣ We cannot store the entire stream ‣ No control over arrival rate or order f’ ∞ ? Continuously arriving, possibly unbounded data f read write Complete data accessible 31 Vasiliki Kalavri | Boston University 2020 Some hard problems in stream processing 32 Time Order Processing guarantees Progress Retractions & results amendment Reconfiguration & updates Debugging0 码力 | 34 页 | 2.53 MB | 1 年前3
Introduction to Apache Flink and Apache Kafka - CS 591 K1: Data Stream Processing and Analytics Spring 2020Guarantees • Messages sent by a producer to a particular topic partition will be appended in the order they are sent: • if a record M1 is sent by the same producer as a record M2, and M1 is sent first a lower offset than M2 and appear earlier in the log. • A consumer instance sees records in the order they are stored in the log. • For a topic with replication factor N, we will tolerate up to N-10 码力 | 26 页 | 3.33 MB | 1 年前3
Streaming in Apache FlinkDataflows Let's Talk About Time • Processing Time • Event Time • Events may arrive out of order! What Can Be Streamed? • Anything (if you write a serializer/deserializer for it) • Flink has setStreamTimeCharacteristic(TimeCharacteristic.EventTime); Watermarks • Data may arrive out of order • Sorting data is expensive and may not always be required • Watermark is a good heuristic to0 码力 | 45 页 | 3.00 MB | 1 年前3
High-availability, recovery semantics, and guarantees - CS 591 K1: Data Stream Processing and Analytics Spring 2020primary secondary I1 I2 O1 O2 N’i I’1 I’2 O’1 O’2 • The communication network ensures order-preserving, reliable message transport, e.g. TCP. • Failures are single-node and fail- stop, i t3 t’2 t’3 t’4 … The output semantics depend on the operator type: • arbitrary: it depends on order, randomness, or external system • deterministic: it produces the same output when starting from0 码力 | 49 页 | 2.08 MB | 1 年前3
Streaming optimizations - CS 591 K1: Data Stream Processing and Analytics Spring 2020to execute a query • There may exist several ways to execute a computation • query plans, e.g. order of operators • scheduling and placement decisions • different algorithms, e.g. hash-based vs. attribute • Ensure ordering constraints: if downstream operator expects elements in a particular order, merging should handle that • Avoid deadlocks: if split cannot push data because one channel is0 码力 | 54 页 | 2.83 MB | 1 年前3
共 16 条
- 1
- 2













