Stream ingestion and pub/sub systems - CS 591 K1: Data Stream Processing and Analytics Spring 2020 (33 pages, 700.14 KB): … offset, a monotonically increasing sequence number. • Within a partition, all messages are totally ordered, but there is no ordering guarantee across partitions. • Failure handling • The broker …
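A minimal Python sketch of the ordering guarantee above, assuming a toy in-memory topic. The class `PartitionedLog` and its CRC32 routing rule are hypothetical stand-ins, not Kafka's API or its real partitioner. Each partition is an append-only list, so a message's offset is its list index and grows monotonically within that partition. Messages that share a key always land in the same partition, so they keep their relative order.

```python
import zlib
from typing import List, Tuple


class PartitionedLog:
    """Toy topic: each partition is an append-only list whose index is the offset."""

    def __init__(self, num_partitions: int) -> None:
        self.partitions: List[List[Tuple[str, object]]] = [[] for _ in range(num_partitions)]

    def partition_for(self, key: str) -> int:
        # Assumed routing rule: CRC32 of the key modulo the partition count.
        # This is a stand-in, not the hash used by Kafka's default partitioner.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def append(self, key: str, value: object) -> Tuple[int, int]:
        """Append a message and return (partition, offset)."""
        p = self.partition_for(key)
        log = self.partitions[p]
        offset = len(log)  # offsets grow by one per message within this partition
        log.append((key, value))
        return p, offset


if __name__ == "__main__":
    topic = PartitionedLog(num_partitions=3)
    for k, v in [("a", 1), ("b", 1), ("a", 2), ("a", 3)]:
        print(k, v, topic.append(k, v))
```

A consumer reading the partition that holds key "a" sees 1, 2, 3 in that order. A consumer that interleaves several partitions gets no guarantee about where the "b" message falls relative to them.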
Streaming languages and operator semantics - CS 591 K1: Data Stream Processing and Analytics Spring 2020 (53 pages, 532.37 KB): … Vasiliki Kalavri | Boston University 2020. Operator types (II) • Sequence operators capture the arrival of an ordered set of events. • Common in pattern languages. • Events must have associated timestamps. • Iteration … Model and formalization (I): A stream is a sequence of unbounded length, where tuples are ordered by their arrival time. Sequence: Let t_1, …, t_n be tuples from a relation R. The list S = [t_1, … Timestamped streams. Pre-sequence: Let S and R be two sequences ordered by their timestamps, and let R_τ be the set of tuples of R with timestamp less than or equal to τ > 0.
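The definitions above can be illustrated with a small Python sketch, assuming a stream is a list of `(timestamp, payload)` pairs. `prefix_up_to` computes R_τ, the tuples with timestamp at most τ. `followed_by` is a hypothetical toy sequence operator: it pairs each pending `first` event with the next later `second` event. That matching policy is an assumption chosen for brevity and is not taken from any particular pattern language.

```python
from typing import Any, List, Tuple

Event = Tuple[float, Any]  # (timestamp, payload)


def prefix_up_to(stream: List[Event], tau: float) -> List[Event]:
    """R_tau: the tuples of a timestamp-ordered stream with timestamp <= tau."""
    return [e for e in stream if e[0] <= tau]


def followed_by(stream: List[Event], first: Any, second: Any) -> List[Tuple[Event, Event]]:
    """Toy sequence operator: match each `first` event with the next later `second` event."""
    matches: List[Tuple[Event, Event]] = []
    pending: List[Event] = []  # `first` events still waiting for a `second`
    for ts, payload in sorted(stream, key=lambda e: e[0]):  # sequence ops need timestamps
        if payload == second:
            for start in pending:
                matches.append((start, (ts, payload)))
            pending = []
        if payload == first:
            pending.append((ts, payload))
    return matches
```

For the stream `[(1, "A"), (2, "C"), (3, "B"), (4, "A"), (6, "B")]`, `followed_by(s, "A", "B")` yields two matches, (1, A) → (3, B) and (4, A) → (6, B). Sorting by timestamp first reflects the requirement that events carry timestamps.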
Introduction to Apache Flink and Apache Kafka - CS 591 K1: Data Stream Processing and Analytics Spring 2020 (26 pages, 3.33 MB): … written to it. For each topic, the Kafka cluster maintains a partitioned log. Each partition is an ordered, immutable sequence of records that is continually appended to: a structured commit log. An offset …
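A toy Python model of one partition as a structured commit log, assuming records are only ever appended and never modified. Both classes are hypothetical. `Consumer` tracks its own position (the next offset to read) and can `seek` back to replay records. Kafka consumers also have a notion of position and seeking, but these method names are illustrative and are not the Kafka client API.

```python
from typing import Any, List


class CommitLogPartition:
    """One partition: an ordered, immutable, append-only sequence of records."""

    def __init__(self) -> None:
        self._records: List[Any] = []

    def append(self, record: Any) -> int:
        self._records.append(record)
        return len(self._records) - 1  # offset of the newly appended record

    def read(self, offset: int, max_records: int = 10) -> List[Any]:
        # Slicing returns a copy, so readers cannot alter stored records.
        return self._records[offset:offset + max_records]

    @property
    def end_offset(self) -> int:
        return len(self._records)


class Consumer:
    """Hypothetical consumer that remembers the next offset it will read."""

    def __init__(self, partition: CommitLogPartition, start_offset: int = 0) -> None:
        self.partition = partition
        self.position = start_offset

    def poll(self, max_records: int = 10) -> List[Any]:
        batch = self.partition.read(self.position, max_records)
        self.position += len(batch)  # advance past what was returned
        return batch

    def seek(self, offset: int) -> None:
        self.position = offset  # rewinding replays records already in the log
```

Because the log is immutable, two consumers at different positions read the same records. A consumer that restarts from an older offset simply re-reads them.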
State management - CS 591 K1: Data Stream Processing and Analytics Spring 2020 (24 pages, 914.13 KB): … serialization and deserialization are required to access the state from a Flink program. • The keys are ordered according to a user-specified comparator function. Basic operations • Get(key): fetch a single …
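A minimal in-memory Python sketch of a key-value store with the two properties above: values are serialized on write and deserialized on read, and keys are kept sorted by a user-supplied comparator. This is a hypothetical illustration, not RocksDB or Flink's state backend. `pickle` stands in for the real serializers. Only `get` comes from the excerpt; `put` and `scan` are assumed companions.

```python
import bisect
import functools
import pickle
from typing import Any, Callable, Iterator, List, Optional, Tuple


class SortedStateStore:
    """Toy key-value state: serialized entries kept ordered by a comparator."""

    def __init__(self, comparator: Callable[[Any, Any], int]) -> None:
        self._cmp = comparator
        self._wrap = functools.cmp_to_key(comparator)
        self._sort_keys: List[Any] = []              # wrapped keys, kept sorted for bisect
        self._entries: List[Tuple[bytes, bytes]] = []  # parallel (serialized key, serialized value)

    def _find(self, key: Any) -> Tuple[int, bool]:
        i = bisect.bisect_left(self._sort_keys, self._wrap(key))
        found = i < len(self._entries) and self._cmp(pickle.loads(self._entries[i][0]), key) == 0
        return i, found

    def put(self, key: Any, value: Any) -> None:
        i, found = self._find(key)
        entry = (pickle.dumps(key), pickle.dumps(value))  # serialize on every write
        if found:
            self._entries[i] = entry
        else:
            self._sort_keys.insert(i, self._wrap(key))
            self._entries.insert(i, entry)

    def get(self, key: Any) -> Optional[Any]:
        """Fetch a single value, or None if the key is absent."""
        i, found = self._find(key)
        return pickle.loads(self._entries[i][1]) if found else None  # deserialize on read

    def scan(self) -> Iterator[Tuple[Any, Any]]:
        """Iterate over all entries in comparator order."""
        for k, v in self._entries:
            yield pickle.loads(k), pickle.loads(v)
```

With a descending comparator such as `lambda a, b: (a < b) - (a > b)`, inserting keys 2, 5, 1 makes `scan()` return them as 5, 2, 1. The comparator alone decides the iteration order.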
Flow control and load shedding - CS 591 K1: Data Stream Processing and Analytics Spring 2020 (43 pages, 2.42 MB): … Load Shedding Road Map (LSRM) • A pre-computed table that contains materialized load-shedding plans, ordered by how much load shedding they will cause. • Each row contains a plan with • expected cycle …
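A rough Python sketch of how a pre-computed, ordered table of shedding plans can be used at run time. The field names `expected_savings` and `drops`, and the rule "pick the least aggressive plan that sheds at least the excess load", are hypothetical assumptions made for illustration. The excerpt above does not spell out the row contents or the lookup policy. Since the table is sorted, the lookup is a binary search, not a fresh optimization.

```python
from bisect import bisect_left
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SheddingPlan:
    expected_savings: float  # hypothetical: load the plan is expected to remove
    drops: str               # hypothetical: description of where drops are inserted


class LoadSheddingRoadMap:
    """Materialized plans, kept sorted by how much load they shed."""

    def __init__(self, plans: List[SheddingPlan]) -> None:
        self.plans = sorted(plans, key=lambda p: p.expected_savings)
        self._savings = [p.expected_savings for p in self.plans]

    def lookup(self, excess_load: float) -> Optional[SheddingPlan]:
        """Return the least aggressive plan that sheds at least `excess_load`."""
        if excess_load <= 0 or not self.plans:
            return None  # not overloaded (or no plans): shed nothing
        i = bisect_left(self._savings, excess_load)
        # If no plan sheds enough, fall back to the most aggressive one.
        return self.plans[i] if i < len(self.plans) else self.plans[-1]
```

Because the plans are computed ahead of time, reacting to an overload reduces to a table lookup.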
Filtering and sampling streams - CS 591 K1: Data Stream Processing and Analytics Spring 2020 (74 pages, 1.06 MB): … inserted into the filter, what is the probability P0 that a bit is still 0? [Figure: an array of n bits, set by k hash functions h1, h2, …, hk] 1. The probability that h1 sets bit j is 1/n. 2. The probability that a bit was not set by any of the k hash functions is (1 − …
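The argument above can be continued numerically in a short Python sketch. The code assumes that m elements have been inserted (`m` is an assumed name, since the excerpt is cut off) and that the k hash functions behave as independent, uniform hashes. Under those assumptions, each of the k·m hash evaluations misses a given bit with probability 1 − 1/n, so P0 = (1 − 1/n)^(km) ≈ e^(−km/n). The false-positive formula (1 − P0)^k is the usual approximation that treats bits as independent. The `BloomFilter` class uses salted SHA-256 as a hypothetical hash family.

```python
import hashlib
import math
from typing import Iterator


def prob_bit_still_zero(n_bits: int, k: int, m: int) -> float:
    """P0 after m insertions: all k*m hash evaluations miss the bit."""
    return (1 - 1 / n_bits) ** (k * m)


def prob_bit_still_zero_approx(n_bits: int, k: int, m: int) -> float:
    return math.exp(-k * m / n_bits)  # (1 - 1/n)^(km) ~ e^(-km/n) for large n


def false_positive_rate(n_bits: int, k: int, m: int) -> float:
    """Approximate chance that all k probed bits are 1 for a non-member."""
    return (1 - prob_bit_still_zero(n_bits, k, m)) ** k


class BloomFilter:
    def __init__(self, n_bits: int, k: int) -> None:
        self.n, self.k = n_bits, k
        self.bits = [0] * n_bits

    def _positions(self, item: str) -> Iterator[int]:
        # Hypothetical hash family h_1..h_k: SHA-256 salted with the index i.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.n

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p] = 1

    def might_contain(self, item: str) -> bool:
        # No false negatives: an inserted item always finds all its bits set.
        return all(self.bits[p] for p in self._positions(item))
```

For n = 1024 bits, k = 3 and m = 50 insertions, `false_positive_rate` comes out well below 1%. Inserted items are always reported as present.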
Graph streaming algorithms - CS 591 K1: Data Stream Processing and Analytics Spring 2020 (72 pages, 7.77 MB): … source Id. 2. Maintain a disjoint set in each partition. 3. Periodically merge the partial disjoint sets into a global one. Connected components in Flink … A graph is bipartite if its vertex set can be divided into two disjoint independent sets U and V such that every edge connects a vertex in U to a vertex in V (no edges between vertices in …
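A single-process Python sketch of the three connected-components steps above. The sketch assumes that edges are partitioned by `hash(source) % num_partitions`, that each partition keeps a union-find (disjoint set), and that a merge step folds the partial sets into a global one. The helper names are hypothetical and this is not Flink code. The merge runs once here, whereas a streaming job would repeat it periodically, for example once per window.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Set, Tuple

Edge = Tuple[Hashable, Hashable]


class DisjointSet:
    def __init__(self) -> None:
        self.parent: Dict[Hashable, Hashable] = {}

    def find(self, x: Hashable) -> Hashable:
        self.parent.setdefault(x, x)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:  # path compression
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a: Hashable, b: Hashable) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra

    def components(self) -> List[Set[Hashable]]:
        groups: Dict[Hashable, Set[Hashable]] = defaultdict(set)
        for v in list(self.parent):
            groups[self.find(v)].add(v)
        return list(groups.values())


def partition_edges(edges: Iterable[Edge], num_partitions: int) -> List[List[Edge]]:
    """Step 1: route each edge by its source id."""
    parts: List[List[Edge]] = [[] for _ in range(num_partitions)]
    for src, dst in edges:
        parts[hash(src) % num_partitions].append((src, dst))
    return parts


def merge(partials: List[DisjointSet]) -> DisjointSet:
    """Step 3: fold every partial disjoint set into one global disjoint set."""
    global_ds = DisjointSet()
    for ds in partials:
        for v in list(ds.parent):
            global_ds.union(ds.find(v), v)
    return global_ds
```

Step 2 amounts to creating one `DisjointSet` per partition and calling `union(src, dst)` for each of that partition's edges. A component whose edges were split across partitions, such as 1–2 and 3–4 joined by 2–3, only becomes whole after the merge.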
Stream processing fundamentals - CS 591 K1: Data Stream Processing and Analytics Spring 2020 (45 pages, 1.22 MB): … states over a common schema R: [r_1(R), r_2(R), …], where the individual relations are unordered sets. Example relation state with attributes (src, dest, bytes): {(1, 2, 20K), (2, 5, 32K), (1, 2, 28K)}.
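A small Python illustration of the distinction above, using the example tuples. Each relation state is an unordered set, modelled here as a `frozenset`, so listing its tuples in a different order gives the same state. The stream itself is an ordered list of states. The insert-only `next_state` helper and the extra tuple `(2, 5, "12K")` are hypothetical, added only to produce a second state.

```python
from typing import FrozenSet, List, Tuple

Row = Tuple[int, int, str]  # (src, dest, bytes)
State = FrozenSet[Row]


def next_state(state: State, inserted: List[Row]) -> State:
    """Hypothetical insert-only update: the next state adds newly arrived tuples."""
    return state | frozenset(inserted)


r1: State = frozenset({(1, 2, "20K"), (2, 5, "32K"), (1, 2, "28K")})
r1_reordered: State = frozenset({(1, 2, "28K"), (1, 2, "20K"), (2, 5, "32K")})
r2: State = next_state(r1, [(2, 5, "12K")])

stream: List[State] = [r1, r2]  # the stream is ordered: r_1(R) precedes r_2(R)
```

`r1 == r1_reordered` holds because tuple order inside a relation is irrelevant. Swapping `stream[0]` and `stream[1]` would describe a different stream.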