Skew mitigation - CS 591 K1: Data Stream Processing and Analytics Spring 2020Worst case: O( 1 ε * log(εN)) counters ??? Vasiliki Kalavri | Boston University 2020 The power of two choices • Instead, we select d destination bins, each uniformly at random, and place the ball at be expensive • Choose two workers at random and send the item to the least loaded of those two • the system uses two hash functions, H1 and H2 and checks the load of the two sampled workers: P(k) Boston University 2020 The power of both choices • Applying the power of two choices in a streaming setting and preserving key semantics would require remembering the choices made previously • Partial0 码力 | 31 页 | 1.47 MB | 1 年前3
Streaming optimizations - CS 591 K1: Data Stream Processing and Analytics Spring 2020ad-hoc database queries, the query plan is generated on- the-fly. Different plans can be used for two consecutive executions of the same query. • A streaming dataflow is generated once and then scheduled B C D A B C D ??? Vasiliki Kalavri | Boston University 2020 B 21 Profitability • Running two applications together on a single core, one with operators B and C, the other with operators B and remove a no-op, e.g. a projection that keeps all attributes • remove idempotent operations, e.g. two selections on the same predicate • remove a dead subgraph, i.e. one that never produces output Redundancy0 码力 | 54 页 | 2.83 MB | 1 年前3
Cardinality and frequency estimation - CS 591 K1: Data Stream Processing and Analytics Spring 2020at least one 0: • ********0 (the probability of a 0 is 1/2) • around 25% will end in at least two 0s: • *******00 (1/2 * 1/2) • and so on… 6 ??? Vasiliki Kalavri | Boston University 2020 The around 25% will end in at least two 0s: • *******00 (1/2 * 1/2) • and so on… If one 0 is the maximum we’ve seen, that indicates 2 distinct elements, whereas if two 0s is the maximum we’ve seen, that around 25% will end in at least two 0s: • *******00 (1/2 * 1/2) • and so on… If one 0 is the maximum we’ve seen, that indicates 2 distinct elements, whereas if two 0s is the maximum we’ve seen, that0 码力 | 69 页 | 630.01 KB | 1 年前3
Course introduction - CS 591 K1: Data Stream Processing and Analytics Spring 2020Examples • Real-time statistics, e.g. weather maps • Monitor conditions to adjust resources, e.g. power generation • Monitor energy consumption for billing purposes 22 Vasiliki Kalavri | Boston University0 码力 | 34 页 | 2.53 MB | 1 年前3
Scalable Stream Processing - Spark Streaming and Flinkof the source DStream. ▶ union • Returns a new DStream that contains the union of the elements in two DStreams. 22 / 79 Transformations (3/4) ▶ count • Returns a new DStream of single-element RDDs of the source DStream. ▶ union • Returns a new DStream that contains the union of the elements in two DStreams. 22 / 79 Transformations (4/4) ▶ reduce • Returns a new DStream of single-element RDDs a set of transformations that apply to a over a sliding window of data. ▶ A window is defined by two parameters: window length and slide interval. ▶ A tumbling window effect can be achieved by making0 码力 | 113 页 | 1.22 MB | 1 年前3
Streaming languages and operator semantics - CS 591 K1: Data Stream Processing and Analytics Spring 2020Management Operators (I) • Join operators merge two streams by matching elements satisfying a condition • commonly applied on windows • Union operators combine two or more streams without ordering guarantees guarantees • elements have to be of the same type • Difference operators take two streams and output elements present in the first but not in the second • it is blocking and must be defined over a window length k, denoted by Sk . [ ] is the zero-length pre-sequence of S. Partial Order: Let S and L be two sequences. Then, if for some k, Lk = S we say that S is a pre-sequence of L and write S ⊆ L. If0 码力 | 53 页 | 532.37 KB | 1 年前3
Windows and triggers - CS 591 K1: Data Stream Processing and Analytics Spring 2020Non-keyed windows are processed in a single thread To create a window operator, you need to specify two window components: • A window assigner determines how the elements of the input stream are grouped Flink’s built-in window assigners create windows of type TimeWindow. : a time interval between the two timestamps, where start is inclusive and end is exclusive. 7 Built-in Window assigners in Flink compute the result from the accumulator and return it. OUT getResult(ACC accumulator); // merge two accumulators and return the result. ACC merge(ACC a, ACC b); } 16 AggregateFunction interface0 码力 | 35 页 | 444.84 KB | 1 年前3
Introduction to Apache Flink and Apache Kafka - CS 591 K1: Data Stream Processing and Analytics Spring 2020flatMap( new CoFlatMapFunction[(String, Double), Item, Offer] { // shared state between the two streams val factorValues: HashMap[String, Double] = HashMap.empty // flatMap method for the0 码力 | 26 页 | 3.33 MB | 1 年前3
Fault-tolerance demo & reconfiguration - CS 591 K1: Data Stream Processing and Analytics Spring 2020fixed-delay strategy restarts an application a fixed number of times and waits a configured time between two restart attempts. • The failure-rate strategy restarts an application as long as a configurable0 码力 | 41 页 | 4.09 MB | 1 年前3
Graph streaming algorithms - CS 591 K1: Data Stream Processing and Analytics Spring 2020Vasiliki Kalavri | Boston University 2020 A graph is bipartite if its vertex set can be divided into two disjoint independent sets U, V, such that every edge connects a vertex in U to a vertex in V (no0 码力 | 72 页 | 7.77 MB | 1 年前3
共 14 条
- 1
- 2













