Graph streaming algorithms - CS 591 K1: Data Stream Processing and Analytics Spring 2020
... (Vasia) Kalavri, vkalavri@bu.edu. Spring 2020, 4/28: Graph Streaming. Vasiliki Kalavri | Boston University 2020. Modeling the world as a graph: Social networks (friend, follows). The web. Actor-movie results for the search term “graph”. Basics: “node” or “vertex”, “edge”; undirected graph, directed graph. Graph streams: Graph streams model interactions as events that update an underlying graph structure. Edge events: a purchase, a movie rating, a like on an online post ...
72 pages | 7.77 MB | 1 year ago
Elasticity and state migration: Part I - CS 591 K1: Data Stream Processing and Analytics Spring 2020
... poles placement, sampling period, damping. Can neither identify individual bottlenecks nor model 2-input operators. Vasiliki Kalavri | Boston University 2020. Heuristic models: Metrics ... [Figure: dataflow with src, o1, o2 and per-edge record counts] Intuition: use the dataflow graph to extract operator dependencies and system instrumentation to collect accurate, representative ...
93 pages | 2.42 MB | 1 year ago
Stream processing fundamentals - CS 591 K1: Data Stream Processing and Analytics Spring 2020
... single-pass. Updates: arbitrary vs. append-only. Update rates: relatively low vs. high, bursty. Processing model: query-driven / pull-based vs. data-driven / push-based. Queries: ad-hoc vs. continuous. Latency: relatively ... Time-Series Model: The j-th update is (j, A[j]) and updates arrive in increasing order of j, i.e. we observe the entries of A by increasing index. This can model time-series data streams: a sequence of measurements from a temperature sensor; the volume of NASDAQ stock trades over time. This model poses a severe limitation on the stream: updates cannot change past entries in A. Useful in ...
45 pages | 1.22 MB | 1 year ago
Streaming optimizations - CS 591 K1: Data Stream Processing and Analytics Spring 2020
... basics: source, sink, input port, output port, dataflow graph. Vasiliki Kalavri | Boston University 2020. Revisiting the basics. Dataflow graph: operators are nodes, data channels are edges ... Operator re-ordering ... A static graph transformation that enables re-ordering at runtime. It dynamically routes data after measuring ... Multi-tenancy: in streaming systems that build one dataflow graph for several queries; when applications analyze data streams from a small set of sources. Operator ...
54 pages | 2.83 MB | 1 year ago
Flow control and load shedding - CS 591 K1: Data Stream Processing and Analytics Spring 2020
... Rate control: In a network of consumers and producers, such as a streaming execution graph with multiple operators, back-pressure has the effect that all operators slow down to match the processing speed of the slowest consumer. If the bottleneck operator is far down the dataflow graph, back-pressure propagates to upstream operators, eventually reaching the data stream sources.
43 pages | 2.42 MB | 1 year ago
Course introduction - CS 591 K1: Data Stream Processing and Analytics Spring 2020
... guarantees; State management; Operator semantics; Window optimizations; Filtering, counting, sampling; Graph streaming algorithms. Vasiliki Kalavri | Boston University 2020. Tools: Apache Flink: flink.apache
34 pages | 2.53 MB | 1 year ago
Exactly-once fault-tolerance in Apache Flink - CS 591 K1: Data Stream Processing and Analytics Spring 2020
... Boston University 2020. System model: No failures during snapshotting. FIFO reliable channels: no lost or duplicate messages. Strongly connected execution graph: each process can reach every ...
81 pages | 13.18 MB | 1 year ago
Scalable Stream Processing - Spark Streaming and Flink
Built on the Spark SQL engine. Performs database-like query optimizations. Programming Model (1/2): Two main steps to develop a Spark Structured Streaming application: 1. Defines a query on the input ... new data (new row in the input table), and incrementally updates the result.
113 pages | 1.22 MB | 1 year ago
Streaming languages and operator semantics - CS 591 K1: Data Stream Processing and Analytics Spring 2020
... be expressed using only non-blocking operators? Vasiliki Kalavri | Boston University 2020. Model and formalization (I): A stream is a sequence of unbounded length, where tuples are ordered by their ... t ∈ S to denote that, for some 1 ≤ i ≤ n, t_i = t. Model and formalization (II): Pre-sequence (prefix): Let S = [t_1, …, t_n] be a sequence and 0 < k ≤ n. Then ... streaming and static data. Requirements (or why SQL is not enough): Push-based model as opposed to the pull-based model of SQL, i.e. an application or client asks for the query results when they need ...
53 pages | 532.37 KB | 1 year ago
Notions of time and progress - CS 591 K1: Data Stream Processing and Analytics Spring 2020
... g-102. Watermarks, Tables, Event Time, and the Dataflow Model: https://www.confluent.jp/blog/watermarks-tables-event-time-dataflow-model/ Further reading
22 pages | 2.22 MB | 1 year ago













