Graph streaming algorithms - CS 591 K1: Data Stream Processing and Analytics Spring 2020“Inside job” you might also like “The Bourne Identity” What’s the cheapest way to reach Zurich from London through Berlin? These are the top-10 relevant results for the search term “graph” ??? Vasiliki events: A purchase, a movie rating, a like on an online post, a bitcoin transaction, a packet routed from a source to destination Vertex events: A new product, a new movie, a user ??? Vasiliki Kalavri University 2020 1. Load: read the graph from disk and partition it in memory 10 ??? Vasiliki Kalavri | Boston University 2020 1. Load: read the graph from disk and partition it in memory 2. Compute:0 码力 | 72 页 | 7.77 MB | 1 年前3
Stream processing fundamentals - CS 591 K1: Data Stream Processing and Analytics Spring 2020of A by increasing index. This can model time-series data streams: • a sequence of measurements from a temperature sensor • the volume of NASDAQ stock trades over time This model poses a severe limitation update (k, c[j]), can be either positive or negative. Events can be continuously inserted and deleted from the stream. It can model fully dynamic situations: • Monitoring active IP network connections is size> • Derived stream: produced by a continuous query and its operators, e.g. total traffic from a source every minutepacket generation time bytes in packet total 0 码力 | 45 页 | 1.22 MB | 1 年前3
Streaming optimizations - CS 591 K1: Data Stream Processing and Analytics Spring 2020Boston University 2020 15 Safety • Attribute availability: the set of attributes B reads from must be disjoint from the set of attributes A writes to. • Commutativity: the results of applying A and then Boston University 2020 18 Safety • attribute availability: the set of attributes B reads from must be disjoint from the set of attributes A writes to. • commutativity: the results of applying A and then systems that build one dataflow graph for several queries • when applications analyze data streams from a small set of sources • Operator elimination • remove a no-op, e.g. a projection that keeps all0 码力 | 54 页 | 2.83 MB | 1 年前3
PyFlink 1.15 DocumentationInstallation 1.1.1.1 Preparation This page shows you how to install PyFlink using pip, conda, installing from the source, etc. Python Version Supported PyFlink Version Python Version Supported PyFlink 1.16 installed using PyPI as following: python3 -m pip install apache-flink Installing from Source To install PyFlink from source, you could refer to Build PyFlink. Check the installed package You could (continues on next page) 1.1. Getting Started 5 pyflink-docs, Release release-1.15 (continued from previous page) # -rw-r--r-- 1 dianfu staff 45K 10 18 20:54 flink-dianfu-python-B-7174MD6R-1908.0 码力 | 36 页 | 266.77 KB | 1 年前3
PyFlink 1.16 DocumentationInstallation 1.1.1.1 Preparation This page shows you how to install PyFlink using pip, conda, installing from the source, etc. Python Version Supported PyFlink Version Python Version Supported PyFlink 1.16 installed using PyPI as following: python3 -m pip install apache-flink Installing from Source To install PyFlink from source, you could refer to Build PyFlink. Check the installed package You could (continues on next page) 1.1. Getting Started 5 pyflink-docs, Release release-1.16 (continued from previous page) # -rw-r--r-- 1 dianfu staff 45K 10 18 20:54 flink-dianfu-python-B-7174MD6R-1908.0 码力 | 36 页 | 266.80 KB | 1 年前3
Course introduction - CS 591 K1: Data Stream Processing and Analytics Spring 20202020 Guest Lectures • Learn about real-world use-cases of stream processing in industry • Learn from experts with decades of hands-on experience in building and using distributed systems and data management it during office hours. Vasiliki Kalavri | Boston University 2020 Dataset A subset of traces from a large (12.5k machines) Google cluster • https://github.com/google/cluster-data/blob/master/ ClusterData2011_2 setup. • If you are a Windows user, you are advised to use Windows subsystem for Linux (WSL), Cygwin, or a Linux virtual machine to run Flink in a UNIX environment. • A Java 8.x installation. To0 码力 | 34 页 | 2.53 MB | 1 年前3
Scalable Stream Processing - Spark Streaming and FlinksetAppName(appName).setMaster(master) val ssc = new StreamingContext(conf, Seconds(1)) ▶ It can also be created from an existing SparkContext object. val sc = ... // existing SparkContext val ssc = new StreamingContext(sc setAppName(appName).setMaster(master) val ssc = new StreamingContext(conf, Seconds(1)) ▶ It can also be created from an existing SparkContext object. val sc = ... // existing SparkContext val ssc = new StreamingContext(sc Input Operations ▶ Every input DStream is associated with a Receiver object. • It receives the data from a source and stores it in Spark’s memory for processing. ▶ Three categories of streaming sources:0 码力 | 113 页 | 1.22 MB | 1 年前3
Streaming languages and operator semantics - CS 591 K1: Data Stream Processing and Analytics Spring 2020input stream events on type, content, timing constraints. • Actions define how to produce results from the matches. Language Types 3 Vasiliki Kalavri | Boston University 2020 Three classes of operators: tables Declarative language: CQL 4 Vasiliki Kalavri | Boston University 2020 Select IStream(*) From S1 [Rows 5], S2 [Rows 10] Where S1.A = S2.A Last 5 elements of stream S1 and last 10 elements of Example Select IStream(S1.A, S2.B) From S1 [Rows 50], S2 [Rows 50] (A & B) || (C & D) Explicit conjunction and disjunction Implicit conjunction in CQL Consider events from stream S1 and stream S2 110 码力 | 53 页 | 532.37 KB | 1 年前3
Stream ingestion and pub/sub systems - CS 591 K1: Data Stream Processing and Analytics Spring 2020sensors Databases and KV stores Message queues and brokers Where do stream processors read data from? 2 Challenges • can be distributed • out-of-sync sources may produce out-of-order streams • Message broker: a system that connects event producers with event consumers. • It receives messages from the producers and pushes them to the consumers. • A TCP connection is a simple messaging system responsible for message durability • Asynchronous communication, i.e. producer only needs to receive ack from broker 9 Communication patterns (I) Load balancing or shared subscription • A logical producer/consumer0 码力 | 33 页 | 700.14 KB | 1 年前3
Fault-tolerance demo & reconfiguration - CS 591 K1: Data Stream Processing and Analytics Spring 2020University 2020 • To recover from failures, the system needs to • restart failed processes • restart the application and recover its state 2 Checkpointing guards the state from failures, but what about following steps: 1. It requests the storage locations from ZooKeeper to fetch the JobGraph, the JAR file, and the state handles of the last checkpoint from remote storage. 2. It requests processing slots to complete after the savepoint! • Use the integrated savepoint-and-cancel command 15 Scaling from a Savepoint ??? Vasiliki Kalavri | Boston University 2020 16 ??? Vasiliki Kalavri | Boston University0 码力 | 41 页 | 4.09 MB | 1 年前3
共 22 条
- 1
- 2
- 3













