Cardinality and frequency estimation - CS 591 K1: Data Stream Processing and Analytics Spring 2020. … functions is difficult. Stochastic averaging: use one hash function to simulate many by splitting the hash value into two parts. We split the input stream into m = 2^p sub-streams S_0, S_1, …, S_{m-1}. For every element x, we compute h(x) and … (Vasiliki Kalavri | Boston University 2020) 0 码力 | 69 pages | 630.01 KB | 1 year ago | 3
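The stochastic-averaging idea in the snippet above can be sketched in a few lines of Python. This is a minimal illustration, not the course's implementation: the names `split_hash` and `assign_substreams`, the 32-bit hash width, the use of SHA-256 as h(x), and the choice of the leading p bits as the sub-stream index are all assumptions, since the snippet is cut off before it says how the two parts are used.

```python
import hashlib

HASH_BITS = 32  # assumed hash width; the snippet does not specify one


def split_hash(x: str, p: int = 4):
    """Split h(x) into (sub-stream index, remaining bits).

    Assumption: the first p bits select one of m = 2^p sub-streams and
    the remaining bits serve as the value a second hash would have produced.
    """
    digest = hashlib.sha256(x.encode("utf-8")).digest()
    h = int.from_bytes(digest[:4], "big")        # 32-bit hash value h(x)
    index = h >> (HASH_BITS - p)                 # leading p bits: 0 .. m-1
    rest = h & ((1 << (HASH_BITS - p)) - 1)      # trailing 32 - p bits
    return index, rest


def assign_substreams(stream, p: int = 4):
    """Route every element to one of m = 2^p sub-streams S_0 .. S_{m-1}."""
    m = 1 << p
    substreams = [[] for _ in range(m)]
    for x in stream:
        index, rest = split_hash(x, p)
        substreams[index].append(rest)
    return substreams


if __name__ == "__main__":
    sizes = [len(s) for s in assign_substreams((f"user-{i}" for i in range(1000)), p=4)]
    print(sizes)  # 16 sub-stream sizes that sum to 1000
```

Because the index depends only on h(x), repeated occurrences of the same element always land in the same sub-stream, which is what lets a single hash function stand in for m independent ones.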
Stream processing fundamentals - CS 591 K1: Data Stream Processing and Analytics Spring 2020. … frequencies for specific (source, destination) pairs observed in IP connections that are currently active. The vector is updated by a continuous stream of events where the j-th update has the general form … average of a stream of integers? • The number of distinct users who have visited a website? • The top-10 queries inserted in a search engine? • The connected components of accounts in a stream of financial … purpose-built and query-specific • different synopsis to count distinct elements than to keep track of top-K events. Dataflow Streaming Model … (Vasiliki Kalavri | Boston University 2020) 0 码力 | 45 pages | 1.22 MB | 1 year ago | 3
Course introduction - CS 591 K1: Data Stream Processing and Analytics Spring 2020. Vasiliki Kalavri • Office: MCS 206 • Contact: vkalavri@bu.edu • Course Time & Location: Tue, Thu 9:30-10:45, MCS B33 • Office Hours: Tue, Thu 11:00-12:30, MCS 206 … • 5 in-class quizzes (10%): • Each quiz contributes 2% to the final grade • 3 hands-on assignments (40%): • Assignment #1 contributes 10% • Assignment #2 contributes 10% • Assignment #3 contributes … framework • To be implemented individually. Deliverables • One (1) written report of maximum 5 pages (10%). • Code (including pre-processing, deployment, and testing): (40%) • code deliverables must be … (Vasiliki Kalavri | Boston University) 0 码力 | 34 pages | 2.53 MB | 1 year ago | 3
Graph streaming algorithms - CS 591 K1: Data Stream Processing and Analytics Spring 2020. … Bourne Identity” What’s the cheapest way to reach Zurich from London through Berlin? These are the top-10 relevant results for the search term “graph” … Basics … 1. Load: read the graph from disk and partition it in memory … (Vasiliki Kalavri | Boston University 2020) 0 码力 | 72 pages | 7.77 MB | 1 year ago | 3
Flow control and load shedding - CS 591 K1: Data Stream Processing and Analytics Spring 2020. … continuously or by running the system for a designated period of time, prior to regular query execution. Estimating cost and selectivity • Selectivity: … [figure: operator chain from input I to output O with (c=10, s=0.7), (c=10, s=0.5), (c=5, s=1.0); annotations 5 and 12.5] Overload detection (II): Load coefficient for input I: … Total load over m inputs: … (Vasiliki Kalavri | Boston University 2020) 0 码力 | 43 pages | 2.42 MB | 1 year ago | 3
Filtering and sampling streams - CS 591 K1: Data Stream Processing and Analytics Spring 2020. How can we select a representative sample of an … don’t necessarily know the queries in advance • we can store a fixed proportion of the stream, e.g. 1/10th. Example use-case: Web search user behavior [figure: search engine, query stream]. Solution #1: uniform sampling • Since we can store 1/10th of the stream, we select a stream element i with probability 10%. • We can use a random generator that produces an integer r_i between … (Vasiliki Kalavri | Boston University 2020) 0 码力 | 74 pages | 1.06 MB | 1 year ago | 3
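The uniform-sampling step in the snippet above (keep each stream element with probability 10%) can be sketched as below. This is an illustrative sketch only: the function name `uniform_sample` and its parameters are hypothetical, and because the snippet is cut off before giving the range of the integer r_i, the code swaps in a floating-point draw from Python's `random` module rather than guessing that range.

```python
import random


def uniform_sample(stream, rate=0.1, seed=None):
    """Yield each element of `stream` independently with probability `rate`.

    Assumption: a float draw in [0, 1) replaces the integer r_i mentioned
    in the snippet, whose range is not recoverable from the text.
    """
    rng = random.Random(seed)  # private generator so a seed gives repeatable runs
    for element in stream:
        if rng.random() < rate:
            yield element


if __name__ == "__main__":
    queries = (f"query-{i}" for i in range(100_000))
    sample = list(uniform_sample(queries, rate=0.1, seed=7))
    print(len(sample))  # close to 10,000, but not exactly
```

The sample size is only approximately one tenth of the stream, since every element is kept or dropped independently of the others.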
High-availability, recovery semantics, and guarantees - CS 591 K1: Data Stream Processing and Analytics Spring 2020. … them produces different output. Outputs after recovery (columns: Recovery type, Before failure, After failure): Precise: t1 t2 t3 t4 t5 t6 …; Gap: t1 t2 t3 t5 t6 … Processing guarantees and result semantics [figure: sum operator applied to input streams] (Vasiliki Kalavri | Boston University 2020) 0 码力 | 49 pages | 2.08 MB | 1 year ago | 3
Elasticity and state migration: Part I - CS 591 K1: Data Stream Processing and Analytics Spring 2020. … e.g. sliding windows, joins … Control theory models • Metrics • input and output signals • delay of tuples that have just entered the system • Policy • predictive, dataflow-wide … output signal is the delay time … (Vasiliki Kalavri | Boston University 2020) 0 码力 | 93 pages | 2.42 MB | 1 year ago | 3
PyFlink 1.15 Documentation. pyflink-docs, Release release-1.15 … # -rw-r--r-- 1 dianfu staff 45K 10 18 20:54 flink-dianfu-python-B-7174MD6R-1908.local.log … Besides, you could also check if the files … # -rw-r--r-- 1 dianfu staff 190K 10 18 20:43 flink-cep-1.15.2.jar # -rw-r--r-- 1 dianfu staff 475K 10 18 20:43 flink-connector-files-1.15.2.jar # -rw-r--r-- 1 dianfu staff 93K 10 18 20:43 flink-csv-1.15.2.jar # -rw-r--r-- 1 dianfu staff 110M 10 18 20:43 flink-dist-1.15.2.jar # -rw-r--r-- 1 dianfu staff 171K 10 18 20:43 flink-json-1.15.2.jar # -rw-r--r-- 1 dianfu staff 20M 10 18 20:43 flink-scala_2.12-1.15.2… 0 码力 | 36 pages | 266.77 KB | 1 year ago | 3
PyFlink 1.16 Documentation. pyflink-docs, Release release-1.16 … # -rw-r--r-- 1 dianfu staff 45K 10 18 20:54 flink-dianfu-python-B-7174MD6R-1908.local.log … Besides, you could also check if the files … # -rw-r--r-- 1 dianfu staff 190K 10 18 20:43 flink-cep-1.15.2.jar # -rw-r--r-- 1 dianfu staff 475K 10 18 20:43 flink-connector-files-1.15.2.jar # -rw-r--r-- 1 dianfu staff 93K 10 18 20:43 flink-csv-1.15.2.jar # -rw-r--r-- 1 dianfu staff 110M 10 18 20:43 flink-dist-1.15.2.jar # -rw-r--r-- 1 dianfu staff 171K 10 18 20:43 flink-json-1.15.2.jar # -rw-r--r-- 1 dianfu staff 20M 10 18 20:43 flink-scala_2.12-1.15.2… 0 码力 | 36 pages | 266.80 KB | 1 year ago | 3
23 results in total