Exactly-once fault-tolerance in Apache Flink - CS 591 K1: Data Stream Processing and Analytics Spring 2020execution graph: each process can reach every other process in the system • Single initiating process 18 The Chandy-Lamport Algorithm A snapshot algorithm that is used in distributed systems for recording pPu15vWrNrbszkGXiFaQGBZq96le3n7As5gqZpMZ0PDf FIKcaBZN8UulmhqeUjeiAdyxVNOYmyGenTsiJVfokSrQ thWSm/p7IaWzMOA5tZ0xaBa9qfif18kwugxyodIMuWL zRVEmCSZk+jfpC80ZyrElGlhbyVsSDVlaNOp2BC8xZeX iX9Wv6p7d+e1xnWRhmO4BhOwYMLaMAtNMEHBgN4hld4 pPu15vWrNrbszkGXiFaQGBZq96le3n7As5gqZpMZ0PDf FIKcaBZN8UulmhqeUjeiAdyxVNOYmyGenTsiJVfokSrQ thWSm/p7IaWzMOA5tZ0xaBa9qfif18kwugxyodIMuWL zRVEmCSZk+jfpC80ZyrElGlhbyVsSDVlaNOp2BC8xZeX iX9Wv6p7d+e1xnWRhmO4BhOwYMLaMAtNMEHBgN4hld40 码力 | 81 页 | 13.18 MB | 1 年前3
PyFlink 1.15 Documentationpyflink-docs, Release release-1.15 (continued from previous page) # -rw-r--r-- 1 dianfu staff 45K 10 18 20:54 flink-dianfu-python-B-7174MD6R-1908. ˓→local.log Besides, you could also check if the files -rw-r--r-- 1 dianfu staff 190K 10 18 20:43 flink-cep-1.15.2.jar # -rw-r--r-- 1 dianfu staff 475K 10 18 20:43 flink-connector-files-1.15.2.jar # -rw-r--r-- 1 dianfu staff 93K 10 18 20:43 flink-csv-1.15.2.jar -rw-r--r-- 1 dianfu staff 110M 10 18 20:43 flink-dist-1.15.2.jar # -rw-r--r-- 1 dianfu staff 171K 10 18 20:43 flink-json-1.15.2.jar # -rw-r--r-- 1 dianfu staff 20M 10 18 20:43 flink-scala_2.12-1.15.2.jar0 码力 | 36 页 | 266.77 KB | 1 年前3
PyFlink 1.16 Documentationpyflink-docs, Release release-1.16 (continued from previous page) # -rw-r--r-- 1 dianfu staff 45K 10 18 20:54 flink-dianfu-python-B-7174MD6R-1908. ˓→local.log Besides, you could also check if the files -rw-r--r-- 1 dianfu staff 190K 10 18 20:43 flink-cep-1.15.2.jar # -rw-r--r-- 1 dianfu staff 475K 10 18 20:43 flink-connector-files-1.15.2.jar # -rw-r--r-- 1 dianfu staff 93K 10 18 20:43 flink-csv-1.15.2.jar -rw-r--r-- 1 dianfu staff 110M 10 18 20:43 flink-dist-1.15.2.jar # -rw-r--r-- 1 dianfu staff 171K 10 18 20:43 flink-json-1.15.2.jar # -rw-r--r-- 1 dianfu staff 20M 10 18 20:43 flink-scala_2.12-1.15.2.jar0 码力 | 36 页 | 266.80 KB | 1 年前3
Filtering and sampling streams - CS 591 K1: Data Stream Processing and Analytics Spring 2020in S twice. Probability that only one occurrence is in S: Pb = 1/10 * 9/10 + 9/10 * 1/10 = 18/100 => 18*d/100 will appear in S once. ??? Vasiliki Kalavri | Boston University 2020 10 How many of Ted’s 1/10 = 18/100 => 18*d/100 will appear in S once. one is selected the other is not ??? Vasiliki Kalavri | Boston University 2020 11 Q: How many queries did Ted repeat last month? d 100 s 10 + 18d 100 d 100 s 10 + 18d 100 + d 100 queries appearing in S twice ??? Vasiliki Kalavri | Boston University 2020 11 Q: How many queries did Ted repeat last month? d 100 s 10 + 18d 100 + d 100 queries0 码力 | 74 页 | 1.06 MB | 1 年前3
Windows and triggers - CS 591 K1: Data Stream Processing and Analytics Spring 2020the case of a time window), the current processing time and the watermark. ProcessWindowFunction 18 Vasiliki Kalavri | Boston University 2020 public abstract class ProcessWindowFunction18 Window max over 5 last elements? 32 4 2 5 8 4 2 5 7 8 2 5 7 44 8 5 7 44 8 18 32 8 44 44 21 Can we compute the max more efficiently Boston University 2020 32 4 2 5 7 44 8 18 Window max over 5 last elements? 32 4 2 8 22 inwindow Vasiliki Kalavri | Boston University 2020 32 4 2 5 7 44 8 18 Window max over 5 last elements? 32 8 0 码力 | 35 页 | 444.84 KB | 1 年前3
Course introduction - CS 591 K1: Data Stream Processing and Analytics Spring 2020Assignment 1 1/30 2/12 Assignment 2 2/13 2/26 Assignment 3 3/3 3/16 Final Project 3/17 4/30 2/18: No Class, Self-study 2/25: Last Day to DROP Clases (without a ‘W’ grade) 4/3: Last Day to DROP Classes “SAQL: A Stream-based Query System for Real- Time Abnormal System Behavior Detection”, USENIX Security '18 12 Interested in a more research-oriented project? Let’s discuss it during office hours. Vasiliki all data will be real-time data. By 2020, we will be able to store less than 15% of all data. 18 Vasiliki Kalavri | Boston University 2020 Can you give me some examples of streaming data sources0 码力 | 34 页 | 2.53 MB | 1 年前3
Notions of time and progress - CS 591 K1: Data Stream Processing and Analytics Spring 2020that particular stream (or partition). Vasiliki Kalavri | Boston University 2020 Source 10 12 10 18 23 11 15 11 15 event time watermark 15 14 20 • The input watermark captures the progress of generate watermarks every 5 seconds env.getConfig.setAutoWatermarkInterval(5000) Watermarks in Flink 18 Vasiliki Kalavri | Boston University 2020 class PeriodicAssigner extends AssignerWithPeriodicWatermarks[Reading]0 码力 | 22 页 | 2.22 MB | 1 年前3
Flow control and load shedding - CS 591 K1: Data Stream Processing and Analytics Spring 2020window-based concept drift. • The metric is defined by computing a similarity metric across windows. 18 ??? Vasiliki Kalavri | Boston University 2020 How many tuples to drop? • The amount of tuples to Concept-driven load shedding: Reducing size and error of voluminous and variable data streams. (IEEE Big Data ’18) • H. T. Kung, T. Blackwell, and A. Chapman. Credit-based flow control for atm networks: Credit0 码力 | 43 页 | 2.42 MB | 1 年前3
Elasticity and state migration: Part I - CS 591 K1: Data Stream Processing and Analytics Spring 2020executed in an ideal setting • when there is no waiting the useful time is equal to the observed time 18 Useful time Wu ??? Vasiliki Kalavri | Boston University 2020 True processing / output rates Aggregated you need: fast, accurate, automatic scaling decisions for distributed streaming dataflows. (OSDI’18). • Moritz Hoffmann, Andrea Lattuada, Frank McSherry, Vasiliki Kalavri, John Liagouris, Timothy0 码力 | 93 页 | 2.42 MB | 1 年前3
Stream ingestion and pub/sub systems - CS 591 K1: Data Stream Processing and Analytics Spring 2020Subscribers get notified asynchronously while possibly performing some other concurrent action. 18 Paradigm Space Decoupling Time Decoupling Synchronization Decoupling Message-passing RPC/RMI0 码力 | 33 页 | 700.14 KB | 1 年前3
共 21 条
- 1
- 2
- 3













