Filtering and sampling streams - CS 591 K1: Data Stream Processing and Analytics Spring 2020
sample of an unbounded stream? • we want to ask queries and get statistically meaningful answers about the entire stream • we don’t necessarily know the queries in advance • we can store a fixed proportion of the stream, e.g. 1/10th search engine timestamp> query stream Example use-case: Web search user behavior study Q: How many queries did users repeat last month? Vasiliki Kalavri | Boston University 2020 Solution #1: uniform sampling
0 points | 74 pages | 1.06 MB | 1 year ago
PyFlink 1.15 Documentation
will use the Python environment of the current shell environment. See Session Mode for more details about session mode of Kubernetes. Execute PyFlink jobs with Flink Kubernetes Operator See PyFlink Example page in the official Flink documentation. For example, you can open the Kafka connector page and search for the keyword “SQL Client JAR”, which is a fat JAR of the Kafka connector. • It should be noted that you should
0 points | 36 pages | 266.77 KB | 1 year ago
PyFlink 1.16 Documentation
will use the Python environment of the current shell environment. See Session Mode for more details about session mode of Kubernetes. Execute PyFlink jobs with Flink Kubernetes Operator See PyFlink Example page in the official Flink documentation. For example, you can open the Kafka connector page and search for the keyword “SQL Client JAR”, which is a fat JAR of the Kafka connector. • It should be noted that you should
0 points | 36 pages | 266.80 KB | 1 year ago
Monitoring Apache Flink Applications (Introduction) [监控Apache Flink应用程序(入门)]
addition to the JVM metrics above, it is also possible to use Flink’s metrics system to gather insights about system resources, i.e. memory, CPU & network-related metrics for the whole machine as opposed to … Flink’s metrics and monitoring system. You can utilise it as a starting point when you first think about how to successfully monitor your Flink application. I highly recommend to start monitoring your Flink … development phase. By doing so, you can ask the right questions about the runtime behaviour of your application, and learn much more about Flink’s internals early on. Last but not least, this post only
0 points | 23 pages | 148.62 KB | 1 year ago
Stream ingestion and pub/sub systems - CS 591 K1: Data Stream Processing and Analytics Spring 2020
If consumers are slow, throughput might degrade. • DBs support secondary indexes for efficient search while MBs only offer topic-based subscription. • DB query results depend on a snapshot and clients
0 points | 33 pages | 700.14 KB | 1 year ago
Graph streaming algorithms - CS 591 K1: Data Stream Processing and Analytics Spring 2020
way to reach Zurich from London through Berlin? These are the top-10 relevant results for the search term “graph” Vasiliki Kalavri | Boston University 2020 Basics “node” or “vertex”
0 points | 72 pages | 7.77 MB | 1 year ago
Stream processing fundamentals - CS 591 K1: Data Stream Processing and Analytics Spring 2020
• The number of distinct users who have visited a website? • The top-10 queries inserted in a search engine? • The connected components of accounts in a stream of financial transactions? What synopsis
0 points | 45 pages | 1.22 MB | 1 year ago
Course introduction - CS 591 K1: Data Stream Processing and Analytics Spring 2020
assignment announcements & submissions Vasiliki Kalavri | Boston University 2020 What is this course about? The design and architecture of modern distributed streaming Fundamental for representing quizzes and announcements Vasiliki Kalavri | Boston University 2020 Guest Lectures • Learn about real-world use-cases of stream processing in industry • Learn from experts with decades of hands-on
0 points | 34 pages | 2.53 MB | 1 year ago
Fault-tolerance demo & reconfiguration - CS 591 K1: Data Stream Processing and Analytics Spring 2020
the application and recover its state Checkpointing guards the state from failures, but what about process failure? High-availability Vasiliki Kalavri | Boston University 2020 Flink processes University 2020 • The JobManager is a single point of failure for Flink applications • It keeps metadata about application execution, such as pointers to completed checkpoints. • A high-availability mode migrates
0 points | 41 pages | 4.09 MB | 1 year ago
Notions of time and progress - CS 591 K1: Data Stream Processing and Analytics Spring 2020
more delayed events will arrive. • Watermarks provide a logical clock which informs the system about the current event time. http://streamingbook.net/fig/2-9 Vasiliki Kalavri | Boston University 2020
0 points | 22 pages | 2.22 MB | 1 year ago