Stream processing fundamentals - CS 591 K1: Data Stream Processing and Analytics Spring 2020
Data Stream Processing and Analytics Vasiliki (Vasia) Kalavri vkalavri@bu.edu Spring 2020 1/23: Stream Processing Fundamentals Vasiliki Kalavri | Boston University 2020 What is a stream? • In applications, we know the entire dataset in advance, e.g. tables stored in a database. A data stream is a data set that is produced incrementally over time, rather than being available in full before high-volume, real-time data that might be unbounded • we cannot store the entire stream in an accessible way • we have to process stream elements on-the-fly using limited memory 2 Vasiliki Kalavri | Boston University0 码力 | 45 页 | 1.22 MB | 1 年前3Scalable Stream Processing - Spark Streaming and Flink
Scalable Stream Processing - Spark Streaming and Flink Amir H. Payberah payberah@kth.se 05/10/2018 The Course Web Page https://id2221kth.github.io 1 / 79 Where Are We? 2 / 79 Stream Processing Systems deterministic batch jobs. • Chops up the live stream into batches of X seconds. • Treats each batch as RDDs and processes them using RDD operations. • Discretized Stream Processing (DStream) 7 / 79 Spark Streaming deterministic batch jobs. • Chops up the live stream into batches of X seconds. • Treats each batch as RDDs and processes them using RDD operations. • Discretized Stream Processing (DStream) 7 / 79 Spark Streaming0 码力 | 113 页 | 1.22 MB | 1 年前3Stream ingestion and pub/sub systems - CS 591 K1: Data Stream Processing and Analytics Spring 2020
Kalavri | Boston University 2020 CS 591 K1: Data Stream Processing and Analytics Vasiliki (Vasia) Kalavri vkalavri@bu.edu Spring 2020 1/28: Stream ingestion and pub/sub systems Streaming sources Sockets IoT devices and sensors Databases and KV stores Message queues and brokers Where do stream processors read data from? 2 Challenges • can be distributed • out-of-sync sources may produce unpredictable delays • might be producing too fast • stream processor needs to keep up and not shed load • might be producing too slow or become idle • stream processor should be able to make progress •0 码力 | 33 页 | 700.14 KB | 1 年前3【04 RocketMQ 王鑫】Stream Processing with Apache RocketMQ and Apache Flink
0 码力 | 30 页 | 24.22 MB | 1 年前3Skew mitigation - CS 591 K1: Data Stream Processing and Analytics Spring 2020
??? Vasiliki Kalavri | Boston University 2020 CS 591 K1: Data Stream Processing and Analytics Vasiliki (Vasia) Kalavri vkalavri@bu.edu Spring 2020 4/16: Skew mitigation ??? Vasiliki Kalavri | Boston University 2020 Lossy Counting • Find all items x in a data stream such that: • freq(x) > δ*N, where N is the number of stream elements • The solution will not contain any item y with frequency: Boston University 2020 Notation (I) Input: a stream of items N: number of items in the stream fe: true frequency of the item e in the input stream f: estimated frequency of item δ: user-defined0 码力 | 31 页 | 1.47 MB | 1 年前3State management - CS 591 K1: Data Stream Processing and Analytics Spring 2020
Vasiliki Kalavri | Boston University 2020 CS 591 K1: Data Stream Processing and Analytics Vasiliki (Vasia) Kalavri vkalavri@bu.edu Spring 2020 2/25: State Management Vasiliki Kalavri | Boston key the stream on the sensor ID val keyedData: KeyedStream[Reading, String] = sensorData .keyBy(_.id) // apply a stateful FlatMapFunction on the keyed stream val alerts: University 2020 • Working with State: https://ci.apache.org/projects/flink/flink-docs- release-1.10/dev/stream/state/state.html • Managing State in Apache Flink - Tzu-Li (Gordon) Tai: https:// www.youtube0 码力 | 24 页 | 914.13 KB | 1 年前3Streaming optimizations - CS 591 K1: Data Stream Processing and Analytics Spring 2020
Kalavri | Boston University 2020 CS 591 K1: Data Stream Processing and Analytics Vasiliki (Vasia) Kalavri vkalavri@bu.edu Spring 2020 4/14: Stream processing optimizations ??? Vasiliki Kalavri | output one or more streams of possibly different type A series of transformations on streams in Stream SQL, Scala, Python, Rust, Java… ??? Vasiliki Kalavri | Boston University 2020 Logic StateStateful operators 5 • Stateful operators maintain state that reflect part of the stream history they have seen • windows, continuous aggregations, distinct… • State is commonly partitioned 0 码力 | 54 页 | 2.83 MB | 1 年前3Windows and triggers - CS 591 K1: Data Stream Processing and Analytics Spring 2020
Vasiliki Kalavri | Boston University 2020 CS 591 K1: Data Stream Processing and Analytics Vasiliki (Vasia) Kalavri vkalavri@bu.edu Spring 2020 2/11: Windows and Triggers Vasiliki Kalavri | Boston application env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime) // ingest sensor stream val sensorData: DataStream[SensorReading] = env.addSource(...) } } Or ProcessingTIme or Vasiliki Kalavri | Boston University 2020 Window operators can be applied on a keyed or a non-keyed stream: • Window operators on keyed windows are evaluated in parallel • Non-keyed windows are processed0 码力 | 35 页 | 444.84 KB | 1 年前3Course introduction - CS 591 K1: Data Stream Processing and Analytics Spring 2020
Vasiliki Kalavri | Boston University 2020 CS 591 K1: Data Stream Processing and Analytics Vasiliki (Vasia) Kalavri vkalavri@bu.edu Spring 2020 1/21: Introduction Vasiliki Kalavri | Boston University Boston University 2020 Outcomes At the end of the course, you will hopefully: • know when to use stream processing vs other technology • be able to comprehensively compare features and processing guarantees end-to-end, scalable, and reliable streaming applications • have a solid understanding of how stream processing systems work and what factors affect their performance • be aware of the challenges0 码力 | 34 页 | 2.53 MB | 1 年前3Notions of time and progress - CS 591 K1: Data Stream Processing and Analytics Spring 2020
CS 591 K1: Data Stream Processing and Analytics Spring 2020 2/06: Notions of time and progress Vasiliki Kalavri | Boston University 2020 Mobile game application • input stream: user activity Vasiliki Kalavri | Boston University 2020 Watermarks Vasiliki Kalavri | Boston University 2020 Stream progress 9 http://streamingbook.net/fig/3-1 Vasiliki Kalavri | Boston University 2020 10 • A application. A watermark for time T states that event time has progressed to T in that particular stream (or partition). Vasiliki Kalavri | Boston University 2020 Source 10 12 10 18 23 11 15 11 150 码力 | 22 页 | 2.22 MB | 1 年前3
共 1000 条
- 1
- 2
- 3
- 4
- 5
- 6
- 100