Stream processing fundamentals - CS 591 K1: Data Stream Processing and Analytics Spring 2020K1: Data Stream Processing and Analytics Spring 2020 ## 1 /23: Stream Processing Fundamentals Vasiliki (Vasia) Kalavri vkalavri@bu.edu ## What is a stream? - In traditional data processing applications g. tables stored in a database. A data stream is a data set that is produced incrementally over time, rather than being available in full before its processing begins. • Data streams are high-volume high-volume, real-time data that might be unbounded • we cannot store the entire stream in an accessible way • we have to process stream elements on-the-fly using limited memory ## Properties of data streams •0 码力 | 45 页 | 1.22 MB | 2 年前3
Scalable Stream Processing - Spark Streaming and Flink## Scalable Stream Processing - Spark Streaming and Flink Amir H. Payberah payberah@kth.se 05/10/2018 https://id2221kth.github.io ## Data Processing Graph Data Pregel, GraphLab, PowerGraph GraphX GraphX, X-Stream, Chaos Batch Data MapReduce, Dryad FlumeJava, Spark Structured Data Spark SQL Machine Learning Mliib Tensorflow Streaming Data Storm, SEEP, Naiad, Spark Streaming, Flink, Millwheel Messaging Systems Kafka Resource Management Mesos. YARN ## Stream Processing Systems Design Issues ▶ Continuous vs. micro-batch processing Record-at-a-Time vs. declarative APIs ▶ Spark streaming ▶0 码力 | 113 页 | 1.22 MB | 2 年前3
【04 RocketMQ 王鑫】Stream Processing with Apache RocketMQ and Apache FlinkStream Processing with Apache RocketMQ and Apache Flink 王鑫 · The Apache Software Foundation Nov.4, 2018, Shanghai, Apache Flink China Meetup  Online Analytical Processing   Kalavri vkalavri@bu.edu ## Streaming sources  Where do stream processors read data from? Files, e.g. transaction logs Sockets IoT devices and sensors Databases and KV stores Message queues unpredictable delays • might be producing too fast • stream processor needs to keep up and not shed load • might be producing too slow or become idle • stream processor should be able to make progress • might0 码力 | 33 页 | 700.14 KB | 2 年前3
Skew mitigation - CS 591 K1: Data Stream Processing and Analytics Spring 2020# CS 591 K1: Data Stream Processing and Analytics Spring 2020 4/16: Skew mitigation Vasiliki (Vasia) Kalavri vkalavri@bu.edu ## Key partitioning  > δ*N, where N is the number of stream elements • The solution will not contain any item y with frequency: \\delta^{\*}N $|| ## Notation (I) Input: a stream of items N: number of items in the stream $ f_{e} $ : true frequency of the item e in the input stream f: estimated frequency of item δ: user-defined0 码力 | 31 页 | 1.47 MB | 2 年前3
State management - CS 591 K1: Data Stream Processing and Analytics Spring 2020# CS 591 K1: Data Stream Processing and Analytics Spring 2020 2/25: State Management Vasiliki (Vasia) Kalavri vkalavri@bu.edu ## State in dataflow computations Any non-trivial streaming computation and key the stream on the sensor ID val keyedData: KeyedStream[Reading, String] = sensorData .keyBy(_ .id) KeyedStream // apply a stateful FlatMapFunction on the keyed stream val alerts: operator. Keyed state can only be used by functions that are applied on a KeyedStream: - When the processing method of a function with keyed input is called, Flink’s runtime automatically puts all keyed state0 码力 | 24 页 | 914.13 KB | 2 年前3
Streaming optimizations - CS 591 K1: Data Stream Processing and Analytics Spring 2020## CS 591 K1: Data Stream Processing and Analytics Spring 2020 ## 4 /14: Stream processing optimizations Vasiliki (Vasia) Kalavri vkalavri@bu.edu ## Topics covered in this lecture • Costs of streaming 2db614d10387ee7/p3_1.jpg) ## Revisiting the basics A series of transformations on streams in Stream SQL, Scala, Python, Rust, Java...  - Stateful operators maintain state that reflect part of the stream history they have seen • windows, continuous aggregations, distinct... - State is commonly partitioned0 码力 | 54 页 | 2.83 MB | 2 年前3
Windows and triggers - CS 591 K1: Data Stream Processing and Analytics Spring 2020## CS 591 K1: Data Stream Processing and Analytics Spring 2020 2/11: Windows and Triggers Vasiliki (Vasia) Kalavri vkalavri@bu.edu ## Window operators • Practical way to perform operations on unbounded ingest sensor stream val sensorData: DataStream[SensorReading] = env.addSource(...) } } ### Keyed vs. non-keyed windows Window operators can be applied on a keyed or a non-keyed stream: • Window need to specify two window components: • A window assigner determines how the elements of the input stream are grouped into windows. A window assigner produces a WindowedStream (or All WindowedStream if applied0 码力 | 35 页 | 444.84 KB | 2 年前3
Course introduction - CS 591 K1: Data Stream Processing and Analytics Spring 2020## CS 591 K1: Data Stream Processing and Analytics Spring 2020 1/21: Introduction Vasiliki (Vasia) Kalavri vkalavri@bu.edu ## Course Information • Instructor: Vasiliki Kalavri • Office: MCS 206 • the course, you will hopefully: • know when to use stream processing vs other technology • be able to comprehensively compare features and processing guarantees of streaming systems • be proficient in end-to-end, scalable, and reliable streaming applications • have a solid understanding of how stream processing systems work and what factors affect their performance • be aware of the challenges and trade-offs0 码力 | 34 页 | 2.53 MB | 2 年前3
Notions of time and progress - CS 591 K1: Data Stream Processing and Analytics Spring 2020## CS 591 K1: Data Stream Processing and Analytics Spring 2020 2/06: Notions of time and progress Vasiliki (Vasia) Kalavri vkalavri@bu.edu ## Mobile game application • input stream: user activity • 5752b4fce9120d6/p4_1.jpg) ## Notions of time ## • Processing time • the time of the local clock where an event is being processed • a processing-time window wouldn’t account for game activity while while the train is in the tunnel • results depend on the processing speed and aren’t deterministic ## • Event time • the time when an event actually happened • an event-time window would give you the extra0 码力 | 22 页 | 2.22 MB | 2 年前3
共 1000 条
- 1
- 2
- 3
- 4
- 5
- 6
- 100
相关搜索词
stream processingdata streamstream modelstream applicationreal-timeSpark StreamingFlink微批处理窗口语义分布式文件系统Apache RocketMQApache Flink分布式流数据平台流处理生态系统项目流数据处理发布/订阅系统Pub/Sub数据流处理消息队列Skew MitigationPartitioningLoad BalancingHybrid PartitioningLossy Countingstate managementkeyed stateoperator state流处理优化数据流图状态管理并行性编译器优化Window operatorsTime windowsWindow assignersTriggersKeyed vs non-keyed windows流处理系统分布式系统Apache KafkaProcessing timeEvent timeWatermarksStream progressAcknowledgment













