Notions of time and progress - CS 591 K1: Data Stream Processing and Analytics Spring 2020## CS 591 K1: Data Stream Processing and Analytics Spring 2020 2/06: Notions of time and progress Vasiliki (Vasia) Kalavri vkalavri@bu.edu ## Mobile game application • input stream: user activity • How long do we have to wait before we decide that we have seen all events? ## Watermarks ## Stream progress  http://streamingbook 3b52765752b4fce9120d6/p10_1.jpg) http://streamingbook.net/fig/2-9 • A watermark is a global progress metric that indicates a certain point in time when we are confident that no more delayed events0 码力 | 22 页 | 2.22 MB | 2 年前3
Stream processing fundamentals - CS 591 K1: Data Stream Processing and Analytics Spring 2020## CS 591 K1: Data Stream Processing and Analytics Spring 2020 ## 1 /23: Stream Processing Fundamentals Vasiliki (Vasia) Kalavri vkalavri@bu.edu ## What is a stream? - In traditional data processing processing applications, we know the entire dataset in advance, e.g. tables stored in a database. A data stream is a data set that is produced incrementally over time, rather than being available in full before high-volume, real-time data that might be unbounded • we cannot store the entire stream in an accessible way • we have to process stream elements on-the-fly using limited memory ## Properties of data streams0 码力 | 45 页 | 1.22 MB | 2 年前3
Scalable Stream Processing - Spark Streaming and Flink## Scalable Stream Processing - Spark Streaming and Flink Amir H. Payberah payberah@kth.se 05/10/2018 https://id2221kth.github.io ## Data Processing Graph Data Pregel, GraphLab, PowerGraph GraphX GraphX, X-Stream, Chaos Batch Data MapReduce, Dryad FlumeJava, Spark Structured Data Spark SQL Machine Learning Mliib Tensorflow Streaming Data Storm, SEEP, Naiad, Spark Streaming, Flink, Millwheel BigTable, Cassandra ## Distributed Messaging Systems Kafka Resource Management Mesos. YARN ## Stream Processing Systems Design Issues ▶ Continuous vs. micro-batch processing Record-at-a-Time vs. declarative0 码力 | 113 页 | 1.22 MB | 2 年前3
Stream ingestion and pub/sub systems - CS 591 K1: Data Stream Processing and Analytics Spring 2020# CS 591 K1: Data Stream Processing and Analytics Spring 2020 ## 1 /28: Stream ingestion and pub/sub systems Vasiliki (Vasia) Kalavri vkalavri@bu.edu ## Streaming sources  Where do stream processors read data from? Files, e.g. transaction logs Sockets IoT devices and sensors Databases and KV stores Message queues be producing too fast • stream processor needs to keep up and not shed load • might be producing too slow or become idle • stream processor should be able to make progress • might fail (or seem as0 码力 | 33 页 | 700.14 KB | 2 年前3
【04 RocketMQ 王鑫】Stream Processing with Apache RocketMQ and Apache FlinkStream Processing with Apache RocketMQ and Apache Flink 王鑫 · The Apache Software Foundation Nov.4, 2018, Shanghai, Apache Flink China Meetup 0 码力 | 30 页 | 24.22 MB | 2 年前3
Skew mitigation - CS 591 K1: Data Stream Processing and Analytics Spring 2020# CS 591 K1: Data Stream Processing and Analytics Spring 2020 4/16: Skew mitigation Vasiliki (Vasia) Kalavri vkalavri@bu.edu ## Key partitioning  > δ*N, where N is the number of stream elements • The solution will not contain any item y with frequency: \\delta^{\*}N $|| ## Notation (I) Input: a stream of items N: number of items in the stream $ f_{e} $ : true frequency of the item e in the input stream f: estimated frequency of item δ: user-defined0 码力 | 31 页 | 1.47 MB | 2 年前3
State management - CS 591 K1: Data Stream Processing and Analytics Spring 2020# CS 591 K1: Data Stream Processing and Analytics Spring 2020 2/25: State Management Vasiliki (Vasia) Kalavri vkalavri@bu.edu ## State in dataflow computations Any non-trivial streaming computation and key the stream on the sensor ID val keyedData: KeyedStream[Reading, String] = sensorData .keyBy(_ .id) KeyedStream // apply a stateful FlatMapFunction on the keyed stream val alerts: resources • Working with State: https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/stream/state/state.html • Managing State in Apache Flink - Tzu-Li (Gordon) Tai: https://www.youtube.com/watch0 码力 | 24 页 | 914.13 KB | 2 年前3
Streaming optimizations - CS 591 K1: Data Stream Processing and Analytics Spring 2020## CS 591 K1: Data Stream Processing and Analytics Spring 2020 ## 4 /14: Stream processing optimizations Vasiliki (Vasia) Kalavri vkalavri@bu.edu ## Topics covered in this lecture • Costs of streaming 2db614d10387ee7/p3_1.jpg) ## Revisiting the basics A series of transformations on streams in Stream SQL, Scala, Python, Rust, Java...  - Stateful operators maintain state that reflect part of the stream history they have seen • windows, continuous aggregations, distinct... - State is commonly partitioned0 码力 | 54 页 | 2.83 MB | 2 年前3
Windows and triggers - CS 591 K1: Data Stream Processing and Analytics Spring 2020## CS 591 K1: Data Stream Processing and Analytics Spring 2020 2/11: Windows and Triggers Vasiliki (Vasia) Kalavri vkalavri@bu.edu ## Window operators • Practical way to perform operations on unbounded ingest sensor stream val sensorData: DataStream[SensorReading] = env.addSource(...) } } ### Keyed vs. non-keyed windows Window operators can be applied on a keyed or a non-keyed stream: • Window need to specify two window components: • A window assigner determines how the elements of the input stream are grouped into windows. A window assigner produces a WindowedStream (or All WindowedStream if applied0 码力 | 35 页 | 444.84 KB | 2 年前3
Course introduction - CS 591 K1: Data Stream Processing and Analytics Spring 2020## CS 591 K1: Data Stream Processing and Analytics Spring 2020 1/21: Introduction Vasiliki (Vasia) Kalavri vkalavri@bu.edu ## Course Information • Instructor: Vasiliki Kalavri • Office: MCS 206 • ab5473/p5_4.jpg) ## Outcomes At the end of the course, you will hopefully: • know when to use stream processing vs other technology • be able to comprehensively compare features and processing guarantees build end-to-end, scalable, and reliable streaming applications • have a solid understanding of how stream processing systems work and what factors affect their performance • be aware of the challenges and0 码力 | 34 页 | 2.53 MB | 2 年前3
共 1000 条
- 1
- 2
- 3
- 4
- 5
- 6
- 100
相关搜索词
Processing timeEvent timeWatermarksStream progressAcknowledgmentstream processingdata streamstream modelstream applicationreal-timeSpark StreamingFlink微批处理窗口语义分布式文件系统流数据处理发布/订阅系统Pub/Sub数据流处理消息队列Apache RocketMQApache Flink分布式流数据平台流处理生态系统项目Skew MitigationPartitioningLoad BalancingHybrid PartitioningLossy Countingstate managementkeyed stateoperator state流处理优化数据流图状态管理并行性编译器优化Window operatorsTime windowsWindow assignersTriggersKeyed vs non-keyed windows流处理系统分布式系统Apache Kafka













