Stream processing fundamentals - CS 591 K1: Data Stream Processing and Analytics Spring 2020 Boston University 2020 CS 591 K1: Data Stream Processing and Analytics Vasiliki (Vasia) Kalavri vkalavri@bu.edu Spring 2020 1/23: Stream Processing Fundamentals Vasiliki Kalavri | Boston University 2020 What is a stream? • In traditional data processing applications, we know the entire dataset in advance, e.g. tables stored in a database. A data stream is a data set that is produced incrementally over time, rather than being available in full before its processing begins. • Data streams are high-volume, real-time data that might be unbounded • we cannot store the entire stream | 0 credits | 45 pages | 1.22 MB | 1 year ago | 3
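An Extension of Service Mesh: On Database Mesh. Speaker: Zhang Liang (张亮) Date: July 25, 2018. Service Mesh is gaining momentum. Service Mesh products are diversifying. Advantages of Service Mesh: cloud native, zero intrusion, observability, operations-oriented. After moving to services, what about the database? Services • stateless • routed by rules • transactions handled by the business side. Database • stateful • routed by SQL • transactions handled automatically by the database. Database evolution trends • SQL • ACID+BASE • distributed. NewSQL. Categories of NewSQL: New Architecture, Transparent Sharding Middleware, Database-as-a-Service. What's Really New with NewSQL? Advantages of a database middle layer: system • transactions; operations • DBA; development • SQL. Capabilities a database middle layer should have: Sidecar. Database: any, single, single. Connection count: high, low, high. Heterogeneous languages: Java only, any, any. Performance: low overhead, slightly higher overhead, low overhead. Decentralized: yes, no, yes. Static entry point: no, yes, no. Advantages of Sidecar. Database Mesh architecture diagram. Sharding-Sphere core features: data sharding, distributed transactions, database governance, elastic scaling, management console. Implementations: Sharding-JDBC, Sharding-Proxy, Sharding-Sidecar | 0 credits | 35 pages | 4.56 MB | 6 months ago | 3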
Scalable Stream Processing - Spark Streaming and Flink Amir H. Payberah payberah@kth.se 05/10/2018 The Course Web Page https://id2221kth.github.io Where Are We? Stream Processing Systems Design Issues ▶ Continuous vs. micro-batch processing ▶ Record-at-a-Time vs. declarative APIs Outline ▶ Spark streaming ▶ Flink Spark Streaming Contribution ▶ Design issues • Continuous vs. micro-batch processing • Record-at-a-Time vs. declarative APIs Spark Streaming ▶ Run a streaming computation as a series of very small, deterministic batch jobs. • Chops | 0 credits | 113 pages | 1.22 MB | 1 year ago | 3
[04 RocketMQ, Wang Xin (王鑫)] Stream Processing with Apache RocketMQ and Apache Flink | 0 credits | 30 pages | 24.22 MB | 1 year ago | 3
Skew mitigation - CS 591 K1: Data Stream Processing and Analytics Spring 2020 Vasiliki Kalavri | Boston University 2020 CS 591 K1: Data Stream Processing and Analytics Vasiliki (Vasia) Kalavri vkalavri@bu.edu Spring 2020 4/16: Skew mitigation Vasiliki Kalavri | Uddin Nasir et al. The power of both choices: Practical load balancing for distributed stream processing engines. ICDE 2015. • Mitzenmacher, Michael. The power of two choices in randomized load balancing | 0 credits | 31 pages | 1.47 MB | 1 year ago | 3
State management - CS 591 K1: Data Stream Processing and Analytics Spring 2020 Vasiliki Kalavri | Boston University 2020 CS 591 K1: Data Stream Processing and Analytics Vasiliki (Vasia) Kalavri vkalavri@bu.edu Spring 2020 2/25: State Management Vasiliki Kalavri | Boston to an operator task, i.e. records processed by the same parallel task have access to the same state • It cannot be accessed by other parallel tasks of the same or different operators Keyed state is management • checkpointing state to remote and persistent storage, e.g. a distributed filesystem or a database system • Available state backends in Flink: • In-memory • File system • RocksDB State backends | 0 credits | 24 pages | 914.13 KB | 1 year ago | 3
Streaming optimizations - CS 591 K1: Data Stream Processing and Analytics Spring 2020 Boston University 2020 CS 591 K1: Data Stream Processing and Analytics Vasiliki (Vasia) Kalavri vkalavri@bu.edu Spring 2020 4/14: Stream processing optimizations Vasiliki Kalavri | Boston University context of streaming? • queries run continuously • streams are unbounded • In traditional ad-hoc database queries, the query plan is generated on-the-fly. Different plans can be used for two consecutive serialization cost • if operators are separate, throughput is bounded by either communication or processing cost • if fused, throughput is determined by operator cost only Operator fusion A B A B | 0 credits | 54 pages | 2.83 MB | 1 year ago | 3
Windows and triggers - CS 591 K1: Data Stream Processing and Analytics Spring 2020 Vasiliki Kalavri | Boston University 2020 CS 591 K1: Data Stream Processing and Analytics Vasiliki (Vasia) Kalavri vkalavri@bu.edu Spring 2020 2/11: Windows and Triggers Vasiliki Kalavri | Boston applied on a keyed or a non-keyed stream: • Window operators on keyed windows are evaluated in parallel • Non-keyed windows are processed in a single thread To create a window operator, you need to windowing use cases: • They assign an element based on its event-time timestamp or the current processing time to windows. • Time windows have a start and an end timestamp. • All built-in window assigners | 0 credits | 35 pages | 444.84 KB | 1 year ago | 3
Course introduction - CS 591 K1: Data Stream Processing and Analytics Spring 2020 Vasiliki Kalavri | Boston University 2020 CS 591 K1: Data Stream Processing and Analytics Vasiliki (Vasia) Kalavri vkalavri@bu.edu Spring 2020 1/21: Introduction Vasiliki Kalavri | Boston University course, you will hopefully: • know when to use stream processing vs other technology • be able to comprehensively compare features and processing guarantees of streaming systems • be proficient in using end-to-end, scalable, and reliable streaming applications • have a solid understanding of how stream processing systems work and what factors affect their performance • be aware of the challenges and trade-offs | 0 credits | 34 pages | 2.53 MB | 1 year ago | 3
Notions of time and progress - CS 591 K1: Data Stream Processing and Analytics Spring 2020 | Boston University 2020 Vasiliki (Vasia) Kalavri vkalavri@bu.edu CS 591 K1: Data Stream Processing and Analytics Spring 2020 2/06: Notions of time and progress Vasiliki Kalavri | Boston University minute? Vasiliki Kalavri | Boston University 2020 • Processing time • the time of the local clock where an event is being processed • a processing-time window wouldn’t account for game activity while the train is in the tunnel • results depend on the processing speed and aren’t deterministic • Event time • the time when an event actually happened • an event-time window would give you the | 0 credits | 22 pages | 2.22 MB | 1 year ago | 3
342 results in total













