Scalable Stream Processing - Spark Streaming and Flink
Amir H. Payberah, payberah@kth.se, 05/10/2018. The Course Web Page: https://id2221kth.github.io ... Where Are We? ... Stream Processing Systems ... Outline ▶ Spark Streaming ▶ Flink ... Spark Streaming ... Contribution ▶ Design issues • Continuous vs. micro-batch processing • Record-at-a-time vs. declarative APIs ... Spark Streaming ... RDDs and processes them using RDD operations. • Discretized Stream Processing (DStream) ... Spark Streaming ▶ Run a streaming computation as a series of very small, deterministic batch jobs. • Chops ...
113 pages | 1.22 MB | 1 year ago
Streaming in Apache Flink
EIT Summer School 2019, Apache Flink. Based on https://training.ververica.com. Maximilian Michels <…apache.org>, Software Engineer / Consultant, Committer @ Apache Beam / Apache Flink, @stadtlegende. Dr Paris Carbone, Senior Researcher @ RISE, Committer @ Apache Flink, @SenorCarbone. Contents • DataSet API • DataStream API • Concepts • Set up an environment to develop ... Implement streaming data processing pipelines • Flink managed state • Event time ... Streaming in Apache Flink • Streams are natural • Events of any type like sensors, click streams, logs • Batch processing ...
45 pages | 3.00 MB | 1 year ago
Introduction to Apache Flink and Apache Kafka - CS 591 K1: Data Stream Processing and Analytics Spring
... Kalavri, vkalavri@bu.edu, Spring 2020. 1/30: Introduction to Apache Flink and Apache Kafka. Vasiliki Kalavri | Boston University 2020. Apache Flink • An open-source, distributed data analysis framework ... file:///home/user/wordcount_out ... Run with a class entry point and arguments: ./bin/flink run -c org.apache.flink.examples.java.wordcount.WordCount \ ./examples/batch/WordCount.jar ... Resources • Documentation • https://flink.apache.org/ • Community • https://flink.apache.org/community.html#mailing-lists • Conference • http://flink-forward.org/
26 pages | 3.33 MB | 1 year ago
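The Past, Present, and Future of Apache Flink
杨克特 (鲁尼), Senior Technical Expert at Alibaba. The past: it all started in 2014. 2009 - 2014 ... 2014 • A PhD student project at Technische Universität Berlin • A batch processing engine built on a streaming runtime • Flink 0.6.0 released in August 2014 ... Flink 0.7 ... Runtime: Distributed Streaming Dataflow ... DataStream ... Processing & Streaming Analytics, Event-driven Applications ✔ ✔ ✔ ... Scan the QR code to join the community and Code Up with like-minded developers: Alibaba Cloud Developer Community, Apache Flink China Group 2. Thank you!
33 pages | 3.36 MB | 1 year ago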
Monitoring Apache Flink Applications (Introduction)
caolei, exported on 01/10/2020. Table of Contents: 1 Flink metrics system ... 4.13.2.1 Key Metrics ... Original article: https://www.ververica.com/blog/monitoring-apache-flink-applications-101 This blog post introduces Apache Flink's built-in monitoring and metrics system, which lets developers monitor their Flink jobs effectively. Typically, for one just starting to use Apache Flink for ...
23 pages | 148.62 KB | 1 year ago
Exactly-once fault-tolerance in Apache Flink - CS 591 K1: Data Stream Processing and Analytics Spring
Vasiliki (Vasia) Kalavri, vkalavri@bu.edu, Spring 2020. 3/24: Exactly-once fault-tolerance in Apache Flink. Vasiliki Kalavri | Boston University 2020. Some slides in this lecture have been generously ... Protocol ... Output Logs ... Asynchronous checkpoints in Apache Flink ... • A source of increasing numbers ... partitioned ... the position up to which they were consumed when the checkpoint was taken. • Event logs like Apache Kafka can provide records from a previous offset of the stream. ...
81 pages | 13.18 MB | 1 year ago
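[05 Computing Platform, 蓉荣] Flink Batch Processing and Its Applications
What is Apache Flink * Apache Flink is a distributed big data processing engine * It performs stateful computation over bounded and unbounded data streams * It can be deployed in a variety of cluster environments * It computes quickly over data of any scale. Why Flink can do batch processing: Table, Stream, Bounded Data, Unbounded Data ... Data, SQL, Runtime, SQL ... high throughput, low latency. Hive vs. Spark vs. Flink Batch (Hive/Hadoop / Spark / Flink): Model: MR / MR (Memory/Disk) / Pipeline; Throughput: TB-PB / TB-PB / not yet validated in large-scale production; Performance: average (minutes to hours) / fast (seconds) / excellent x2; Stability: good / average / validated internally at Alibaba ...
12 pages | 1.44 MB | 1 year ago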
Streaming optimizations - CS 591 K1: Data Stream Processing and Analytics Spring 2020
Profitability A A’ ... Spark Streaming • Treat streaming computation as a series of deterministic batch computations on small time intervals • Keep intermediate state in memory • Use Spark's RDDs instead ... network buffer for each receiving task that any of its tasks need to send data to. Batching in Apache Flink • The TaskManagers ship data from sending tasks to receiving tasks. • The network component ... computation at scale (SOSP ’13). • Fabian Hueske and Vasiliki Kalavri. Stream Processing with Apache Flink. (O’Reilly Media ’19). Lecture references. Vasiliki Kalavri | Boston University 2020
54 pages | 2.83 MB | 1 year ago
Stream processing fundamentals - CS 591 K1: Data Stream Processing and Analytics Spring 2020
... Systems ... [Figure: Evolution of Stream Processing, a timeline from 1992 to now (marked years: 1992, 2000, 2004, 2013) showing Tapestry, NiagaraCQ, Aurora, TelegraphCQ, STREAM, MapReduce (2004), Naiad, Spark Streaming, Samza, Flink, Millwheel, Storm, S4, and Google Dataflow] ... max("temp") maxTemp.print() env.execute("Compute max sensor temperature") } } Example: Apache Flink DataStream API. Vasiliki Kalavri | Boston University 2020. Relational Streaming vs. Dataflow ...
45 pages | 1.22 MB | 1 year ago
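How Flink Analyzes CDC Data in the Iceberg Data Lake in Real Time
Supports concurrent ...; sufficiently high throughput; ... Partition/Bucket-level concurrent Merge-On-Read ...; supports incremental reads for further data Transform ... Apache Iceberg Basic: Data, Metadata, Database, Table, Partition Spec, Manifest File, TableMetadata, Snapshot ... 3. Develop Table API interfaces for incremental CDC pulls. Iceberg kernel ... 1. Implement automatic and manual merging of CDC data; 2. Develop Flink's ability to pull CDC data incrementally. Flink integration. 1. Connect Spark Streaming to the CDC write path; 2. Connect engines such as Presto to the query path; 3. Use Alluxio to accelerate data queries. Other ecosystem integration. Thank you.
36 pages | 781.69 KB | 1 year ago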