Apache Spark - IT文库 (IT Library): free e-books and documents on programming and IT

Scalable Stream Processing - Spark Streaming and Flink

Scalable Stream Processing - Spark Streaming and Flink. Amir H. Payberah, payberah@kth.se, 05/10/2018. The Course Web Page: https://id2221kth.github.io. Where Are We? Stream Processing Systems. Outline ▶ Spark streaming ▶ Flink. Spark Streaming. Contribution ▶ Design issues • Continuous vs. micro-batch processing • Record-at-a-Time vs. declarative APIs. Spark Streaming: RDDs and processes them using RDD operations. • Discretized Stream Processing (DStream). Spark Streaming ▶ Run a streaming computation as a series of very small, deterministic batch jobs. • Chops

0 credits | 113 pages | 1.22 MB | 1 year ago
Streaming in Apache Flink

EIT Summer School 2019 Apache Flink Based on https://training.ververica.com Maximilian Michels apache.org> Software Engineer / Consultant, Committer @ Apache Beam / Apache Flink @stadtlegende Dr Paris Carbone Senior Researcher @ RISE, Committer @ Apache Flink @SenorCarbone Contents • DataSet API • DataStream API • Concepts • Set up an environment to develop Implement streaming data processing pipelines • Flink managed state • Event time Streaming in Apache Flink • Streams are natural • Events of any type like sensors, click streams, logs • Batch processing

0 credits | 45 pages | 3.00 MB | 1 year ago
Introduction to Apache Flink and Apache Kafka - CS 591 K1: Data Stream Processing and Analytics Spring

Kalavri vkalavri@bu.edu Spring 2020 1/30: Introduction to Apache Flink and Apache Kafka Vasiliki Kalavri | Boston University 2020 Apache Flink • An open-source, distributed data analysis framework file:///home/user/wordcount_out Run with a class entry point and arguments: ./bin/flink run -c org.apache.flink.examples.java.wordcount.WordCount \ ./examples/batch/WordCount.jar Resources • Documentation • https://flink.apache.org/ • Community • https://flink.apache.org/community.html#mailing-lists • Conference • http://flink-forward.org/

0 credits | 26 pages | 3.33 MB | 1 year ago
The Past, Present, and Future of Apache Flink

The Past, Present, and Future of Apache Flink. Yang Kete (杨克特, alias 鲁尼), Senior Technical Expert, Alibaba. The past: it all started in 2014. 2009 - 2014 2014 • PhD student project at Technische Universität Berlin • A batch processing engine built on a streaming runtime • Flink 0.6.0 released in August 2014 Flink 0.7 Runtime Distributed Streaming Dataflow DataStream Processing & Streaming Analytics Event-driven Applications ✔ ✔ ✔ Thank you!

0 credits | 33 pages | 3.36 MB | 1 year ago
Monitoring Apache Flink Applications (Getting Started)

Monitoring Apache Flink Applications (Getting Started). caolei. Exported on 01/10/2020. Table of Contents: 1 Flink metrics system ... 4.13.2.1 Key Metrics ... Original article: https://www.ververica.com/blog/monitoring-apache-flink-applications-101 This blog post introduces Apache Flink's built-in monitoring and metrics system, which lets developers monitor their Flink jobs effectively. Typically, for someone who has just started using Apache Flink for

0 credits | 23 pages | 148.62 KB | 1 year ago
Exactly-once fault-tolerance in Apache Flink - CS 591 K1: Data Stream Processing and Analytics Spring

Vasiliki (Vasia) Kalavri vkalavri@bu.edu Spring 2020 3/24: Exactly-once fault-tolerance in Apache Flink Vasiliki Kalavri | Boston University 2020 Some slides in this lecture have been generously Protocol Output Logs Asynchronous checkpoints in Apache Flink • A source of increasing numbers partitioned the position up to which they were consumed when the checkpoint was taken. • Event logs like Apache Kafka can provide records from a previous offset of the stream.

0 credits | 81 pages | 13.18 MB | 1 year ago
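The last two points of the snippet above (a checkpoint records the source position consumed so far, and a log such as Apache Kafka can serve records again from an earlier offset) can be illustrated with a small sketch. This is not Flink's asynchronous checkpointing protocol; `ReplayableLog`, `run_sum`, `fail_at`, and `checkpoint_every` are hypothetical names invented for this illustration, and the log is an in-memory list standing in for a real event log.

```python
class ReplayableLog:
    """Hypothetical stand-in for an event log that can replay from an offset."""

    def __init__(self, records):
        self.records = list(records)

    def read_from(self, offset):
        # Return every record at or after the given offset.
        return self.records[offset:]


def run_sum(log, checkpoint=None, fail_at=None, checkpoint_every=2):
    """Sum the log, snapshotting (offset, state) together every few records.

    Returns (result, last_checkpoint); result is None if a crash was simulated.
    """
    offset, total = checkpoint if checkpoint else (0, 0)
    last_checkpoint = (offset, total)
    for value in log.read_from(offset):
        if fail_at is not None and offset == fail_at:
            return None, last_checkpoint  # simulated failure before this record
        total += value
        offset += 1
        if offset % checkpoint_every == 0:
            # Source position and operator state are saved as one unit.
            last_checkpoint = (offset, total)
    return total, last_checkpoint


log = ReplayableLog([1, 2, 3, 4, 5])
crashed, saved = run_sum(log, fail_at=3)       # fails after consuming 3 records
recovered, _ = run_sum(log, checkpoint=saved)  # restarts from offset 2, state 3
```

Because the offset and the running sum are restored together, the record consumed after the last checkpoint is replayed against the older state, so each record affects the final state exactly once.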
[05 Computing Platform, 蓉荣] Flink Batch Processing and Its Applications

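Flink Batch Processing and Its Applications. What is Apache Flink * Apache Flink is a distributed big-data processing engine * Performs stateful computation over bounded and unbounded data streams * Can be deployed in a variety of cluster environments * Computes quickly over data of any scale. Why Flink can do batch processing: Table Stream Bounded Data Unbounded Data Data SQL Runtime SQL High throughput, low latency. Hive vs. Spark vs. Flink Batch:
Item | Hive/Hadoop | Spark | Flink
Model | MR | MR (Memory/Disk) | Pipeline
Throughput | TB-PB | TB-PB | Not yet validated in large-scale production
Performance | Average (minutes to hours) | Fast (seconds) | Excellent (x2)
Stability | Good | Average | Validated internally at Alibaba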

0 credits | 12 pages | 1.44 MB | 1 year ago
Streaming optimizations - CS 591 K1: Data Stream Processing and Analytics Spring 2020

Profitability A A’ Spark Streaming • Treat streaming computation as a series of deterministic batch computations on small time intervals • Keep intermediate state in memory • Use Spark's RDDs instead network buffer for each receiving task that any of its tasks need to send data to. Batching in Apache Flink • The TaskManagers ship data from sending tasks to receiving tasks. • The network component computation at scale (SOSP ’13). • Fabian Hueske and Vasiliki Kalavri. Stream Processing with Apache Flink. (O’Reilly Media ’19). Lecture references Vasiliki Kalavri | Boston University 2020

0 credits | 54 pages | 2.83 MB | 1 year ago
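The Spark Streaming bullets in the snippet above (a stream treated as a series of deterministic batch computations on small time intervals, with intermediate state kept in memory) can be sketched in plain Python. This is only an illustration of the idea, not Spark's API: `micro_batches`, `run_stream`, and the `(timestamp, word)` event shape are assumptions made for this example.

```python
from collections import Counter, defaultdict


def micro_batches(events, interval):
    """Group (timestamp, word) events into fixed-size time intervals."""
    batches = defaultdict(list)
    for timestamp, word in events:
        batches[int(timestamp // interval)].append(word)
    # Emit batches in interval order so every run produces the same sequence.
    # Intervals that received no events are skipped in this sketch.
    return [batches[i] for i in sorted(batches)]


def run_stream(events, interval):
    """Run one small batch job per interval, carrying state in memory."""
    state = Counter()  # intermediate state shared across micro-batches
    snapshots = []
    for batch in micro_batches(events, interval):
        state.update(Counter(batch))   # the batch job: word count of one interval
        snapshots.append(dict(state))  # running totals after this interval
    return snapshots


events = [(0.2, "a"), (0.7, "b"), (1.1, "a"), (2.5, "a"), (2.9, "c")]
result = run_stream(events, interval=1.0)
```

Each entry of `result` holds the running word counts after one interval, so the stream's output is just the sequence of batch results.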
Stream processing fundamentals - CS 591 K1: Data Stream Processing and Analytics Spring 2020

Systems 2000 1992 2013 MapReduce 2004 Tapestry NiagaraCQ Aurora TelegraphCQ STREAM Naiad Spark Streaming Samza Flink Millwheel Storm S4 Google Dataflow Now Evolution of Stream Processing max("temp") maxTemp.print() env.execute("Compute max sensor temperature") } } Example: Apache Flink DataStream API Vasiliki Kalavri | Boston University 2020 Relational Streaming vs. Dataflow

0 credits | 45 pages | 1.22 MB | 1 year ago
How Flink Analyzes CDC Data in an Iceberg Data Lake in Real Time

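Supports concurrent [garbled]; sufficiently high throughput; [garbled] Partition/Bucket-level concurrent Merge-On-Read; [garbled] supports incremental reads, which eases further data Transform; incremental [garbled]. Apache Iceberg Basic Data Metadata Database Table Partition Spec Manifest File TableMetadata Snapshot. 3. Provide Table API interfaces for incremental CDC pulls. Iceberg core [garbled]: 1. Integrate automatic and manual merging of CDC data; 2. Give Flink the ability to pull CDC data incrementally. Flink integration: 1. Connect Spark Streaming to the CDC write path, and Presto and other engines to the query path. 3. Accelerate data queries with Alluxio. Other ecosystem integration. Thank you.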

0 credits | 36 pages | 781.69 KB | 1 year ago