Flink - IT文库 (IT Library)

Streaming in Apache Flink

School 2019 ## Apache Flink Based on https://training.ververica.com Maximilian Michels Software Engineer / Consultant Committer @ Apache Beam / Apache Flink Dr Paris Carbone Flink @stadtlegende @SenorCarbone ## Contents • DataSet API • DataStream API • Concepts • Set up an environment to develop Flink programs • Implement streaming data processing pipelines • Flink managed state • Event time ## Streaming in Apache Flink • Streams are natural • Events of any type like sensors, click streams, logs • Batch processing

45 pages | 3.00 MB | 2 years ago
Scalable Stream Processing - Spark Streaming and Flink

## Scalable Stream Processing - Spark Streaming and Flink Amir H. Payberah payberah@kth.se 05/10/2018 https://id2221kth.github.io ## Data Processing Graph Data Pregel, GraphLab, PowerGraph GraphX Spark SQL Machine Learning MLlib TensorFlow Streaming Data Storm, SEEP, Naiad, Spark Streaming, Flink, MillWheel, Google Dataflow ## Distributed File Systems ## Data Storage GFS, Flat FS NoSQL Databases Continuous vs. micro-batch processing Record-at-a-Time vs. declarative APIs ▶ Spark streaming ▶ Flink ## Spark Streaming ## ▶ Design issues • Continuous vs. micro-batch processing • Record-at-a-Time

113 pages | 1.22 MB | 2 years ago
Apache Flink: Past, Present, and Future

## Alibaba Cloud ## Apache Flink: Past, Present, and Future 杨克特（鲁尼）, Senior Technical Expert, Alibaba ## Past ## It all began in 2014 StratoSphere Above the Clouds 2009 - 2014 2014 • PhD student project at TU Berlin - a batch processing engine built on a streaming runtime • Flink 0.6.0 released in August 2014 ## Alibaba Cloud 2019 Alibaba Cloud Summit · Shanghai Developer Conference Flink 0.7 ## Released in December 2014 – official DataStream support begins DataStream API Stream Processing DataSet API Batch Processing Runtime Distributed Streaming Dataflow Flink 0.9 ## Released in June 2015 – built-in State support begins

33 pages | 3.36 MB | 2 years ago
【04 RocketMQ 王鑫】Stream Processing with Apache RocketMQ and Apache Flink

Stream Processing with Apache RocketMQ and Apache Flink 王鑫 · The Apache Software Foundation Nov. 4, 2018, Shanghai, Apache Flink China Meetup Practices of integrating RocketMQ with Flink The trend of RocketMQ ## Apache RocketMQ streaming ecosystem projects • RocketMQ-Flink: https://github.com/apache/rocketmq-externals/tree/master/rocketmq-flink • RocketMQ-Spark: https://github.com/apache/rocketmq-extern

30 pages | 24.22 MB | 2 years ago
【05 Computing Platform 蓉荣】Flink Batch Processing and Its Applications

## Flink Batch Processing and Its Applications

## What is Apache Flink

* Apache Flink is a distributed big-data processing engine
* Performs stateful computation over bounded and unbounded data streams
* Can be deployed in a variety of cluster environments
* Computes quickly over data of any scale

## Why Flink can do batch processing

Low latency

#### Hive vs. Spark vs. Flink Batch

| |Hive/Hadoop|Spark|Flink|
|---|---|---|---|
|Model|MR|MR (Memory/Disk)|Pipeline|
|Throughput|TB-PB|TB-PB|Not yet validated in large-scale production|
|Ease of use|Average|Easy to use|Average|
|Tools/ecosystem|Average|Rich|Average|

## Flink Batch applications - data lake

### Data Lake vs. Data Warehouse

12 pages | 1.44 MB | 2 years ago
How Flink Analyzes CDC Data in an Iceberg Data Lake in Real Time

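## How Flink Analyzes CDC Data in an Iceberg Data Lake in Real Time 李劲松 / 胡争, Alibaba FLINK FORWARD #ASIA 2020 #1 Common CDC analysis approaches #2 Why Flink + Iceberg #3 How to write and read in real time #4 Future plans ## #1 Common CDC analysis approaches ## Analyzing CDC with an offline HBase cluster 2. The HBase cluster is costly to maintain. 3. HFiles are located through the RegionServer, so the server's optimizations and caches go entirely unused. 4. The data format is tied to HFile and is hard to extend to Parquet, Avro, ORC, etc. ## Maintaining CDC datasets with Apache Kudu ## MySQL ## Evaluation: advantages 1. Supports real-time data updates, with good freshness. 2. Columnar storage speeds up queries, which suits OLAP analysis. 4. Does not support incremental pulls. ## MySQL → Sqoop → Hive ## Evaluation: advantages 1. The pipeline works. 2. Existing Hive data is not affected by incremental data. ## Disadvantages 1. Data is not written in real time. 2. Every data import has to MERGE the existing data; updates are T+1, so freshness is poor. 3. Does not support real-time upsert.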

36 pages | 781.69 KB | 2 years ago
Introduction to Apache Flink and Apache Kafka - CS 591 K1: Data Stream Processing and Analytics Spring

and Analytics Spring 2020 ## 1/30: Introduction to Apache Flink and Apache Kafka Vasiliki (Vasia) Kalavri vkalavri@bu.edu ## Apache Flink • An open-source, distributed data analysis framework • True Data Set Operator Data Set Sink Source Data Stream Operator Data Stream Sink Writing a Flink Program 1. Bootstrap Sources 2. Apply Operators 3. Output to Sinks ## Streaming word count textStream .keyBy(0) .sum(1) .print() (live,1) (and,1) (let,1) (live,2) ## Distributed architecture TaskManager Flink program web dashboard TaskManager client JobManager TaskManager ## DataStream API Basics ##
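
The streaming word count in the excerpt above emits an updated count each time a word arrives, which is why `live` appears twice in the sample output. The following is a minimal sketch of that behaviour in plain Python. It does not use the Flink API; the function name `running_word_count` and the sample input line are assumptions made for illustration, and the three steps from the excerpt (source, operators, sink) are marked in comments.

```python
from collections import defaultdict
from typing import Iterable, Iterator, Tuple


def running_word_count(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Yield (word, running_count) for every word, one record at a time."""
    # Per-key state: one counter per word, loosely mirroring keyBy(0).sum(1).
    counts = defaultdict(int)
    for line in lines:                      # 1. source: a stream of text lines
        for word in line.lower().split():   # 2. operator: split into words
            counts[word] += 1               #    keyed running sum
            yield (word, counts[word])      # 3. hand the updated record to the sink


if __name__ == "__main__":
    # Sink: print each updated record as it is produced.
    for record in running_word_count(["live and let live"]):
        print(record)
```

Unlike a batch word count, no final total is awaited: each input record immediately produces an output record reflecting the state so far.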

26 pages | 3.33 MB | 2 years ago
Exactly-once fault-tolerance in Apache Flink - CS 591 K1: Data Stream Processing and Analytics Spring

Data Stream Processing and Analytics Spring 2020 ## 3/24: Exactly-once fault-tolerance in Apache Flink Vasiliki (Vasia) Kalavri vkalavri@bu.edu Go read his PhD thesis: http://kth.diva-portal.org/sm ## Asynchronous checkpoints in Apache Flink • A source of increasing consistency (in Apache Flink) can be achieved only if all streaming sources are re-settable - Flink checkpoints are initiated

81 pages | 13.18 MB | 2 years ago
Apache Kyuubi 1.6.0 Documentation

Side Extensions ## Connectors • Connectors Connectors for Spark SQL Query Engine Connectors For Flink SQL Query Engine Connectors for Hive SQL Query Engine Connectors For Trino SQL Engine ## Kyuubi release is delivered without a Spark tarball.| |Flink|Distributed SQL Engine|Optional|1.14.0 and above|By default Kyuubi binary release is delivered without a Flink tarball.| |Trino|Distributed SQL Engine|Optional|363 with other Spark/Flink/Trino compatible systems or plugins, you only need to take care of them as using them with regular Spark/Flink/Trino applications. For example, you can run Spark/Flink/Trino SQL engines

391 pages | 5.41 MB | 2 years ago
PyFlink 1.15 Documentation

Guide; 1.2.1 RealTime Feature; 1.2.1.1 Coming Soon; 1.2.2 PyFlink + Flink ML; 1.2.2.1 Coming Soon; 1.3 Frequently Asked Questions (FAQ); 1.3.1 apache.flink.table.factories.DynamicTableFactory’ in the classpath; 1.3.4.2 Q2: ClassNotFoundException: com.mysql.cj.jdbc.Driver; 1.3.4.3 Q3: NoSuchMethodError: org.apache.flink.table OverflowError: timeout value is too large; 1.3.5.2 Q2: An error occurred while calling z:org.apache.flink.client.python.PythonEnvUtils.resetCallbackClient; 1.3.6 Data type issues; 1.3

36 pages | 266.77 KB | 2 years ago
