Scalable Stream Processing - Spark Streaming and Flink
Amir H. Payberah, payberah@kth.se, 05/10/2018. The Course Web Page: https://id2221kth.github.io. Where Are We? Stream Processing Systems. Outline ▶ Spark streaming ▶ Flink. Spark Streaming. Contribution ▶ Design issues • Continuous vs. micro-batch processing • Record-at-a-Time vs. declarative APIs. Spark Streaming ... RDDs and processes them using RDD operations. • Discretized Stream Processing (DStream). Spark Streaming ▶ Run a streaming computation as a series of very small, deterministic batch jobs. • Chops
0 credits | 113 pages | 1.22 MB | 1 year ago
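Introduction to Spark and Comparison with Hadoop
1 Introduction to Spark. 1.1 Spark Overview. Spark is a general-purpose parallel computing framework, similar to Hadoop MapReduce, open-sourced by the UC Berkeley AMP Lab. Spark implements distributed computing based on the map reduce algorithm and has the advantages of Hadoop MapReduce. Unlike MapReduce, however, a job's intermediate output and results can be kept in memory, so there is no need to read and write HDFS. Spark is therefore better suited to map reduce algorithms that require iteration, such as those in data mining and machine learning. 1.2 Spark Core Concepts. 1.2.1 Resilient Distributed Dataset (RDD). The RDD is Spark's most basic abstraction: an abstraction over distributed memory that lets distributed datasets be operated on in the same way as local collections. The RDD is the core of Spark; it represents a partitioned, immutable ... operations are not executed immediately. When Spark encounters a Transformations operation it only records that the operation is needed and does not execute it; the computation actually starts only when an Actions operation occurs. 2. Actions (e.g. count, collect, save): an Actions operation returns a result or writes the RDD data to a storage system. Actions are what trigger Spark to start computing.
0 credits | 3 pages | 172.14 KB | 1 year ago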
Integrating MATLAB with Spark/Hadoop: Processing Big Data and Mining Its Value
MathWorks, Inc. Ma Wenhui (马文辉). Contents ▪ Big data and the challenges it brings ▪ Big data processing in MATLAB ➢ tall arrays ➢ Parallel and distributed computing ▪ Integrating MATLAB with Spark/Hadoop ➢ Accessing HDFS (Hadoop Distributed File System) from MATLAB ➢ Running MATLAB code on a Spark/Hadoop cluster ▪ MapReduce (MDCS/PCT) ▪ MATLAB API for Spark API ▪ Tall Arrays ▪ Computation ▪ Desktop (Multicore, GPU) ▪ Clusters ▪ Cloud Computing (MDCS on EC2) ▪ Hadoop ▪ Spark ▪ Memory and data access ▪ 64-bit processors ▪ Memory ... Parallel Computing Toolbox) ▪ Distributed computing on MATLAB clusters (MDCS, MATLAB Distributed Computing Server). Integrating MATLAB with Spark/Hadoop: MDCS. Hadoop: Hadoop is a distributed big data processing platform that spans computer clusters and consists of two parts: • YARN (Yet Another Resource Negotiator)
0 credits | 17 pages | 1.64 MB | 1 year ago
GSoC 2020 Apache Proposal: Apache RocketMQ Scaler for KEDA
Application. Name: Hien Nguyen. University: Haaga-Helia University of Applied Sciences - Bachelor of Information Technology - (Location: ... Test, DevOps, Distributed system, Cloud (AWS, Azure), Golang, Maven, Docker, Kubernetes. GSoC - Apache RocketMQ Scaler for KEDA proposal. Context: KEDA allows for fine-grained autoscaling (including ... MySQL, RocketMQ, etc.; multiple workload types (jobs, deployments, trigger) - KEDA does not support Apache RocketMQ now, so we need to create a PR in the KEDA repo to add support for RocketMQ - KEDA has event-driven
0 credits | 7 pages | 140.48 KB | 1 year ago
Apache APISIX Roadmap
Yuansheng Wang (王院生), co-founder & CTO of API7.ai, Member of Apache APISIX PMC. Content: 01 About me. 02 What we did in APISIX V2. 03 What we will do in APISIX V3. 04 Enjoy APISIX way. 01 About me • Yuansheng Wang • Apache APISIX PMC member • 《OpenResty Best Practices》 • API7.ai co-founder & CTO. 02 What we did in APISIX V2 • Rich plugins • 70+
0 credits | 26 pages | 2.68 MB | 1 year ago
Streaming in Apache Flink
EIT Summer School 2019. Apache Flink. Based on https://training.ververica.com. Maximilian Michels, Software Engineer / Consultant, Committer @ Apache Beam / Apache Flink, @stadtlegende. Dr Paris Carbone, Senior Researcher @ RISE, Committer @ Apache Flink, @SenorCarbone. Contents • DataSet API • DataStream API • Concepts • Set up an environment to develop ... Implement streaming data processing pipelines • Flink managed state • Event time. Streaming in Apache Flink • Streams are natural • Events of any type like sensors, click streams, logs • Batch processing
0 credits | 45 pages | 3.00 MB | 1 year ago
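Introduction to Apache RocketMQ
Author: boccaro. Original link: https://ld246.com/article/1588041859812. Source site: 链滴. License: Attribution-ShareAlike 4.0 International (CC BY-SA 4.0). Overview: Apache RocketMQ is a distributed messaging and streaming platform with low latency, high performance and reliability, trillion-level capacity, and flexible scalability. One of its important features is support for reliable delivery of non-log-type messages, which makes it well suited to finance and e-commerce. It is currently a top-level project of the Apache community, and more than 100 companies worldwide use the open-source version of RocketMQ in their business. Origin: RocketMQ originated at Alibaba, which initially needed message middleware to meet its business needs and in the early days used Notify, ActiveMQ, and others. As demand kept growing, however, the ActiveMQ IO module hit a bottleneck, and despite repeated efforts the improvement was not ... 2 trillion concurrent online message transmissions; Alibaba later donated RocketMQ to the Apache Incubator. September 25, 2017: The Apache Software Foundation, together with all the volunteers, developers, stewards, and incubating projects of more than 350 open-source projects, announced that Apache® RocketMQ™ had graduated from the Apache Incubator to become a top-level project, indicating that the project's community and products have been well governed under the ASF's meritocratic process and principles. Today, thanks to the efforts of all parts of the community, Apache RocketMQ is thriving, and many of its features have been strengthened.
0 credits | 5 pages | 375.48 KB | 1 year ago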
【04 RocketMQ, Wang Xin (王鑫)】Stream Processing with Apache RocketMQ and Apache Flink
0 credits | 30 pages | 24.22 MB | 1 year ago
Apache Kyuubi 1.7.0 Documentation
Welcome. Apache Kyuubi™ is a distributed and multi-tenant gateway to provide serverless SQL on Data Warehouses and Lakehouses. Kyuubi builds distributed SQL query engines on top of various kinds of modern computing frameworks, e.g., Apache Spark [https://spark.apache.org/], Flink [https://flink.apache.org/], Doris [https://doris.apache.org/], Hive [https://hive.apache.org/], and Trino [https://trino ... components above to build a modern data stack. For example, you can use Kyuubi, Spark and Iceberg [https://iceberg.apache.org/] to build and manage Data Lakehouse with pure SQL for both data processing
0 credits | 400 pages | 5.25 MB | 1 year ago
Apache Kyuubi 1.7.2 Documentation
Welcome. Apache Kyuubi™ is a distributed and multi-tenant gateway to provide serverless SQL on Data Warehouses and Lakehouses. Kyuubi builds distributed SQL query engines on top of various kinds of modern computing frameworks, e.g., Apache Spark [https://spark.apache.org/], Flink [https://flink.apache.org/], Doris [https://doris.apache.org/], Hive [https://hive.apache.org/], and Trino [https://trino ... components above to build a modern data stack. For example, you can use Kyuubi, Spark and Iceberg [https://iceberg.apache.org/] to build and manage Data Lakehouse with pure SQL for both data processing
0 credits | 405 pages | 5.26 MB | 1 year ago