Scalable Stream Processing - Spark Streaming and Flink
Amir H. Payberah, payberah@kth.se, 05/10/2018, https://id2221kth.github.io. Data processing frameworks by data type: graph data (Pregel, GraphLab, PowerGraph, GraphX, Chaos); batch data (MapReduce, Dryad, FlumeJava, Spark); structured data (Spark SQL); machine learning (MLlib, TensorFlow); streaming data (Storm, SEEP, Naiad, Spark Streaming, Flink, MillWheel, Google Dataflow). Outline: issues (continuous vs. micro-batch processing; record-at-a-time vs. declarative APIs); Spark Streaming; Flink. Spark Streaming design issues: continuous vs. micro-batch processing; record-at-a-time…
113 pages | 1.22 MB | 2 years ago
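Introduction to Spark and Comparison with Hadoop
1 Introduction to Spark. 1.1 Spark Overview. Spark is a general-purpose parallel computing framework in the style of Hadoop MapReduce, open-sourced by the UC Berkeley AMP Lab. Spark implements distributed computing based on the MapReduce model and shares the advantages of Hadoop MapReduce; unlike MapReduce, however, a job's intermediate output and results can be kept in memory, so they no longer need to be written to and read back from HDFS. Spark is therefore better suited to algorithms that need iterative MapReduce passes, such as data mining and machine learning. 1.2 Spark Core Concepts. 1.2.1 Resilient Distributed Dataset (RDD). The RDD is Spark's most fundamental abstraction: an abstraction over distributed memory that lets you operate on a distributed dataset the same way you would operate on a local collection. The RDD is the core of Spark; it represents a partitioned, immutable collection of data that can be operated on in parallel. RDD operations are not executed immediately. When Spark encounters a transformation, it only records that the operation is needed and does not execute it; computation starts only when an action is invoked. … 2. Actions (e.g., count, collect, save): an action returns a result or writes RDD data to a storage system. Actions are what trigger Spark to start computing.
3 pages | 172.14 KB | 2 years ago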
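The entry above describes RDD laziness: transformations are only recorded, and an action triggers the actual computation. The following is a minimal pure-Python sketch of that idea, assuming a single machine with no partitioning, distribution, or fault tolerance. It is a toy model, not the PySpark API: `LazyDataset` and its `map`, `filter`, `collect`, and `count` methods are hypothetical names chosen to mirror RDD operations.

```python
from typing import Any, Callable, List, Tuple


class LazyDataset:
    """Toy stand-in for an RDD (hypothetical, not the PySpark API).

    Transformations only append to a recorded lineage; actions replay it.
    """

    def __init__(self, source: List[Any],
                 ops: Tuple[Tuple[str, Callable], ...] = ()):
        self._source = source  # input data, never modified
        self._ops = ops        # recorded lineage of transformations

    # Transformations: return a new dataset and compute nothing.
    def map(self, fn: Callable[[Any], Any]) -> "LazyDataset":
        return LazyDataset(self._source, self._ops + (("map", fn),))

    def filter(self, pred: Callable[[Any], bool]) -> "LazyDataset":
        return LazyDataset(self._source, self._ops + (("filter", pred),))

    # Actions: replay the recorded lineage and return a result.
    def collect(self) -> List[Any]:
        data = list(self._source)
        for kind, fn in self._ops:
            if kind == "map":
                data = [fn(x) for x in data]
            else:  # "filter"
                data = [x for x in data if fn(x)]
        return data

    def count(self) -> int:
        return len(self.collect())


calls = []


def square(x):
    calls.append(x)  # log every time the function actually runs
    return x * x


ds = LazyDataset([1, 2, 3, 4]).map(square).filter(lambda x: x % 2 == 0)
print(len(calls))    # 0: only transformations so far, nothing has run
print(ds.collect())  # [4, 16]: the action triggers the computation
print(len(calls))    # 4
```

Because this sketch keeps no cache, every action replays the whole lineage; in Spark, an RDD is likewise recomputed for each action unless it is persisted with `cache()` or `persist()`.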
Integrating MATLAB with Spark/Hadoop: Processing Big Data and Mining Its Value
Ma Wenhui (马文辉). Contents: big data and its challenges; big data processing in MATLAB (tall arrays, parallel and distributed computing); MATLAB integration with Spark/Hadoop (accessing HDFS, the Hadoop Distributed File System, from MATLAB; running MATLAB code on a Spark/Hadoop cluster); application demo: automotive sensor data analysis. Big data overview. The "4V" characteristics of big data: Volume, the scale of data is enormous (with the spread of the internet and social networks and the digital transformation of society as a whole, data volumes are moving toward the petabyte scale); Variety, data comes in many types (structured, semi-structured, and unstructured data). … MATLAB big data features by release: SPMD and distributed arrays; MapReduce (R2014b); MapReduce on MDCS/PCT (R2014b); MATLAB API for Spark (R2016b); tall arrays (R2016b).
…above figure, with each layer loosely coupled to the other. For example, you can use Kyuubi, Spark and Apache Iceberg to build and manage a data lake with pure SQL for both data processing (e.g., ETL) and analytics … create this project even though the Spark Thrift JDBC/ODBC server already exists. 1. Supports multi-client concurrency and authentication. 2. Supports one Spark application per account (SPA). 3. Supports…
129 pages | 6.15 MB | 2 years ago
Apache OFBiz®
The Apache OFBiz Project, Version Trunk. Table of Contents: 1. System requirements; 2. Quick start; 2.1. Download the Gradle wrapper; 2.2. Prepare … file in AsciiDoc format, you may want to view it in HTML or PDF format. Welcome to Apache OFBiz! A powerful top-level Apache software project. OFBiz is an Enterprise Resource Planning (ERP) system written … reading section. 3. Security: If you find a security issue, please report it to security@ofbiz.apache.org. Once proper mitigations to the security issues are complete, the OFBiz team will disclose this…
23 pages | 305.80 KB | 2 years ago
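Apache Explained (解读Apache)
THE APACHE SOFTWARE FOUNDATION. Apache Explained. Speakers: Craig Russell, Justin Mclean, Jiang Ning (姜宁). This talk includes the work of Bertrand Delacretaz, Roman Shaposhnik, and other ASF contributors. About Craig Russell: • Software architect • Object Data Management Group (MySQL) • Apache Committer since 2005 • Apache Member since 2007 • Apache Secretary 2010-2019 • Member of the Apache Incubator Project Management Committee • Chairman of the Apache Board of Directors. The Apache Software Foundation: the world's largest open source foundation. The mission of the Apache Software Foundation: the Apache Software Foundation (ASF) is a registered 501(c)(3) public-benefit organization. The ASF's mission is to provide open source software to the public by providing services to the communities of like-minded software projects that join the ASF. The ASF provides a neutral space, independent of the influence of any company, that ensures its projects can thrive under the business-friendly Apache License 2.0 and create open source software for the public good. A public-benefit organization founded in 1999. Mission: to provide free software for the public good. … is a US 501(c)(3) charitable organization. Its mission is to provide Open Source…
40 pages | 6.27 MB | 2 years ago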
Apache Explained
THE APACHE SOFTWARE FOUNDATION. Apache Explained. Presented by Craig Russell, Justin Mclean, Willem Jiang. Including original work of Bertrand Delacretaz, Roman Shaposhnik and other amazing ASF contributors. … (MySQL) • Apache Committer since 2005 • Apache Member since 2007 • Apache Secretary 2010-2019 • Member, Incubator Project Management Committee • Chairman, Apache Board of Directors. Apache Software Foundation: The World's Largest Open Source Foundation. The ASF's Mission: The Apache Software Foundation (ASF) is a US 501(c)(3) charitable organization. Its mission is to provide Open…
43 pages | 4.50 MB | 2 years ago
Apache Kyuubi 1.5.2 Documentation
…unified multi-tenant JDBC interface for large-scale data processing and analytics, built on top of Apache Spark™. In general, the complete ecosystem of Kyuubi falls into the hierarchies shown in the above figure, with each layer loosely coupled to the other. For example, you can use Kyuubi, Spark and Apache Iceberg to build and manage a data lake with pure SQL for both data processing (e.g., ETL) and analytics … create this project even though the Spark Thrift JDBC/ODBC server already exists. 1. Supports multi-client concurrency and authentication. 2. Supports one Spark application per account (SPA). 3. Supports…
172 pages | 6.94 MB | 2 years ago