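VMware Senior Solutions Architect
## VMware® Explore ## VMware Data Solution Introduction 2022 Wang Xiaoqing (王晓庆), VMware Senior Solutions Architect ## Disclaimer This presentation may contain product features or functionality that are currently under development. This overview of new technology does not represent a commitment by VMware to deliver these features in any generally available product. Features are subject to change and must not be included in contracts, purchase orders, or sales agreements of any kind. ©2022 VMware, Inc. ## Data Transformation How enterprises position their future architecture ## DevSecOps Transformation - For development teams: continuously deliver code faster and more frequently into production platforms and practices, iterating and updating rapidly ## Application Transformation ## Cloud-native architecture - Supports innovation, scalability, elasticity, and ecosystem at the application level ## Data & Analytics Transformation
Kalavri vkalavri@bu.edu ## Key partitioning > δ*N, where N is the number of stream elements • The solution will randomized load balancing. IEEE TPDS 2001. • Manku, G.S., Motwani, R. Approximate frequency counts over data streams. VLDB 2002.
31 pages | 1.47 MB | 2 years ago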
State management - CS 591 K1: Data Stream Processing and Analytics Spring 2020
# CS 591 K1: Data Stream Processing and Analytics Spring 2020 2/25: State Management Vasiliki (Vasia) Kalavri vkalavri@bu.edu ## State in dataflow computations Any non-trivial streaming computation state types can you think of? • Count, sum, list, map, ... ## State management in Apache Flink All data maintained by a task and used to compute results: a local or instance variable that is accessed by com/blog/manage-rocksdb-memory-size-apache-flink ## RocksDB - RocksDB is a persistent key-value store: data lives on disk, state can grow larger than available memory and will not be lost upon failure. - Keys
24 pages | 914.13 KB | 2 years ago
Streaming optimizations - CS 591 K1: Data Stream Processing and Analytics Spring 2020
## CS 591 K1: Data Stream Processing and Analytics Spring 2020 ## 4/14: Stream processing optimizations Vasiliki (Vasia) Kalavri vkalavri@bu.edu ## Topics covered in this lecture • Costs of streaming ## Dataflow graph • operators are nodes, data channels are edges • channels have FIFO semantics • streams of data elements flow continuously along edges ## Operators • receive selectivity always known at development time? ## Types of Parallelism Pipeline: A || B Task: B || C Data: A || A
|02/12|||Assignment #1 due|
|02/13|Assignment #1 discussion and feedback Handling out-of-order and late data||Assignment #2 available|
|02/18|No class||Substitute Monday|
|02/20|Guest Lecture: Learning How to
34 pages | 2.53 MB | 2 years ago
Notions of time and progress - CS 591 K1: Data Stream Processing and Analytics Spring 2020
## CS 591 K1: Data Stream Processing and Analytics Spring 2020 2/06: Notions of time and progress Vasiliki (Vasia) Kalavri vkalavri@bu.edu ## Mobile game application • input stream: user activity captures the progress of the stage itself • minimum of input watermarks and event-times of non-late data ## Event-time update
22 pages | 2.22 MB | 2 years ago
Stream processing fundamentals - CS 591 K1: Data Stream Processing and Analytics Spring 2020
CS 591 K1: Data Stream Processing and Analytics Spring 2020 ## 1/23: Stream Processing Fundamentals Vasiliki (Vasia) Kalavri vkalavri@bu.edu ## What is a stream? - In traditional data processing applications database. A data stream is a data set that is produced incrementally over time, rather than being available in full before its processing begins. • Data streams are high-volume, real-time data that might accessible way • we have to process stream elements on-the-fly using limited memory ## Properties of data streams • They arrive continuously instead of being available a priori. • They bear an arrival and/or
45 pages | 1.22 MB | 2 years ago
Cardinality and frequency estimation - CS 591 K1: Data Stream Processing and Analytics Spring 2020
## CS 591 K1: Data Stream Processing and Analytics Spring 2020 ## 4/23: Cardinality and frequency estimation Vasiliki (Vasia) Kalavri vkalavri@bu.edu ## Counting distinct elements ## How can we count very large data streams with high-frequency elements ## The Count-Min Sketch - A space-efficient probabilistic data structure that can be used to estimate frequencies and heavy hitters in data streams 10^{6}$. The recommended number of counters is $m = \frac{2.71828}{10^{-6}} = 2{,}718{,}280$. The sketch data structure requires a counter array of size 5 * 2,718,280. ## Space requirements For a standard error
69 pages | 630.01 KB | 2 years ago
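The Count-Min Sketch sizing quoted in the snippet above can be illustrated with a minimal Python sketch. It assumes the commonly used sizing of `ceil(e / epsilon)` counters per row and `ceil(ln(1 / delta))` rows. The class name, method names, parameter names, and the salted BLAKE2b hashing are illustrative assumptions and are not taken from the lecture slides.

```python
import hashlib
import math


class CountMinSketch:
    """Approximate frequency counter; estimates never undercount."""

    def __init__(self, epsilon, delta):
        # Assumed sizing: width = ceil(e / epsilon), depth = ceil(ln(1 / delta)).
        self.width = math.ceil(math.e / epsilon)
        self.depth = math.ceil(math.log(1.0 / delta))
        # One row of counters per hash function.
        self.table = [[0] * self.width for _ in range(self.depth)]

    def _index(self, item, row):
        # Derive one hash function per row by salting BLAKE2b with the row number.
        digest = hashlib.blake2b(
            str(item).encode("utf-8"),
            digest_size=8,
            salt=row.to_bytes(8, "little"),
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, item, count=1):
        # Increment one counter in every row.
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item):
        # Collisions can only inflate counters, so the minimum is the tightest bound.
        return min(
            self.table[row][self._index(item, row)] for row in range(self.depth)
        )


sketch = CountMinSketch(epsilon=0.01, delta=0.01)
for element in ["a", "b", "a", "c", "a"]:
    sketch.add(element)
print(sketch.width, sketch.depth, sketch.estimate("a"))
```

With `epsilon = 10^-6`, the same sizing rule gives roughly 2.7 million counters per row, which is in line with the figure in the snippet; the small `epsilon` used here only keeps the example light.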