Large Language Models (LLMs) - IT文库_程序员IT互联网编程电子书和文档免费下载，助您码力十足！

首页文库资料文章资讯上传文档发布文章登录账户

Stream processing fundamentals - CS 591 K1: Data Stream Processing and Analytics Spring 2020

data access, high-rate append-only updates Data Warehouse • complex, offline analysis • large and relatively static and historical data • batched updates during downtimes, e.g. every night Boston University 2020 1. Process events online without storing them 2. Support a high-level language (e.g. StreamSQL) 3. Handle missing, out-of-order, delayed data 4. Guarantee deterministic (on analytics … Building a stream processor… 8 ? Vasiliki Kalavri | Boston University 2020 Basic Stream Models Vasiliki Kalavri | Boston University 2020 A stream can be viewed as a massive, dynamic, one-dimensional

0 码力 | 45 页 | 1.22 MB | 1 年前
3
Streaming languages and operator semantics - CS 591 K1: Data Stream Processing and Analytics Spring 2020

type, content, timing constraints. • Actions define how to produce results from the matches. Language Types 3 Vasiliki Kalavri | Boston University 2020 Three classes of operators: • relation-to-relation: portions of a stream. • relation-to-stream: create streams through querying tables Declarative language: CQL 4 Vasiliki Kalavri | Boston University 2020 Select IStream(*) From S1 [Rows 5], S2 [Rows τ> whenever tuple s is in R at time τ. 6 Vasiliki Kalavri | Boston University 2020 Imperative language: Aurora SQuAl Queries are represented in graphical representation using boxes and arrows Tumble

0 码力 | 53 页 | 532.37 KB | 1 年前
3
Elasticity and state migration: Part I - CS 591 K1: Data Stream Processing and Analytics Spring 2020

Queuing theory models: for latency objectives • Control theory models: e.g., PID controller • Rule-based models, e.g. if CPU utilization > 70% => scale out • Analytical dataflow-based models Action Predictive: at-once for all operators 8 ??? Vasiliki Kalavri | Boston University 2020 Queuing theory models 9 • Metrics • service time and waiting time per tuple and per task • total time spent processing predictive, at-once for all operators ??? Vasiliki Kalavri | Boston University 2020 Queuing theory models 9 • Metrics • service time and waiting time per tuple and per task • total time spent processing

0 码力 | 93 页 | 2.42 MB | 1 年前
3
State management - CS 591 K1: Data Stream Processing and Analytics Spring 2020

maintains state: • rolling aggregations • window contents • input offsets • machine learning models State in dataflow computations 2 Vasiliki Kalavri | Boston University 2020 • No explicit state as regular objects on TaskManager’s heap • Low read/write latencies • OutOfMemoryError if large grows too large, GC pauses • Checkpoints sent to JobManager's heap memory, i.e. the state is lost in case to a remote file system and supports incremental checkpoints • Use for applications with very large state Which backend to choose? 9 Vasiliki Kalavri | Boston University 2020 RocksDB 10 RocksDB

0 码力 | 24 页 | 914.13 KB | 1 年前
3
Course introduction - CS 591 K1: Data Stream Processing and Analytics Spring 2020

during office hours. Vasiliki Kalavri | Boston University 2020 Dataset A subset of traces from a large (12.5k machines) Google cluster • https://github.com/google/cluster-data/blob/master/ ClusterData2011_2 challenging? 28 Vasiliki Kalavri | Boston University 2020 Using pseudocode (or the programming language of your choice), write a program that reads a stream of integers and computes: 29 1. the maximum

0 码力 | 34 页 | 2.53 MB | 1 年前
3
High-availability, recovery semantics, and guarantees - CS 591 K1: Data Stream Processing and Analytics Spring 2020

maintains state: • rolling aggregations • window contents • input offsets • machine learning models State in dataflow computations 3 Vasiliki Kalavri | Boston University 2020 Logic State maintains state: • rolling aggregations • window contents • input offsets • machine learning models State in dataflow computations 3 Vasiliki Kalavri | Boston University 2020 Logic State maintains state: • rolling aggregations • window contents • input offsets • machine learning models State in dataflow computations 3 Vasiliki Kalavri | Boston University 2020 4 Distributed streaming

0 码力 | 49 页 | 2.08 MB | 1 年前
3
Filtering and sampling streams - CS 591 K1: Data Stream Processing and Analytics Spring 2020

Kalavri | Boston University 2020 A simple and efficient synopsis Suppose that our data consists of a large numeric time series. What summary would let us compute the statistical variance of this series Kalavri | Boston University 2020 A simple and efficient synopsis Suppose that our data consists of a large numeric time series. What summary would let us compute the statistical variance of this series Kalavri | Boston University 2020 A simple and efficient synopsis Suppose that our data consists of a large numeric time series. What summary would let us compute the statistical variance of this series

0 码力 | 74 页 | 1.06 MB | 1 年前
3
Cardinality and frequency estimation - CS 591 K1: Data Stream Processing and Analytics Spring 2020

Boston University 2020 14 Combining estimates • Average won’t work: The expected value of 2R is too large. • Median won’t work: it is always a power of 2, thus, if the correct estimate is between two University 2020 18 Detect DNS DDoS attacks • Flooding the resources of the targeted system by sending a large number of query from a botnet • Group queries by their top-level domain and investigate most popular functions increases the collision probability • Counter overestimation is almost certain for very large data streams with high-frequency elements Counting Bloom Filter ??? Vasiliki Kalavri | Boston

0 码力 | 69 页 | 630.01 KB | 1 年前
3
PyFlink 1.15 Documentation

. . . . . . . . . . . . . . . . . . . . . . 30 1.3.5.1 Q1: OverflowError: timeout value is too large . . . . . . . . . . . . . . . . . . . . 30 1.3.5.2 Q2: An error occurred while calling z:org.apache you to build scalable batch and streaming workloads, such as real-time data processing pipelines, large-scale exploratory data analysis, Machine Learning (ML) pipelines and ETL processes. If you’re already java: ˓→147) ... 39 more 1.3.5 Runtime issues 1.3.5.1 Q1: OverflowError: timeout value is too large File "D:\Anaconda3\envs\py37\lib\threading.py", line 926, in _bootstrap_inner self.run() File

0 码力 | 36 页 | 266.77 KB | 1 年前
3
PyFlink 1.16 Documentation

. . . . . . . . . . . . . . . . . . . . . . 30 1.3.5.1 Q1: OverflowError: timeout value is too large . . . . . . . . . . . . . . . . . . . . 30 1.3.5.2 Q2: An error occurred while calling z:org.apache you to build scalable batch and streaming workloads, such as real-time data processing pipelines, large-scale exploratory data analysis, Machine Learning (ML) pipelines and ETL processes. If you’re already java: ˓→147) ... 39 more 1.3.5 Runtime issues 1.3.5.1 Q1: OverflowError: timeout value is too large File "D:\Anaconda3\envs\py37\lib\threading.py", line 926, in _bootstrap_inner self.run() File

0 码力 | 36 页 | 266.80 KB | 1 年前
3

共 18 条前往

页

分类

语言

格式

Stream processing fundamentals - CS 591 K1: Data Stream Processing and Analytics Spring 2020

Streaming languages and operator semantics - CS 591 K1: Data Stream Processing and Analytics Spring 2020

Elasticity and state migration: Part I - CS 591 K1: Data Stream Processing and Analytics Spring 2020

State management - CS 591 K1: Data Stream Processing and Analytics Spring 2020

Course introduction - CS 591 K1: Data Stream Processing and Analytics Spring 2020

High-availability, recovery semantics, and guarantees - CS 591 K1: Data Stream Processing and Analytics Spring 2020

Filtering and sampling streams - CS 591 K1: Data Stream Processing and Analytics Spring 2020

Cardinality and frequency estimation - CS 591 K1: Data Stream Processing and Analytics Spring 2020

PyFlink 1.15 Documentation

PyFlink 1.16 Documentation