Stream processing fundamentals — CS 591 K1: Data Stream Processing and Analytics, Spring 2020 (45 pages, 1.22 MB)
Traditional data warehouse: historical data; batched updates during downtimes, e.g. every night. Streaming Data Warehouse (SDW): low-latency materialized view updates over pre-aggregated, pre-processed streams and historical data.
Comparison (traditional vs. streaming):
• Data: append-only
• Update rates: relatively low vs. high, bursty
• Processing model: query-driven / pull-based vs. data-driven / push-based
• Queries: ad-hoc vs. continuous
• Latency: relatively high vs. low
Vasiliki Kalavri | Boston University 2020
Traditional DW vs. SDW:
• Update frequency: low vs. high
• Update propagation: synchronized vs. asynchronous
• Data: historical vs. recent and historical
• ETL process: complex vs. fast and …
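The SDW's low-latency materialized view maintenance can be contrasted with a traditional nightly batch rebuild. Below is a minimal Python sketch, assuming a view that keeps a running sum per key; the class and method names are illustrative, not part of any real warehouse API.

```python
from collections import defaultdict

class MaterializedView:
    """Pre-aggregated view kept fresh per event (SDW-style),
    instead of being recomputed in a nightly batch (traditional DW)."""

    def __init__(self):
        # running SUM(amount) GROUP BY key
        self.totals = defaultdict(float)

    def on_event(self, key, amount):
        # push-based, data-driven: update the view as soon as data arrives
        self.totals[key] += amount

    def query(self, key):
        # pull-based reads see low-latency, already-aggregated results
        return self.totals[key]

view = MaterializedView()
view.on_event("eu", 10.0)
view.on_event("eu", 20.0)
view.on_event("us", 5.0)
```

A query such as `view.query("eu")` returns the up-to-date aggregate immediately, without waiting for a batch window.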
Streaming optimizations — CS 591 K1: Data Stream Processing and Analytics, Spring 2020 (54 pages, 2.83 MB)
Objectives:
• optimize resource utilization or minimize resources
• decrease latency, increase throughput
• minimize monetary costs (if running in the cloud)
Challenges in streaming optimization: re-optimizing a running query might be impractical because of
• state accumulation and re-partitioning
• high-availability and low-latency requirements
• scheduling overhead
Batching: process multiple data elements in a single batch, subject to resource constraints or QoS latency constraints. Batching trades latency for throughput; it improves …
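The batching optimization described above can be sketched in a few lines of Python; `batch_size` is an assumed tuning knob, and the generator shape is one possible design, not a specific system's API.

```python
def batched(stream, batch_size):
    """Group elements into fixed-size batches: amortizes per-element
    overhead (higher throughput) at the cost of waiting for a batch
    to fill (higher latency for the first element of each batch)."""
    batch = []
    for item in stream:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # flush the incomplete final batch so no element is lost
        yield batch
```

Real systems typically add a timeout so a partially filled batch is flushed after a bounded wait, capping the latency the batching introduces.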
Notions of time and progress — CS 591 K1: Data Stream Processing and Analytics, Spring 2020 (22 pages, 2.22 MB)
Watermarks provide a configurable trade-off between result confidence and latency:
• Eager watermarks ensure low latency but provide lower confidence; late events might arrive after the watermark.
• Slow watermarks increase confidence but might lead to higher processing latency.
Periodic watermarks: periodically ask the user-defined function …
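A common way to realize this trade-off is a periodic watermark generator with a bounded out-of-orderness assumption. The sketch below is a conceptual Python version (the parameter name `max_out_of_orderness` and the class are illustrative, not a Flink API): a small bound gives eager watermarks, a large bound gives slow ones.

```python
class BoundedOutOfOrdernessWatermarks:
    """Periodic watermark generator: the watermark is the maximum
    event timestamp seen so far minus a fixed lateness bound."""

    def __init__(self, max_out_of_orderness):
        self.bound = max_out_of_orderness
        self.max_ts = float("-inf")

    def on_event(self, timestamp):
        # track stream progress as events are observed
        self.max_ts = max(self.max_ts, timestamp)

    def current_watermark(self):
        # called periodically by the runtime; events with a timestamp
        # at or below this value are considered "probably complete"
        return self.max_ts - self.bound
```

With `bound = 5`, an event stream reaching timestamp 12 yields watermark 7: any event with timestamp ≤ 7 that arrives later is a late event.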
Introduction to Apache Flink and Apache Kafka — CS 591 K1: Data Stream Processing and Analytics, Spring 2020 (26 pages, 3.33 MB)
Sources and sinks: event logs (Kafka, RabbitMQ, …), storage (HDFS, JDBC, …). Workloads: ETL, graphs, machine learning, relational queries; low latency, windowing, aggregations, …
Basic API concepts …
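The basic dataflow API concept — source, transformation, key-based partitioning, aggregation — can be illustrated with a plain Python analogue of the canonical word count. This is a conceptual sketch only; Flink expresses the same pipeline as `flatMap` → `keyBy` → `sum` over an unbounded stream.

```python
def word_count(lines):
    """Conceptual word count: each stage mirrors a dataflow operator."""
    counts = {}
    for line in lines:                            # source
        for word in line.split():                 # flatMap: line -> words
            key = word.lower()                    # keyBy: partition by word
            counts[key] = counts.get(key, 0) + 1  # rolling sum per key
    return counts
```

In a real streaming job the `counts` state would be keyed, fault-tolerant operator state, and results would be emitted continuously rather than returned once.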
Monitoring Apache Flink applications (getting started) — (23 pages, 148.62 KB)
4.12 Monitoring Latency
Alert conditions: records-lag-max > threshold; millisBehindLatest > threshold.
Generally speaking, latency is the delay between the creation of an event and the time at which results based on this event become visible. … which then writes the results to a database or calls a downstream system. In such a pipeline, latency can be introduced at each stage and for various reasons, including the following: 1. It might take …
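A consumer-lag alert of the kind described above can be sketched as a simple comparison between the newest produced timestamp and the last consumed one. The function below is illustrative, assuming timestamps in milliseconds; it is not an actual Flink or Kinesis metrics API, though it mirrors what `millisBehindLatest` measures.

```python
def lag_alert(latest_produced_ts_ms, last_consumed_ts_ms, threshold_ms):
    """Return True when the consumer has fallen further behind the
    head of the stream than the configured threshold allows."""
    millis_behind_latest = latest_produced_ts_ms - last_consumed_ts_ms
    return millis_behind_latest > threshold_ms
```

For example, with a 5-second threshold, a consumer 6 seconds behind the log head triggers an alert while one 1 second behind does not.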
Flow control and load shedding — CS 591 K1: Data Stream Processing and Analytics, Spring 2020 (43 pages, 2.42 MB)
Load shedding trades off result accuracy for sustainable performance. It is suitable for applications with strict latency constraints that can tolerate approximate results. The alternative is to slow down the flow of data: the system buffers …
The DSMS detects overload and decides what actions to take in order to maintain acceptable latency and minimize result-quality degradation.
Load shedding decisions:
• When to shed load? Detect overload quickly to avoid latency increase; monitor input rates.
• Where in the query plan? Dropping at the sources vs. dropping …
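One standard shedding policy is probabilistic dropping: when the observed input rate exceeds capacity, drop each tuple with probability 1 − capacity/rate. The Python sketch below is a minimal illustration of that idea; the class name and `capacity` parameter are assumptions, not a specific system's interface.

```python
import random

class RandomLoadShedder:
    """Drop tuples uniformly at random, with a drop probability chosen
    so the surviving rate matches the operator's processing capacity."""

    def __init__(self, capacity, seed=None):
        self.capacity = capacity          # tuples/sec the system sustains
        self.rng = random.Random(seed)

    def keep(self, observed_rate):
        if observed_rate <= self.capacity:
            return True                   # no overload: keep everything
        p_drop = 1.0 - self.capacity / observed_rate
        return self.rng.random() >= p_drop
```

At twice the sustainable rate, roughly half the tuples are dropped, keeping latency bounded at the cost of approximate aggregates.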
Stream ingestion and pub/sub systems — CS 591 K1: Data Stream Processing and Analytics, Spring 2020 (33 pages, 700.14 KB)
• Out-of-sync sources may produce out-of-order streams.
• Sources may be connected over the network: latency and unpredictable delays.
• Sources might be producing too fast: the stream processor needs to keep up.
Delivery models: the consumer periodically polls and retrieves new data (polling overhead, latency?) or receives a notification when new data is available (how to implement triggers?).
Ordering holds per partition, as each partition is read by a single thread. What would you use when the priority is latency but not ordering? Throughput and ordering?
How long to keep the log? Log compaction: …
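Log compaction answers the "how long to keep the log?" question by retaining only the latest record per key, so a new consumer can rebuild current state without replaying every update. Below is a minimal Python sketch of the idea, assuming a log of `(key, value)` records; it is not Kafka's actual compaction implementation.

```python
def compact(log):
    """Keep only the most recent record per key, preserving the
    relative order of the surviving records (by original offset)."""
    latest = {}
    for offset, (key, value) in enumerate(log):
        latest[key] = (offset, value)   # later records overwrite earlier
    survivors = sorted(latest.items(), key=lambda item: item[1][0])
    return [(key, value) for key, (offset, value) in survivors]
```

After compaction, a log of updates collapses to one record per key, which is exactly the state a fresh consumer needs.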
Elasticity and state migration, Part I — CS 591 K1: Data Stream Processing and Analytics, Spring 2020 (93 pages, 2.42 MB)
Metrics: CPU utilization, congestion, back pressure, throughput.
Policy:
• queuing-theory models: for latency objectives
• control-theory models: e.g., a PID controller
• rule-based models: e.g., if CPU utilization …
State migration strategies:
• All-at-once: move the state to be migrated in one operation; high latency during migration if the state is large.
• Progressive: move the state to be migrated in smaller pieces.
Andrea Lattuada, Frank McSherry, Vasiliki Kalavri, John Liagouris, Timothy Roscoe. Megaphone: Latency-conscious state migration for distributed streaming dataflows. VLDB 2019.
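A rule-based scaling policy of the kind mentioned above can be sketched in a few lines. The thresholds (80% to scale out, 30% to scale in) are illustrative assumptions; real policies also add cooldowns to avoid oscillation.

```python
def scaling_decision(cpu_utilization, parallelism, high=0.8, low=0.3):
    """Rule-based elasticity sketch: add a worker above the high
    threshold, remove one below the low threshold, else do nothing."""
    if cpu_utilization > high:
        return parallelism + 1            # overloaded: scale out
    if cpu_utilization < low and parallelism > 1:
        return parallelism - 1            # underutilized: scale in
    return parallelism                    # within bounds: stay put
```

Queuing- and control-theory policies replace the fixed thresholds with a model that maps measured load to a target parallelism directly.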
Fault-tolerance demo & reconfiguration — CS 591 K1: Data Stream Processing and Analytics, Spring 2020 (41 pages, 4.09 MB)
State migration goals:
• minimize communication
• keep duration short
• minimize performance disruption, e.g. latency spikes
• avoid introducing load imbalance
Resource management: utilization, isolation.
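Progressive migration keeps disruption short by moving keyed state in small batches rather than in one large transfer, bounding the latency spike per step. The generator below is a minimal sketch of that idea under the assumption that state is a key-to-value dictionary; the function name and batching scheme are illustrative.

```python
def migrate_progressively(state, batch_size):
    """Yield the state to be migrated in small key batches, so the
    running job pauses only briefly for each transfer step."""
    items = list(state.items())
    for i in range(0, len(items), batch_size):
        # each yielded dict is one migration step's payload
        yield dict(items[i:i + batch_size])
```

The receiving worker merges the batches as they arrive; once the last batch lands, it owns the full migrated state.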
Scalable stream processing: Spark Streaming and Flink — (113 pages, 1.22 MB)
Fault tolerance:
• A more coarse-grained approach than Storm's.
• Based on consistent global snapshots (inspired by Chandy-Lamport).
• Low runtime overhead, stateful exactly-once semantics.
Acks sequences …
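The consistent-global-snapshot idea can be illustrated with barrier alignment at a two-input operator: once a checkpoint barrier arrives on one input, records from that input are buffered until the barrier also arrives on the other input, at which point the operator snapshots its state. The sketch below is a simplified Python model, not Flink's implementation; the class and field names are assumptions.

```python
class TwoInputAligner:
    """Barrier alignment sketch for consistent snapshots: snapshot
    operator state only when barriers from both inputs have arrived."""

    def __init__(self):
        self.state = 0           # running sum over both inputs
        self.barriers = set()    # inputs whose barrier has arrived
        self.buffered = []       # records held back during alignment
        self.snapshots = []      # checkpointed state values

    def on_record(self, input_id, value):
        if input_id in self.barriers:
            self.buffered.append(value)   # belongs to the next epoch
        else:
            self.state += value

    def on_barrier(self, input_id):
        self.barriers.add(input_id)
        if self.barriers == {0, 1}:       # aligned: snapshot, then drain
            self.snapshots.append(self.state)
            for value in self.buffered:
                self.state += value
            self.barriers.clear()
            self.buffered.clear()
```

Because post-barrier records are excluded from the snapshot, the checkpointed state reflects exactly the records of one epoch across all inputs, which is what makes the global snapshot consistent.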
共 15 条
- 1
- 2













