adaptive query execution (AQE) - IT文库_程序员IT互联网编程电子书和文档免费下载，助您码力十足！

首页文库资料文章资讯上传文档发布文章登录账户

Streaming optimizations - CS 591 K1: Data Stream Processing and Analytics Spring 2020

processing optimizations ??? Vasiliki Kalavri | Boston University 2020 2 • Costs of streaming operator execution • state, parallelism, selectivity • Dataflow optimizations • plan translation alternatives Distributed execution in Flink ??? Vasiliki Kalavri | Boston University 2020 9 Identify the most efficient way to execute a query • There may exist several ways to execute a computation • query plans, strategies? • before execution or during runtime Query optimization (I) ??? Vasiliki Kalavri | Boston University 2020 10 Optimization strategies • enumerate equivalent execution plans • minimize intermediate

0 码力 | 54 页 | 2.83 MB | 1 年前
3
Flow control and load shedding - CS 591 K1: Data Stream Processing and Analytics Spring 2020

wikimedia.org/wiki/File:Adaptive_streaming_overview_daseddon_2011_07_28.png 5 ??? Vasiliki Kalavri | Boston University 2020 Load shedding as an optimization problem N: query network I: set of input continuously monitors input rates or other system metrics and can access information about the running query plan • It detects overload and decides what actions to take in order to maintain acceptable latency approximate answers … S1 S2 Sr Input Manager Scheduler QoS Monitor Load Shedder Query Execution Engine Qm Q2 Q1 Ad-hoc or continuous queries Input streams … ??? Vasiliki Kalavri |

0 码力 | 43 页 | 2.42 MB | 1 年前
3
Fault-tolerance demo & reconfiguration - CS 591 K1: Data Stream Processing and Analytics Spring 2020

JobManager is a single point of failure Flink applications • It keeps metadata about application execution, such as pointers to completed checkpoints. • A high-availability mode migrates the responsibility increased load • scale in to save resources • Fix bugs or change business logic • Optimize execution plan • Change operator placement • skew and straggler mitigation • Migrate to a different across existing and new nodes • Random I/O and high network communication • Not suitable for adaptive applications 26 Uniform hashing ??? Vasiliki Kalavri | Boston University 2020 27 ??? Vasiliki

0 码力 | 41 页 | 4.09 MB | 1 年前
3
Stream processing fundamentals - CS 591 K1: Data Stream Processing and Analytics Spring 2020

single-pass Updates arbitrary append-only Update rates relatively low high, bursty Processing Model query-driven / pull-based data-driven / push-based Queries ad-hoc continuous Latency relatively high low • Derived stream: produced by a continuous query and its operators, e.g. total traffic from a source every minute ins_r(P:i) = insert(i, {j | j ∈ ins_r(P) ^ j.A ≠ i.A}). 28 Vasiliki Kalavri | Boston University 2020 Query processing challenges • Memory requirements: we cannot store the whole stream history. • Data rate:

0 码力 | 45 页 | 1.22 MB | 1 年前
3
Scalable Stream Processing - Spark Streaming and Flink

engine. ▶ Perform database-like query optimizations. 56 / 79 Programming Model (1/2) ▶ Two main steps to develop a Spark stuctured streaming: ▶ 1. Defines a query on the input table, as a static table table. • Spark automatically converts this batch-like query to a streaming execution plan. ▶ 2. Specify triggers to control when to update the results. • Each time a trigger fires, Spark checks for Spark stuctured streaming: ▶ 1. Defines a query on the input table, as a static table. • Spark automatically converts this batch-like query to a streaming execution plan. ▶ 2. Specify triggers to control

0 码力 | 113 页 | 1.22 MB | 1 年前
3
PyFlink 1.15 Documentation

to set up PyFlink development environment in your local machine. This is usually used for local execution or development in an IDE. Set up Python environment It requires Python 3.6 or above with PyFlink given Python virtual environment at client side (for job compiling) and server side (for Python UDF execution) separately. 1.1. Getting Started 7 pyflink-docs, Release release-1.15 • Specify the Python virtual cluster nodes during job execution. It should be noted that option -pyexec is also required to specify the Python virtual environment to use at server side (for Python UDF execution). For the Python virtual

0 码力 | 36 页 | 266.77 KB | 1 年前
3
PyFlink 1.16 Documentation

to set up PyFlink development environment in your local machine. This is usually used for local execution or development in an IDE. Set up Python environment It requires Python 3.6 or above with PyFlink given Python virtual environment at client side (for job compiling) and server side (for Python UDF execution) separately. 1.1. Getting Started 7 pyflink-docs, Release release-1.16 • Specify the Python virtual cluster nodes during job execution. It should be noted that option -pyexec is also required to specify the Python virtual environment to use at server side (for Python UDF execution). For the Python virtual

0 码力 | 36 页 | 266.80 KB | 1 年前
3
Streaming languages and operator semantics - CS 591 K1: Data Stream Processing and Analytics Spring 2020

streams. • Declarative languages specify the expected results of the computation rather than the execution flow. • Imperative languages are used to describe plans of operators the streams must flow through • A Blocking query operator can only return answers when it detects the end of its input. • NOT IN, set difference and division, traditional SQL aggregates • A Non-blocking query operator can produce operator, iff F is monotonic with respect to the partial ordering ⊆. A query Q on a stream S can be implemented by a non-blocking query operator iff Q(S) is monotonic with respect to ⊆. The traditional

0 码力 | 53 页 | 532.37 KB | 1 年前
3
Filtering and sampling streams - CS 591 K1: Data Stream Processing and Analytics Spring 2020

• we can store a fixed proportion of the stream, e.g. 1/10th 7 search engine query, timestamp> query stream Example use-case: Web search user behavior study Q: How many queries did users issued n queries in the last month: • s of those are unique • d of those are duplicates • no query was issued more than twice 9 How many of Ted’s queries will be in the 1/10th sample, S? Each of a flag indicating whether they belong to the sample or not • When a query arrives: • if the user is sampled: add the query to S • if we haven’t seen the user before: generate a random integer ru

0 码力 | 74 页 | 1.06 MB | 1 年前
3
Exactly-once fault-tolerance in Apache Flink - CS 591 K1: Data Stream Processing and Analytics Spring 2020

HOk8+K8Ox+z1iWnmDmAP3A+fwCD9I4G We need to retrieve a distributed cut in a system execution that yields a system configuration Validity (safety): Termination (liveness): Obtain a valid CfjxXg3PqatJaOY2QV/yvj8AfLTl3A= We need to retrieve a distributed cut in a system execution that yields a system configuration Validity (safety): Termination (liveness): Obtain a valid m m’ System Possible Execution ??? Vasiliki Kalavri | Boston University 2020 Validity Explained p1 p2 p3 p1 p2 p3 m m’ C events in cut System Possible Execution ??? Vasiliki Kalavri | Boston

0 码力 | 81 页 | 13.18 MB | 1 年前
3

共 18 条前往

页

分类

语言

格式

Streaming optimizations - CS 591 K1: Data Stream Processing and Analytics Spring 2020

Flow control and load shedding - CS 591 K1: Data Stream Processing and Analytics Spring 2020

Fault-tolerance demo & reconfiguration - CS 591 K1: Data Stream Processing and Analytics Spring 2020

Stream processing fundamentals - CS 591 K1: Data Stream Processing and Analytics Spring 2020

Scalable Stream Processing - Spark Streaming and Flink

PyFlink 1.15 Documentation

PyFlink 1.16 Documentation

Streaming languages and operator semantics - CS 591 K1: Data Stream Processing and Analytics Spring 2020

Filtering and sampling streams - CS 591 K1: Data Stream Processing and Analytics Spring 2020

Exactly-once fault-tolerance in Apache Flink - CS 591 K1: Data Stream Processing and Analytics Spring 2020