PyFlink 1.15 Documentation
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 1.1.2.1 QuickStart: Table API . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 1.1.2.2 QuickStart: DataStream . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24 1.3.2.1 O1: How to prepare Python Virtual Environment . . . . . . . . . . . . . . . . . . . 24 1.3.2.2 O2: How to add Python Files . . . 26 1.3.4.1 O1: Could not find any factory for identifier ‘xxx’ that implements ‘org.apache.flink.table.factories.DynamicTableFactory’ in the classpath . . . . . . . 26 1.3.4.2 O2: ClassNotFoundException:0 码力 | 36 页 | 266.77 KB | 1 年前3PyFlink 1.16 Documentation
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 1.1.2.1 QuickStart: Table API . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 1.1.2.2 QuickStart: DataStream . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24 1.3.2.1 O1: How to prepare Python Virtual Environment . . . . . . . . . . . . . . . . . . . 24 1.3.2.2 O2: How to add Python Files . . . 26 1.3.4.1 O1: Could not find any factory for identifier ‘xxx’ that implements ‘org.apache.flink.table.factories.DynamicTableFactory’ in the classpath . . . . . . . 26 1.3.4.2 O2: ClassNotFoundException:0 码力 | 36 页 | 266.80 KB | 1 年前3Flow control and load shedding - CS 591 K1: Data Stream Processing and Analytics Spring 2020
14 ??? Vasiliki Kalavri | Boston University 2020 Load Shedding Road Map (LSRM) • A pre-computed table that contains materialized load shedding plans ordered by how much load shedding they will cause throughput is limited by the processing rate of the slowest task. • Parallel tasks are connected via virtual channels multiplexed over TCP connections: • In the presence of skew, a single overload channel link-by-link, per virtual channel congestion control technique used in ATM network switches. • To exchange data through an ATM network, each pair of endpoints first needs to establish a virtual circuit (VC)0 码力 | 43 页 | 2.42 MB | 1 年前3Streaming languages and operator semantics - CS 591 K1: Data Stream Processing and Analytics Spring 2020
queries on data streams • New streams (derived) are defined as virtual views in SQL • Semantics are equivalent to having an append-only table to which new tuples are continuously added. 34 Vasiliki start_price, start_time FROM OpenAuction WHERE start_price > 1000 Derived stream as an append- only table. 35 Vasiliki Kalavri | Boston University 2020 User-Defined Aggregates (UDAs) Constructs that allow Vasiliki Kalavri | Boston University 2020 Example: AVG UDA AGGREGATE myavg(Next Int): Real { TABLE state(tsum Int, cnt Int); INITIALIZE: { INSERT INTO state VALUES(Next, 1);0 码力 | 53 页 | 532.37 KB | 1 年前3Course introduction - CS 591 K1: Data Stream Processing and Analytics Spring 2020
are a Windows user, you are advised to use Windows subsystem for Linux (WSL), Cygwin, or a Linux virtual machine to run Flink in a UNIX environment. • A Java 8.x installation. To develop Flink applications0 码力 | 34 页 | 2.53 MB | 1 年前3Scalable Stream Processing - Spark Streaming and Flink
54 / 79 Structured Streaming 55 / 79 Structured Streaming ▶ Treating a live data stream as a table that is being continuously appended. ▶ Built on the Spark SQL engine. ▶ Perform database-like query Two main steps to develop a Spark stuctured streaming: ▶ 1. Defines a query on the input table, as a static table. • Spark automatically converts this batch-like query to a streaming execution plan. ▶ the input table), and incrementally updates the result. 57 / 79 Programming Model (1/2) ▶ Two main steps to develop a Spark stuctured streaming: ▶ 1. Defines a query on the input table, as a static0 码力 | 113 页 | 1.22 MB | 1 年前3Cardinality and frequency estimation - CS 591 K1: Data Stream Processing and Analytics Spring 2020
Example use-case: Distinct users visiting one or multiple webpages Naive solution: maintain a hash table ??? Vasiliki Kalavri | Boston University 2020 How can we count the number of distinct elements seen Example use-case: Distinct users visiting one or multiple webpages Naive solution: maintain a hash table Convert the stream into a multi-set of uniformly distributed random numbers using a hash function Example use-case: Distinct users visiting one or multiple webpages Naive solution: maintain a hash table The more different elements we encounter in the stream, the more different hash values we shall0 码力 | 69 页 | 630.01 KB | 1 年前3Apache Flink的过去、现在和未来
Streaming Dataflow DataStream API Stream Processing DataSet API Batch Processing Table API & SQL Relational Table API & SQL Relational Local Single JVM Cloud GCE, EC2 Cluster Standalone, YARN Physical 统一 Operator 抽象 Pull-based operator Push-based operator 算子可自定义读取顺序 Table API & SQL 1.9 新特性 全新的 SQL 类型系统 DDL 初步支持 Table API 增强 统一的 Catalog API Blink Planner What’s new in Blink Planner 数据结构0 码力 | 33 页 | 3.36 MB | 1 年前3Flink如何实时分析Iceberg数据湖的CDC数据
步数RTransform I量h Apache Iceberg asic Data Metadata Database Table Partition Spec Manifest File TableMetadata Snapshot Current Table Version Pointer Apac2e Ice-er1 Bas3c Part3t354- f f3 Part3t354-20 码力 | 36 页 | 781.69 KB | 1 年前3Skew mitigation - CS 591 K1: Data Stream Processing and Analytics Spring 2020
w2 w1 w3 round-robin hash-based • Items are perfectly balanced among workers • No routing table required • Key semantics are not preserved: values of the same key might be routed to different different workers • Workers are responsible for roughly the same amount of keys • No routing table is required • Key semantics preserved: values of the same key are always processed by the same worker0 码力 | 31 页 | 1.47 MB | 1 年前3
共 14 条
- 1
- 2