PyFlink 1.15 Documentationimport udf # create a general Python UDF @udf(result_type=DataTypes.BIGINT()) def plus_one(i): return i + 1 table.select(plus_one(col('id'))).to_pandas() 1.1. Getting Started 17 pyflink-docs, Release release-1 @udf(result_type=DataTypes.BIGINT(), func_type='pandas') def pandas_plus_one(series): return series + 1 table.select(pandas_plus_one(col('id'))).to_pandas() /Users/duanchen/sourcecode/flink/flink-python/dev/ use the Python function in SQL API table_env.create_temporary_function("plus_one", plus_one) table_env.sql_query("SELECT plus_one(id) FROM {}".format(table)).to_pandas() Another example is UDFs used0 码力 | 36 页 | 266.77 KB | 1 年前3
PyFlink 1.16 Documentationimport udf # create a general Python UDF @udf(result_type=DataTypes.BIGINT()) def plus_one(i): return i + 1 table.select(plus_one(col('id'))).to_pandas() 1.1. Getting Started 17 pyflink-docs, Release release-1 @udf(result_type=DataTypes.BIGINT(), func_type='pandas') def pandas_plus_one(series): return series + 1 table.select(pandas_plus_one(col('id'))).to_pandas() /Users/duanchen/sourcecode/flink/flink-python/dev/ use the Python function in SQL API table_env.create_temporary_function("plus_one", plus_one) table_env.sql_query("SELECT plus_one(id) FROM {}".format(table)).to_pandas() Another example is UDFs used0 码力 | 36 页 | 266.80 KB | 1 年前3
Filtering and sampling streams - CS 591 K1: Data Stream Processing and Analytics Spring 2020stream, we select a stream element i with probability 10%. • We can use a random generator that produces an integer ri between 0 and 9. We then select an input element i if ri=0. 8 Q: How many stream, we select a stream element i with probability 10%. • We can use a random generator that produces an integer ri between 0 and 9. We then select an input element i if ri=0. 8 Will this approach fixed-size sample of the stream so far? At all times, we want the following property to hold: an element is in S with probability s/n, where n is the total number of stream elements seen so far. ??? Vasiliki0 码力 | 74 页 | 1.06 MB | 1 年前3
Cardinality and frequency estimation - CS 591 K1: Data Stream Processing and Analytics Spring 2020University 2020 Let h be a hash function that maps each stream element into M = log2N bits, where N is the domain of input elements: For each element x, let rank(x) be the number of 0s in the end of h(x): University 2020 10 We split the input stream into m = 2p sub-streams S0, S1, …, Sm-1 For every element x, we compute h(x) and use the p first bits of the M-bit hash value to select a sub-stream and the University 2020 10 We split the input stream into m = 2p sub-streams S0, S1, …, Sm-1 For every element x, we compute h(x) and use the p first bits of the M-bit hash value to select a sub-stream and the0 码力 | 69 页 | 630.01 KB | 1 年前3
Scalable Stream Processing - Spark Streaming and Flinknormal Spark RDDs. 20 / 79 Transformations (2/4) ▶ map • Returns a new DStream by passing each element of the source DStream through a given function. ▶ flatMap • Similar to map, but each input item func returns true. 21 / 79 Transformations (2/4) ▶ map • Returns a new DStream by passing each element of the source DStream through a given function. ▶ flatMap • Similar to map, but each input item func returns true. 21 / 79 Transformations (2/4) ▶ map • Returns a new DStream by passing each element of the source DStream through a given function. ▶ flatMap • Similar to map, but each input item0 码力 | 113 页 | 1.22 MB | 1 年前3
Windows and triggers - CS 591 K1: Data Stream Processing and Analytics Spring 2020University 2020 Time-based window assigners for the most common windowing use cases: • They assign an element based on its event-time timestamp or the current processing time to windows. • Time windows (processing or event) time passes the end of the window. • A window is created when the first element is assigned to it. Flink will never evaluate empty windows! Flink’s built-in window assigners performed on the elements of a window • Incremental aggregation functions are applied when an element is added to a window: • They maintain a single value as window state and eventually emit the0 码力 | 35 页 | 444.84 KB | 1 年前3
High-availability, recovery semantics, and guarantees - CS 591 K1: Data Stream Processing and Analytics Spring 2020if true, the element is probably in the set if false, it definitely isn’t Vasiliki Kalavri | Boston University 2020 21 http://streamingbook.net/fig/5-5 Bloom filter: if true, the element is probably Kalavri | Boston University 2020 21 http://streamingbook.net/fig/5-5 Bloom filter: if true, the element is probably in the set if false, it definitely isn’t Separate bloom filters for every 10-minute Kalavri | Boston University 2020 21 http://streamingbook.net/fig/5-5 Bloom filter: if true, the element is probably in the set if false, it definitely isn’t Separate bloom filters for every 10-minute0 码力 | 49 页 | 2.08 MB | 1 年前3
Streaming languages and operator semantics - CS 591 K1: Data Stream Processing and Analytics Spring 2020relation R contains a stream elementwhenever tuple s is in R(τ) − R(τ − 1). • Dstream (for “delete stream”) applied to relation R contains a stream elementwhenever tuple s is in R(τ R(τ − 1) − R(τ). • Rstream (for “relation stream”) applied to relation R contains a stream elementwhenever tuple s is in R at time τ. 6 Vasiliki Kalavri | Boston University 2020 Imperative statement groups: • INITIALIZE: initialized local state. • ITERATE: update state based on new element and current state. • TERMINATE: produce the result. Note that it is allowed to define and maintain0 码力 | 53 页 | 532.37 KB | 1 年前3
Skew mitigation - CS 591 K1: Data Stream Processing and Analytics Spring 2020e.g., if ε=0.2, w=5 (5 items per window) • wcur: the current window id • We keep a list D of element frequencies and their maximum associated error. • Once a window fills up, we remove infrequent each element x in wcur: if x ∈ D, increase its frequency, fx = fx +1 else insert with frequency fx = 1 and error εx = wcur - 1 N = N + 1 Delete step Iterate over D and remove every element x with0 码力 | 31 页 | 1.47 MB | 1 年前3
Streaming optimizations - CS 591 K1: Data Stream Processing and Analytics Spring 2020input elements • a map operator has a selectivity of 1, i.e. it produces one output element for each input element it processes • an operator that tokenizes sentences into words has selectivity > 10 码力 | 54 页 | 2.83 MB | 1 年前3
共 11 条
- 1
- 2













