State management - CS 591 K1: Data Stream Processing and Analytics, Spring 2020
…results: a local or instance variable that is accessed by a task's business logic. Operator state is scoped to an operator task, i.e. records processed by the same parallel task have access to the same state. It cannot be accessed by other parallel tasks of the same or different operators. Keyed state is scoped to a key defined in the operator's input records. • Flink maintains one state instance per key value and distributes all records with the same key to the operator task that maintains the state for this key. • State access is automatically scoped to the key of the current record, so that all records with the same key access the same state…
0 credits | 24 pages | 914.13 KB | 1 year ago
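The keyed-state scoping described in this entry can be sketched in plain Python (a minimal stand-in with hypothetical names; real Flink manages keyed state through its state backends): every record carrying the same key reads and updates the same state instance.

```python
# Sketch of keyed state: one state instance per key value.
# All records with the same key access the same state instance.
from collections import defaultdict

class KeyedCountOperator:
    """Hypothetical stand-in for a stateful keyed operator."""
    def __init__(self):
        # one state instance (here: a running counter) per key
        self.state = defaultdict(int)

    def process(self, key, value):
        # state access is scoped to the current record's key
        self.state[key] += value
        return key, self.state[key]

op = KeyedCountOperator()
print(op.process("a", 1))  # ('a', 1)
print(op.process("b", 5))  # ('b', 5)
print(op.process("a", 2))  # ('a', 3) -- same key, same state instance
```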
Elasticity and state migration: Part I - CS 591 K1: Data Stream Processing and Analytics, Spring 2020
Vasiliki Kalavri | Boston University 2020 — Pause-and-restart state migration: • State is scoped to a single task. • Each stateful task is responsible for processing and state management. …operators…
0 credits | 93 pages | 2.42 MB | 1 year ago
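A toy sketch of the state-redistribution step in pause-and-restart migration (function and variable names are hypothetical, and hash partitioning is assumed): once a task is paused and its keyed state snapshotted, the keys are re-hashed over the new parallelism and the new tasks resume from the redistributed state.

```python
# Sketch: redistribute a paused task's keyed state over new_n tasks,
# using the same hash routing the new tasks will use for records.
def repartition_state(state, new_n):
    new_tasks = [dict() for _ in range(new_n)]
    for k, v in state.items():
        new_tasks[hash(k) % new_n][k] = v
    return new_tasks

old_state = {"user-%d" % i: i * 10 for i in range(6)}
parts = repartition_state(old_state, 3)
print(sum(len(p) for p in parts))  # 6 -- no state lost during migration
```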
Scalable Stream Processing - Spark Streaming and Flink
…each RDD using a given function. ▶ reduceByKey • Returns a new DStream of (K, V) pairs where the values for each key are aggregated using the given reduce function. ▶ countByValue • Returns a new DStream…
0 credits | 113 pages | 1.22 MB | 1 year ago
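The reduceByKey and countByValue semantics quoted above can be illustrated with a small pure-Python sketch over a single micro-batch (the function names are mine, not the Spark API):

```python
# reduceByKey: values for each key are aggregated with the given reduce function.
def reduce_by_key(pairs, f):
    out = {}
    for k, v in pairs:
        out[k] = f(out[k], v) if k in out else v
    return out

# countByValue: count occurrences of each element, expressed via reduce_by_key.
def count_by_value(items):
    return reduce_by_key(((x, 1) for x in items), lambda a, b: a + b)

batch = [("a", 1), ("b", 2), ("a", 3)]
print(reduce_by_key(batch, lambda x, y: x + y))  # {'a': 4, 'b': 2}
print(count_by_value(["a", "b", "a"]))           # {'a': 2, 'b': 1}
```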
Filtering and sampling streams - CS 591 K1: Data Stream Processing and Analytics, Spring 2020
…us compute the statistical variance of this series? • the sum of all the values • the sum of the squares of the values • the number of observations. var = Σ(xᵢ − μ)² / N. We can compute the three summary values in a single pass through the data: μ = sum / count, var = (sum of squares / count) − μ².
0 credits | 74 pages | 1.06 MB | 1 year ago
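The three running summaries above are enough for a single-pass variance, via Var = E[x²] − μ². A sketch (numerically, a two-pass or Welford-style update is more stable for large streams):

```python
# One-pass variance from three running summaries:
# the sum, the sum of squares, and the count.
def one_pass_variance(stream):
    s = sq = n = 0
    for x in stream:
        s += x
        sq += x * x
        n += 1
    mean = s / n
    return sq / n - mean * mean  # Var = E[x^2] - mu^2

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(one_pass_variance(data))  # 4.0
```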
Streaming languages and operator semantics - CS 591 K1: Data Stream Processing and Analytics, Spring 2020
…AGGREGATE avg(Next Int): Real { TABLE state(tsum Int, cnt Int); INITIALIZE: { INSERT INTO state VALUES(Next, 1); } ITERATE: { UPDATE state SET tsum=tsum+Next, cnt=cnt+1; …
0 credits | 53 pages | 532.37 KB | 1 year ago
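A Python analogue of the avg user-defined aggregate sketched in the snippet (a hypothetical class of my own; INITIALIZE seeds the (tsum, cnt) state on the first value, ITERATE updates it for each subsequent value):

```python
# Incremental average, mirroring the UDA's INITIALIZE / ITERATE blocks.
class AvgUDA:
    def __init__(self):
        self.state = None  # plays the role of TABLE state(tsum, cnt)

    def step(self, next_val):
        if self.state is None:
            self.state = (next_val, 1)               # INITIALIZE
        else:
            tsum, cnt = self.state
            self.state = (tsum + next_val, cnt + 1)  # ITERATE
        tsum, cnt = self.state
        return tsum / cnt                            # running average

uda = AvgUDA()
print([uda.step(x) for x in [1, 2, 3, 4]])  # [1.0, 1.5, 2.0, 2.5]
```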
Cardinality and frequency estimation - CS 591 K1: Data Stream Processing and Analytics, Spring 2020
…maintain a hash table. The more different elements we encounter in the stream, the more different hash values we shall see. Convert the stream into a multi-set of uniformly distributed random numbers using … The hash function h hashes x to any of N values with probability 1/N. Out of all x we hash: • around 50% will have a binary representation that … and so on…
0 credits | 69 pages | 630.01 KB | 1 year ago
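This is the intuition behind Flajolet-Martin-style estimators: hash each element to a uniform number and track the longest run of trailing zero bits r seen; roughly 2^r distinct elements are needed before a hash with r trailing zeros appears, so 2^r serves as a (rough, high-variance) cardinality estimate. A sketch, assuming MD5 as a stand-in uniform hash:

```python
# Flajolet-Martin-style cardinality sketch using trailing zero bits.
import hashlib

def trailing_zeros(n, width=32):
    if n == 0:
        return width
    r = 0
    while n & 1 == 0:
        n >>= 1
        r += 1
    return r

def estimate_cardinality(stream):
    r_max = 0
    for x in stream:
        # 32-bit uniform hash of the element
        h = int(hashlib.md5(str(x).encode()).hexdigest(), 16) & 0xFFFFFFFF
        r_max = max(r_max, trailing_zeros(h))
    return 2 ** r_max

print(estimate_cardinality(range(1000)))  # some power of two; a single sketch has high variance
```

In practice many independent sketches are combined (e.g. by averaging, or as in HyperLogLog) to reduce the variance.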
Streaming optimizations - CS 591 K1: Data Stream Processing and Analytics, Spring 2020
…EmitIntermediate(u, "1"); reduce(String key, Iterator values): // key: a URL // values: a list of counts int result = 0; for each v in values: result += ParseInt(v); Emit(key, AsString(result));
0 credits | 54 pages | 2.83 MB | 1 year ago
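The map/reduce pseudocode in this snippet (the classic URL-count example) can be made runnable in Python; the grouping shuffle is simulated with a dict, and EmitIntermediate/Emit become a generator and a return value (helper names are mine):

```python
# Runnable version of the URL-count map/reduce pseudocode:
# map emits (url, "1"), reduce sums the counts per url.
from collections import defaultdict

def map_fn(url):
    yield url, "1"          # EmitIntermediate(u, "1")

def reduce_fn(key, values):
    result = sum(int(v) for v in values)
    return key, str(result)  # Emit(key, AsString(result))

def run(urls):
    groups = defaultdict(list)  # simulated shuffle/group-by-key
    for u in urls:
        for k, v in map_fn(u):
            groups[k].append(v)
    return dict(reduce_fn(k, vs) for k, vs in groups.items())

print(run(["/a", "/b", "/a"]))  # {'/a': '2', '/b': '1'}
```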
Skew mitigation - CS 591 K1: Data Stream Processing and Analytics, Spring 2020
…preserved: values of the same key might be routed to different workers. • Workers are responsible for roughly the same number of keys. • No routing table is required. • Key semantics preserved: values of … Addressing skew: • To address skew, the system needs to track the frequencies of the partitioning key values. • We can then use a hybrid partitioning function that treats normal keys and popular keys differently.
0 credits | 31 pages | 1.47 MB | 1 year ago
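A minimal sketch of such a hybrid partitioner (hypothetical class; real schemes such as partial key grouping combine frequency tracking with power-of-two-choices): popular keys pick the less-loaded of two candidate workers, while normal keys use plain hash partitioning.

```python
# Hybrid partitioning sketch: hot keys get two candidate workers,
# normal keys are routed with a single hash.
class HybridPartitioner:
    def __init__(self, n_workers, hot_keys):
        self.n = n_workers
        self.hot = set(hot_keys)          # assumed known from frequency tracking
        self.load = [0] * n_workers       # records routed to each worker

    def route(self, key):
        if key in self.hot:
            c1 = hash(key) % self.n
            c2 = hash(key + "!") % self.n  # salted second choice
            w = c1 if self.load[c1] <= self.load[c2] else c2
        else:
            w = hash(key) % self.n         # normal keys: plain hash partitioning
        self.load[w] += 1
        return w
```

Note the trade-off the snippet alludes to: splitting a hot key over two workers balances load, but downstream aggregation must then merge the two partial states for that key.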
PyFlink 1.15 Documentation
…1.3.6 Data type issues — 1.3.6.1 Q1: 'tuple' object has no attribute '_values'; 1.3.6.2 Q2: AttributeError: 'int' object has no attribute … Caused by: java.util.concurrent.ExecutionException: java.lang.RuntimeException: Error received … RowCoderImpl.encode_to_stream … AttributeError: 'tuple' object has no attribute '_values'. This issue is usually caused by returning an object other than Row type from a Python…
0 credits | 36 pages | 266.77 KB | 1 year ago
PyFlink 1.16 Documentation
…1.3.6 Data type issues — 1.3.6.1 Q1: 'tuple' object has no attribute '_values'; 1.3.6.2 Q2: AttributeError: 'int' object has no attribute … Caused by: java.util.concurrent.ExecutionException: java.lang.RuntimeException: Error received … RowCoderImpl.encode_to_stream … AttributeError: 'tuple' object has no attribute '_values'. This issue is usually caused by returning an object other than Row type from a Python…
0 credits | 36 pages | 266.80 KB | 1 year ago
12 results in total













