Exactly-once fault-tolerance in Apache Flink - CS 591 K1: Data Stream Processing and Analytics, Spring 2020 (Vasiliki Kalavri, Boston University)
Reference: ... azurewebsites.net/pubs/pubs.html#chandy
Snapshotting protocols [figure: processes p1, p2, p3 exchanging messages] capture the global state of a distributed system. The checkpointing and recovery mechanism only resets the internal state of a streaming application; some result records might be emitted multiple times to downstream systems.
81 pages | 13.18 MB | 1 year ago
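The snippet above notes that checkpoint-based recovery only resets internal operator state, so sinks may still see duplicates. As a quick illustration of where that guarantee is configured, here is a minimal sketch, not taken from the listed slides, of enabling exactly-once checkpointing in a Flink Scala job; the interval, job structure, and names are illustrative assumptions:

import org.apache.flink.streaming.api.CheckpointingMode
import org.apache.flink.streaming.api.scala._

object ExactlyOnceSketch {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    // Draw a consistent snapshot of all operator state every 10 seconds (assumed interval).
    // EXACTLY_ONCE applies to internal state only: on recovery the state rolls back to the
    // last snapshot, but records already emitted to non-transactional sinks can still be
    // delivered more than once downstream.
    env.enableCheckpointing(10000L, CheckpointingMode.EXACTLY_ONCE)

    env.fromElements("a", "b", "a", "c")
      .map(word => (word, 1))
      .keyBy(_._1)
      .sum(1)
      .print()

    env.execute("exactly-once-sketch")
  }
}

End-to-end exactly-once output additionally requires a transactional or idempotent sink; the checkpointing mode alone does not prevent the duplicate emissions described above.
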
PyFlink 1.15 Documentation
Running word_count.py, you will see outputs like the following:
# Use --input to specify file input. Printing result to stdout. Use --output to specify output path.
# +I[To, 1]
# +I[be,, 1]
# +I[or, 1]
# +I[not, 1]
Local: this page shows you how to set up a PyFlink development environment on your local machine, usually for local execution or for development in an IDE. Set up Python environment: it requires Python to be pre-installed and available in your local environment. It is suggested to use Python virtual environments to set up your local Python environment. See "Create a Python virtual environment" for more details on how to ...
36 pages | 266.77 KB | 1 year ago
PyFlink 1.16 Documentation
Running word_count.py, you will see outputs like the following:
# Use --input to specify file input. Printing result to stdout. Use --output to specify output path.
# +I[To, 1]
# +I[be,, 1]
# +I[or, 1]
# +I[not, 1]
Local: this page shows you how to set up a PyFlink development environment on your local machine, usually for local execution or for development in an IDE. Set up Python environment: it requires Python to be pre-installed and available in your local environment. It is suggested to use Python virtual environments to set up your local Python environment. See "Create a Python virtual environment" for more details on how to ...
36 pages | 266.80 KB | 1 year ago
Scalable Stream Processing - Spark Streaming and Flink
... frequency in each RDD of the source DStream. Window Operations: Spark provides a set of transformations that apply over a sliding window of data. A window is defined by two parameters: ... Given a DStream of (word, 1) pairs, get the frequency of words in each batch of data and, finally, print the result:
val pairs = words.map(word => (word, 1))
val wordCounts = pairs.reduceByKey(_ + _)
wordCounts.print()
... the performance of these operations is proportional to the size of the state. mapWithState is executed only on the set of keys that are available in the last micro-batch, and its performance is proportional to the size ...
113 pages | 1.22 MB | 1 year ago
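As a hedged sketch, not the slide's own code, here is how the word-count pairs above are typically counted over a sliding window with Spark Streaming's reduceByKeyAndWindow, parameterised by a window length and a slide interval; the socket source, durations, and checkpoint path are illustrative assumptions:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object WindowedWordCountSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("WindowedWordCountSketch")
    val ssc = new StreamingContext(conf, Seconds(1))
    ssc.checkpoint("/tmp/spark-checkpoint") // required by the incremental (inverse-reduce) variant

    val words = ssc.socketTextStream("localhost", 9999).flatMap(_.split(" "))
    val pairs = words.map(word => (word, 1))

    // Count words over the last 30 seconds of data, sliding every 10 seconds.
    // The second function subtracts the batch that falls out of the window, so the
    // previous window result is updated incrementally instead of being recomputed.
    val windowedCounts = pairs.reduceByKeyAndWindow(_ + _, _ - _, Seconds(30), Seconds(10))

    windowedCounts.print()
    ssc.start()
    ssc.awaitTermination()
  }
}
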
High-availability, recovery semantics, and guarantees - CS 591 K1: Data Stream Processing and Analytics, Spring 2020 (Vasiliki Kalavri, Boston University)
... produce output. What can go wrong: • lost events • duplicate or lost state updates • wrong result. For an input message mi and an output message mo: was mi fully processed? Was mo delivered downstream? Processing guarantees and result semantics [figure: a sum operator processing a stream of numbers, shown step by step].
49 pages | 2.08 MB | 1 year ago
Streaming optimizations - CS 591 K1: Data Stream Processing and Analytics, Spring 2020
... under which conditions does the optimization preserve correctness? • maintain state semantics • maintain result and selectivity semantics • Dynamism: can the optimization be applied during runtime, or does it ... Reordering two adjacent operators A and B is safe when: • availability: the set of attributes B reads from must be disjoint from the set of attributes A writes to • commutativity: the results of applying A and then B must be the same as the result of applying B and then A.
54 pages | 2.83 MB | 1 year ago
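To make the two reordering conditions concrete, here is a small sketch in Flink's Scala DataStream API, not from the listed slides; the record type and pipeline are illustrative assumptions:

import org.apache.flink.streaming.api.scala._

// Hypothetical record type used only for illustration.
case class Reading(id: String, temperature: Double, room: String)

object ReorderingSketch {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    val readings: DataStream[Reading] = env.fromElements(
      Reading("s1", 75.2, "server-room"), Reading("s2", 68.0, "office"))

    // Original order: the map writes 'temperature'; the filter reads only 'room'.
    val original = readings
      .map(r => r.copy(temperature = (r.temperature - 32) * 5.0 / 9.0))
      .filter(_.room == "server-room")

    // Reordered: the filter's read set ('room') is disjoint from the map's write set
    // ('temperature'), and the two operators commute, so pushing the selective filter
    // upstream preserves the result while reducing the number of map invocations.
    val reordered = readings
      .filter(_.room == "server-room")
      .map(r => r.copy(temperature = (r.temperature - 32) * 5.0 / 9.0))

    reordered.print()
    env.execute("reordering-sketch")
  }
}
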
Streaming languages and operator semantics - CS 591 K1: Data Stream Processing and Analytics, Spring 2020 (Boston University)
Operator types (II): • Sequence operators capture the arrival of an ordered set of events; they are common in pattern languages, and events must have associated timestamps. • Iteration ... A blocking query operator can only return answers when it detects the end of its input (e.g. NOT IN, set difference and division, traditional SQL aggregates). A non-blocking query operator can produce answers ... GROUP BY DeptNo HAVING SUM(empl.Sal) > 10000: the introduction of a new empl can only expand the set of departments that satisfy this query; however, this sum query cannot be expressed without the use ...
53 pages | 532.37 KB | 1 year ago
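The HAVING-SUM query fragment above is monotonic: new employee records can only add departments to the answer. As an illustration of non-blocking, incremental evaluation of such a query, here is a minimal sketch in Flink's Scala API, not from the listed slides; the source data and field layout are illustrative assumptions:

import org.apache.flink.streaming.api.scala._

object MonotonicQuerySketch {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    // Hypothetical (DeptNo, Sal) events arriving one at a time.
    val employees = env.fromElements(("D1", 6000.0), ("D2", 3000.0), ("D1", 7000.0))

    val qualifying = employees
      .keyBy(_._1)
      .sum(1)                 // non-blocking running SUM(Sal) per department
      .filter(_._2 > 10000)   // HAVING SUM(Sal) > 10000
      .map(_._1)              // SELECT DeptNo

    // Because the per-department sum only grows, the answer set is monotonic:
    // once a department qualifies it keeps qualifying, so answers can be emitted
    // incrementally without waiting for the end of the (unbounded) input.
    qualifying.print()
    env.execute("monotonic-query-sketch")
  }
}
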
Windows and triggers - CS 591 K1: Data Stream Processing and Analytics, Spring 2020
Setting the time characteristic:
object AverageSensorReadings {
  def main(args: Array[String]) {
    // set up the streaming execution environment
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    // use event time for the application
    env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
    ...
Incremental aggregation functions maintain a single value as window state and eventually emit the aggregated value as the result (ReduceFunction and AggregateFunction). Full window functions collect all elements of a window ...
public interface AggregateFunction<IN, ACC, OUT> extends Function, Serializable {
  // create a new accumulator to start a new aggregate.
  ACC createAccumulator();
  // add an input element to the accumulator and return the accumulator.
  ACC add(IN value, ACC accumulator);
  // compute the result from the accumulator and return it.
  OUT getResult(ACC accumulator);
  // merge two accumulators and return the result.
  ACC merge(ACC a, ACC b);
}
35 pages | 444.84 KB | 1 year ago
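As a usage sketch of the AggregateFunction interface shown above, here is a per-sensor average computed incrementally over tumbling windows in Scala; it is not taken from the listed slides, and the tuple layout, window size, and example data are illustrative assumptions:

import org.apache.flink.api.common.functions.AggregateFunction
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows
import org.apache.flink.streaming.api.windowing.time.Time

// Accumulator is (sensorId, sum, count): only this single value is kept as window state.
class AvgTempAggregate
    extends AggregateFunction[(String, Double), (String, Double, Long), (String, Double)] {
  override def createAccumulator(): (String, Double, Long) = ("", 0.0, 0L)
  override def add(in: (String, Double), acc: (String, Double, Long)): (String, Double, Long) =
    (in._1, acc._2 + in._2, acc._3 + 1)
  override def getResult(acc: (String, Double, Long)): (String, Double) =
    (acc._1, acc._2 / acc._3)
  override def merge(a: (String, Double, Long), b: (String, Double, Long)): (String, Double, Long) =
    (a._1, a._2 + b._2, a._3 + b._3)
}

object AvgTempSketch {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    // A real job would read from a continuous source; a tiny bounded source is used here.
    val readings: DataStream[(String, Double)] =
      env.fromElements(("sensor_1", 21.5), ("sensor_1", 23.1), ("sensor_2", 19.8))

    readings
      .keyBy(_._1)
      .window(TumblingProcessingTimeWindows.of(Time.seconds(5)))
      .aggregate(new AvgTempAggregate)
      .print()

    env.execute("avg-temp-sketch")
  }
}
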
Fault-tolerance demo & reconfiguration - CS 591 K1: Data Stream Processing and Analytics, Spring 2020 (Vasiliki Kalavri, Boston University)
... rate : throughput. Why is it necessary? • Ensure result correctness • the reconfiguration mechanism often relies on the fault-tolerance mechanism • State re-partitioning: re-partition and migrate state in a consistent manner • Block and unblock computations to ensure result correctness. Control: when and how much to adapt?
41 pages | 4.09 MB | 1 year ago
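Consistent state re-partitioning is what makes rescaling possible in practice. As a small sketch, not from the listed slides, of the knobs that matter in Flink: keyed state is organised into key groups whose number is fixed by the maximum parallelism, and stable operator ids let re-partitioned state be mapped back after a reconfiguration; the values below are illustrative assumptions:

import org.apache.flink.streaming.api.scala._

object RescalableSketch {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    // Max parallelism fixes the number of key groups, i.e. the finest granularity
    // at which keyed state can be re-partitioned when rescaling from a savepoint.
    env.setMaxParallelism(128)
    env.setParallelism(4)

    env.fromElements(("a", 1), ("b", 1), ("a", 1))
      .keyBy(_._1)
      .sum(1)
      .uid("running-count") // stable operator id so state can be re-assigned after reconfiguration
      .print()

    env.execute("rescalable-sketch")
  }
}
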
Streaming in Apache Flink - Committer @ Apache Flink, @SenorCarbone
Contents: • DataSet API • DataStream API • Concepts • Set up an environment to develop Flink programs • Implement streaming data processing pipelines • Flink ...
... average.add(item.f1); averageState.update(average); // return the smoothed result return new Tuple2<>(item.f0, average.getAverage()); } }
Connected Streams, DataStream ... <Long, SensorReading>(key, context.window().getEnd(), max)); } } Pre-combine / produce final result. Lateness: by default, when using event-time windows, late events are dropped. stream.keyBy( ...
45 pages | 3.00 MB | 1 year ago
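The last fragment notes that late events are dropped by default. Here is a hedged sketch, not the slide's own code, of how event-time windows typically handle lateness in the Scala DataStream API, with an allowed-lateness bound plus a side output for anything later; the source tuples, watermark delay, and window sizes are illustrative assumptions:

import java.time.Duration
import org.apache.flink.api.common.eventtime.{SerializableTimestampAssigner, WatermarkStrategy}
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows
import org.apache.flink.streaming.api.windowing.time.Time

object LatenessSketch {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    // (key, value, event-time timestamp in ms): a toy bounded source for illustration.
    val readings = env
      .fromElements(("sensor_1", 1, 1000L), ("sensor_1", 1, 61000L), ("sensor_1", 1, 2000L))
      .assignTimestampsAndWatermarks(
        WatermarkStrategy
          .forBoundedOutOfOrderness[(String, Int, Long)](Duration.ofSeconds(5))
          .withTimestampAssigner(new SerializableTimestampAssigner[(String, Int, Long)] {
            override def extractTimestamp(e: (String, Int, Long), recordTs: Long): Long = e._3
          }))

    val lateTag = OutputTag[(String, Int, Long)]("late-readings")

    val counts = readings
      .keyBy(_._1)
      .window(TumblingEventTimeWindows.of(Time.minutes(1)))
      .allowedLateness(Time.seconds(30)) // keep window state 30 s past the watermark
      .sideOutputLateData(lateTag)       // redirect anything later instead of dropping it
      .sum(1)

    counts.print()
    counts.getSideOutput(lateTag).print() // handle late records separately

    env.execute("lateness-sketch")
  }
}
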
共 22 条
- 1
- 2
- 3













