PyFlink 1.15 DocumentationDependency . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22 1.3.1.2 O2: Java gateway process exited before sending its port number . . . . . . . . . . . 22 1.3.2 Usage issues . . . . . . . use. ./bin/flink run-application -t yarn-application \ -Djobmanager.memory.process.size=1024m \ -Dtaskmanager.memory.process.size=1024m \ -Dyarn.application.name=\ -pyclientexec /pat meet. ./bin/flink run-application -t yarn-application \ -Djobmanager.memory.process.size=1024m \ -Dtaskmanager.memory.process.size=1024m \ -Dyarn.application.name= \ -Dyarn.ship-files=/path/to/shipfiles 0 码力 | 36 页 | 266.77 KB | 1 年前3
PyFlink 1.16 DocumentationDependency . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22 1.3.1.2 O2: Java gateway process exited before sending its port number . . . . . . . . . . . 22 1.3.2 Usage issues . . . . . . . use. ./bin/flink run-application -t yarn-application \ -Djobmanager.memory.process.size=1024m \ -Dtaskmanager.memory.process.size=1024m \ -Dyarn.application.name=\ -pyclientexec /pat meet. ./bin/flink run-application -t yarn-application \ -Djobmanager.memory.process.size=1024m \ -Dtaskmanager.memory.process.size=1024m \ -Dyarn.application.name= \ -Dyarn.ship-files=/path/to/shipfiles 0 码力 | 36 页 | 266.80 KB | 1 年前3
Windows and triggers - CS 591 K1: Data Stream Processing and Analytics Spring 2020reduce/aggregate/process(...) // specify the window function // define a non-keyed window-all operator stream .windowAll(...) // specify the window assigner .reduce/aggregate/process(...) // specify the seconds(1))) .process(new TemperatureAverager) val avgTemp = sensorData .keyBy(_.id) // shortcut for window.(TumblingEventTimeWindows.of(size)) .timeWindow(Time.seconds(1)) .process(new TemperatureAverager) windows every 15 minutes .window(SlidingEventTimeWindows.of(Time.hours(1), Time.minutes(15))) .process(new TemperatureAverager) val slidingAvgTemp = sensorData .keyBy(_.id) // shortcut for window0 码力 | 35 页 | 444.84 KB | 1 年前3
Streaming in Apache Flink.window() .reduce|aggregate|process( ) stream. .windowAll( ) .reduce|aggregate|process( ) ◦TumblingEventTimeWindows.of(Time input = ... input .keyBy(“key”) .window(TumblingEventTimeWindows.of(Time.minutes(1))) .process(new MyWastefulMax()); public static class MyWastefulMax extends ProcessWindowFunction< SensorReading key type TimeWindow> { // window type @Override public void process( String key, Context context, Iterable events, Collector 0 码力 | 45 页 | 3.00 MB | 1 年前3
Flow control and load shedding - CS 591 K1: Data Stream Processing and Analytics Spring 2020consumers can process events. 2 ??? Vasiliki Kalavri | Boston University 2020 Keeping up with the producers • Producers can generate events in a higher rate than the rate consumers can process events. with the producers • Producers can generate events in a higher rate than the rate consumers can process events. • What happens if consumers cannot keep up with the event rate? • drop messages 2 ?? with the producers • Producers can generate events in a higher rate than the rate consumers can process events. • What happens if consumers cannot keep up with the event rate? • drop messages • buffer0 码力 | 43 页 | 2.42 MB | 1 年前3
Exactly-once fault-tolerance in Apache Flink - CS 591 K1: Data Stream Processing and Analytics Spring 2020satisfies causality: • An event is pre-snapshot if it occurs before the local snapshot on a process, otherwise it is post- snapshot • If event A happens causally before B and B is pre-snapshot, duplicate messages • Strongly connected execution graph: each process can reach every other process in the system • Single initiating process 18 The Chandy-Lamport Algorithm A snapshot algorithm that interfere with processing • processing and messages do not stop • Each process cast locally record its own state • Any process can initiate the algorithm 19 The Chandy-Lamport Algorithm ??? Vasiliki0 码力 | 81 页 | 13.18 MB | 1 年前3
Stream processing fundamentals - CS 591 K1: Data Stream Processing and Analytics Spring 2020that might be unbounded • we cannot store the entire stream in an accessible way • we have to process stream elements on-the-fly using limited memory 2 Vasiliki Kalavri | Boston University 2020 Properties ETL process complex fast and light-weight ETL: Extract-Transform-Load e.g. unzipping compressed files, data cleaning and standardization 6 Vasiliki Kalavri | Boston University 2020 1. Process events or more base and/or derived streams • Each query (operator) maintains its own state • Queries process raw streams, not synopses => results are typically exact • Challenges: computation progress,0 码力 | 45 页 | 1.22 MB | 1 年前3
Stream ingestion and pub/sub systems - CS 591 K1: Data Stream Processing and Analytics Spring 2020re-ordering of messages • Re-delivery complicates stream processing and fault-tolerance • might process a message out-of-order or twice 14 How can we avoid this? 15 Publish/Subscribe Systems publisher parallelism: the number of the topic's partitions • Processing delays: If a message is slow to process, this delays processing of subsequent messages, as each partition is read by a single thread throughput and ordering? 31 How long to keep the log? • Log compaction: a (usually background) process that searches for log records with the same key and merges the records by only keeping the most0 码力 | 33 页 | 700.14 KB | 1 年前3
Graph streaming algorithms - CS 591 K1: Data Stream Processing and Analytics Spring 2020> cc = edgeStream .keyBy(0) .timeWindow(Time.of(100, TimeUnit.MILLISECONDS)) .process(new UpdateDisjointSet()) // ephemeral partial state .flatMap(new Merger()) // global state > cc = edgeStream .keyBy(0) .timeWindow(Time.of(100, TimeUnit.MILLISECONDS)) .process(new UpdateDisjointSet()) // ephemeral partial state .flatMap(new Merger()) // global state > cc = edgeStream .keyBy(0) .timeWindow(Time.of(100, TimeUnit.MILLISECONDS)) .process(new UpdateDisjointSet()) // ephemeral partial state .flatMap(new Merger()) // global state0 码力 | 72 页 | 7.77 MB | 1 年前3
Streaming languages and operator semantics - CS 591 K1: Data Stream Processing and Analytics Spring 2020Boston University 2020 • Transforming languages define transformations specifying operations that process input streams and produce output streams. • Declarative languages specify the expected results Operators 9 Vasiliki Kalavri | Boston University 2020 Operator types (I) • Single-Item Operators process stream elements one-by-one. • selection, filtering, projection, renaming. • Logic Operators define non-blocking. 41 Vasiliki Kalavri | Boston University 2020 Pattern Queries with UDAs • UDAs process streams tuple-per-tuple • How can we write a UDA that detects a sequence of actions? • e.g. detect0 码力 | 53 页 | 532.37 KB | 1 年前3
共 16 条
- 1
- 2













