Elasticity and state migration: Part I - CS 591 K1: Data Stream Processing and Analytics Spring 2020
Boston University 2020 … Control: When and how much to adapt? Mechanism: How to apply the re-configuration? … • Detect environment changes: external workload and system performance • Identify bottleneck … Flink wordcount: every reconfiguration takes ~30s, during which the system is unavailable. Re-configuration requires state migration with correctness guarantees. … Vasiliki Kalavri | Boston University … hidden from the application developer … Live state migration … control command … Helper operators, hidden from the application developer … Live state
0 credits | 93 pages | 2.42 MB | 1 year ago
Exactly-once fault-tolerance in Apache Flink - CS 591 K1: Data Stream Processing and Analytics Spring 2020
… system execution that yields a system configuration … Validity (safety): obtain a valid system configuration. Termination (liveness): a full system configuration is eventually captured. … A snapshot algorithm … recorded in the snapshot but enforces the causal consistency. 3. Starts recording all data (application) messages it receives on all of its incoming channels. … Vasiliki Kalavri | Boston University
0 credits | 81 pages | 13.18 MB | 1 year ago
PyFlink 1.15 Documentation
… environments to use. ./bin/flink run-application -t yarn-application \ -Djobmanager.memory.process.size=1024m \ -Dtaskmanager.memory.process.size=1024m \ -Dyarn.application.name=\ -pyclientexec … could not meet. ./bin/flink run-application -t yarn-application \ -Djobmanager.memory.process.size=1024m \ -Dtaskmanager.memory.process.size=1024m \ -Dyarn.application.name= \ -Dyarn … zip and word_count.py for the above example. As it executes the job on the JobManager in YARN application mode, the paths specified in -pyarch and -py are paths relative to shipfiles which is the directory …
0 credits | 36 pages | 266.77 KB | 1 year ago
PyFlink 1.16 Documentation
… environments to use. ./bin/flink run-application -t yarn-application \ -Djobmanager.memory.process.size=1024m \ -Dtaskmanager.memory.process.size=1024m \ -Dyarn.application.name=\ -pyclientexec … could not meet. ./bin/flink run-application -t yarn-application \ -Djobmanager.memory.process.size=1024m \ -Dtaskmanager.memory.process.size=1024m \ -Dyarn.application.name= \ -Dyarn … zip and word_count.py for the above example. As it executes the job on the JobManager in YARN application mode, the paths specified in -pyarch and -py are paths relative to shipfiles which is the directory …
0 credits | 36 pages | 266.80 KB | 1 year ago
Introduction to Apache Flink and Apache Kafka - CS 591 K1: Data Stream Processing and Analytics Spring 2020
… factorValues)) } }) … Vasiliki Kalavri | Boston University 2020 … Configuration options: conf/flink-conf.yaml contains the configuration options as a collection of key-value pairs with format key:value … Default parallelism for jobs. You can override this option by using env.setParallelism() in your application. taskmanager.numberOfTaskSlots: The number of parallel operator or user function instances that … /bin/start-cluster.sh Stop Flink: ./bin/stop-cluster.sh Run an application with no arguments: ./bin/flink run ./examples/batch/WordCount.jar Run an application with input and output arguments: ./bin/flink run …
0 credits | 26 pages | 3.33 MB | 1 year ago
Fault-tolerance demo & reconfiguration - CS 591 K1: Data Stream Processing and Analytics Spring 2020
• To recover from failures, the system needs to • restart failed processes • restart the application and recover its state … Checkpointing guards the state from failures, but what about process … sufficient number of processing slots in order to execute all tasks of an application. • The JobManager cannot restart the application until enough slots become available. • Restart is automatic if there … standalone mode • The restart strategy determines how often the JobManager tries to restart the application and how long it waits between restart attempts. … TaskManager failures … Vasiliki Kalavri
0 credits | 41 pages | 4.09 MB | 1 year ago
Streaming in Apache Flink
… and shrinks • queryable: Flink state can be queried via a REST API … Rich Functions • open(Configuration c) • close() • getRuntimeContext() … DataStream> input = … DataStream > { private ValueState averageState; @Override public void open (Configuration conf) { ValueStateDescriptor descriptor = new ValueStateDescriptor<>("moving … String, String> { private ValueState blocked; @Override public void open(Configuration config) { blocked = getRuntimeContext().getState(new ValueStateDescriptor<>("blocked", Boolean …
0 credits | 45 pages | 3.00 MB | 1 year ago
State management - CS 591 K1: Data Stream Processing and Analytics Spring 2020
… handle object private var lastTempState: ValueState[Double] = _ override def open(parameters: Configuration): Unit = { // create state descriptor val lastTempDescriptor = new ValueStateDescriptor[Double]("lastTemp" … rideState; private ValueState fareState; @Override public void open(Configuration config) { // initialize the state descriptors here rideState = getRuntimeContext() …
0 credits | 24 pages | 914.13 KB | 1 year ago
Streaming optimizations - CS 591 K1: Data Stream Processing and Analytics Spring 2020
… output … Redundancy elimination variations: How can no-op or idempotent operators appear in an application? … Vasiliki Kalavri | Boston University 2020 … Ensure the combination of A1, A2 is equivalent … GET /dumprequest HTTP/1.1 Host: rve.org.uk Connection: keep-alive Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8 User-Agent: Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.22 …
0 credits | 54 pages | 2.83 MB | 1 year ago
Monitoring Apache Flink Applications (Introduction)
… (framework or user code), as well as each network shuffle, takes time and adds to latency. 5. If the application emits through a transactional sink, the sink will only commit and publish transactions upon successful … So far we have only looked at Flink-specific metrics. As long as latency & throughput of your application are in line with your expectations and it is checkpointing consistently, this is probably everything … to the size of your application state (check the checkpointing metrics [5] for an estimated size of the on-heap state). The possible reasons for growing state are very application-specific. Typically, an …
0 credits | 23 pages | 148.62 KB | 1 year ago