Fault-tolerance demo & reconfiguration - CS 591 K1: Data Stream Processing and Analytics, Spring 2020
Vasiliki Kalavri | Boston University 2020 • To recover from failures, the system needs to • restart failed processes • restart the application and recover its state ... Checkpointing guards the ... about process failure? High-availability ... Flink processes ... • Flink requires a sufficient number of processing ... and all required metadata, such as the application’s JAR file, into a remote persistent storage system • ZooKeeper also holds state handles and checkpoint locations ... JobManager failures ...
0 credits | 41 pages | 4.09 MB | 1 year ago
High-availability, recovery semantics, and guarantees - CS 591 K1: Data Stream Processing and Analytics, Spring 2020
... fully processed? Was mo delivered downstream? Vasiliki Kalavri | Boston University 2020 A simple system model [Figure labels: stream sources; N1, N2, …, NK; input queue; output queue; primary nodes; secondary nodes] ... other semantics depend on the operator type: • arbitrary: it depends on order, randomness, or an external system • deterministic: it produces the same output when starting from the same initial state and given ... Can you think of an operator that provides correct, possibly repeating, results even if it re-processes tuples after recovery? ... Processing guarantees and result ...
0 credits | 49 pages | 2.08 MB | 1 year ago
Stream ingestion and pub/sub systems - CS 591 K1: Data Stream Processing and Analytics, Spring 2020
... broker: a system that connects event producers with event consumers. • It receives messages from the producers and pushes them to the consumers. • A TCP connection is a simple messaging system which connects one sender with one recipient. • A general messaging system connects multiple producers to multiple consumers by organizing messages into topics. [Figure labels: Message Broker; producer (x3)] ... generates events at a high rate, we can balance the load by spawning several consumer processes • The broker can choose to send messages to consumers in a round-robin fashion ... Communication ...
0 credits | 33 pages | 700.14 KB | 1 year ago
Elasticity and state migration: Part I - CS 591 K1: Data Stream Processing and Analytics, Spring 2020
... Mechanism: How to apply the re-configuration? • Detect environment changes: external workload and system performance • Identify bottleneck operators, straggler workers, skew • Enumerate scaling actions ... decide which and when to apply • Allocate new resources, spawn new processes or release unused resources, safely terminate processes • Adjust dataflow channels and network connections • Re-partition ... processing a tuple and all its derived results • Policy • each operator as a single-server queuing system • generalized Jackson networks • Action • predictive, at-once for all operators ...
0 credits | 93 pages | 2.42 MB | 1 year ago
Scalable Stream Processing - Spark Streaming and Flink
Spark Streaming ... batch jobs. • Chops up the live stream into batches of X seconds. • Treats each batch as RDDs and processes them using RDD operations. • Discretized Stream Processing (DStream) ... DStream (1/2) ...
0 credits | 113 pages | 1.22 MB | 1 year ago
Introduction to Apache Flink and Apache Kafka - CS 591 K1: Data Stream Processing and Analytics, Spring 2020
Vasiliki Kalavri | Boston University 2020 A distributed and fault-tolerant publish-subscribe messaging system ... and serves as the ingestion, storage, and messaging layer for large production streaming pipelines ... publishes a stream of records to a Kafka topic and a consumer subscribes to one or more topics and processes the stream of records published in them. Topics are multi-subscriber, i.e. a topic can have zero ... consumer instance within each subscribing consumer group. Consumer instances can be in separate processes or on separate machines. If all the consumer instances have the same consumer group, then the ...
0 credits | 26 pages | 3.33 MB | 1 year ago
Monitoring Apache Flink Applications (Introduction)
... 4.14 System Resources ... is processed by Apache Flink, which then writes the results to a database or calls a downstream system. In such a pipeline, latency can be introduced at each stage and for various reasons including the ... TaskManager (in case of a containerized setup), or by providing more TaskManagers. In general, a system already running under very high load during normal operations will need much more time to catch up ...
0 credits | 23 pages | 148.62 KB | 1 year ago
State management - CS 591 K1: Data Stream Processing and Analytics, Spring 2020
... types • The system is unaware of which parts of an operator constitute state ... Streaming state • Explicit state primitives including state types and interfaces • The system is aware of state ... persistent storage, e.g. a distributed filesystem or a database system • Available state backends in Flink: • In-memory • File system • RocksDB ... State backends ... Vasiliki Kalavri | Boston University ... purposes! FsStateBackend • Stores state on TaskManager’s heap but checkpoints it to a remote file system • In-memory speed for local accesses and fault tolerance • Limited to TaskManager’s memory and might ...
0 credits | 24 pages | 914.13 KB | 1 year ago
Exactly-once fault-tolerance in Apache Flink - CS 591 K1: Data Stream Processing and Analytics, Spring 2020
... Active Standby • The secondary receives tuples from upstream and processes them in parallel with the primary [Figure labels: Ni primary; N’i secondary; I1; O1] ... retrieve a distributed cut in a system execution that yields a system configuration • Validity (safety): Obtain a valid system configuration • Termination (liveness): A full system configuration is eventually captured ... A snapshot algorithm attempts to capture a coherent global state of a distributed system ... Vasiliki Kalavri | Boston University 2020 Snapshotting Protocols [Figure labels: p1, p2, p3, C, m] ...
0 credits | 81 pages | 13.18 MB | 1 year ago
PyFlink 1.15 Documentation
... processing pipelines, large-scale exploratory data analysis, Machine Learning (ML) pipelines and ETL processes. If you’re already familiar with Python and libraries such as Pandas, then PyFlink makes it simpler ... 2 GB. If the size of an archive file is more than 2 GB, you could upload it to a distributed file system and then use the path in the command line option -pyarch. • Mixed use of the above options: You could ... pyflink-docs, Release release-1.15 ... 1.1.1.5 Kubernetes: Kubernetes is a popular container-orchestration system for automating computer application deployment, scaling, and management. This page shows you how ...
0 credits | 36 pages | 266.77 KB | 1 year ago
21 results in total













