Monitoring Apache Flink Applications (Getting Started), by caolei. Status.JVM.Memory.NonHeap.Committed (job-/taskmanager): the amount of non-heap memory guaranteed to be available to the JVM (in bytes). Status.JVM.Memory.Heap.Used (job-/taskmanager): the amount of heap memory currently used (in bytes). Status.JVM.Memory.Heap.Committed (job-/taskmanager): the amount of heap memory guaranteed to be available to the JVM (in bytes). (Progress and Throughput Monitoring, p. 18) The Status.JVM.Memory metrics are a starting point when you first think about how to successfully monitor your Flink application. I highly recommend starting to monitor your Flink application early in the development phase. This way… | 23 pages | 148.62 KB | 1 year ago
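The metrics in this snippet combine naturally into a heap-utilization ratio (Heap.Used over Heap.Committed). A minimal sketch, assuming the metric values have already been fetched; the numbers below are illustrative, not from a live cluster:

```python
def heap_utilization(used_bytes: int, committed_bytes: int) -> float:
    """Fraction of the committed heap currently in use."""
    if committed_bytes <= 0:
        raise ValueError("committed heap must be positive")
    return used_bytes / committed_bytes

# Illustrative Status.JVM.Memory.Heap.Used / .Committed values, in bytes.
used, committed = 512 * 1024**2, 1024**3
print(f"heap utilization: {heap_utilization(used, committed):.0%}")  # heap utilization: 50%
```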
Flow control and load shedding - CS 591 K1: Data Stream Processing and Analytics, Spring 2020 (Vasiliki Kalavri, Boston University). Keeping up with the producers: • Producers …event rate? • drop messages • buffer messages in a queue: what if the queue grows larger than available memory? • block the producer (back-pressure, flow control) …their receivers, and receivers regularly send notifications upstream containing their number of available credits. • One credit corresponds to some amount of buffer space, so that a sender can know… | 43 pages | 2.42 MB | 1 year ago
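The credit mechanism sketched in this excerpt can be illustrated in a few lines. This is a toy model, not Flink's actual network stack; the Receiver/Sender names and the one-message-per-credit simplification are assumptions made here:

```python
from collections import deque

class Receiver:
    """Grants credits: each credit corresponds to one free buffer slot."""
    def __init__(self, buffer_slots: int):
        self.buffer = deque()
        self.capacity = buffer_slots

    def credits(self) -> int:
        # The notification sent upstream: number of free buffer slots.
        return self.capacity - len(self.buffer)

    def deliver(self, msg) -> None:
        assert len(self.buffer) < self.capacity, "sender exceeded its credits"
        self.buffer.append(msg)

    def consume(self):
        return self.buffer.popleft()

class Sender:
    """Sends only as many messages as the receiver's announced credits allow."""
    def send(self, msgs, receiver: Receiver) -> int:
        granted = msgs[: receiver.credits()]  # back-pressure: stop when credits run out
        for msg in granted:
            receiver.deliver(msg)
        return len(granted)

rx = Receiver(buffer_slots=2)
tx = Sender()
print(tx.send(["a", "b", "c"], rx))  # 2  (only 2 credits available)
rx.consume()                         # freeing a slot restores one credit
print(tx.send(["c"], rx))            # 1
```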
Fault-tolerance demo & reconfiguration - CS 591 K1: Data Stream Processing and Analytics, Spring 2020 (Vasiliki Kalavri, Boston University). …of an application. • The JobManager cannot restart the application until enough slots become available. • Restart is automatic if there is a ResourceManager, e.g. in a YARN setup. • A manual … TaskManager … restarts the application and resets the state of all its tasks to the last completed checkpoint. Highly available Flink setup. …To avoid repeating failures, Flink… | 41 pages | 4.09 MB | 1 year ago
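Resetting tasks to the last completed checkpoint, as described in this excerpt, boils down to the following sketch. This is a toy single-task model, not Flink's distributed checkpointing protocol; all names here are hypothetical:

```python
class CheckpointedTask:
    """On recovery, state is reset to the last completed checkpoint."""
    def __init__(self):
        self.state = 0
        self._checkpoints = []  # completed checkpoints, newest last

    def process(self, value: int) -> None:
        self.state += value

    def checkpoint(self) -> None:
        self._checkpoints.append(self.state)  # mark a completed checkpoint

    def recover(self) -> None:
        # Restart: reset state to the last completed checkpoint (or initial state).
        self.state = self._checkpoints[-1] if self._checkpoints else 0

task = CheckpointedTask()
task.process(5)
task.checkpoint()   # last completed checkpoint: state == 5
task.process(7)     # state == 12, not yet checkpointed
task.recover()      # simulated failure + restart
print(task.state)   # 5
```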
Filtering and sampling streams - CS 591 K1: Data Stream Processing and Analytics, Spring 2020 (Vasiliki Kalavri, Boston University). Advantages of sampling … • It might be unsuitable for highly selective queries: • queries that depend only upon a few tuples from the dataset • Providing… | 74 pages | 1.06 MB | 1 year ago
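These slides weigh sampling trade-offs; one standard stream-sampling technique, reservoir sampling, keeps a uniform sample of k items over an unbounded stream. It is used here purely as an illustration and is not necessarily the exact algorithm in the slides:

```python
import random

def reservoir_sample(stream, k: int, rng=random.Random(0)):
    """Maintain a uniform random sample of k items from a stream of unknown length."""
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            sample.append(item)          # fill the reservoir first
        else:
            j = rng.randint(0, i)        # keep item with probability k/(i+1)
            if j < k:
                sample[j] = item
    return sample

print(reservoir_sample(range(1000), 5))
```

Note the lecture's caveat applies: for a highly selective query that depends on only a few specific tuples, such a sample is unlikely to contain them.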
PyFlink 1.15 Documentation. (Suppose miniconda.) Run: # Download and install miniconda; the latest miniconda installers are available at https://repo.anaconda.com/miniconda/ # Suppose the name of the downloaded miniconda installer … IDE. Set up Python environment: it requires Python 3.6 or above with PyFlink pre-installed to be available in your local environment. It is suggested to use Python virtual environments to set up your local … cluster. Set up Python environment: it requires Python 3.6 or above with PyFlink pre-installed to be available on the nodes of the standalone cluster. It is suggested to use Python virtual environments to set… | 36 pages | 266.77 KB | 1 year ago
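A quick pre-flight check for the "Python 3.6 or above" requirement mentioned in this entry might look like the following. The helper function is hypothetical, not part of PyFlink; the version bound comes from the docs snippet above:

```python
import sys

def meets_pyflink_requirement(version=sys.version_info[:2]) -> bool:
    """The docs above require Python 3.6 or above for PyFlink."""
    return tuple(version) >= (3, 6)

print(meets_pyflink_requirement((3, 10)))  # True
print(meets_pyflink_requirement((3, 5)))   # False
```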
PyFlink 1.16 Documentation. (Suppose miniconda.) Run: # Download and install miniconda; the latest miniconda installers are available at https://repo.anaconda.com/miniconda/ # Suppose the name of the downloaded miniconda installer … IDE. Set up Python environment: it requires Python 3.6 or above with PyFlink pre-installed to be available in your local environment. It is suggested to use Python virtual environments to set up your local … cluster. Set up Python environment: it requires Python 3.6 or above with PyFlink pre-installed to be available on the nodes of the standalone cluster. It is suggested to use Python virtual environments to set… | 36 pages | 266.80 KB | 1 year ago
Scalable Stream Processing - Spark Streaming and Flink. …Spark's memory for processing. ▶ Three categories of streaming sources: 1. Basic sources directly available in the StreamingContext API, e.g., file systems, socket connections. 2. Advanced sources, e.g. … …off explicitly by a call to the start() method. ▶ DStreams support many of the transformations available on normal Spark RDDs. Transformations (2/4) ▶ map: returns a new DStream by passing… | 113 pages | 1.22 MB | 1 year ago
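The map transformation named in this excerpt is applied batch-by-batch in Spark's micro-batch model. A plain-Python imitation of that behavior, not actual Spark API, where each inner list stands in for one micro-batch:

```python
def dstream_map(batches, fn):
    """Apply fn to every record of every micro-batch, yielding transformed batches."""
    for batch in batches:
        yield [fn(record) for record in batch]

batches = [[1, 2], [3], [4, 5]]  # three micro-batches
doubled = list(dstream_map(batches, lambda x: x * 2))
print(doubled)  # [[2, 4], [6], [8, 10]]
```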
Streaming optimizations - CS 591 K1: Data Stream Processing and Analytics, Spring 2020. …required by a fused operator should remain available. • Ensure resource amounts: the total amount of resources required by the fused operator must be available on a single host. • Avoid infinite recursion. … • The optimizer can interact with the scheduler and fuse operators according to the number of available cores/threads. • Fused operators can share the address space but use separate threads of control… | 54 pages | 2.83 MB | 1 year ago
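Operator fusion as summarized in this excerpt, merging adjacent operators into a single call so no queue or thread handoff sits between them, reduces to function composition in its simplest form. A sketch that ignores the scheduler interaction and resource checks the slides mention:

```python
def fuse(*operators):
    """Fuse a pipeline of per-record operators into one function: a record flows
    through all of them in a single call, with no intermediate buffering."""
    def fused(record):
        for op in operators:
            record = op(record)
        return record
    return fused

# Two separate operators become one fused call per record.
pipeline = fuse(lambda x: x + 1, lambda x: x * 10)
print([pipeline(x) for x in (0, 1, 2)])  # [10, 20, 30]
```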
State management - CS 591 K1: Data Stream Processing and Analytics, Spring 2020 (Vasiliki Kalavri, 2020). …state to remote and persistent storage, e.g. a distributed filesystem or a database system. • Available state backends in Flink: • In-memory • File system • RocksDB. … • RocksDB is a persistent key-value store: data lives on disk, state can grow larger than available memory, and will not be lost upon failure. • Keys and values are arbitrary byte arrays: serialization… | 24 pages | 914.13 KB | 1 year ago
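Because RocksDB keys and values are arbitrary byte arrays, as this excerpt notes, every state access passes through serialization. A minimal in-memory stand-in showing that byte-oriented contract; pickle is chosen only for illustration, since Flink uses its own serializers:

```python
import pickle

class ByteStateBackend:
    """Stores serialized bytes only, mimicking RocksDB's key/value contract."""
    def __init__(self):
        self._store = {}  # bytes -> bytes

    def put(self, key, value) -> None:
        self._store[pickle.dumps(key)] = pickle.dumps(value)

    def get(self, key):
        raw = self._store.get(pickle.dumps(key))
        return None if raw is None else pickle.loads(raw)

backend = ByteStateBackend()
backend.put(("user", 42), {"count": 3})
print(backend.get(("user", 42)))  # {'count': 3}
```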
Stream processing fundamentals - CS 591 K1: Data Stream Processing and Analytics, Spring 2020 (Boston University). …database. A data stream is a data set that is produced incrementally over time, rather than being available in full before its processing begins. • Data streams are high-volume, real-time data that might… Properties of data streams: • They arrive continuously instead of being available a priori. • They bear an arrival and/or a generation timestamp. • They are produced by external… | 45 pages | 1.22 MB | 1 year ago
13 results in total · Page 1 · Page 2