Fault-tolerance demo & reconfiguration - CS 591 K1: Data Stream Processing and Analytics Spring 2020
Vasiliki Kalavri | Boston University 2020 • The JobManager is a single point of failure for Flink applications • It keeps metadata about application execution, such as pointers to completed checkpoints. ... parallelism • scale out to process increased load • scale in to save resources • Fix bugs or change business logic • Optimize execution plan • Change operator placement • skew and straggler mitigation ... software version ... Reconfiguration cases ... Streaming applications are long-running • Workload will change • Conditions might change • State is accumulated
0 points | 41 pages | 4.09 MB | 1 year ago

State management - CS 591 K1: Data Stream Processing and Analytics Spring 2020
... by a task and used to compute results: a local or instance variable that is accessed by a task’s business logic. Operator state is scoped to an operator task, i.e. records processed by the same parallel ... • Checkpoints state to a remote file system and supports incremental checkpoints • Use for applications with very large state ... Which backend to choose? Vasiliki Kalavri | Boston University 2020
0 points | 24 pages | 914.13 KB | 1 year ago

Streaming optimizations - CS 591 K1: Data Stream Processing and Analytics Spring 2020
Vasiliki Kalavri | Boston University 2020 ... Profitability • Running two applications together on a single core, one with operators B and C, the other with operators B and D. ... Redundancy elimination ... Multi-tenancy • in streaming systems that build one dataflow graph for several queries • when applications analyze data streams from a small set of sources • Operator elimination • remove a no-op, ... if the batched operator shares a lock with an upstream operator. • Satisfy deadlines: for applications with real-time constraints or QoS latency constraints. ... Batching: Process multiple data elements ...
0 points | 54 pages | 2.83 MB | 1 year ago

PyFlink 1.15 Documentation
... dianfu staff 295K 10 18 20:43 log4j-api-2.17.1.jar # -rw-r--r-- 1 dianfu staff 1.7M 10 18 20:43 log4j-core-2.17.1.jar # -rw-r--r-- 1 dianfu staff 24K 10 18 20:43 log4j-slf4j-impl-2.17.1.jar Please make sure ... Table Creation: Table is a core component of the Python Table API. A Table object describes a pipeline of data transformations. It ... QuickStart: DataStream API. Apache Flink offers a DataStream API for building robust, stateful streaming applications. It provides fine-grained control over state and timer, which allows for the implementation of ...
0 points | 36 pages | 266.77 KB | 1 year ago

PyFlink 1.16 Documentation
... dianfu staff 295K 10 18 20:43 log4j-api-2.17.1.jar # -rw-r--r-- 1 dianfu staff 1.7M 10 18 20:43 log4j-core-2.17.1.jar # -rw-r--r-- 1 dianfu staff 24K 10 18 20:43 log4j-slf4j-impl-2.17.1.jar Please make sure ... Table Creation: Table is a core component of the Python Table API. A Table object describes a pipeline of data transformations. It ... QuickStart: DataStream API. Apache Flink offers a DataStream API for building robust, stateful streaming applications. It provides fine-grained control over state and timer, which allows for the implementation of ...
0 points | 36 pages | 266.80 KB | 1 year ago

Course introduction - CS 591 K1: Data Stream Processing and Analytics Spring 2020
... proficient in using Apache Flink and Kafka to build end-to-end, scalable, and reliable streaming applications • have a solid understanding of how stream processing systems work and what factors affect their ... of the challenges and trade-offs one needs to consider when designing and deploying streaming applications ... Vasiliki Kalavri | Boston University 2020 ... Grading Scheme (1) • No Exam • 5 in-class quizzes ... virtual machine to run Flink in a UNIX environment. • A Java 8.x installation. To develop Flink applications and use its DataStream API in Java or Scala you will need a Java JDK. A Java JRE is not sufficient
0 points | 34 pages | 2.53 MB | 1 year ago

Apache Flink: Past, Present, and Future
[Diagram: offline / Real-time; Batch Processing, Continuous Processing & Streaming Analytics, Event-driven Applications] ... Present: architecture changes in Flink 1.9: Runtime, Distributed Streaming Dataflow, Query Processor, DAG & StreamOperator ... Future: Micro Services [Diagram: Order, Inventory, Payment] ... Scan the QR code to join the community and Code Up with like-minded developers. Alibaba Cloud Developer Community, Apache Flink China Group 2. Thank you!
0 points | 33 pages | 3.36 MB | 1 year ago

Stream ingestion and pub/sub systems - CS 591 K1: Data Stream Processing and Analytics Spring 2020
... Complex Event Processing (CEP) systems ... Google Cloud Pub/Sub: Publishers and Subscribers are applications. ... Use-cases • Balancing workloads in network clusters • tasks can be efficiently distributed ... lecture was assembled from the following sources: • Martin Kleppmann. Designing data-intensive applications (O’Reilly Media) • Patrick Th. Eugster, Pascal A. Felber, Rachid Guerraoui, and Anne-Marie ...
0 points | 33 pages | 700.14 KB | 1 year ago

Stream processing fundamentals - CS 591 K1: Data Stream Processing and Analytics Spring 2020
Vasiliki Kalavri | Boston University 2020 ... What is a stream? • In traditional data processing applications, we know the entire dataset in advance, e.g. tables stored in a database. A data stream is ... Summary: Today you learned: • stream representations, stream processing models • streaming applications and use-cases • different approaches to data management • the relational streaming model vs ...
0 points | 45 pages | 1.22 MB | 1 year ago

Monitoring Apache Flink Applications (Introduction)
Original article: https://www.ververica.com/blog/monitoring-apache-flink-applications-101 This blog post introduces Apache Flink's built-in monitoring and metrics system, which lets developers monitor their Flink jobs effectively. Typically, for ... just starting to use Apache Flink for stream ... NonHeap, Direct & Mapped memory for JobManagers and TaskManagers. • Heap memory - as with most JVM applications - is the most volatile and important metric to watch. This is especially true when using Flink’s ...
0 points | 23 pages | 148.62 KB | 1 year ago