Streaming optimizations - CS 591 K1: Data Stream Processing and Analytics Spring 2020
selectivity > 1 • a filter operator typically has selectivity < 1 Is selectivity always known at development time? ??? Vasiliki Kalavri | Boston University 2020 Types of Parallelism 7 B A C A B D perform an equivalent computation • Ensure mergeable state: even a simple counter might differ on a combined stream vs. on separate streams Redundancy elimination Eliminate redundant operations, aka subgraph
0 码力 | 54 pages | 2.83 MB | 1 year ago
Stream ingestion and pub/sub systems - CS 591 K1: Data Stream Processing and Analytics Spring 2020
• MBs assume a small working set. If consumers are slow, throughput might degrade. • DBs support secondary indexes for efficient search while MBs only offer topic-based subscription. • DB query the form of name-value pairs and basic comparison operators. • Constraints can be logically combined to form complex event patterns. • company == ‘Uber’ and price < 100 • Predecessors of Complex
0 码力 | 33 pages | 700.14 KB | 1 year ago
Stream processing fundamentals - CS 591 K1: Data Stream Processing and Analytics Spring 2020
6 Vasiliki Kalavri | Boston University 2020 1. Process events online without storing them 2. Support a high-level language (e.g. StreamSQL) 3. Handle missing, out-of-order, delayed data 4. Guarantee Combine batch (historical) and stream processing 6. Ensure availability despite failures 7. Support distribution and automatic elasticity 8. Offer low-latency 7 2005 Vasiliki Kalavri | Boston limitation on the stream: updates cannot change past entries in A. 11 Useful in theory for the development of streaming algorithms With limited practical value in distributed, real-world settings Vasiliki
0 码力 | 45 pages | 1.22 MB | 1 year ago
PyFlink 1.15 Documentation
contains its own Python executable files and the installed Python packages. It is useful for local development to create a standalone Python environment and also useful when deploying a PyFlink job to production Local This page shows you how to set up PyFlink development environment in your local machine. This is usually used for local execution or development in an IDE. Set up Python environment It requires ExecNodeBase.translateToPlan(ExecNodeBase.java:134) This is an issue around Java 17. It still doesn’t support Java 17 in Flink. You can refer to FLINK-15736 for more details. To solve this issue, you need to
0 码力 | 36 pages | 266.77 KB | 1 year ago
PyFlink 1.16 Documentation
contains its own Python executable files and the installed Python packages. It is useful for local development to create a standalone Python environment and also useful when deploying a PyFlink job to production Local This page shows you how to set up PyFlink development environment in your local machine. This is usually used for local execution or development in an IDE. Set up Python environment It requires ExecNodeBase.translateToPlan(ExecNodeBase.java:134) This is an issue around Java 17. It still doesn’t support Java 17 in Flink. You can refer to FLINK-15736 for more details. To solve this issue, you need to
0 码力 | 36 pages | 266.80 KB | 1 year ago
State management - CS 591 K1: Data Stream Processing and Analytics Spring 2020
operations and types 4 Consider you are designing a state interface. What operations should state support? What state types can you think of? • Count, sum, list, map, … Vasiliki Kalavri | Boston University Checkpoints sent to JobManager's heap memory, i.e. the state is lost in case of failure • Use only for development and debugging purposes! FsStateBackend • Stores state on TaskManager’s heap but checkpoints it
0 码力 | 24 pages | 914.13 KB | 1 year ago
Monitoring Apache Flink Applications (Getting Started)
Flink application. I highly recommend to start monitoring your Flink application early on in the development phase. This way you will be able to improve your dashboards and alerts over time and, more importantly, observe the performance impact of the changes to your application throughout the development phase. By doing so, you can ask the right questions about the runtime behaviour of your application
0 码力 | 23 pages | 148.62 KB | 1 year ago
Course introduction - CS 591 K1: Data Stream Processing and Analytics Spring 2020
Java JDK. A Java JRE is not sufficient! • Apache Maven 3.x. • An IDE for Java and/or Scala development, such as IntelliJ IDEA (preferred), Eclipse, or Netbeans with appropriate plugins installed.
0 码力 | 34 pages | 2.53 MB | 1 year ago
Windows and triggers - CS 591 K1: Data Stream Processing and Analytics Spring 2020
iterate over the list of all collected elements when evaluated: • They require more space but support more complex logic. • ProcessWindowFunction Window functions 14 Vasiliki Kalavri | Boston University
0 码力 | 35 pages | 444.84 KB | 1 year ago
Streaming languages and operator semantics - CS 591 K1: Data Stream Processing and Analytics Spring 2020
value of B 20 Vasiliki Kalavri | Boston University 2020 What kind of queries can we express and support on data streams? 21 Vasiliki Kalavri | Boston University 2020 Non-blocking (monotonic) queries
0 码力 | 53 pages | 532.37 KB | 1 year ago
11 results in total