Scalable Stream Processing - Spark Streaming and Flinkthe external storage. • This mode works for output sinks that can be updated in place, such as a MySQL table. 59 / 79 Output Modes ▶ Three output modes: 1. Append: only the new rows appended to the the external storage. • This mode works for output sinks that can be updated in place, such as a MySQL table. 59 / 79 Output Modes ▶ Three output modes: 1. Append: only the new rows appended to the the external storage. • This mode works for output sinks that can be updated in place, such as a MySQL table. 59 / 79 Structured Streaming Example (1/3) ▶ Assume we receive (id, time, action) events0 码力 | 113 页 | 1.22 MB | 1 年前3
High-availability, recovery semantics, and guarantees - CS 591 K1: Data Stream Processing and Analytics Spring 2020recovery (at-least-once) • It avoids information loss • The output may contain duplicates • A backup needs to rebuild state of the failed node 8 Vasiliki Kalavri | Boston University 2020 Recovery output may contain duplicates • A backup needs to rebuild state of the failed node • Gap recovery (at-most-once) • It drops data during failure • The backup starts from most recent information Can you see any disadvantage in this approach? Vasiliki Kalavri | Boston University 2020 Upstream Backup Upstream nodes act as backups for their downstream operators by logging tuples in their output0 码力 | 49 页 | 2.08 MB | 1 年前3
Filtering and sampling streams - CS 591 K1: Data Stream Processing and Analytics Spring 2020Filter out all compromised passwords? • Remove duplicate tuples on recovery when using upstream backup? The membership problem ??? Vasiliki Kalavri | Boston University 2020 22 What data structure Filter out all compromised passwords? • Remove duplicate tuples on recovery when using upstream backup? The membership problem A hash table requires O(logn) bits per element which might still be0 码力 | 74 页 | 1.06 MB | 1 年前3
Fault-tolerance demo & reconfiguration - CS 591 K1: Data Stream Processing and Analytics Spring 2020automatic if there is a ResourceManager, e.g. in a YARN setup • A manual TaskManager re-start or a backup is required in standalone mode • The restart strategy determines how often the JobManager tries0 码力 | 41 页 | 4.09 MB | 1 年前3
PyFlink 1.15 Documentationfactories.DynamicTableFactory’ in the classpath . . . . . . . 26 1.3.4.2 O2: ClassNotFoundException: com.mysql.cj.jdbc.Driver . . . . . . . . . . . . . . 29 1.3.4.3 O3: NoSuchMethodError: org.apache.flink.table dependency management page of official PyFlink documentation. 1.3.4.2 O2: ClassNotFoundException: com.mysql.cj.jdbc.Driver py4j.protocol.Py4JJavaError: An error occurred while calling o13.execute. : org.apache java:575) at java.lang.Thread.run(Thread.java:748) Caused by: java.lang.ClassNotFoundException: com.mysql.cj.jdbc.Driver at java.net.URLClassLoader.findClass(URLClassLoader.java:382) at java.lang.ClassLoader0 码力 | 36 页 | 266.77 KB | 1 年前3
PyFlink 1.16 Documentationfactories.DynamicTableFactory’ in the classpath . . . . . . . 26 1.3.4.2 O2: ClassNotFoundException: com.mysql.cj.jdbc.Driver . . . . . . . . . . . . . . 29 1.3.4.3 O3: NoSuchMethodError: org.apache.flink.table dependency management page of official PyFlink documentation. 1.3.4.2 O2: ClassNotFoundException: com.mysql.cj.jdbc.Driver py4j.protocol.Py4JJavaError: An error occurred while calling o13.execute. : org.apache java:575) at java.lang.Thread.run(Thread.java:748) Caused by: java.lang.ClassNotFoundException: com.mysql.cj.jdbc.Driver at java.net.URLClassLoader.findClass(URLClassLoader.java:382) at java.lang.ClassLoader0 码力 | 36 页 | 266.80 KB | 1 年前3
Exactly-once fault-tolerance in Apache Flink - CS 591 K1: Data Stream Processing and Analytics Spring 20202020 Fault-tolerance approaches recap 3 Vasiliki Kalavri | Boston University 2020 Upstream Backup Upstream nodes act as backups for their downstream operators by logging tuples in their output0 码力 | 81 页 | 13.18 MB | 1 年前3
共 7 条
- 1













