Course introduction - CS 591 K1: Data Stream Processing and Analytics Spring 2020
1/21: Introduction. Vasiliki Kalavri | Boston University 2020. Course Information • Instructor: Vasiliki Kalavri • Office: MCS 206 • Contact: vkalavri@bu.edu • Course Time ... At the end of the course, you will hopefully: • know when to use stream processing vs other technology • be able to comprehensively compare features and processing guarantees of streaming systems ... Analysis of real-time vehicle locations to improve traffic conditions • Provide real-time scheduling information for public transport • Optimize transport network flow and recommend alternative routes ... Example:
34 pages | 2.53 MB | 1 year ago

High-availability, recovery semantics, and guarantees - CS 591 K1: Data Stream Processing and Analytics Spring 2020
Post-failure output is identical to no-failure • Rollback recovery (at-least-once) • It avoids information loss • The output may contain duplicates • A backup needs to rebuild state of the failed node ... recovery (at-most-once) • It drops data during failure • The backup starts from most recent information ... Recovery semantics: Given a dataflow Q, let Oe be
49 pages | 2.08 MB | 1 year ago

PyFlink 1.15 Documentation
LONG() ... It should be noted that Types.BIG_INT() represents type information for the Java BigInteger, while Types.LONG() represents type information for a long integer. Several users are using Types.BIG_INT()
36 pages | 266.77 KB | 1 year ago

PyFlink 1.16 Documentation
LONG() ... It should be noted that Types.BIG_INT() represents type information for the Java BigInteger, while Types.LONG() represents type information for a long integer. Several users are using Types.BIG_INT()
36 pages | 266.80 KB | 1 year ago

Notions of time and progress - CS 591 K1: Data Stream Processing and Analytics Spring 2020
watermark in each passing record, e.g. if the stream contains special records that encode watermark information. ... val env = StreamExecutionEnvironment.getExecutionEnvironment env.setStreamTimeCharacteristi
22 pages | 2.22 MB | 1 year ago

Stream ingestion and pub/sub systems - CS 591 K1: Data Stream Processing and Analytics Spring 2020
[Figure: publish/subscribe operations: publish, subscribe(), unsubscribe(), notify()] advertise(): information regarding future events ... Publish/Subscribe Systems ... Pub/Sub levels of de-coupling • Space: interacting
33 pages | 700.14 KB | 1 year ago

Windows and triggers - CS 591 K1: Data Stream Processing and Analytics Spring 2020
holistic aggregates • Compute on most recent events only • when providing real-time traffic information, you probably don't care about an accident that happened 2 hours ago • Recent might mean different
35 pages | 444.84 KB | 1 year ago

Flow control and load shedding - CS 591 K1: Data Stream Processing and Analytics Spring 2020
• The load shedder continuously monitors input rates or other system metrics and can access information about the running query plan • It detects overload and decides what actions to take in order
43 pages | 2.42 MB | 1 year ago

Streaming optimizations - CS 591 K1: Data Stream Processing and Analytics Spring 2020
optimization • Stratis Viglas and Jeffrey Naughton. Rate-based Query Optimization for Streaming Information Sources. SIGMOD 2002. ... Further reading
54 pages | 2.83 MB | 1 year ago

Streaming languages and operator semantics - CS 591 K1: Data Stream Processing and Analytics Spring 2020
Lecture references • Gianpaolo Cugola and Alessandro Margara. Processing flows of information: From data stream to complex event processing. ACM Comput. Surv. 44, 3, Article 15 (June 2012)
53 pages | 532.37 KB | 1 year ago

11 results in total













