Streaming optimizations - CS 591 K1: Data Stream Processing and Analytics Spring 2020
Alternatives • data structures • sorting vs hashing • indexing, pre-fetching • minimize disk access • scheduling Objectives • optimize resource utilization or minimize resources • decrease ... operators are stateless Operator re-ordering: move selective operators upstream to filter data early. Vasiliki Kalavri | Boston University 2020. Profitability • Selectivity of A = 0.5 • Profitable ... • theta-join operations are commutative • natural joins are associative • Move projections early to reduce data item size • Pick join orderings to minimize the size of intermediate results
0 credits | 54 pages | 2.83 MB | 1 year ago

How Flink Analyzes CDC Data in the Iceberg Data Lake in Real Time
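Apply Deletion, Apply Deletion, Task-, Task-3, Task-2, Task-4. 1. Meets correctness requirements; 2. achieves high-throughput writes; 3. supports concurrent, efficient reads; 4. can implement snapshot-level incremental pulls. Solution summary, R点: 1. duplicate Delete Files within the same Task can be cached to speed up JOIN; 2. when a Delete File spills to Disk, consider using K7 -I1E5 DE1E6E ... -I1E5 D量FCI Transaction commit. Partition-2, Iceberg Metastore, Partition-1, Partition-3, f1 f2 f3, Iceberg Data Center ((2-1, Metastore Data Center ((2-2, f4, IcebergStreamWriter, IcebergStreamWriter
0 credits | 36 pages | 781.69 KB | 1 year ago

Monitoring Apache Flink Applications (Introduction)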
... successfully monitor your Flink application. I highly recommend that you start monitoring your Flink application early on in the development phase. This way you will be able to improve your dashboards and alerts over ... questions about the runtime behaviour of your application, and learn much more about Flink's internals early on. Last but not least, this post only scratches the surface of the overall metrics and monitoring ...
0 credits | 23 pages | 148.62 KB | 1 year ago

State management - CS 591 K1: Data Stream Processing and Analytics Spring 2020
Operator state is scoped to an operator task, i.e. records processed by the same parallel task have access to the same state • It cannot be accessed by other parallel tasks of the same or different operators ... that maintains the state for this key • State access is automatically scoped to the key of the current record so that all records with the same key access the same state. State management in Apache Flink ... • Keys and values are arbitrary byte arrays: serialization and deserialization are required to access the state via a Flink program. • The keys are ordered according to a user-specified comparator
0 credits | 24 pages | 914.13 KB | 1 year ago

Windows and triggers - CS 591 K1: Data Stream Processing and Analytics Spring 2020
... called with the key of the window, an Iterable to access the elements of the window, and a Collector to emit results. • A Context gives access to the metadata of the window (start and end timestamps ... custom logic for which predefined windows and transformations might not be suitable: • they provide access to record timestamps and watermarks • they can register timers that trigger at a specific time in the stream. ... Result records are emitted by passing them to the Collector. The Context object gives access to the timestamp and the key of the current record and to a TimerService. • onTimer(timestamp: ...
0 credits | 35 pages | 444.84 KB | 1 year ago

Stream processing fundamentals - CS 591 K1: Data Stream Processing and Analytics Spring 2020
... single row or groups of rows. Data Stream Management System • continuous queries • sequential data access, high-rate append-only updates. Data Warehouse • complex, offline analysis • large and relatively ...
DBMS vs. DSMS
                DBMS                    DSMS
Data            persistent relations    streams
Data Access     random                  sequential, single-pass
Updates         arbitrary               append-only
Update rates    relatively low          high, ...
0 credits | 45 pages | 1.22 MB | 1 year ago

Streaming in Apache Flink
@Override
public Tuple2 map(Tuple2 item) throws Exception {
    // access the state for this key
    MovingAverage average = averageState.value();
    // create a new MovingAverage ...
0 credits | 45 pages | 3.00 MB | 1 year ago

Flow control and load shedding - CS 591 K1: Data Stream Processing and Analytics Spring 2020
... processor • The load shedder continuously monitors input rates or other system metrics and can access information about the running query plan • It detects overload and decides what actions to take
0 credits | 43 pages | 2.42 MB | 1 year ago

Elasticity and state migration: Part I - CS 591 K1: Data Stream Processing and Analytics Spring 2020
... control command. Helper operators, hidden from the application developer. Helper operators have access to the downstream state. Live state migration
0 credits | 93 pages | 2.42 MB | 1 year ago
9 results in total