PyFlink 1.15 Documentation
TableEnvironment. TableEnvironment is responsible for: • Table management: Table Creation, listing Tables, Conversion between Table and DataStream, etc. • User-defined function management: User-defined function registration +----+----------------------+--------------------------------+ 2 rows in set PyFlink Table also provides the conversion back to a pandas DataFrame to leverage pandas API. [14]: table.to_pandas() [14]: id data 0 1 n/beam/beam_operations_fast.pyx", line 158, in pyflink.fn_execution.beam.beam_operations_fast.FunctionOperation.process File "pyflink/fn_execution/beam/beam_operations_fast.pyx", line 174, in pyflink
0 码力 | 36 pages | 266.77 KB | 1 year ago
PyFlink 1.16 Documentation
TableEnvironment. TableEnvironment is responsible for: • Table management: Table Creation, listing Tables, Conversion between Table and DataStream, etc. • User-defined function management: User-defined function registration +----+----------------------+--------------------------------+ 2 rows in set PyFlink Table also provides the conversion back to a pandas DataFrame to leverage pandas API. [14]: table.to_pandas() [14]: id data 0 1 n/beam/beam_operations_fast.pyx", line 158, in pyflink.fn_execution.beam.beam_operations_fast.FunctionOperation.process File "pyflink/fn_execution/beam/beam_operations_fast.pyx", line 174, in pyflink
0 码力 | 36 pages | 266.80 KB | 1 year ago
High-availability, recovery semantics, and guarantees - CS 591 K1: Data Stream Processing and Analytics Spring 2020
failures and guarantee correct results after recovery? • how can we ensure minimal downtime and fast recovery? • how can we hide recovery side-effects from downstream applications? Vasiliki Kalavri recovery? • How much input do we need to re-play? How expensive is it to re-construct the state? How fast can we de-duplicate output? Vasiliki Kalavri | Boston University 2020 Gap Recovery • Restart the
0 码力 | 49 pages | 2.08 MB | 1 year ago
Flow control and load shedding - CS 591 K1: Data Stream Processing and Analytics Spring 2020
University 2020 DSMS with load shedder 8 Synopsis Maintenance Synopsis for S1 Synopsis for Sr … Fast approximate answers … S1 S2 Sr Input Manager Scheduler QoS Monitor Load Shedder Query caused by high congestion. • In the presence of bursty traffic, CFC causes backpressure to build up fast and propagate along congested VCs to their sources which can be throttled. • Essentially, CFC
0 码力 | 43 pages | 2.42 MB | 1 year ago
Stream processing fundamentals - CS 591 K1: Data Stream Processing and Analytics Spring 2020
propagation synchronized asynchronous Data historical recent and historical ETL process complex fast and light-weight ETL: Extract-Transform-Load e.g. unzipping compressed files, data cleaning and arrival order • Small space: memory footprint poly-logarithmic in the stream size • Low time: fast update and query times • Delete-proof: synopses can handle both insertions and deletions in an
0 码力 | 45 pages | 1.22 MB | 1 year ago
Elasticity and state migration: Part I - CS 591 K1: Data Stream Processing and Analytics Spring 2020
7 ▸ Accuracy ▸ no over/under-provisioning ▸ Stability ▸ no oscillations ▸ Performance ▸ fast convergence scaling controller detect symptoms decide whether to scale decide how much Hoffmann, Desislava Dimitrova, Matthew Forshaw, and Timothy Roscoe. Three steps is all you need: fast, accurate, automatic scaling decisions for distributed streaming dataflows. (OSDI’18). • Moritz
0 码力 | 93 pages | 2.42 MB | 1 year ago
Notions of time and progress - CS 591 K1: Data Stream Processing and Analytics Spring 2020
University 2020 Mobile game application • input stream: user activity • output: rewards based on how fast the user meets goals • e.g. pop 500 bubbles within 1 minute and get extra life Vasiliki Kalavri
0 码力 | 22 pages | 2.22 MB | 1 year ago
Stream ingestion and pub/sub systems - CS 591 K1: Data Stream Processing and Analytics Spring 2020
• can be connected to the network • latency and unpredictable delays • might be producing too fast • stream processor needs to keep up and not shed load • might be producing too slow or become
0 码力 | 33 pages | 700.14 KB | 1 year ago
Fault-tolerance demo & reconfiguration - CS 591 K1: Data Stream Processing and Analytics Spring 2020
?? Vasiliki Kalavri | Boston University 2020 • Evenly distributes keys across parallel tasks • Fast to compute, no routing state • High migration cost • When a new node is added, state is shuffled
0 码力 | 41 pages | 4.09 MB | 1 year ago
9 results in total
 













