PyFlink 1.15 Documentationdeploying a PyFlink job to production when there are massive Python dependencies. It’s supported to use Python virtual environment in your PyFlink jobs, see PyFlink Dependency Management for more details --python /path/to/python/executable venv The virtual environment needs to be activated before to use it. To activate the virtual environment, run: source venv/bin/activate That is, execute the activate conda create --name venv python=3.8 -y The conda virtual environment needs to be activated before to use it. To activate the conda virtual environment, run: 4 Chapter 1. How to build docs locally pyflink-docs0 码力 | 36 页 | 266.77 KB | 1 年前3
PyFlink 1.16 Documentationdeploying a PyFlink job to production when there are massive Python dependencies. It’s supported to use Python virtual environment in your PyFlink jobs, see PyFlink Dependency Management for more details --python /path/to/python/executable venv The virtual environment needs to be activated before to use it. To activate the virtual environment, run: source venv/bin/activate That is, execute the activate conda create --name venv python=3.8 -y The conda virtual environment needs to be activated before to use it. To activate the conda virtual environment, run: 4 Chapter 1. How to build docs locally pyflink-docs0 码力 | 36 页 | 266.80 KB | 1 年前3
Cardinality and frequency estimation - CS 591 K1: Data Stream Processing and Analytics Spring 2020University 2020 How can we count the number of distinct elements seen so far in a stream? 3 Example use-case: Distinct users visiting one or multiple webpages ??? Vasiliki Kalavri | Boston University 2020 2020 How can we count the number of distinct elements seen so far in a stream? 3 Example use-case: Distinct users visiting one or multiple webpages Naive solution: maintain a hash table ??? Vasiliki University 2020 How can we count the number of distinct elements seen so far in a stream? 3 Example use-case: Distinct users visiting one or multiple webpages Naive solution: maintain a hash table Convert0 码力 | 69 页 | 630.01 KB | 1 年前3
Filtering and sampling streams - CS 591 K1: Data Stream Processing and Analytics Spring 2020proportion of the stream, e.g. 1/10th 7 search enginequery stream Example use-case: Web search user behavior study Q: How many queries did users repeat last month? ??? Vasiliki we can store 1/10th of the stream, we select a stream element i with probability 10%. • We can use a random generator that produces an integer ri between 0 and 9. We then select an input element i we can store 1/10th of the stream, we select a stream element i with probability 10%. • We can use a random generator that produces an integer ri between 0 and 9. We then select an input element i 0 码力 | 74 页 | 1.06 MB | 1 年前3
Course introduction - CS 591 K1: Data Stream Processing and Analytics Spring 2020| Boston University 2020 Outcomes At the end of the course, you will hopefully: • know when to use stream processing vs other technology • be able to comprehensively compare features and processing announcements Vasiliki Kalavri | Boston University 2020 Guest Lectures • Learn about real-world use-cases of stream processing in industry • Learn from experts with decades of hands-on experience in the Official Semester Dates 11 Vasiliki Kalavri | Boston University 2020 Final Project You will use Apache Flink and Kafka to build a real-time monitoring and anomaly detection framework for datacenters0 码力 | 34 页 | 2.53 MB | 1 年前3
Windows and triggers - CS 591 K1: Data Stream Processing and Analytics Spring 2020Window sensor readings Vasiliki Kalavri | Boston University 2020 In the DataStream API, you can use the time characteristic to tell Flink how to define time when you are creating windows. The time characteristic streaming execution environment val env = StreamExecutionEnvironment.getExecutionEnvironment // use event time for the application env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime) Vasiliki Kalavri | Boston University 2020 Time-based window assigners for the most common windowing use cases: • They assign an element based on its event-time timestamp or the current processing time0 码力 | 35 页 | 444.84 KB | 1 年前3
Stream ingestion and pub/sub systems - CS 591 K1: Data Stream Processing and Analytics Spring 2020Databases • DBs keep data until explicitly deleted while MBs delete messages once consumed. • Use a database for long-term data storage! • MBs assume a small working set. If consumers are slow, throughput topic, too. • Topic names are represented with URL-like notation and some systems also allow the use of wildcards. 21 Content-based Pub/Sub • Events are grouped according to event properties or contents Processing (CEP) systems 22 Google Cloud Pub/Sub Publishers and Subscribers are applications. 23 Use-cases • Balancing workloads in network clusters • tasks can be efficiently distributed among multiple0 码力 | 33 页 | 700.14 KB | 1 年前3
Streaming optimizations - CS 591 K1: Data Stream Processing and Analytics Spring 2020split merge split When might this be beneficial? ??? Vasiliki Kalavri | Boston University 2020 • Use equivalence transformation rules if the language allows • selection operations are commutative Fused operators can share the address space but use separate threads of control • avoid communication cost without losing pipeline parallelism • use a shared buffer for communication • Fused filters deterministic batch computations on small time intervals • Keep intermediate state in memory • Use Spark's RDDs instead of replication • Parallel recovery mechanism in case of failures 44 input0 码力 | 54 页 | 2.83 MB | 1 年前3
Streaming languages and operator semantics - CS 591 K1: Data Stream Processing and Analytics Spring 2020is called a sequence, of length n, of tuples from R. The empty sequence [ ] has length 0. We use t ∈ S to denote that, for some 1 ≤ i ≤ n, ti = t. 23 Vasiliki Kalavri | Boston University 2020 Model of departments that satisfy this query However this sum query cannot be expressed without the use of aggregates! 31 Non-blocking SQL Vasiliki Kalavri | Boston University 2020 SQL extensions and University 2020 SQL extensions for streams Why SQL-based approaches? • Ideally, we would like to use the same language for querying both streaming and static data. Requirements (or why SQL is not0 码力 | 53 页 | 532.37 KB | 1 年前3
Elasticity and state migration: Part I - CS 591 K1: Data Stream Processing and Analytics Spring 2020Kalavri | Boston University 2020 src o1 o2 10 recs 10 recs 1 2 3 4 100 rec 100 recs Intuition: use the dataflow graph to extract operator dependencies and system instrumentation to collect accurate Kalavri | Boston University 2020 src o1 o2 10 recs 10 recs 1 2 3 4 100 rec 100 recs Intuition: use the dataflow graph to extract operator dependencies and system instrumentation to collect accurate Kalavri | Boston University 2020 src o1 o2 10 recs 10 recs 1 2 3 4 100 rec 100 recs Intuition: use the dataflow graph to extract operator dependencies and system instrumentation to collect accurate0 码力 | 93 页 | 2.42 MB | 1 年前3
共 19 条
- 1
- 2













