Original Source Address - IT文库_程序员IT互联网编程电子书和文档免费下载，助您码力十足！

首页文库资料文章资讯上传文档发布文章登录账户

Course introduction - CS 591 K1: Data Stream Processing and Analytics Spring 2020

the day of the respective deadline. • Late submissions are only eligible for up to 50% of the original score. 15 Vasiliki Kalavri | Boston University 2020 Quiz #0 Vasiliki Kalavri | Boston University bank-apache-flink 24 Vasiliki Kalavri | Boston University 2020 Call monitoring • Service monitoring, e.g. source and destination phone numbers, their first and last cell towers Examples: • Location-based

0 码力 | 34 页 | 2.53 MB | 1 年前
3
PyFlink 1.15 Documentation

1.1 Preparation This page shows you how to install PyFlink using pip, conda, installing from the source, etc. Python Version Supported PyFlink Version Python Version Supported PyFlink 1.16 Python 3 virtual environment needs to be activated before to use it. To activate the virtual environment, run: source venv/bin/activate That is, execute the activate script under the bin directory of your virtual environment miniconda.sh # install miniconda ./miniconda.sh -b -p miniconda # Activate the miniconda environment source miniconda/bin/activate # Create conda virtual environment under a directory, e.g. venv conda create

0 码力 | 36 页 | 266.77 KB | 1 年前
3
PyFlink 1.16 Documentation

1.1 Preparation This page shows you how to install PyFlink using pip, conda, installing from the source, etc. Python Version Supported PyFlink Version Python Version Supported PyFlink 1.16 Python 3 virtual environment needs to be activated before to use it. To activate the virtual environment, run: source venv/bin/activate That is, execute the activate script under the bin directory of your virtual environment miniconda.sh # install miniconda ./miniconda.sh -b -p miniconda # Activate the miniconda environment source miniconda/bin/activate # Create conda virtual environment under a directory, e.g. venv conda create

0 码力 | 36 页 | 266.80 KB | 1 年前
3
Streaming optimizations - CS 591 K1: Data Stream Processing and Analytics Spring 2020

covered in this lecture ??? Vasiliki Kalavri | Boston University 2020 Revisiting the basics 3 source sink input port output port dataflow graph ??? Vasiliki Kalavri | Boston University 2020 Revisiting operators according to the number of available cores / threads • Fused operators can share the address space but use separate threads of control • avoid communication cost without losing pipeline pipeline parallelism • use a shared buffer for communication • Fused filters / projections at the source can significantly reduce I/O and intermediate results size Synergies with scheduling and other optimizations

0 码力 | 54 页 | 2.83 MB | 1 年前
3
Cardinality and frequency estimation - CS 591 K1: Data Stream Processing and Analytics Spring 2020

stream into m = 2p = 4 sub-streams. Consider the input elements {5, 14, 5, 2, 8, 1, …} Substream Address Counter S0 00 S1 01 S2 10 S3 11 ??? Vasiliki Kalavri | Boston University 2020 11 Stochastic stream into m = 2p = 4 sub-streams. Consider the input elements {5, 14, 5, 2, 8, 1, …} Substream Address Counter S0 00 S1 01 S2 10 S3 11 • x1=5, h5(5) = 00101 • x2=14, h5(14) = 10110 • x3=5 stream into m = 2p = 4 sub-streams. Consider the input elements {5, 14, 5, 2, 8, 1, …} Substream Address Counter S0 00 S1 01 S2 10 S3 11 • x1=5, h5(5) = 00101 • x2=14, h5(14) = 10110 • x3=5

0 码力 | 69 页 | 630.01 KB | 1 年前
3
Fault-tolerance demo & reconfiguration - CS 591 K1: Data Stream Processing and Analytics Spring 2020

high-availability mode migrates the responsibility and metadata for a job to another JobManager in case the original JobManager disappears. • Flink relies on Apache ZooKeeper for high-availability • coordination

0 码力 | 41 页 | 4.09 MB | 1 年前
3
Filtering and sampling streams - CS 591 K1: Data Stream Processing and Analytics Spring 2020

What data structure would you use to: • Filter out all emails that are sent from a suspected spam address? • Filter out all URLs that contain malware? • Filter out all compromised passwords? • Remove What data structure would you use to: • Filter out all emails that are sent from a suspected spam address? • Filter out all URLs that contain malware? • Filter out all compromised passwords? • Remove

0 码力 | 74 页 | 1.06 MB | 1 年前
3
High-availability, recovery semantics, and guarantees - CS 591 K1: Data Stream Processing and Analytics Spring 2020

Vasiliki Kalavri | Boston University 2020 Exactly-once in Google Cloud Dataflow Checkpointing to address non-determinism • Each output is checkpointed together with its unique ID to stable storage before

0 码力 | 49 页 | 2.08 MB | 1 年前
3
Skew mitigation - CS 591 K1: Data Stream Processing and Analytics Spring 2020

cause imbalance w2 w1 w3 ??? Vasiliki Kalavri | Boston University 2020 Addressing skew • To address skew, the system needs to track the frequencies of the partitioning key values. • We can then

0 码力 | 31 页 | 1.47 MB | 1 年前
3
Scalable Stream Processing - Spark Streaming and Flink

Operations ▶ Every input DStream is associated with a Receiver object. • It receives the data from a source and stores it in Spark’s memory for processing. ▶ Three categories of streaming sources: 1. Basic Operations ▶ Every input DStream is associated with a Receiver object. • It receives the data from a source and stores it in Spark’s memory for processing. ▶ Three categories of streaming sources: 1. Basic [number of partitions]) 15 / 79 Input Operations - Custom Sources (1/3) ▶ To create a custom source: extend the Receiver class. ▶ Implement onStart() and onStop(). ▶ Call store(data) to store received

0 码力 | 113 页 | 1.22 MB | 1 年前
3

共 21 条前往

页

分类

语言

格式

Course introduction - CS 591 K1: Data Stream Processing and Analytics Spring 2020

PyFlink 1.15 Documentation

PyFlink 1.16 Documentation

Streaming optimizations - CS 591 K1: Data Stream Processing and Analytics Spring 2020

Cardinality and frequency estimation - CS 591 K1: Data Stream Processing and Analytics Spring 2020

Fault-tolerance demo & reconfiguration - CS 591 K1: Data Stream Processing and Analytics Spring 2020

Filtering and sampling streams - CS 591 K1: Data Stream Processing and Analytics Spring 2020

High-availability, recovery semantics, and guarantees - CS 591 K1: Data Stream Processing and Analytics Spring 2020

Skew mitigation - CS 591 K1: Data Stream Processing and Analytics Spring 2020

Scalable Stream Processing - Spark Streaming and Flink