Fault-tolerance demo & reconfiguration - CS 591 K1: Data Stream Processing and Analytics Spring 2020become available. • Restart is automatic if there is a ResourceManager, e.g. in a YARN setup • A manual TaskManager re-start or a backup is required in standalone mode • The restart strategy determines are distributed over the existing nodes • On average M/N partitions are moved when the Nth node is inserted or removed from a system with M partitions 30 Consistent hashing ??? Vasiliki Kalavri0 码力 | 41 页 | 4.09 MB | 1 年前3
Elasticity and state migration: Part I - CS 591 K1: Data Stream Processing and Analytics Spring 2020Noisy, sensitive to interference, misleading Easy-to-obtain Sensitive to thresholds and require manual tuning Oscillations, slow convergence, black-listing ??? Vasiliki Kalavri | Boston University0 码力 | 93 页 | 2.42 MB | 1 年前3
Exactly-once fault-tolerance in Apache Flink - CS 591 K1: Data Stream Processing and Analytics Spring 2020html#chandy 13 ??? Vasiliki Kalavri | Boston University 2020 Snapshotting Protocols p1 p2 p3 mAB53icbVBNS8NAEJ34WetX1aOXxSJ4Kok I6q3oxWMLxhba m+4UZgO1VI41BgKxzdTv3WEyrNE3lvxikGMR1IHnFGjZWaca9SdWvuDGSZeAWp QoFGr/LV7Scsi1EaJqjWHc9NTZBTZTgTOCl3M40pZSM6wI6lksaog3x26IScWqVPokTZkobM1N8TOY21Hseh7YypGepFbyr+53UyE10FOZdpZlCy+aIoE8QkZPo16XOFzIixJZQpb m+4UZgO1VI41BgKxzdTv3WEyrNE3lvxikGMR1IHnFGjZWaca9SdWvuDGSZeAWp QoFGr/LV7Scsi1EaJqjWHc9NTZBTZTgTOCl3M40pZSM6wI6lksaog3x26IScWqVPokTZkobM1N8TOY21Hseh7YypGepFbyr+53UyE10FOZdpZlCy+aIoE8QkZPo16XOFzIixJZQpb 0 码力 | 81 页 | 13.18 MB | 1 年前3
Cardinality and frequency estimation - CS 591 K1: Data Stream Processing and Analytics Spring 2020Vasiliki Kalavri | Boston University 2020 Let h be a hash function that maps each stream element into M = log2N bits, where N is the domain of input elements: For each element x, let rank(x) be the 318, h(x1) = 12 or 01100 => rank(x1) = 2 • x2 = 9013, h(x2) = 24 or 11000 => rank(x2) = 3 h(x) = M−1 ∑ k=0 ik2k = (i0i1 . . . iM−1)2, ik ∈ {0,1} 4 ??? Vasiliki Kalavri | Boston University 2020 5 the input stream into m = 2p sub-streams S0, S1, …, Sm-1 For every element x, we compute h(x) and use the p first bits of the M-bit hash value to select a sub-stream and the next M-p bits to compute the0 码力 | 69 页 | 630.01 KB | 1 年前3
PyFlink 1.15 Documentation2 CONTENTS CHAPTER ONE HOW TO BUILD DOCS LOCALLY 1. Install dependency requirements python3 -m pip install -r dev/requirements.txt 2. Conda install pandoc conda install pandoc 3. Build the docs virtual environment using virtualenv To create a virtual environment using virtualenv, run: python3 -m pip install virtualenv # Create Python virtual environment under a directory, e.g. venv virtualenv PyFlink 1.15 Installing using PyPI PyFlink could be installed using PyPI as following: python3 -m pip install apache-flink Installing from Source To install PyFlink from source, you could refer to0 码力 | 36 页 | 266.77 KB | 1 年前3
PyFlink 1.16 Documentation2 CONTENTS CHAPTER ONE HOW TO BUILD DOCS LOCALLY 1. Install dependency requirements python3 -m pip install -r dev/requirements.txt 2. Conda install pandoc conda install pandoc 3. Build the docs virtual environment using virtualenv To create a virtual environment using virtualenv, run: python3 -m pip install virtualenv # Create Python virtual environment under a directory, e.g. venv virtualenv PyFlink 1.15 Installing using PyPI PyFlink could be installed using PyPI as following: python3 -m pip install apache-flink Installing from Source To install PyFlink from source, you could refer to0 码力 | 36 页 | 266.80 KB | 1 年前3
Flink如何实时分析Iceberg数据湖的CDC数据直接D入CDC到Hi2+分析 、流程能E作 2、Hi2+存量数据不受增量数据H响。 方案评估 优点 、数据不是CR写入; 2、每次数据D致都要 MERGE 存量数据 。T+ 方GT新3R效性差。 3、不M持CR1ps+rt。 缺点 SCaDk + )=AFa IL()(数据 MER,E .NTO GE=DE US.N, chan>=E ON GE=DE.GE=D.< = chan>=E.GE=D FileM否则直接a,elete行写 -DualitJ delete fileO 写T思l 1N5oFitioA ,elete File和nR SeD3umP大Qi己SeD3um 的,ata FileS 04I3M 2N-DualitJ ,elete File和n RSeD3um小Qi己SeD3um 的,ata FileS 04I3O 读取思l *CClJ ,eletioA *CClJ ,eletioA f2 f3 Ice4erg D3A3 )enAer ((2-1 -eA3sAore D3A3 )enAer ((2-2 f4 Ice4erg/Are3m1riAer Ice4erg/Are3m1riAer Ice4erg/Are3m1riAer 1 1riAe records Ao D3A3/DeleAe Files. F量文E集I1A4ns4cCion提D /4ACiCion-20 码力 | 36 页 | 781.69 KB | 1 年前3
Filtering and sampling streams - CS 591 K1: Data Stream Processing and Analytics Spring 2020false positives depends on the choice of k and n: • Let m be the number of expected elements: • If the allocated bits per element, n/m, is too small, the filter will fill up too quickly • All lookups lookups will yield a false positive • For a given n/m, the false positive probability can be tuned by choosing the number of hash functions: Pfp ≈ (1 − e km n )k k = n mln2 * *: see slide 31 many hash functions do we need in order to minimize Pfp? Optimal number of hash functions After m elements have been inserted to the filter, what is the probability P0 that a bit is still 0? 1 0 10 码力 | 74 页 | 1.06 MB | 1 年前3
Streaming in Apache Flink.print(); ... 4> (64549,5M) 4> (46298,18M) 1> (51549,14M) 1> (53043,13M) 1> (56031,22M) 1> (50797,6M) ... 1> (50797,8M) ... 1> (50797,11M) ... 1> (50797,12M) Stateful Transformations0 码力 | 45 页 | 3.00 MB | 1 年前3
Introduction to Apache Flink and Apache Kafka - CS 591 K1: Data Stream Processing and Analytics Spring 2020appended in the order they are sent: • if a record M1 is sent by the same producer as a record M2, and M1 is sent first, then M1 will have a lower offset than M2 and appear earlier in the log. • A consumer0 码力 | 26 页 | 3.33 MB | 1 年前3
共 15 条
- 1
- 2













