Exactly-once fault-tolerance in Apache Flink - CS 591 K1: Data Stream Processing and Analytics, Spring 2020
Snapshotting Protocols (figure: operators p1, p2, p3)
81 pages | 13.18 MB | 1 year ago
Cardinality and frequency estimation - CS 591 K1: Data Stream Processing and Analytics, Spring 2020
Vasiliki Kalavri | Boston University 2020
Let n be the number of distinct elements in the input stream so far and let R be the maximum value of rank(.) seen so far. Claim: the maximum observed rank is a good estimate of log2 n. In other words, the estimated number of distinct elements is n̂ = 2^R. For example, if 2 is the maximum rank we have seen, that indicates 4 distinct elements: it takes about 2^r hash calls before we encounter a result with r trailing zeros. Is this a good estimate?
69 pages | 630.01 KB | 1 year ago
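The n̂ = 2^R estimate above can be sketched in a few lines of Python. The hash function (MD5 truncated to 32 bits) and the trailing-zero definition of rank are illustrative assumptions, not the course's exact choices:

```python
import hashlib

def rank(h: int, bits: int = 32) -> int:
    """Number of trailing zeros in the binary representation of h."""
    if h == 0:
        return bits
    r = 0
    while h & 1 == 0:
        h >>= 1
        r += 1
    return r

def fm_estimate(stream) -> int:
    """Flajolet-Martin style estimate: n_hat = 2**R, where R is the
    maximum rank(hash(x)) observed over the stream."""
    R = 0
    for x in stream:
        h = int.from_bytes(hashlib.md5(str(x).encode()).digest()[:4], "big")
        R = max(R, rank(h))
    return 2 ** R
```

Because the estimate is always a power of two, it is coarse and high-variance; practical sketches (e.g. HyperLogLog) average many such observations.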
PyFlink 1.15 Documentation
Chapter One: How to build docs locally
1. Install the dependency requirements: python3 -m pip install -r dev/requirements.txt
2. Install pandoc via conda: conda install pandoc
3. Build the docs: python3 setup ...
pyflink-docs, Release release-1.15
# -rw-r--r-- 1 dianfu staff 45K 10 18 20:54 flink-dianfu-python-B-7174MD6R-1908.local.log
Besides, you can also check the files of the packages as follows:
# -rw-r--r-- 1 dianfu staff 190K 10 18 20:43 flink-cep-1.15.2.jar
# -rw-r--r-- 1 dianfu staff 475K 10 18 20:43 flink-connector-files-1.15.2.jar
# -rw-r--r-- 1 dianfu staff 93K 10 18 ...
36 pages | 266.77 KB | 1 year ago
PyFlink 1.16 Documentation
Chapter One: How to build docs locally
1. Install the dependency requirements: python3 -m pip install -r dev/requirements.txt
2. Install pandoc via conda: conda install pandoc
3. Build the docs: python3 setup ...
pyflink-docs, Release release-1.16
# -rw-r--r-- 1 dianfu staff 45K 10 18 20:54 flink-dianfu-python-B-7174MD6R-1908.local.log
Besides, you can also check the files of the packages as follows:
# -rw-r--r-- 1 dianfu staff 190K 10 18 20:43 flink-cep-1.15.2.jar
# -rw-r--r-- 1 dianfu staff 475K 10 18 20:43 flink-connector-files-1.15.2.jar
# -rw-r--r-- 1 dianfu staff 93K 10 18 ...
36 pages | 266.80 KB | 1 year ago
Elasticity and state migration: Part I - CS 591 K1: Data Stream Processing and Analytics, Spring 2020
Vasiliki Kalavri | Boston University 2020
Example (figure): operator instances i = 1..4 with per-instance observed rates λo and processing rates λp, e.g. λ1o = 2000 r/s with λ1p = 450 r/s, and λ2o = 1800 r/s with λ2p = 550 r/s. Aggregated per operator: o2 has λo = 3800 r/s and λp = 1000 r/s; o3 has λo = 600 r/s and λp = 2930 r/s. The figure asks for the resulting parallelism π.
93 pages | 2.42 MB | 1 year ago
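In the spirit of the rate-based example above, a required parallelism can be derived by dividing an operator's total input rate by the rate a single instance can process. The helper name and the concrete numbers below are illustrative assumptions, not the lecture's exact method:

```python
import math

def required_parallelism(input_rate_rps: float, per_instance_rate_rps: float) -> int:
    """Minimal number of parallel instances whose aggregate processing
    capacity covers the operator's observed input rate."""
    return max(1, math.ceil(input_rate_rps / per_instance_rate_rps))

# Hypothetical operator observing 3800 r/s of input while each of its
# instances processes at most 1000 r/s: four instances are needed.
print(required_parallelism(3800, 1000))
```

An operator whose single instance already outpaces its input (e.g. 600 r/s input against 2930 r/s capacity) needs only one instance and is a candidate for scale-down.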
Stream processing fundamentals - CS 591 K1: Data Stream Processing and Analytics, Spring 2020
Vasiliki Kalavri | Boston University 2020
Base streams are typically append-only: new events are appended after old events, yielding a sequence of relation states R(t1), R(t2), R(t3), ..., R(tk). A stream can be modeled as a mathematical structure, e.g. a sequence of (finite) relation states over a common schema R: [r1(R), r2(R), ...], where the individual relations are unordered sets. Example relation over (src, dest, bytes): (1, 2, 20K), (2, 5, 32K). Such a sequence can be represented: as tagged tuples <t, j>, where <t, j> indicates that t ∈ rj; as a serialization of r1 followed by a series of delta tuples that indicate updates to make to obtain r2, r3, ..., etc.; or as a replacement sequence where some attribute ...
45 pages | 1.22 MB | 1 year ago
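The delta-tuple representation described above can be sketched as follows; the '+'/'-' tagging scheme for inserts and deletes is an assumption made for illustration:

```python
def apply_deltas(initial, deltas):
    """Reconstruct the relation states r1, r2, ... from an initial relation
    and a stream of signed delta tuples ('+' inserts, '-' deletes)."""
    state = set(initial)
    states = [frozenset(state)]          # r1 is serialized as-is
    for sign, tup in deltas:             # each delta yields the next state
        if sign == '+':
            state.add(tup)
        elif sign == '-':
            state.discard(tup)
        states.append(frozenset(state))
    return states
```

Replaying the deltas against r1 recovers every intermediate relation state without ever shipping a full relation downstream.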
Introduction to Apache Flink and Apache Kafka - CS 591 K1: Data Stream Processing and Analytics, Spring 2020
val sensorData = env.addSource(new SensorSource)
val maxTemp = sensorData
  .map(r => Reading(r.id, r.time, (r.temp - 32) * (5.0 / 9.0)))
  .keyBy(_.id)
  .max("temp")
maxTemp.print()
26 pages | 3.33 MB | 1 year ago
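The map → keyBy → max pipeline above can be imitated in plain Python to make its semantics concrete (a per-key running maximum of the converted temperature). This is a simulation of the dataflow, not PyFlink API code, and the (sensor_id, time, temp_f) record layout is an assumption:

```python
def fahrenheit_to_celsius(temp_f: float) -> float:
    return (temp_f - 32) * (5.0 / 9.0)

def running_max_per_key(readings):
    """Simulate map -> keyBy(id) -> max("temp"): for each incoming
    (sensor_id, time, temp_f) record, emit the running maximum Celsius
    temperature observed so far for that sensor."""
    maxima = {}
    out = []
    for sensor_id, _time, temp_f in readings:
        temp_c = fahrenheit_to_celsius(temp_f)
        maxima[sensor_id] = max(maxima.get(sensor_id, float('-inf')), temp_c)
        out.append((sensor_id, maxima[sensor_id]))
    return out
```

Like Flink's rolling `max`, this emits one updated aggregate per input record rather than one result per key.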
Streaming languages and operator semantics - CS 591 K1: Data Stream Processing and Analytics, Spring 2020
Relation-to-stream operators:
• Istream (for "insert stream") applied to relation R contains the stream element <s, τ> whenever tuple s is in R(τ) − R(τ − 1).
• Dstream (for "delete stream") applied to relation R contains <s, τ> whenever tuple s is in R(τ − 1) − R(τ).
• Rstream (for "relation stream") applied to relation R contains <s, τ> whenever tuple s is in R at time τ.
Vasiliki Kalavri | Boston University 2020
Imperative language: ... arrival time. Sequence: let t1, ..., tn be tuples from a relation R. The list S = [t1, ..., tn] is called a sequence, of length n, of tuples from R. The empty sequence [] has length 0. We use t ∈ S to denote ...
53 pages | 532.37 KB | 1 year ago
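Over explicit relation snapshots, the three relation-to-stream operators reduce to simple set operations. A minimal sketch, where passing consecutive snapshots as arguments is an interface assumption:

```python
def istream(prev_state, curr_state):
    """Istream at time tau: tuples in R(tau) - R(tau - 1)."""
    return set(curr_state) - set(prev_state)

def dstream(prev_state, curr_state):
    """Dstream at time tau: tuples in R(tau - 1) - R(tau)."""
    return set(prev_state) - set(curr_state)

def rstream(curr_state):
    """Rstream at time tau: every tuple in R at time tau."""
    return set(curr_state)
```

Note the asymmetry: Istream and Dstream emit only changes between consecutive instants, while Rstream re-emits the entire relation at every instant.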
Windows and triggers - CS 591 K1: Data Stream Processing and Analytics, Spring 2020
val sensorData = env.addSource(new SensorSource)
val maxTemp = sensorData
  .map(r => Reading(r.id, r.time, (r.temp - 32) * (5.0 / 9.0)))
  .keyBy(_.id)
  .timeWindow(Time.minutes(1))
  .max("temp")
ReduceFunction example:
...: DataStream[(String, Double)] = sensorData
  .map(r => (r.id, r.temperature))
  .keyBy(_._1)
  .timeWindow(Time.seconds(15))
  .reduce((r1, r2) => (r1._1, r1._2.min(r2._2)))
AggregateFunction example:
val avgTempPerWindow: DataStream[(String, Double)] = sensorData
  .map(r => (r.id, r.temperature))
  .keyBy(_._1)
  .timeWindow(Time.seconds(15))
  .aggregate(new AvgTempFunction)
35 pages | 444.84 KB | 1 year ago
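The 15-second keyed window reduce above can be mimicked with a tumbling-window grouping in plain Python. Again, this is a simulation of the semantics rather than PyFlink API code, with integer event timestamps assumed:

```python
from collections import defaultdict

def tumbling_min(events, window_size):
    """Group (key, timestamp, value) events into tumbling windows of
    window_size time units and keep the per-key minimum per window,
    mirroring reduce((r1, r2) => min) over a keyed time window."""
    minima = defaultdict(lambda: float('inf'))
    for key, ts, value in events:
        window_index = ts // window_size  # which tumbling window the event falls in
        minima[(key, window_index)] = min(minima[(key, window_index)], value)
    return dict(minima)
```

Unlike the rolling aggregates of the previous entry, a window produces one result per (key, window) pair, emitted when the window closes.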
Flink如何实时分析Iceberg数据湖的CDC数据 (How Flink analyzes CDC data in an Iceberg data lake in real time)
Option 1: ingest CDC directly into Hive for analysis. Pros: 1. the pipeline works; 2. existing Hive data is not affected by incremental data. Cons: 1. data is not written in real time; 2. every ingest must MERGE against the existing data, so updates are T+1 with poor freshness; 3. real-time upsert is not supported.
Option 2: Spark + Delta, e.g. MERGE INTO ... USING changes ON ... Cons: 1. the incremental and full tables are separate, so freshness is insufficient; 2. an extra changes table must be designed and maintained; 3. the compute engine does not natively support CDC; 4. real-time upsert is not supported.
Why choose Flink + Iceberg? Flink natively supports consuming CDC data (Debezium):
-- creates a mysql cdc table
CREATE TABLE mysql_binlog ...
-- reads snapshot and binlog data from mysql, does some transformation, and shows it on the client
SELECT id, UPPER(name), description, weight FROM mysql_binlog;
github.com/ververica/flink-cdc-connectors
36 pages | 781.69 KB | 1 year ago
17 items in total













