PyFlink 1.15 Documentation
... -Djobmanager.memory.process.size=1024m \ -Dtaskmanager.memory.process.size=1024m \ -Dyarn.application.id=\ -Dyarn.ship-files=/path/to/shipfiles \ -pyarch shipfiles/venv.zip \ -pyclientexec ... /bin/flink run-application \ --target kubernetes-application \ --parallelism 8 \ -Dkubernetes.cluster-id= \ -Dtaskmanager.memory.process.size=4096m \ -Dkubernetes.taskmanager.cpu=2 \ -Dtaskmanager ... -Dkubernetes.cluster-id=my-first-flink-cluster Then you could submit PyFlink jobs to the session cluster as follows: ./bin/flink run \ --target kubernetes-session \ -Dkubernetes.cluster-id=my-first-flink-cluster ...
36 pages | 266.77 KB | 1 year ago
PyFlink 1.16 Documentation
... -Djobmanager.memory.process.size=1024m \ -Dtaskmanager.memory.process.size=1024m \ -Dyarn.application.id=\ -Dyarn.ship-files=/path/to/shipfiles \ -pyarch shipfiles/venv.zip \ -pyclientexec ... /bin/flink run-application \ --target kubernetes-application \ --parallelism 8 \ -Dkubernetes.cluster-id= \ -Dtaskmanager.memory.process.size=4096m \ -Dkubernetes.taskmanager.cpu=2 \ -Dtaskmanager ... -Dkubernetes.cluster-id=my-first-flink-cluster Then you could submit PyFlink jobs to the session cluster as follows: ./bin/flink run \ --target kubernetes-session \ -Dkubernetes.cluster-id=my-first-flink-cluster ...
36 pages | 266.80 KB | 1 year ago
Introduction to Apache Flink and Apache Kafka - CS 591 K1: Data Stream Processing and Analytics, Spring 2020
... DataStream API Basics ... Vasiliki Kalavri | Boston University 2020 ... case class Reading(id: String, time: Long, temp: Double) object MaxSensorReadings { def main(args: Array[String]) ... SensorSource) val maxTemp = sensorData .map(r => Reading(r.id,r.time,(r.temp-32)*(5.0/9.0))) .keyBy(_.id) .max("temp") maxTemp.print() env.execute("Compute max ... } } Example: Sensor Readings ... case class Reading(id: String, time: Long, temp: Double) object MaxSensorReadings { def main(args: Array[String]) ...
26 pages | 3.33 MB | 1 year ago
Windows and triggers - CS 591 K1: Data Stream Processing and Analytics, Spring 2020
... SensorSource) val maxTemp = sensorData .map(r => Reading(r.id,r.time,(r.temp-32)*(5.0/9.0))) .keyBy(_.id) .timeWindow(Time.minutes(1)) .max("temp") } } ... Example: ... keyBy(_.id) // group readings in 1s event-time windows .window(TumblingEventTimeWindows.of(Time.seconds(1))) .process(new TemperatureAverager) ... val avgTemp = sensorData .keyBy(_.id) ... // DataStream[SensorReading] = ... // event-time sliding windows assigner val slidingAvgTemp = sensorData .keyBy(_.id) // create 1h event-time windows every 15 minutes .window(SlidingEventTimeWindows.of(Time.hours(1) ...
35 pages | 444.84 KB | 1 year ago
Graph streaming algorithms - CS 591 K1: Data Stream Processing and Analytics, Spring 2020
... component ID per vertex • initially equal to vertex ID • Iterative step: For each vertex • choose the min of neighbors' component IDs and own component ID as the new ID • if the component ID changed ... if seen for the 1st time, create a component with ID the min of the vertex IDs • if in different components, merge them and update the component ID to the min of the component IDs • if only one of ... Distributed Stream Connected Components: 1. partition the edge stream, e.g. by source ID 2. maintain a disjoint set in each partition 3. periodically merge the partial disjoint sets into ...
72 pages | 7.77 MB | 1 year ago
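The connected-components steps quoted in the result above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the course's implementation: the class name `StreamingComponents` and its methods are hypothetical, and the disjoint set is a simple parent map in which the smaller root always wins a merge, so each component ID is the minimum vertex ID in that component.

```python
class StreamingComponents:
    """Disjoint set over a stream of edges; component ID = min vertex ID.

    Hypothetical helper for illustration only.
    """

    def __init__(self):
        self.parent = {}  # vertex -> parent vertex (roots point to themselves)

    def find(self, v):
        # A vertex seen for the first time starts as its own component.
        self.parent.setdefault(v, v)
        root = v
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every vertex on the path at the root.
        while self.parent[v] != root:
            self.parent[v], v = root, self.parent[v]
        return root

    def add_edge(self, u, v):
        # If the endpoints are in different components, merge them and
        # keep the smaller component ID as the ID of the merged component.
        ru, rv = self.find(u), self.find(v)
        if ru != rv:
            self.parent[max(ru, rv)] = min(ru, rv)

    def merge(self, other):
        # Fold another partition's partial disjoint set into this one by
        # replaying each (vertex, parent) pair as an edge.
        for v, p in other.parent.items():
            self.add_edge(v, p)

    def component_id(self, v):
        return self.find(v)


# Two partitions each see part of the edge stream, then are merged.
part_a, part_b = StreamingComponents(), StreamingComponents()
for edge in [(3, 4), (1, 2)]:
    part_a.add_edge(*edge)
for edge in [(2, 3), (7, 8)]:
    part_b.add_edge(*edge)
part_a.merge(part_b)
```

After the merge, vertices 1 to 4 share component ID 1, while 7 and 8 share component ID 7. Replaying `(vertex, parent)` pairs as edges is one simple way to combine partial disjoint sets; how and when the partial sets are merged in the lecture's distributed version is not visible in the snippet.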
Streaming in Apache Flink
... Events: rideId (Long): a unique id for each ride; taxiId (Long): a unique id for each taxi; driverId (Long): a unique id for each driver; isStart (Boolean) ... Events: rideId (Long): a unique id for each ride; taxiId (Long): a unique id for each taxi; driverId (Long): a unique id for each driver; startTime (DateTime) ...
45 pages | 3.00 MB | 1 year ago
Scalable Stream Processing - Spark Streaming and Flink
Spark Streaming and Flink. Amir H. Payberah, payberah@kth.se, 05/10/2018. The Course Web Page: https://id2221kth.github.io ... Where Are We? ... Stream Processing Systems Design Issues ▶ Continuous ... TwitterUtils.createStream(ssc, None) KafkaUtils.createStream(ssc, [ZK quorum], [consumer group id], [number of partitions]) ... Input Operations - Custom Sources (1/3) ▶ To create a custom source: ... in place, such as a MySQL table. ... Structured Streaming Example (1/3) ▶ Assume we receive (id, time, action) events from a mobile app. ▶ We want to count how many actions of each type happened ...
113 pages | 1.22 MB | 1 year ago
State management - CS 591 K1: Data Stream Processing and Analytics, Spring 2020
... // partition and key the stream on the sensor ID val keyedData: KeyedStream[Reading, String] = sensorData .keyBy(_.id) // apply a stateful FlatMapFunction on the keyed ... if (tempDiff > threshold) { // temperature changed by more than the threshold out.collect((reading.id, reading.temperature, tempDiff)) } // update lastTemp state this.lastTempState.update(reading.temperature) ... state in Flink ... 3. get state value 4. update state. This is the state of the current key (sensor id) ... Vasiliki Kalavri | Boston University 2020 ... Use keyed state to store and access state in the context ...
24 pages | 914.13 KB | 1 year ago
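The keyed-state pattern quoted in the result above (get the state value for the current key, compare, emit, update) can be simulated without Flink. The sketch below is plain Python, not the Flink API: the function name `temperature_alerts` is hypothetical, a dictionary stands in for per-key state, and it assumes the difference is taken as an absolute value and that the first reading of a sensor never raises an alert. Neither assumption is confirmed by the snippet.

```python
def temperature_alerts(readings, threshold):
    """Yield (sensor_id, temperature, diff) when a sensor's temperature
    changes by more than `threshold` between consecutive readings.

    `readings` is an iterable of (sensor_id, temperature) pairs.
    """
    last_temp = {}  # stands in for keyed state: sensor id -> last temperature
    for sensor_id, temp in readings:
        if sensor_id in last_temp:
            # Get the state value for the current key and compare.
            diff = abs(temp - last_temp[sensor_id])
            if diff > threshold:
                yield (sensor_id, temp, diff)
        # Update the state for the current key.
        last_temp[sensor_id] = temp


sample = [("s1", 20.0), ("s2", 50.0), ("s1", 22.5), ("s1", 23.0), ("s2", 48.0)]
alerts = list(temperature_alerts(sample, threshold=1.7))
```

Each sensor ID only ever reads and writes its own entry, which mirrors how keyed state is scoped to the current key.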
Stream processing fundamentals - CS 591 K1: Data Stream Processing and Analytics, Spring 2020
... window fires, post becomes inactive ... Vasiliki Kalavri | Boston University 2020 ... case class Reading(id: String, time: Long, temp: Double) object MaxSensorReadings { def main(args: Array[String]) { ... addSource(new SensorSource) val maxTemp = sensorData .map(r => Reading(r.id,r.time,(r.temp-32)*(5.0/9.0))) .keyBy(_.id) .max("temp") maxTemp.print() env.execute("Compute max sensor ...
45 pages | 1.22 MB | 1 year ago
Skew mitigation - CS 591 K1: Data Stream Processing and Analytics, Spring 2020
... numeric ids, starting from 1. • e.g., if ε=0.2, w=5 (5 items per window) • w_cur: the current window id • We keep a list D of element frequencies and their maximum associated error. • Once a window ... Lossy counting algorithm: D = {} // empty list; w_cur = 1 // first window id; N = 0 // elements seen so far. Insert step: For each element x in w_cur: if x ∈ D, increase its ...
31 pages | 1.47 MB | 1 year ago
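The lossy counting snippet in the result above breaks off partway through the insert step. The sketch below fills in the remaining steps from the standard lossy counting algorithm of Manku and Motwani rather than from the slides, so treat the new-entry rule and the pruning rule as assumptions about what the lecture shows. The function name `lossy_count` is hypothetical, and the window width `w` is passed directly (it corresponds to 1/ε, e.g. w=5 for ε=0.2).

```python
def lossy_count(stream, w):
    """Approximate element frequencies with windows of width w (w = 1/epsilon).

    Returns (D, n), where D maps each surviving element to
    [frequency, max_error] and n is the number of elements seen.
    """
    D = {}      # element -> [frequency f, maximum associated error delta]
    w_cur = 1   # current window id, starting from 1
    n = 0       # elements seen so far
    for x in stream:
        n += 1
        # Insert step: bump the frequency, or add a new entry whose
        # maximum error is the number of windows it may have been missed in.
        if x in D:
            D[x][0] += 1
        else:
            D[x] = [1, w_cur - 1]
        # Delete step at each window boundary: drop entries with
        # f + delta <= w_cur, then move on to the next window.
        if n % w == 0:
            D = {k: fd for k, fd in D.items() if fd[0] + fd[1] > w_cur}
            w_cur += 1
    return D, n


counts, seen = lossy_count(["a", "b", "a", "c", "a", "a", "d", "a", "e", "a"], w=5)
```

With this input only "a" survives pruning, with frequency 6 and error 0, while the elements that appear once per window are dropped at the window boundaries.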
16 results in total













