Streaming in Apache Flink (45 pages, 3.00 MB)
… serializer/deserializer for it) • Flink has a built-in type system which supports:
• basic types, i.e., String, Long, Integer, Boolean, Array
• composite types: Tuples, POJOs, and Scala case classes
• Kryo … setter
Tuple2<String, Integer> person = new Tuple2<>("Fred", 35);
// zero-based index!
String name = person.f0;
Integer age = person.f1;
public class Person {
    public String name;
    public Integer age;
    public Person() {}
    public Person(String name, Integer age) { … }
}
Person person = new Person("Fred Flintstone", 35);
Setup • https://training …
Scalable Stream Processing - Spark Streaming and Flink (113 pages, 1.22 MB)
Input Operations - Custom Sources (2/3)
class CustomReceiver(host: String, port: Int)
  extends Receiver[String](StorageLevel.MEMORY_AND_DISK_2) with Logging {
  def onStart() {
    new Thread("Socket …
… will be joined with the RDD generated by stream2.
val stream1: DStream[(String, String)] = ...
val stream2: DStream[(String, String)] = ...
val joinedStream = stream1.join(stream2)
Join Operation
… join(windowedStream2)
Join Operation (3/3)
▶ Stream-dataset joins
val dataset: RDD[(String, String)] = ...
val windowedStream = stream.window(Seconds(20))...
val joinedStream = windowedStream …
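In Spark Streaming, `stream1.join(stream2)` joins, in every batch interval, the RDD produced by `stream1` with the RDD produced by `stream2` on their keys. A minimal plain-Python sketch of that per-batch inner join, assuming each batch is a list of `(key, value)` pairs; `join_batches` and `join_streams` are hypothetical helpers, not Spark APIs:

```python
from collections import defaultdict


def join_batches(batch1, batch2):
    """Inner-join two micro-batches of (key, value) pairs on their keys.

    Mirrors the shape of a pair join: each match yields (key, (v1, v2)).
    """
    # Index the right-hand batch by key
    right = defaultdict(list)
    for key, value in batch2:
        right[key].append(value)
    # Emit one output pair per matching (left, right) combination
    return [(key, (v1, v2)) for key, v1 in batch1 for v2 in right.get(key, [])]


def join_streams(stream1, stream2):
    """Join two streams batch by batch: batch i of one side only meets batch i of the other."""
    return [join_batches(b1, b2) for b1, b2 in zip(stream1, stream2)]


stream1 = [[("a", 1), ("b", 2)], [("a", 3)]]
stream2 = [[("a", "x"), ("c", "y")], [("b", "z")]]
print(join_streams(stream1, stream2))
```

Because keys only meet within the same batch, the second batch yields nothing even though `"b"` appears on both sides at different times; widening what each side can see is what windowing (e.g. `stream.window(Seconds(20))`) is for.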
PyFlink 1.15 Documentation (36 pages, 266.77 KB)
1.3.3.1 O1: InaccessibleObjectException: Unable to make field private final byte[] java.lang.String.value accessible: module java.base does not “opens java.lang” to unnamed module @4e4aea35
…
table = table_env.from_elements([(1, 'Hi'), (2, 'Hello')])
table.get_schema()
[3]: root
 |-- _1: BIGINT
 |-- _2: STRING
Create a Table with an explicit schema.
… DataTypes.TINYINT()), DataTypes.FIELD("data", DataTypes.STRING())]))
table.get_schema()
[4]: root
 |-- id: TINYINT
 |-- data: STRING
Create a Table from a Pandas DataFrame
[5]: import pandas as …
PyFlink 1.16 Documentation (36 pages, 266.80 KB)
1.3.3.1 O1: InaccessibleObjectException: Unable to make field private final byte[] java.lang.String.value accessible: module java.base does not “opens java.lang” to unnamed module @4e4aea35
…
table = table_env.from_elements([(1, 'Hi'), (2, 'Hello')])
table.get_schema()
[3]: root
 |-- _1: BIGINT
 |-- _2: STRING
Create a Table with an explicit schema.
… DataTypes.TINYINT()), DataTypes.FIELD("data", DataTypes.STRING())]))
table.get_schema()
[4]: root
 |-- id: TINYINT
 |-- data: STRING
Create a Table from a Pandas DataFrame
[5]: import pandas as …
Windows and triggers - CS 591 K1: Data Stream Processing and Analytics Spring 2020 (35 pages, 444.84 KB)
Vasiliki Kalavri | Boston University 2020
object MaxSensorReadings {
  def main(args: Array[String]) {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    val sensorData = env.addSource(new …
Configuring a time characteristic
object AverageSensorReadings {
  def main(args: Array[String]) {
    // set up the streaming execution environment
    val env = StreamExecutionEnvironment.g…
Window functions
val minTempPerWindow: DataStream[(String, Double)] = sensorData
  .map(r => (r.id, r.temperature))
  .keyBy(_._1)
  .timeWindow(Time…
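The window-function snippet keys sensor readings by id and applies a time window, presumably to compute each sensor's minimum temperature per window (the aggregation step itself is cut off). A plain-Python simulation of that computation, not Flink code: it assumes tumbling windows aligned to multiples of a hypothetical `window_size_ms`, and readings given as `(sensor_id, timestamp_ms, temperature)` tuples.

```python
from collections import defaultdict


def min_temp_per_window(readings, window_size_ms):
    """Minimum temperature per (sensor id, tumbling window).

    readings: iterable of (sensor_id, timestamp_ms, temperature) tuples.
    Returns {(sensor_id, window_start_ms): min_temperature}.
    """
    result = defaultdict(lambda: float("inf"))
    for sensor_id, ts, temp in readings:
        # Tumbling windows are aligned to multiples of the window size
        window_start = ts - (ts % window_size_ms)
        key = (sensor_id, window_start)
        # Incremental aggregation: keep only the running minimum per window
        result[key] = min(result[key], temp)
    return dict(result)


readings = [
    ("sensor_1", 1_000, 21.5),
    ("sensor_1", 4_000, 19.0),
    ("sensor_2", 2_000, 25.0),
    ("sensor_1", 6_000, 22.0),
]
print(min_temp_per_window(readings, window_size_ms=5_000))
```

Keeping only a running minimum mirrors an incremental reduce on a window: each window holds a single value instead of buffering every reading until the window fires.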
Introduction to Apache Flink and Apache Kafka - CS 591 K1: Data Stream Processing and Analytics Spring 2020 (26 pages, 3.33 MB)
case class Reading(id: String, time: Long, temp: Double)
object MaxSensorReadings {
  def main(args: Array[String]) {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    …
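Only the opening lines of `MaxSensorReadings` survive here, so its logic is not shown. Going by its name and the `Reading(id, time, temp)` case class, one plausible computation is a rolling maximum temperature per sensor, which is what keying by `id` and keeping per-key state yields. A plain-Python sketch under that assumption; `rolling_max` is a hypothetical name, and the dict stands in for Flink's keyed state:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    # Mirrors the slide's case class Reading(id: String, time: Long, temp: Double)
    id: str
    time: int
    temp: float


def rolling_max(readings):
    """For each incoming reading, emit (sensor id, highest temperature seen so far for that sensor)."""
    max_temp = {}  # per-key state, analogous to state scoped by keyBy(_.id)
    out = []
    for r in readings:
        max_temp[r.id] = max(max_temp.get(r.id, r.temp), r.temp)
        out.append((r.id, max_temp[r.id]))
    return out


stream = [
    Reading("s1", 1, 20.0),
    Reading("s2", 2, 30.0),
    Reading("s1", 3, 25.0),
    Reading("s1", 4, 22.0),
]
print(rolling_max(stream))
```

Each sensor's maximum evolves independently: the cooler `s1` reading at time 4 re-emits the earlier maximum of 25.0.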
State management - CS 591 K1: Data Stream Processing and Analytics Spring 2020 (24 pages, 914.13 KB)
… Flink program:
val env = StreamExecutionEnvironment.getExecutionEnvironment
val checkpointPath: String = ??? // configure path for checkpoints on the remote filesystem
val backend = new RocksDBState…
… KeyedStream[Reading, String] = sensorData
  .keyBy(_.id)
// apply a stateful FlatMapFunction on the keyed stream
val alerts: DataStream[(String, Double, Double)] = …
class TemperatureAlertFunction(val threshold: Double)
  extends RichFlatMapFunction[Reading, (String, Double, Double)] {
  // the state handle object
  private var lastTempState: ValueState[Double] = _
  …
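`TemperatureAlertFunction` keeps one `lastTempState` value per sensor key and emits a `(String, Double, Double)` triple. The flatMap body is not shown, so this plain-Python stand-in (not the Flink API) assumes the triple is (sensor id, current temperature, absolute difference) and that an alert fires when a sensor's temperature changes by more than `threshold` since its previous reading. It also assumes no comparison is made while a sensor has no stored temperature yet; how the empty state is handled in the original is not visible here.

```python
from typing import Dict, Iterable, List, Tuple


class TemperatureAlertFunction:
    """Plain-Python stand-in for a keyed, stateful flatMap.

    Emits (sensor_id, temperature, abs_difference) whenever a sensor's
    temperature changes by more than `threshold` since its previous reading.
    """

    def __init__(self, threshold: float):
        self.threshold = threshold
        # One "last temperature" per key, standing in for ValueState[Double]
        self.last_temp: Dict[str, float] = {}

    def flat_map(self, sensor_id: str, temp: float) -> List[Tuple[str, float, float]]:
        out = []
        last = self.last_temp.get(sensor_id)
        if last is not None:  # assumption: skip the check on a sensor's first reading
            diff = abs(temp - last)
            if diff > self.threshold:
                out.append((sensor_id, temp, diff))
        # Always remember the latest reading for this key
        self.last_temp[sensor_id] = temp
        return out


def run(readings: Iterable[Tuple[str, float]], threshold: float):
    fn = TemperatureAlertFunction(threshold)
    alerts = []
    for sensor_id, temp in readings:
        alerts.extend(fn.flat_map(sensor_id, temp))
    return alerts


readings = [("s1", 20.0), ("s2", 30.0), ("s1", 21.5), ("s1", 21.6), ("s2", 29.5)]
print(run(readings, threshold=1.1))
```

Because the state is scoped per key, `s2`'s readings never compare against `s1`'s, which is what keying the stream with `keyBy(_.id)` guarantees in Flink.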
Streaming optimizations - CS 591 K1: Data Stream Processing and Analytics Spring 2020 (54 pages, 2.83 MB)
map(String key, String value):
  // key: document name
  // value: document contents
  for each URL u in value:
    EmitIntermediate(u, "1");
reduce(String key, Iterator values): …
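The `map`/`reduce` pseudocode counts URL occurrences: `map` emits `(u, "1")` for every URL in a document, and the framework groups these by URL before calling `reduce`. The reduce body is cut off in the snippet; this sketch assumes it sums the counts, as in the classic MapReduce counting example. Documents are modeled as lists of URLs rather than raw text (an assumption that skips URL extraction), and the in-memory shuffle is a simplification of what a distributed runtime does.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple


def map_fn(key: str, value: List[str]) -> List[Tuple[str, str]]:
    # key: document name; value: document contents (here, the URLs it contains)
    return [(u, "1") for u in value]


def reduce_fn(key: str, values: Iterator[str]) -> int:
    # key: a URL; values: the "1" strings emitted for it (assumed to be summed)
    return sum(int(v) for v in values)


def map_reduce(documents: Dict[str, List[str]]) -> Dict[str, int]:
    # Map phase
    intermediate = []
    for name, contents in documents.items():
        intermediate.extend(map_fn(name, contents))
    # Shuffle: group intermediate values by key
    groups = defaultdict(list)
    for k, v in intermediate:
        groups[k].append(v)
    # Reduce phase: one call per distinct key
    return {k: reduce_fn(k, iter(vs)) for k, vs in groups.items()}


docs = {"doc1": ["a.com", "b.com", "a.com"], "doc2": ["b.com"]}
print(map_reduce(docs))
```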
Stream processing fundamentals - CS 591 K1: Data Stream Processing and Analytics Spring 2020 (45 pages, 1.22 MB)
case class Reading(id: String, time: Long, temp: Double)
object MaxSensorReadings {
  def main(args: Array[String]) {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    …
Fault-tolerance demo & reconfiguration - CS 591 K1: Data Stream Processing and Analytics Spring 2020 (41 pages, 4.09 MB)
… the maximum parallelism for this application
env.setMaxParallelism(512)
val alerts: DataStream[(String, Double, Double)] = keyedSensorData
  .flatMap(new TemperatureAlertFunction(1.1))
  // set the maximum …
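`env.setMaxParallelism(512)` fixes the number of key groups, the unit in which Flink distributes keyed state, and therefore the highest parallelism the job can later be rescaled to. A minimal sketch of the assignment arithmetic, assuming Flink's scheme of hashing a key into one of `maxParallelism` key groups and mapping key group `g` to operator index `g * parallelism // maxParallelism`. Here `zlib.crc32` stands in for Flink's own murmur-based hash of `key.hashCode()`, so the concrete indices will not match a real job; the helper names are hypothetical.

```python
import zlib


def key_group(key: str, max_parallelism: int) -> int:
    # Stand-in hash (CRC32), not Flink's murmur hash of key.hashCode()
    return zlib.crc32(key.encode("utf-8")) % max_parallelism


def operator_index(group: int, max_parallelism: int, parallelism: int) -> int:
    # Each parallel instance owns a contiguous range of key groups
    return group * parallelism // max_parallelism


def assign(keys, max_parallelism: int, parallelism: int):
    if not 1 <= parallelism <= max_parallelism:
        raise ValueError("parallelism must be between 1 and max_parallelism")
    return {
        k: operator_index(key_group(k, max_parallelism), max_parallelism, parallelism)
        for k in keys
    }


keys = [f"sensor_{i}" for i in range(10)]
before = assign(keys, max_parallelism=512, parallelism=4)
after = assign(keys, max_parallelism=512, parallelism=8)
print(before)
print(after)
```

Because a key's group depends only on the key and `maxParallelism`, rescaling moves whole key groups between instances rather than rehashing individual keys: here every instance at parallelism 8 holds half of the key groups that a single instance held at parallelism 4.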
10 results in total













