Streaming optimizations - CS 591 K1: Data Stream Processing and Analytics Spring 2020resource availability: the host must have enough resources for all assigned operators • Ensure security constraints: what are the trusted hosts for each operator? • Ensure state migration: if placement micro-batches D-Streams • During an interval, input data received is stored using RDDs • A D-Stream is a group of such RDDs which can be processed using common operators 45 Example • pageViews is a D-Stream0 码力 | 54 页 | 2.83 MB | 1 年前3
Course introduction - CS 591 K1: Data Stream Processing and Analytics Spring 2020: “SAQL: A Stream-based Query System for Real- Time Abnormal System Behavior Detection”, USENIX Security '18 12 Interested in a more research-oriented project? Let’s discuss it during office hours0 码力 | 34 页 | 2.53 MB | 1 年前3
Streaming languages and operator semantics - CS 591 K1: Data Stream Processing and Analytics Spring 2020Operator replicates a stream, commonly to be used as input to multiple downstream operators. • Group by / Partition Operators split a stream into sub-streams according to a function or the event contents Kalavri | Boston University 2020 CQL GroupBy Example Select IStream(Count(*)) From S1 [Rows 1000] Group By S1.B Count the number or events in the last 1000 rows for each value of B 20 Vasiliki Kalavri University 2020 Some queries expressed using aggregates are monotonic: SELECT DeptNo FROM empl GROUP BY DeptNo HAVING SUM(empl.Sal) > 10000 The introduction of a new empl can only expand the set0 码力 | 53 页 | 532.37 KB | 1 年前3
Introduction to Apache Flink and Apache Kafka - CS 591 K1: Data Stream Processing and Analytics Spring 2020Consumers label themselves with a consumer group name, and each record published to a topic is delivered to one consumer instance within each subscribing consumer group. Consumer instances can be in separate separate processes or on separate machines. If all the consumer instances have the same consumer group, then the records will effectively be load balanced over the consumer instances. If all the consumer0 码力 | 26 页 | 3.33 MB | 1 年前3
Stream ingestion and pub/sub systems - CS 591 K1: Data Stream Processing and Analytics Spring 2020topic T can be viewed as becoming a member of a group T. • Publishing an event on topic T can be viewed as broadcasting the event to all members of group T. • Topic hierarchies allow topic organization0 码力 | 33 页 | 700.14 KB | 1 年前3
Fault-tolerance demo & reconfiguration - CS 591 K1: Data Stream Processing and Analytics Spring 2020as ranges • On restore, reads are sequential within each key-group, and often across multiple key-groups • The metadata of key-group-to-subtask assignments are small. No need to maintain explicit0 码力 | 41 页 | 4.09 MB | 1 年前3
Scalable Stream Processing - Spark Streaming and FlinkKinesis, ... TwitterUtils.createStream(ssc, None) KafkaUtils.createStream(ssc, [ZK quorum], [consumer group id], [number of partitions]) 15 / 79 Input Operations - Custom Sources (1/3) ▶ To create a custom express it as the following SQL query. SELECT action, WINDOW(time, "1 hour"), COUNT * FROM events GROUP BY action, WINDOW(time, "1 hour") 61 / 79 Structured Streaming Example (3/3) val inputDF = spark0 码力 | 113 页 | 1.22 MB | 1 年前3
Apache Flink的过去、现在和未来---------------------------- Stream Mode: 12:01> SELECT Name, SUM(Score), MAX(Time) FROM USER_SCORES GROUP BY Name; Flink 在阿里的服务情况 集群规模 超万台 状态数据 PetaBytes 事件处理 十万亿/天 峰值能力 17亿/秒 Flink 的过去 offline Real-time0 码力 | 33 页 | 3.36 MB | 1 年前3
Windows and triggers - CS 591 K1: Data Stream Processing and Analytics Spring 2020val sensorData: DataStream[SensorReading] = ... val avgTemp = sensorData .keyBy(_.id) // group readings in 1s event-time windows .window(TumblingEventTimeWindows.of(Time.seconds(1))) .process(new0 码力 | 35 页 | 444.84 KB | 1 年前3
监控Apache Flink应用程序(入门)partition in this window. An increasing value over time is your best indication that the consumer group is not keeping up with the producers. millisBehindLatest user applies to FlinkKinesisConsumer0 码力 | 23 页 | 148.62 KB | 1 年前3
共 11 条
- 1
- 2













