Stream ingestion and pub/sub systems - CS 591 K1: Data Stream Processing and Analytics, Spring 2020
Indirectly • Producer writes to a file or database • Consumer periodically polls and retrieves new data • polling overhead, latency? • Consumer receives a notification when new data is available • Direct messaging • Direct network communication, UDP multicast, TCP • HTTP or RPC if the consumer exposes a service on the network • Failure handling: application needs to be aware of message loss ... message is processed only once, by a single consumer • Event retrieval is not defined by content / structure but its order • FIFO, priority ... producer consumer queue ... Message brokers • Message broker: ...
33 pages | 700.14 KB | 1 year ago
Introduction to Apache Flink and Apache Kafka - CS 591 K1: Data Stream Processing and Analytics, Spring 2020
... key, a value, and a timestamp. A producer publishes a stream of records to a Kafka topic, and a consumer subscribes to one or more topics and processes the stream of records published in them. Topics ... label themselves with a consumer group name, and each record published to a topic is delivered to one consumer instance within each subscribing consumer group. Consumer instances can be in separate ... all the consumer instances have the same consumer group, then the records will effectively be load balanced over the consumer instances. If all the consumer instances have different consumer groups ...
26 pages | 3.33 MB | 1 year ago
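The delivery rule quoted in the excerpt above (one consumer instance per subscribing group receives each record) can be illustrated with a small in-memory sketch. This is a toy model, not the Kafka client API: the `Topic` class, its `subscribe` and `publish` methods, and the per-record round-robin choice are all assumptions made for illustration. Real Kafka balances load by assigning partitions to the consumers in a group.

```python
from collections import defaultdict
from itertools import count


class Topic:
    """Toy in-memory topic (hypothetical; not the Kafka client API)."""

    def __init__(self):
        # group name -> list of consumer inboxes (plain Python lists)
        self.groups = defaultdict(list)
        # group name -> running counter used for round-robin selection
        self._counters = defaultdict(count)

    def subscribe(self, group, inbox):
        """Register a consumer instance (its inbox) under a group name."""
        self.groups[group].append(inbox)

    def publish(self, record):
        """Deliver the record to exactly one inbox in every group."""
        for group, inboxes in self.groups.items():
            # Assumption: simple per-record round-robin within the group.
            i = next(self._counters[group]) % len(inboxes)
            inboxes[i].append(record)


# Same group: the records are load balanced over the two instances.
a, b = [], []
shared = Topic()
shared.subscribe("g1", a)
shared.subscribe("g1", b)
for r in range(4):
    shared.publish(r)

# Different groups: every instance sees every record (broadcast).
c, d = [], []
separate = Topic()
separate.subscribe("g1", c)
separate.subscribe("g2", d)
for r in range(4):
    separate.publish(r)
```

With one shared group, `a` and `b` each receive half of the records. With two groups, `c` and `d` each receive all four.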
Flow control and load shedding - CS 591 K1: Data Stream Processing and Analytics, Spring 2020
... back-pressure has the effect that all operators slow down to match the processing speed of the slowest consumer. • If the bottleneck operator is far down the dataflow graph, back-pressure propagates to upstream ... exchange: If both producer and consumer run on the same node, the buffer is recycled as soon as it is consumed. • The producer slows down according to the rate the consumer recycles buffers. Remote ... the buffer can be recycled as soon as it is on the TCP channel. • If there is no buffer on the consumer side, reading from the TCP connection is interrupted. • The producer uses a threshold to control ...
43 pages | 2.42 MB | 1 year ago
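The buffer-recycling behaviour described in the excerpt above, where the producer can only run as fast as the consumer frees buffers, can be sketched with a bounded queue. This is a minimal analogy built on Python's standard `queue.Queue`, not Flink's network stack. The function name `run_pipeline` and its parameters are hypothetical.

```python
import queue
import threading
import time


def run_pipeline(n_items, capacity, consumer_delay):
    """Run one producer and one slow consumer over a bounded buffer.

    Returns the items in the order consumed and the largest queue
    depth the producer observed.
    """
    buffers = queue.Queue(maxsize=capacity)  # the fixed pool of "buffers"
    received = []
    max_depth = 0

    def producer():
        nonlocal max_depth
        for i in range(n_items):
            # put() blocks while the queue is full, so the producer is
            # throttled to the rate at which the consumer frees slots.
            buffers.put(i)
            max_depth = max(max_depth, buffers.qsize())
        buffers.put(None)  # sentinel: no more items

    def consumer():
        while True:
            item = buffers.get()  # frees one slot ("recycles a buffer")
            if item is None:
                break
            time.sleep(consumer_delay)  # simulate slow processing
            received.append(item)

    threads = [threading.Thread(target=producer),
               threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received, max_depth


received, max_depth = run_pipeline(n_items=20, capacity=2,
                                   consumer_delay=0.001)
```

The observed depth never exceeds `capacity`, so nothing is dropped and nothing accumulates without bound. The cost is that the producer stalls, which is the slow-down effect the excerpt describes.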
Monitoring Apache Flink Applications (Introduction)
... best indication that the consumer group is not keeping up with the producers. millisBehindLatest (scope: user; applies to FlinkKinesisConsumer): The number of milliseconds a consumer is behind the head of the stream. For any consumer and Kinesis shard, this indicates how far it is behind the current time. 4.11 Possible alert conditions • records-lag-max > threshold • millisBehindLatest > threshold 4.12 Monitoring ...
23 pages | 148.62 KB | 1 year ago
Scalable Stream Processing - Spark Streaming and Flink
... Kinesis, ... TwitterUtils.createStream(ssc, None) KafkaUtils.createStream(ssc, [ZK quorum], [consumer group id], [number of partitions]) ... Input Operations - Custom Sources (1/3) ▶ To create ...
113 pages | 1.22 MB | 1 year ago