Streaming optimizations - CS 591 K1: Data Stream Processing and Analytics Spring 2020
Operator selectivity: the number of output elements produced per number of input elements. A map operator has a selectivity of 1, i.e. it produces one output element for each input element it processes. [figure: selectivity of merge operator variants] … map(String key, String value): // key: document name // value: document contents … map: (k1, v1) → list(k2, v2); reduce: (k2, list(v2)) → list(v2) … MapReduce combiners example: URL access frequency … GET /dumprequest HTTP/1…
0 码力 | 54 pages | 2.83 MB | 1 year ago
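The combiner idea behind this deck's URL-access-frequency example can be sketched in plain Python (illustrative only; the log lines and names below are assumptions, not the course's code):

    from collections import Counter

    logs = [
        "GET /index.html HTTP/1.0",
        "GET /index.html HTTP/1.0",
        "GET /about.html HTTP/1.0",
    ]

    # map: emit one (url, 1) pair per log line, i.e. selectivity of 1
    pairs = [(line.split()[1], 1) for line in logs]

    # combine: pre-aggregate on the mapper before the shuffle, so far
    # fewer pairs cross the network to the reducers
    combined = Counter()
    for url, one in pairs:
        combined[url] += one

    # reduce: merging partial counts from all mappers gives the same totals
    print(combined)  # Counter({'/index.html': 2, '/about.html': 1})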
Stream processing fundamentals - CS 591 K1: Data Stream Processing and Analytics Spring 2020
NiagaraCQ, Aurora, TelegraphCQ, STREAM, Naiad, Spark Streaming, Samza, Flink, MillWheel, Storm, S4, Google Dataflow: the evolution of stream processing … associated keys. Distributed dataflow model … [figure: dataflow graphs: source → map → topK → print; Twitter source → Extract hashtags → Count topics → Trends]

val env = StreamExecutionEnvironment.getExecutionEnvironment
val sensorData = env.addSource(new SensorSource)
val maxTemp = sensorData
  .map(r => Reading(r.id, r.time, (r.temp - 32) * (5.0 / 9.0)))
  .keyBy(_.id)
  .max("temp")

0 码力 | 45 pages | 1.22 MB | 1 year ago
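What the keyed-max pipeline above computes can be mimicked in plain Python (a sketch only; Reading and the input values are invented for illustration):

    from collections import namedtuple

    # Reading and the sensor stream are re-created here as assumptions
    Reading = namedtuple("Reading", ["id", "time", "temp"])

    stream = [Reading("s1", 1, 98.6), Reading("s2", 1, 77.0), Reading("s1", 2, 104.0)]

    # map (Fahrenheit -> Celsius) followed by a running per-key maximum,
    # mirroring .keyBy(_.id).max("temp") in the Flink snippet above
    max_temp = {}
    for r in stream:
        celsius = (r.temp - 32) * (5.0 / 9.0)
        max_temp[r.id] = max(max_temp.get(r.id, float("-inf")), celsius)

    print(max_temp)  # highest temperature seen so far per sensor id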
Course introduction - CS 591 K1: Data Stream Processing and Analytics Spring 2020
… flink.apache.org, Apache Kafka: kafka.apache.org, Apache Beam: beam.apache.org, Google Cloud Platform: cloud.google.com … Outcomes: at the end of the course … Google cluster • https://github.com/google/cluster-data/blob/master/ClusterData2011_2.md Make sure to read and become familiar with the format and schema document: • https://drive.google.com… … Eclipse, or NetBeans with appropriate plugins installed. • gsutil for accessing datasets in Google Cloud Storage. More details: vasia.github.io/dspa20/exercises.html
0 码力 | 34 pages | 2.53 MB | 1 year ago
High-availability, recovery semantics, and guarantees - CS 591 K1: Data Stream Processing and Analytics Spring 2020
… stream processing • Recovery semantics and guarantees • Exactly-once processing in Apache Beam / Google Cloud Dataflow … Logic, State <#Brexit, 520> … fail at the same time? … Exactly-once in Google Cloud Dataflow: Google Dataflow uses RPC for data communication • the sender will retry RPCs until it receives … stored in a scalable key/value store. Exactly-once in Google Cloud Dataflow: checkpointing to address non-determinism • each output is checkpointed together …
0 码力 | 49 pages | 2.08 MB | 1 year ago
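The retry-plus-deduplication scheme this entry describes can be modeled in a few lines of plain Python (a toy model, not Dataflow's implementation; record ids and names are assumptions):

    # Stand-ins for Dataflow's machinery: at-least-once RPC delivery plus
    # receiver-side deduplication keyed by a unique record id.
    seen_ids = set()   # in Dataflow this bookkeeping lives in a scalable key/value store
    output = []

    def receive(record_id, payload):
        if record_id in seen_ids:
            return "ACK"            # retried duplicate: acknowledge, apply nothing
        seen_ids.add(record_id)
        output.append(payload)      # effect and dedup record are committed together
        return "ACK"

    # the sender retries until it gets an ACK; a retry after a lost ACK
    # re-delivers the same record id
    receive(42, "#Brexit")
    receive(42, "#Brexit")
    print(output)  # ['#Brexit'], applied exactly once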
Stream ingestion and pub/sub systems - CS 591 K1: Data Stream Processing and Analytics Spring 2020
… company == ‘Uber’ and price < 100 • Predecessors of Complex Event Processing (CEP) systems … Google Cloud Pub/Sub: publishers and subscribers are applications. Use-cases: • Balancing workloads in network clusters: tasks can be efficiently distributed among multiple workers, such as Google Compute Engine instances. • Distributing event notifications: a service that accepts user signups … invalidation events to update the IDs of objects that have changed. • Logging to multiple systems: a Google Compute Engine instance can write logs to the monitoring system, to a database for later querying …
0 码力 | 33 pages | 700.14 KB | 1 year ago
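A toy in-process version of the publish/subscribe decoupling these use-cases rely on (a sketch, not the Cloud Pub/Sub API; topic names and message contents are made up):

    from collections import defaultdict

    class Broker:
        """Toy pub/sub: publishers and subscribers share only a topic name."""
        def __init__(self):
            self.subscribers = defaultdict(list)

        def subscribe(self, topic, callback):
            self.subscribers[topic].append(callback)

        def publish(self, topic, message):
            # every subscriber of the topic receives its own copy
            for callback in self.subscribers[topic]:
                callback(message)

    broker = Broker()
    broker.subscribe("signups", lambda m: print("send notification:", m))
    broker.subscribe("signups", lambda m: print("update search index:", m))
    broker.publish("signups", {"user": "alice"})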
Scalable Stream Processing - Spark Streaming and Flink
Transformations (2/4): ▶ map • Returns a new DStream by passing each element of the source DStream through a given function. ▶ flatMap • Similar to map, but each input item can be mapped to 0 or more output items.
0 码力 | 113 pages | 1.22 MB | 1 year ago
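The map/flatMap contrast in this deck, shown with the (legacy) PySpark DStream API; a sketch only, and the socket source host and port are placeholders:

    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    sc = SparkContext("local[2]", "MapVsFlatMap")
    ssc = StreamingContext(sc, batchDuration=1)

    lines = ssc.socketTextStream("localhost", 9999)  # placeholder source

    # flatMap: one input line becomes zero or more words
    words = lines.flatMap(lambda line: line.split(" "))
    # map: exactly one output element per input element
    pairs = words.map(lambda w: (w, 1))

    pairs.pprint()
    ssc.start()
    ssc.awaitTermination()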
Introduction to Apache Flink and Apache Kafka - CS 591 K1: Data Stream Processing and Analytics Spring 2020
Streaming word count:

textStream
  .flatMap { _.split("\\W+") }
  .map { (_, 1) }
  .keyBy(0)
  .sum(1)
  .print()

“live and let live” → “live” “and” “let” “live” → (live, 1) …

val env = StreamExecutionEnvironment.getExecutionEnvironment
val sensorData = env.addSource(new SensorSource)
val maxTemp = sensorData
  .map(r => Reading(r.id, r.time, (r.temp - 32) * (5.0 / 9.0)))
  .keyBy(_.id)
  .max("temp")

0 码力 | 26 pages | 3.33 MB | 1 year ago
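The same word count expressed with the PyFlink DataStream API (a sketch assuming a small bounded demo source; not taken from the course materials):

    from pyflink.common import Types
    from pyflink.datastream import StreamExecutionEnvironment

    env = StreamExecutionEnvironment.get_execution_environment()

    text = env.from_collection(["live and let live"], type_info=Types.STRING())

    counts = (
        text
        .flat_map(lambda line: [(w, 1) for w in line.split()],
                  output_type=Types.TUPLE([Types.STRING(), Types.INT()]))
        .key_by(lambda pair: pair[0])
        .reduce(lambda a, b: (a[0], a[1] + b[1]))
    )

    counts.print()  # emits updated counts per word, e.g. (live,1) ... (live,2)
    env.execute("word_count")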
FIELD("data", DataTypes.STRING())])) def func(data: Row): return Row(data.id, data.data * 2) table.map(func).execute().print() +----+----------------------+--------------------------------+ | op | id | class MyFlatMapFunction(FlatMapFunction): def flat_map(self, value): for s in str(value.data).split('|'): yield Row(value.id, s) list(ds.flat_map(MyFlatMapFunction(), output_type=Types.ROW([Types.INT() part_size=1024 ** 3, rollover_interval=15 * 60 * 1000, inactivity_interval=5 *␣ ˓→60 * 1000)) .build()) ds.map(lambda i: (i[0] + 1, i[1]), Types.TUPLE([Types.INT(), Types.STRING()])).sink_ ˓→to(sink) # the result0 码力 | 36 页 | 266.77 KB | 1 年前3PyFlink 1.16 Documentation
FIELD("data", DataTypes.STRING())])) def func(data: Row): return Row(data.id, data.data * 2) table.map(func).execute().print() +----+----------------------+--------------------------------+ | op | id | class MyFlatMapFunction(FlatMapFunction): def flat_map(self, value): for s in str(value.data).split('|'): yield Row(value.id, s) list(ds.flat_map(MyFlatMapFunction(), output_type=Types.ROW([Types.INT() part_size=1024 ** 3, rollover_interval=15 * 60 * 1000, inactivity_interval=5 *␣ ˓→60 * 1000)) .build()) ds.map(lambda i: (i[0] + 1, i[1]), Types.TUPLE([Types.INT(), Types.STRING()])).sink_ ˓→to(sink) # the result0 码力 | 36 页 | 266.80 KB | 1 年前3Streaming in Apache Flink
Streaming in Apache Flink
Map Function (generic type parameters, stripped in the snippet, reconstructed here):

DataStream<TaxiRide> rides = env.addSource(new TaxiRideSource(...));
DataStream<EnrichedRide> enrichedNYCRides = rides
    .filter(new RideCleansing.NYCFilter())
    .map(new Enrichment());

class Enrichment implements MapFunction<TaxiRide, EnrichedRide> {
    @Override
    public EnrichedRide map(TaxiRide taxiRide) throws Exception {
        return new EnrichedRide(taxiRide);
    }
}

FlatMap Function …

DataStream<Tuple2<String, Double>> input = …
DataStream<Tuple2<String, Double>> smoothed = input.keyBy(0).map(new Smoother());

public static class Smoother extends RichMapFunction<Tuple2<String, Double>, Tuple2<String, Double>> {
0 码力 | 45 pages | 3.00 MB | 1 year ago
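The pattern behind Smoother, a map that keeps per-key state, sketched in plain Python (illustrative only; the smoothing formula and alpha are invented, and the Flink original would hold the average in managed keyed state):

    from collections import defaultdict

    class Smoother:
        """Keyed stateful map: exponential moving average per key."""
        def __init__(self, alpha=0.5):
            self.alpha = alpha
            self.state = defaultdict(lambda: None)  # Flink would use ValueState

        def map(self, key, value):
            prev = self.state[key]
            avg = value if prev is None else self.alpha * value + (1 - self.alpha) * prev
            self.state[key] = avg
            return (key, avg)

    smoother = Smoother()
    for key, reading in [("s1", 10.0), ("s1", 20.0), ("s2", 5.0)]:
        print(smoother.map(key, reading))  # smoothed value per key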
14 results in total