PyFlink 1.15 Documentationword_count.py python3 word_count.py # You will see outputs as following: # Use --input to specify file input. # Printing result to stdout. Use --output to specify output path. # +I[To, 1] # +I[be,, 1] in the log file to see if there are any problems: # Get the installation directory of PyFlink python3 -c "import pyflink;import os;print(os.path.dirname(os.path.abspath(pyflink.__ ˓→file__)))" # It will logging under the log directory ls -lh /path/to/python/site-packages/pyflink/log # You will see the log file as following: (continues on next page) 1.1. Getting Started 5 pyflink-docs, Release release-1.150 码力 | 36 页 | 266.77 KB | 1 年前3
PyFlink 1.16 Documentationword_count.py python3 word_count.py # You will see outputs as following: # Use --input to specify file input. # Printing result to stdout. Use --output to specify output path. # +I[To, 1] # +I[be,, 1] in the log file to see if there are any problems: # Get the installation directory of PyFlink python3 -c "import pyflink;import os;print(os.path.dirname(os.path.abspath(pyflink.__ ˓→file__)))" # It will logging under the log directory ls -lh /path/to/python/site-packages/pyflink/log # You will see the log file as following: (continues on next page) 1.1. Getting Started 5 pyflink-docs, Release release-1.160 码力 | 36 页 | 266.80 KB | 1 年前3
Fault-tolerance demo & reconfiguration - CS 591 K1: Data Stream Processing and Analytics Spring 2020& reconfiguration ??? Vasiliki Kalavri | Boston University 2020 • To recover from failures, the system needs to • restart failed processes • restart the application and recover its state 2 Checkpointing writes the JobGraph and all required metadata, such as the application’s JAR file, into a remote persistent storage system • Zookeeper also holds state handles and checkpoint locations 5 JobManager following steps: 1. It requests the storage locations from ZooKeeper to fetch the JobGraph, the JAR file, and the state handles of the last checkpoint from remote storage. 2. It requests processing slots0 码力 | 41 页 | 4.09 MB | 1 年前3
State management - CS 591 K1: Data Stream Processing and Analytics Spring 2020types • The system is unaware of which parts of an operator constitute state Streaming state 3 • Explicit state primitives including state types and interfaces • The system is aware of state persistent storage, e.g. a distributed filesystem or a database system • Available state backends in Flink: • In-memory • File system • RocksDB State backends 7 Vasiliki Kalavri | Boston University purposes! FsStateBackend • Stores state on TaskManager’s heap but checkpoints it to a remote file system • In-memory speed for local accesses and fault tolerance • Limited to TaskManager’s memory and0 码力 | 24 页 | 914.13 KB | 1 年前3
Stream ingestion and pub/sub systems - CS 591 K1: Data Stream Processing and Analytics Spring 2020Events are commonly grouped into the same topic • in a similar way batch data belonging to the same file are grouped together • topics are commonly events of the same type: userCreated, userLoggedIn userLoggedOut, userSentPayment 4 Connecting producers to consumers • Indirectly • Producer writes to a file or database • Consumer periodically polls and retrieves new data • polling overhead, latency? broker: a system that connects event producers with event consumers. • It receives messages from the producers and pushes them to the consumers. • A TCP connection is a simple messaging system which0 码力 | 33 页 | 700.14 KB | 1 年前3
Scalable Stream Processing - Spark Streaming and Flinkcategories of streaming sources: 1. Basic sources directly available in the StreamingContext API, e.g., file systems, socket connections. 2. Advanced sources, e.g., Kafka, Flume, Kinesis, Twitter. 3. Custom categories of streaming sources: 1. Basic sources directly available in the StreamingContext API, e.g., file systems, socket connections. 2. Advanced sources, e.g., Kafka, Flume, Kinesis, Twitter. 3. Custom from text data received over a TCP socket connection. ssc.socketTextStream("localhost", 9999) ▶ File stream • Reads data from files. streamingContext.fileStream[KeyClass, ValueClass, InputFormatClass](dataDirectory)0 码力 | 113 页 | 1.22 MB | 1 年前3
Flow control and load shedding - CS 591 K1: Data Stream Processing and Analytics Spring 2020latency constraints that can tolerate approximate results. Slow down the flow of data: • The system buffers excess data for later processing, once input rates stabilize. • Requires a persistent process of discarding data when input rates increase beyond system capacity. • Load shedding techniques operate in a dynamic fashion: the system detects an overload situation during runtime and selectively lower quality. 4 ??? Vasiliki Kalavri | Boston University 2020 https://commons.wikimedia.org/wiki/File:Adaptive_streaming_overview_daseddon_2011_07_28.png 5 ??? Vasiliki Kalavri | Boston University0 码力 | 43 页 | 2.42 MB | 1 年前3
Introduction to Apache Flink and Apache Kafka - CS 591 K1: Data Stream Processing and Analytics Spring 2020./bin/flink run ./examples/batch/WordCount.jar \ --input file:///home/user/hamlet.txt --output file:///home/user/wordcount_out Run with a class entry point and arguments: ./bin/flink ./examples/batch/WordCount.jar \ --input file:///home/user/hamlet.txt --output file:///home/user/wordcount_out 19 Flink commands Vasiliki Kalavri | Boston University Vasiliki Kalavri | Boston University 2020 A distributed and fault-tolerant publish-subscribe messaging system and serves as the ingestion, storage, and messaging layer for large production streaming pipelines0 码力 | 26 页 | 3.33 MB | 1 年前3
Course introduction - CS 591 K1: Data Stream Processing and Analytics Spring 2020alerts for abnormal system metrics • Detect invariant violations • Identify outlier tasks Inspired by this paper : “SAQL: A Stream-based Query System for Real- Time Abnormal System Behavior Detection” sure to read and become familiar with the format and schema document: • https://drive.google.com/file/d/0B5g07T_gRDg9Z0lsSTEtTWtpOW8/view Download and play around with “part-00000-of-00500.csv” of:0 码力 | 34 页 | 2.53 MB | 1 年前3
Cardinality and frequency estimation - CS 591 K1: Data Stream Processing and Analytics Spring 2020Kalavri | Boston University 2020 18 Detect DNS DDoS attacks • Flooding the resources of the targeted system by sending a large number of query from a botnet • Group queries by their top-level domain and analysis of a near-optimal cardinality estimation algorithm. 2007. https://hal.archives-ouvertes.fr/file/index/docid/406166/ filename/FlFuGaMe07.pdf • Cormode, Graham, and Shan Muthukrishnan. An improved0 码力 | 69 页 | 630.01 KB | 1 年前3
共 20 条
- 1
- 2













