Parquet - IT文库


Build a lightweight logging and tracing tool with Apache Arrow, Parquet and DataFusion 朱霜

# RUST CHINA CONF 2023

Build a lightweight logging and tracing tool with Apache Arrow, Parquet and DataFusion
朱霜, 2023.06.18 (6.17-6.18 @Shanghai)

## Content

1. Introduction
2. Duo - Observability duet: Logging and Tracing
   • What is Duo?
   • How does it work?
3. Apache Arrow, Parquet and DataFusion
   • A brief introduction to Arrow, Parquet, and DataFusion
   • How does Duo store and query log, span data?
4. …

… on_close(&self, _id: Id, _ctx: Context<'_, S>) { ... } } …

## Apache Arrow, Parquet, and DataFusion

## Apache Arrow

• Created by Wes McKinney, creator of Pandas (2016)

0 points | 26 pages | 11.05 MB | 2 years ago
pandas: powerful Python data analysis toolkit - 0.25

|Dependency|Minimum Version|Notes|
|---|---|---|
|…| |… statistical functions|
|XlsxWriter|0.9.8|Excel writing|
|blosc| |Compression for msgpack|
|fastparquet|0.2.1|Parquet reading / writing|
|gcsfs|0.2.2|Google Cloud Storage access|
|html5lib| |HTML parser for read_html (see …|
|pandas-gbq|0.8.0|Google Big Query access|
|psycopg2| |PostgreSQL engine for sqlalchemy|
|pyarrow|0.9.0|Parquet and feather reading / writing|
|pymysql|0.7.11|MySQL engine for sqlalchemy|
|pyreadstat| |SPSS files (…|

… text/csv and Stata files, pandas supports a variety of other data formats such as Excel, SAS, HDF5, Parquet, and SQL databases. These are all read via a pd.read_* function. See the IO documentation for more …
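As a rough illustration of the `pd.read_*` family mentioned in the snippet, the sketch below reads the same small record from CSV and from JSON held in memory. The sample data and variable names are made up for illustration. `pd.read_parquet` follows the same pattern but needs a Parquet engine such as pyarrow or fastparquet, so it is shown only as a comment with a hypothetical path.

```python
import io

import pandas as pd

# Illustrative in-memory data; a real call would usually pass a file path.
csv_text = "key,value\n11215016,Kent Yao\n"
json_text = '[{"key": 11215016, "value": "Kent Yao"}]'

# Each reader returns a DataFrame, whatever the source format.
df_csv = pd.read_csv(io.StringIO(csv_text))
df_json = pd.read_json(io.StringIO(json_text))

# Parquet works the same way, assuming pyarrow or fastparquet is installed:
# df_parquet = pd.read_parquet("data.parquet")  # hypothetical path

print(df_csv)
print(df_json)
```

Both calls yield a one-row frame with the columns `key` and `value`.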

0 points | 698 pages | 4.91 MB | 2 years ago
Apache Kyuubi 1.4.1 Documentation

… CREATE TABLE spark_catalog.default.SRC(KEY INT, VALUE STRING) USING PARQUET; INSERT INTO TABLE spark_catalog.default.SRC VALUES (11215016, 'Kent Yao'); … and 2 hundred files (parquet files): for big file(1G) 2. 10 billion data and 1 thousand files (parquet files): for medium file(200m) 3. 1 billion data and 10 thousand files (parquet files): for smaller … string, dst_port int) stored as parquet") spark.sql(s"create table $connOrderbyOnlyIp (src_ip string, src_port int, dst_ip string, dst_port int) stored as parquet") spark.sql(s"create table …

0 points | 233 pages | 4.62 MB | 2 years ago
Apache Kyuubi 1.4.0 Documentation

… CREATE TABLE spark_catalog.default.SRC(KEY INT, VALUE STRING) USING PARQUET; INSERT INTO TABLE spark_catalog.default.SRC VALUES (11215016, 'Kent Yao'); … and 2 hundred files (parquet files): for big file(1G) 2. 10 billion data and 1 thousand files (parquet files): for medium file(200m) 3. 1 billion data and 10 thousand files (parquet files): for smaller … string, dst_port int) stored as parquet") spark.sql(s"create table $connOrderbyOnlyIp (src_ip string, src_port int, dst_ip string, dst_port int) stored as parquet") spark.sql(s"create table …

0 points | 233 pages | 4.62 MB | 2 years ago
Apache Kyuubi 1.5.1 Documentation

… improved

• Downstream: improve downstream read performance through data skipping. Since Parquet and ORC files automatically collect data statistics when you write data, e.g. minimum and maximum … filter more efficient

##### 3.1.1. Supported table format

|Table Format|Supported|
|---|---|
|parquet|Y|
|orc|Y|
|json|N|
|csv|N|
|text|N|

##### 3.1.2. Supported column data type

|Column Data Type|Supported|
|---|---|
|…|…|

… and 2 hundred files (parquet files): for big file(1G) 2. 10 billion data and 1 thousand files (parquet files): for medium file(200m) 3. 1 billion data and 10 thousand files (parquet files): for smaller …
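The data-skipping idea in the snippet above can be sketched in a few lines. This is a simplified illustration, not Kyuubi's or Spark's implementation: the `FileStats` class, the `files_to_scan` helper, and the file names are all hypothetical. It assumes each file records the minimum and maximum of one integer column, and that a range filter only needs to read files whose range can overlap the query.

```python
from dataclasses import dataclass


@dataclass
class FileStats:
    """Hypothetical per-file statistics for one integer column."""
    path: str
    min_value: int
    max_value: int


def files_to_scan(files, lower, upper):
    """Return paths of files whose [min, max] range overlaps [lower, upper]."""
    return [
        f.path
        for f in files
        if f.max_value >= lower and f.min_value <= upper
    ]


# Illustrative statistics, as if collected when the files were written.
files = [
    FileStats("part-0.parquet", 0, 99),
    FileStats("part-1.parquet", 100, 199),
    FileStats("part-2.parquet", 200, 299),
]

# A filter such as `col BETWEEN 120 AND 150` only needs the second file.
selected = files_to_scan(files, 120, 150)
print(selected)
```

Skipping works best when the column's values are clustered across files, so that the ranges rarely overlap. If every file spans the full value range, nothing can be skipped.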

0 points | 267 pages | 5.80 MB | 2 years ago
Apache Kyuubi 1.5.2 Documentation

… CREATE TABLE spark_catalog.default.SRC(KEY INT, VALUE STRING) USING PARQUET; INSERT INTO TABLE spark_catalog.default.SRC VALUES (11215016, 'Kent Yao'); … • Downstream: improve downstream read performance through data skipping. Since Parquet and ORC files automatically collect data statistics when you write data, e.g. minimum and maximum … and 2 hundred files (parquet files): for big file(1G) 2. 10 billion data and 1 thousand files (parquet files): for medium file(200m) 3. 1 billion data and 10 thousand files (parquet files): for smaller …

0 points | 172 pages | 6.94 MB | 2 years ago
Apache Kyuubi 1.5.1 Documentation

… CREATE TABLE spark_catalog.default.SRC(KEY INT, VALUE STRING) USING PARQUET; INSERT INTO TABLE spark_catalog.default.SRC VALUES (11215016, 'Kent Yao'); … • Downstream: improve downstream read performance through data skipping. Since Parquet and ORC files automatically collect data statistics when you write data, e.g. minimum and maximum … and 2 hundred files (parquet files): for big file(1G) 2. 10 billion data and 1 thousand files (parquet files): for medium file(200m) 3. 1 billion data and 10 thousand files (parquet files): for smaller …

0 points | 172 pages | 6.94 MB | 2 years ago
Apache Kyuubi 1.5.0 Documentation

… CREATE TABLE spark_catalog.default.SRC(KEY INT, VALUE STRING) USING PARQUET; INSERT INTO TABLE spark_catalog.default.SRC VALUES (11215016, 'Kent Yao'); … • Downstream: improve downstream read performance through data skipping. Since Parquet and ORC files automatically collect data statistics when you write data, e.g. minimum and maximum … and 2 hundred files (parquet files): for big file(1G) 2. 10 billion data and 1 thousand files (parquet files): for medium file(200m) 3. 1 billion data and 10 thousand files (parquet files): for smaller …

0 points | 172 pages | 6.94 MB | 2 years ago
Apache Kyuubi 1.4.0 Documentation

… CREATE TABLE spark_catalog.default.SRC(KEY INT, VALUE STRING) USING PARQUET; INSERT INTO TABLE spark_catalog.default.SRC VALUES (11215016, 'Kent Yao'); … and 2 hundred files (parquet files): for big file(1G) 2. 10 billion data and 1 thousand files (parquet files): for medium file(200m) 3. 1 billion data and 10 thousand files (parquet files): for smaller … dst_ip string, dst_port int) stored as parquet") spark.sql(s"create table $connOrderbyOnlyIp (src_ip string, src_port int, dst_ip string, dst_port int) stored as parquet") spark.sql(s"create table …

0 points | 148 pages | 6.26 MB | 2 years ago
Apache Kyuubi 1.4.1 Documentation

… CREATE TABLE spark_catalog.default.SRC(KEY INT, VALUE STRING) USING PARQUET; INSERT INTO TABLE spark_catalog.default.SRC VALUES (11215016, 'Kent Yao'); … and 2 hundred files (parquet files): for big file(1G) 2. 10 billion data and 1 thousand files (parquet files): for medium file(200m) 3. 1 billion data and 10 thousand files (parquet files): for smaller … dst_ip string, dst_port int) stored as parquet") spark.sql(s"create table $connOrderbyOnlyIp (src_ip string, src_port int, dst_ip string, dst_port int) stored as parquet") spark.sql(s"create table …

0 points | 148 pages | 6.26 MB | 2 years ago
