Build a lightweight logging and tracing tool with Apache Arrow, Parquet and DataFusion (朱霜)
RUST CHINA CONF 2023, 6.17-6.18, Shanghai; talk dated 2023.06.18. Contents: 1. Introduction; 2. Duo - Observability duet: Logging and Tracing (What is Duo? How does it work?); 3. Apache Arrow, Parquet and DataFusion (A brief introduction to Arrow, Parquet, and DataFusion; How does Duo store and query log and span data?); 4. … on_close(&self, _id: Id, _ctx: Context<'_, S>) { ... } } … Apache Arrow, Parquet, and DataFusion … Apache Arrow: Created by Wes McKinney, creator of pandas (2016) …
26 pages | 11.05 MB | 2 years ago

pandas: powerful Python data analysis toolkit - 0.25
Optional dependencies (excerpt):
- … statistical functions
- XLsxWriter 0.9.8: Excel writing
- blosc: compression for msgpack
- fastparquet 0.2.1: Parquet reading/writing
- gcsfs 0.2.2: Google Cloud Storage access
- html5lib: HTML parser for read_html (see …
- pandas-gbq 0.8.0: Google BigQuery access
- psycopg2: PostgreSQL engine for sqlalchemy
- pyarrow 0.9.0: Parquet and feather reading/writing
- pymysql 0.7.11: MySQL engine for sqlalchemy
- pyreadstat: SPSS files (…
… text/CSV and Stata files, pandas supports a variety of other data formats such as Excel, SAS, HDF5, Parquet, and SQL databases. These are all read via a pd.read_* function. See the IO documentation for more …
698 pages | 4.91 MB | 2 years ago

Apache Kyuubi 1.4.1 Documentation
… CREATE TABLE spark_catalog.default.SRC(KEY INT, VALUE STRING) USING PARQUET; INSERT INTO TABLE spark_catalog.default.SRC VALUES (11215016, 'Kent Yao'); … 1. …: for big files (1 GB); 2. 10 billion records in 1 thousand Parquet files: for medium files (200 MB); 3. 1 billion records in 10 thousand Parquet files: for smaller …
172 pages | 6.94 MB | 2 years ago

Apache Kyuubi 1.5.1 Documentation
… CREATE TABLE spark_catalog.default.SRC(KEY INT, VALUE STRING) USING PARQUET; INSERT INTO TABLE spark_catalog.default.SRC VALUES (11215016, 'Kent Yao'); … 1. …: for big files (1 GB); 2. 10 billion records in 1 thousand Parquet files: for medium files (200 MB); 3. 1 billion records in 10 thousand Parquet files: for smaller …
172 pages | 6.94 MB | 2 years ago

Apache Kyuubi 1.5.2 Documentation
… CREATE TABLE spark_catalog.default.SRC(KEY INT, VALUE STRING) USING PARQUET; INSERT INTO TABLE spark_catalog.default.SRC VALUES (11215016, 'Kent Yao'); … 1. …: for big files (1 GB); 2. 10 billion records in 1 thousand Parquet files: for medium files (200 MB); 3. 1 billion records in 10 thousand Parquet files: for smaller …
172 pages | 6.94 MB | 2 years ago

Apache Kyuubi 1.4.1 Documentation
… CREATE TABLE spark_catalog.default.SRC(KEY INT, VALUE STRING) USING PARQUET; INSERT INTO TABLE spark_catalog.default.SRC VALUES (11215016, 'Kent Yao'); … 1. …: for big files (1 GB); 2. 10 billion records in 1 thousand Parquet files: for medium files (200 MB); 3. 1 billion records in 10 thousand Parquet files: for smaller … dst_ip string, dst_port int) stored as parquet") spark.sql("create table $connOrderbyOnlyIp (src_ip string, src_port int, dst_ip string, dst_port int) stored as parquet") spark.sql("create table …
148 pages | 6.26 MB | 2 years ago

Apache Kyuubi 1.4.0 Documentation
… CREATE TABLE spark_catalog.default.SRC(KEY INT, VALUE STRING) USING PARQUET; INSERT INTO TABLE spark_catalog.default.SRC VALUES (11215016, 'Kent Yao'); … 1. …: for big files (1 GB); 2. 10 billion records in 1 thousand Parquet files: for medium files (200 MB); 3. 1 billion records in 10 thousand Parquet files: for smaller … dst_ip string, dst_port int) stored as parquet") spark.sql("create table $connOrderbyOnlyIp (src_ip string, src_port int, dst_ip string, dst_port int) stored as parquet") spark.sql("create table …
148 pages | 6.26 MB | 2 years ago
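
The pandas 0.25 entry above notes that pandas reads many formats through a family of pd.read_* functions, with Parquet support coming from the optional pyarrow or fastparquet packages. A minimal sketch of that pattern follows. It uses pd.read_csv on an in-memory buffer so that it runs without either Parquet engine installed; the log-style sample data is hypothetical and only for illustration.

```python
import io

import pandas as pd

# Hypothetical sample data held in memory; pd.read_csv accepts any
# file-like object, not only a path on disk.
csv_text = "level,message\nINFO,service started\nWARN,slow query\n"

df = pd.read_csv(io.StringIO(csv_text))

# Every pd.read_* function returns a DataFrame.
print(df.shape)
print(df["level"].tolist())
```

With pyarrow or fastparquet installed, pd.read_parquet (and DataFrame.to_parquet for writing) follows the same call pattern, taking a path or file-like object and returning a DataFrame.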