ClickHouse on Kubernetes
Background ● Premier provider of software and services for ClickHouse ● Incorporated in UK with distributed team in US/Canada/Europe ● US/Europe sponsor of ClickHouse community ● Offerings: ○ 24x7 support ... Linux” Actually it’s an open-source platform to: ● manage container-based systems ● build distributed applications declaratively ● allocate machine resources efficiently ● automate application ... ClickHouse on Kubernetes? 1. Provisioning 2. Persistence 3. Networking 4. Transparency ... kube-system namespace ... The ClickHouse operator turns complex data warehouse configuration into a single easy-to-manage
0 码力 | 29 pages | 3.87 MB | 1 year ago

Что нужно знать об архитектуре ClickHouse, чтобы его эффективно использовать (What you need to know about ClickHouse architecture to use it effectively)
interfere with each other… ClickHouse: Sharding + Distributed tables! When one server is not enough ... Reading from a Distributed table: CSV 227 Gb, ~1.3 billion rows, SELECT passenger_count ... Shards: 1, 3, 140 | Time, s: 1.224, 0.438, 0.043 | Speedup: x2.8 (3 shards), x28.5 (140 shards) ... Writing to a Distributed table › We want protection against hardware failure… › Data must be available
0 码力 | 28 pages | 506.94 KB | 1 year ago

ClickHouse on Kubernetes
Background ● Premier provider of software and services for ClickHouse ● Incorporated in UK with distributed team in US/Canada/Europe ● US/Europe sponsor of ClickHouse community ● Offerings: ○ 24x7 support ... Linux” Actually it’s an open-source platform to: ● manage container-based systems ● build distributed applications declaratively ● allocate machine resources efficiently ● automate application ... easy-to-manage resource ... ClickHouse Operator ... ClickHouseInstallation YAML file ... (Apache 2.0 source, distributed as Docker image) ... ClickHouse cluster resources ... kubectl apply ... create resources ... What
0 码力 | 34 pages | 5.06 MB | 1 year ago

2. 腾讯 clickhouse实践 _2019丁晓坤&熊峰 (Tencent ClickHouse in Practice, 2019, Ding Xiaokun & Xiong Feng)
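High memory, cheap storage. Single-machine configuration: Memory 128G, 24 CPU cores, SATA 20T, RAID5, 10-Gigabit NIC ... Everything is guided by user value ... Deployment and monitoring management ... 1. Production deployment scheme: [diagram labels: Distributed Table; Replica1 (x6); Shard01, Shard02, Shard03; Load Balancing]
0 码力 | 26 pages | 3.58 MB | 1 year ago

1. Machine Learning with ClickHouse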
BY sipHash64(pickup_datetime) -- expression for sampling ... SAMPLE BY expression must be evenly distributed! ... How to sample data: SAMPLE x OFFSET y ... SELECT count() FROM trips_sample_time ... 432992321
0 码力 | 64 pages | 1.38 MB | 1 year ago

0. Machine Learning with ClickHouse
BY sipHash64(pickup_datetime) -- expression for sampling ... SAMPLE BY expression must be evenly distributed! ... How to sample data: SAMPLE x OFFSET y ... SELECT count() FROM trips_sample_time ... 432992321
0 码力 | 64 pages | 1.38 MB | 1 year ago

ClickHouse in Production
https://github.com/donnemartin/system-design-primer ... Highload Architecture › Webserver (Apache, Nginx) › Cache (Memcached) ... › Message Broker (Kafka, Amazon SQS) › Coordination system (Zookeeper, etcd) ... › MapReduce (Hadoop, Spark) › Network File System (S3, HDFS) ...
0 码力 | 100 pages | 6.86 MB | 1 year ago

7. UDF in ClickHouse
systems and algorithms ... Active GitHub User • https://github.com/hczhcz • Interested in computer system and language stuff • 8 organizations, 90+ repos, 600+ followers ... ClickHouse Contributor ... in a ML System • Pre-analyzing the data • Extracting features • Constructing relationship graphs • Generating reports • ... Intensive Tasks in a ML System • Pre-analyzing ... + Output ... Task = Query or external program ... Query = “CREATE TABLE ... AS SELECT ...” ... A Database System and A ML Pipeline ... Why ClickHouse: Limited hardware resources &
0 码力 | 29 pages | 1.54 MB | 1 year ago

Тестирование ClickHouse которого мы заслуживаем (The ClickHouse testing we deserve)
contrib shared clang-8 release thread contrib static gcc-8 release - contrib static gcc-8 release - system static ... And that's not all... ClickHouse does not ... joinGet(toDateTimeOrNull((CAST(([885455.14523]) AS String)))); SELECT (SELECT 1) FROM remote('127.0.0.{1,2}', system.one); ... About integration: what it integrates with ... Performance tests: query analysis. Query: SELECT count() FROM system.numbers WHERE NOT ignore( materialize('xxxxxxxxxxxxxxxxxx') AS s, concat(s, s, s, s, s, s, s, s
0 码力 | 84 pages | 9.60 MB | 1 year ago

蔡岳毅-基于ClickHouse+StarRocks构建支撑千亿级数据量的高可用查询引擎 (Cai Yueyi: Building a highly available query engine for hundred-billion-row data volumes with ClickHouse + StarRocks)
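Flexibly create different virtual clusters for the appropriate scenarios; Ø Adjust servers at any time, adding or removing servers; Distributed: cluster deployment on k8s ... 全球敏捷运维峰会 (Global Agile Operations Summit), Guangzhou ... Platform query performance after adopting ClickHouse: the system.query_log table records queries that have been executed. query: the full SQL that was executed; related records can be found by filtering this field on SQL keywords. query_duration_ms: execution time. memory_usage: memory used ... • Evaluate the partition field before importing data; • When importing data, apply ORDER BY according to the partition; • Watch for changes in data volume when joining left and right tables; • Decide whether to use a distributed setup; • Monitor server CPU, memory fluctuations and `system`.query_log; • Use SSDs for data storage where possible; • Reduce redundant storage of text information in the data; • Especially suited to scenarios with large data volumes and controllable query frequency, such as data analytics and event-tracking log systems;
0 码力 | 15 pages | 1.33 MB | 1 year ago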
11 results in total