ClickHouse on Kubernetes | ClickHouse on Kubernetes! Alexander Zaitsev Altinity Background ● Premier provider of software and services for ClickHouse ● Incorporated in UK with distributed team in US/Canada/Europe 24x7 support for ClickHouse deployments ○ Software (Kubernetes, cluster manager, tools & utilities) ○ POCs/Training What is Kubernetes? “Kubernetes is the new Linux” Actually it’s an open-source machine resources efficiently ● automate application deployment Why run ClickHouse on Kubernetes? Other applications are already there Easier to manage than deployment on hosts Bring | 0 points | 34 pages | 5.06 MB | 1 year ago
ClickHouse on Kubernetes | ClickHouse on Kubernetes! Alexander Zaitsev, Altinity Limassol, May 7th 2019 Altinity Background ● Premier provider of software and services for ClickHouse ● Incorporated in UK with 24x7 support for ClickHouse deployments ○ Software (Kubernetes, cluster manager, tools & utilities) ○ POCs/Training What is Kubernetes? “Kubernetes is the new Linux” Actually it’s an open-source machine resources efficiently ● automate application deployment Why run ClickHouse on Kubernetes? 1. Other applications are already there 2. Portability 3. Bring up data warehouses quickly | 0 points | 29 pages | 3.87 MB | 1 year ago
ClickHouse in Production | backend › clickhouse-mysql-data-reader – MySQL replica › clickhouse-operator – configurator for Kubernetes › clickhousedb_fdw – foreign data wrapper › clickhouse_sinker – data loader from Kafka › Tabix EventTime DateTime, BannerID UInt64, Cost UInt64, CounterType Enum('Hit'=0, 'Show'=1, 'Click'=2) ) ENGINE = HDFS('hdfs://hdfs1:9000/event_log.parq', 'Parquet') In ClickHouse: DDL CREATE TABLE EventTime DateTime, BannerID UInt64, Cost UInt64, CounterType Enum('Hit'=0, 'Show'=1, 'Click'=2) ) ENGINE = HDFS('hdfs://hdfs1:9000/event_log.parq', 'Parquet') Ok. 0 rows in set. Elapsed: 0.004 sec. | 0 points | 100 pages | 6.86 MB | 1 year ago
ClickHouse: Present and Future | Video streaming analytics Media & news analytics Social recommendations Classifieds. Dating Search engine optimization Telecom traffic analysis DPI analysis CDR records analysis Fraud & spam detection DDoS ClickHouse is an accessible system. ClickHouse can be deployed: • On your own servers • In the cloud; with Kubernetes • On the customer's infrastructure • On a personal laptop ClickHouse is available for various platforms: • Data Hub Support For Semistructured Data JSON data type: CREATE TABLE games (data JSON) ENGINE = MergeTree; • You can insert arbitrary nested JSONs • Types are automatically inferred on INSERT | 0 points | 32 pages | 2.62 MB | 1 year ago
ClickHouse: Present and Future | Video streaming analytics Media & news analytics Social recommendations Classifieds. Dating Search engine optimization Telecom traffic analysis DPI analysis CDR records analysis Fraud & spam detection DDoS ClickHouse is an accessible system. ClickHouse can be deployed: • On your own servers • In the cloud; with Kubernetes • On the customer's infrastructure • On a personal laptop ClickHouse is available for various platforms: • Data Hub Support For Semistructured Data JSON data type: CREATE TABLE games (data JSON) ENGINE = MergeTree; • You can insert arbitrary nested JSONs • Types are automatically inferred on INSERT | 0 points | 32 pages | 776.70 KB | 1 year ago
3. Sync Clickhouse with MySQL_MongoDB | CRUD directly Can’t update/delete table frequently in Clickhouse Possible Solutions 2. MySQL Engine Not suitable for big tables Not suitable for MongoDB Possible Solutions 3. Reinit whole table ● Mutations are stuck (KILL MUTATION) ● Zookeeper OOM because of SQL length (Put ids in a Memory Engine temp table) Final Product ● Only one config file needed for a new Clickhouse table ● Init and history state Create Update Update Delete Future ● Auto configure through web ● Auto deploy on Kubernetes ● Open source? ● Github: kevwan Q&A Thanks | 0 points | 38 pages | 7.13 MB | 1 year ago
8. Continue to use ClickHouse as TSDB | DateTime, `Name` String, `Age` UInt8, ..., `HeartRate` UInt8, `Humidity` Float32, ... ) ENGINE = MergeTree() PARTITION BY toYYYYMM(Time) ORDER BY (Name, Time, Age, ...); ► Column-Orient Model `Name` LowCardinality(String), `Age` UInt8, ..., `HeartRate` UInt8, `Humidity` Float32, ... ) ENGINE = MergeTree() PARTITION BY toYYYYMM(Time) ORDER BY (Name, Time, Age, ...); ► Column-Orient Model UInt8, ..., `time_series` AggregateFunction( groupArray, Tuple(DateTime, Float64)) ) ENGINE = AggregatingMergeTree() PARTITION BY toYYYYMM(time_series_interval) ORDER BY (metric_name, time_series_interval) | 0 points | 42 pages | 911.10 KB | 1 year ago
1. Machine Learning with ClickHouse | to sample data SAMPLE x OFFSET y CREATE TABLE trips_sample_time ( pickup_datetime DateTime ) ENGINE = MergeTree ORDER BY sipHash64(pickup_datetime) -- Primary Key SAMPLE BY sipHash64(pickup_datetime) You can store model as aggregate function state in a separate table Example CREATE TABLE models ENGINE = MergeTree ORDER BY tuple() AS SELECT stochasticLinearRegressionState(total_amount, trip_distance) function state in ClickHouse You can save aggregate function result into table. CREATE TABLE tab ENGINE = Memory AS SELECT sumState(number) AS x FROM numbers(5) Use sumMerge to get final result SELECT | 0 points | 64 pages | 1.38 MB | 1 year ago
0. Machine Learning with ClickHouse | to sample data SAMPLE x OFFSET y CREATE TABLE trips_sample_time ( pickup_datetime DateTime ) ENGINE = MergeTree ORDER BY sipHash64(pickup_datetime) -- Primary Key SAMPLE BY sipHash64(pickup_datetime) You can store model as aggregate function state in a separate table Example CREATE TABLE models ENGINE = MergeTree ORDER BY tuple() AS SELECT stochasticLinearRegressionState(total_amount, trip_distance) function state in ClickHouse You can save aggregate function result into table. CREATE TABLE tab ENGINE = Memory AS SELECT sumState(number) AS x FROM numbers(5) Use sumMerge to get final result SELECT | 0 points | 64 pages | 1.38 MB | 1 year ago
C++ zero-cost abstractions, using ClickHouse hash tables as an example | probing (Linear probing). Example: ClickHouse HashMap. Quadratic probing. Example: Google DenseHashMap. 1. Good cache locality. 2. The hash function must be chosen carefully. 3. Cannot store 1) Choosing the load factor. 0.5 is a good choice for linear probing with step 1. ClickHouse HashMap and Google DenseHashMap use 0.5. Abseil HashMap uses 0.875. Memory layout approach. Approach elements. That is ~600 MB, which does not fit in the LL caches. Hash table, time: ClickHouse HashMap 7.366 s. Google DenseMap 10.089 s. Abseil HashMap 9.011 s. std::unordered_map 44.758 s. Benchmarks | 0 points | 49 pages | 2.73 MB | 1 year ago













