ClickHouse in Production
  https://badoo.com/ ... ClickHouse in Production: Badoo [architecture diagram: HTTP Server, Aggregation Services, HDFS Storage, MR Aggregation, Old Events Pusher, Events Storage, Old Events Database, PHP API, Client, Graphs, Browser, Mobile App]
  0 points | 100 pages | 6.86 MB | 1 year ago | 3
7. UDF in ClickHouse
  ... enters, it will be “added” to the aggregation state • Then, the aggregate function yields a result field • When a row leaves, it will be “removed” from the aggregation state • It requires some special ... Aggregate Function Implementation: avg(x) • Data are handled per group and row • Internal state = (count(), sum(x)) • To add one row: sum(x) += x[i], count() += 1
  0 points | 29 pages | 1.54 MB | 1 year ago | 3
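The avg(x) entry above describes a state of (count(), sum(x)) that is updated as rows enter and leave. A minimal Python sketch of that idea follows, assuming a fixed-size sliding window; the names AvgState and moving_avg are hypothetical and are not taken from the slides or from ClickHouse itself.

```python
from dataclasses import dataclass


@dataclass
class AvgState:
    """Hypothetical stand-in for the (count(), sum(x)) state in the snippet."""
    count: int = 0
    total: float = 0.0

    def add(self, x: float) -> None:
        # A row enters: sum(x) += x, count() += 1
        self.total += x
        self.count += 1

    def remove(self, x: float) -> None:
        # A row leaves: undo its contribution to the state
        self.total -= x
        self.count -= 1

    def result(self):
        # Yield the aggregate value; None for an empty state
        return self.total / self.count if self.count else None


def moving_avg(values, window):
    """Average over the last `window` values, reusing a single state."""
    state = AvgState()
    out = []
    for i, x in enumerate(values):
        state.add(x)
        if i >= window:
            # The oldest row falls out of the window
            state.remove(values[i - window])
        out.append(state.result())
    return out
```

For example, `moving_avg([1, 2, 3, 4], 2)` updates the same state four times instead of re-summing each window.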
What You Need to Know About ClickHouse Architecture to Use It Effectively
  ... week. SELECT Referer, count(*) AS count FROM hits WHERE CounterID = 1234 AND Date >= today() - 7 GROUP BY Referer ORDER BY count DESC LIMIT 10 (a typical query in a web analytics system) ... Reading fast ... Distributed tables ... CSV 227 GB, ~1.3 billion rows ... SELECT passenger_count, avg(total_amount) FROM trips GROUP BY passenger_count ... NYC taxi benchmark. Shards: 1, 3, 140. Time, s: 1.224, 0.438, 0.043. Speedup: x2 ... https://t.me/clickhouse_ru › GitHub: https://github.com/yandex/ClickHouse/ › Google group: https://groups.google.com/group/clickhouse ... Thank you
  0 points | 28 pages | 506.94 KB | 1 year ago | 3
5. ClickHouse at Ximalaya for Shanghai Meetup 2019 PDF
  ... timestamps, arrayEnumerate(pages) as index FROM (SELECT * FROM client_log_all ORDER BY timestamp) GROUP BY user ... SELECT user, groupArray(page) as pages, groupArray(timestamp) ... pages[i+2]='Order'), index, pages) as level_3 FROM (SELECT * FROM client_log_all ORDER BY timestamp) GROUP BY user ... 'Order' ), sortedPages, nextSortedPages, nextNextSortedPages) as level_2 … FROM client_log_all GROUP BY user ...
  0 points | 28 pages | 6.87 MB | 1 year ago | 3
1. Machine Learning with ClickHouse
  ... query SELECT cab_type, simpleLinearRegression(trip_distance, total_amount) FROM trips WHERE <...> GROUP BY cab_type ┌─cab_type─┬─simpleLinearRegression(trip_distance, total_amount)─┐ │ yellow │ (2.4343401638740527 ... toYear(pickup_datetime) AS y, simpleLinearRegression(trip_distance, total_amount) FROM trips WHERE <...> GROUP BY y ┌────y─┬─simpleLinearRegression(trip_distance, total_amount)─┐ │ 2009 │ (2.553562453857034,3 ... year, stochasticLinearRegressionState(total_amount, trip_distance) AS model FROM trips WHERE <...> GROUP BY year Ok. ... Apply several trained models SELECT evalMLMethod(model, trip_distance), total_amount ...
  0 points | 64 pages | 1.38 MB | 1 year ago | 3
0. Machine Learning with ClickHouse
  ... query SELECT cab_type, simpleLinearRegression(trip_distance, total_amount) FROM trips WHERE <...> GROUP BY cab_type ┌─cab_type─┬─simpleLinearRegression(trip_distance, total_amount)─┐ │ yellow │ (2.4343401638740527 ... toYear(pickup_datetime) AS y, simpleLinearRegression(trip_distance, total_amount) FROM trips WHERE <...> GROUP BY y ┌────y─┬─simpleLinearRegression(trip_distance, total_amount)─┐ │ 2009 │ (2.553562453857034,3 ... year, stochasticLinearRegressionState(total_amount, trip_distance) AS model FROM trips WHERE <...> GROUP BY year Ok. ... Apply several trained models SELECT evalMLMethod(model, trip_distance), total_amount ...
  0 points | 64 pages | 1.38 MB | 1 year ago | 3
3. Sync Clickhouse with MySQL_MongoDB
  ... every day ... Possible Solutions 4. CollapsingMergeTree ● FINAL is slow ● GROUP BY id HAVING sum(sign)>0 ○ Need to use GROUP BY in every query ○ Not suitable for multi-column primary key ... Our Solution: ...
  0 points | 38 pages | 7.13 MB | 1 year ago | 3
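The `GROUP BY id HAVING sum(sign)>0` workaround mentioned in the entry above can be illustrated outside the database. The following is a minimal Python sketch, assuming the usual CollapsingMergeTree convention that a row with sign 1 adds a state and a row with sign -1 cancels it; the function name live_ids and the sample rows are hypothetical.

```python
from collections import defaultdict


def live_ids(rows):
    """Return ids whose net sign is positive.

    Mirrors GROUP BY id HAVING sum(sign) > 0 over (id, sign) pairs.
    """
    net = defaultdict(int)
    for row_id, sign in rows:
        net[row_id] += sign  # sum(sign) per id
    return sorted(i for i, s in net.items() if s > 0)


# Hypothetical change log: id 1 is inserted, cancelled, and re-inserted;
# id 2 is inserted once; id 3 is inserted and then cancelled.
rows = [(1, 1), (2, 1), (1, -1), (1, 1), (3, 1), (3, -1)]
surviving = live_ids(rows)
```

Every read has to repeat this grouping step, which matches the drawback the snippet notes about needing GROUP BY in every query.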
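2. Handling Hundreds of Billions of Rows per Day with Clickhouse - Qutoutiao
  ... select column1, column2 from table where column=value ... For any SQL that involves group by, order by, distinct, or join, memory usage is no longer O(1). Solutions: 1: max_bytes_before_external_group_by 2: max_bytes_before_external_sort 3: uniq / uniqCombined ...
  0 points | 14 pages | 1.10 MB | 1 year ago | 3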
2. Tencent clickhouse in Practice _2019, Ding Xiaokun & Xiong Feng
  ... ARRAY JOIN Goals GROUP BY key ORDER BY value DESC LIMIT 10 ... SELECT play_times_key AS key, sum(play_times_value) AS value FROM wegame ARRAY JOIN play_times_key, play_times_value GROUP BY key ORDER BY ...
  0 points | 26 pages | 3.58 MB | 1 year ago | 3
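3. ClickHouse Multidimensional Analysis in the Data Warehouse: Application Practice - Zhu Yuan
  ... (for query) exceeded. Solution: configure max_bytes_before_external_sort and max_bytes_before_external_group_by in users.xml. 2. Once user concurrency rises, the load becomes too high. Solution: for now, a redis cache is placed in between ...
  0 points | 14 pages | 3.03 MB | 1 year ago | 3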
13 results in total