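2. Mastering 100 Billion Rows a Day with ClickHouse - Qutoutiao
… processing significantly slower than inserts. Analysis: 1. Data is written straight to disk and merged asynchronously (background_pool_size). 2. A single insert request that touches N partitions creates N data directories on disk, and merges cannot keep up. 3. Each directory takes one zxid, so the ZooKeeper cluster comes under heavy pressure and inserts slow down severely. Solution: 1. Increasing background_pool_size treats the symptom, not the cause. … For any SQL involving GROUP BY, ORDER BY, DISTINCT, or JOIN, memory usage is no longer O(1). Solution: 1. max_bytes_before_external_group_by 2. max_bytes_before_external_sort 3. uniq / uniqCombined / uniqHLL12 4. When joining, put the small table on the right ("right-table broadcast"). Problems we ran into: ZooKeeper-related problems …
14 pages | 1.10 MB | 1 year ago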
ClickHouse: Present and Future
• Arrays, tuples, lambda functions • Aggregate function combinators • LIMIT BY, ASOF JOIN, ANY/SEMI JOIN, argMin/argMax. Domain-specific functions out of the box: • Click-stream: processing functions … JOIN optimizations • Table sort order is not taken into account for JOIN • No cost-based optimizer for reordering JOINs • No grace hash algorithm for JOIN • No shuffle for distributed JOINs • In general, distributed JOINs work poorly. Lack of UPSERT • No point UPDATE and DELETE, and no UNIQUE KEY CONSTRAINT • Implementing a unique key in a distributed system is a nontrivial …
32 pages | 2.62 MB | 1 year ago
ClickHouse: Present and Future
Arrays, tuples, lambda functions • Aggregate function combinators • LIMIT BY, ASOF JOIN, ANY/SEMI JOIN, argMin/argMax. Domain-specific functions out of the box: • Click-stream: processing functions … JOIN optimizations • Table sort order is not taken into account for JOIN • No cost-based optimizer for reordering JOINs • No grace hash algorithm for JOIN • No shuffle for distributed JOINs • In general, distributed JOINs work poorly. Lack of UPSERT • No point UPDATE and DELETE, and no UNIQUE KEY CONSTRAINT • Implementing a unique key in a distributed system is a nontrivial …
32 pages | 776.70 KB | 1 year ago
1. Machine Learning with ClickHouse
…sionState(...) AS model FROM trips WHERE <...> AND (toYear(pickup_date) = 2010) … FINAL merges all data into a single model: SELECT finalizeAggregation(model) FROM models FINAL ┌─finalizeAggreg… several trained models: SELECT evalMLMethod(model, trip_distance), total_amount FROM trips LEFT JOIN models ON year = toYear(pickup_datetime) LIMIT 5 ┌─evalMLMethod(model, trip_distance)─┬─total_amount─┐ … SELECT evalMLMethod(model, trip_distance) - total_amount AS diff FROM trips_mergetree_third LEFT JOIN models ON year = toYear(pickup_datetime) ) ┌───────────────MSE─┐ │ 4.145554613376103 │ └───────────────────┘
64 pages | 1.38 MB | 1 year ago
0. Machine Learning with ClickHouse
…sionState(...) AS model FROM trips WHERE <...> AND (toYear(pickup_date) = 2010) … FINAL merges all data into a single model: SELECT finalizeAggregation(model) FROM models FINAL ┌─finalizeAggreg… several trained models: SELECT evalMLMethod(model, trip_distance), total_amount FROM trips LEFT JOIN models ON year = toYear(pickup_datetime) LIMIT 5 ┌─evalMLMethod(model, trip_distance)─┬─total_amount─┐ … SELECT evalMLMethod(model, trip_distance) - total_amount AS diff FROM trips_mergetree_third LEFT JOIN models ON year = toYear(pickup_datetime) ) ┌───────────────MSE─┐ │ 4.145554613376103 │ └───────────────────┘
64 pages | 1.38 MB | 1 year ago
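4. ClickHouse in Practice for User Profiling at Suning
… represents the intermediate state of the groupBitmap aggregate function. It can be created with groupBitmapState. Note: ClickHouse aggregate functions accept several suffixes: -State returns the intermediate result of the aggregation; -Merge merges intermediate results and returns the final result; -MergeState merges intermediate results and returns the merged intermediate state. ClickHouse integrates RoaringBitmap. Bitmap function set: … 2019-10-02 5 p1 8 2019-10-02 5 p2 … Given a simple order detail table, detail_order, how do we compute daily user retention? Tags, SQL: large-table joins and count distinct are both slow and prone to OOM! Bitmap usage example: order_date uv_bitmap 2019-10-01 {1,2,3} 2019-10-02 …
32 pages | 1.47 MB | 1 year ago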
6. ClickHouse in Practice at ZhongAn
Use ClickHouse's efficient real-time computation to query and analyze the raw data, so that users can define tags flexibly and get feedback in real time. Tag platform, ClickHouse, policy table, user table, user behavior table. Data: • historical policy data join user data join user behavior data • 10+ billion rows, 50+ columns • user id • business unit • ingestion time • first_policy_premium • ... • phone_flag • ha_flag … formatReadableSize(ProfileEvents.Values) : toString(ProfileEvents.Values) as value from system.query_log array join ProfileEvents where event_date = today() and type = 2 and query_id = '05ff4e7d-2b8c-4c41-b03d-094f9d8b02f2';
28 pages | 4.00 MB | 1 year ago
ClickHouse in Production
SELECT OrderID, sum(Cost) as SumCost, countDistinct(BannerID) as BannerCount FROM EventLogLocal INNER JOIN BannerTable ON BannerID=bannerid GROUP BY OrderID ORDER BY SumCost desc LIMIT 1; More Examples: SELECT OrderID, sum(Cost) as SumCost, countDistinct(BannerID) as BannerCount FROM EventLogLocal INNER JOIN BannerTable ON BannerID=bannerid GROUP BY OrderID ORDER BY SumCost desc LIMIT 1; ┌─OrderID──┬──S…
100 pages | 6.86 MB | 1 year ago
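蔡岳毅 - Building a Highly Available Query Engine on ClickHouse + StarRocks for Hundreds of Billions of Rows
Platform query performance after adopting ClickHouse. Global Agile Operations Summit, Guangzhou. ClickHouse takeaways: • Evaluate the partition field before importing data; • Sort the data by partition (ORDER BY) when importing; • Watch how the data volume changes when joining left and right tables; • Decide whether to go distributed; • Monitor server CPU, memory fluctuations, and `system`.query_log; • Use SSDs for data storage where possible; • Reduce redundant storage of text in the data; • Especially suited to large data volumes with controllable query frequency, such as data analysis and event-tracking log systems. StarRocks takeaways: • To exploit its distributed strengths, plan partition fields in advance; • Supports all kinds of joins, with syntax much simpler than ClickHouse's; • One SQL statement can be reused in many places; • Set up daemon processes and node monitoring. THANK YOU!
15 pages | 1.33 MB | 1 year ago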
2. Tencent ClickHouse in Practice _2019 丁晓坤 & 熊峰
…play_times_value) AS value FROM wegame ARRAY JOIN Goals GROUP BY key ORDER BY value DESC LIMIT 10 … SELECT play_times_key AS key, sum(play_times_value) AS value FROM wegame ARRAY JOIN play_times_key, play_times_value …
26 pages | 3.58 MB | 1 year ago
14 results in total