Phaser 3 - IT Library (IT文库): free downloads of e-books and documents on programming, IT, and the Internet


PromQL: From Beginner to Expert

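The value after packets is the total number of packets sent since the OS booted. These values are very large, and we usually care less about the current total than about how many packets were received/sent in the last minute, or per second. A monitoring data collector generally runs periodically, for example once every 10 seconds, and each time it collects the NIC's received/sent packet counts it can only capture the current value, just like running ifconfig. If results are returned, such as the 3 results in the example above, the alerting engine treats this as an anomaly and generates 3 alert events. Of course, sometimes a single threshold breach is not a big deal and we only want an alert after several consecutive breaches. In that case, use the for keyword of a Prometheus alerting rule, or the duration setting in Nightingale, which means the rule is evaluated several times within a time range and an alert fires only if every evaluation triggers. For the 3 alert events triggered above, if subsequent periodic PromQL queries return no data, it means the latest 5-minute load. The requirement is: alert if the load over the last 1 minute is greater than 8 or the load over the last 5 minutes is greater than 8. The PromQL is: system_load1{app="clickhouse"} > 8 or system_load5{app="clickhouse"} > 8 unless vector1 unless vector2 yields a vector made up of the elements of vector1 for which vector2 has no exactly matching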

0 credits | 16 pages | 2.77 MB | 1 year ago
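The excerpt above notes that a collector only sees the current counter value on each run, while what we usually want is packets per second. A minimal sketch of that conversion follows, assuming samples arrive as (timestamp, value) pairs. The function name `simple_rate` is hypothetical, and this is a simplification, not Prometheus's actual `rate()` algorithm, which also extrapolates to the window boundaries.

```python
from typing import List, Tuple

# Hypothetical sample shape: (unix timestamp in seconds, counter value).
Sample = Tuple[float, float]


def simple_rate(samples: List[Sample]) -> float:
    """Average per-second increase of a counter across the given samples.

    A drop between two consecutive samples is treated as a counter reset
    (for example an OS reboot), so the new value counts as growth from zero.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        increase += (cur - prev) if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        raise ValueError("timestamps must be strictly increasing overall")
    return increase / elapsed


# Three collections, 10 seconds apart, of a "packets sent" counter.
samples = [(0.0, 1000.0), (10.0, 1500.0), (20.0, 2100.0)]
packets_per_second = simple_rate(samples)
```

With these made-up samples the counter grows by 1100 packets over 20 seconds, so the sketch returns 55 packets per second.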
OpenMetrics - Standing on the shoulders of Titans

OpenMetrics Outro Plans Next steps Full OpenMetrics support in Prometheus, InfluxDB, OpenCensus, M3DB, etc Spreading the word CNCF sandbox to incubating Richard Hartmann, RichiH@{freenode,OFTC,IRCnet} ,method="post",code="200"} 1027 http_requests_total{env="prod",method="post",code="400"} 3 http_requests_total{env="prod",method="post",code="500"} 12 http_requests_total{env="prod",method="

0 credits | 21 pages | 84.83 KB | 1 year ago
Intro to Prometheus - With a dash of operations & observability

Prometheus Introduction Background Operations & observability Outro Time split 1 1/3 Prometheus 2 1/3 Observability 3 1/3 Questions Richard Hartmann & Frederic Branczyk @TwitchiH & @fredbrancz Intro to ,method="post",code="200"} 1027 http_requests_total{env="prod",method="post",code="400"} 3 http_requests_total{env="prod",method="post",code="500"} 12 http_requests_total{env="prod",method="

0 credits | 19 pages | 63.73 KB | 1 year ago
Bilibili's Unified Monitoring System: Design, Evolution, and Practice

(experimental use) • prometheus 2.0 (tsdb) HA prometheus server1 server2 server3 prometheus IDC HA prometheus server1 server2 server3 prometheus IDC Federation IDC1 Alert rule: slow-request ratio of service A > 80% Case 2 Alert rule: available disk capacity < 10% Alert rule: disk capacity predicted to be exhausted in 3 hours 0 now -1h +3h predict_linear(node_filesystem_free{}[1h], 3 * 3600) < 0 Anomaly detection: abnormal traffic abs(requests - requests:holt_winters_rate1h

0 credits | 34 pages | 650.25 KB | 1 year ago
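The disk-saturation rule in the excerpt above relies on predict_linear, which fits a line to recent samples and extrapolates it into the future. A rough sketch of that idea, assuming ordinary least-squares regression and made-up sample data: the name `predict_linear_sketch` is hypothetical, and it uses the last sample's timestamp as "now", whereas Prometheus anchors the prediction at the rule's evaluation time.

```python
from typing import List, Tuple

# Hypothetical sample shape: (unix timestamp in seconds, free bytes).
Sample = Tuple[float, float]


def predict_linear_sketch(samples: List[Sample], seconds_ahead: float) -> float:
    """Least-squares line through the samples, evaluated seconds_ahead
    after the last sample's timestamp."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    t_end = samples[-1][0]
    # Shift time so the last sample sits at x = 0; the intercept is then
    # the fitted value "now".
    xs = [t - t_end for t, _ in samples]
    ys = [v for _, v in samples]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        raise ValueError("samples must span more than one timestamp")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / var_x
    intercept = mean_y - slope * mean_x
    return intercept + slope * seconds_ahead


# One hour of samples, every 10 minutes, losing 100 bytes per interval.
history = [(600.0 * i, 1000.0 - 100.0 * i) for i in range(7)]
predicted = predict_linear_sketch(history, 3 * 3600)
would_alert = predicted < 0  # mirrors the "< 0" comparison in the rule
```

Here free space falls from 1000 to 400 over the hour, so the fitted line crosses zero well before the 3-hour mark and the sketch flags the condition.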
Prometheus Deep Dive - Monitoring. At scale.

2.6 Beyond Outro Three main features Storage backend Caveat: Prometheus 2.0 comes with storage v3 Staleness handling Remote read & write API is now stable-ish Links to in-depth talks about these Dive Introduction Intro 2.0 to 2.2.1 2.4 - 2.6 Beyond Outro Long-term storage Solutions Storage v3 supports backups efficiently and effectively Remote read-write allows you to integrate with a growing Further reading Prometheus 2017 Dev Summit: https://docs.google.com/document/d/1DaHFao0saZ3MDt9yuuxLaCQg8WGadO8s44i3cxSARcM/edit Prometheus 2018 Dev Summit: https://docs.google.com/document/d/1-C5Pycoc

0 credits | 34 pages | 370.20 KB | 1 year ago
1.6 Building a Comprehensive Monitoring System with Nightingale's Extensibility

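Nightingale: many enterprises already run it in production and refine it together Server01 Server02 Agentd Agentd LoadBalance 1. Standalone Prom 2. Clustered m3db 3. Clustered n9e-tsdb Three storage options, choose as needed Agentd Nightingale design and implementation Agentd Data collection Part 4 The core functions of a monitoring system are data collection, storage, analysis, and display. Completeness depends on collection capability: whether it can be inclusive and bring in more ecosystem capabilities,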

0 credits | 40 pages | 3.85 MB | 1 year ago
White Paper on Building an Alert OnCall Event Center

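, only then is the grading meaningful: for example, the notification channels differ, the notification scope differs, the set of people who step in differs, or the required response time differs. If two levels map to exactly the same handling logic, they can be merged into one. My approach is to divide alerts into 3 levels. Level, notification channels, description: Critical: phone, SMS, instant message, email; affects revenue or customers and must be handled immediately. Warning: SMS, instant message, email; does not need immediate handling, but if left unhandled, over time it will all alerts for service A within the period converge into one incident, and all alerts for service B converge into another. The result looks much better. It cannot map perfectly onto real-world alerts and incidents, but for noise reduction and convergence it is good enough. 3. Convergence by time + text similarity. Text similarity requires an algorithm, and an algorithm needs some regularity to work with. We would very much like to gather all alerts related to one incident together, but clearly it is hard to find a rule that works, and an algorithm without a rule will not perform well.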

0 credits | 23 pages | 1.75 MB | 1 year ago
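The excerpt above describes converging all alerts for service A within a period into one incident and all alerts for service B into another. One way this could be sketched, assuming fixed-size time windows and an alert record carrying a service name: the `Alert` fields, the `converge` function, and the 300-second window are all hypothetical choices for illustration, not taken from the white paper.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Alert:
    service: str      # hypothetical field: owning service, e.g. "A"
    timestamp: float  # unix seconds when the alert fired
    message: str


def converge(
    alerts: Iterable[Alert], window_seconds: float = 300.0
) -> Dict[Tuple[str, int], List[Alert]]:
    """Group alerts into incidents keyed by (service, time-window index)."""
    incidents: Dict[Tuple[str, int], List[Alert]] = defaultdict(list)
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        window_index = int(alert.timestamp // window_seconds)
        incidents[(alert.service, window_index)].append(alert)
    return dict(incidents)


alerts = [
    Alert("A", 10.0, "latency high"),
    Alert("B", 50.0, "disk almost full"),
    Alert("A", 100.0, "error rate high"),
    Alert("A", 400.0, "latency high"),
]
incidents = converge(alerts)
```

In this example the four alerts collapse into three incidents: two for service A (one per window) and one for service B. Fixed windows are the simplest choice; alerts that straddle a window boundary end up in separate incidents, which matches the excerpt's remark that the mapping to real incidents is imperfect.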
