Google Kubernetes Engine (GKE) - IT文库 (IT Library): free downloads of programming and internet e-books and documents for developers


Prometheus Deep Dive - Monitoring. At scale.

Prometheus team member Frederic Branczyk Red Hat (previously CoreOS) All things Prometheus / Kubernetes Kubernetes SIG-Instrumentation lead Prometheus team member Richard Hartmann & Frederic Branczyk @TwitchiH Prometheus Deep Dive Introduction Intro 2.0 to 2.2.1 2.4 - 2.6 Beyond Outro Prometheus 101 Inspired by Google’s Borgmon Time series database int64 timestamp, float64 value Ecosystem of instrumentation & exporters to 2.2.1 2.4 - 2.6 Beyond Outro Storage Test setup Kubernetes cluster with dedicated Prometheus nodes 800 microservice instances and Kubernetes components 120k samples/sec 300k active time series

0 points (码力) | 34 pages | 370.20 KB | 1 year ago
Intro to Prometheus - With a dash of operations & observability

Prometheus team member Frederic Branczyk Red Hat (previously CoreOS) All things Prometheus / Kubernetes Kubernetes SIG-Instrumentation lead Prometheus team member Richard Hartmann & Frederic Branczyk @TwitchiH Prometheus Introduction Background Operations & observability Outro Prometheus 101 Inspired by Google’s Borgmon Time series database uint64 millisecond timestamp, float64 value Instrumentation & exporters

0 points (码力) | 19 pages | 63.73 KB | 1 year ago
White Paper: Methods for Building an Alerting and On-Call Incident Center

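... monitoring systems. Alibaba Cloud, for example, offers not only Cloud Monitor but also ARMS and SLS. Most companies do not rely on a single monitoring system: network devices may be monitored with Zabbix; Kubernetes may be monitored with Prometheus (there may be several Kubernetes clusters, and therefore several Prometheus instances) or Nightingale; logs may be monitored with ElastAlert; and after a move to the cloud there may also be several different cloud monitoring services, especially in multi-cloud setups.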

0 points (码力) | 23 pages | 1.75 MB | 1 year ago
PromQL: From Beginner to Mastery

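... records, so the high-cardinality side is the left side, which is why group_left is used. Here is another example that shows a common use of group_left and group_right. Suppose we use kube-state-metrics to collect metrics for the various Kubernetes objects. For pods there is a metric called kube_pod_labels, which puts some of the pod's information into the metric's labels; its value is 1, so it effectively serves as metadata. For example: kube_pod_labels{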

0 points (码力) | 16 pages | 2.77 MB | 1 year ago
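
The group_left usage described in the PromQL snippet above can be illustrated with a minimal Python sketch of many-to-one matching. This is a simplified model for illustration only, not Prometheus's implementation. It mimics an expression of the form `container_metric * on(namespace, pod) group_left(label_team) kube_pod_labels`, where `container_metric`, the label `label_team`, and all sample values are hypothetical and do not come from the original document.

```python
from typing import Dict, List, Tuple

Labels = Dict[str, str]
Sample = Tuple[Labels, float]


def group_left_join(many: List[Sample], one: List[Sample],
                    on: List[str], extra: List[str]) -> List[Sample]:
    """Simplified model of: many * on(...) group_left(extra...) one."""
    # Index the low-cardinality ("one") side by the matching labels.
    index: Dict[Tuple[str, ...], Sample] = {}
    for labels, value in one:
        key = tuple(labels.get(name, "") for name in on)
        if key in index:
            # The "one" side must be unique per match group.
            raise ValueError(f"duplicate series on the 'one' side for {key}")
        index[key] = (labels, value)

    result: List[Sample] = []
    for labels, value in many:
        key = tuple(labels.get(name, "") for name in on)
        if key not in index:
            continue  # no matching partner, so the sample is dropped
        one_labels, one_value = index[key]
        merged = dict(labels)  # keep every label of the "many" side
        for name in extra:
            if name in one_labels:
                merged[name] = one_labels[name]  # copy the requested labels
        result.append((merged, value * one_value))
    return result


# Hypothetical data: two containers of one pod form the "many" (left) side.
container_metric = [
    ({"namespace": "shop", "pod": "web-0", "container": "app"}, 0.5),
    ({"namespace": "shop", "pod": "web-0", "container": "sidecar"}, 0.1),
]
# One metadata series per pod with value 1, as described for kube_pod_labels.
kube_pod_labels = [
    ({"namespace": "shop", "pod": "web-0", "label_team": "checkout"}, 1.0),
]

joined = group_left_join(container_metric, kube_pod_labels,
                         on=["namespace", "pod"], extra=["label_team"])
for labels, value in joined:
    print(labels, value)
```

Because the metadata series has the value 1, the multiplication leaves the left-hand values unchanged and only attaches the extra label, which is why such a metric is convenient for enriching other series.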
OpenMetrics - Standing on the shoulders of Titans

People Acknowledgements Main work has been done by Prometheus team Ben Kochie Brian Brazil myself Google Sumeer Bhola Uber Jerome Froelich Rob Skillington Richard Hartmann, RichiH@{freenode,OFTC,IRCnet} OpenMetrics Outro People First commitments, too many for full list Cloudflare CNCF at large GitLab Google Grafana InfluxData Prometheus ;) RobustPerception SpaceNet Uber Richard Hartmann, RichiH@{freenode support since 0.4.0 Test your own OM output: robustperception.io/checking-openmetrics-output-is-valid Google and Uber want to create another reference parser to weed out bugs Richard Hartmann, RichiH@{freenode

0 points (码力) | 21 pages | 84.83 KB | 1 year ago
1.6 Building a Comprehensive Monitoring System with Nightingale's Extensibility

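If your company's business depends heavily on IT and IT failures directly affect revenue, you must take your stability program seriously, and monitoring is a vital part of it. Where operations monitoring requirements come from. 01. The original need for monitoring comes from business stability. The figure on the left is a 2013 news story about the impact of a Google outage. In 2020 there was also a large-scale AWS outage whose impact went well beyond USD 550,000 and directly affected more than half of the internet. In 2018 a US research firm reported that one minute of server downtime costs a bank USD 270,000 and a manufacturer USD 420,000 ...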

0 points (码力) | 40 pages | 3.85 MB | 1 year ago

