Special Resource Operator


GPU Resource Management On JDOS

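GPU Resource Management On JDOS 梁永清 liangyongqing1@jd.com Services provided: 1. GPU containers for experiments 2. A Kubeflow-based machine learning training service 3. Model management and model serving services. Experiment, Training, and Serving are all container-based; GPU physical machines are not provided directly to business teams. GPU experiments: the regular JDOS container service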

11 pages | 13.40 MB | 1 year ago
Node Operator: Kubernetes Node Management Made Simple

Node Operator: Kubernetes Node Management Made Simple 陈俊 (Joe), Ant Financial Agenda • Background and Motivation • Introduction of Operators • Node-Operator • Advanced Topic: Kube-on-Kube-Operator • Achievement • Q&A Background: DC/OS From Sigma 2.0 (Swarm) to Sigma 3.1 (Kubernetes) Background: Cluster Scale • Production environment: • Dozens of clusters • 5k+ nodes per cluster deployment system cannot meet the requirements of resource management. Operator: Observe, Analyze, Action • Observe: watch the desired resource and the actual resource • Analyze: the difference between desired and actual
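The Observe / Analyze / Action cycle named in this snippet can be illustrated with a minimal sketch. It is not Node-Operator's code: the `NodeState` type, the action names, and the in-memory dictionaries standing in for the API server are all hypothetical, chosen only to show how a diff between desired and actual state turns into actions.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class NodeState:
    """Hypothetical, simplified view of a node: a name and a readiness flag."""
    name: str
    ready: bool


def analyze(desired: Dict[str, NodeState],
            actual: Dict[str, NodeState]) -> List[Tuple[str, str]]:
    """Analyze: compute the difference between desired and actual state."""
    actions: List[Tuple[str, str]] = []
    for name, want in sorted(desired.items()):
        have = actual.get(name)
        if have is None:
            actions.append(("create", name))   # desired but missing
        elif have != want:
            actions.append(("update", name))   # present but drifted
    for name in sorted(actual.keys() - desired.keys()):
        actions.append(("delete", name))       # present but no longer desired
    return actions


def act(actions: List[Tuple[str, str]],
        desired: Dict[str, NodeState],
        actual: Dict[str, NodeState]) -> None:
    """Action: apply each step so that actual converges on desired."""
    for verb, name in actions:
        if verb == "delete":
            del actual[name]
        else:
            actual[name] = desired[name]


def reconcile(desired: Dict[str, NodeState],
              actual: Dict[str, NodeState]) -> List[Tuple[str, str]]:
    """One pass: the caller supplies the observed states, then analyze and act."""
    actions = analyze(desired, actual)
    act(actions, desired, actual)
    return actions
```

Running `reconcile` a second time on the same inputs yields no actions, which is the property that makes such a loop safe to repeat on every change notification.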

18 pages | 11.70 MB | 1 year ago
Operator Pattern: Best Practices for Extending Kubernetes with Go

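Operator Pattern: Best Practices for Extending K8s with Go. 吴学强, ApeCloud, KubeBlocks Maintainer & Director of R&D. Contents: 00 About Us, 01 What Is an Operator, 02 The Basic Operator Model, 03 Operator Best Practices. Who we are: 云猿生 (ApeCloud) is an infrastructure software vendor that provides database kernels and management platforms. KubeBlocks: from being acquired, to "king of the grind" (worked to death), and back to our original aspiration (the starting point). KubeBlocks Maintainer & Director of R&D, free6om. Part 1: What Is an Operator. The history of Operators: TPR, Operator, CRD, Operator Pattern (2015.11, 2016.12, 2017.12, Now). K8s 1.1 officially introduced TPR (ThirdPartyResource), the first attempt at the K8s API extensibility problem, but it had many issues and died at the Alpha stage. CoreOS proposed the Operator concept for managing and running complex, application-domain-specific stateful applications, and provided a sample built with TPR + an early version of controller-runtime: etcd operator. K8s 1.9 was released; CRD entered beta and officially replaced TPR; controller-runtime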

21 pages | 3.06 MB | 9 months ago
Kubernetes Open Source Book - 周立

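(Dashboard) Dashboard is a general-purpose, web-based UI for Kubernetes clusters. It lets users manage and troubleshoot the applications running in a cluster, as well as the cluster itself. Container Resource Monitoring: records generic time-series metrics about containers in a central database and provides a UI for browsing that data. Cluster-level Logging. A Namespace provides a scope for names. Resource names must be unique within a Namespace, but not across Namespaces. Namespaces are a way to divide cluster resources among multiple uses (via resource quota). In future versions of Kubernetes, objects in the same Namespace will have the same access control policies by default. There is no need to use multiple Namespaces to separate slightly different resources, such as different versions of the same software; one can use redis matchExpressions: - {key: tier, operator: In, values: [cache]} - {key: environment, operator: NotIn, values: [dev]} matchLabels is a map of { key,value } pairs. matchLabels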
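The `matchLabels` and `matchExpressions` fields in this snippet follow the Kubernetes set-based selector rules: all requirements are ANDed, `In` requires the key to be present with a listed value, and `NotIn` also matches objects that lack the key. The sketch below is a plain-Python illustration of those rules, not Kubernetes code; the function name `selector_matches` and the `component: redis` label in the usage note are assumptions, since the snippet truncates the `matchLabels` entry.

```python
from typing import Dict, List, Optional


def selector_matches(labels: Dict[str, str],
                     match_labels: Optional[Dict[str, str]] = None,
                     match_expressions: Optional[List[dict]] = None) -> bool:
    """Return True if `labels` satisfies every requirement (logical AND)."""
    # matchLabels: each key must be present with exactly the given value.
    for key, value in (match_labels or {}).items():
        if labels.get(key) != value:
            return False

    # matchExpressions: set-based requirements.
    for expr in match_expressions or []:
        key = expr["key"]
        op = expr["operator"]
        values = expr.get("values", [])
        if op == "In":
            ok = key in labels and labels[key] in values
        elif op == "NotIn":
            # An object without the key also satisfies NotIn.
            ok = key not in labels or labels[key] not in values
        elif op == "Exists":
            ok = key in labels
        elif op == "DoesNotExist":
            ok = key not in labels
        else:
            raise ValueError(f"unsupported operator: {op}")
        if not ok:
            return False
    return True


# The two expressions quoted in the snippet above.
EXPRESSIONS = [
    {"key": "tier", "operator": "In", "values": ["cache"]},
    {"key": "environment", "operator": "NotIn", "values": ["dev"]},
]
```

For example, `selector_matches({"component": "redis", "tier": "cache"}, {"component": "redis"}, EXPRESSIONS)` is true because the object has no `environment` label at all, while adding `environment: dev` to the same object makes it false.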

135 pages | 21.02 MB | 1 year ago
KubeCon2020 / Technical Practice of Using Kubernetes at Scale in Tencent Meeting

stateful service • Advanced scheduling to improve service stability • Quota management to optimize resource orchestration efficiency • High performance and comprehensive autoscaling What is TKEx • Based management. • Support big data and AI jobs. • Optimize the isolation of resources, and improve resource utilization using hybrid deployment of online and offline services. • Support Service Mesh. Resource Manage & Schedule Ceres Job Queue Manager Spark-Operator OfflineJobs Scheduler Kubeflow Hybrid Deploy StatefulSetPlus-Operator Tencent Cloud Mesh MultiCluster-Route-Manager Application

19 pages | 10.94 MB | 1 year ago
Kubernetes Native DevOps Practice

Kubernetes Capabilities/Advantages to Build DevOps Solution • Architecture and Features • CRD and operator design • Pipeline / Stage/ Task / Task Template / Version Control • Logging, monitoring, autoscaling Environment variable [] VolumeMounts - Files to be shared or persisted [] Resources - Resource requirement ActiveDeadlineSeconds Timeout of build task Lifecycle - Actions defined Kubernetes Capabilities and Advantages to Build DevOps Solution • Architecture and Features • CRD and operator design • Pipeline/Stage/Task/Task Template/Version Control/UI generation/Volume... • Logging

21 pages | 6.39 MB | 1 year ago
k8s Operations Manual 2.3

1/manifests/tigera-operator.yaml # wget https://raw.githubusercontent.com/projectcalico/calico/v3.26.1/manifests/custom-resources.yaml # Create the controller: Install the Tigera Calico operator and custom resource definitions # kubectl create -f tigera-operator.yaml # To change the image, edit this file # more tigera-operator.yaml | grep -i image: image: quay.io/tigera/operator:v1.30.4 # Change the pod CIDR # sed -i custom resource # kubectl create -f custom-resources.yaml # To change the image here, you can only do so after deployment, by editing the corresponding daemonset and deployment # Check the Calico pod status; startup succeeded if every pod is Running # kubectl get pods -n tigera-operator NAME

126 pages | 4.33 MB | 1 year ago
Observability in Kubernetes Day 2 Operations: Considerations and Practice

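Kubernetes-native monitoring and logging for security and availability • The central management dashboard must include strong monitoring capabilities for cloud-native environments • Resource utilization tools • Kubernetes Day 2 management and operations must include tools that help companies understand their costs, optimize resource utilization, and ultimately reduce overall cost. Provides security detection and remediation for enterprise cloud migration and application deployment (K8s). GitOps's good friend: xxxOperator • The goal of an Operator is to put operational knowledge into software • An Operator runs inside the Kubernetes cluster and automates common Day 1 and Day 2 activities based on declarative CRD files. https://runbooks.prometheus-operator.dev/ Different types of Runbook • Manual • Step-by-step instructions followed by the operator • Semi-automated • A combination of operator-followed steps with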

30 pages | 3.01 MB | 1 year ago
User Interface: State of the UI: Leveraging Kubernetes Dashboard and Shaping its Future

pod ● Global search ● Login mechanism ● Settings page ● Support for Cron Jobs ● Redesigned resource creation ● ...and much much more. github.com/kubernetes/dashboard/releases In-progress work [the foundation] on which we can build our custom command center.” → Survey response → Cluster Operator, running Kubernetes on-prem and in the cloud 2. Feature parity with kubectl 3. Multi-cluster management resources, that would be a huge win.” → Survey response → Cluster Operator, running Kubernetes in GCP and on-prem ● Custom Resource Definitions support ● Service topology view ● Mobile device support

41 pages | 5.09 MB | 1 year ago
逐灵 & 木苏 - Alibaba's Hands-On Experience Running K8s at Ultra-Large Scale

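• Dozens of clusters • Hundreds of thousands of nodes • 10,000 nodes in a single cluster • Tens of thousands of applications • More than a million containers Online Service AI Job FaaS Middleware Resource management, Scheduling, Automated operation, etc. Workloads Containers Cluster Management IDC Controller Kubernetes Platform: consolidating shared operations capabilities • Operator Platform Kubernetes API Server Operator Manager sidecar framework operations capability operator sidecar framework operations capability operator. Operations platform: consolidation of basic operations capabilities. Operations platform: programming framework for operations capabilities. Kubernetes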

33 pages | 8.67 MB | 6 months ago

