GPU Resource Management On JDOS
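梁永清 (Liang Yongqing), liangyongqing1@jd.com. Services provided: 1. GPU containers for experiments 2. Machine learning training service based on Kubeflow 3. Model management and model serving service. Experiment, Training, Serving: all container-based; GPU physical machines are not provided directly to business teams. GPU experiments: the regular JDOS container service
0 码力 | 11 pages | 13.40 MB | 1 year ago

Node Operator: Kubernetes Node Management Made Simple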
陈俊 (Joe), Ant Financial Agenda • Background and Motivation • Introduction of Operators • Node-Operator • Advanced Topic: • Upgrade Master & Node Components reliably • Canary Rollout • Master & Node Component Versions Management Motivation: Work Order Deployment Worker Order • Upgrade Nodes Versions • Upgrade Node 10.10 Complicated architecture Work order deployment system cannot meet the requirements of resource management. Operator Observe Action Analyze • Observe: watch desired resource and actual resource
0 码力 | 18 pages | 11.70 MB | 1 year ago

QCon Beijing 2018/QCon Beijing 2018 - "Kubernetes: Future-Oriented Development and Deployment" - Michael Chen
Operating System Physical Infrastructure Containers VMware Hypervisor VMs Docker Containers User Cases 9 • Ready-to-go development • Self-service portal Developer Sandbox • New application development Very manual, no fault tolerance, hard to scale, etc. • Scheduling, provisioning, and resource management of multiple containers – Docker, Mesos → Kubernetes Support – AWS, Azure, Google → Kubernetes ContainerImage2 Replicas: 2 Kubernetes 101 at the Highest Level • Container Cluster = “Desired State Management” – Kubernetes Cluster Services (w/API) • Node = Container Host w/agent called “Kubelet” • Application
0 码力 | 42 pages | 10.97 MB | 1 year ago

01. Analysis of K8s Extension Features
Kubernetes User Interface | Application Catalog | Monitoring | Logging Management Plane Infrastructure Services - Policy Management - Cluster Operations - User Management - Lifecycle Management Infrastructure
0 码力 | 12 pages | 1.08 MB | 1 year ago

Kubernetes Native DevOps Practice
Easy to customize, as user requirements are diverse • Easy to set up, maintain, extend and scale • Reduce the learning curve for customers and ourselves • Get consistent user experience and data, leverage Unified logging, monitoring, alert with PaaS Consistent data Node group of build nodes Node group of user applications Scheduling customization Cluster Resource Auto Scaling kubelet can do image GC CI/CD Examples - Build Docker Image dockerfile using ConfigMap Job - pod template - volumes user build task • build the docker images init task • prepare code repository - volumes DevOps Operator
0 码力 | 21 pages | 6.39 MB | 1 year ago

Kubernetes Security Survival Guide
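k8s clusters Manages access to k8s API for developers IT Operator IaaS Management Internet User Application User Trust Boundary Trust Boundary Trust Boundary Trust Boundary ©2019 VMware • Provides authentication and role-based access control (RBAC) for the native Kubernetes API • Centralized account and permission management; can integrate with external Active Directory/LDAP How it is implemented • PKS API calls are authenticated through the User Account & Authentication (UAA) service • The CredHub service securely automates the generation and storage of account credentials • These services can target multiple Auditing h. Authentication and Authorization i. Compliance j. File System Permissions k. User Account Management All hardening is tested and verified before release. You no longer need to start from scratch with every upgrade. If a CVE vulnerability is found, the vendor provides a fix immediately • The following servers are not
0 码力 | 23 pages | 2.14 MB | 1 year ago

QCon Beijing 2017/Intelligent Operations/Self Hosted Infrastructure: Automated Operation of Kubernetes as an Example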
distributed system Self driving infrastructure Topics ● Cluster management systems ● Today’s problems with operating cluster management systems ● A self-driving approach Motivation: microservices components ○ dynamic dependencies ○ fast deployment iteration ● Solution: automation Cluster management system ● Automation ○ Scheduling ○ Deployment ○ Healing ○ Discovery/load balancing ○ Scaling Kubernetes? ● Operational expertise around app management in k8s extends to k8s itself ○ E.g. scaling ● Bootstrapping simplified ● Simplify cluster life cycle management ○ E.g. updates ● Upstream improvements
0 码力 | 73 pages | 1.58 MB | 1 year ago

Putting an Invisible Shield on Kubernetes Secrets
complicated! ✓ User access management => raw and extensive! ✓ Secrets management => crucial! • Financial-grade security [1] KubeCon China 2018: Node Operator: Kubernetes Node Management Made Simple - extensions-webhook: /mutating-secret • Annotation: /storage-transform-disable= • Emergency management • High Availability guarantee • KMS • API server & kms-plugin • Cron job backup for KEKs (from
0 码力 | 33 pages | 20.81 MB | 1 year ago

Kubernetes Open Source Book - 周立 (Zhou Li)
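One way to create a Deployment from a file is to use the kubectl create command in the kubectl command-line interface, passing the .yaml file as an argument. For example: $ kubectl create -f docs/user-guide/nginx-deployment.yaml --record This produces output similar to the following: deployment "nginx-deployment" created Required fields In a Kubernetes object's General information about the Node, such as the kernel version, Kubernetes version (kubelet and kube-proxy versions), Docker version (if Docker is used), and OS name. The information is collected from the Node by the Kubelet. Management Unlike pods and services, a Node is not created by Kubernetes: it is created externally by a cloud provider such as Google Compute Engine, or exists on physical machines or virtual io/docs/concepts/workloads/controllers/replicationcontroller/) the only difference between them is selector support. ReplicaSet supports the new set-based selector requirements described in the labels user guide, whereas Replication Controller supports only equality-based selector requirements.
0 码力 | 135 pages | 21.02 MB | 1 year ago

Kubernetes & YARN: a hybrid container cloud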
Jian He Staff Engineer @Alibaba cluster management team Staff Engineer @Hortonworks Hadoop Committer & Project Management Committee member Bushuang Gao Senior Engineer @Alibaba deployment are built around containers. YARN Application centric: top down. Scheduling sequence: Queue -> user -> application -> container request Kubernetes Based on api-server watch mechanism NODE Online service Console Offline jobs L&W L&W GRPC RPC: VTRON RPC: VTRON RPC Resource management VTRON: Virtual Total Resources Of Node cgroup Kubernetes YARN Online service
0 码力 | 42 pages | 25.48 MB | 1 year ago
45 results in total