GPU Resource Management On JDOS
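GPU Resource Management On JDOS. 梁永清, liangyongqing1@jd.com. Services provided: 1. GPU containers for experiments 2. A Kubeflow-based machine learning training service 3. Model management and model Serving service. Experiment, Training, Serving: all are container-based; GPU physical machines are not provided directly to business teams. GPU experiments: the regular JDOS container service
0 码力 | 11 pages | 13.40 MB | 1 year ago

KubeCon2020/Resource Orchestration Optimization for Large-Scale Kubernetes Clusters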
Resource orchestration optimization of Kubernetes cluster in large scale. Patrickxie (谢谆志). Background: Cloud has been the general trend. How to manage so many clusters, resources and businesses. How to ensure load balancing of cluster nodes. Improper resource requests. Multi-tenant resource preemption. How to expand horizontally more quickly and flexibly. Region1. How do you manage. K8S scheduling is based on the resource request of the Pod. However, in many cases, some nodes have low resource requests but high load, while some nodes have high resource requests but low load. Dynamic-Scheduler
0 码力 | 27 pages | 3.91 MB | 1 year ago

VMware group: Kubernetes on vSphere Deep Dive, KubeCon China VMware SIG
placement of pods. This is used to spread pods across availability zones, while still respecting resource access and availability concerns. When Kubernetes runs on vSphere, the hypervisor platform also has automated placement options, for both control plane and worker nodes. Two levels of scheduling and resource management are active. Currently no automatic scheduling integration occurs, that is, Kubernetes affinity groups, NUMA, etc.). This session will explain the options to gain better performance, resource optimization and availability through tuning of vSphere, and Kubernetes configuration and labeling
0 码力 | 25 pages | 2.22 MB | 1 year ago

VMware SIG Deep Dive into Kubernetes Scheduling
placement of pods. This is used to spread pods across availability zones, while still respecting resource access and availability concerns. When Kubernetes runs on vSphere, the hypervisor platform also has automated placement options, for both control plane and worker nodes. Two levels of scheduling and resource management are active. Currently no automatic scheduling integration occurs, that is, Kubernetes affinity groups, NUMA, etc.). This session will explain the options to gain better performance, resource optimization and availability through tuning of vSphere, and Kubernetes configuration and labeling
0 码力 | 28 pages | 1.85 MB | 1 year ago

Kubernetes & YARN: a hybrid container cloud
Efficient placement of service containers and tasks. When placed together, they don't affect each other. Resource contention. - Online workload low 1:00am – 6:00am - Offline jobs scale. VTRON RPC. Resource management. VTRON: Virtual Total Resources Of Node. cgroup. Kubernetes / YARN: online service usage, offline job resource usage, online service resource quota, offline job resource quota, buffer, over-subscription
0 码力 | 42 pages | 25.48 MB | 1 year ago

Global Architect Summit 2019 Beijing/Big Data/Exploration and Practice of Running Big Data Workloads on Kubernetes
Huawei (now) - Cloud Native batch system (Volcano) development • IBM Spectrum Computing - Cluster resource and workload scheduling platform development • Gaps for Spark • Agenda • Why Spark on Kubernetes: Consolidate online service and offline analysis • Ecosystem (monitoring, logging, etc.) • Fine-grained resource isolation • …… About Spark on Kubernetes • https://github.com/apache-spark-on-k8s/spark • The will add support for dynamic resource allocation, external shuffle service, Kerberos etc. How it works: Spark on Kubernetes, Spark-operator. Gaps for Spark • Dynamic Resource Allocation • Spark external
0 码力 | 25 pages | 3.84 MB | 1 year ago

01. K8s Extension Features Explained
extend managed resource into a current Kubernetes cluster • Auto-generated API in Kubernetes API server • Customized resource controller to implement your business logic of managed resource • Natural Kubernetes experience for operating your own resource with Kubernetes RBAC and authentication. • What it comes from • From ThirdPartyResource in Kubernetes 1.6 • Create CRD with spec in Kubernetes. CRD and Resource Item: my-crontab.yaml. © 2017 Rancher Labs, Inc. How Does The Controller Work: ETCD, API Server, Kubernetes Core controllers; added, creating, running, stopped, deleted; Resource Item
0 码力 | 12 pages | 1.08 MB | 1 year ago

Kubernetes Open Source Book - 周立
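(Dashboard) Dashboard is a general-purpose, web-based UI for Kubernetes clusters. It allows users to manage and troubleshoot the applications running in the cluster, as well as the cluster itself. Container Resource Monitoring: Container Resource Monitoring records generic time-series metrics about containers in a central database and provides a UI for browsing that data. Cluster-level Logging. A Namespace provides a scope for names. Within a Namespace, resource names must be unique, but they need not be unique across Namespaces. A Namespace is a way to divide cluster resources among multiple uses (via resource quota). In future versions of Kubernetes, objects in the same Namespace will have the same access control policies by default. There is no need to use multiple Namespaces to separate slightly different resources, such as different versions of the same software; you can use. If you want to explicitly reserve resources for non-Pod processes, you can create a "placeholder pod". Use the following template:
    apiVersion: v1
    kind: Pod
    metadata:
      name: resource-reserver
    spec:
      containers:
      - name: sleep-forever
        image: gcr.io/google_containers/pause:0
0 码力 | 135 pages | 21.02 MB | 1 year ago

KubeCon2020/Technical Practice of Using Kubernetes at Scale in Tencent Meeting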
stateful service • Advanced scheduling to improve service stability • Quota management to optimize resource orchestration efficiency • High performance and comprehensive autoscaling. What is TKEx • Based management. • Support big data and AI jobs. • Optimize the isolation of resources, and improve resource utilization using hybrid deployment of online and offline services. • Support Service Mesh. services. Flexible and dynamic resource management: Dynamic Scheduler solves the problem of unbalanced node load in the cluster. • Based on current and historical node resource usage. • Extend Predicate
0 码力 | 19 pages | 10.94 MB | 1 year ago

Node Operator: Kubernetes Node Management Made Simple
deployment system cannot meet the requirements of resource management. Operator: Observe, Analyze, Action • Observe: watch the desired resource and the actual resource • Analyze: find the difference between the desired and actual config • Action: bring the resource to the desired config. Operator: Advantages • Declarative system • Manage the resource toward its final state continually • kube-apiserver oriented programming • CustomResourceDefinition. Biz-Cluster master components state, and manage Biz-Cluster master components through Kubernetes resources, such as Deployment, Pod, etc. Work Together. Achievement • Anyone can operate and maintain
0 码力 | 18 pages | 11.70 MB | 1 year ago
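The Observe / Analyze / Action cycle named in the Node Operator excerpt above can be illustrated with a minimal Python sketch. Everything here is an assumption for illustration: `NodeConfig`, its fields, and the `analyze`, `act`, and `reconcile` functions are hypothetical and are not taken from the Node Operator itself, which works against the Kubernetes API rather than in-memory objects.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class NodeConfig:
    """Hypothetical node settings; the field names are illustrative only."""
    kubelet_version: str
    max_pods: int


def analyze(desired: NodeConfig, actual: NodeConfig) -> dict:
    """Analyze: collect the fields whose actual value differs from the desired one."""
    return {
        f.name: getattr(desired, f.name)
        for f in fields(desired)
        if getattr(desired, f.name) != getattr(actual, f.name)
    }


def act(actual: NodeConfig, diff: dict) -> NodeConfig:
    """Action: return a copy of the actual config with the differing fields applied."""
    return replace(actual, **diff)


def reconcile(desired: NodeConfig, actual: NodeConfig) -> NodeConfig:
    """One pass of the loop: the observed pair is analyzed, then acted on if needed."""
    diff = analyze(desired, actual)
    return act(actual, diff) if diff else actual


if __name__ == "__main__":
    desired = NodeConfig(kubelet_version="v1.18", max_pods=110)
    observed = NodeConfig(kubelet_version="v1.16", max_pods=110)
    print(analyze(desired, observed))
    print(reconcile(desired, observed))
```

A real operator would repeat this pass continually, which is what lets a declarative system converge on the final state even when the actual config drifts again.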
31 results in total