A Day in the Life of a Data Scientist Conquer Machine Learning Lifecycle on Kubernetes
Automate repeatable ML experiments with containers • Deploy ML components to Kubernetes with Kubeflow • Scale and test ML experiments with Helm • Manage training jobs and pipelines with Argo • Serve trained models dedicated to the study of building, evolving and operating rapidly-changing resilient systems at scale” (Jez Humble) • Applying Agile practices to operations • Infrastructure as code • Ops teams embracing be a mix of GPU or CPU nodes • Massive Scale • OpenAI dedicates up to 10k cores for a single experiment • Autoscaling capabilities: Pay for what you use, scale down when idle • Parallel training instead
0 credits | 21 pages | 68.69 MB | 1 year ago
Serverless Kubernetes - KubeCon
• System monitoring and long-term maintenance Extreme elasticity: Scale your pods elastically • Scale out directly at the pod level rather than the node level, no longer limited by the number of nodes • No need to reserve compute capacity [diagram: Pods] kubectl scale deployment scale in seconds “unlimited” Secret, ConfigMap • ServiceAccount • Logs, Exec, Attach, Top • Scaling, HPA • Helm Architecture designed for Cloud Scale Etcd K8S API Server Viking watches for changes to Pod, Service, Ingress and other resources ECI SLB DNS Two-way sync of IaaS resource state Use cases • Multimedia processing • IoT sensor messages processing • Stream processing at scale • Chat bots • Batch jobs or scheduled tasks • HTTP REST APIs and web applications • Mobile backends
0 credits | 16 pages | 4.25 MB | 1 year ago
KubeCon2020/Technical Practice of Using Kubernetes at Large Scale in Tencent Meeting
Kubernetes at the Scale of Tencent Meeting Garnett Wang 王涛 Expert Engineer, Tencent Cloud About Me Garnett Wang, Tencent Cloud • Expert Software Engineer • Technical Director of TKEx Platform, a utilization using hybrid deployment of online and offline services. • Support Service Mesh. • Large-scale and high-performance autoscaling capabilities. • Multi-tenant and quota management. • etc. TKEx Management VWA Controller (Vertical Workload Autoscaler) HPAPlus Controller HNA Controller Auto Scale CronHPA Controller CLB-Service/Ingress-Controller Efficient and reliable container release • Why
0 credits | 19 pages | 10.94 MB | 1 year ago
KubeCon2020/Resource Orchestration Optimization for Large Kubernetes Clusters
Resource orchestration optimization of kubernetes cluster in large scale Patrickxie (谢谆志) Background Cloud has been the general trend. How to manage so many clusters, resources and businesses and each HPA is individually configurable HPA2 HPAn CronHPA 07-28 10:00 07-28 11:00 Scale up to 100 Scale down to 3 How do CronHPA work with HPA 07-28 10:00 07-28 11:00 CronHPA takes over the takes over the work If workload still high load and HPA maxReplicas > CronHPA replicas CronHPA scale up How to solve the problem of improper resource requests Node Resource Oversold Node resource
0 credits | 27 pages | 3.91 MB | 1 year ago
QCon Beijing 2018/QCon Beijing 2018 - "Kubernetes: Future-Oriented Development and Deployment" - Michael Chen
provides the tooling to create and run single containers – Very manual, no fault tolerance, hard to scale, etc • Scheduling, provisioning, and resource management of multiple containers – Docker, Mesos – Google/Pivotal/VMware 21 Container scheduling, scale, resiliency, and Day 2 Desired state of Kubernetes Kubernetes cluster scheduling, scale, resiliency, and Day 2 VMware PKS Value Proposition and Storage Iterate & Troubleshoot Issues Trend & Alert on Anomalies Visualize Metrics at Scale Self-Service Metrics Analytics for All Engineering & Business Wavefront By VMware SaaS-Based
0 credits | 42 pages | 10.97 MB | 1 year ago
Node Operator: Kubernetes Node Management Made Simple
Achievement • Q&A Background: DC/OS From Sigma 2.0 (Swarm) to Sigma 3.1 (Kubernetes) Background: Cluster Scale • Production environment: • Dozens of Cluster • 5k+ Nodes / Cluster • 10k+ Nodes / largest Cluster Kubernetes repo support • Agile, flexible and convenient Node-Operator: Overview • User: SREs who can scale & offline Nodes through posting Machine CRs. • Node-Operator: difference Machine and Node state Kubernetes. • NPD (Node Problem Detector): post Node state to kube-apiserver. Node-Operator: Scale Nodes Node-Operator Node-Operator: Upgrade Nodes Node-Operator Node-Operator: Grayscale Publish
0 credits | 18 pages | 11.70 MB | 1 year ago
Kubernetes & YARN: a hybrid container cloud
- Online workload low 1:00am – 6:00am - Offline jobs scale up while online workload remains idle - Offline jobs scale down while online workload comes back Co-location GPU FPGA relatime - More resource dimension - Expand Alibaba internal co-location scale (Fuxi & sigma)
0 credits | 42 pages | 25.48 MB | 1 year ago
Building a Standard, Extensible Cloud-Native Application Management Platform on Kubernetes - 孙健波, 周正喜
Revision Route $ heroku apps $ heroku domains $ heroku releases $ heroku pipeline $ rio run $ rio scale $ rio weight/promote $ rio route $ rio up riofile Level of abstraction vs. extensibility • A higher level of abstraction can significantly flatten the learning curve, but it forces a compromise on extensibility Common Workload Types Manual Scaler K8s Operators Kubernetes + OAM K8s Plugin HPA Deployment scale-to-0 Function Unified Model Layer Platform Capability Pool Unified model layer Platform-wide unified "capability pool" Modular delivery system - GitOps com/zzxwill/try-cloudnative/tree/master/cloudnativeto-presentation-20201029/kubevela - Application operations - Route - Scale - Capability management https://github.com/zzxwill/try-cloudnative/tree/master/capabilities - Integration
0 credits | 27 pages | 3.60 MB | 9 months ago
k8s Operations Manual 2.3
--force --grace-period=0 ★ Adjusting the replica count with scale # kubectl scale deployment deployName --replicas=5 # kubectl scale deployment/deployName --replicas=5 # kubectl scale statefulset/nginx-statefulset --replicas=3 # the replica count set by the scale command is written to the corresponding dep/sts manifest ★ Chapter 8: Service and Ingress ★ Creating a Service ① ClusterIP type # vi mynginx-svc.yml # contents as follows apiVersion: v1 kind: Service # create a Service resource metadata:
0 credits | 126 pages | 4.33 MB | 1 year ago
Kubernetes Open Source Book - 周立
, then Rollback to an earlier Deployment revision. Each rollback updates the Deployment's revision. Scale up the Deployment to facilitate more load. Pause the Deployment up replica set nginx-deployment-1564180365 to 3 Scaling a Deployment: a Deployment can be scaled with the following command: $ kubectl scale deployment nginx-deployment --replicas=10 deployment "nginx-deployment" scaled Assuming your cluster has enabled horizontal Running and Ready, but before web-2 is launched, web-2 will not be launched until web-0 has been successfully relaunched and is Running and Ready. If a user scales the deployed example by patching the StatefulSet, for example by setting replicas=1, web-2 is terminated first. web-1 is not terminated until web-2 has been fully shut down and deleted. If web-0 fails after web-2 has been terminated and fully shut down but before web-1 is terminated, web-1 will not be
0 credits | 135 pages | 21.02 MB | 1 year ago