A Day in the Life of a Data Scientist: Conquer Machine Learning Lifecycle on Kubernetes
Brian Redmond • Cloud Architect @ Microsoft (18 years) • Azure Global Black Belt Team • Live in Pittsburgh, PA … code • Ops teams embracing source control (git) • Automated testing • Repeatable/consistent • CI/CD • This has worked well for app dev. Now it's time for AI/ML • But we must ensure data scientists are not hindered … Scalable • Easy to explore the hyperparameter space • Easy to do distributed training … But really, data scientists shouldn't have to care about containers, Kubernetes and all that stuff • Pachyderm can …

Advancing the Tactical Edge with K3s and SUSE RGS
…cohort of organizations working in association with the U.S. Department of Defense to drive open source innovation into strategic defense initiatives. The company is delivering technology solutions … locations with the use of groundbreaking technologies, to enable decision-making at the point of data collection. Fast, insight-driven decision-making in highly dynamic and dangerous conditions is … Allen's innovative edge computing solution, SmartEdge, addresses the increasing need to gather data in real time and perform analysis at the point of collection, supplying immediate insight which …

Bypassing conntrack: Using eBPF to Enhance IPVS and Optimize K8s Network Performance
… Iptables is widely adopted in popular Linux distributions • Cons • O(N^2) in control plane / O(N) in data plane • Poor scheduling algorithms • Iptables rules are difficult to debug … IPVS mode • Services organized in a hash table • IPVS DNAT • conntrack/iptables SNAT • Pros • O(1) time complexity in control/data plane • Has run stably for two decades • Supports rich scheduling algorithms • Cons • Performance … [Figure: IPVS mode data path vs. IPVS-eBPF mode data path] How eBPF does SNAT • Why do SNAT with eBPF • eBPF programs are easy to deploy …

Methods for Achieving High SLOs on Large-Scale Kubernetes Clusters
… Processing based on the failure reason: the unhealthy node is healed or removed. Reason classification (Source / Feature / Example): System: failure caused by the cluster itself, e.g. RuntimeError, ImageFailed, Unscheduled, FailedPostStartHook, Unhealthy… [Figure: trace system for increasing SLO: data collection (audit log, events), data analysis (failures per machine, failures per reason), handling of unhealthy nodes (monitoring, isolation, recovery, degradation), user storage, analysis platform, trace and weakness reports] The trace system • Data collection: collect audit logs for the whole cluster. • Data analysis: analyze the failure reason if a pod fails. • Reason analysis: analyze …

k8s Operations Manual 2.3
# firewall-cmd --add-rich-rule='rule family="ipv4" source address="10.99.1.0/24" accept'
# firewall-cmd --add-rich-rule='rule family="ipv4" source address="10.244.0.0/16" accept'
# firewall-cmd --runtime-to-permanent
… docker
# docker info
★ Configure the Docker service to be managed by systemd (and trust the local image registry)
# vi /etc/docker/daemon.json
{ "data-root": "/docker_data", "registry-mirrors": [ "https://cof-lee.com:5443" ], "insecure-registries": [ "cof-lee …

Amazon Elastic Kubernetes Service (EKS): A First Look
… (mitigation: Firecracker) • gotchas: unnecessary privileged users, no scans, trust • code analysis • source available? • gotchas: big surface, many languages • sanitizing user input • static code … sensitive config (passwords, API keys, etc.) • gotchas: commits-to-source, non-separated access (dev has cleartext password) • business core data • Personally Identifiable Information (PII) • gotchas: leaks, GDPR (in Europe) [Figure: layers to secure: host, container, dependencies, code, config, user data] © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential. Cloud security tools: Amazon Inspector …

Keys for Managing Keys: Turtles all the way down - Securely managing Kubernetes Secrets
… intensive cryptanalytic attacks ● A cryptoperiod is the time during which a key is used to encrypt data. Key rotation: cryptoperiod. Many factors influence the choice of cryptoperiod: ○ Strength of the cryptographic algorithms used ○ Implementation ○ Operating environment ○ Volume of data ○ Re-keying method ○ Number of key copies ○ Personnel turnover ○ Threat model ○ New and disruptive … cardholder data against disclosure and misuse. 3.6 Fully document and implement all key-management processes and procedures for cryptographic keys used for encryption of cardholder data, including …

Kubernetes Native DevOps Practice
… scale • Reduce the learning curve for customers and ourselves • Get a consistent user experience and data, leveraging PaaS capabilities • Facilitate our PaaS and microservice products • Kubernetes capabilities/advantages … agent for collecting log data [Figure: log-collection agents on nodes and pods feeding ElasticSearch and a monitor/alert service, plus a CronJob] • Unified logging, monitoring, and alerting with PaaS • Consistent data • Node group of build nodes … configuration and history in MySQL • Logging in the central logging service (ElasticSearch) • Metric data in the monitoring system (Prometheus) • Alertmanager to invoke various alerts and related actions … docker

sealos: A Cloud Operating System with Kubernetes as Its Kernel
[Figure: sealfs architecture: managers, data servers and metadata servers exchanging file data/metadata and heartbeats; data and metadata are stored on different devices] Storage: OpenEBS LocalPV. Why: I/O; it integrates with the sealfs distributed file system and avoids FUSE's constant switching between user mode and kernel mode. • Using GPUs on Sealos • Traffic accounting on Sealos with Cilium + BPF (slide source credit to: How to Make Linux Microservice-Aware with Cilium and eBPF, InfoQ, 2019) • Cluster lifecycle management: creating a cluster …

Building a Standard, Extensible Cloud-Native Application Management Platform on Kubernetes - Sun Jianbo, Zhou Zhengxi
… lack interaction, reuse, and portability; teams keep reinventing the wheel just to adapt to different APIs. How can we build, on top of K8s, an application management platform that is user-friendly, highly extensible, and unified and standardized? A simple "client-side" abstraction: DCL (Data Configuration Language). Abstracting K8s resources is really just manipulating YAML data, and doing it with a DCL is simpler than with CRD + controller. CUE • Powerful: focuses on manipulating data rather than writing … • Unified Model Layer • Platform Capability Pool • Modular delivery system - GitOps: "application" configuration • Git (as source of truth) • Continuous integration: ● Build ● Run Unit Tests ● Build Docker Image ● Push Docker Image • Image Registry • code. What if we combine the three? • CUE-based client-side abstraction • OAM-based application model • GitOps-centered continuous delivery = "application-centric" K8s: KubeVela
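The KubeVela excerpt notes that abstracting K8s resources amounts to manipulating YAML data. As a rough illustration of that idea only, the Python sketch below stands in for a CUE template: it expands a few high-level parameters into a standard `apps/v1` Deployment manifest. The helper `render_webservice` and its parameters are hypothetical and are not part of the KubeVela, CUE, or OAM APIs.

```python
import json
from typing import Any, Dict


def render_webservice(name: str, image: str, port: int, replicas: int = 1) -> Dict[str, Any]:
    """Render a minimal apps/v1 Deployment from a few high-level parameters.

    Hypothetical helper: it plays the role a CUE template plays in a
    DCL-based abstraction, i.e. it only transforms data and talks to no API.
    """
    labels = {"app": name}  # the selector and the pod template must carry matching labels
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": dict(labels)},
                "spec": {
                    "containers": [
                        {
                            "name": name,
                            "image": image,
                            "ports": [{"containerPort": port}],
                        }
                    ]
                },
            },
        },
    }


if __name__ == "__main__":
    manifest = render_webservice("frontend", "nginx:1.25", port=80, replicas=2)
    # kubectl accepts JSON manifests as well as YAML.
    print(json.dumps(manifest, indent=2))
```

Because the helper is a pure data transformation, it can be tested without a cluster, and its printed output could be piped to `kubectl apply -f -`. A real DCL such as CUE adds schema validation and defaults on top of this kind of templating.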