CurveBS IO Processing Flow
This document describes the overall architecture, data organization, and topology of Curve. CurveBS uses a central Metadata Server (MDS) to manage virtual disk mapping and the distribution of data replicas.

1. MDS: persists its data in etcd; collects cluster status and performs scheduling.
2. Chunkserver: responsible for data storage and multi-replica consistency.
3. Client: provides the read/write interface; fetches metadata from the MDS and interacts with the chunkservers to read and write data.
4. SnapshotCloneServer: independent of the core services; snapshot data is stored in object storage that supports S3 APIs.
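The division of labor above implies a two-step client write path: fetch the chunk location from the MDS (metadata only), then send the payload to the chunkserver replica group. A minimal illustrative C++ sketch follows; `MdsClient`, `ChunkserverClient`, the chunk size, and all method names are hypothetical stand-ins, not the actual CurveBS API.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical stand-ins for CurveBS client internals (illustration only).
constexpr uint64_t kChunkSize = 16ULL << 20;  // assumed chunk size (16 MiB)

struct ChunkLocation {
    uint64_t chunk_id;
    std::vector<std::string> replicas;  // chunkserver addresses; [0] = leader
};

class MdsClient {
public:
    // Metadata path: ask the MDS which chunk backs this volume offset.
    ChunkLocation Locate(const std::string& volume, uint64_t offset) {
        (void)volume;  // stub: in reality an RPC to the MDS
        return {offset / kChunkSize, {"cs-a:8200", "cs-b:8200", "cs-c:8200"}};
    }
};

class ChunkserverClient {
public:
    // Data path: write to the copyset leader, which replicates to followers.
    int Write(const ChunkLocation& loc, uint64_t off, const char* data, uint32_t len) {
        (void)data;  // stub: in reality an RPC to the copyset leader
        std::printf("write %u bytes to chunk %llu@%llu via %s\n", len,
                    (unsigned long long)loc.chunk_id, (unsigned long long)off,
                    loc.replicas[0].c_str());
        return 0;
    }
};

int main() {
    MdsClient mds;
    ChunkserverClient cs;
    const char buf[512] = {};
    ChunkLocation loc = mds.Locate("vol1", 1 << 24);                 // 1. metadata from MDS
    return cs.Write(loc, (1 << 24) % kChunkSize, buf, sizeof(buf));  // 2. data to chunkserver
}
```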
Estimation of Availability and Reliability in CurveBS

CurveBS uses the Raft protocol to maintain the consistency of stored data, generally in the form of 3 replicas. If one replica fails, the system can still read and write data successfully on the other two replicas. Assume that the total number of disks in the Curve system is N, the number of replicas is R, and the data recovery time after a disk fails is T. The annual failure rate of a disk is AFR. The data on one disk is distributed across about 50 copysets, which means that when a disk fails, up to 50 other disks restore that disk's data at the same time; from this, the data recovery time T can be estimated.
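The snippet sets up the standard back-of-envelope model for replica loss. A minimal sketch of that estimate in LaTeX, assuming independent disk failures, data loss only when all R replicas of some copyset fail within one recovery window T, and the ~50-copysets-per-disk figure quoted above (the original document's exact formula may differ):

```latex
% Probability that a given disk fails within a recovery window of T hours,
% from an annual failure rate AFR (one year = 8760 hours):
p \;=\; \mathrm{AFR}\cdot\frac{T}{8760}

% A cluster of N disks sees roughly N * AFR failures per year. A failure
% loses data if, in at least one of the ~50 copysets on the failed disk,
% the other R-1 replicas also fail within T:
P_{\mathrm{loss}} \;\approx\; N\cdot\mathrm{AFR}\cdot 50\cdot p^{\,R-1}
```

Shrinking T (by letting up to 50 disks rebuild in parallel) shrinks p, and P_loss falls with the (R-1)-th power of p, which is exactly why the copyset spread matters.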
Curve for CNCF Main

A high-performance, cloud-native file system. Use cases: container, database, data apps (middleware / big data / AI), and data backup.
• Container: aggregates the underlying storage in the cloud (AWS EBS, AWS S3) as container-native storage.
• Database: database services orchestrated in the cloud; Curve can back up / sync data to a slave cloud, and when a master-cloud failure happens, the database service can move to the slave.
• Data apps: applications access data through a POSIX interface; infrequently used data is moved to OSS and frequently used data to high-speed storage, transparently.
• Data backup: Curve (CurveBS, CurveFS) can back up data to a remote site.
OID CND Asia Slide: CurveFS

• Problems for stateful apps: storage capacity expansion; capacity imbalance; apps bundled with data locations.
• Requirements for elastic block storage; requirements for a file system.
• Reliability of the open-source design: a storage cluster with 1200 disks (each with an MTBF of 1.2 million hours), 3 replicas, and restoration of a failed disk's data within 5 minutes can achieve data availability of 6 nines (a rough numeric check follows below).
• Cloud-native support. Currently we offer …
• High availability and consistency: … algorithm; Raft consistency protocol.
• High performance: pre-created file pool; data striping (RAID-like); zero data copy; RDMA.
• Cloud native: cluster topology; the physical pool is used to physically isolate machine resources.
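As a rough plausibility check of the 6-nines figure, the following small C++ program plugs the slide's numbers into the simplified loss model sketched under the reliability entry above. Independent failures and ~50 copysets per disk are assumptions carried over from that entry, not numbers from this slide:

```cpp
#include <cstdio>

int main() {
    const double kMtbfHours   = 1.2e6;               // per-disk MTBF from the slide
    const double kAfr         = 8760.0 / kMtbfHours; // ~0.0073 failures/disk/year
    const double kRecoveryHrs = 5.0 / 60.0;          // restore within 5 minutes
    const int    kDisks       = 1200;
    const int    kCopysets    = 50;                  // assumed copysets per disk

    // Probability that one specific peer disk also fails inside the window.
    const double p = kRecoveryHrs / kMtbfHours;

    // Expected first failures per year, times the chance that both other
    // replicas of at least one of the ~50 copysets fail within the window.
    const double lossPerYear = kDisks * kAfr * kCopysets * p * p;

    std::printf("annual data-loss probability ~ %.3g\n", lossPerYear);
    std::printf("implied reliability ~ %.12f\n", 1.0 - lossPerYear);
    return 0;
}
```

Under these assumptions the annual loss probability lands far below 1e-6, comfortably consistent with a claim of at least 6 nines; a real estimate would also have to model correlated failures and recovery-bandwidth limits.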
Curve支持S3 数据缓存方案 (Curve S3 Data Cache Design)

```cpp
#include <cstdint>
#include <cstring>   // memcpy

class S3ClientAdaptor;      // defined elsewhere in the client
class ChunkCacheManager;
enum class CURVEFS_ERROR;   // CurveFS error-code type

class DataCache {
 public:
    DataCache(S3ClientAdaptor *s3ClientAdaptor,
              ChunkCacheManager *chunkCacheManager,
              uint32_t chunkPos, uint32_t len, const char *data)
        : s3ClientAdaptor_(s3ClientAdaptor),
          chunkCacheManager_(chunkCacheManager),
          chunkPos_(chunkPos),
          len_(len) {
        data_ = new char[len];
        memcpy(data_, data, len);
    }
    virtual ~DataCache() {
        delete[] data_;  // array delete to match new char[len]
        data_ = nullptr;
    }

    void Write(uint32_t cachePos, uint32_t len, const char *data);
    CURVEFS_ERROR Flush();

 private:
    S3ClientAdaptor *s3ClientAdaptor_;
    ChunkCacheManager *chunkCacheManager_;
    uint64_t chunkId_;
    uint32_t chunkPos_;
    uint32_t len_;
    char *data_;
};
```

Detailed design. Write flow: 1. Take the lock and look up the fileCacheManager for the given inode and fsid; if there is none, create a new fileCacheManager; release the lock; then call fileC…
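Step 1 of the write flow (lock, look up the fileCacheManager by fsid and inode, create it if absent) is a lock-protected get-or-create. A minimal sketch under assumed names follows; `FileCacheManager`, `FsCacheManager`, and the map layout are illustrative, not the actual CurveFS types:

```cpp
#include <cstdint>
#include <map>
#include <memory>
#include <mutex>
#include <utility>

class FileCacheManager { /* per-file cache state (illustrative) */ };

class FsCacheManager {
public:
    // Find the FileCacheManager for (fsId, inodeId), creating it under
    // the lock if it is missing, so concurrent writers share one instance.
    std::shared_ptr<FileCacheManager> FindOrCreate(uint32_t fsId, uint64_t inodeId) {
        std::lock_guard<std::mutex> lock(mtx_);
        auto key = std::make_pair(fsId, inodeId);
        auto it = fileCaches_.find(key);
        if (it != fileCaches_.end()) {
            return it->second;
        }
        auto mgr = std::make_shared<FileCacheManager>();
        fileCaches_.emplace(key, mgr);
        return mgr;
    }

private:
    std::mutex mtx_;
    std::map<std::pair<uint32_t, uint64_t>,
             std::shared_ptr<FileCacheManager>> fileCaches_;
};
```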
Curve Detail Introduction for CNCF

A detailed introduction to the Curve distributed storage system. https://www.opencurve.io/
Agenda:
• CurveBS Architecture • CurveBS Topology • CurveBS Data Organization • Metadata Server (MDS) • ChunkServer • Client • CurveBS IO processing flow
• CurveFS Architecture • CurveFS Data Organization • CurveFS File Organization • CurveFS MetaServer • CurveFS Client • CurveFS Mknod Flow • CurveFS Write-to-S3 Flow
From the MDS slide: the MDS tracks resource changes and collects the system's runtime status for the operator.
Curve核心组件之snapshotclone (Curve Core Components: snapshotclone)

Snapshot flow (excerpt): … 3. Fetch the snapshot metadata. … 6. The MDS calls the chunkserver interface to delete the internal snapshot data. [Flow diagram: client and snapshot task reach the MDS (backed by etcd) through an HTTP service with datastore and metastore; chunkservers hold the chunks; snapshot data objects are stored in S3.]
Clone flow (excerpt): 2. Create the clone volume. 3. Allocate volume space. … 7. Copy the data. • Rename the volume from the temporary volume to the clone target volume name. • 9. Update the clone volume's status to Cloned. [Flow diagram: client → MDS/etcd; datastore and metastore behind an HTTP service; data copied from the objects in S3.]
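The flow ends with a metadata state transition: the volume only becomes Cloned after the rename succeeds, so readers never see a half-built volume. A tiny illustrative model of that lifecycle follows; the names are assumptions, not Curve's actual task definitions:

```cpp
#include <cstdio>
#include <string>

// Illustrative clone-volume lifecycle implied by the flow above.
enum class CloneStatus { Cloning, Cloned, Error };

struct CloneTask {
    std::string targetVolume;
    CloneStatus status = CloneStatus::Cloning;
};

// Final steps of the flow: the temporary volume is renamed to the target
// name (step 8 equivalent, stubbed here), and only then is the status
// flipped to Cloned (step 9).
bool FinishClone(CloneTask &task) {
    bool renamed = true;  // stand-in for the rename RPC to the MDS
    if (!renamed) {
        task.status = CloneStatus::Error;
        return false;
    }
    task.status = CloneStatus::Cloned;
    return true;
}

int main() {
    CloneTask task{"clone-target-vol"};
    FinishClone(task);
    std::printf("%s: %s\n", task.targetVolume.c_str(),
                task.status == CloneStatus::Cloned ? "Cloned" : "not done");
    return 0;
}
```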
CurveFS Copyset与FS对应关系 (Mapping Between CurveFS Copysets and File Systems)

…the metadata sharding strategy, based on an analysis of the chubaofs source code. chubaofs manages a file system as a volume; each volume has a number of meta partitions and data partitions. Meta partitions manage the metadata, i.e. the inode and dentry information; data partitions manage the data. How are meta partitions initialized when a file system is created? In master\cluster.go, creating a file system (volume) creates 3 meta partitions and 10 data partitions. In CurveFS, the data-partition functionality is replaced by Curve block devices; the creation and management of meta partitions is analyzed in detail below. 2.1 Creation of meta partitions …
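To make "meta partitions manage inode information" concrete, here is a minimal sketch of range-based inode routing in the style chubaofs uses, where each meta partition owns a contiguous inode-ID range; the types are illustrative, not chubaofs's actual Go structures:

```cpp
#include <cstdint>
#include <vector>

// Illustrative: each meta partition owns a contiguous inode-ID range.
struct MetaPartition {
    uint64_t id;
    uint64_t start;  // first inode ID owned (inclusive)
    uint64_t end;    // last inode ID owned (inclusive)
};

// Route an inode to the partition whose range contains it;
// returns nullptr if no partition covers the ID.
const MetaPartition *Route(const std::vector<MetaPartition> &parts,
                           uint64_t inodeId) {
    for (const auto &p : parts) {
        if (inodeId >= p.start && inodeId <= p.end) return &p;
    }
    return nullptr;
}
```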
CurveFs 用户权限系统调研 (CurveFS User Permission System Research)

From the FUSE interface documentation:

```c
/**
 * Initialize filesystem
 *
 * The return value will be passed in the private_data field of
 * fuse_context to all file operations, and as a parameter to the
 * destroy() method.
 */
```

…allows the kernel to preemptively fill its caches when it anticipates that userspace will soon read more data. … Asynchronous direct I/O requests are generated if FUSE_CAP_ASYNC_DIO …

Inspecting a chunkserver disk:

```
$ dumpe2fs /dev/sdk
dumpe2fs 1.43.4 (31-Jan-2017)
Filesystem volume name:
Last mounted on:          /data/chunkserver16
Filesystem UUID:          5ba783e9-44bd-49ce-b8bc-b7ba0ef33531
Filesystem magic number:  …
```
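That init comment is from the libfuse high-level API: whatever pointer init() returns later shows up as fuse_get_context()->private_data in every operation. A minimal sketch of using it to carry per-mount permission state (libfuse3; `PermissionState` is an illustrative assumption):

```cpp
#define FUSE_USE_VERSION 31
#include <fuse3/fuse.h>

// Illustrative per-mount state a permission system might need.
struct PermissionState {
    bool enforce_acl = false;
};

static void *my_init(struct fuse_conn_info *conn, struct fuse_config *cfg) {
    (void)conn; (void)cfg;
    // The returned pointer is delivered to every operation via
    // fuse_get_context()->private_data, and to destroy() at unmount.
    return new PermissionState{};
}

static void my_destroy(void *private_data) {
    delete static_cast<PermissionState *>(private_data);
}

int main(int argc, char *argv[]) {
    struct fuse_operations ops {};
    ops.init = my_init;
    ops.destroy = my_destroy;
    return fuse_main(argc, argv, &ops, nullptr);
}
```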
NJSD eBPF 技术文档 - 0924版本 (NJSD eBPF Technical Document, 0924 Edition)

• FUSE overheads: inter-process shared-memory communication latency 10µs+; other overheads 10µs+; fuse_ll_ops overhead ~10µs.
• A FUSE-based optimization framework. Key points: a shared inode cache; a shared mapping of the data cache; the GETATTR flow; the file-read flow. Related work: extFUSE; Google Android 12 passthrough.
• What is eBPF. Map layout: a lookup map keyed by (parent inode, name) whose value is an inode id, and an inode map of type BPF_MAP_TYPE_HASH whose key is an inode id and whose value is fuse_attr (the file attributes).
• Data-cache part: BPF program type BPF_PROG_TYPE_EXTFUSE; hook points and methods: fuse_file_read_iter, fuse_file_write_iter.
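To make the inode map concrete, here is a small userspace sketch that creates such a hash map via libbpf. `bpf_map_create` and `bpf_map_update_elem` are real libbpf calls; `CachedAttr` is a stand-in for the kernel's struct fuse_attr, and extFUSE-style code would actually load its maps from a BPF object file rather than create them ad hoc:

```cpp
#include <bpf/bpf.h>  // libbpf: bpf_map_create, bpf_map_update_elem
#include <cstdint>
#include <cstdio>

// Stand-in for cached file attributes (real code would mirror
// the kernel's struct fuse_attr).
struct CachedAttr {
    uint64_t size;
    uint32_t mode;
    uint32_t uid;
};

int main() {
    // Hash map: key = inode id, value = cached attributes, matching
    // the inode map described above.
    int map_fd = bpf_map_create(BPF_MAP_TYPE_HASH, "inode_attr_map",
                                sizeof(uint64_t), sizeof(CachedAttr),
                                /*max_entries=*/1024, nullptr);
    if (map_fd < 0) {
        std::perror("bpf_map_create");
        return 1;
    }

    uint64_t ino = 42;
    CachedAttr attr{4096, 0100644, 1000};
    // Userspace populates the cache; a BPF program attached at the FUSE
    // hook points would read it to answer GETATTR without a user-space upcall.
    bpf_map_update_elem(map_fd, &ino, &attr, BPF_ANY);
    return 0;
}
```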