复杂环境下的视觉同时定位与地图构建
The tracking success ratio after initialization is evaluated on four groups of sequences:
• Group A: simple translation
• Group B: sequences that contain loops
• Group C: slow and nearly pure rotation
• Group D: fast motion with strong rotation
Timing statistics: computation time measured on a desktop PC.
Personal homepage: http://www.cad.zju.edu.cn/home/gfzhang | Email: zhangguofeng@cad.zju.edu.cn | ZJUCVG Group website: http://www.zjucvg.net
(60 pages, 4.61 MB, 1 year ago)
从推荐模型的基础特点看大规模推荐类深度学习系统的设计 袁镱
GPU-based multi-level-storage training: better cost-effectiveness.
• Challenges of GPU training for recommendation models: GPU memory (at most 80 GB on an A100) cannot hold a TB-scale model, and GPU many-thread parallelism is not friendly to sparse data.
• Approach: existing — the parameters that host memory can hold define the corresponding sample batch, a "Group"; added — the parameters that GPU memory can hold define a smaller batch, a "Pass"; added — GPU-parallel-friendly access via a CSR-format data layout in GPU memory.
• Storage hierarchy: SSD disk (10 TB) holds all parameters; host memory (1 TB) holds the parameters about to be used; GPU memory holds … (cut off in the snippet).
Serving:
• Problem: transferring and loading a TB-scale model to multiple sites in real time is expensive.
• Approach: deploy the high-frequency and low-frequency parts separately; a more flexible usage is to split the model into multiple slices and deploy them on demand (DSSM, WDL, ...).
• Distributed serving cluster with replicas per group (Group 1 ... Group N); the inference-node SDK holds the MB-scale DNN part and sparse hot keys, while the TB-scale embedding part is served in the cluster.
• The full model (TB-scale) is published during off-peak hours and the incremental model (GB-scale) every 20 minutes, both via COS storage.
Embedding-value pain points and remedies:
1. Fewer values: variable-length embeddings — rarely seen features use a single float, combined with show/click statistics; this improves quality.
2. Fewer keys: group lasso for key-level sparsification.
3. Shorter values: (a) mixed precision float16 + int8 + int4; (b) quantized compression down to 1 or 2 bits.
Advantage: independent of the optimizer. Disadvantage: … (truncated in the snippet).
(22 pages, 6.76 MB, 1 year ago)
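
The snippet above mentions laying out sparse features in CSR format for GPU-friendly access. As an illustration only (not taken from the slides; the sample lists and array names are invented), a minimal sketch of packing variable-length sparse feature-ID lists into CSR-style indptr/indices arrays that can be copied to GPU memory in one transfer each:

    import numpy as np
    import torch

    # Hypothetical mini-batch: each sample carries a variable-length list of sparse feature IDs.
    samples = [[3, 17, 42], [7], [3, 99, 100, 7]]

    # CSR-style layout: one flat "indices" array plus "indptr" row offsets.
    indices = np.concatenate(samples).astype(np.int64)
    indptr = np.zeros(len(samples) + 1, dtype=np.int64)
    indptr[1:] = np.cumsum([len(s) for s in samples])

    # Two dense tensors can be moved to the GPU in a single copy each, and every GPU
    # thread can locate sample i via indices[indptr[i]:indptr[i + 1]].
    indices_gpu = torch.from_numpy(indices)  # add .cuda() on a machine with a GPU
    indptr_gpu = torch.from_numpy(indptr)
    print(indptr_gpu.tolist(), indices_gpu.tolist())
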
《Efficient Deep Learning Book》[EDL] Chapter 4 - Efficient Architectures
"… picture of a snake or a grizzly bear might trigger caution or fear. In a way, we subconsciously group these animals in our head. We don't necessarily know everything about a dog and cat, but we know that these …"
"… blocks via a recurrence cell. They fall under the Recurrence group. The efficient transformers under the Memory/Downsampling group use additional parameters to act as a memory. This memory is used during the training process. The transformers that use sparse attention are grouped under the Sparse group. After the input sequence and the attention parameters, the next component to attack is the softmax computation …"
(53 pages, 3.92 MB, 1 year ago)
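
The excerpt groups efficient transformers into Recurrence, Memory/Downsampling, and Sparse families. As a toy illustration of the downsampling idea only — not any specific model from the book, with arbitrary shapes and pooling factor — keys and values can be average-pooled along the sequence axis before attention, so the softmax runs over a shorter axis:

    import torch
    import torch.nn.functional as F

    def pooled_attention(q, k, v, pool=4):
        # q, k, v: (batch, seq_len, dim); shrink k/v along seq_len by average pooling
        k = F.avg_pool1d(k.transpose(1, 2), pool).transpose(1, 2)
        v = F.avg_pool1d(v.transpose(1, 2), pool).transpose(1, 2)
        scores = q @ k.transpose(1, 2) / q.shape[-1] ** 0.5   # (batch, seq_len, seq_len // pool)
        return torch.softmax(scores, dim=-1) @ v              # (batch, seq_len, dim)

    q = k = v = torch.randn(2, 64, 32)
    print(pooled_attention(q, k, v).shape)                    # torch.Size([2, 64, 32])
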
阿里云上深度学习建模实践 - 程孟力
AutoFeature feature-combination operators:
• Count: select count(1) group by col
• GroupByThenMax/Min/Avg/Sum: e.g. select max(col2) group by col1
• CrossCount[2]: select count(1) group by col1, col2
The search combines feature combination with feature selection; see the pandas sketch below.
(40 pages, 8.51 MB, 1 year ago)
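
The operators above are described as SQL-style aggregations. A hedged sketch of the same three aggregation patterns in pandas — the columns user_id, item_cat, and price are invented for illustration, not taken from the slides:

    import pandas as pd

    df = pd.DataFrame({
        "user_id": [1, 1, 2, 2, 2],
        "item_cat": ["a", "b", "a", "a", "c"],
        "price": [10.0, 5.0, 7.5, 2.0, 9.0],
    })

    # Count: select count(1) group by col
    cnt = df.groupby("user_id").size().rename("user_count")

    # GroupByThenMax (Min/Avg/Sum analogously): select max(col2) group by col1
    gmax = df.groupby("user_id")["price"].max().rename("user_max_price")

    # CrossCount over two columns: select count(1) group by col1, col2
    cross = df.groupby(["user_id", "item_cat"]).size().rename("user_cat_count")

    print(cnt, gmax, cross, sep="\n")
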
AI大模型千问 qwen 中文文档
Qwen is the large language model and large multimodal model series of the Qwen Team, Alibaba Group. Now the large language models have been upgraded to Qwen1.5. Both language models and multimodal …
AWQ configuration excerpt from the document:
    model_path = "your_model_path"
    quant_path = "your_quantized_model_path"
    quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}
    # Load your tokenizer and model with AutoAWQ
    tokenizer = …
GPTQ configuration excerpt from the document:
    model_path = "your_model_path"
    quant_path = "your_quantized_model_path"
    quantize_config = BaseQuantizeConfig(
        bits=8,             # 4 or 8
        group_size=128,
        damp_percent=0.01,
        desc_act=False,     # set to False can significantly speed up inference but …
    )
(56 pages, 835.78 KB, 1 year ago)
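
To make the AWQ excerpt usable end to end, here is a minimal sketch assuming the AutoAWQ and transformers packages are installed and that "your_model_path" points to a local (e.g. Qwen1.5) checkpoint; the calls beyond the config shown in the document follow the public AutoAWQ API and should be treated as an approximation, not the official Qwen recipe:

    from awq import AutoAWQForCausalLM
    from transformers import AutoTokenizer

    model_path = "your_model_path"               # base model to quantize
    quant_path = "your_quantized_model_path"     # where the AWQ model is written
    quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoAWQForCausalLM.from_pretrained(model_path)

    # Runs activation-aware calibration (AutoAWQ pulls a default calibration set).
    model.quantize(tokenizer, quant_config=quant_config)

    model.save_quantized(quant_path)
    tokenizer.save_pretrained(quant_path)
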
Lecture 7: K-Means
An unsupervised learning problem. Given: N unlabeled examples {x1, · · · , xN} and the number of desired partitions K. Goal: group the examples into K "homogeneous" partitions. Loosely speaking, it is classification without ground truth: the method only looks at similarities, and no labels are given. Without labels, similarity can be hard to define, so using the right distance/similarity measure is very important.
(46 pages, 9.78 MB, 1 year ago)
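
Since the lecture snippet states the K-means objective (group N unlabeled points into K homogeneous partitions by similarity), here is a minimal NumPy sketch of Lloyd's algorithm — illustrative only, not the lecture's own code:

    import numpy as np

    def kmeans(X, K, iters=100, seed=0):
        rng = np.random.default_rng(seed)
        centers = X[rng.choice(len(X), K, replace=False)]                 # random initial centers
        for _ in range(iters):
            dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)   # (N, K) squared distances
            labels = dist.argmin(axis=1)                                  # assign to nearest center
            new_centers = np.array([X[labels == k].mean(axis=0) if np.any(labels == k)
                                    else centers[k] for k in range(K)])   # recompute the means
            if np.allclose(new_centers, centers):                         # converged
                break
            centers = new_centers
        return centers, labels

    X = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 5.0])
    centers, labels = kmeans(X, K=2)
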
PyTorch Release Notes
Experimental UCC process group for the distributed backend. Users can experiment with it by creating UCC as the default process group via:
    torch.distributed.init_process_group(backend="ucc", kwargs)
or as a side process group alongside any default via:
    torch.distributed.init_process_group(backend=any_backend, default_pg_kwargs)
    ucc_pg = torch.distributed.new_group(backend="ucc", ucc_pg_kwargs)
Trademark notice (excerpt): OpenCL is a trademark of Apple Inc. used under license to the Khronos Group Inc.
(365 pages, 2.94 MB, 1 year ago)
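
As a hedged usage sketch of the excerpt above — assuming a PyTorch build that actually ships the experimental UCC backend and a torchrun-style launch that sets the usual rank/world-size environment variables:

    import torch
    import torch.distributed as dist

    # UCC as the default process group (experimental; requires UCC support in the build).
    dist.init_process_group(backend="ucc")

    # Alternatively, keep any default backend and create a side UCC group.
    ucc_pg = dist.new_group(backend="ucc")

    t = torch.ones(4)
    dist.all_reduce(t, group=ucc_pg)   # sums the tensor across all ranks via the UCC group
    print(t)
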
深度学习与PyTorch入门实战 - 40. Batch Norm
Topics: Normalization ▪ Batch Normalization.
Reference: https://medium.com/syncedreview/facebook-ai-proposes-group-normalization-alternative-to-batch-normalization-fb0699bffae7
Pipeline: nn.BatchNorm2d and its class variables.
(16 pages, 1.29 MB, 1 year ago)
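
The entry above mentions nn.BatchNorm2d and the Group Normalization alternative discussed in the linked article. A small sketch showing both layers applied to the same feature map (the shapes are chosen arbitrarily for illustration):

    import torch
    import torch.nn as nn

    x = torch.randn(8, 32, 28, 28)   # (N, C, H, W)

    bn = nn.BatchNorm2d(num_features=32)               # per-channel statistics over (N, H, W)
    gn = nn.GroupNorm(num_groups=4, num_channels=32)   # per-sample statistics inside channel groups

    print(bn(x).shape, gn(x).shape)  # both preserve the input shape
    # BatchNorm keeps running_mean / running_var as class variables for inference,
    # while GroupNorm needs no batch statistics and behaves the same at any batch size.
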
Lecture Notes on Linear Regression
"… each step is cheaper. One variant of SGD is the so-called mini-batch SGD, where we pick a small group of training data and average over it to accelerate and smooth the convergence. For example, by randomly …"
(6 pages, 455.98 KB, 1 year ago)
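
To make the mini-batch SGD description concrete, here is a minimal NumPy sketch for linear regression — synthetic data, with a learning rate and batch size chosen arbitrarily rather than taken from the notes:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 5))
    w_true = np.array([1.0, -2.0, 0.5, 3.0, 0.0])
    y = X @ w_true + 0.1 * rng.normal(size=1000)

    w, lr, batch_size = np.zeros(5), 0.05, 32
    for epoch in range(50):
        order = rng.permutation(len(X))               # reshuffle every epoch
        for start in range(0, len(X), batch_size):
            b = order[start:start + batch_size]
            grad = X[b].T @ (X[b] @ w - y[b]) / len(b)  # gradient averaged over the mini-batch
            w -= lr * grad

    print(np.round(w, 2))   # close to w_true
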
超大规模深度学习在美团的应用 - 余建平
• The NN weight matrices are partitioned by row to avoid imbalanced request payloads; features are stored across shards by hashing.
• Model-parallel hyper-parameter tuning: grid search and random search.
• Multi-model training on the parameter server: models within a model group share the storage of feature keys, improving memory efficiency.
• Ultra-large models imply a high-fan-out distributed PS; the long-tail effect means jitter (network, CPU) on a single shard has a growing impact on the whole request.
• With four-nines (99.99%) availability per shard, the overall availability of 16 shards: 99.99% … (the figure is cut off in the snippet; see the calculation below).
(41 pages, 5.96 MB, 1 year ago)
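
A quick hedged calculation of that fan-out point, under assumptions of my own (not stated on the slide): shard failures are independent and a request must reach all 16 shards.

    p_shard = 0.9999            # "four nines" availability per shard
    p_request = p_shard ** 16   # the request succeeds only if all 16 shards respond
    print(f"{p_request:.4%}")   # ~99.84% -- fan-out amplifies the impact of per-shard jitter
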













