《Efficient Deep Learning Book》[EDL] Chapter 2 - Compression Techniques
... is boilerplate code. You can refer to the TFLite guide for more details. We start the model conversion by creating a converter object using the from_keras_model() method of TFLiteConverter. A call to ... Moreover, the quantized model was 4X smaller than the original model. ... Deep learning is an exciting and fast-growing field which is fortunate to enjoy a large community of researchers, developers and entrepreneurs ...
33 pages | 1.96 MB | 1 year ago
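The "4X smaller" figure in the excerpt above is what one would expect when 32-bit float weights are stored as 8-bit integers. The following is a minimal sketch of that arithmetic using only the Python standard library. It is not the TFLite converter and not the book's code: the function names `quantize_uint8` and `dequantize`, the min/max affine scheme, and the sample weights are all assumptions made for illustration.

```python
from array import array


def quantize_uint8(weights):
    """Map floats onto 0..255 with a min/max affine scheme (illustrative only)."""
    w_min, w_max = min(weights), max(weights)
    # Fall back to a scale of 1.0 when all weights are identical.
    scale = (w_max - w_min) / 255 or 1.0
    # Clamp after rounding so floating-point noise cannot leave the 0..255 range.
    q = array("B", (min(255, max(0, round((w - w_min) / scale))) for w in weights))
    return q, scale, w_min


def dequantize(q, scale, w_min):
    """Recover approximate float values from the 8-bit codes."""
    return [v * scale + w_min for v in q]


# Hypothetical weights stored as 32-bit floats (4 bytes each).
weights = array("f", [-1.0, -0.5, 0.0, 0.25, 1.0])
q, scale, w_min = quantize_uint8(weights)

float_bytes = weights.itemsize * len(weights)
quant_bytes = q.itemsize * len(q)
ratio = float_bytes / quant_bytes
restored = dequantize(q, scale, w_min)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(f"{float_bytes} bytes -> {quant_bytes} bytes (ratio {ratio}), max error {max_error:.4f}")
```

The ratio counts only the weight storage; a real model file also holds the scale, offset and graph metadata, so the on-disk saving can differ slightly.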
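《TensorFlow 2 Advanced Project Practice》 4 - Product Detection: Targeting Your Shelf Products with RetinaNet
Two-stage detectors: R-CNN, Fast R-CNN, Faster R-CNN, R-FCN. One-stage detectors: YOLO v1, YOLO v2, YOLO v3. Theory: overview of the R-CNN family of two-stage models. R-CNN: opening the door to CNN-based object detection. R-CNN, Fast R-CNN, Fast R-CNN, Faster R-CNN. Theory: overview of the YOLO family of one-stage models ...
67 pages | 21.59 MB | 1 year ago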
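Machine Learning Course - Wenzhou University - 09 Deep Learning - Object Detection
Algorithms of this kind split detection into two stages. The first stage generates a large number of candidate regions (region proposals) that may contain objects, together with approximate location information; the second stage classifies them, selects the candidate regions that contain objects, and refines their positions (commonly using algorithms such as R-CNN, Fast R-CNN and Faster R-CNN). ... 1. Object detection overview. 3. Deep-learning-based one-stage object detection frameworks (faster). These detection algorithms are end-to-end and do not need to generate a large number of candidate regions ... RCNN algorithm. ... 4. Faster RCNN algorithm: Region Proposal Networks. Role of the RPN: the RPN is dedicated to extracting candidate boxes; on the one hand it takes little time, and on the other it can easily be combined with Fast RCNN into a single whole. ... 4. Faster RCNN algorithm: Faster RCNN training steps. • Step 1: train the RPN, which is initialized from an ImageNet-pretrained model and fine-tuned end to end, to generate region proposals; • Step 2: train Faster RCNN: initialized from an ImageNet model and using the region proposals generated by the step-1 RPN as input, train Fast R-CNN as a separate detection network; at this point the two networks do not yet share convolutional layers; • Step 3: tune the RPN: initialize the RPN from the step-2 Faster RCNN model and train it again, but keep the shared convolutional layers fixed and fine-tune only ...
43 pages | 4.12 MB | 1 year ago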
《Efficient Deep Learning Book》[EDL] Chapter 1 - Introduction
... primarily deals with questions that someone deploying a model would ask. Is the model small, is it fast, etc.? More concretely, how many parameters does the model have, what is the disk size, RAM consumption ... with 8-bit unsigned int weights, and having integration with libraries like GEMMLOWP and XNNPACK for fast inference. Similarly, PyTorch uses QNNPACK to support quantized operations. Refer to Figure 1-17 ... in IoT and edge devices, both Google and NVIDIA have come up with accelerators that can be used for fast on-device inference. The EdgeTPU (see Figure 1-18 for reference), like the TPU, specialized in accelerating ...
21 pages | 3.17 MB | 1 year ago
《Efficient Deep Learning Book》[EDL] Chapter 6 - Advanced Learning Techniques - Technical Review
... we will use the BERT-Small and BERT-Base variants. BERT_ENCODERS = { # Recommended, because it is fast and has same interface as base BERT 'bert-small': "https://tfhub.dev/tensorflow/small_bert/bert ... def shuffle_weights(model, weights=None): """Shuffle the weights in the given model. This is a fast approximation of re-initializing the model weights. """ if weights is None: weights = model.get_weights() ... Sequence Length Warmup for Training GPT Models." arXiv, 13 Aug. 2021, doi:10.48550/arXiv.2108.06084. 20 Fast.AI Course: https://github.com/fastai/fastbook/blob/780b76bef3127ce5b64f8230fce60e915a7e0735/07_sizing_and ...
31 pages | 4.03 MB | 1 year ago
Dive into Deep Learning v2.0
... R-CNN; 13.8.2 Fast R-CNN; 13.8.3 ... CNN features, R-CNN) (Girshick et al., 2014) is also one of the pioneering works that applied deep models to object detection. This section introduces R-CNN and its series of improvements: Fast R-CNN (Girshick, 2015), Faster R-CNN (Ren et al., 2015) and Mask R-CNN (He et al., 2 ... 8.2 Fast R-CNN. The main performance bottleneck of R-CNN is that the forward pass of the convolutional neural network is run independently for each region proposal, with no shared computation. Because these regions usually overlap, independent feature extraction leads to repeated computation. One of the main improvements of Fast R-CNN (Girshick, 2015) over R-CNN is that the forward pass of the convolutional neural network is run only on the whole image. Figure 13.8.2: The Fast R-CNN model. Figure 13.8.2 describes the Fast R-CNN model. Its main computations are as follows: ...
797 pages | 29.45 MB | 1 year ago
Machine Learning
... $\frac{\partial L}{\partial a^{[L]}_j}\,\sigma'(z^{[L]}_j)$ • $\partial L/\partial a^{[L]}_j$ measures how fast the loss is changing as a function of the $j$-th output activation • $\sigma'(z^{[L]}_j)$ measures how fast the activation function $\sigma$ is changing at $z^{[L]}_j$ ...
19 pages | 944.40 KB | 1 year ago
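The product described in the excerpt above can be checked numerically. The sketch below is only an illustration under stated assumptions: the excerpt does not name the loss or the activation, so a squared-error loss $L = \frac{1}{2}\sum_j (a_j - y_j)^2$ and a sigmoid activation are assumed here, and the function name `output_error` and the sample values are hypothetical.

```python
import math


def sigmoid(z):
    """Assumed activation function sigma."""
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_prime(z):
    """Derivative of the sigmoid: how fast sigma changes at z."""
    s = sigmoid(z)
    return s * (1.0 - s)


def output_error(z, a, y):
    """Per-unit product dL/da_j * sigma'(z_j) for the output layer.

    Assumes squared-error loss, so dL/da_j = a_j - y_j.
    """
    return [(a_j - y_j) * sigmoid_prime(z_j) for z_j, a_j, y_j in zip(z, a, y)]


# Hypothetical output-layer pre-activations and targets.
z = [0.0, 2.0]
a = [sigmoid(v) for v in z]
y = [1.0, 0.0]
delta = output_error(z, a, y)
print(delta)
```

With a different loss only the `a_j - y_j` factor changes; the `sigmoid_prime` factor stays the same as long as the output activation is a sigmoid.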
《Efficient Deep Learning Book》[EDL] Chapter 5 - Advanced Compression Techniques
... significant drop in accuracy in the next section. 4 Elsen, E., Dukhan, M., Gale, T., & Simonyan, K. (2019). Fast Sparse ConvNets. arXiv, 1911.09723. Retrieved from https://arxiv.org/abs/1911.09723v1 3 https://github ... illustration of this problem. 13 Kurtz, Mark, et al. "Inducing and exploiting activation sparsity for fast inference on deep neural networks." International Conference on Machine Learning. PMLR, 2020. 12 ...
34 pages | 3.18 MB | 1 year ago
Qwen (Qianwen) Large AI Model: Chinese Documentation
... cache_dir=training_args.cache_dir, model_max_length=training_args.model_max_length, padding_side="right", use_fast=False, ) if training_args.use_lora: lora_config = LoraConfig( r=lora_args.lora_r, lora_alpha=lora_args ... $DISTRIBUTED_ARGS src/train_bash.py \ --deepspeed $DS_CONFIG_PATH \ --stage sft \ --do_train \ --use_fast_tokenizer \ --flash_attn \ --model_name_or_path $MODEL_PATH \ --dataset your_dataset \ --template ...
56 pages | 835.78 KB | 1 year ago
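Deep Learning Modeling Practice on Alibaba Cloud - Cheng Mengli
Engineering optimizations: hundred-billion-scale feature optimization; model distillation; AVX/SSE optimization; graph optimization [user graph deduplication]; memory allocation optimization; ParallelStringOp [split/type conversion]; Sequence Feature [side info]; Op Fusion [hash + embedding]; Overlap Execution [FG implemented as ops]; Item ...
40 pages | 8.51 MB | 1 year ago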