Go toolchain internals and implementation based on arm64
Wei Xiao (肖玮)
Arm Staff Software Engineer
Wei.Xiao@arm.com
Go toolchain overview
A toolchain is a package composed of the compiler infrastructure.
Go toolchain example
$ go build -x helloworld.go
/golang/pkg/tool/linux_arm64/compile -o $WORK/b001/pkg.a -trimpath $WORK/b001 -p main -complete -buildid Lz0Z4IaaV-BMteKblcuy $WORK/b001/importcfg -pack -c=4 ./helloworld.go
/golang/pkg/tool/linux_arm64/buildid -w $WORK/b001/pkg.a # internal
/golang/pkg/tool/linux_arm64/link -o $WORK/b001/exe/a.out -importcfg $WORK/b001/importcfg.link
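The `compile`, `buildid` and `link` binaries in this log live in a per-platform tool directory. Below is a minimal sketch, assuming the standard `$GOROOT/pkg/tool/$GOOS_$GOARCH` layout (in this log GOROOT is `/golang`), that rebuilds that path for the running toolchain. `toolDir` is a hypothetical helper; `go env GOTOOLDIR` reports the authoritative value.

```go
package main

import (
	"fmt"
	"path/filepath"
	"runtime"
)

// toolDir is a hypothetical helper that rebuilds the per-platform tool
// directory ($GOROOT/pkg/tool/$GOOS_$GOARCH) holding compile, link,
// buildid and the other internal tools that "go build -x" invokes.
func toolDir(goroot, goos, goarch string) string {
	return filepath.Join(goroot, "pkg", "tool", goos+"_"+goarch)
}

func main() {
	// On a linux/arm64 host whose GOROOT is /golang, this prints
	// /golang/pkg/tool/linux_arm64, the prefix seen in the log above.
	fmt.Println(toolDir(runtime.GOROOT(), runtime.GOOS, runtime.GOARCH))
}
```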
Manual for Arm-based Computers
Version 1.0, January 2023
www.moxa.com/products
© 2023 Moxa Inc. All rights reserved.
MOXA®
# Moxa Industrial Linux 3.0 (Debian 11) Manual for Arm-based Computers
Eligible Computing Platforms
2. Getting Started
Connecting to the Arm-based Computer
Connecting through the Serial Console
Connecting via the
Login Policy
Clearing the TPM Module
Localizing Your Arm-based Computer
Adjusting the Time
NTP Time Synchronization
## Python for Good >>> PyCon China 2022
Python + AI Compute Optimization on ARM Chips
Speaker: Zhu Honglin (朱宏林), Alibaba Cloud Programming Language and Compiler Team
Python

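HELLO WORLD programs. In the past, these programs always ran on GPUs or x86 CPUs. However, weighing power consumption, cost and performance, cloud vendors have started building ARM-based server platforms, and integrating Python + AI software so that it reaches peak performance on these platforms has become a focus for engineers.
- Matrix multiplication is a key part of deep learning computation. We used the ARM architecture's new matrix extension to optimize bf16 matrix multiplication, making pure matrix multiplication more than 3x faster … OpenBLAS and PyTorch.
- In this talk, we present our Python + AI optimization work on the Yitian 710 ARM chip, along with best practices for deploying Python + AI workloads on ARM cloud platforms.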
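Arm's BF16 matrix instructions (BFMMLA, for example) take bf16 operands and accumulate into fp32; the excerpt does not name the exact extension the talk used, so treat this as background. Below is a scalar reference sketch of that numeric contract, not the optimized OpenBLAS or PyTorch kernels. `toBF16`, `fromBF16` and `matmulBF16` are hypothetical names, and round-to-nearest-even is one common conversion choice, not necessarily what any particular library does.

```go
package main

import (
	"fmt"
	"math"
)

// toBF16 keeps the upper 16 bits of an IEEE float32 (sign, 8-bit exponent,
// 7 mantissa bits), rounding the dropped bits to nearest, ties to even.
func toBF16(f float32) uint16 {
	bits := math.Float32bits(f)
	if math.IsNaN(float64(f)) {
		// Truncate and force the quiet bit so the result is still a NaN.
		return uint16(bits>>16) | 0x0040
	}
	rounding := uint32(0x7FFF) + ((bits >> 16) & 1)
	return uint16((bits + rounding) >> 16)
}

// fromBF16 widens a bf16 value back to float32 exactly by zero-filling.
func fromBF16(h uint16) float32 {
	return math.Float32frombits(uint32(h) << 16)
}

// matmulBF16 computes C = A·B for row-major A (m×k) and B (k×n) stored as
// bf16, accumulating every product into a float32 result.
func matmulBF16(a, b []uint16, m, k, n int) []float32 {
	c := make([]float32, m*n)
	for i := 0; i < m; i++ {
		for p := 0; p < k; p++ {
			av := fromBF16(a[i*k+p])
			for j := 0; j < n; j++ {
				c[i*n+j] += av * fromBF16(b[p*n+j])
			}
		}
	}
	return c
}

func main() {
	a := []uint16{toBF16(1), toBF16(2), toBF16(3), toBF16(4)}
	b := []uint16{toBF16(5), toBF16(6), toBF16(7), toBF16(8)}
	fmt.Println(matmulBF16(a, b, 2, 2, 2)) // [19 22 43 50]
}
```

Storing A and B as bf16 halves their memory traffic relative to fp32, while the fp32 accumulator limits the rounding error that builds up along the k dimension.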
## Deep Learning
• Widely used deep learning frameworks
• TensorFlow, PyTorch
• Combined with hardware (ARM server chips)
• Yitian 710
• AWS graviton
• Matrix multiplication
## Bringing together the Arm ecosystem

Linaro: best-in-class deep learning performance by leveraging neural network acceleration in IP and SoCs from the Arm ecosystem, through seamless, collaborative integration with the ecosystem of AI/ML software frameworks and libraries

## Arm NN open source project
• Linaro-hosted https://www.mlplatform.org/
• Git and review servers
• Forums
https://gitee.com/tinylab/riscv-lab
• ARM Lab
– For learning embedded ARM software development; merged into Linux Lab Disk for ARM
– https://gitee.com/tinylab/arm-lab
### 1.3 Demonstration
#### 1.3.1 Free components have been prebuilt|
|Rootfs|Supports initrd, hard disk, MMC and NFS; Debian available for ARM|
|Docker|Cross toolchains from gcc-4.3 available in one command; external ones configurable|
|Access|Accessible make list
[ aarch64/raspi3 ]:
ARCH = arm64
CPU ?= cortex-a53
LINUX ?= v5.1
ROOTDEV_LIST := /dev/mmcblk0 /dev/ram0
ROOTDEV ?= /dev/mmcblk0
[ aarch64/virt ]:
ARCH = arm64
CPU ?= cortex-a57
LINUX ?= v5.1
ROOTDEV_LIST
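The `?=` lines in this board list are overridable defaults, while `=` and `:=` always assign. Below is a minimal sketch of those rules, assuming GNU make's behavior for environment variables (without `-e`). `resolve` is a hypothetical helper; it does not model variable expansion, and this excerpt does not show how Linux Lab itself passes overrides.

```go
package main

import (
	"fmt"
	"strings"
)

// resolve is a hypothetical helper that applies board-file assignments on
// top of environment variables, following GNU make's rules without -e:
// "=" and ":=" replace an environment value, while "?=" only assigns a
// variable that is still undefined. Variable expansion is not modeled.
func resolve(lines []string, env map[string]string) map[string]string {
	vars := make(map[string]string, len(env))
	for k, v := range env {
		vars[k] = v
	}
	for _, line := range lines {
		i := strings.IndexByte(line, '=')
		if i < 0 {
			continue // e.g. the "[ aarch64/raspi3 ]:" board header
		}
		name, op := line[:i], "="
		if i > 0 && (line[i-1] == '?' || line[i-1] == ':') {
			name, op = line[:i-1], line[i-1:i+1]
		}
		name = strings.TrimSpace(name)
		if op == "?=" {
			if _, defined := vars[name]; defined {
				continue // keep the value that was already set
			}
		}
		vars[name] = strings.TrimSpace(line[i+1:])
	}
	return vars
}

func main() {
	raspi3 := []string{
		"[ aarch64/raspi3 ]:",
		"ARCH = arm64",
		"CPU ?= cortex-a53",
		"LINUX ?= v5.1",
		"ROOTDEV_LIST := /dev/mmcblk0 /dev/ram0",
		"ROOTDEV ?= /dev/mmcblk0",
	}
	// CPU is already set in the environment, so "CPU ?= cortex-a53" keeps it.
	vars := resolve(raspi3, map[string]string{"CPU": "cortex-a72"})
	fmt.Println(vars["ARCH"], vars["CPU"], vars["ROOTDEV"]) // arm64 cortex-a72 /dev/mmcblk0
}
```

Variables given on the make command line (for example `make CPU=...`) override even `=` and `:=` assignments; the sketch leaves that case out.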
– https://gitee.com/tinylab/riscv-lab
• ARM Lab
– For learning embedded ARM software development; merged into Linux Lab Disk
– https://gitee.com/tinylab/arm-lab
### 1.3 Demonstration
#### 1.3.1 Free Video components have been prebuilt|
|Rootfs|Supports initrd, hard disk, MMC and NFS; Debian available for ARM|
|Docker|Cross-toolchains from GCC-4.3 available in one command, external ones configurable|
|Access|Accessible make list
[ aarch64/raspi3 ]:
ARCH = arm64
CPU ?= cortex-a53
LINUX ?= v5.1
ROOTDEV_LIST := /dev/mmcblk0 /dev/ram0
ROOTDEV ?= /dev/mmcblk0
[ aarch64/virt ]:
ARCH = arm64
CPU ?= cortex-a57
LINUX ?= v5.1
ROOTDEV_LIST
## Announcing
In Visual Studio 2022 version 17.4,
## Native ARM64 Toolchain
• Develop for ARM64 on ARM64 with no emulation
• Includes ARM64 versions of Ninja and CMake
• Available with the C++ Desktop …; see https://aka.ms/ARM64-native for more details
## Announcing
In Visual Studio 2022 version 17.7,
## New MSVC Backend Optimizations
• Host of new backend improvements
• Both machine-independent and ARM64-specific
• ARM64 improvements cover both scalar and vector (NEON) instructions
void absolute_difference(
int * __restrict a, int * __restrict b,
int * __restrict c, int n) {
for (int i = 0; i < n; i++)
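The loop body is cut off in this excerpt. Below is a hypothetical Go counterpart, assuming the kernel stores the element-wise absolute difference of two arrays into a third (which array is the output is a guess); `absDiff` is an illustrative name. A vectorizing C compiler can map this pattern to NEON's SABD (signed absolute difference) instruction. The Go compiler does not auto-vectorize, so this is only a scalar reference.

```go
package main

import "fmt"

// absDiff is a hypothetical scalar version of the truncated C kernel:
// dst[i] = |a[i] - b[i]|. The difference is formed in int64 so inputs such as
// 2147483647 and -2147483648 cannot overflow; the result always fits in uint32.
func absDiff(dst []uint32, a, b []int32) {
	n := len(dst)
	a, b = a[:n], b[:n] // panics up front if a or b is shorter than dst
	for i := 0; i < n; i++ {
		d := int64(a[i]) - int64(b[i])
		if d < 0 {
			d = -d
		}
		dst[i] = uint32(d)
	}
}

func main() {
	a := []int32{1, -5, 7}
	b := []int32{4, 5, 7}
	dst := make([]uint32, len(a))
	absDiff(dst, a, b)
	fmt.Println(dst) // [3 10 0]
}
```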
TVM@AliOS
## PRESENTATION AGENDA
☑ TVM @ AliOS Overview
TVM @ AliOS ARM CPU
TVM @ AliOS Hexagon DSP
TVM @ AliOS Intel GPU
☑ Misc
## PART ONE TVM @ AliOS Overview
## AliOS Overview
• AliOS (www.alios … AliOS | Driving Intelligence for Everything
## PART TWO AliOS TVM @ ARM CPU
## AliOS TVM@ARM CPU
• Support TFLite (Open Source and Upstream Master)
• Optimize on INT8 & FP32
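INT8 inference on TFLite-style models stores tensors as int8 with an affine mapping, real ≈ scale × (q − zero_point), and accumulates products in int32. Below is a minimal sketch of that arithmetic, assuming per-tensor parameters. `quantize`, `dequantize` and `dotInt8` are hypothetical helpers, not TVM or TFLite APIs. On cores with the Armv8.2 dot-product extension, the SDOT instruction performs four of these int8 multiply-adds per 32-bit lane.

```go
package main

import (
	"fmt"
	"math"
)

// quantize maps a real value to int8 using an affine scale/zero-point pair,
// rounding half away from zero and saturating to [-128, 127].
func quantize(x, scale float32, zeroPoint int32) int8 {
	r := math.Round(float64(x)/float64(scale)) + float64(zeroPoint)
	r = math.Max(-128, math.Min(127, r)) // saturate before converting
	return int8(r)
}

// dequantize recovers the approximate real value: scale * (q - zeroPoint).
func dequantize(q int8, scale float32, zeroPoint int32) float32 {
	return float32(int32(q)-zeroPoint) * scale
}

// dotInt8 multiplies int8 pairs and accumulates in int32, the core step of an
// INT8 convolution or matrix multiply (zero points assumed to be 0 here).
func dotInt8(a, b []int8) int32 {
	var acc int32
	for i := range a {
		acc += int32(a[i]) * int32(b[i])
	}
	return acc
}

func main() {
	q := quantize(1.25, 0.5, 0)
	fmt.Println(q, dequantize(q, 0.5, 0))                   // 3 1.5
	fmt.Println(dotInt8([]int8{1, -2, 3}, []int8{4, 5, -6})) // -24
}
```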
## AliOS TVM @ ARM CPU INT8
Convolution
• NHWC layout
## AliOS TVM @ ARM CPU INT8
TVM / QNNPACK speedup @ MobileNet V2 @ Raspberry Pi 3B+ (AArch64)

## AliOS TVM @ ARM CPU INT8
Depthwise
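In NHWC layout the channel index varies fastest, so a depthwise convolution (one filter per channel) can run its innermost loop over contiguous channels, which is the shape that maps onto SIMD lanes. Below is a minimal sketch under stated assumptions: stride 1, no padding, int8 inputs with zero points of 0, and int32 outputs. `depthwiseNHWC` is a hypothetical name, and this is not TVM's schedule.

```go
package main

import "fmt"

// depthwiseNHWC convolves an h×w×c int8 input (NHWC, batch of 1) with a
// kh×kw×c int8 depthwise kernel, stride 1 and no padding, accumulating into
// int32. Zero points are assumed to be 0.
func depthwiseNHWC(in []int8, h, w, c int, k []int8, kh, kw int) []int32 {
	oh, ow := h-kh+1, w-kw+1
	out := make([]int32, oh*ow*c)
	for y := 0; y < oh; y++ {
		for x := 0; x < ow; x++ {
			outBase := (y*ow + x) * c
			for dy := 0; dy < kh; dy++ {
				for dx := 0; dx < kw; dx++ {
					inBase := ((y+dy)*w + (x + dx)) * c
					kBase := (dy*kw + dx) * c
					// Channels are contiguous in NHWC, so this loop walks
					// adjacent memory in input, kernel and output.
					for ch := 0; ch < c; ch++ {
						out[outBase+ch] += int32(in[inBase+ch]) * int32(k[kBase+ch])
					}
				}
			}
		}
	}
	return out
}

func main() {
	// A 2×2 image with 2 channels; pixel (y, x) holds [channel0, channel1].
	in := []int8{1, 2, 3, 4, 5, 6, 7, 8}
	// A 2×2 kernel: weight 1 for channel 0, weight 2 for channel 1.
	k := []int8{1, 2, 1, 2, 1, 2, 1, 2}
	fmt.Println(depthwiseNHWC(in, 2, 2, 2, k, 2, 2)) // [16 40]
}
```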
== 0) // use "TEST AX,AX" instead of "CMP $0,AX"
a > 0 ? b : c // use CMOV to avoid a branch
if // use conditional jump instructions such as JZ/JC/JNZ
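Go has no `?:` operator, but its compiler has a branch-elimination pass that may turn a simple conditional assignment into CMOV on amd64 or CSEL on arm64, and a comparison against zero can be emitted as TEST rather than CMP with an immediate 0. Below is a hypothetical sketch of the source shapes involved (`sel` and `isZero` are illustrative names). Check the actual instructions with `go build -gcflags=-S`, since the result depends on the Go version and the surrounding code.

```go
package main

import "fmt"

// sel mirrors "a > 0 ? b : c". This assign-then-override shape is the kind
// of branch the compiler may replace with CMOV (amd64) or CSEL (arm64);
// whether it does depends on the Go version and surrounding code.
func sel(a, b, c int) int {
	r := c
	if a > 0 {
		r = b
	}
	return r
}

// isZero compares against zero; on amd64 this can be emitted as
// TEST reg, reg instead of CMP reg, $0.
func isZero(x int) bool {
	return x == 0
}

func main() {
	fmt.Println(sel(1, 2, 3), sel(-1, 2, 3), isZero(0), isZero(7)) // 2 3 true false
}
```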
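Backend: ARM-specific optimizations
x = y + z * 8   // ADD/AND/OR instructions: one operand can carry a shift; completes in a single cycle
x = y - z * 8   // SUB instruction: the subtrahend can carry a shift; completes in a single cycle
x = z * 8 - y
simplification for MUL-SUB.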
The CL implements the optimization with MADD/MSUB.
The total size of pkg/android_arm64/ decreases about 20KB, excluding cmd/compile/.
The go1 benchmark shows a little improvement for
author=benshi001
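The shifted-operand and multiply-subtract patterns above can be written as plain Go functions. Below is a hypothetical sketch (the function names are illustrative); with `GOARCH=arm64 go build -gcflags=-S` you can check whether a given Go release lowers each one as the comments suggest. The comments describe what the arm64 instruction set allows, not guaranteed compiler output.

```go
package main

import "fmt"

// Hypothetical examples of the shapes discussed above; the instructions in
// the comments are what arm64 permits, not a promise about compiler output.

// addShift can become a single ADD with z as a shifted operand (LSL #3).
func addShift(y, z int64) int64 { return y + z*8 }

// subShift can become a single SUB whose subtrahend z is shifted (LSL #3).
func subShift(y, z int64) int64 { return y - z*8 }

// rsubShift puts the shifted value in the minuend; arm64 has no reverse
// subtract (ARM32's RSB), so this likely needs two instructions.
func rsubShift(y, z int64) int64 { return z*8 - y }

// mulAdd and mulSub match the fused MADD and MSUB instructions.
func mulAdd(a, b, c int64) int64 { return a + b*c }
func mulSub(a, b, c int64) int64 { return a - b*c }

func main() {
	fmt.Println(addShift(3, 5), subShift(3, 5), rsubShift(3, 5)) // 43 -37 37
	fmt.Println(mulAdd(10, 3, 4), mulSub(10, 3, 4))              // 22 -2
}
```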
My optimizations to the Go compiler
The FP load/store on arm64 have register indexed forms. And this CL implements this optimization.
The total size of pkg/android_arm64 (excluding cmd/compile) decreases about 400
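Register-indexed addressing lets a load form its address as base register plus index register, so a loop that reads `xs[i]` need not compute each address with a separate ADD. Below is a hypothetical sketch of the kind of loop that benefits (`sum` is an illustrative name). Whether the index is scaled inside the load (by 8 for float64) or by a preceding shift depends on the Go version, so inspect `go build -gcflags=-S` output to confirm.

```go
package main

import "fmt"

// sum reads xs[i] with a variable index. On arm64 the compiler can fold the
// address computation into a register-indexed FP load (base register plus
// index register) instead of a separate ADD followed by a plain load.
func sum(xs []float64) float64 {
	var s float64
	for i := 0; i < len(xs); i++ {
		s += xs[i]
	}
	return s
}

func main() {
	fmt.Println(sum([]float64{1, 2, 3.5})) // 6.5
}
```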