-
Presenter: Haichen Shen, Yao Wang
Amazon SageMaker Neo, Deep Engine Science
Dynamic Model in TVM
AWS AI © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Support dynamic models in TVM
● Support Any-dim in typing
● Use shape function to compute the type at runtime
● Virtual machine

input_name = "data"
input_shape = [tvm.relay.Any(), 3, 224, 224]
dtype = "float32"
block = get_model('resnet50_v1', pretrained=True)
mod, params = relay.frontend.from_mxnet(block, shape={input_name: input_shape}, dtype=dtype)
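The shape-function idea in the snippet above can be sketched in plain Python (a toy illustration, independent of TVM's real shape-function machinery; the names `Any` and `broadcast_shape` here are made up for the sketch): a dimension marked `Any` at compile time is resolved to a concrete value at runtime, and only then is the full output type computed.

```python
# Toy sketch of runtime shape computation; not TVM's actual implementation.

Any = object()  # placeholder for a dimension unknown until runtime

def broadcast_shape(lhs, rhs):
    """Compute a numpy-style broadcast output shape from two concrete shapes."""
    out = []
    for i in range(max(len(lhs), len(rhs))):
        # Align from the trailing dimension; missing dims broadcast as 1.
        a = lhs[-1 - i] if i < len(lhs) else 1
        b = rhs[-1 - i] if i < len(rhs) else 1
        if a == 1:
            out.append(b)
        elif b == 1 or a == b:
            out.append(a)
        else:
            raise ValueError(f"incompatible shapes {lhs} and {rhs}")
    return tuple(reversed(out))

# Compile time: the batch dimension is Any, so only a partial type is known.
static_shape = (Any, 3, 224, 224)

# Runtime: a concrete batch of 8 arrives; the type becomes fully known.
concrete = tuple(8 if d is Any else d for d in static_shape)

# The shape function now yields the concrete output type.
result = broadcast_shape(concrete, (1, 224, 224))
# result == (8, 3, 224, 224)
```

This mirrors the split the bullets describe: static checking tolerates `Any`, while the shape function runs with concrete values at execution time.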
0 points | 24 pages | 417.46 KB | 5 months ago
-
0 points | 127 pages | 2.06 MB | 5 months ago
-
Memory Model
C++11 – C++23

About Me:
alex.dathskovsky@speedata.io
www.linkedin.com/in/alexdathskovsky
https://www.cppnext.com
0 points | 112 pages | 5.17 MB | 5 months ago
-
perspective:
> I understand C++, and I kinda get assembly because of compiler explorer.
Our typical model of AoT is “what C and C++ do”, and I want to expand the understanding for what other computation models …
Can customize its generated code to include a processor cache model, which allows it to compute the cache misses and memory stall time of a workload, at slowdowns of … native execution of the workload.
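The capability the snippet describes — instrumented code feeding a cache model that yields miss counts and stall time — can be illustrated with a toy sketch (entirely illustrative; not any real tool's design, and every name and parameter below is an assumption): a direct-mapped cache that counts misses over an address trace and converts them into stall cycles with a fixed penalty.

```python
# Toy direct-mapped cache model: counts hits/misses for an address trace.
# All names and parameters are illustrative, not from any real simulator.

class DirectMappedCache:
    def __init__(self, num_lines=256, line_size=64):
        self.num_lines = num_lines
        self.line_size = line_size
        self.tags = [None] * num_lines  # one stored tag per cache line
        self.hits = 0
        self.misses = 0

    def access(self, addr):
        line_addr = addr // self.line_size      # which memory block
        index = line_addr % self.num_lines      # which cache line it maps to
        tag = line_addr // self.num_lines       # identity of the block
        if self.tags[index] == tag:
            self.hits += 1
        else:
            self.misses += 1
            self.tags[index] = tag              # fill the line on a miss

def stall_time(misses, miss_penalty_cycles=100):
    # Simplified model: every miss costs a fixed memory-stall penalty.
    return misses * miss_penalty_cycles

# Tiny cache so conflicts are visible: addresses 0, 64, and 128 all map
# to line 0 and evict each other.
cache = DirectMappedCache(num_lines=4, line_size=16)
for addr in [0, 4, 16, 64, 0, 128, 0]:
    cache.access(addr)
# cache.hits == 1, cache.misses == 6
```

A real simulator would emit this bookkeeping inline with the generated code rather than model it after the fact, which is why it can report stall time at only a modest slowdown.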
0 points | 111 pages | 3.98 MB | 5 months ago
-
Change Happening Faster Than Ever? Yes, It Is
• AI User + Usage + CapEx Growth = Unprecedented
• AI Model Compute Costs High / Rising + Inference Costs Per Token Falling = Performance Converging + Developer …
[Chart spanning 2/24–4/25: USA – LLM #1, USA – LLM #2, China; details on page 293]
0 points | 340 pages | 12.14 MB | 4 months ago
-
… 3375 · 14.3.15 TiFlash Pipeline Execution Model … 3376 · 14.4 TiDB Distributed eXecution …
… storage architecture, enabling easy scaling of computing or storage capacity separately. The computing layer supports a maximum of 512 nodes, each node supports a maximum of 1,000 concurrencies, and the maximum …
… 2)) to a = 1 AND b = 2 #56005 @ghazalfamilyusa
• Increase the cost of table scans in the cost model for scenarios with a high risk of suboptimal execution plans, making the optimizer prefer indexes …
0 points | 6730 pages | 111.36 MB | 9 months ago
-
… 3359 · 14.3.15 TiFlash Pipeline Execution Model … 3360 · 14.4 TiDB Distributed eXecution …
… storage architecture, enabling easy scaling of computing or storage capacity separately. The computing layer supports a maximum of 512 nodes, each node supports a maximum of 1,000 concurrencies, and the maximum …
… 2)) to a = 1 AND b = 2 #56005 @ghazalfamilyusa
• Increase the cost of table scans in the cost model for scenarios with a high risk of suboptimal execution plans, making the optimizer prefer indexes …
0 points | 6705 pages | 110.86 MB | 9 months ago
-
… 3329 · 14.3.15 TiFlash Pipeline Execution Model … 3330 · 14.4 TiDB Distributed eXecution …
… storage architecture, enabling easy scaling of computing or storage capacity separately. The computing layer supports a maximum of 512 nodes, each node supports a maximum of 1,000 concurrencies, and the maximum …
… This feature simplifies TiProxy deployment and reduces the complexity of the database access layer.
2.2.1 Feature details
2.2.1.1 Performance
• The optimizer allows pushing the Projection operator …
0 points | 6606 pages | 109.48 MB | 9 months ago
-
… 3321 · 14.3.15 TiFlash Pipeline Execution Model … 3322 · 14.4 TiDB Distributed eXecution …
… storage architecture, enabling easy scaling of computing or storage capacity separately. The computing layer supports a maximum of 512 nodes, each node supports a maximum of 1,000 concurrencies, and the maximum …
… optimizes the process of loading statistics from multiple perspectives, such as the concurrency model and memory allocation, to reduce latency, improve throughput, and avoid slow loading of statistics …
0 points | 6549 pages | 108.77 MB | 9 months ago
-
… 3288 · 14.3.15 TiFlash Pipeline Execution Model … 3289 · 14.4 TiDB Distributed eXecution …
… storage architecture, enabling easy scaling of computing or storage capacity separately. The computing layer supports a maximum of 512 nodes, each node supports a maximum of 1,000 concurrencies, and the maximum …
… authentication (introduced in v8.1.0): TiCDC supports client authentication using mutual Transport Layer Security (mTLS) or TiDB username and password. This feature enables CLI or OpenAPI clients …
0 points | 6479 pages | 108.61 MB | 9 months ago