Commit 1ae4738

umd: open source compiler code

Release DLA1.2.0 compiler source code

Signed-off-by: Prashant Gaikwad <[email protected]>
Signed-off-by: Mitch Harwell <[email protected]>
Signed-off-by: Gunjan Mehta <[email protected]>
Signed-off-by: Ken Adams <[email protected]>
Signed-off-by: Arvind M <[email protected]>

1 parent 38a6300 commit 1ae4738

File tree: 211 files changed, +369753 −77 lines


CompilerFeatures.md (+7)

@@ -32,6 +32,13 @@
 ||EltWise MAX|&#10004;|Not implemented in SW|
 |**LRN**||&#10004;|Not implemented in SW|
 
+### Frameworks support
+
+|Framework|Status|
+|---------|-------|
+|Caffe|&#10004;|
+|ONNX|Future|
+
 ### Networks verification report
 
 |Network|Configuration|fp16|int8|

LowPrecision.md (+25 −34)

@@ -1,6 +1,6 @@
 # Low precision support in NVDLA
 
-Use of low precision such as 8-bit, 4-bit, or even fewer bits for inference is one of the optimization methods used in deep learning. NVDLA architecture includes INT8 (8-bit) precision support. It helps to compress the model, reducing the memory footprint, and to improve performance with a small degradation in accuracy. Using INT8 precision for inference requires quantizing pre-trained models from floating point to INT8 and programming converters in NVDLA for scaling/re-scaling tensors.
+Use of low precision such as 8-bit, 4-bit, or even fewer bits for inference is one of the optimization methods used in deep learning. It helps to compress the model, reducing the memory footprint, and to improve performance with a small degradation in accuracy. Using INT8 precision for inference requires quantizing pre-trained models from floating point to INT8 and programming converters in NVDLA for scaling/re-scaling tensors.
 
 ### NVDLA architecture for INT8 precision support includes the following:
 - INT8 input/output data read/write
@@ -9,14 +9,24 @@
 - Per-tensor and per-kernel output re-scaling using output converters
 
 ### Steps to generate INT8 quantized model:
-- Analyze the dynamic range of per-layer tensors and calculate scale factors
+- Analyze the dynamic range of per-layer tensors and calculate scale factors using TensorRT
+- Import the scale factors generated using TensorRT into NVDLA JSON format
 - Quantize model weights and determine the converter parameters using scale factors
 
-#### Analyze dynamic range of per-layer tensors and calculate scale factors
-A calibration tool can collect the dynamic range of the output tensor for each layer over a dataset of images. This dynamic range information can be used to calculate per-tensor scale factors. The NVDLA Compiler uses the following JSON schema to import scale factors.
+#### Analyze dynamic range of per-layer tensors and calculate scale factors using TensorRT
+A calibration tool collects the dynamic range of the output tensor for each layer over a dataset of images. This dynamic range information can be used to calculate per-tensor scale factors. For NVDLA, the TensorRT calibration interface is used to generate scale factors.
+
+Refer to https://github.com/NVIDIA/TensorRT/tree/release/5.1/samples/opensource/sampleINT8 for a sample application that shows how to use TensorRT to generate scale factors.
+
+Notes (see the calibrator sketch after this list):
+- Use IInt8EntropyCalibrator2 for calibration.
+- Dump the calibration scales using writeCalibrationCache() so they can be imported into the NVDLA JSON format.
+- Do not set --useDLACore for calibration; that option generates a runtime engine through TensorRT for NVIDIA Xavier platforms such as NVIDIA Jetson AGX Xavier, which have NVDLA integrated.
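To make these notes concrete, here is a minimal calibrator sketch (not part of this commit), assuming the TensorRT 5.1 C++ API from NvInfer.h. Only IInt8EntropyCalibrator2 and writeCalibrationCache() come from the notes above; the class name, the stubbed getBatch(), and the cache file name are illustrative.

```
#include "NvInfer.h"

#include <cstddef>
#include <fstream>

// Sketch only: feeds no real data. A working calibrator would stream
// preprocessed image batches to device memory in getBatch().
class EntropyCalibrator : public nvinfer1::IInt8EntropyCalibrator2
{
public:
    explicit EntropyCalibrator(int batchSize) : mBatchSize(batchSize) {}

    int getBatchSize() const override { return mBatchSize; }

    // Fill bindings[] with device pointers to the next batch; returning
    // false tells TensorRT the calibration dataset is exhausted.
    bool getBatch(void* bindings[], const char* names[], int nbBindings) override
    {
        (void) bindings; (void) names; (void) nbBindings;
        return false; // stub
    }

    // Returning nullptr forces a fresh calibration run instead of
    // reusing a previously cached table.
    const void* readCalibrationCache(std::size_t& length) override
    {
        length = 0;
        return nullptr;
    }

    // TensorRT hands the finished calibration table back here; dump it
    // to a file so it can later be converted to the NVDLA JSON format.
    void writeCalibrationCache(const void* cache, std::size_t length) override
    {
        std::ofstream out("calib.cache", std::ios::binary);
        out.write(static_cast<const char*>(cache), static_cast<std::streamsize>(length));
    }

private:
    int mBatchSize;
};
```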
 ##### JSON schema for calibration table
 
+The NVDLA Compiler uses the following JSON schema to import scale factors generated from TensorRT.
+
 ```
 {
     "type" : "object",
@@ -45,37 +55,18 @@
 }
 ```
 
-##### Sample calibration table for first few layers of ResNet-50 using symmetric scaling
+##### How to convert a calibration cache dump to NVDLA JSON format?
 
-```
-{
-    "data" : {
-        "scale": 0.00781453,
-        "min": 0,
-        "max": 0,
-        "offset": 0
-    },
-    "conv1" : {
-        "scale": 0.0891214,
-        "min": 0,
-        "max": 0,
-        "offset": 0
-    },
-    "pool1" : {
-        "scale": 0.0891214,
-        "min": 0,
-        "max": 0,
-        "offset": 0
-    },
-    "res2a_branch1" : {
-        "scale": 0.119546,
-        "min": 0,
-        "max": 0,
-        "offset": 0
-    }
-}
-```
+[calib_txt_to_json.py](https://github.com/nvdla/sw/tree/master/umd/utils/calibdata/calib_txt_to_json.py) can be used to convert calibration scales generated from TensorRT to the NVDLA JSON format.
 
 #### Quantize model weights and determine the converter parameters
 
-The NVDLA Compiler has the ability to quantize model weights and determine the converter parameters using the scale factors from the calibration table.
+The NVDLA Compiler has the ability to quantize model weights and determine the converter parameters using the scale factors from the calibration table.
+
+Use the --calibtable argument to pass a calibration table generated from TensorRT as input to the NVDLA compiler.
+
+#### Example
+
+A sample calibration table for the [ResNet-50 Caffe model](https://github.com/KaimingHe/deep-residual-networks) is shared at [calib.json](https://github.com/nvdla/sw/tree/master/umd/utils/calibdata/calib.json).
+
+This calibration table can be used with the NVDLA compiler and the [ResNet-50 Caffe model](https://github.com/KaimingHe/deep-residual-networks) to run ResNet-50 on the NVDLA INT8 configuration.
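As an illustration of how the pieces fit together, a compiler invocation might look like the sketch below. Only --calibtable is documented above; the other flag names, the file names, and the nv_small INT8 target are assumptions about the compiler app rather than something this commit documents.

```
# illustrative invocation; verify flag names against the compiler app's usage text
./nvdla_compiler --prototxt ResNet-50-deploy.prototxt \
                 --caffemodel ResNet-50-model.caffemodel \
                 --configtarget nv_small \
                 --cprecision int8 \
                 --calibtable calib.json
```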

Roadmap.md (new file, +25)

# NVDLA Roadmap

### DLA 1.3.0

- HW Multibatch for FC layers
- Multi-input network support
  - Support different precision and format for input tensors
- Buffer pre-registration
- INT8 deconvolution
- Deconvolution optimization
  - Support deconvolution with stride > 32
- INT8 group convolution
  - Depthwise convolution optimization
- ReLU-N
- Machine Translation Layer (MTL)

Note: APIs are expected to change in DLA 1.3.0

### Future

- Memory optimizations
- ONNX
- Sample application for accuracy
- Sample application for object detection

umd/Makefile (+13 −5)

@@ -1,4 +1,4 @@
-# Copyright (c) 2017, NVIDIA CORPORATION. All rights reserved.
+# Copyright (c) 2017-2019, NVIDIA CORPORATION. All rights reserved.
 #
 # Redistribution and use in source and binary forms, with or without
 # modification, are permitted provided that the following conditions
@@ -25,11 +25,19 @@
 # OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 
 
-SUBDIRS = core/runtime \
-          tests/runtime
+COMPILER_SUBDIRS = core/src/compiler \
+                   apps/compiler
 
-subdirs:
-	for dir in $(SUBDIRS); do \
+RUNTIME_SUBDIRS = core/src/runtime \
+                  apps/runtime
+
+compiler:
+	for dir in $(COMPILER_SUBDIRS); do \
+		$(MAKE) -C $$dir; \
+	done
+
+runtime:
+	for dir in $(RUNTIME_SUBDIRS); do \
 		$(MAKE) -C $$dir; \
 	done
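With this split, `make compiler` builds core/src/compiler and apps/compiler, while `make runtime` builds core/src/runtime and apps/runtime; the old catch-all `subdirs` target, which only covered the runtime directories, is gone.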

umd/apps/compiler/CompileTest.cpp (new file, +121)

@@ -0,0 +1,121 @@
```
/*
 * Copyright (c) 2017-2019, NVIDIA CORPORATION. All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 *  * Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 *  * Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
 *  * Neither the name of NVIDIA CORPORATION nor the names of its
 *    contributors may be used to endorse or promote products derived
 *    from this software without specific prior written permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS ``AS IS'' AND ANY
 * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
 * PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
 * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
 * EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
 * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
 * PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
 * OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
 * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
 * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 */

#include "main.h"

#include "nvdla/IProfile.h"
#include "nvdla/IProfiler.h"
#include "nvdla/IWisdom.h"
#include "nvdla/INetwork.h"
#include "nvdla/ICompiler.h"
#include "nvdla/ITargetConfig.h"

#include "ErrorMacros.h"
#include "nvdla_os_inf.h"

NvDlaError compileProfile(const TestAppArgs* appArgs, TestInfo* i)
{
    NvDlaError e = NvDlaSuccess;
    std::string profileName = "";
    std::string targetConfigName = "";

    NvDlaFileHandle file = 0;
    std::string fileName = "";
    NvU8 *buffer = 0;
    NvU64 size = 0;

    nvdla::ICompiler* compiler = i->wisdom->getCompiler();
    if (!compiler)
        ORIGINATE_ERROR_FAIL(NvDlaError_BadParameter, "wisdom->getCompiler() failed");

    if (appArgs->configtarget == "")
        ORIGINATE_ERROR_FAIL(NvDlaError_NotInitialized, "No target config found to load");

    targetConfigName = appArgs->configtarget;

    // Determine profile
    PROPAGATE_ERROR_FAIL(generateProfile(appArgs, &profileName, i));

    // Compile
    NvDlaDebugPrintf("compiling profile \"%s\"... config \"%s\"...\n", profileName.c_str(), targetConfigName.c_str());
    PROPAGATE_ERROR_FAIL(compiler->compile(profileName.c_str(), targetConfigName.c_str(), &i->compiledLoadable));

    // Get loadable buffer and dump it into a file
    PROPAGATE_ERROR_FAIL(compiler->getLoadableImageSize(profileName.c_str(), &size));
    if (size == 0) {
        ORIGINATE_ERROR_FAIL(NvDlaError_BadParameter, "Invalid size for a loadable");
    }

    buffer = (NvU8 *) NvDlaAlloc(size);
    if (buffer == NULL) {
        ORIGINATE_ERROR_FAIL(NvDlaError_InsufficientMemory, "Failed to allocate buffer for loadable");
    }
    PROPAGATE_ERROR_FAIL(compiler->getLoadableImage(profileName.c_str(), buffer));
    fileName = profileName + ".nvdla";
    PROPAGATE_ERROR_FAIL(NvDlaFopen(fileName.c_str(), NVDLA_OPEN_WRITE, &file));
    PROPAGATE_ERROR_FAIL(NvDlaFwrite(file, buffer, size));

fail:
    NvDlaFclose(file);
    if (buffer != NULL)
        NvDlaFree(buffer);
    return e;
}

NvDlaError compile(const TestAppArgs* appArgs, TestInfo* i)
{
    NvDlaError e = NvDlaSuccess;

    i->compiledLoadable = 0;

    NvDlaDebugPrintf("creating new wisdom context...\n");
    i->wisdom = nvdla::createWisdom();
    if (!i->wisdom)
        ORIGINATE_ERROR_FAIL(NvDlaError_BadParameter, "createWisdom() failed");

    NvDlaDebugPrintf("opening wisdom context...\n");
    if (!i->wisdom->open(i->wisdomPath))
        ORIGINATE_ERROR_FAIL(NvDlaError_BadParameter, "wisdom->open() failed to open: \"%s\"", i->wisdomPath.c_str());

    // Compile
    PROPAGATE_ERROR_FAIL(compileProfile(appArgs, i));

    NvDlaDebugPrintf("closing wisdom context...\n");
    i->wisdom->close();

fail:
    if (i->wisdom != NULL) {
        nvdla::destroyWisdom(i->wisdom);
        i->wisdom = NULL;
    }
    return e;
}
```
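In short, the compile path is: create and open a wisdom context, derive a profile from the app arguments, compile it against the named target config, and write the loadable image out as `<profile>.nvdla`; the `fail:` labels ensure the buffer and wisdom context are released on any error path. The emitted loadable is what the runtime side later loads and executes.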
