# CPU Benchmark (ONNX-cpp)
## Configuration
### Data set
Aishell1 [test set](https://www.openslr.org/33/); the total audio duration is 36108.919 seconds.
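
The benchmark commands in this document pass a list file to `--wav-path`. As a minimal sketch, assuming that file is a Kaldi-style list with one `utterance-id path` pair per line (the format is not spelled out here), the following Python builds such a list from a directory of WAV files and sums their durations, which is one way a total such as 36108.919 seconds could be obtained. The function names and the directory layout are hypothetical.

```python
import wave
from pathlib import Path


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def write_wav_scp(wav_dir, scp_path):
    """Write '<utt_id> <path>' lines for every WAV under wav_dir.

    Assumption: the utterance id is the file name without extension.
    Returns the total audio duration in seconds.
    """
    total = 0.0
    with open(scp_path, "w", encoding="utf-8") as scp:
        for wav_path in sorted(Path(wav_dir).rglob("*.wav")):
            scp.write(f"{wav_path.stem} {wav_path}\n")
            total += wav_duration_seconds(wav_path)
    return total
```

Calling `write_wav_scp("/path/to/aishell1/test", "aishell1_test.scp")` would produce the list file and return the summed duration; both paths are placeholders.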
### Tools
#### Install [modelscope and funasr](https://github.com/alibaba-damo-academy/FunASR#installation)
```shell
pip3 install torch torchaudio
pip install -U modelscope
pip install -U funasr
```
#### Export [onnx model](https://github.com/alibaba-damo-academy/FunASR/tree/main/funasr/export)
```shell
python -m funasr.export.export_model --model-name damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch --export-dir ./export --type onnx --quantize True
```
#### Building for Linux/Unix
Download onnxruntime
```shell
# download an appropriate onnxruntime from https://github.com/microsoft/onnxruntime/releases/tag/v1.14.0
# here we get a copy of onnxruntime for linux 64
wget https://github.com/microsoft/onnxruntime/releases/download/v1.14.0/onnxruntime-linux-x64-1.14.0.tgz
tar -zxvf onnxruntime-linux-x64-1.14.0.tgz
```
Install openblas
```shell
sudo apt-get install libopenblas-dev #ubuntu
# sudo yum -y install openblas-devel #centos
```
Build runtime
```shell
git clone https://github.com/alibaba-damo-academy/FunASR.git && cd FunASR/runtime/onnxruntime
mkdir build && cd build
cmake -DCMAKE_BUILD_TYPE=release .. -DONNXRUNTIME_DIR=/path/to/onnxruntime-linux-x64-1.14.0
make
```
## [Paraformer-large](https://www.modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary)
```shell
./funasr-onnx-offline-rtf \
--model-dir ./damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch \
--quantize true \
--wav-path ./aishell1_test.scp \
--thread-num 32
# Note: '--quantize false' means fp32; otherwise the int8 model is used
```
Number of parameters: 220M
Storage size: 880MB
Storage size after int8-quant: 237MB
CER: 1.95%
CER after int8-quant: 1.95%
### Intel(R) Xeon(R) Platinum 8369B CPU @ 2.90GHz 16core-32processor with avx512_vnni
| concurrent-tasks | processing time(s) | RTF | Speedup Rate |
|---------------------|:------------------:|:--------:|:------------:|
| 1 (onnx fp32) | 2129s | 0.058974 | 17 |
| 1 (onnx int8) | 1020s | 0.02826 | 35 |
| 8 (onnx fp32) | 273s | 0.007553 | 132 |
| 8 (onnx int8) | 128s | 0.003558 | 281 |
| 16 (onnx fp32) | 146s | 0.00403 | 248 |
| 16 (onnx int8) | 67s | 0.001868 | 535 |
| 32 (onnx fp32) | 133s | 0.003672 | 272 |
| 32 (onnx int8) | 64s | 0.001778 | 562 |
| 64 (onnx fp32) | 136s | 0.003771 | 265 |
| 64 (onnx int8) | 67s | 0.001846 | 541 |
| 96 (onnx fp32) | 137s | 0.003788 | 264 |
| 96 (onnx int8) | 68s | 0.001875 | 533 |
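
The RTF and Speedup Rate columns are not defined explicitly in this document. The sketch below assumes the usual definitions, RTF = processing time / audio duration and Speedup Rate = 1 / RTF, which are consistent with the rows above. The function names are illustrative, and small differences from the table are expected because the processing times are shown rounded to whole seconds.

```python
# Total duration of the Aishell1 test set, as stated in this document.
TOTAL_AUDIO_SECONDS = 36108.919


def rtf(processing_seconds, audio_seconds=TOTAL_AUDIO_SECONDS):
    """Real-time factor: seconds of compute per second of audio."""
    return processing_seconds / audio_seconds


def speedup_rate(processing_seconds, audio_seconds=TOTAL_AUDIO_SECONDS):
    """Seconds of audio processed per second of wall-clock time (1 / RTF)."""
    return audio_seconds / processing_seconds


# First row of the table above: 1 task, onnx fp32, 2129 s.
print(f"RTF = {rtf(2129):.4f}, speedup = {speedup_rate(2129):.0f}")
```

For the 2129 s row this yields a speedup of about 17, matching the table.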
### Intel(R) Xeon(R) Platinum 8163 CPU @ 2.50GHz 32core-64processor without avx512_vnni
| concurrent-tasks | processing time(s) | RTF | Speedup Rate |
|---------------------|:------------------:|----------|:------------:|
| 1 (onnx fp32) | 2903s | 0.080404 | 12 |
| 1 (onnx int8) | 2714s | 0.075168 | 13 |
| 8 (onnx fp32) | 373s | 0.010329 | 97 |
| 8 (onnx int8) | 340s | 0.009428 | 106 |
| 16 (onnx fp32) | 189s | 0.005252 | 190 |
| 16 (onnx int8) | 174s | 0.004817 | 207 |
| 32 (onnx fp32) | 109s | 0.00301 | 332 |
| 32 (onnx int8) | 88s | 0.00245 | 408 |
| 64 (onnx fp32) | 113s | 0.003129 | 320 |
| 64 (onnx int8) | 79s | 0.002201 | 454 |
| 96 (onnx fp32) | 115s | 0.003183 | 314 |
| 96 (onnx int8) | 80s | 0.002222 | 450 |
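
To compare how much int8 quantization helps on each CPU, the fp32 and int8 processing times can be divided row by row. The sketch below does this for a few rows copied from the two Paraformer-large tables above; the dictionary layout and the function name are illustrative only.

```python
# (fp32 seconds, int8 seconds) copied from the two Paraformer-large tables.
ROWS = {
    ("8369B, avx512_vnni", 1): (2129, 1020),
    ("8369B, avx512_vnni", 32): (133, 64),
    ("8163, no avx512_vnni", 1): (2903, 2714),
    ("8163, no avx512_vnni", 32): (109, 88),
}


def int8_gain(fp32_seconds, int8_seconds):
    """How many times faster the int8 run is than the fp32 run."""
    return fp32_seconds / int8_seconds


for (cpu, tasks), (fp32, int8) in ROWS.items():
    print(f"{cpu}, {tasks} task(s): int8 is {int8_gain(fp32, int8):.2f}x faster")
```

On these rows, int8 roughly halves the processing time on the CPU with avx512_vnni, while the gain on the CPU without it is noticeably smaller.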
## [FSMN-VAD](https://www.modelscope.cn/models/damo/speech_fsmn_vad_zh-cn-16k-common-onnx/summary) + [Paraformer-large](https://www.modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary) + [CT-Transformer](https://www.modelscope.cn/models/damo/punc_ct-transformer_zh-cn-common-vocab272727-onnx/summary)
```shell
./funasr-onnx-offline-rtf \
--model-dir ./damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch \
--quantize true \
--vad-dir ./damo/speech_fsmn_vad_zh-cn-16k-common-onnx \
--punc-dir ./damo/punc_ct-transformer_zh-cn-common-vocab272727-onnx \
--wav-path ./aishell1_test.scp \
--thread-num 32
# Note: '--quantize false' means fp32; otherwise the int8 model is used
```
### Intel(R) Xeon(R) Platinum 8369B CPU @ 2.90GHz 16core-32processor with avx512_vnni
| concurrent-tasks | processing time(s) | RTF | Speedup Rate |
|---------------------|:------------------:|:--------:|:------------:|
| 1 (onnx fp32) | 2134s | 0.0591 | 17 |
| 1 (onnx int8) | 1047s | 0.029 | 34 |
| 8 (onnx fp32) | 273s | 0.007557 | 132 |
| 8 (onnx int8) | 132s | 0.003647 | 274 |
| 16 (onnx fp32) | 147s | 0.004061 | 246 |
| 16 (onnx int8) | 69s | 0.001916 | 521 |
| 32 (onnx fp32) | 133s | 0.003675 | 272 |
| 32 (onnx int8) | 65s | 0.001786 | 559 |
| 64 (onnx fp32) | 136s | 0.003767 | 265 |
| 64 (onnx int8) | 67s | 0.001867 | 535 |
| 96 (onnx fp32) | 137s | 0.003802 | 262 |
| 96 (onnx int8) | 69s | 0.001904 | 524 |
### Intel(R) Xeon(R) Platinum 8163 CPU @ 2.50GHz 32core-64processor without avx512_vnni
| concurrent-tasks | processing time(s) | RTF | Speedup Rate |
|---------------------|:------------------:|----------|:------------:|
| 1 (onnx fp32) | 3073s | 0.0851 | 12 |
| 1 (onnx int8) | 2840s | 0.0787 | 13 |
| 8 (onnx fp32) | 389s | 0.01079 | 93 |
| 8 (onnx int8) | 355s | 0.0098 | 101 |
| 16 (onnx fp32) | 199s | 0.005513 | 181 |
| 16 (onnx int8) | 171s | 0.004784 | 210 |
| 32 (onnx fp32) | 113s | 0.00314 | 318 |
| 32 (onnx int8) | 92s | 0.00255 | 391 |
| 64 (onnx fp32) | 115s | 0.0032 | 312 |
| 64 (onnx int8) | 81s | 0.002232 | 448 |
| 96 (onnx fp32) | 117s | 0.003257 | 307 |
| 96 (onnx int8) | 81s | 0.002258 | 442 |
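
The tables also show how throughput scales with the number of concurrent tasks. As a rough sketch, the scaling relative to a single task and the per-task efficiency can be computed as follows, using fp32 rows from the 8163 table directly above. "Efficiency" here means scaling divided by task count; this is a common convention and not a metric defined by this document.

```python
def parallel_scaling(single_task_seconds, concurrent_seconds, tasks):
    """Return (scaling over one task, scaling per concurrent task)."""
    scaling = single_task_seconds / concurrent_seconds
    return scaling, scaling / tasks


# fp32 rows from the 8163 table above: 1 task = 3073 s.
for tasks, seconds in [(16, 199), (32, 113), (64, 115)]:
    scaling, efficiency = parallel_scaling(3073, seconds, tasks)
    print(f"{tasks} tasks: {scaling:.1f}x over one task, efficiency {efficiency:.2f}")
```

In this table the processing time stops improving between 32 and 64 tasks, so running more concurrent tasks than the machine has cores mainly lowers the per-task efficiency.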
## [FSMN-VAD](https://www.modelscope.cn/models/damo/speech_fsmn_vad_zh-cn-16k-common-onnx/summary) + [Paraformer-large](https://www.modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary) + [Ngram](https://www.modelscope.cn/models/damo/speech_ngram_lm_zh-cn-ai-wesp-fst/summary) + [CT-Transformer](https://www.modelscope.cn/models/damo/punc_ct-transformer_zh-cn-common-vocab272727-onnx/summary)
```shell
./funasr-onnx-offline-rtf \
--model-dir ./damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch \
--quantize true \
--vad-dir ./damo/speech_fsmn_vad_zh-cn-16k-common-onnx \
--punc-dir ./damo/punc_ct-transformer_zh-cn-common-vocab272727-onnx \
--lm-dir ./damo/speech_ngram_lm_zh-cn-ai-wesp-fst \
--wav-path ./aishell1_test.scp \
--thread-num 32
# Note: '--quantize false' means fp32; otherwise the int8 model is used
```
### Intel(R) Xeon(R) Platinum 8369B CPU @ 2.90GHz 16core-32processor with avx512_vnni
| concurrent-tasks | processing time(s) | RTF | Speedup Rate |
|------------------|:------------------:|:------:|:------------:|
| 1 (onnx fp32) | 2506s | 0.0694 | 14 |
| 1 (onnx int8) | 1448s | 0.0401 | 25 |
| 8 (onnx fp32) | 326s | 0.0090 | 110 |
| 8 (onnx int8) | 184s | 0.0051 | 195 |
| 16 (onnx fp32) | 178s | 0.0049 | 202 |
| 16 (onnx int8) | 99s | 0.0027 | 361 |
| 32 (onnx fp32) | 152s | 0.0042 | 236 |
| 32 (onnx int8) | 85s | 0.0023 | 422 |
| 64 (onnx fp32) | 157s | 0.0043 | 228 |
| 64 (onnx int8) | 89s | 0.0024 | 403 |
| 96 (onnx fp32) | 158s | 0.0044 | 227 |
| 96 (onnx int8) | 91s | 0.0025 | 396 |
```shell
# using hotwords
./funasr-onnx-offline-rtf \
--model-dir ./damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch \
--quantize true \
--vad-dir ./damo/speech_fsmn_vad_zh-cn-16k-common-onnx \
--punc-dir ./damo/punc_ct-transformer_zh-cn-common-vocab272727-onnx \
--lm-dir ./damo/speech_ngram_lm_zh-cn-ai-wesp-fst \
--hotword hotwords_dev_600.txt \
--wav-path ./aishell1_test.scp \
--thread-num 32
# Note: '--quantize false' means fp32; otherwise the int8 model is used
```
### Intel(R) Xeon(R) Platinum 8369B CPU @ 2.90GHz 16core-32processor with avx512_vnni
| concurrent-tasks | processing time(s) | RTF | Speedup Rate |
|------------------|:------------------:|:------:|:------------:|
| 1 (onnx fp32) | 3172s | 0.0878 | 11 |
| 1 (onnx int8) | 2140s | 0.0592 | 16 |
| 8 (onnx fp32) | 412s | 0.0114 | 87 |
| 8 (onnx int8) | 268s | 0.0074 | 134 |
| 16 (onnx fp32) | 218s | 0.0060 | 165 |
| 16 (onnx int8) | 140s | 0.0038 | 257 |
| 32 (onnx fp32) | 183s | 0.0050 | 196 |
| 32 (onnx int8) | 116s | 0.0032 | 310 |
| 64 (onnx fp32) | 188s | 0.0052 | 192 |
| 64 (onnx int8) | 120s | 0.0033 | 299 |
| 96 (onnx fp32) | 191s | 0.0052 | 188 |
| 96 (onnx int8) | 122s | 0.0033 | 294 |
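
To see what each extra pipeline stage costs, the processing times of the configurations above can be compared against the ASR-only baseline. The sketch below uses the 32-task int8 rows from the 8369B tables in this document; the labels and the function name are illustrative.

```python
# 32 concurrent tasks, onnx int8, Xeon 8369B; seconds copied from the tables.
BASELINE_SECONDS = 64  # Paraformer-large only
PIPELINES = {
    "VAD + ASR + punctuation": 65,
    "VAD + ASR + Ngram + punctuation": 85,
    "VAD + ASR + Ngram + punctuation + hotwords": 116,
}


def relative_overhead(seconds, baseline_seconds=BASELINE_SECONDS):
    """Extra processing time as a fraction of the baseline."""
    return seconds / baseline_seconds - 1.0


for name, seconds in PIPELINES.items():
    print(f"{name}: +{relative_overhead(seconds):.0%} over ASR only")
```

On these rows, VAD and punctuation add very little time, while the Ngram language model and hotwords account for most of the additional cost.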
## [FSMN-VAD](https://www.modelscope.cn/models/damo/speech_fsmn_vad_zh-cn-16k-common-onnx/summary) + [Paraformer-en](https://www.modelscope.cn/models/damo/speech_paraformer-large_asr_nat-en-16k-common-vocab10020-onnx/summary) + [CT-Transformer](https://www.modelscope.cn/models/damo/punc_ct-transformer_zh-cn-common-vocab272727-onnx/summary)
```shell
./funasr-onnx-offline-rtf \
--model-dir ./damo/speech_paraformer-large_asr_nat-en-16k-common-vocab10020-onnx \
--quantize true \
--vad-dir ./damo/speech_fsmn_vad_zh-cn-16k-common-onnx \
--punc-dir ./damo/punc_ct-transformer_zh-cn-common-vocab272727-onnx \
--wav-path ./librispeech_test_clean.scp \
--thread-num 32
# Note: '--quantize false' means fp32; otherwise the int8 model is used
```
### Intel(R) Xeon(R) Platinum 8369B CPU @ 2.90GHz 16core-32processor with avx512_vnni
| concurrent-tasks | processing time(s) | RTF | Speedup Rate |
|---------------------|:------------------:|----------|:------------:|
| 1 (onnx fp32) | 1327s | 0.0682 | 15 |
| 1 (onnx int8) | 734s | 0.0377 | 26 |
| 8 (onnx fp32) | 169s | 0.0087 | 114 |
| 8 (onnx int8) | 94s | 0.0048 | 205 |
| 16 (onnx fp32) | 89s | 0.0046 | 217 |
| 16 (onnx int8) | 50s | 0.0025 | 388 |
| 32 (onnx fp32) | 78s | 0.0040 | 248 |
| 32 (onnx int8) | 43s | 0.0022 | 448 |
| 64 (onnx fp32) | 79s | 0.0041 | 243 |
| 64 (onnx int8) | 44s | 0.0022 | 438 |
| 96 (onnx fp32) | 80s | 0.0041 | 240 |
| 96 (onnx int8) | 45s | 0.0023 | 428 |