GPU Performance Testing Metrics
GPU burn official site: gpu burn performance test
Installing gpu burn for performance testing: https://blog.csdn.net/weixin_40109345/article/details/106040332
Sample run (the argument 60 is the test duration in seconds):
```shell
(base) root@278493cc22a8fd69:~/performance_test/gpu-burn# ./gpu_burn 60
GPU 0: P102-100 (UUID: GPU-441ecc72-0be5-f954-8d85-61658fbb8d81)
Initialized device 0 with 10156 MB of memory (9975 MB available, using 8978 MB of it), using FLOATS
11.7% proc'd: 2795 (9270 Gflop/s) errors: 0 temps: 21 C
Summary at: Mon Dec 14 15:01:19 CST 2020
23.3% proc'd: 6708 (9193 Gflop/s) errors: 0 temps: 25 C
Summary at: Mon Dec 14 15:01:26 CST 2020
35.0% proc'd: 10621 (9173 Gflop/s) errors: 0 temps: 29 C
Summary at: Mon Dec 14 15:01:33 CST 2020
46.7% proc'd: 13975 (9159 Gflop/s) errors: 0 temps: 31 C
Summary at: Mon Dec 14 15:01:40 CST 2020
58.3% proc'd: 17888 (9154 Gflop/s) errors: 0 temps: 32 C
Summary at: Mon Dec 14 15:01:47 CST 2020
68.3% proc'd: 21242 (9144 Gflop/s) errors: 0 temps: 34 C
Summary at: Mon Dec 14 15:01:53 CST 2020
80.0% proc'd: 24596 (9141 Gflop/s) errors: 0 temps: 35 C
Summary at: Mon Dec 14 15:02:00 CST 2020
91.7% proc'd: 28509 (9139 Gflop/s) errors: 0 temps: 36 C
Summary at: Mon Dec 14 15:02:07 CST 2020
100.0% proc'd: 31863 (9123 Gflop/s) errors: 0 temps: 37 C
Killing processes.. Freed memory for dev 0
Uninitted cublas
done
Tested 1 GPUs:
GPU 0: OK
```
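To turn a gpu_burn log like the one above into a few headline numbers (mean and worst throughput, peak temperature, error count), a small parser is enough. This is a minimal sketch that assumes the single-GPU progress-line format shown above; summarize_gpu_burn is a hypothetical helper, and multi-GPU runs, which print one figure per GPU on each line, would need a different pattern.

```python
import re
from statistics import mean

# Progress-line format taken from the single-GPU log above, e.g.
# "11.7% proc'd: 2795 (9270 Gflop/s) errors: 0 temps: 21 C"
PROGRESS_RE = re.compile(
    r"(?P<pct>[\d.]+)% proc'd: (?P<iters>\d+) \((?P<gflops>\d+) Gflop/s\)"
    r" errors: (?P<errors>\d+) temps: (?P<temp>\d+) C"
)


def summarize_gpu_burn(log: str) -> dict:
    """Summarize the progress lines of a single-GPU gpu_burn run."""
    rows = [m.groupdict() for m in PROGRESS_RE.finditer(log)]
    if not rows:
        raise ValueError("no gpu_burn progress lines found")
    gflops = [int(r["gflops"]) for r in rows]
    return {
        "samples": len(rows),
        "avg_gflops": mean(gflops),
        "min_gflops": min(gflops),
        "max_temp_c": max(int(r["temp"]) for r in rows),
        # Largest error count seen; anything above 0 means wrong results were detected
        "max_errors": max(int(r["errors"]) for r in rows),
    }
```

For example, save a run with ./gpu_burn 60 | tee burn.log and pass the file's contents to summarize_gpu_burn; non-progress lines such as "Summary at: ..." are simply ignored.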
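Prerequisites
GPU driver: if the machine does not come with an NVIDIA driver preinstalled, install one manually.
Run nvidia-smi; if the shell reports that the command is not found, the NVIDIA driver is not installed.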
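The same check can be scripted, for example at the top of a setup script that should stop early when no driver is present. A minimal sketch: nvidia_driver_version is a hypothetical helper that treats a missing or failing nvidia-smi as "no usable driver" and otherwise reads the version through nvidia-smi's --query-gpu=driver_version query.

```python
import shutil
import subprocess
from typing import Optional


def nvidia_driver_version() -> Optional[str]:
    """Return the NVIDIA driver version, or None when no usable driver is found."""
    if shutil.which("nvidia-smi") is None:
        # Command not found: the NVIDIA driver is most likely not installed
        return None
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        stdout=subprocess.PIPE, stderr=subprocess.PIPE, universal_newlines=True,
    )
    lines = result.stdout.strip().splitlines()
    if result.returncode != 0 or not lines:
        # nvidia-smi exists but cannot talk to a driver or GPU
        return None
    # One line per GPU; all GPUs share the same driver, so the first line is enough
    return lines[0].strip()
```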
Download the NVIDIA driver that matches your GPU, then install it by running:
```shell
bash NVIDIA-Linux-x86_64-450.80.02.run
```
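The benchmark README recommends installing with conda.
Conda is a package, dependency, and environment management tool; if it is not already installed on the system, install it first.
The conda installation page offers two installers: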
Miniconda
Anaconda
For a first-time setup, install conda with Miniconda.
Installing benchmark
First, use conda to install Python 3.7:
```shell
conda install -y python=3.7
```
Then use conda to install pytorch, torchvision, and torchtext from the pytorch-nightly channel:
```shell
conda install -y pytorch torchtext torchvision -c pytorch-nightly
```
conda lists the following packages to install:
```text
package                       | build
------------------------------|-----------------------------------------
cudatoolkit-10.2.89           | hfd86e86_1                      365.1 MB
ffmpeg-4.2.2                  | h20bf706_0                       59.6 MB
gnutls-3.6.5                  | h71b1129_1002                     1.6 MB
lame-3.100                    | h7b6447c_0                        323 KB
libopus-1.3.1                 | h7b6447c_0                        491 KB
libuv-1.40.0                  | h7b6447c_0                        736 KB
libvpx-1.7.0                  | h439df22_0                        1.2 MB
nettle-3.4.1                  | hbb512f6_0                        5.0 MB
ninja-1.10.2                  | py37hff7bd54_0                    1.4 MB
openh264-2.1.0                | hd408876_0                        722 KB
pytorch-1.8.0.dev20210110     | py3.7_cuda10.2.89_cudnn7.6.5_0  655.4 MB  pytorch-nightly
torchtext-0.9.0.dev20210110   | py37                             11.3 MB  pytorch-nightly
torchvision-0.9.0.dev20210110 | py37_cu102                       26.2 MB  pytorch-nightly
x264-1!157.20191217           | h7b6447c_0                        922 KB
------------------------------------------------------------------------
                                                         Total:  1.10 GB
```
The download then fails with the following error:
```text
joblib-1.0.0 | 207 KB | ## | 100%
torchvision-0.2.1 | 75 KB | ## | 100%
fsspec-0.8.3 | 69 KB | ## | 100%
sphinxcontrib-qthelp | 26 KB | ## | 100%
openssl-1.1.1i | 3.8 MB | ## | 100%
sphinxcontrib-devhel | 24 KB | ## | 100%
anaconda-custom | 3 KB | ## | 100%
sphinxcontrib-serial | 25 KB | ## | 100%
pytorch-1.8.0.dev202 | 860.8 MB | | 0%
torchtext-0.9.0.dev2 | 11.2 MB | ## | 100%
mock-4.0.3 | 27 KB | ## | 100%
sphinxcontrib-appleh | 30 KB | ## | 100%
tbb-2020.3 | 1.4 MB | ## | 100%
InvalidArchiveError('Error with archive /root/anaconda3/pkgs/pytorch-1.8.0.dev20201229-py3.7_cuda11.0.221_cudnn8.0.5_0.tar.bz2. You probably need to delete and re-download or re-create this file. Message from libarchive was:\n\nSeek failed')
```
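Make sure to pick the right build: because the test runs PyTorch on the GPU, you need a CUDA build that matches your Python version, in this case pytorch-1.8.0.dev20201229-py3.7_cuda11.0.221_cudnn8.0.5_0.tar.bz2. The build's CUDA version must also be supported by the installed driver (the "CUDA Version" shown by nvidia-smi).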
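The filename encodes the PyTorch version, Python version, CUDA version, cuDNN version, and build number, and the download URL follows a fixed pattern. A small helper makes it harder to grab the wrong build. This is a sketch: the URL layout is copied from the wget command in this post and is an assumption for other channels or dates, and both function names are hypothetical.

```python
def nightly_pytorch_filename(version: str, py: str, cuda: str, cudnn: str,
                             build: int = 0) -> str:
    """Build a pytorch-nightly conda archive name, e.g. for py3.7 + CUDA 11.0."""
    return f"pytorch-{version}-py{py}_cuda{cuda}_cudnn{cudnn}_{build}.tar.bz2"


def nightly_pytorch_url(version: str, py: str, cuda: str, cudnn: str,
                        build: int = 0, platform: str = "linux-64") -> str:
    """Build the anaconda.org download URL (layout assumed from the wget example)."""
    filename = nightly_pytorch_filename(version, py, cuda, cudnn, build)
    return (f"https://anaconda.org/pytorch-nightly/pytorch/{version}"
            f"/download/{platform}/{filename}")
```

With version "1.8.0.dev20201229", Python "3.7", CUDA "11.0.221", and cuDNN "8.0.5", this reproduces the exact URL used with wget.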
We therefore downloaded the package with wget and installed it offline with conda:
```shell
wget https://anaconda.org/pytorch-nightly/pytorch/1.8.0.dev20201229/download/linux-64/pytorch-1.8.0.dev20201229-py3.7_cuda11.0.221_cudnn8.0.5_0.tar.bz2
conda install --use-local pytorch-1.8.0.dev20201229-py3.7_cuda11.0.221_cudnn8.0.5_0.tar.bz2
Downloading and Extracting Packages
| 0%
InvalidArchiveError('Error with archive /root/anaconda3/pkgs/pytorch-1.8.0.dev20201229-py3.7_cuda11.0.221_cudnn8.0.5_0.tar.bz2. You probably need to delete and re-download or re-create this file. Message from libarchive was:\n\nSeek failed')
```
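The install still fails, because the earlier attempt left a partial archive in the conda package cache (/root/anaconda3/pkgs, as the error message shows). Delete that leftover file and run the install again.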
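Deleting the leftover by hand with rm works; the sketch below does the same for every cache entry (archive or extracted folder) whose name starts with a given prefix. remove_cached_package is a hypothetical helper, and the cache path is taken from the error message above, so adjust it to your own conda installation.

```python
import shutil
from pathlib import Path
from typing import List


def remove_cached_package(pkgs_dir: str, prefix: str) -> List[str]:
    """Delete conda cache entries (archives and extracted folders) whose names start with prefix."""
    removed = []
    # Materialize the glob first so the directory is not modified while it is being listed
    for entry in list(Path(pkgs_dir).glob(prefix + "*")):
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)
        else:
            entry.unlink()
        removed.append(entry.name)
    return removed
```

For this case the call would be remove_cached_package("/root/anaconda3/pkgs", "pytorch-1.8.0.dev20201229"), followed by the conda install command again.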
After the cleanup, the install proceeds:
```text
The following packages will be UPDATED:

  ca-certificates 2019.1.23-0 --> 2020.12.8-h06a4308_0
  certifi 2019.3.9-py37_0 --> 2020.12.5-py37h06a4308_0
  openssl 1.1.1b-h7b6447c_1 --> 1.1.1i-h27cfd23_0
  pytorch <unknown>::pytorch-1.8.0.dev20201230-~ --> pytorch-nightly::pytorch-1.8.0.dev20210103-py3.7_cuda11.0.221_cudnn8.0.5_0
```
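Once a CUDA build is installed, it is worth confirming that PyTorch can actually see the GPU before running any benchmark. A minimal sketch using torch.cuda.is_available() and torch.cuda.get_device_name(); torch_gpu_summary is a hypothetical helper that returns None when torch is not installed in the current environment.

```python
import importlib.util
from typing import Optional


def torch_gpu_summary() -> Optional[dict]:
    """Report whether the installed PyTorch build can use the GPU; None if torch is absent."""
    if importlib.util.find_spec("torch") is None:
        return None
    import torch

    info = {
        "torch": torch.__version__,
        "cuda_build": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }
    if info["cuda_available"]:
        info["device0"] = torch.cuda.get_device_name(0)
    return info
```

If cuda_available is False even though nvidia-smi works, two common causes are a CPU-only build (cuda_build is None) and a build whose CUDA version is newer than the driver supports.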
Clone benchmark and install the packages it requires:
```shell
git clone https://github.com/pytorch/benchmark
cd benchmark
python install.py
Requirement already satisfied: pytest in /usr/local/anaconda3/lib/python3.7/site-packages (from -r requirements.txt (line 1)) (5.4.3)
Collecting pytest-benchmark
  Downloading pytest_benchmark-3.2.3-py2.py3-none-any.whl (49 kB)
     |████████████████████████████████| 49 kB 358 kB/s
Requirement already satisfied: requests in /usr/local/anaconda3/lib/python3.7/site-packages (from -r requirements.txt (line 3)) (2.23.0)
Collecting tabulate
  Downloading tabulate-0.8.7-py3-none-any.whl (24 kB)
Requirement already satisfied: wcwidth in /usr/local/anaconda3/lib/python3.7/site-packages (from pytest->-r requirements.txt (line 1)) (0.2.4)
Requirement already satisfied: attrs>=17.4.0 in /usr/local/anaconda3/lib/python3.7/site-packages (from pytest->-r requirements.txt (line 1)) (19.3.0)
Requirement already satisfied: py>=1.5.0 in /usr/local/anaconda3/lib/python3.7/site-packages (from pytest->-r requirements.txt (line 1)) (1.8.1)
Requirement already satisfied: more-itertools>=4.0.0 in /usr/local/anaconda3/lib/python3.7/site-packages (from pytest->-r requirements.txt (line 1)) (8.3.0)
Requirement already satisfied: packaging in /usr/local/anaconda3/lib/python3.7/site-packages (from pytest->-r requirements.txt (line 1)) (20.4)
Requirement already satisfied: pluggy<1.0,>=0.12 in /usr/local/anaconda3/lib/python3.7/site-packages (from pytest->-r requirements.txt (line 1)) (0.13.1)
Requirement already satisfied: importlib-metadata>=0.12; python_version < "3.8" in /usr/local/anaconda3/lib/python3.7/site-packages (from pytest->-r requirements.txt (line 1)) (1.6.1)
Collecting py-cpuinfo
  Downloading py-cpuinfo-7.0.0.tar.gz (95 kB)
     |████████████████████████████████| 95 kB 81 kB/s
Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /usr/local/anaconda3/lib/python3.7/site-packages (from requests->-r requirements.txt (line 3)) (1.25.9)
Requirement already satisfied: chardet<4,>=3.0.2 in /usr/local/anaconda3/lib/python3.7/site-packages (from requests->-r requirements.txt (line 3)) (3.0.4)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/anaconda3/lib/python3.7/site-packages (from requests->-r requirements.txt (line 3)) (2020.4.5.2)
Requirement already satisfied: idna<3,>=2.5 in /usr/local/anaconda3/lib/python3.7/site-packages (from requests->-r requirements.txt (line 3)) (2.9)
Requirement already satisfied: six in /usr/local/anaconda3/lib/python3.7/site-packages (from packaging->pytest->-r requirements.txt (line 1)) (1.15.0)
Requirement already satisfied: pyparsing>=2.0.2 in /usr/local/anaconda3/lib/python3.7/site-packages (from packaging->pytest->-r requirements.txt (line 1)) (2.4.7)
Requirement already satisfied: zipp>=0.5 in /usr/local/anaconda3/lib/python3.7/site-packages (from importlib-metadata>=0.12; python_version < "3.8"->pytest->-r requirements.txt (line 1)) (3.1.0)
Building wheels for collected packages: py-cpuinfo
```
In addition, the dependencies required by each model under torchbenchmark have to be installed.
To make this easier, I wrote a small script:
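```shell
# Run from the directory that holds the model folders.
# Looping over subdirectories only skips files such as ADDING_MODELS.md and install.sh.
for element in */; do
    # Skip folders that have no install.py of their own
    [ -f "${element}install.py" ] || continue
    # The subshell means a failed cd or install never leaves us in the wrong directory
    (cd "$element" && python install.py)
done
```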
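Because several models' install.py scripts fail (see the appdirs error below), a variant that keeps going and reports which models failed can save a few reruns. This is a hedged Python sketch of the same loop; install_model_deps is a hypothetical name, and like the shell version it assumes each model folder carries its own install.py.

```python
import subprocess
import sys
from pathlib import Path
from typing import List


def install_model_deps(models_dir: str) -> List[str]:
    """Run every model's install.py and return the names of models whose install failed."""
    failed = []
    for model in sorted(p for p in Path(models_dir).iterdir() if p.is_dir()):
        if not (model / "install.py").is_file():
            continue  # nothing to install for this folder
        # cwd=model mirrors "cd $element && python install.py" in the shell loop
        result = subprocess.run([sys.executable, "install.py"], cwd=str(model))
        if result.returncode != 0:
            failed.append(model.name)
    return failed
```

Using sys.executable runs each install.py with the same interpreter (the conda environment) as the calling script, which avoids picking up a different python from PATH.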
The installation may also stop and ask for a proxy:
```text
Unable to verify https connectivity, required for setup.
Do you need to use a proxy?
```
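In that case, edit the pip command in install.py to pass the proxy (replace the address with your own proxy):
```python
import subprocess
import sys
subprocess.check_call([sys.executable, '-m', 'pip', '--proxy', 'http://127.0.0.1:10000',
                       'install', '-r', 'requirements.txt'])
```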
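Rather than hard-coding the address, the proxy can be a parameter, so the same install.py works with and without one. A minimal sketch: pip_install_cmd is a hypothetical helper, and http://127.0.0.1:10000 is only the example address used above.

```python
import sys
from typing import List, Optional


def pip_install_cmd(requirements: str = "requirements.txt",
                    proxy: Optional[str] = None) -> List[str]:
    """Build the pip command used by install.py, adding --proxy only when one is given."""
    cmd = [sys.executable, "-m", "pip"]
    if proxy:
        # --proxy is a pip general option, so it goes before the "install" subcommand
        cmd += ["--proxy", proxy]
    cmd += ["install", "-r", requirements]
    return cmd
```

Usage: subprocess.check_call(pip_install_cmd(proxy=os.environ.get("HTTPS_PROXY"))). pip also honors the standard http_proxy / https_proxy environment variables on its own, so exporting those is an alternative to editing install.py.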
While installing the model dependencies, python install.py fails in several model folders with package-not-found errors such as:
```text
Installed /root/anaconda3/lib/python3.7/site-packages/cityscapesScripts-2.1.7-py3.7.egg
Processing dependencies for cityscapesScripts==2.1.7
Searching for appdirs
Reading http://mirrors.cloud.aliyuncs.com/pypi/simple/appdirs/
Couldn't find index page for 'appdirs' (maybe misspelled?)
Scanning index of all packages (this may take a while)
Reading http://mirrors.cloud.aliyuncs.com/pypi/simple/
No local packages or working download links found for appdirs
error: Could not find suitable distribution for Requirement.parse('appdirs')
Traceback (most recent call last):
  File "install.py", line 19, in <module>
    install_other_dependencies()
  File "install.py", line 14, in install_other_dependencies
    subprocess.check_call(['bash', 'install_dependencies.sh', tmpdir])
  File "/root/anaconda3/lib/python3.7/subprocess.py", line 363, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['bash', 'install_dependencies.sh', '/tmp/tmpfs4b436c']' returned non-zero exit status 1.
```
The fix is to install the missing package manually with pip install appdirs.
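When several model installs fail this way, the missing names can be pulled out of the captured log and installed with a single pip call. A sketch that assumes the setuptools message format shown above; missing_requirements and pip_install_missing are hypothetical helpers.

```python
import re
import subprocess
import sys
from typing import List

# Matches: error: Could not find suitable distribution for Requirement.parse('appdirs')
MISSING_RE = re.compile(
    r"Could not find suitable distribution for Requirement\.parse\('([^']+)'\)")


def missing_requirements(log: str) -> List[str]:
    """Return unresolved requirement names from an install log, first occurrence order."""
    names: List[str] = []
    for name in MISSING_RE.findall(log):
        if name not in names:
            names.append(name)
    return names


def pip_install_missing(log: str) -> None:
    """Install every requirement that setuptools could not resolve, in one pip call."""
    names = missing_requirements(log)
    if names:
        subprocess.check_call([sys.executable, "-m", "pip", "install", *names])
```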
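Running the tests
benchmark provides two basic test scripts:
test.py: the simplest tests, which exercise the infrastructure by iterating over all models and installing and running each one
test_bench.py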
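Since pytest-benchmark is installed from requirements.txt, test_bench.py can be driven through pytest. A hedged sketch that only uses generic options: pytest's -k keyword filter and pytest-benchmark's --benchmark-json report option. bench_cmd is a hypothetical helper, and the keyword you pass (for example "cuda") depends on test names this post does not show.

```python
import sys
from typing import List, Optional


def bench_cmd(keyword: Optional[str] = None, json_out: Optional[str] = None) -> List[str]:
    """Build a pytest command for test_bench.py, optionally filtered and saved as JSON."""
    cmd = [sys.executable, "-m", "pytest", "test_bench.py"]
    if keyword:
        cmd += ["-k", keyword]  # pytest keyword expression, e.g. "cuda" (assumed test naming)
    if json_out:
        cmd.append(f"--benchmark-json={json_out}")  # pytest-benchmark result file
    return cmd
```

Run it from the benchmark directory with subprocess.check_call(bench_cmd(...)).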
Running these test scripts produced a variety of version-related errors.
GPU testing with Docker
PyTorch official site: https://pytorch.org/
So I decided to deploy with Docker instead:
Docker PyTorch Git repository:
pytorch benchmarks: https://github.com/nerox8664/pytorch-benchmarks
The image must be run with the nvidia-docker command:
```shell
nvidia-docker run --rm -ti nerox8664/pytorch-benchmarks
```
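On hosts without the nvidia-docker wrapper, Docker 19.03 and later can expose GPUs directly with docker run --gpus all, provided the NVIDIA Container Toolkit is installed. A sketch that picks whichever is available; gpu_docker_cmd is a hypothetical helper, and the which parameter exists only so the choice can be tested without Docker installed.

```python
import shutil
from typing import Callable, List, Optional


def gpu_docker_cmd(image: str,
                   which: Callable[[str], Optional[str]] = shutil.which) -> List[str]:
    """Return a command that runs image interactively with GPU access."""
    if which("nvidia-docker"):
        # The wrapper used in this post
        return ["nvidia-docker", "run", "--rm", "-ti", image]
    if which("docker"):
        # Native GPU support (Docker 19.03+ with the NVIDIA Container Toolkit)
        return ["docker", "run", "--gpus", "all", "--rm", "-ti", image]
    raise FileNotFoundError("neither nvidia-docker nor docker is on PATH")
```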
Check the NVIDIA driver and GPU information:
```shell
nvidia-smi
Thu Dec 17 14:35:58 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.50       Driver Version: 430.50       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  P102-100            Off  | 00000000:00:08.0 Off |                  N/A |
|  0%   14C    P0    48W / 225W |      0MiB / 10156MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
```
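The "CUDA Version" in the nvidia-smi header is the newest CUDA runtime that driver supports (10.1 for driver 430.50 here), so a PyTorch build compiled for CUDA 11.0, like the nightly installed earlier, could not use the GPU on a machine with this driver. A conservative sketch of that check: both helpers are hypothetical, and the comparison deliberately ignores the minor-version compatibility that CUDA 11.x drivers can offer.

```python
import re
from typing import Tuple


def driver_cuda_version(nvidia_smi_output: str) -> Tuple[int, int]:
    """Extract (major, minor) from the 'CUDA Version: X.Y' field of nvidia-smi's header."""
    m = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", nvidia_smi_output)
    if m is None:
        raise ValueError("no 'CUDA Version' field found")
    return int(m.group(1)), int(m.group(2))


def build_fits_driver(build_cuda: str, nvidia_smi_output: str) -> bool:
    """True if a build for build_cuda (e.g. '11.0.221') is not newer than the driver's CUDA."""
    major, minor = (int(x) for x in build_cuda.split(".")[:2])
    return (major, minor) <= driver_cuda_version(nvidia_smi_output)
```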