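Prerequisites
conda must already be installed. If it is not, see the Public_cluster Conda usage guide.
Requesting GPU resources
On Public_cluster, request one GPU compute node from the gpu partition with 2 CPU cores and 1 GPU, and set the maximum run time to 1 hour.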
[wzq@workstation ~]$ salloc -N 1 -n 2 -p gpu --gres=gpu:1 -t 1:00:00
salloc: Pending job allocation 3452 ### the job is assigned ID 3452
salloc: Waiting for resource configuration
salloc: Nodes gpu2 are ready for job ### node gpu2 is allocated to the job
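The flags in the salloc command above map onto the request as follows: -N is the node count, -n is the task count (2 tasks give 2 CPU cores when each task uses one core, which is assumed here), -p is the partition, --gres=gpu:1 asks for one GPU, and -t is the time limit. A minimal sketch that assembles the same command line from named parameters; the helper build_salloc_command is hypothetical and is not part of Slurm:

```python
import shlex


def build_salloc_command(nodes=1, tasks=2, partition="gpu", gpus=1,
                         time_limit="1:00:00"):
    """Assemble the salloc argument list (hypothetical helper, not a Slurm API)."""
    return [
        "salloc",
        "-N", str(nodes),        # number of nodes
        "-n", str(tasks),        # number of tasks (assumed one CPU core each)
        "-p", partition,         # partition (queue) name
        f"--gres=gpu:{gpus}",    # generic resource request: GPUs per node
        "-t", time_limit,        # wall-clock limit, H:MM:SS
    ]


if __name__ == "__main__":
    # Print the command so it can be reviewed before it is run by hand.
    print(shlex.join(build_salloc_command()))
```

Changing time_limit or gpus and printing the result is a quick way to check a request before submitting it.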
Log in over SSH to the allocated GPU compute node.
(base) [wzq@workstation ~]$ ssh gpu2 ### log in to node gpu2
Warning: Permanently added 'gpu1,192.168.0.3' (ECDSA) to the list of known hosts.
Last login: Fri Dec 31 03:19:41 2021 from 192.168.0.1
Installing the latest PyTorch version
Create a conda environment named pytorch2.2 with Python 3.8:
(base) [wzq@gpu2 ~]$ conda create -n pytorch2.2 python=3.8
Once the environment has been created, conda prints:
#
# To activate this environment, use
#
# $ conda activate pytorch2.2
#
# To deactivate an active environment, use
#
# $ conda deactivate
Activate the conda environment:
(base) [wzq@gpu2 ~]$ conda activate pytorch2.2
(pytorch2.2) [wzq@gpu2 ~]$
Install PyTorch from the Tsinghua PyPI mirror:
pip3 install torch torchvision torchaudio -i https://pypi.tuna.tsinghua.edu.cn/simple
When the installation finishes, pip prints:
Installing collected packages: mpmath, typing-extensions, sympy, pillow, nvidia-nvtx-cu12, nvidia-nvjitlink-cu12, nvidia-nccl-cu12, nvidia-curand-cu12, nvidia-cufft-cu12, nvidia-cuda-runtime-cu12, nvidia-cuda-nvrtc-cu12, nvidia-cuda-cupti-cu12, nvidia-cublas-cu12, numpy, networkx, MarkupSafe, fsspec, filelock, triton, nvidia-cusparse-cu12, nvidia-cudnn-cu12, jinja2, nvidia-cusolver-cu12, torch, torchvision, torchaudio
Successfully installed MarkupSafe-2.1.5 filelock-3.13.3 fsspec-2024.3.1 jinja2-3.1.3 mpmath-1.3.0 networkx-3.1 numpy-1.24.4 nvidia-cublas-cu12-12.1.3.1 nvidia-cuda-cupti-cu12-12.1.105 nvidia-cuda-nvrtc-cu12-12.1.105 nvidia-cuda-runtime-cu12-12.1.105 nvidia-cudnn-cu12-8.9.2.26 nvidia-cufft-cu12-11.0.2.54 nvidia-curand-cu12-10.3.2.106 nvidia-cusolver-cu12-11.4.5.107 nvidia-cusparse-cu12-12.1.0.106 nvidia-nccl-cu12-2.19.3 nvidia-nvjitlink-cu12-12.4.99 nvidia-nvtx-cu12-12.1.105 pillow-10.3.0 sympy-1.12 torch-2.2.2 torchaudio-2.2.2 torchvision-0.17.2 triton-2.2.0 typing-extensions-4.10.0
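Before running the full test program, it can help to confirm that the installed torch package imports and can see the GPU. A minimal sketch, to be run inside the activated pytorch2.2 environment; the function name describe_torch is hypothetical, and the values it reports depend on the node and driver:

```python
import importlib.util


def describe_torch():
    """Return basic facts about the PyTorch install, or note that it is missing."""
    if importlib.util.find_spec("torch") is None:
        # torch is not importable in the current environment
        return {"installed": False}
    import torch
    return {
        "installed": True,
        "version": torch.__version__,                  # e.g. the pip-installed version
        "cuda_available": torch.cuda.is_available(),   # True if a usable GPU is visible
        "gpu_count": torch.cuda.device_count(),        # number of visible GPUs
    }


if __name__ == "__main__":
    for key, value in describe_torch().items():
        print(f"{key}: {value}")
```

If cuda_available is reported as False on the GPU node, the allocation or the driver is worth checking before going further.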
Test the environment: upload the test program cifar10_tutorial.py.
Install the test program's dependency:
(pytorch2.2) [wzq@gpu2 ~]$ pip3 install matplotlib -i https://pypi.tuna.tsinghua.edu.cn/simple
Run the test program:
(pytorch2.2) [wzq@gpu2 ~]$ python cifar10_tutorial.py
The program prints the following results:
truck ship deer bird
[1, 2000] loss: 2.235
[1, 4000] loss: 1.885
[1, 6000] loss: 1.716
[1, 8000] loss: 1.583
[1, 10000] loss: 1.500
[1, 12000] loss: 1.469
[2, 2000] loss: 1.387
[2, 4000] loss: 1.361
[2, 6000] loss: 1.379
[2, 8000] loss: 1.306
[2, 10000] loss: 1.322
[2, 12000] loss: 1.290
Finished Training
GroundTruth: cat ship ship plane
Predicted: cat plane plane plane
Accuracy of the network on the 10000 test images: 54 %
Accuracy of plane : 77 %
Accuracy of car : 64 %
Accuracy of bird : 45 %
Accuracy of cat : 22 %
Accuracy of deer : 40 %
Accuracy of dog : 40 %
Accuracy of frog : 82 %
Accuracy of horse : 62 %
Accuracy of ship : 51 %
Accuracy of truck : 56 %
cuda:0
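One way to confirm from this output that training behaved as expected is to check that the loss falls from the first epoch to the second. A minimal sketch that parses the "[epoch, batch] loss: value" lines shown above and averages them per epoch; the function name mean_loss_per_epoch is hypothetical, and the sample text is copied from the run above:

```python
import re
from collections import defaultdict

# Matches lines such as "[1, 2000] loss: 2.235"
LOSS_LINE = re.compile(r"\[(\d+),\s*(\d+)\]\s+loss:\s+([0-9.]+)")


def mean_loss_per_epoch(log_text):
    """Return {epoch: mean of the reported running losses} parsed from log_text."""
    losses = defaultdict(list)
    for match in LOSS_LINE.finditer(log_text):
        epoch = int(match.group(1))
        losses[epoch].append(float(match.group(3)))
    return {epoch: sum(vals) / len(vals) for epoch, vals in sorted(losses.items())}


SAMPLE_LOG = """\
[1, 2000] loss: 2.235
[1, 4000] loss: 1.885
[1, 6000] loss: 1.716
[1, 8000] loss: 1.583
[1, 10000] loss: 1.500
[1, 12000] loss: 1.469
[2, 2000] loss: 1.387
[2, 4000] loss: 1.361
[2, 6000] loss: 1.379
[2, 8000] loss: 1.306
[2, 10000] loss: 1.322
[2, 12000] loss: 1.290
"""

if __name__ == "__main__":
    for epoch, mean in mean_loss_per_epoch(SAMPLE_LOG).items():
        print(f"epoch {epoch}: mean loss {mean:.3f}")
```

For the run above, the mean reported loss drops from about 1.731 in epoch 1 to about 1.341 in epoch 2. The final line of the output, cuda:0, is the device name PyTorch uses for the first GPU.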