How to Fix RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED

The CUDA Deep Neural Network library (cuDNN) is a GPU-accelerated library of deep learning primitives developed by NVIDIA.

cuDNN is designed to work with the CUDA toolkit, NVIDIA's software development kit that lets developers harness the power of GPUs for general-purpose computing.

While working with PyTorch or TensorFlow on top of cuDNN, we sometimes face a RuntimeError, and in today’s post, we will discuss this RuntimeError and how to fix it.

RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED

The “RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED” error in PyTorch occurs when the cuDNN library is not properly initialized.

PyTorch initializes cuDNN lazily whenever a convolution is executed for the first time.

If PyTorch’s caching allocator already holds nearly all of the GPU memory, there may be too little left to initialize cuDNN, and this error can occur.

Other causes of the error

  1. A missing or outdated cuDNN library.
  2. Problems with the CUDA toolkit installation.
  3. Issues with the GPU device or driver.
  4. Incompatible CUDA and cuDNN library versions.
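To narrow down which of these causes applies, you can print the versions PyTorch was built against and whether a GPU is actually visible. The sketch below is a minimal diagnostic helper (the function name `cudnn_diagnostics` is ours, not part of any library); the `torch` calls only run if PyTorch is installed.

```python
import importlib.util

def cudnn_diagnostics():
    """Collect version info relevant to CUDNN_STATUS_NOT_INITIALIZED.

    Values stay None when they cannot be determined (e.g. PyTorch is
    not installed, or no CUDA device is present).
    """
    info = {
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "torch_version": None,
        "built_for_cuda": None,   # CUDA version the wheel was built against
        "cuda_available": None,   # driver + device usable?
        "cudnn_version": None,
    }
    if info["torch_installed"]:
        import torch
        info["torch_version"] = torch.__version__
        info["built_for_cuda"] = torch.version.cuda
        info["cuda_available"] = torch.cuda.is_available()
        if torch.backends.cudnn.is_available():
            info["cudnn_version"] = torch.backends.cudnn.version()
    return info

if __name__ == "__main__":
    for key, value in cudnn_diagnostics().items():
        print(f"{key}: {value}")
```

If `built_for_cuda` and the CUDA toolkit on your machine disagree, or `cudnn_version` is None on a GPU machine, a version mismatch (cause 4) is the likely culprit.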

How to fix RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED

To fix the “RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED” error, try the following solutions.

  1. Upgrade or downgrade your PyTorch version. The bug may already be fixed in a newer release, so upgrade first and test; if the error persists, downgrade to a version known to work with your CUDA and cuDNN installation. It works both ways, so see which approach suits you.
  2. Check that the GPU device is properly configured and visible to PyTorch.
  3. Try an older CUDA and cuDNN version that is known to be compatible with your PyTorch build, and check whether the error persists.
  4. Uninstall PyTorch, CUDA, and cuDNN, then reinstall CUDA and cuDNN, or install a PyTorch build that bundles its own cuDNN.
  5. Empty the cache manually with “torch.cuda.empty_cache()” right before the first convolution is executed.
  6. A broken Python virtual environment (for example, one that mixes a CPU-only PyTorch build with CUDA packages) can also cause this RuntimeError; try running your code outside the environment, or recreate it.
  7. Installing a PyTorch build compiled against CUDA 11.1 fixed this issue for torch 1.8:
     pip install torch==1.8.0+cu111 torchvision==0.9.0+cu111 torchaudio==0.8.0 -f https://download.pytorch.org/whl/torch_stable.html
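Solution 5 above can be sketched as follows. This is a minimal example, assuming the error comes from the caching allocator holding all GPU memory; the wrapper name `run_first_conv_safely` and the tiny model and input shapes are placeholders for your own.

```python
import torch
import torch.nn as nn

def run_first_conv_safely(model: nn.Module, batch: torch.Tensor) -> torch.Tensor:
    # Release cached, unused blocks back to the driver so cuDNN has
    # headroom to initialize its handles on the first convolution.
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
    return model(batch)

# Placeholder model and input; on a CUDA machine, move both with .cuda().
model = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3, padding=1)
batch = torch.randn(1, 3, 32, 32)
out = run_first_conv_safely(model, batch)
print(out.shape)  # torch.Size([1, 8, 32, 32])
```

Note that `empty_cache()` only frees blocks PyTorch has cached but is not using; tensors still referenced by your program keep their memory.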
    

I hope one of these fixes works for you.

That’s it, and thank you for reading this troubleshooting article.

Further reading

How to Fix RuntimeError: cuda error: invalid device ordinal

How to Fix ModuleNotFoundError: no module named gputil
