Using GPUs with Virtual Machines on vSphere
🎯 Purpose of the Article
This article introduces the foundational concepts and options for integrating GPU acceleration into VMware vSphere environments, particularly for machine learning (ML) and compute-intensive workloads.
As organizations increasingly adopt machine learning (ML), artificial intelligence (AI), and data-intensive workloads, the demand for GPU-accelerated computing has surged. If you’re a vSphere administrator or infrastructure architect, you’ve likely been asked by data scientists, ML engineers, or developers to provide GPU-capable virtual machines (VMs). This article kicks off a series that explores how VMware vSphere supports GPU compute workloads, starting with an overview of the available options and considerations.
🧠 Key Concepts
- GPU Compute: Refers to using GPUs in VMs for general-purpose computing, not just graphics.
- Near Bare-Metal Performance: With the right setup, GPU performance in VMs can closely match that of physical servers.
Why GPUs Matter for ML Workloads
GPUs are essential for accelerating ML tasks such as training, inference, and development. These workloads involve massive matrix operations that GPUs handle far more efficiently than traditional CPUs. The result? Faster time to insights and improved productivity for your teams.
Beyond VDI: GPU Compute on vSphere
While GPUs have traditionally been associated with Virtual Desktop Infrastructure (VDI), vSphere supports a broader use case known as GPU Compute. This allows users to run GPU-accelerated applications in VMs—similar to how they would on bare metal or in public cloud environments.
VMware, in collaboration with technology partners like NVIDIA, offers flexible GPU consumption models that maximize infrastructure ROI while meeting diverse user needs.
Key Questions to Ask Before You Begin
To successfully deploy GPU-enabled VMs, you’ll need to gather information from both your end-users and your hardware/software vendors. Consider:
- What type of workloads will be run (e.g., ML training vs. inference)?
- What performance expectations do users have?
- What GPU hardware is available or needed?
- What licensing or software components are required?
Performance Expectations
GPU performance in vSphere VMs can approach near bare-metal levels, depending on the chosen technology. Future articles in this series will dive deeper into performance benchmarks and comparisons. For now, VMware’s performance engineering team has published initial results that demonstrate the viability of GPU compute in virtualized environments.
GPU Usage Models in vSphere
One of the first decisions you’ll face is how GPUs will be consumed in your environment. VMware supports multiple GPU usage models, each suited to different scenarios:
Common GPU Configuration Options:
| GPU Configuration | Use Case |
|---|---|
| Direct Pass-Through (vDGA) | High-performance, single-user workloads |
| Shared GPU (vGPU) | Multi-user environments, flexible resource allocation |
| Virtual Compute Server (vCS) | ML/AI workloads with compute-focused GPU sharing |
These configurations are enabled through technologies like NVIDIA vGPU, formerly known as NVIDIA GRID. The vGPU family includes products such as Virtual Compute Server (vCS) and Quadro Virtual Data Center Workstation (vDWS), each tailored to specific workload types.
🚀 Why Use GPUs in vSphere?
- Speed: GPUs accelerate ML tasks like training and inference by handling large matrix operations faster than CPUs.
- Demand: Data scientists and developers increasingly require GPU-powered environments for AI, ML, and data analytics.
- Flexibility: vSphere supports GPU usage beyond traditional VDI (Virtual Desktop Infrastructure), enabling broader compute use cases.
🛠️ Deployment Options
VMware offers multiple GPU deployment models, each suited to different needs:
| Use Case | Technology | Description |
|---|---|---|
| Dedicated GPU | DirectPath I/O | Assigns a full GPU to a single VM. Best for high-performance needs. |
| Shared GPU | NVIDIA vGPU | Allows multiple VMs to share a single GPU. Ideal for cost-efficiency and flexibility. |
| Networked GPU | Bitfusion | Enables GPU access over the network, decoupling GPU from physical servers. |
These options are supported through partnerships with vendors like NVIDIA, offering tools like NVIDIA Virtual Compute Server (vCS) and vDWS.
📈 Performance Considerations
- Performance varies by technology.
- DirectPath I/O offers the highest performance but lacks flexibility.
- NVIDIA vGPU balances performance with scalability.
- Bitfusion enables dynamic GPU allocation across the data center.
🧭 Decision-Making Guidance

System administrators should:
- Understand user needs (e.g., training vs. inference).
- Evaluate hardware/software compatibility.
- Choose the GPU model that aligns with performance, flexibility, and cost goals.
VMDirectPath I/O (Passthrough Mode)
In the first part of this series, we introduced the concept of GPU Compute on vSphere and outlined the various methods available for enabling GPU acceleration in virtual machines. In this second installment, we focus on VMDirectPath I/O, also known as passthrough mode, which allows a virtual machine to directly access a physical GPU device.
What is VMDirectPath I/O?
VMDirectPath I/O enables a GPU to be directly assigned to a VM, bypassing the ESXi hypervisor. This direct access delivers near-native performance, typically within 4–5% of bare-metal speeds. It’s an ideal starting point for organizations transitioning from physical workstations to virtualized environments for GPU workloads.
Key Benefits of Passthrough Mode:
- High performance: Minimal overhead due to direct device access.
- Simplicity: No third-party drivers required in the ESXi hypervisor.
- Dedicated resources: Each GPU is assigned to a single VM, ensuring full utilization.
- Cloud replication: Mimics public cloud GPU instances in a private cloud setup.
Limitations:
- No GPU sharing between VMs.
- vSphere features like vMotion, DRS, and Snapshots are not supported.
- Requires careful hardware and firmware configuration.
Step-by-Step Setup Guide
1. Host Server Configuration
1.1 Check GPU Compatibility
Ensure your GPU is supported by your server vendor and can operate in passthrough mode. Most modern GPUs support this.
1.2 BIOS Settings
If your GPU maps more than 16GB of memory, enable the relevant BIOS setting. Vendors name it differently, for example:
- “Above 4G decoding”
- “Memory mapped I/O above 4GB”
- “PCI 64-bit resource handling above 4G”
The setting is typically found in the PCI section of the BIOS. Consult your vendor documentation for exact terminology.
1.3 Enable GPU for Passthrough in vSphere
Use the vSphere Client to enable the GPU for DirectPath I/O:
- Navigate to: Configure → Hardware → PCI Devices → Edit
- Check the box next to your GPU device.
- Click OK and reboot the host server.
After reboot:
- Go to: Configure → Hardware → PCI Devices
- Confirm the GPU appears under “DirectPath I/O PCI Devices Available to VMs”.
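If you prefer the ESXi shell, a quick cross-check (a sketch, assuming SSH access to the host) is to list the PCI devices and locate the GPU:

```
# From the ESXi shell: locate the GPU and note its PCI address
lspci | grep -i nvidia

# Full PCI inventory with vendor and device details
esxcli hardware pci list
```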
2. Virtual Machine Configuration
2.1 Create the VM
Create a new VM as usual, but pay special attention to the boot firmware setting.
2.2 Enable EFI/UEFI Boot Mode
- Navigate to: Edit Settings → VM Options → Boot Options
- Set Firmware to EFI.
EFI firmware is required for proper GPU passthrough functionality.
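For reference, the same choice is recorded in the VM’s .vmx file as a single key:

```
firmware = "efi"
```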
2.3 Adjust Memory Mapped I/O Settings (if needed)
If your GPU maps more than 16GB of memory (e.g., NVIDIA Tesla P100), you’ll need to configure additional VM parameters.
Steps:
- Consult your GPU vendor’s documentation for PCI BAR memory requirements.
- If unsure, enable passthrough and attempt to boot the VM.
- If the boot fails, check the VM’s vmware.log for memory mapping errors.
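A quick way to scan for such errors (a sketch; substitute your own datastore and VM directory):

```
# From the ESXi shell: look for MMIO/BAR mapping complaints in the VM's log
grep -iE "mmio|bar|pcipassthru" /vmfs/volumes/<datastore>/<vm-name>/vmware.log
```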
Use Cases for VMDirectPath I/O
- Single-user ML training or inference
- GPU-intensive development environments
- Private cloud replication of public cloud GPU instances
This method is best suited for scenarios where dedicated GPU access is required and VM mobility features are not critical.
2.3 Configuring Memory Mapped I/O for High-End GPUs (continued)
Some GPUs, like the NVIDIA Tesla P100 or V100, require large memory mappings (>16GB). These mappings are defined in the PCI Base Address Registers (BARs). If your GPU falls into this category, you’ll need to configure two advanced VM parameters:
Steps in vSphere Client:
- Navigate to: Edit Settings → VM Options → Advanced → Configuration Parameters → Edit Configuration
- Add the following parameters (shown in the .vmx sketch after the sizing example below):
  - pciPassthru.use64bitMMIO: enables 64-bit memory mapping.
  - pciPassthru.64bitMMIOSizeGB: set based on the number of high-end GPUs assigned to the VM.
How to Calculate MMIO Size:
- Multiply the number of GPUs by 16GB
- Round up to the next power of two
Example:
- 2 GPUs → 2 × 16 = 32 → Round up to 64
- 1 NVIDIA V100 (32GB BAR) → Round up to 64
This ensures the VM can properly map the GPU’s memory regions.
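For reference, here is how those two entries appear in the VM’s configuration for the two-GPU example above (a sketch; size the second value for your own hardware):

```
pciPassthru.use64bitMMIO = "TRUE"
pciPassthru.64bitMMIOSizeGB = "64"
```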
2.4 Installing the Guest Operating System
Install an EFI/UEFI-capable OS (e.g., modern Linux or Windows versions). This is essential for GPUs requiring large MMIO regions.
Once the OS is installed, install the vendor-specific GPU driver inside the guest OS.
2.5 Assigning the GPU to the VM
Before assigning the GPU:
- Power off the VM.
Then:
- In the vSphere Client, go to: Edit Settings → Add New Device → PCI Device
- Select the GPU from the list.
- Confirm it appears in the VM’s hardware list (e.g., “PCI Device 0”).
2.6 Reserving VM Memory
To ensure stability and performance:
- Set a memory reservation equal to the VM’s total configured memory.
- Navigate to: Edit Settings → Virtual Hardware → Memory → Reservation
This guarantees that the VM has exclusive access to the required memory.
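In the VM’s .vmx file the reservation is recorded in MB via the sched.mem.min key; a sketch for a VM configured with 64GB of memory (verify this key against your vSphere version before relying on it):

```
sched.mem.min = "65536"
```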
2.7 Verifying GPU Access
After powering on the VM:
- Linux: Run lspci | grep -i nvidia (once the vendor driver is installed, nvidia-smi should also report the GPU)
- Windows: Open Device Manager and check under Display Adapters
If the GPU appears, it’s ready for use in passthrough mode.
3. Advanced Scenarios
3.1 Multiple VMs Using Passthrough
To deploy multiple VMs:
- Clone a base VM before assigning PCI devices.
- Assign a dedicated GPU to each VM individually.
3.2 Multiple GPUs in One VM
You can assign multiple GPUs to a single VM using the same steps. Just ensure:
- MMIO size is calculated correctly (a helper sketch follows below).
- Each GPU is added via the PCI Device menu.
This setup is ideal for large-scale ML training or GPU-intensive simulations.
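If you script these builds, a small helper like the following (a sketch implementing the sizing rule from section 2.3) computes the MMIO size for a given GPU count and BAR size:

```
#!/bin/sh
# Sketch: compute the pciPassthru.64bitMMIOSizeGB value for a VM.
# Rule from section 2.3: (GPU count x BAR size in GB), rounded up to the
# next power of two. BAR size is 16GB for many cards, 32GB for e.g. a V100 32GB.
gpus=${1:?usage: mmio_size.sh <gpu-count> [bar-gb]}
bar=${2:-16}
total=$((gpus * bar))
size=1
while [ "$size" -le "$total" ]; do
  size=$((size * 2))
done
echo "pciPassthru.64bitMMIOSizeGB = \"$size\""
```

For example, `./mmio_size.sh 2` prints 64, matching the two-GPU example above.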
Summary
VMDirectPath I/O is a powerful method for delivering near-native GPU performance in vSphere. While it comes with limitations (no vMotion, DRS, or snapshots), it’s perfect for dedicated workloads and private cloud replication of GPU-enabled environments.
In the next article, we’ll explore shared GPU models like NVIDIA vGPU, which offer more flexibility and scalability for multi-user environments.
Installing NVIDIA Virtual GPU Technology
In this third installment of our series, we explore the NVIDIA vGPU technology—formerly known as NVIDIA GRID—and how it enables flexible, high-performance GPU usage in VMware vSphere environments. This article focuses on compute workloads such as machine learning, deep learning, and high-performance computing (HPC), rather than virtual desktop infrastructure (VDI).
What is NVIDIA vGPU?
NVIDIA vGPU allows virtual machines to share GPU resources or dedicate full GPUs, depending on workload requirements. It’s ideal for environments where:
- Applications don’t require full GPU power.
- GPU resources are limited and need to be shared.
- Flexibility is needed between full and partial GPU allocation.
Key Products in the NVIDIA vGPU Family:
- NVIDIA Virtual Compute Server (vCS) – Recommended for ML/AI workloads.
- NVIDIA Quadro Virtual Data Center Workstation (vDWS) – Previously used for compute workloads, now more focused on graphics.
Architecture Overview
NVIDIA vGPU consists of two main components:
- NVIDIA Virtual GPU Manager – Installed as a VIB (VMware Installation Bundle) on the ESXi hypervisor.
- Guest OS vGPU Driver – Installed inside the virtual machine.
This setup enables GPU virtualization and management across vSphere environments.
1. Setting Up NVIDIA vGPU on the vSphere Host
1.1 Configure GPU for vGPU Mode
Access the ESXi host via SSH or the shell and set the host’s default graphics type to shared passthrough: esxcli graphics host set --default-type SharedPassthru
Alternatively, use the vSphere Client:
- Navigate to:
Configure → Hardware → Graphics → Host Graphics → Edit
Reboot the host after changing the setting.
1.2 Verify Host Graphics Settings
Run: esxcli graphics host get
Expected output:
Default Graphics Type: SharedPassthru
Shared Passthru Assignment Policy: Performance
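You can also list the physical GPUs that the host’s graphics subsystem has registered:

```
# List the GPUs registered with the host's graphics subsystem
esxcli graphics device list
```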
1.3 Install NVIDIA vGPU Manager VIB
Before installation:
- Place the host in maintenance mode: esxcli system maintenanceMode set --enable true
- Install the VIB (adjust path as needed): esxcli software vib install -v /vmfs/volumes/<datastore>/NVIDIA/NVIDIA-VMware_ESXi_6.7_Host_Driver_<version>.vib
Expected output:
Message: Operation finished successfully.
Reboot Required: false
VIBs Installed: NVIDIA_bootbank_<driver_name>
- Exit maintenance mode: esxcli system maintenanceMode set --enable false
1.4 Verify VIB Installation
Run: esxcli software vib list | grep -i NVIDIA
This confirms the NVIDIA driver is installed.
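As an additional sanity check (optional), confirm the NVIDIA module is loaded into the VMkernel:

```
# Confirm the NVIDIA vGPU Manager module is loaded on the host
vmkload_mod -l | grep -i nvidia
```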
1.5 Confirm GPU Operation
Run: nvidia-smi
This displays GPU status, usage, and vGPU profiles.
1.6 Check GPU Virtualization Mode
Run: nvidia-smi -q | grep -i virtualization
Expected output: Virtualization mode: Host VGPU
1.7 Disabling ECC (Pre-vGPU 9.0 Only)
If you’re using NVIDIA vGPU software prior to release 9.0, you must disable ECC (Error Correcting Code) on supported GPUs (e.g., Tesla V100, P100, P40, M6, M60), as ECC is not supported in those versions.
- Disable ECC on ESXi: nvidia-smi -e 0
- Verify ECC Status: nvidia-smi -q
Expected output:
ECC Mode
Current : Disabled
Pending : Disabled
Note: ECC is supported in NVIDIA vGPU 9.0 and later, so this step can be skipped if you’re using a newer release.
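To check just the ECC state without paging through the full query output, a small convenience sketch:

```
# Show only the ECC Mode section of the device query
nvidia-smi -q | grep -A 2 "ECC Mode"
```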
2. Choosing the vGPU Profile for a VM
Once the vGPU Manager is installed and operational, you can assign a vGPU profile to your virtual machine. Profiles define how much GPU memory and compute capability a VM can access.
Profile Selection in vSphere Client:
- Right-click the VM → Edit Settings
- Go to the Virtual Hardware tab
- Use the GPU Profile dropdown to select a profile
Profile Naming Convention:
- Example: grid_P100-8q → allocates 8GB of GPU memory
- Example: grid_P100-16q → allocates the full 16GB, dedicating the GPU to one VM
With vCS (Virtual Compute Server) on vSphere 6.7 Update 3 or later, you can assign multiple vGPU profiles to a single VM, enabling access to multiple physical GPUs.
3. Installing the Guest OS vGPU Driver
Ensure the driver version matches the vGPU Manager version installed on ESXi.
3.1 Install Developer Tools (Ubuntu Example):
sudo apt update
sudo apt upgrade
sudo apt install build-essential
3.2 Prepare the NVIDIA vGPU Driver:
- Download the .run file from NVIDIA’s site (e.g., NVIDIA-Linux-x86_64-390.42-grid.run)
- Copy it to the VM
3.3 Exit the X Window Server:
- Ubuntu: sudo service lightdm stop
- CentOS/RHEL: sudo init 3
3.4 Install the Driver:
chmod +x NVIDIA-Linux-x86_64-390.42-grid.run
sudo sh ./NVIDIA-Linux-x86_64-390.42-grid.run
Verify installation: nvidia-smi
3.5 Apply NVIDIA License
Edit the license config file: sudo nano /etc/nvidia/gridd.conf
Add:
ServerAddress=10.1.2.3
FeatureType=1
Restart the service: sudo service nvidia-gridd restart
Verify licensing: sudo grep grid /var/log/messages
Look for messages confirming license acquisition.
4. Installing and Testing CUDA Libraries
4.1 Using Containers (Recommended):
Use NVIDIA Docker and pre-built containers for CUDA and ML frameworks. This simplifies version management and avoids manual dependency resolution.
Refer to VMware’s guide:
“Enabling Machine Learning as a Service with GPU Acceleration”
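As a quick smoke test of the container route, a sketch (it assumes nvidia-docker2 is installed in the guest, and the image tag is illustrative; choose one that matches your driver and CUDA version):

```
# Run nvidia-smi inside a CUDA base image to confirm the container sees the GPU
sudo docker run --rm --runtime=nvidia nvidia/cuda:9.1-base nvidia-smi
```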
4.2 Manual Installation of CUDA Libraries
If containerization is not preferred, you can install the CUDA libraries and frameworks manually. Ensure compatibility with the vGPU driver and guest OS.
4.2.1 Download the CUDA Packages
Ensure compatibility between the CUDA version and the NVIDIA vGPU driver installed in your guest OS. For example, CUDA 9.1 is compatible with the 390 series of NVIDIA drivers.
Use wget or a browser to download the CUDA toolkit runfile and any patch packages appropriate for your OS and driver version.
⚠️ Always verify compatibility with your driver and OS version before downloading.
4.2.2 Install the CUDA Libraries
Run the installer: sudo sh cuda_9.1.85_387.26_linux.run
- Accept the EULA
- When prompted to install the driver, choose No
- Say Yes to installing libraries, symbolic links, and samples
4.3 Testing CUDA Functionality
Run Sample CUDA Programs
After installation, compile and run the sample programs:
- deviceQuery – Verifies GPU detection and basic functionality
- vectorAdd – Tests GPU compute capability
Successful execution confirms that the GPU is correctly configured and ready for ML workloads.
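If the samples were installed to the default location, a minimal build-and-run pass looks like this (paths assume the CUDA 9.1 defaults; adjust for your version):

```
# Build and run deviceQuery to verify GPU detection
cd ~/NVIDIA_CUDA-9.1_Samples/1_Utilities/deviceQuery
make && ./deviceQuery

# Build and run vectorAdd to exercise a simple kernel
cd ~/NVIDIA_CUDA-9.1_Samples/0_Simple/vectorAdd
make && ./vectorAdd
```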