
ESXi Host Disconnected from vCenter? Quick Fix Guide

Dealing with an ESXi host disconnected from vCenter is one of the most frustrating issues we face as virtualization administrators. When vCenter shows your host as “not responding” or in a “disconnected” state, it can trigger immediate concern, even though the underlying VMs might still be running normally. This disconnection typically happens when the hostd or vpxa agent becomes unresponsive, which can stem from network, agent, or storage-level problems.

Despite the alarming alerts, it’s important to understand that an ESXi host not responding but VMs still running is a common scenario. This happens because ESXi hosts normally send heartbeats every 10 seconds, and vCenter Server has a 60-second window to receive these signals. When this communication breaks down, vCenter loses its management capabilities while the virtual machines continue operating. In this comprehensive guide, we’ll walk through the systematic troubleshooting steps to quickly restore connectivity between your ESXi host and vCenter Server.

Check the Basics First

Before diving into complex troubleshooting, let’s start with the fundamental checks that often resolve an ESXi host disconnected issue. These basic steps frequently identify the root cause without requiring advanced interventions.

Verify host is powered on and reachable

First, confirm your ESXi host is physically powered on and operational. This might seem obvious, but many administrators overlook it. Access your host’s remote management interface (iLO, iDRAC, or a remote console) to verify its power state. If you encounter a Purple Screen of Death (PSOD), address that issue first before attempting to reconnect the host to vCenter.
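
If the host does respond over SSH, a quick sanity check confirms the hypervisor itself is up and shows how long it has been running. This is a minimal example, assuming SSH is already enabled on the host:

uptime
esxcli system stats uptime get    # host uptime, reported in microseconds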

Check if ESXi host is pingable from vCenter

Subsequently, test basic network connectivity between your vCenter and the disconnected ESXi host. Open a command prompt on your vCenter Server and ping the ESXi host’s management IP address:

ping <esxi_host_ip>

For ESXi-to-ESXi connectivity testing, use the vmkping command which allows you to specify which vmkernel interface to use:

vmkping -I vmk0 <destination_ip>

A successful ping shows replies with minimal packet loss, whereas failed pings will display “Request timed out” or “100% packet loss” messages.

Confirm DNS resolution for both host and vCenter

Furthermore, DNS resolution issues frequently cause ESXi host disconnection problems. Both forward and reverse DNS lookups must function correctly. All ESXi hosts must be able to resolve each other by:

  • IP address
  • Short name
  • Fully Qualified Domain Name (FQDN)

Test DNS resolution using nslookup:

nslookup vcenter.domain.local
nslookup esxi_hostname.domain.local

If you see errors like “Host not found” or “No such host is known,” you’re facing DNS resolution failures that must be addressed.
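
It is also worth confirming reverse lookups and the DNS configuration on the ESXi host itself. A quick check, assuming SSH access to the host:

nslookup <esxi_host_ip>              # reverse lookup of the management IP
esxcli network ip dns server list    # DNS servers configured on the host
esxcli network ip dns search list    # search domains configured on the host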

Look for ‘ESXi host not responding but VMs still running’ symptoms

Finally, when a host disconnects from vCenter, the virtual machines on it often continue running normally. This happens because the management agents (hostd/vpxa) have failed while the VM runtime processes remain operational. You can verify VM power states directly from the ESXi host using:

vim-cmd vmsvc/getallvms
vim-cmd vmsvc/power.getstate <Vmid>

When troubleshooting this situation, remember that vCenter treats disconnected hosts conservatively for HA purposes: if the host subsequently fails, HA assumes the VMs on it need to be failed over to other hosts.

Network and Port Connectivity

Network connectivity issues often underlie ESXi host disconnection problems. Let’s dive deeper into specific network tests that can pinpoint exactly where communication breaks down.

Test management network with vmkping

The vmkping command allows you to test connectivity from specific vmkernel interfaces, making it perfect for troubleshooting management network issues. To use it effectively:

vmkping -I vmk0 <vcenter_ip>

To test jumbo frames with the “don’t fragment” flag, use:

vmkping -d -s 8972 <destination_ip>

The successful response will show packet statistics with 0% loss, indicating a healthy connection. Conversely, 100% packet loss suggests a network path problem.

Check port 902 and 443 between host and vCenter

Two critical ports must remain open for ESXi-vCenter communication:

  • Port 902: Used for heartbeat monitoring and VMware Authorization Daemon (vmware-authd)
  • Port 443: Used for HTTPS communication via the Tomcat web service

From vCenter, test these ports using:

curl -v telnet://<ESXiHostIP>:902
curl -v telnet://<ESXiHostIP>:443

From ESXi, verify with:

nc -zv <vCenterFQDN> 443
nc -zuv <vCenterFQDN> 902

Use netcat (nc) to verify port-level access

Netcat acts as a built-in port scanner on ESXi hosts, offering layer-4 connectivity verification. The basic syntax is:

nc -z <target_ip> <tcp_port>

For successful connections, you’ll see: “Connection to port [tcp/*] succeeded!”

Additionally, specify source interface with:

nc -z -s <source_vmkernel_ip> <target_ip> <tcp_port>
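
If a port tests closed even though the service is running on the other side, it is worth checking whether the ESXi firewall is getting in the way. A quick check, assuming the default ruleset names (which can vary by release):

esxcli network firewall get                                        # firewall enabled state and default policy
esxcli network firewall ruleset list | grep -iE "vpx|heartbeat"    # rulesets covering vCenter traffic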

Check for MTU mismatches or LACP misconfigurations

MTU mismatches between vCenter and ESXi hosts frequently cause connection problems. Test with:

ping -M do -s 1472 <ESXi_Host_IP>   (from vCenter)
ping -d -s 1472 <vCenter_IP>        (from ESXi host)

Gradually decrease packet size until successful pings occur to identify the maximum working MTU.
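
Before changing anything, confirm what MTU is actually configured on each side. On the ESXi host, for example (interface and switch names will differ in your environment):

esxcli network ip interface list        # MTU configured on each vmkernel interface
esxcli network vswitch standard list    # MTU configured on each standard vSwitch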

Regarding LACP (Link Aggregation Control Protocol), misconfigured LAGs can cause hosts to become unreachable when the Link Aggregation Group goes down. This especially happens during host reboots when ports get suspended with errors like “%EC-5-L3DONTBNDL2: Port suspended: LACP currently not enabled.”
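
If the host uplinks use LACP on a distributed switch, you can check negotiation state from the host itself. A rough check, assuming a vSphere Distributed Switch with a LAG configured:

esxcli network vswitch dvs vmware lacp status get    # LACP negotiation state for LAG uplinks
esxcli network nic list                              # link state of the physical NICs backing the LAG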

Agent and Service Health

When basic connectivity checks pass but your ESXi host remains disconnected, agent and service issues are likely the culprit. The hostd and vpxa services form the backbone of ESXi-vCenter communication, and their failure frequently causes disconnection problems.

Restart hostd and vpxa agents on ESXi

I’ve found that restarting management agents resolves most disconnection issues quickly. You can accomplish this through two methods:

Using SSH (if enabled):

/etc/init.d/hostd restart
/etc/init.d/vpxa restart

Via Direct Console User Interface (DCUI):

  1. Press F2 and log in as root
  2. Navigate to Troubleshooting Options
  3. Select Restart Management Agents
  4. Press F11 to confirm

Important: If LACP is configured on the vSAN network, do not restart management agents on ESXi hosts running vSAN. Likewise, avoid using the services.sh command with NSX or shared graphics environments.
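
Before and after restarting, it can help to confirm the agents are actually running. A quick status check over SSH:

/etc/init.d/hostd status
/etc/init.d/vpxa status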

Check if hostd is unresponsive via DCUI

An unresponsive hostd often manifests as:

  • No key input accepted on DCUI
  • ESXi host shows “Not Responding” in vCenter
  • Unable to connect via SSH, yet host responds to pings

In essence, if DCUI navigation presents long delays or you can’t switch to the shell interface, hostd may have become unresponsive due to high system latency.
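
If an SSH session is still usable, you can check whether hostd is present and whether it still answers API calls. This is a rough check rather than a definitive test; a vim-cmd call that hangs or errors out strongly suggests hostd is stuck:

ps | grep hostd               # is the hostd process present?
vim-cmd hostsvc/hostsummary   # hangs or fails if hostd is unresponsive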

Review vpxd.log and vobd.log for errors

For troubleshooting, examine these key logs:

  • vpxd.log: Located at /var/log/vmware/vpxd/vpxd.log on vCenter Server, contains information about vCenter operations
  • hostd.log: Found at /var/log/hostd.log on ESXi, records VM operations and host events
  • vobd.log: Located at /var/log/vobd.log on ESXi, records VMkernel Observation (VOB) events covering storage, iSCSI, and network conditions
  • vpxa.log: Located at /var/log/vpxa.log on ESXi, records interactions between host and vCenter
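
A quick way to scan the ESXi-side logs for recent problems, adjusting the line count and search terms to your situation:

tail -n 100 /var/log/hostd.log | grep -i error
tail -n 100 /var/log/vpxa.log | grep -iE "error|failed"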

Ensure ESXi version is compatible with vCenter

As a rule, your vCenter Server version should be equal to or newer than your ESXi hosts. Keep in mind that ESXi hosts running a newer version than vCenter may not process agent updates properly, sometimes resulting in vSAN network partitions.

Before mixing versions, confirm support in the VMware Product Interoperability Matrix; for example, vCenter Server 8.0 can only manage ESXi 7.0 and later hosts. Older hosts managed by newer vCenter versions frequently experience agent communication issues.
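
To confirm what you are actually running, check the ESXi build from the host shell and compare it against the vCenter build shown in the vSphere Client or the appliance management interface (VAMI):

vmware -vl    # ESXi version and build number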

Storage and System-Level Issues

Storage-related problems frequently cause ESXi host disconnections from vCenter, yet they’re often overlooked in initial troubleshooting. These deeper system issues require methodical investigation to resolve.

Check for APD/PDL or datastore latency

All-Paths-Down (APD) or Permanent Device Loss (PDL) conditions commonly trigger host disconnections. In APD scenarios, storage becomes inaccessible temporarily, causing the ESXi host to continuously retry I/O operations. This can tie up management agent threads, making hostd unresponsive. For PDL events, check vmkernel logs for SCSI sense code “0x5 0x25 0x0” indicating the device is permanently unavailable.

Storage latency spikes often precede disconnection events. Monitor for warning messages like: “Device performance has deteriorated. I/O latency increased from average value of X microseconds to Y microseconds”.
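
To look for these conditions from the host, you can grep the vmkernel log for APD events, the PDL sense code, and the latency warning quoted above, then review path states. A rough sketch; exact log wording varies between ESXi releases:

grep -i "apd" /var/log/vmkernel.log                             # All-Paths-Down events
grep "0x5 0x25 0x0" /var/log/vmkernel.log                       # PDL sense code mentioned above
grep -i "performance has deteriorated" /var/log/vmkernel.log    # latency warnings
esxcli storage core path list | grep -i state                   # per-path state (active, dead, and so on)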

Look for Purple Diagnostic Screens (PSOD)

Purple Diagnostic Screens indicate kernel-level failures requiring immediate attention. Common PSOD causes include hardware failures, particularly RAM or CPU issues, which typically surface with “MCE” (Machine Check Exception) or “NMI” (Non-Maskable Interrupt) error codes. After a PSOD, the host abruptly terminates all services and VMs.

Verify no duplicate IPs in the environment

Duplicate IP addresses frequently cause intermittent connectivity issues. Check vobd.log for entries containing “esx.problem.net.vmknic.ip.duplicate”. Create vCenter alarms to detect these conditions automatically.
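
A quick host-side check for these events:

grep -i "duplicate" /var/log/vobd.log    # flags esx.problem.net.vmknic.ip.duplicate entries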

Ensure vCenter Managed IP is correctly set

After IP address changes, hosts may disconnect because they’re still trying to communicate with vCenter’s old IP address. Verify the vCenter managed IP settings via vCenter’s Advanced Settings, specifically checking the “VirtualCenter.AutoManagedIPV4” value. Trailing whitespace in these fields can also cause connectivity failures.
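
On the ESXi side, the vCenter address that the vpxa agent reports to is recorded in its configuration file, so you can confirm the host is still pointing at the current vCenter IP. This is a read-only check; avoid editing the file by hand:

grep -i "serverIp" /etc/vmware/vpxa/vpxa.cfg    # vCenter IP that this host's vpxa agent uses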

Conclusion

Dealing with ESXi host disconnection issues certainly tests our patience as virtualization administrators. Nevertheless, a methodical approach significantly increases our chances of quickly resolving these frustrating problems. We’ve seen how a systematic troubleshooting process starts with basic connectivity checks, then progressively moves through network, agent, and storage-level investigations.

Remember that most disconnection issues stem from either management agent failures or network connectivity problems. Therefore, restarting hostd and vpxa services often provides the quickest resolution. Additionally, thorough network testing, especially focusing on ports 443 and 902, helps identify communication bottlenecks between your ESXi hosts and vCenter.

Perhaps the most important takeaway centers on understanding that “not responding” hosts frequently continue running VMs without interruption. This happens because the control plane (management agents) operates separately from the data plane (VM execution). During troubleshooting, this knowledge helps us maintain perspective and avoid unnecessary panic.

Storage-related issues, though less obvious, undoubtedly cause many disconnection scenarios. APD/PDL conditions or excessive latency warrant close examination, particularly when connectivity problems persist after basic troubleshooting steps.

DNS resolution, MTU mismatches, and duplicate IP addresses round out the common culprits we’ve explored. Establishing monitoring for these potential failure points helps prevent future disconnection events.

The next time vCenter shows your hosts as disconnected, you can now approach the problem confidently with this systematic troubleshooting framework. After all, most disconnection issues resolve quickly once you identify the specific underlying cause.

Key Takeaways

When your ESXi host disconnects from vCenter, these systematic troubleshooting steps will help you quickly restore connectivity and maintain your virtualized environment.

• Start with basics first: Verify host power, ping connectivity, and DNS resolution before diving into complex troubleshooting steps.

• Restart management agents: Use /etc/init.d/hostd restart and /etc/init.d/vpxa restart via SSH or DCUI to resolve most disconnection issues quickly.

• Test critical network ports: Ensure ports 443 and 902 are open between ESXi and vCenter using netcat or curl commands for proper communication.

• Check storage health: Monitor for APD/PDL conditions and storage latency spikes that can cause hostd to become unresponsive.

• Remember VMs keep running: ESXi host disconnection affects the management plane only; virtual machines typically continue operating normally during these events.

Most ESXi disconnection issues resolve through agent restarts or network connectivity fixes. The key is following a methodical approach rather than jumping to complex solutions immediately.

FAQs

Q1. What should I do if my ESXi host disconnects from vCenter but VMs are still running? First, check basic connectivity by verifying the host is powered on and pingable. Then, restart the management agents (hostd and vpxa) on the ESXi host. If issues persist, investigate network connectivity, particularly ports 443 and 902, between the host and vCenter.

Q2. How can I restart management agents on an ESXi host? You can restart management agents via SSH (if enabled) using the commands /etc/init.d/hostd restart and /etc/init.d/vpxa restart. Alternatively, use the Direct Console User Interface (DCUI) by selecting “Restart Management Agents” under Troubleshooting Options.

Q3. Why might an ESXi host become unresponsive but still run VMs? This occurs because the management agents (hostd/vpxa) have failed while the VM runtime processes remain operational. The disconnection affects the management plane (vCenter’s ability to manage the host) but not the data plane where VMs continue to run.

Q4. What network ports are crucial for ESXi-vCenter communication? Two critical ports must remain open: Port 902 for heartbeat monitoring and VMware Authorization Daemon, and Port 443 for HTTPS communication via the Tomcat web service. You can test these using netcat (nc) or curl commands.

Q5. How do storage issues contribute to ESXi host disconnections? Storage problems like All-Paths-Down (APD) or Permanent Device Loss (PDL) conditions can cause host disconnections. These issues can tie up management agent threads, making hostd unresponsive. Additionally, high storage latency can precede disconnection events. Check vmkernel logs for related error messages.
