Archive for March, 2008
VMware ESX 3.5 – Windows VM CPU spikes & stays high
What a day. I’ve been working an outage since 9 AM related to our VMware environment. I’m going to keep this short, not because it isn’t interesting, but because I want to get the relevant data out there in case someone else has this problem.
A good place to start is with VMware’s support article 1003638. On Friday evening, we updated 14 hosts in our ESX cluster from 3.0.2 to 3.5. After the update everything seemed fine: all of our apps tested out and performed normally. We had already upgraded to VirtualCenter 2.5 last December, so we didn’t have to touch anything there. It turns out that the combination of Windows guests, a DRS-enabled ESX cluster, VirtualCenter 2.5, and ESX 3.5 is where our problem came from.
In the configuration above, allowing DRS to move running virtual machines around was the root cause. DRS-initiated moves resulted in sustained high CPU utilization in the guest. The only way to solve it was to: 1) set DRS to manual mode (so it only recommends changes) and 2) restart any guest machines that are already experiencing the high CPU utilization. By leaving DRS enabled in manual mode, you can still watch its recommendations and initiate them yourself. Manually initiated moves don’t result in the CPU problem.
VMware says a patch is coming for VirtualCenter 2.5 that will fix this. The problem was identified in January, and support tells me it is now in the release notes for VirtualCenter (although I certainly don’t see it in there). Hope you have better luck. Post a comment if you run across this article and find it helpful in your job.
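If you have a lot of guests and need to work out which ones to restart, something like the following rough Python sketch might help. It is not a VMware tool and does not talk to VirtualCenter. It assumes you have already exported per-guest CPU samples by some other means, and the names (`GuestSamples`, `stuck_high`, `guests_to_restart`), the 90% threshold, and the five-sample window are all hypothetical choices for illustration. It only flags guests whose CPU has stayed high across the most recent samples, which matches the "spikes and stays high" symptom.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GuestSamples:
    """CPU history for one guest (hypothetical structure, oldest sample first)."""
    name: str
    cpu_percent: List[float]


def stuck_high(guest: GuestSamples, threshold: float = 90.0, min_samples: int = 5) -> bool:
    """Return True if the last `min_samples` readings are all at or above `threshold`.

    The threshold and window are assumptions; tune them for your environment.
    """
    recent = guest.cpu_percent[-min_samples:]
    # Require a full window so a guest with too little data is not flagged.
    return len(recent) >= min_samples and all(s >= threshold for s in recent)


def guests_to_restart(guests: List[GuestSamples],
                      threshold: float = 90.0,
                      min_samples: int = 5) -> List[str]:
    """List the names of guests whose CPU appears pinned, in input order."""
    return [g.name for g in guests if stuck_high(g, threshold, min_samples)]


if __name__ == "__main__":
    # Made-up sample data, purely for illustration.
    sample = [
        GuestSamples("web01", [20, 95, 97, 99, 98, 96]),  # spiked and stayed high
        GuestSamples("db01", [30, 95, 40, 35, 30, 25]),   # brief spike, recovered
    ]
    for name in guests_to_restart(sample):
        print("restart candidate:", name)
```

A guest with a short spike that recovers is left alone, so you only restart the machines that are actually stuck.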
