What started off as a good idea quickly snowballed into my home lab being jacked up for about 3 weeks. I often refer to these things as “life lessons” and unfortunately I experience them a lot. I normally just chalk it up to my ADD and move on !! Hopefully this blog post will help others who might run into this issue as well.
VMworld 2011 is/was fast approaching, and I was asked to work with Erin Banks (Rockstar RSA vSpecialist, @banksek, blog: http://www.commondenial.com) to help get vShield 5 added into our VMworld Hands on Lab. Specifically, we wanted to show off vShield’s new embedded RSA Data Loss Prevention tool. Since I’d never really messed with it, I jumped all over the chance to do it. My first thought was, “I guess I should install it just to see how everything works,” so I did that on my home lab.
vShield 5 installs from an OVF, which I LOVE !!! So I figured, “how hard can it be to set up, who needs the manual?” 🙂 (The install guide to the right is for 4.1; as soon as 5.0 is published I’ll update it with the proper link.) That’s where everything started to go south. I skimmed the first couple of pages of the install guide, mostly to see what the default username and password were, and then went about setting it up. It was pretty easy: once you have the OVF deployed and its IP set up, you do a lot of the management/install from the vShield Manager 5 web page. I decided that I would protect all 3 of my vSphere ESXi 5 home hosts and went about installing vShield 5 App and vShield 5 Endpoint on each system using the vShield Manager 5 web interface.
That part of the install is pretty much a “click and walk away” process.
The net-net is that it sets up firewalls, rules, and a new vSwitch with no physical NICs assigned to it so that the vShield pieces can communicate with each other.
What I failed to realize was that it’s a bad idea to protect the host that has vCenter installed. When it does that, it treats that VM just like every other VM and locks it down. When that happens, vCenter is not very happy and all hosts show inactive 😦
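The check I skipped can be sketched in a few lines of Python. This is only an illustration of the idea, not anything vShield provides: the host names, VM names, and the `split_hosts` helper are all hypothetical, and the inventory is a hand-written dictionary rather than something pulled from vCenter.

```python
from typing import Dict, List, Set, Tuple


def split_hosts(
    inventory: Dict[str, List[str]], management_vms: Set[str]
) -> Tuple[List[str], List[str]]:
    """Split hosts into (safe, risky) before protecting them.

    A host is "risky" if it currently runs any management VM
    (for example the vCenter VM); everything else is "safe".
    """
    safe: List[str] = []
    risky: List[str] = []
    for host, vms in sorted(inventory.items()):
        if management_vms.intersection(vms):
            risky.append(host)
        else:
            safe.append(host)
    return safe, risky


# Hypothetical home lab inventory: host name -> VMs running on it.
inventory = {
    "esxi-01": ["vcenter5", "web01"],
    "esxi-02": ["db01"],
    "esxi-03": [],
}

safe, risky = split_hosts(inventory, {"vcenter5"})
print("OK to protect:", safe)
print("Hosting a management VM, read the guide first:", risky)
```

Had I written even this much down before clicking install, the host running vCenter would have stood out right away.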
The first time I noticed something was wrong was when I tried to connect to vCenter 5 with the vSphere 5 Client. It timed out. So I tried to RDP into that server and it timed out too. Then I logged into each of the vSphere hosts to figure out which one had my vCenter on it. I opened up the console and everything looked fine. I thought maybe something had hung, so I bounced the server. I then opened up IE in the VM and noticed I couldn’t get out. I might not be the sharpest tool in the shed, but I figured out pretty quickly that I had done something wrong. Google confirmed it, and that’s when the life lesson began 🙂
The first thing I came across was fellow vSpecialist Scott Drummonds’ blog post over at vPivot.com about it: “vShield, vCenter, and Management Clusters”
That led me to the following KB article, which was the first thing I applied to my vCenter 5 .vmx file – http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1028151
That allowed my vCenter VM to surf the web and get around a bit, but I still had issues with vCenter seeing the 3 hosts. I spent a couple of days trying to figure out whether it was possible to brute-force remove vShield from the hosts directly, and whether I could uninstall it using vShield Manager. Unfortunately, it looked like neither the protected hosts nor the protected vShield 5 VM would give me the option to uninstall; vShield Manager would just show their status (the status was “installed”, see the “Host Information” screen shot above). I then remembered reading a recent blog post from Simon Seagrave called “How to enable & disable ESXi Host SSH via the vSphere Client in vSphere 5”, and it dawned on me that maybe the DVfilter and vShield entries could simply be “unchecked” in the firewall rules. So off I went and, sure enough, they were listed.
Once I unchecked those 2 boxes things started to get back to normal. I could now RDP into my vCenter VM, and when I launched the vShield 5 Manager I had the option to uninstall.
Now, I did run into some sort of issue on the 2nd host during the uninstall. I’m not sure if it was a bug or what, but it didn’t go as smoothly as the first one. Here is a rundown of the steps it takes to uninstall:
Step 5 (rebooting ESX host) was the issue. It would just hang. I thought that since it looked like everything was uninstalled and it was just doing a reboot, I could hard cycle it and call it good. When I did that, the host showed up in vShield Manager as still having vShield App and vShield Endpoint installed. So I ran through the process again. It went a little faster but it still hung on rebooting the ESXi host. This time I decided to see if anything stood out on the host’s screen, so I connected a monitor to my vSphere 5 ESXi host and the following error was splashed across the screen: “Error Loading /vmware-f.v00 Fatal Error: 33 (inconsistent data)”
It made no sense to me why that was happening. Google wasn’t really any help either (that or I was losing patience with googling the issue 🙂 ). I went back into the firewall settings and it looked like the check boxes were checked again, so I unchecked them and re-ran the uninstall and everything sailed through. I’m not sure if that really fixed the issue or not, but I decided not to press my luck and spend more time troubleshooting it. I was happy it was uninstalled !!
The net-net is RTFM !! It will save you a ton of heartburn. I also think this is a HUGE issue that VMware needs to address. More and more customers are virtualizing vCenter and a sentence in the install guide warning you against doing this is probably not enough. Mike Foley with RSA Security (And honorary vSpecialist) suggested in an e-mail exchange helping me with this blog post that vShield install should:
1) detect the vCenter VM
2) automatically create a firewall ruleset that doesn’t “hose you” (That’s a Foley Technical Term)
3) using the ruleset from 2, install with “Allow” set on everything by default.
I couldn’t agree more with him. If nothing more, it should flash a huge warning during the install that says “If you run vCenter as a VM, please read the install guide before proceeding” and then require a checkbox to be ticked or an accept button to be pushed before moving on !! Finally, this should be a great reminder that VMware’s guidance is to not put management VMs on production ESX hosts (thanks for the reminder, Mr Drummonds). I hate when I help validate these types of rules the wrong way !!
So allow me to be the example others follow on what NOT to do when installing vShield 5 for vSphere 5. R T F M !!!!
@vTexan… Rob Randell here from VMware. This is a great post, and this is something that we are looking at addressing, as opposed to it just being in the documentation. 😉 I just wanted to clarify one thing about the proposed solution from Mr. Foley above to have a default set of rules for vCenter that allow all traffic to vCenter. While I think that will be important and something that is worth doing, it won’t address the issue that you ran into.
Note that the default vShield App rule is Allow All when you install it. This is so you can install it and not “completely hose yourself” on the install. 😉 So detecting the vCenter and creating the allows for it would prevent you from hosing yourself if/when you changed the default allow to a default deny.
The issue with vCenter being inside the cluster it is managing and using vShield App is different though. The issue is not with the firewall blocking anything. It is the fact that when you configure vShield App, a VMX entry needs to be made for every VM that is on a particular host with a particular vShield App appliance. In short this VMX entry allows the vShield to inspect and pass the traffic for the VMs that are on the host it is protecting. Without these entries a VM cannot communicate at all on the network.
What is happening here is that the vShield Manager is sending a call to the vCenter Server to add these entries into the VMX’s of all of the VMs. What is happening when vCenter is on the cluster it is managing is that it attempts to make the VMX changes to itself. Essentially trying to do surgery on itself. This can work sometimes, but other times if there is even the slightest hiccup, it may fail to put the entries in and cause the vCenter server to not be able to complete the operation. Hence it cuts itself off from the network.
I could go much deeper here, but wanted to get a quick clarification out on this. So the best practice here is to have vCenter virtualized on a management cluster that another vCenter is managing. Kind of a cross managed environment. This will allow you to use vShield App on both sides of the management without the vCenter trying to do surgery on itself.
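To make the VMX entries Rob describes a bit more concrete, here is a minimal Python sketch that reads the text of a .vmx file and reports which virtual NICs have a filter entry and which do not. The `ethernetN.filterM.` key pattern and the sample values are assumptions for illustration only; the KB article linked in the post is the place to get the real entries, and the helper names are made up.

```python
import re
from typing import Dict, List

# Assumed key shapes, e.g. ethernet0.present / ethernet0.filter0.name
NIC_KEY = re.compile(r"^(ethernet\d+)\.", re.IGNORECASE)
FILTER_KEY = re.compile(r"^(ethernet\d+)\.filter\d+\.", re.IGNORECASE)


def parse_vmx(text: str) -> Dict[str, str]:
    """Parse 'key = "value"' lines into a dict, skipping blanks and # lines."""
    entries: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        entries[key.strip()] = value.strip().strip('"')
    return entries


def nics_with_filters(entries: Dict[str, str]) -> List[str]:
    """Names of NICs that have at least one filter entry."""
    found = set()
    for key in entries:
        match = FILTER_KEY.match(key)
        if match:
            found.add(match.group(1).lower())
    return sorted(found)


def nics_without_filters(entries: Dict[str, str]) -> List[str]:
    """Names of NICs that appear in the file but have no filter entry."""
    all_nics = set()
    for key in entries:
        match = NIC_KEY.match(key)
        if match:
            all_nics.add(match.group(1).lower())
    return sorted(all_nics - set(nics_with_filters(entries)))


# Made-up sample content, not taken from a real vCenter VM.
sample = """
.encoding = "UTF-8"
ethernet0.present = "TRUE"
ethernet0.filter0.name = "example-filter"
ethernet1.present = "TRUE"
"""

entries = parse_vmx(sample)
print("NICs with a filter entry:", nics_with_filters(entries))
print("NICs without a filter entry:", nics_without_filters(entries))
```

On a protected host, a NIC that lands in the second list would be the kind of VM Rob says cannot communicate on the network at all.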
Hey Rob – thank you very much for the comment and information. It’s greatly appreciated !! I look forward to trying this again in the future 🙂
Tommyt
So is the issue the same with vSphere 5? After seeing this page I scanned through the manuals and saw no reference to the virtualized vCenter problem. I did see a yellow note on the screen during the install saying not to have the management or VC on the host, as it could cause some minor network issues.
Bob
Hey Bob – I was/am on vSphere 5 when I ran into this issue. Rob’s 2 paragraphs above, starting with “The issue with vCenter being inside the cluster”, are probably right in line with what you are looking for.
I think the net-net is to segment your management VMs into their own cluster/network.
My understanding and hope is VMware will be adding more information into their install guides to cut down on these possible issues.
Tommyt
Happened to me too…
Thanks for the comment !! I’m glad I’m not the only one this happened to !!