Migrate VM from Hyper-V to vSphere with Pre-Installed VMware Tools

Note: This post is written specifically for VMware Tools 10. If you’re looking for a procedure that works with VMware Tools 11 or VMware Tools 12, you can see my latest blog post here.

One of the things I rarely get to work with is Hyper-V; however, I’m starting to get more exposure to it as I encounter more organizations that are either running all Hyper-V or doing some type of migration between Hyper-V and vSphere.

One of the biggest challenges I’ve heard about, and encountered in my own testing, is drivers. If you’re making the move from Hyper-V to vSphere, you’re going to have to figure out how to migrate your network settings along with the virtual machines, whether manually or in a more automated way.

And yes! You can definitely use Zerto as the migration vehicle and take advantage of benefits like:

  • Non-disruptive replication
  • Automatic conversion of .vhdx to .vmdk (and vice versa)
  • Non-disruptive testing before migrating
  • Boot Order
  • Re-IP

For re-IP operations, Zerto requires that VMware Tools is installed and running on the VMs you want to protect.

Zerto Administration Guide for vSphere

There are two ways to accomplish a cross-hypervisor migration or failover with Zerto.

Installing VMware Tools is required either way, but if you install it before migrating or protecting, you are going to get much better results.

Installing VMware Tools after the fact prevents the capability to automatically re-IP or even keep the existing network settings; you will end up having to hand-IP every VM you migrate or fail over, which seriously cuts into any established recovery time objective (RTO) and leaves more room for human error.

Overview

We will walk through what you need to do in order to get VMware Tools prepared for installation on a Hyper-V virtual machine. There is a video at the end of this post that demonstrates a successful pre-installation of VMware Tools, replication, and migration of a VM from Hyper-V.

At the time of this writing, the versions of Zerto, Hyper-V, and vSphere I used to perform the steps that follow are:

  • Zerto 8.0
  • Hyper-V 2016
  • vSphere 6.7 (VMware Tools from 6.7 as well)

I also wanted to give a shout out to Justin Paul, who had written a similar blog post about this same subject back in 2018. You can find his original post here: https://bit.ly/3dfWKdm

Pre-Requisites

Like a recipe, you’re going to need a few things:

VMware Tools

You will need to obtain a copy of the VMware Tools, and it must be a version supported by your version of vSphere. You can use this handy VMware version mapping file to see what version of the tools you’d need.

You can get the tools package by mounting the VMware Tools ISO to any virtual machine in your vSphere environment, browsing the virtual CD-ROM, and copying all the files to your desktop. If you don’t have an environment available, you can also download the installer straight from VMware (requires a My VMware account).

Since you only need a few files from the installer package, start the installer on your desktop and wait for the welcome screen to load. Once that screen loads, if you’re on a physical machine (laptop, PC, etc…), you’re going to get a pop-up stating that you can only install VMware Tools inside a virtual machine. DO NOT dismiss this pop-up just yet.

  1. Go to Start > Run, type in %TEMP%, then press Enter.
  2. Look for a folder named with a GUID followed by “-setup” (for example, {VVVVVVVV-WWWW-XXXX-YYYY-ZZZZZZZZZZZZ}-setup), and open it.
  3. Copy the following 3 files from that folder to a folder on your desktop: vcredist_x64.exe, vcredist_x86.exe, and VMware Tools64.msi

    3 Required Files to Copy
  4. Once you’ve saved the files somewhere else, you can now dismiss the popup and exit the VMware Tools installer.
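If you’d rather script this step, here is a minimal PowerShell sketch that finds the “-setup” folder under %TEMP% and copies the three files out while the installer’s welcome screen (and its pop-up) is still open. The destination folder is just an example, so adjust it for your environment.

    # Minimal sketch: copy the three extracted installer files out of %TEMP%.
    # Run this while the VMware Tools welcome screen is still open.
    $setupDir = Get-ChildItem -Path $env:TEMP -Directory |
        Where-Object { $_.Name -like '{*}-setup' } |
        Sort-Object LastWriteTime -Descending |
        Select-Object -First 1

    # Example destination -- change this to wherever you want to stage the files.
    $dest = "$env:USERPROFILE\Desktop\VMwareTools-Extracted"
    New-Item -Path $dest -ItemType Directory -Force | Out-Null

    foreach ($file in 'vcredist_x64.exe', 'vcredist_x86.exe', 'VMware Tools64.msi') {
        Copy-Item -Path (Join-Path $setupDir.FullName $file) -Destination $dest
    }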

Microsoft Orca

Microsoft Orca is a database table editor that can be used for creating and editing Windows installer packages. We’re going to be using it to update the VMware Tools MSI file we just extracted in the previous steps, to allow it to be installed within a Hyper-V virtual machine.

Orca is part of the Windows SDK, which can be downloaded from Microsoft (https://bit.ly/3d7aWoZ). Download the installer, not the ISO (it’s easier to get exactly what you want this way).

Run the installer and when you get to the screen where you’ll need to Select the features you want to install, select only MSI tools and complete the installation.

After installation completes, you can search your Start menu for “orca” or browse to where it was installed and launch Orca.

Edit VMware Tools MSI with Orca

Now that we’ve got the necessary files and Orca installed, we need to edit the VMware Tools MSI to remove an installer pre-check that prevents installation on any platform other than vSphere.

  1. Launch Orca
  2. Click Open, and browse to where you saved VMware Tools64.msi, select it, and click Open.

    Launch Orca and Open VMware Tools MSI
  3. In the left window pane labeled Tables, scroll down and click on InstallUISequence.
  4. In the right window pane, look for the line that says VM_CheckRequirements. Right-click on this entry, and select Drop Row.

    InstallUISequence > VM_CheckRequirements > Drop Row
  5. Click Save on the toolbar, and close the MSI file. You can also exit Orca now.
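If you need to repeat this change or prefer to script it, the same row can be dropped with the Windows Installer COM automation interface instead of Orca. Here is a minimal PowerShell sketch of that approach; the MSI path is an example and assumes you’ve already copied VMware Tools64.msi out of the installer as described earlier.

    # Minimal sketch: drop the VM_CheckRequirements row from InstallUISequence without Orca.
    $msiPath = 'C:\Temp\VMware Tools64.msi'   # example path -- point this at your copy

    $installer = New-Object -ComObject WindowsInstaller.Installer
    # 1 = msiOpenDatabaseModeTransact (open the database read/write)
    $database = $installer.GetType().InvokeMember('OpenDatabase', 'InvokeMethod', $null, $installer, @($msiPath, 1))

    $query = "DELETE FROM InstallUISequence WHERE Action = 'VM_CheckRequirements'"
    $view = $database.GetType().InvokeMember('OpenView', 'InvokeMethod', $null, $database, @($query))
    $view.GetType().InvokeMember('Execute', 'InvokeMethod', $null, $view, $null)

    # Commit the change and release the handles.
    $database.GetType().InvokeMember('Commit', 'InvokeMethod', $null, $database, $null)
    $view.GetType().InvokeMember('Close', 'InvokeMethod', $null, $view, $null)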

What next?

I’ve made you read all the way down to here to tell you that if you want to skip the previous steps and are looking to do this for vSphere 6.7, I have a copy of the MSI that is ready for installation on a Hyper-V virtual machine. If you need it, send me a message on Twitter: @eugenejtorres

Now that you’ve got an unrestricted copy of the VMware Tools MSI package, copy it along with the vcredist (x86/x64) installers to your target Hyper-V VMs (or a network share they can all reach), and start installing.

Important: When installing VMware Tools on the Hyper-V virtual machine, you may get the following error:

If you receive the error above, it means you’re missing Microsoft Visual C++ 2017 Redistributable (x64) on that VM.

If this is the case, click Cancel and exit the VMware Tools installer. Run the vcredist_x64.exe installer that you copied earlier, and then retry the VMware Tools installer.
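If you’re rolling this out to more than a couple of VMs, you can also run both installers silently and in the correct order. The sketch below uses the standard switches for the Visual C++ redistributable and msiexec; the paths are examples, so point them at wherever you copied the files.

    # Minimal sketch: install the VC++ redistributable first, then the modified VMware Tools MSI.
    Start-Process -FilePath 'C:\Temp\vcredist_x64.exe' -ArgumentList '/install /quiet /norestart' -Wait

    Start-Process -FilePath 'msiexec.exe' `
        -ArgumentList '/i "C:\Temp\VMware Tools64.msi" /qn REBOOT=ReallySuppress' -Wait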

Demo

Since you’ve gotten this far, the next step is to test and validate the procedure. Take a look at the video below to see what migration via Zerto looks like after you’ve taken the steps above.

If you have any questions or found this helpful, please comment. If you know someone that needs to see this, please share and socialize! Thanks for reading!


How To: Migrate Windows Server 2003 to Azure via Zerto, Easily

Since Microsoft officially ended extended support for Windows Server 2003 on July 14, 2015, you may no longer be able to get support or any software updates. While many enterprises are working towards migrating applications to more current versions of Windows, alongside initiatives to adopt more cloud services, migrating the deprecated OS to Azure is an option that enables that strategy and provides a place for those applications to run in the meantime.

Be aware though that although Microsoft support (read this) may be able to help you troubleshoot running Windows Server 2003 in Azure, that doesn’t necessarily mean they will support the OS. That said, if you are running vSphere on-premises and still wish to get these legacy systems out of your data center and into Azure, keep reading and I’ll show you how to do it with Zerto.

Please note that I’ve only tested this with the 64-bit version of the OS (Windows Server 2003 R2). (EDIT: this has also been verified to work on the 32-bit version of the OS – thanks, Frank!)

The Other Options…

While the next options are totally doable, think about the amount of time involved, especially if you have to migrate VMs at scale. Once you’re done taking a look at these procedures, head to the next section. Trust me, it can be done more easily and efficiently.

  • Migrate your VMs from VMware to Hyper-V
    • … Then migrate them to Azure. Yes, it’s an option, but from what I’ve read, it’s really just so you can get the Hyper-V Integration Services onto the VM before you move it to Azure. From there, you’ll need to manually upload the VHDs to Azure using the command line, followed by creating instances and attaching the disks to them. Wait – there’s got to be a better way, right?
  • Why migrate when you can just do all the work from vSphere, run a bunch of PowerShell code, hack the registry, convert the disk to VHD, upload, etc… and then rinse and repeat for tens or hundreds of servers?
    • While this is another way to do it, take a look at the procedure and let me know if you would want to go through all that for even JUST ONE VM?!
  • Nested Virtualization in Azure
    • Here’s another way to do it, which I can see working; however, you’re talking about nesting a virtual environment in the cloud and perhaps running production that way. Even if you have Zerto and can technically do this, a lot of consideration would have to go into it… and likely headache.

Before You Start

Before you start walking through the steps below, this how-to assumes:

  1. You are running the latest version of Zerto at each site.
  2. You have already paired your Azure ZCA (Zerto Cloud Appliance) to your on-premises ZVM (Zerto Virtual Manager)
  3. You already know how to create a VPG in Zerto to replicate the workload(s) to your Azure subscription.

Understand that while this may work, this solution is not supported by Zerto. This how-to was written solely by me, and I have tested it and found it to work; it’s up to you to test it for yourself.

Additionally, this is likely not going to get any support from Microsoft, so you should test this procedure on your own and get familiar with it.

This does require you to download files to install (if you don’t have a Hyper-V environment), so although I have provided a download link below, you are responsible for ensuring that you are following security policies, best practices, and requirements whenever downloading files from the internet. Please do the right thing and be sure to scan any files you download that don’t come directly from the manufacturer.

Finally – yeah, you should really test it to make sure it works for you.

Migrating Legacy OS Using Zerto

Alright, you’ve made it this far, and now you want to know how I ended up getting a Windows Server 2003 R2 VM from vSphere to Azure with a few simple steps.

Step 1: Prepare the VM(s)

First of all, you will need to download the Hyper-V Integration Services (think of them as VMware Tools, but for Hyper-V; they contain the proper drivers for the VM to function in Azure).

I highly suggest you obtain the file directly from Microsoft if at all possible, or from a trustworthy source. At the least, deploy a Hyper-V server and extract the installer from it yourself.

If you have no way to get the installer files for the Hyper-V Integration Services, you can download at your own risk from here. It is the exact same copy I used in my testing, and will work with Windows Server 2003 R2.

  1. Obtain the Hyper-V Integration Services ISO file. (hint: look above)
  2. Once downloaded, you can mount the ISO to the target VM and explore the contents. (don’t run it, because it will not allow you to run the tools installation on a VMware-hosted workload).
  3. Extract the Support folder and all of its contents to the root of C: or somewhere easily accessible.
  4. Create a Windows batch file (.bat) in the Support folder that you have just extracted to your VM. I put the folder in the root of C:, so just be aware that I am working with the C:\Support folder on my system.
  5. For the contents of the batch file, change directory to the C:\Support\amd64 folder (use the x86 folder if on 32-bit), then on the next line type: setup.exe /quiet (see the example after this list). The /quiet switch is very important, because you will need this to run without any intervention.

    Example of batch file contents and folder path
  6. Save the batch file.
  7. On the same VM, go to Control Panel > Scheduled Tasks > Add Scheduled Task. Doing so will open the Scheduled Task Wizard.

    Create a scheduled task
  8. Click Next
  9. Click Browse, locate the batch file you created in steps 5–6, and click Open

    Browse to the batch file
  10. Select When my computer starts, and click Next

    Select when my computer starts
  11. Enter local administrator credentials (required because you will not initially have network connectivity), and click Next

    enter admin credentials
  12. Click Finish
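For reference, the batch file described in steps 4–6 is only two lines; mine looked like the following (adjust the path if you extracted the Support folder somewhere other than the root of C:):

    cd C:\Support\amd64
    setup.exe /quiet

If you’d rather not click through the Scheduled Task Wizard in steps 7–12, the same task can be created from the command line with schtasks; the task name, batch file name, and credentials below are just placeholders:

    schtasks /create /tn "InstallIntegrationServices" /tr "C:\Support\install-is.bat" /sc onstart /ru Administrator /rp YourPasswordHere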

Step 2: Create a VPG in Zerto

The previous steps have your system prepared to start replicating to Azure. What we just did basically allows the Hyper-V Integration Services to install on the Azure instance upon boot, enabling network access so you can manage it. It’s that simple.

Create the VPG (Virtual Protection Group) in Zerto that contains the Windows Server 2003 R2 VM(s) that you’ve prepped, and for your replication target, select your Microsoft Azure site.

If you need to learn how to create a VPG in Zerto, please refer to the vSphere Administration Guide – Zerto Virtual Manager documentation.

Step 3: Run a Failover Test for the VPG

Once your VPG is in a “Meeting SLA” state, you’re ready to start testing in Azure before you actually execute the migration, to ensure that the VM(s) will boot and be available.

Using the Zerto Failover Test operation allows you to keep the systems running back on-premises while booting them up in Azure for testing, so you get your results before you actually perform the Move operation to migrate them to their new home.

  1. In Zerto, select the VPG that contains the VM(s) you want to test in Azure (use the checkbox) and click the Test button.

    Select VPG, click Test
  2. Validate the VPG is still selected, and click Next.

    Validate VPG, click Next
  3. The latest checkpoint should already be selected for you. Click Next

    Verify Checkpoint, click Next
  4. Click Start Failover Test.

    Start Failover Test

After you click Start Failover Test, the testing operation will start. Once the VM is up in Azure, you can try pinging it. If it doesn’t ping the first time, reboot it, as the Integration Services may require a reboot before you can RDP to it (I had to reboot my test machine).

When you’re done testing, click the stop button in Zerto to stop the Failover Test, and wait for it to complete. At this point, if everything looks good, you’re ready to plan your migration.

If you did anything different than what I had done, remember to document it and make it repeatable :).

Next Steps

Once you’ve validated that your systems will successfully come up, you can then schedule your migration. When you perform the migration into Azure, I recommend using the Move operation (see image below), as that will be the cleanest way to get the system over to Azure in an application-consistent state with no data loss, as opposed to the crash-consistent state (and seconds of data loss) that the Failover Test or Failover Live operations will give you.

Note: Before you run the Move operation, it will be beneficial to uninstall VMware Tools on the VM(s) that you are moving to Azure. It has been found that if you don’t, you will not be able to uninstall it once the VM is in Azure.



Move Operation


Recommendations before you migrate:

  • Document everything you do to make this work. (it may come in handy when you’re looking for others to help you out)
  • Be sure to test the migration beforehand using the Failover Test Operation.
  • Check your Commit settings in Zerto before you perform the Move Operation to ensure that you allow yourself enough time to test before committing the workload to Azure. Current versions of Zerto default the commit policy to 60 minutes, so should you need more time, increase the commit policy time to meet your needs.
  • Be sure to right-size your VMs before moving them to the cloud. If they are oversized, you could be paying way more in consumption than you need to with bigger instance sizes that you may not necessarily need.

That’s it! Pretty simple and straightforward. To be honest, obtaining a working copy of Windows Server 2003 R2 and the Hyper-V Integration Services took longer than getting through the actual process, which actually worked the first time I tried it.

If this works for you let me know by leaving a comment, and if you find this to be valuable information that others can benefit from, please socialize it!

Cheers!


Single vCenter, Single ZVM, and Recovering Zerto in a Failure Scenario

As a follow-up to my previous blog entry titled “Zerto Virtual Manager Outage, Replication, and Self-Healing“, which covers a ZVM failure scenario in an environment with paired ZVMs and two vCenters, I also decided to test and document what I found to be a useful solution for recovering from a failed ZVM in an environment where there is only one vCenter and one instance of Zerto Virtual Replication installed.  Granted, this is generally not a recommended deployment topology because it is potentially a single point of failure, but this type of deployment does exist, and this should provide a suitable solution to allow recovery.

The following has been successfully tested in my lab, which is a vSphere environment, but I anticipate that this solution can also be carried over to a Hyper-V environment, which I’m hoping to test soon.

Since my lab originally consisted of two vCenters and two ZVMs, I first had to tear it down to become a single vCenter and single ZVM environment for the test.  Here is what I did, should you want to test this on your own before deciding whether or not you want to actually deploy it in your environment.

Disclaimer: 

Once again, this is not generally a recommended configuration, and there are some caveats similar to the referenced blog entry above, but with that said, this will allow you to be able to recover if you have Zerto deployed in your environment as described above.

Considerations

Please note that there are some things to look out for when using this solution, because the journal holds data until its checkpoints have been committed to the replica disk:

  • Journal disk being added at the time of a ZVM failure
  • VRA installation, new VRA installation at the time of a ZVM failure
  • Changes made to protected VMs (VMDK add) may not be captured if coinciding with a ZVM failure
  • VPG settings changed at the time of a ZVM failure, such as adding/removing a VM from a VPG

 

Based on additional testing I’ve done, it makes the most sense to keep the journal history of the VPG protecting the ZVM as short as possible, because any changes that occur to the ZVM (any of the above) will first go to the journal before aging out and being committed to the replica disk.  If those changes don’t commit to the replica disk, they will not appear in the UI when the ZVM is recovered using this method.

This was found by creating a VPG to protect another set of workloads, and then, 10 minutes later, running through the recovery steps for the ZVM.  What I didn’t account for here is the FIFO (first-in-first-out) nature of the journal.  Because the change I had made still resided within the journal for the protected ZVM, it did not get a chance to age out to disk, so recovering from the replica did not include the new VPG.

As a result, the recommendation for journal history when protecting the ZVM would be 1 hour (the minimum) – meaning your RPO for the ZVM will be 1 hour.

Setup the Test Environment

Before you can test this, you will need to configure your lab environment for it.  The following assumes your lab consists of two vCenters and 2 installations of Zerto Virtual Replication.  If your lab only has 1 vCenter, simply skip the “lab recovery site” section and move to the “lab protected site” steps.

In lab recovery site:

  1. Delete all existing VPGs
  2. Delete VRAs (via the ZVM UI)
  3. Un-pair the two ZVR sites (in the sites tab in ZVM UI)
  4. Remove hosts from recovery site vCenter

In lab protected site:

  1. [Optional] Create a new cluster, and add the hosts you removed from your recovery site.
  2. Deploy VRAs to any hosts you’ll be using for the test.
  3. Configure VPGs.

Protect the ZVM using Zerto

One thing I’ve wondered about, and finally got around to testing, is protecting the ZVM itself using ZVR.  I’m happy to say it appears to work just fine.  After all, Zerto does not use agents or snapshots, and doesn’t disrupt production for that matter; the technology basically replicates/mirrors block writes from the protected site to the recovery site after they’re acknowledged, via the virtual replication appliances, without touching the protected workload.

Protecting the ZVM is as simple as protecting any other application, via a VPG (Virtual Protection Group).  While you can likely protect the ZVM via storage snapshots and replication, you’re still not going to get an RPO anywhere close to what Zerto itself can provide, which is typically in seconds – in many cases single-digit seconds.  What this means is that your amount of data loss, in the case of the ZVM, will likely be in minutes – even shorter if you can automate the recovery portion of this solution via scripting.

So, a few things to make this solution easier when creating the VPG to protect the ZVM:

  1. When selecting your default recovery server for the VPG that protects the ZVM, select a host, as opposed to a cluster.  This allows you to easily locate the VRA responsible for protecting the ZVM.  Further on through this article, you’ll see why.
  2. Select a specific datastore for recovery.  You can select a datastore cluster, but for the same reasons as above, selecting a specific datastore allows you to easily locate the disk files for the “recovery replica” of the ZVM in the event of a failure.

    Replication Settings - VPG Creation Wizard

  3. Select the production network/portgroup that houses the production IP space for the ZVM (Recovery tab of VPG creation wizard). We will not be changing the IP address.
    Recovery Tab - VPG Creation Wizard
  4. Do not change the IP address for failover/move or test (in the NICs tab of the VPG creation wizard).

    NIC Settings - VPG Creation Wizard

Once you’ve created the VPG, allow initial sync to complete.  As you can see below, I now have a VPG containing the ZVM.  Please note that I’m protecting only the ZVM because I am using the embedded SQL CE database.  Using an external SQL server for the ZVR database will require additional planning.  Once initial sync has been completed, you’re ready to begin the actual failure test and recovery.

VPG List - Protecting ZVM

Simulate a Failure of the Primary ZVM

In order to test the recovery, we will need to simulate a failure of the Primary ZVM.

  1. Power off the ZVM.  Optionally, you can also go as far as deleting it from disk.  Now you know there’s no coming back from that scenario.  The ZVM will be gone.

Recover the ZVM Using the Replica

If you remember from the blog post linked at the beginning of this one, even if the ZVM is down, the VRAs are still replicating data.  Knowing that, the VRA in the recovery site (in this case on the recovery host) will have a lock on the VMDK(s) for the ZVM replica.  That is why I mentioned it would be good to know which host you’re replicating the ZVM to.

  1. IMPORTANT: Before you can start recovering, you will need to shut down the VRA on the host specified for recovery.  Doing so will ensure that any lock on the VMDK(s) for the replica is released.
  2. Once the VRA has been shut down, open the datastore browser and move or copy the VMDK(s) out of the VRA folder to another folder.  By doing this, you’re making sure that if that VRA comes back up before you can delete the VPG protecting the ZVM, there will not be a conflict/lock.  If you copy the files rather than move them, you can use the existing replica as a pre-seed to re-protect the ZVM.
  3. Create a new VM using the vSphere client.
  4. Select to create a Custom virtual machine.

    Create VM - Custom

  5. Provide a name for the VM that doesn’t already exist in vCenter if you did not delete the original “failed” ZVM.  This ensures there won’t be a naming conflict.
  6. Select the datastore where you copied the replica VMDK(s) to.
  7. Select the Virtual Machine Version.  In this case, you can leave the default, which will be the latest version supported by your vSphere version.

    Create VM - vHW Version

  8. Select the OS version for the ZVM.

    Create VM - OS Version

  9. Select the number of vCPUs required. (Match what the original ZVM had)
  10. Select the amount of memory to allocate to the VM. (Match what the original ZVM had)
  11. Select the PortGroup and Adapter type and make sure it’s set to connect at power on.  This should match the original.  My original ZVM had been configured with VMXNET3, so that’s what I selected.
  12. Select the SCSI controller to use.  Again, try to match the original.  Mine was LSI Logic SAS.
  13. On the Select a Disk screen, select Use an existing virtual disk.

    Create VM - Select Existing Disk

  14. Browse to the location of the ZVM replica’s VMDK(s) you copied, and select the disk and click OK.

    Create VM - Select existing disk file

  15. Leave the advanced options at default.
  16. On the summary screen, click Finish.
  17. When the creation is completed, power on the VM, open the console, and watch it boot up.  At this point, DO NOT power on the VRA that you previously shutdown.  There will be some cleanup, especially if you did not copy the VMDK(s) to another location.

Power on new VM created using existing disk.
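If you ever need to run through steps 2 through 17 under pressure, the whole thing can also be scripted with PowerCLI. The sketch below is only a starting point: the host, datastore, folder, and VM names are examples from a hypothetical lab layout, and the CPU, memory, network, and guest OS settings should be matched to your original ZVM.

    # Minimal PowerCLI sketch (assumes you are already connected with Connect-VIServer).
    # All names and paths below are examples -- adjust them to your environment.
    $esxHost   = Get-VMHost -Name 'esx01.lab.local'
    $datastore = Get-Datastore -Name 'Recovery-DS01'

    # Copy the replica VMDK out of the VRA's folder so the returning VRA can't lock it.
    Copy-DatastoreItem -Item 'vmstore:\Datacenter\Recovery-DS01\Z-VRA-esx01\ZVM-replica.vmdk' `
                       -Destination 'vmstore:\Datacenter\Recovery-DS01\ZVM-Recovered\'

    # Recreate the VM around the existing disk (match CPU/memory/NIC/guest OS to the original ZVM).
    $vm = New-VM -Name 'ZVM-Recovered' -VMHost $esxHost -Datastore $datastore `
                 -DiskPath '[Recovery-DS01] ZVM-Recovered/ZVM-replica.vmdk' `
                 -NumCpu 2 -MemoryGB 4 -NetworkName 'Production' -GuestId 'windows8Server64Guest'

    Start-VM -VM $vm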

Clean-up

Once the recovered ZVM has booted up, go ahead and log in to the Zerto UI.  Don’t be alarmed that everything is red.  This is because the ZVM is coming up from being down for a while, and it needs to run some checks, and get re-situated with the VRAs and begin creating new checkpoints again.  Once that process completes, as we saw in the previous blog article (referenced at the beginning of this one), things will start to go green and into a “Meeting SLA” state.

  1. Click on the VPGs tab.
  2. Locate the VPG previously created to protect the ZVM, and delete it.  If you want to retain the original replica disks as a pre-seed, make sure you select the checkbox labeled Keep the recovery disks at the peer site.  Please note that because the VRA that was protecting this VPG is still down, you may need to click delete again, and force the deletion of the VPG.

    Delete VPG - Preserve recovery disks.

  3. Once the VPG is deleted, go ahead and power on the VRA you previously shutdown.

Verify ZVR Functionality

Now that we’ve cleaned up and powered the VRA back up, you can verify that ZVR is working again, and the ZVM is performing its duty of creating and tracking checkpoints in the journal again.  You can do this by starting to initiate a failover test and clicking to see what checkpoints are available, or by attempting to recover a file from the journal from any one of the VPGs.

Validate checkpoint functionality

(Above) you can see when the ZVM went down, and when it started creating and tracking checkpoints again.

Validate JFLR

(Above) Restored a file from the Journal.

Summary

While this is not an optimal/recommended configuration, through testing and validation we have seen that even in a single-ZVM, single-vCenter environment, recovering the platform that provides your resiliency services is completely possible.  Granted, there will be some data loss (RPO) on the ZVM itself for the time between the failure and the recovery, but Zerto Virtual Replication is clearly able to pick up where it left off and resume protection of your environment.

If you found this to be useful, please share, comment, and let me know if you’ve tried this for yourself!


Zerto Virtual Manager Outage, Replication, and Self-Healing

I’ve decided to explore what happens when a ZVM (Zerto Virtual Manager) in either the protected site or the recovery site is down for a period of time, and what happens when it is back in service, and most importantly, how an outage of either ZVM affects replication, journal history, and the ability to recover a workload.

Before getting into it, I have to admit that I was happy to see how resilient the platform is through this test, and how the ability to self-heal is a built-in “feature” that rarely gets talked about.

Questions:

  • Does ZVR still replicate when a ZVM goes down?
  • How does a ZVM being down affect checkpoint creation?
  • What can be recovered while the ZVM is down?
  • What happens when the ZVM is returned to service?
  • What happens if the ZVM is down longer than the configured Journal History setting?

Acronym Decoder & Explanations

  • ZVM – Zerto Virtual Manager
  • ZVR – Zerto Virtual Replication
  • VRA – Virtual Replication Appliance
  • VPG – Virtual Protection Group
  • RPO – Recovery Point Objective
  • RTO – Recovery Time Objective
  • BCDR – Business Continuity/Disaster Recovery
  • CSP – Cloud Service Provider
  • FOT – Failover Test
  • FOL – Failover Live

Does ZVR still replicate when a ZVM goes down?

The quick answer is yes.  Once a VPG is created, the VRAs handle all replication.    The ZVM takes care of inserting and tracking checkpoints in the journal, as well as automation and orchestration of Virtual Protection Groups (VPGs), whether it be for DR, workload mobility, or cloud adoption.

In the protected site, I took the ZVM down for over an hour via power-off to simulate a failure.  Prior to that, I made note of the last checkpoint created.  As the ZVM went down, within a few seconds, the protected site dashboard reported RPO as 0 (zero), VPG health went red, and I received an alert stating “The Zerto Virtual Manager is not connected to site Prod_Site…”

The Zerto Virtual Manager is not connected to site Prod_Site

 

Great, so the protected site ZVM is down now and the recovery site ZVM noticed.  The next step for me was to verify that despite the ZVM being down, the VRA continued to replicate my workload.  To prove this, I opened the file server and copied the fonts folder (C:\Windows\Fonts) to C:\Temp (total size of data ~500MB).

As the copy completed, I then opened the performance tab of the sending VRA and went straight to see if the network transmit rate went up, indicating data being sent:

VRA Performance in vSphere, showing data being transmitted to remote VRA in protected site.

Following that, I opened the performance monitor on the receiving VRA and looked at two stats: Data receive rate, and Disk write rate, both indicating activity at the same timeframe as the sending VRA stats above:

Data receive rate (Network) on receiving/recovery VRA Disk write rate on receiving/recovery VRA

As you can see, despite the ZVM being down, replication continues – though with caveats that you need to be aware of:

  • No new checkpoints are being created in the journal
  • Existing checkpoints up to the last one created are all still recoverable, meaning you can still recover VMs (VPGs), Sites, or files.

Even though replication is still taking place, you will only be able to recover to the latest (last recorded) checkpoint from before the ZVM went down.  When the ZVM returns, checkpoints are once again created; however, you will not see checkpoints for the time the ZVM was unavailable.  In my testing, the same was true if the recovery site ZVM went down while the protected site ZVM was still up.

How does the ZVM being down affect checkpoint creation?

If I take a look at the journal history for the target workload (file server), I can see that since the ZVM went away, no new checkpoints have been created.  So, while replication continues on, no new checkpoints are tracked while the ZVM is down, since one of its jobs is to track checkpoints.

Last checkpoint created over 30 minutes ago, right before the ZVM was powered off.

 

What can be recovered while the ZVM is down?

Despite no new checkpoints being created, FOT, FOL, VPG Clone, Move, and File Restore operations are still available for the existing journal checkpoints.  Given this was something I had never tested before, this was really impressive.

One thing to keep in mind though is that this will all depend on how long your Journal history is configured for, and how long that ZVM is down.  I provide more information about this specific topic further down in this article.

What happens when the ZVM is returned to service?

So now that I’ve shown what is going on when the ZVM is down, let’s see what happens when it is back in service.  To do this, I just need to power it back up, and allow the services to start, then see what is reported in the ZVM UI on either site.

As soon as all services were back up on the protected site ZVM, the recovery site ZVM alerted that a Synchronization with site Prod_Site was initiated:

Synchronizing with site Prod_Site

Recovery site ZVM Dashboard during site synchronization.

The next step here is to see what our checkpoint history looks like.  Taking a look at the image below, we can see when the ZVM went down, and that there is a noticeable gap in checkpoints, however, as soon as the ZVM was back in service, checkpoint creation resumed, with only the time during the outage being unavailable.

Checkpoints resume

 

What happens if the ZVM is down longer than the configured Journal History setting?

In my lab, for the above testing, I set the VPG journal history to 1 hour.  That said, if you take a look at the last screenshot, older checkpoints are still available (showing 405 checkpoints).  When I first tried to run a failover test after this experiment, I was presented with checkpoints that went beyond an hour.  When I selected the oldest checkpoint in the list, a failover test would not start, even though the “Next” button in the FOT wizard did not gray out.  What this has led me to believe is that it may take a minute or two for the journal to be cleaned up.

Because I was not able to move forward with a failover test (FOT), I went back in to select another checkpoint, and this time the older checkpoints (from over an hour ago) were gone.  Selecting the oldest checkpoint at this point allowed me to run a successful FOT because it was within range of the journal history setting.  Lesson learned here – note to self: give Zerto a minute to figure things out, you just disconnected the brain from the spine!

Updated Checkpoints within Journal History Setting

Running a failover test to validate successful usage of checkpoints after ZVM outage:

File Server FOT in progress, validating fonts folder made it over to recovery site.

And… a recovery report to prove it:

Recovery Report - Successful FOT Recovery Report - Successful FOT

 

Summary and Next Steps

So in summary, Zerto is self-healing and can recover from a ZVM being down for a period of time.  That said, there are some things to watch out for, which include knowing what your configured journal history setting is, and how a ZVM being down longer than that setting affects your ability to recover.

You can still recover; however, you will start losing older checkpoints as time goes on while the ZVM is down.  This is because of the first-in-first-out (FIFO) nature of how the journal works.  You will still have the replica disks, with journal checkpoints committing to them as time goes on, so losing history doesn’t mean you’re lost – you will just end up breaching your SLA for history, which will rebuild over time once the ZVM is back up.

As a best practice, it is recommended that you have a ZVM in each of your protected sites and in each of your recovery sites for full resilience – after all, if you lose one of the ZVMs, you will need at least the protected or recovery site ZVM available to perform a recovery.  The case is different if you have a single ZVM: if you must have only one, put it in the recovery site rather than the protected site, because chances are the protected site is what you’re accounting for going down in any planned or unplanned event.

In the next article, I’ll be exploring this very example of a single ZVM and how that going down affects your resiliency.  I’ll also be testing some ways to potentially protect that single ZVM in the event it is lost.

Thanks for reading!  Please comment and share, because I’d like to hear your thoughts, and am also interested in hearing how other solutions handle similar outages.


Configuring NSX Manager for Trend Micro Deep Security 9.6 SP1

This week was my first undertaking of preparing a vCenter environment for agent-less AV using Trend Micro Deep Security 9.6 SP1 and NSX Manager 6.2.4.  So far, it’s been a great learning experience (to remain positive about it), so I wanted to share.

Initially (about 4 months ago), vShield was deployed, and since day 1, we had problems.  Since I’ve never worked with Trend Micro Deep Security, I relied on the team that managed it to tell me what their requirements were.  Sometimes, this is the quickest way to get things done, but what I realized is that even if it works in one environment, it always helps to validate compatibility across the board.  If I had done that, I would have probably saved myself a lot of headache…

Deploying EOL Software? You should have known better...

So in anticipation of a “next time”, here are my notes from the installation and configuration.  Hopefully, this information is found useful to others, as some of it isn’t even covered in the Deep Security Installation Guide.

Please note, this is not a how-to.  It’s more of a reference for the things we’ve discovered through this process that may not all be documented in one place.  With that said, here’s my attempt at documenting what I’ve learned in one place.

The process (after trial and error since the installation guide isn’t very detailed) that seems to work best is as follows:

  1. Validate product compatibility versions (vCenter, ESXi, Trend Micro Deep Security, NSX Manager)
  2. Deploy and configure NSX Manager, then register with vCenter and the Lookup Service.
  3. Deploy Guest Introspection Services to the cluster(s) requiring protection.
  4. Register the vCenter and NSX Manager from the Trend Micro Security Manager.
  5. Deploy the Trend Micro Deep Security Service from vSphere Networking and Security in the Web Client.

When we first set out to deploy this in our datacenter, it was months ago.  We initially started with vShield Manager, which is what was relayed to us from the team that manages Trend.  We met with issues deploying properly and “things” missing from the vSphere Web Client when documentation said otherwise.  We had support tickets open with both VMware and Trend Micro for at least a few months.  At one point, due to the errors we were getting, Trend and VMware both escalated the issue to their engineering/development teams.  At the end of the day, we (the customers) eventually figured out what was causing the problem… DNS lookups.

The Trend Micro Deep Security installation guide does not cover this as a hard requirement.  Although the product will allow you to input and save IP addresses instead of FQDNs, it just doesn’t work, so use DNS whenever possible!

vShield

First of all, I wouldn’t look at vShield anymore unless working in an older environment after this.  In fact, I may just respond with:

If it’s EOL, is no longer supported, AND incompatible, I won’t even try. “Road’s closed pizza boy! Find another way home!”

You don’t gain anything from deploying EOL software.  Very importantly, you don’t get any future security updates when vulnerabilities are discovered and you won’t get any help if you call support about it.

In case you’re reading this and did the same thing I did, here are some things we noticed during this vShield experience:

  • Endpoint would not stay installed on some of the ESXi hosts, while it did on others.
  • There is additional work for this configuration if you’re using AutoDeploy and stateless hosts. (see VMware KB: 2036701)
  • When deploying the TM DSVAs to hosts where Endpoint did stay installed, installation failed as soon as the appliance deployment was attempted.
    • This is where we discovered using FQDN instead of IP address is preferred.
  • After successfully deploying the DSVAs, we still had problems with virtual appliances and endpoint staying registered, so it never actually worked.

In the back of my mind since I didn’t do this up-front, I started questioning compatibility. Sure enough, vShield is EOL, and is not compatible with our vCenter and host versions.

VMware NSX Manager 6.2.4

With a little more research, I found that NSX with Guest Introspection has replaced vShield for endpoint/GI services, and as long as that’s all you’re using NSX for, the license is free.  With NSX 6.1 and later, there is also no need to “prepare stateless hosts” (see VMware KB: 2120649).

Before simply deploying and configuring, I checked all the compatibility matrixes involved and validated that our versions are supported and compatible.  Be sure to check the resource links below, as there is some important information especially with compatibility:

  • vCenter: v 6.0 build 5318203
  • ESXi: v 6.0 build 5224934
  • NSX: v 6.2.4 build 4292526
  • Trend Micro Deep Security: v 9.6 SP1

Note: NSX 6.3.2 can be deployed, but you will need at least TMDS 9.6 SP1 Update 3 – which is why I went with 6.2.4, and will upgrade once TMDS is upgraded to support NSX 6.3.2.

Resources:

 

What I’ve Learned

Here are some tips to ensure a smooth deployment for NSX Manager 6.2.4 and Trend Micro Deep Security 9.6 SP1.

  • Ensure your NTP servers are correct and reachable.
  • Use IP Pools if at all possible when deploying guest introspection services from NSX Manager.  (makes deployment easier and quicker)
  • Set up a datastore that will house ONLY NSX related appliances.  (makes deployment easier and quicker)
  • When you first set up NSX Manager, be sure to add your user account or domain group with admin access to it for management, otherwise, you won’t see it in the vSphere Web Client unless you’re logged in with the administrator@vsphere.local account.
  • Validate that there are DNS A and PTR records for the Trend Micro Security Manager, vCenter, and NSX Manager; otherwise, anything you do in Deep Security to register your environment will fail. (See the quick lookup check after this list.)
  • Pay close attention to the known issues and workarounds in the “Compatibility Between NSX 6.2.3 and 6.2.4 with Deep Security” reference above, because you will see the error/failure they refer to.
  • If deploying in separate datacenters or across firewalls, be sure to allow all the necessary ports.
  • Unlike vShield Manager deploying Endpoint, NSX Manager deploying Guest Introspection is done at the cluster level.  When using NSX, you can’t deploy GI to only one host, you can only select a cluster to deploy to.
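Since the DNS point above is what bit us the hardest, here is a minimal PowerShell sketch for checking forward (A) and reverse (PTR) resolution of the key components before you start registering anything. The hostnames are examples only; substitute your own.

    # Minimal sketch: confirm forward and reverse DNS for the components that register with each other.
    # Hostnames are examples -- replace them with your DSM, vCenter, and NSX Manager FQDNs.
    $names = 'dsm.lab.local', 'vcenter.lab.local', 'nsxmgr.lab.local'

    foreach ($name in $names) {
        try {
            $ip = [System.Net.Dns]::GetHostAddresses($name)[0].IPAddressToString
            # Reverse lookup; this throws if there is no PTR record.
            $ptr = [System.Net.Dns]::GetHostEntry($ip).HostName
            Write-Host "$name -> $ip -> $ptr"
        }
        catch {
            Write-Warning ("Lookup failed for {0}: {1}" -f $name, $_.Exception.Message)
        }
    }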

If you’ve found this useful in your deployment, please comment and share!  I’d like to hear from others who have experienced the same!

 


Changing a VM’s Recovery VRA When a Host Crashes or Prior to Recovery Site Maintenance

Yesterday, we had one host in our recovery site PSOD, and that caused all kinds of errors in Zerto, primarily related to VPGs.  In our case, this particular host had both inbound and outbound VPGs attached to its VRA, and we were unable to edit any of them to recover from the host failure (the Edit button in the VPG view was grayed out, along with the “Edit VPG” link when clicking into the VPG).  Previously when this would happen, we would just delete the VPG(s) and recreate it/them, preserving the disk files as pre-seeded data.

When you have a few of these to re-do, it’s not a big deal, however, when you have 10 or more, it quickly becomes a problem.

One thing I discovered that I didn’t know was in the product is that if you click into the VRA associated with the failed host and go to the MORE link, there’s an option in there to “Change VM Recovery VRA.”  This option allows you to tell Zerto that anything related to this VRA should now be pointed at another VRA.  Once I did that, I was able to edit the VPGs.  I needed to edit the outbound VPGs because they were actually reverse-protected workloads that were missing some configuration details (NIC settings and/or journal datastore).

Additionally – If you are planning on host maintenance in the recovery site (replacing a host(s), patching, etc…), these steps should be taken prior to taking the host and VRA down to ensure non-disrupted protection.

 

Here’s how:

  1. Log on to the Zerto UI.
  2. Once logged on, click on the Setup tab.
  3. In the “VRA Name” column, locate the VRA associated with the failed host, and then click the link (name of VRA) to open the VRA in a new tab in the UI.
  4. Click on the tab at the top that contains VRA: Z-VRA-[hostName].
  5. Once you’re looking at the VRA page, click on the MORE link.
  6. From the MORE menu, click Change VM Recovery VRA.
  7. In the Change VM Recovery VRA dialog, check the box beside the VPG/VM, then select a replacement host. Once all VPGs have been updated, click Save.

Once you’ve saved your settings, validate that the VPG can be edited, and/or is once again replicating.

 


Zerto: Dual NIC ZVM

Something I recently ran into with Zerto (and this can happen with anything else) was the dilemma of protecting remote sites that happen to have IP addresses identical in both the protected and recovery sites (which doesn’t happen often).  And no, this wasn’t planned for; it was just discovered during my Zerto deployment in what we’ll call the protected sites.

Luckily, our network team had provisioned two new networks that are isolated, and connected to these protected sites via MPLS.  Those two new networks do not have the ability to talk back to our existing enterprise network without firewalls getting involved, and this is by design since we are basically consolidating data centers while absorbing assets and virtual workloads from a recently acquired company.

When I originally installed the ZVM in my site (which we’ll call the recovery site), I had used IP addresses for the ZVM and VRAs that were part of our production network, and not the isolated network set aside for this consolidation.  Note: I installed the Zerto infrastructure in the recovery site ahead of time, before discussions about the isolated networks were brought up.  So, because I needed to get this onto the isolated network in order to be able to replicate data from the protected sites to the recovery site, I set out to re-IP the ZVM and re-IP the VRAs.  Before I could do that, I needed to provide justification for firewall exceptions in order for the ZVM in the recovery site to link to the vCenter, communicate with ESXi hosts for VRA deployment, and also to be able to authenticate the computer, users, and service accounts in use on the ZVM.  Oh, and I also needed DNS and time services.

The network and security teams asked if they could NAT the traffic, and my answer was “no” because Zerto doesn’t support replication using NAT.  That was easy, and now the network team had to create firewall exceptions for the ports I needed.

Well,  as expected, they delivered what I needed.  To make a long story short, it all worked, and then about 12 hours before we were scheduled to perform our first VPG move, it all stopped working, and no one knew why.  At this point, it was getting really close to us pulling the plug on the migration the following day, but I was determined to get this going and prevent another delay in the project.

When looking for answers, I contacted my Zerto SE, reached out on Twitter, and also contacted Zerto Support.  Well, at the time I was on the phone with support, we couldn’t do anything because communication to the resources I needed was not working.  We couldn’t perform a Zerto re-configure to re-connect to the vCenter, and at this point, I had about 24 VPGs that were reporting they were in sync (lucky!), but ZVM-to-ZVM communication wasn’t working, and the recovery site ZVM was not able to communicate with vCenter, so I wouldn’t have been able to perform the cutover.  Since support couldn’t help me out in that instance, I scoured the Zerto KB looking for an alternate way of configuring this where I could get the best of both worlds and still stay isolated as needed.

I eventually found this KB article, which explained that not only is it supported, but it’s also considered a best practice in CSP or large environments to dual-NIC the ZVM to separate management from replication traffic.  I figured I was all out of ideas, and the back-and-forth with firewall admins wasn’t getting us anywhere, so I might as well give this a go.  While the KB article offers the solution, it doesn’t tell you exactly how to do it, outside of adding a second vNIC to the ZVM.  There were some steps missing, which I figured out within a few minutes of completing the configuration.  Part of this required me to re-IP the original NIC back to the original IP I used, which was on our production network.  Doing this re-opened the lines of communication to vCenter, ESXi hosts, AD, DNS, SMTP, etc…  Now I had to focus on the vNIC that was to be used for all ZVM-to-ZVM and replication traffic.  In a few short minutes, I was able to get communication going the way I needed it, so the final thing to do was re-configure Zerto to use the new vNIC for its replication-related activities.  I did that, and while I was able to re-establish the production network communications I needed, I now wasn’t able to access the remote sites (ZVM to ZVM) or the recovery site VRAs.

It turns out, what I needed here were some static, persistent routes to the remote networks, configured to use the specific interface I created for it.

Here’s how:

The steps I took are below the image.  If the image is too small, consider downloading the PDF here.

zerto_dual_nic_diagram

 

On the ZVM:

  1. Power it down, add a 2nd vNIC, and set its network to the isolated network.  Set the primary vNIC to the production network.
  2. Power it on.  When it’s booted up, log in to Windows, and re-configure the IP address for the primary vNIC.  Reboot to make sure everything comes up successfully now that it is on the correct production network.
  3. After the reboot, edit the IP configuration of the second vNIC (the one on the isolated network).  DO NOT configure a default gateway for it.
  4. Open the Zerto Diagnostics Utility on the ZVM. You’ll find this by opening the start menu and looking for the Zerto Diagnostics Utility.  If you’re on Windows Server 2008 or 2012, you can search for it by clicking the start menu and starting to type “Zerto.”
    zerto_dual_nic_1_4
  5. Once the Zerto Diagnostics Utility loads, select “Reconfigure Zerto Virtual Manager” and click Next.
    zerto_dual_nic_1_5
  6. On the vCenter Server Connectivity screen, make any necessary changes you need to and click Next.  (Note: We’re only after changing the IP address the ZVM uses for replication and ZVM-to-ZVM communication, so in most cases, you can just click Next on this screen.)
  7. On the vCloud Director (vCD) Connectivity screen, make any necessary changes you need to and click Next. (Note: same note in step 6)
  8. On the Zerto Virtual Manager Site Details screen, make any necessary changes you need to  and click Next. (Note: same as note in step 6)
  9. On the Zerto Virtual Manager Communication screen, the only thing to change here is the “IP/Host Name Used by the Zerto User Interface.”  Change this to the IP address of your vNIC on the isolated network, then click Next.
    zerto_dual_nic_1_9
  10. Continue to accept any defaults on following screens, and after validation completes, click Finish, and your changes will be saved.
  11. Once the above step has completed, you will now need to add a persistent, static route to the Windows routing table.  This will tell the ZVM that for any traffic destined for the protected site(s), it will need to send that traffic over the vNIC that is configured for the isolated network.
  12. Use the following route statement from the Windows CLI to create those static routes:
    route ADD [Destination IP] MASK [SubnetMask] [LocalGatewayIP] IF [InterfaceNumberforIsolatedNetworkNIC] -p
    Example:
    route ADD 192.168.100.0 MASK 255.255.255.0 10.10.10.1 IF 2 -p
    route ADD 192.168.200.0 MASK 255.255.255.0 10.10.10.1 IF 2 -p
    
    Note: To find out what the interface number is for your isolated network vNIC, run route print from the Windows CLI.  It will be listed at the top of what is returned.
    

 

zerto_dual_nic_1_10

Once you’ve configured your route(s), you can test by sending pings to remote site IP addresses that you would normally not be able to see.
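If the ZVM is running on Windows Server 2012 or later, you can also add the same routes with the built-in NetTCPIP PowerShell cmdlets instead of route.exe. A minimal sketch follows; the adapter name, destination prefixes, and next-hop address are examples, so match them to your own isolated network and interface.

    # Minimal sketch: static routes bound to the isolated-network vNIC.
    # The adapter name, prefixes, and next hop are examples -- adjust to your environment.
    $ifIndex = (Get-NetAdapter -Name 'Isolated-Replication').ifIndex

    # Routes created with New-NetRoute should survive reboots, like route ADD -p.
    New-NetRoute -DestinationPrefix '192.168.100.0/24' -InterfaceIndex $ifIndex -NextHop '10.10.10.1'
    New-NetRoute -DestinationPrefix '192.168.200.0/24' -InterfaceIndex $ifIndex -NextHop '10.10.10.1'

    # Verify the routes, then test reachability of a remote-site address (example IP).
    Get-NetRoute -DestinationPrefix '192.168.100.0/24'
    Test-Connection -ComputerName '192.168.100.10' -Count 2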

After performing all of these steps, my ZVMs are now communicating without issue and replications are all taking place.  A huge difference from hours before when everything looked like it was broken.  The next day, we were able to successfully move our VPGs from protected sites to recovery sites without issue, and reverse protect (which we’re doing for now as a failback option until we can guarantee everything is working as expected).

If this is helpful or you have any questions/suggestions, please comment, and please share! Thanks for reading!

 


Replace a Failed External PSC in Enhanced Linked Mode: Part 2

In Part 1 of “Replace a Failed External PSC in Enhanced Linked Mode”, I worked through repointing a Windows vCenter Server to another External PSC, in an effort to unregister, and rebuild a failed PSC.

In this guide, I will walk through:

  • Building a new Windows external PSC
  • Joining the SSO Domain
  • Re-pointing VC1 back to this newly built and linked external PSC, therefore returning us to the original topology we started with (except for the server name)

The main systems I’ll be working with in this guide are:

  • VC1 (the Windows vCenter that was re-pointed to the working PSC for the other site while we rebuild its home PSC)
  • PSC3 – This is a newly built Windows Server 2012 R2 PSC that is taking the place of PSC1.

I would have chosen to use the same name as the original PSC, but from experience, it’s always better to be safe and not try to re-introduce a problematic record into the environment, just in case there are still entries hanging out somewhere in the working configuration.

At the end of this guide, we will end up with this topology:

replace_psc_part2_topology_1

 

Install the External PSC Role on the New Server


Note: 
These steps are the same ones to deploy the second PSC in “Deploying Windows vCenter with External PSCs in Enhanced Linked Mode: Part 2.”

  1. Launch the vCenter Server Installer
  2. Select vCenter Server for Windows, and click Install.
  3. Click Next on the welcome screen.
  4. Accept the EULA, click Next.
  5. Under External Deployment, select Platform Services Controller, and click Next.
    replace_psc_part2_1_5
  6. Verify the system name (This should be the same FQDN of the PSC you are building to replace the failed one), and click Next.
  7. Select Join a vCenter Single Sign-On domain.
  8. Enter the FQDN for the first Platform Services Controller that owns the SSO domain you want to join.
  9. Enter the vCenter Single Sign-On password, then click Next.
    replace_psc_part2_1_9
  10. When prompted for Certificate Validation, click OK to accept the self-signed certificate.
  11. Select Join an existing site, choose the site from the dropdown menu (should match the site name of the first PSC you created), and click Next.
    replace_psc_part2_1_11
  12. On the Configure Ports page, make any changes necessary for your environment, and click Next.
  13. Set the PSC installation and data directories, and click Next.
  14. Select whether or not to join the Customer Experience Improvement Program (CEIP), and click Next.
  15. Verify the installation summary settings, and if all looks well, click Install.
  16. Once the installation has completed, log into the vSphere Web Client, and navigate to Home > Administration > Deployment > System Configuration.  Under the Nodes object, verify that there are now 4 nodes (you should see 2 PSCs, and 2 vCenter servers).

 

Next Steps:

After verifying functionality of the newly added PSC, the next step is to re-point VC1 (repointed previously to PSC2) to the new PSC (PSC3).

replace_psc_part2_2_main

Repoint the Connections Between vCenter Server and Platform Services Controller:
 
  1. Log onto the vCenter Server instance (VC1).
  2. In the command prompt (run as administrator), navigate to C:\Program Files\VMware\vCenter Server\bin (or wherever you have vCenter installed).
  3. Run the cmsso-util script:
    cmsso-util repoint --repoint-psc psc_fqdn_or_static_ip [--dc-port port_number]

    psc_fqdn_or_static_ip – the FQDN or static IP address of the PSC you want to repoint to.

    replace_psc_part2_3_3

  4. Log into the vCenter Server instance by using the vSphere Web Client to verify that the vCenter server is running and can be managed.
  5. Finally, to see what PSC each vCenter is connected to:
    1. Log into the vSphere Web Client, and navigate to Hosts and Clusters View.
    2. Select a vCenter, and go to the Manage Tab.
    3. In Settings, go to Advanced Settings.
    4. Search for the config.vpxd.sso.admin.uri setting.
    5. When the result is returned, the Value field will tell you which PSC that particular vCenter is connected to (a PowerCLI alternative is shown after this list).replace_psc_part2_3_5
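If you'd rather check this from the command line than click through the Web Client, a quick PowerCLI alternative (a sketch only, with vc1.lab.local as a hypothetical vCenter FQDN) is to query the same advanced setting:

    Connect-VIServer -Server vc1.lab.local
    Get-AdvancedSetting -Entity $global:DefaultVIServer -Name "config.vpxd.sso.admin.uri" | Select-Object Name, Value

The Value returned contains the URI of the PSC that the connected vCenter is pointed at.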

 

This completes the series for Replacing a Failed External PSC in Enhanced Linked Mode. If you find that this has helped you, please feel free to share the information. It took me quite a while to gather all the information needed, and build the environment for this, so I really hope it helps.

In my research when I first encountered the issues with my failed PSC, I found that there are a lot of other bloggers out there who have written about the issues, troubleshooting steps, and fixes related to a failed PSC. While this is not a “one fix to rule them all” solution, it is a very clean way to replace a failed PSC. I apologize for not documenting the same procedure for the VCSA; however, if you follow the steps in the order I provided, the links in my posts also include the proper steps to execute for the VCSA.

 


Replace a Failed External PSC in Enhanced Linked Mode: Part 1

If you’ve followed along starting with the 2-part series about “Deploying Windows vCenter with External PSCs in Enhanced Linked Mode…”, this is the next installment, in which we will go through replacing a failed external PSC using that same topology.

If you haven’t followed along, then the following links will help set the stage for what we’re doing here:

The reasons I'm performing these steps, or even attempting this at all, are:
  • I have never performed this procedure before.
  • I would like to know how to do this and be able to help others out by sharing my experience and the information.
  • I would like to understand the requirements and general order of operations for performing the procedure.
  • I would like to address a current production issue without experimenting in production and causing further complications or a complete loss of manageability of my environment.

 

The Workflow

replace_psc_part1_topology

 

Lab Testing

 

So to prepare for this in the lab, I've created a datacenter in each vCenter, assigned some permissions to both of them (AD-integrated permissions), and licensed both vCenters (licenses to be removed after testing).  This was done so I could confirm that changes were replicating between the two PSCs.  I also ran the vdcrepadmin.exe utility on both PSCs to verify that replication is succeeding and that there are no outstanding changes or replication problems.

From PSC1:

.\vdcrepadmin.exe -f showpartnerstatus -h localhost -u administrator -w [password]

replace_psc_part1_lab_1_1

From PSC2:

.\vdcrepadmin.exe -f showpartnerstatus -h localhost -u administrator -w [password]

replace_psc_part1_lab_1_2

The next step is to shut down PSC1 to simulate a failure.  Once it is shut down, re-point the vCenter affected by the failure to the working PSC:

replace_psc_part1_lab_1_3

If I run the following on PSC2, it will show that PSC1 is now offline:

.\vdcrepadmin -f showpartnerstatus -h localhost -u administrator -w [password]

If I run the following on PSC2, it still shows that there are two PSCs registered:

.\vdcrepadmin -f showservers -h localhost -u administrator -w [password]

replace_psc_part1_lab_1_4

Re-point the Connections Between vCenter Server and Platform Services Controller

  1. Log onto the vCenter Server instance (VC that is still connected to the failed PSC).
  2. In the command prompt (run as administrator), navigate to C:\Program Files\VMware\vCenter Server\bin (or wherever you have vCenter installed).
  3. Run the cmsso-util script to repoint the connection of this vCenter to the PSC that is still alive:
    cmsso-util repoint --repoint-psc psc_fqdn_or_static_ip

    (The command above may take some time to complete; in my test lab, it took approximately 13 minutes.)

    replace_psc_part1_lab_3_3

    Part of the repointing task includes stopping and starting all vCenter-related services on the server.  Give the web server additional time to fully initialize before moving on to the next step (one way to check service status is shown after this list).

  4. Log into the vCenter Server instance by using the vSphere Web Client to verify that the vCenter server is running and can be managed.  After the web server completed its initialization for VC1, I was able to log in successfully and verify the inventory, permissions, and licensing.  The next step is to unregister the bad PSC (PSC1) from the configuration on PSC2.
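One way to check that the services have come back up (as mentioned in step 3) is the service-control utility that ships with vCenter Server. A minimal sketch, run from an elevated command prompt on VC1 and assuming the default installation path:

    cd "C:\Program Files\VMware\vCenter Server\bin"
    service-control --status --all

Once the web client service is listed as running, give it another minute or two to finish initializing and then proceed with step 4.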

 

Unregister the Failed PSC

 

  1. On the PSC (live one that you just repointed to), open a command prompt (run as administrator).
  2. Browse to C:\ProgramData\VMware\vCenterServer\cfg\install-defaults.
  3. On the failed PSC: open the vmdir.ldu-guid file to find the hostId (see the example after this list).
  4. On the working PSC: Navigate to C:\Program Files\VMware\vCenter Server\bin
  5. On the working PSC: Run the cmsso-util unregister command to unregister the stopped/failed Platform Services Controller:

    cmsso-util unregister --hostId host_Id --node-pnid Platform_Services_Controller_System_Name --username administrator@vsphere.local --passwd [password]

    replace_psc_part1_lab_4_5

    After this has been run successfully, verify that the OLD PSC has been removed from the topology.

  6. In the vSphere Web Client, navigate to Home > Administration > Deployment > System Configuration.  Under the Nodes object, verify that there are only 3 nodes (you should see 1 PSC, and 2 vCenter servers).
  7. On PSC2, run

    .\vdcrepadmin -f showservers -h localhost -u administrator -w ************

    replace_psc_part1_lab_4_7

    You should now only see one server in the listing, as opposed to 2, since you just removed the failed PSC.

  8. Delete the failed PSC (VM) you no longer need from the vSphere inventory.
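To make steps 3 through 5 a little more concrete, here is a hedged example using entirely hypothetical values (psc1.lab.local for the failed PSC's system name and a made-up hostId); substitute the values from your own environment:

    :: On the failed PSC (if its disk is still readable), display the hostId:
    type "C:\ProgramData\VMware\vCenterServer\cfg\install-defaults\vmdir.ldu-guid"

    :: On the working PSC (PSC2), from C:\Program Files\VMware\vCenter Server\bin:
    cmsso-util unregister --hostId 8f2d3c41-0a6e-4b7d-9c12-3e4f5a6b7c8d --node-pnid psc1.lab.local --username administrator@vsphere.local --passwd [password]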

 

There you have it.  We have successfully re-pointed a vCenter to another PSC, unregistered the bad PSC, and validated that we are now ready to rebuild in order to reinstate the original topology.  The next part of this series will cover building the replacement PSC, joining the SSO domain, and finally, repointing the vCenter at this new PSC, thereby returning the topology to where it was before we started.


Deploying Windows vCenter with External PSCs in Enhanced Linked Mode: Part 2

Before following any steps in here, be sure to refer to the previous part of this series, which provides some background information and walks through the steps to install the first PSC and vCenter servers.  The steps in this post are a continuation, marking the differences between the initial install (previous post) and the additional PSC and vCenter server.

In this post, I will be walking through joining the second PSC to the SSO domain created previously.  After that, I will call out any steps in the subsequent vCenter Server installation that differ from the first install and include screenshots.

For any pre-requisites, head to Part 1 of this series.

 

Install the Second Platform Services Controller on a Windows Machine and Joining the SSO Domain

If deploying in a production environment, refer to the vSphere Installation and Setup guide for vSphere 6.0.

    1. Launch the vCenter Server Installer
    2. Select vCenter Server for Windows, and click Install.
    3. Click Next on the welcome screen.
    4. Accept the EULA, click Next.
    5. Under External Deployment, select Platform Services Controller, and click Next.external_psc_part2_1_5
    6. Verify the system name (recommended you use FQDN, not IP Address), and click Next.
    7. Select Join a vCenter Single Sign-On domain.
    8. Enter the FQDN for the first Platform Services Controller that owns the SSO domain you want to join.
    9. Enter the vCenter Single Sign-On password, then click Next.external_psc_part2_1_9
    10. When prompted for Certificate Validation, click OK to accept the self-signed certificate.
    11. Select Join an existing site, choose the site from the drop-down menu (should match the site name of the first PSC you created), and click Next.external_psc_part2_1_11
    12. On the Configure Ports page, make any changes necessary for your environment, and click Next.
    13. Set the PSC installation and data directories, and click Next.
    14. Select whether or not to join the Customer Experience Improvement Program (CEIP), and click Next.
    15. Verify the installation summary settings, and if all looks well, click Install.

Next Steps

 
Once completed, run the installer on the vCenter Server that will connect to this PSC, and be sure to connect it to THIS PSC, NOT THE FIRST PSC that was built.

external_psc_part2_1_ns_1

Install vCenter Server and the vCenter Server Components, Connecting to the Second PSC

Perform these steps, making sure to connect this vCenter to the PSC that was just installed above, not the first PSC from Part 1.

Installation
 
  1. Launch the vCenter installer and select vCenter Server for Windows. Click Install.
  2. Once the installer initializes, click Next.
  3. Accept the VMware End User License Agreement, and click Next.
  4. Under External Deployment, select vCenter Server and click Next.
  5. Enter the system’s FQDN, and click Next.
  6. Enter the information for the external PSC that was deployed in the section above, and click Next.  This step will register the vCenter with the PSC.
  7. When prompted for certificate validation, click OK to approve the self-signed certificate created by the PSC.
  8. Configure the vCenter service account according to your environment requirements and click Next.  If you are using an external database server, you will need to specify a user service account.

    Note: 
    If you are using a user service account, you will need to make sure it has the “log on as a service” privilege in the local security policy.

  9. Select your database deployment and enter information if necessary, then click Next.
  10. Configure the required ports if necessary to match your environment, and click Next.
  11. Configure the installation directory for the vCenter Server and data, then click Next.
  12. Review all settings, and when ready, click Install.

Next Steps

 

After you've completed this setup, you should now have a functional topology, with two vCenters each connected to an external Platform Services Controller.  The two external PSCs should also be running in enhanced linked mode; to verify, you can log into either vCenter and confirm that you can also manage the inventory of the other.

I will follow this up with my testing of vCenter repointing and recovering from a failed PSC in this topology.
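If you want to verify the linked mode relationship from PowerCLI rather than the Web Client, one quick check is to connect with the -AllLinked switch and confirm that both vCenters show up as connected; vc1.lab.local below is a placeholder name, and the sketch assumes PowerCLI's DefaultVIServerMode is set to Multiple:

    Set-PowerCLIConfiguration -DefaultVIServerMode Multiple -Scope Session -Confirm:$false
    Connect-VIServer -Server vc1.lab.local -AllLinked
    $global:DefaultVIServers

If enhanced linked mode is working, the last line should list both vCenter servers, and inventory from either one will be visible in the same session.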