Introduction
Based on my previous blog posts, it’s apparent that I’ve spent the past few months testing VMware Site Recovery Manager and Zerto Virtual Replication to see which product best meets our business continuity and disaster recovery requirements. My task was to compare the two products feature for feature against our use cases, which are primarily protection, recovery, re-protection, and workload migration.
Get comfortable, this could take a while…
Blue vs. Red
As of today, SRM and Zerto have been tested in a sandbox environment consisting of 2 sites (Seattle and Denver), 2 vCenters, 2 physical hosts in a cluster in each site, and 1 test workload: a Windows Server VM with auto-generated files of different sizes. The two sites are geographically separated and joined by a dual 20 Gb/s connection. There are no bandwidth throttling mechanisms in place outside of what’s available in the software, and that is only used to throttle down during business hours. The physical networking at the host level in both sites is 10GbE.
VMware’s Site Recovery Manager is the only one of the two products that has the array-based replication feature, so to make this more of an “apples-to-apples” comparison, that feature isn’t heavily reported on here, but has been tested, and it works well, so I’m happy.
Both hypervisor-based products have been tested in each direction for recovery testing, failover, re-protection, and migration. The results of both solutions are similar; however, based on those results, we are leaning more toward one product in terms of simplicity, flexibility, scalability, monitoring capabilities, and user experience.
Below are images of what the topology for both test environments looks like, with SRM on the left, and Zerto on the right.
If you are interested in seeing these diagrams up close, you can download the PDFs for each here:
^^ Not pictured in the Zerto Diagram: External PSCs for vCenter, vCenter SQL Servers, and all port communication native to vCenter components.
Product Comparison
VMware Site Recovery Manager forms a complete solution when paired with vSphere Replication (which can also be used without SRM), and Zerto likewise protects using hypervisor-based replication. To compare the two fairly, we must first compare vSphere Replication (without SRM) to Zerto Virtual Replication. Note that without SRM, vSphere Replication is rather limited in several features. The tables below lay out the use cases and features of each product.
Use Cases
VMware vSphere Replication Use Cases | Zerto Virtual Replication Use Cases |
---|---|
|
|
Feature Comparison: vSphere Replication (Without SRM) and Zerto Virtual Replication
Feature | VMware vSphere Replication Features & Benefits | Zerto Virtual Replication Features & Benefits |
---|---|---|
Licensing Requirement | VMware Essentials Plus and Above | VMware Essentials |
Automation/Orchestration of Disaster Recovery | Manual; PowerCLI provides basic automation (add to inventory, power on/off). Otherwise, use SRM with vSphere Replication | Full automation/orchestration features |
Version Compatibility | vSphere Replication version must match vCenter version | Zerto can be used with vSphere 4.0 and later; the hypervisor and vCenter versions do not have to match across components. |
Automated Recovery Capabilities | Each VM in the recovery site must be powered on manually. | Fully automated recovery capabilities. |
Automated Connection to correct network(s) | Manually done when recovering with vSphere Replication. For automation of post-recovery tasks, use SRM. | Fully automated |
WAN Compression | Network compression is available with 6.1 at the cost of vSphere Replication appliance CPU resources. Note: 1 vR appliance per vCenter instance is supported for a maximum of 2000 VMs protected per appliance. | Built-in, often seeing a 50% compression ratio. Replication appliances are assigned at a 1:1 ratio (host to VRA) with automated resource reservations to ensure best performance of replication appliances. |
IP Re-Addressing | Manual process. For automated re-IP, use SRM | Built in to failover plan (assigned in VPG) |
Non-Disruptive Testing | Not available since you cannot power on the replica VM if the original VM is still running and reachable. Use SRM with vSphere Replication to allow for recovery testing. | Real or bubble networks can be used for recovery testing and isolation. |
Cloning Capability | None | Allows for recovery site clones. This allows for full long-term archival backups of the VMs or file-level recovery from a point-in-time clone. |
Failback Option | None - SRM required. | Automated failback workflow capability |
Point-in-Time Recovery | Available with vSphere Replication 6.x - maximum of 24 PIT instances. Uses VMware snapshots. | Configurable; up to 1 year when using the Offsite Backup feature. Does not use VMware snapshots. |
RDM (Raw Device Mapping) Support | No physical RDM support, but virtual RDMs are supported. | Both physical and virtual mode RDMs are supported. |
Bandwidth Control | None | Throttling and priorities are available in Zerto to reduce bandwidth consumption during certain times, and unlimited at others, via schedule. |
vApp Support | Not Supported | Zerto leverages vApps to make administration easier. If a vApp is configured for protection with a VPG, then any VM added to the vApp is automatically protected. |
Storage DRS Support | Not supported, SRM is required. | Storage DRS is supported and works with Zerto. |
RPO Range | 15 minutes to 24 hours | Seconds |
How VMs are Chosen | Selected individually or through multi-selecting in the interface, but protection grouping is not available. | VMs can be organized into Virtual Protection Groups. |
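To make the RPO row a bit more concrete, here is a minimal sketch of what an RPO check boils down to: comparing the age of the newest recoverable copy against the configured RPO. This is an illustration only, not how either product implements it; the function name and the timestamps are made up for the example.

```python
from datetime import datetime, timedelta

def rpo_violated(last_sync: datetime, now: datetime, rpo: timedelta) -> bool:
    """Return True when the newest recoverable copy is older than the RPO."""
    return now - last_sync > rpo

# Hypothetical timestamps: the last successful sync finished 20 minutes ago.
now = datetime(2016, 6, 1, 12, 0, 0)
last_sync = datetime(2016, 6, 1, 11, 40, 0)

print(rpo_violated(last_sync, now, timedelta(minutes=15)))  # tightest RPO in the table
print(rpo_violated(last_sync, now, timedelta(hours=24)))    # loosest RPO in the table
```

With a 15-minute RPO the 20-minute-old copy counts as a violation, while with a 24-hour RPO it does not.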
Feature Comparison: vSphere Replication (with SRM) and Zerto Virtual Replication
Feature | VMware vSphere Replication with SRM | Zerto Virtual Replication |
---|---|---|
Provides planning, testing, and execution of disaster recovery for vSphere: | Yes | Yes |
Designed for: | SRM was designed for disaster recovery orchestration only | Designed for hypervisor-based replication AND disaster recovery orchestration |
Licensed: | Per-VM | Per-VM |
Replication granularity: | Per-VM or multi-select but virtual protection grouping is not available | Per-VM and/or Per-Virtual Protection Group |
Configure consistency groups (virtual protection groups) | No | Yes |
Replication recovery points: | Yes, up to 24 snapshots | Yes, up to 14 days with standard recovery, up to 1 year with extended recovery using the Offsite Backup feature. |
Compatibility: | vSphere Replication works with ESX 5.x and above. SRM requires the same version of vCenter and SRM be installed at both sites. | Zerto works with ESXi 4.0 U1 and above. Zerto can replicate between different versions of vCenter. Zerto can also protect and recover from vSphere to Hyper-V, Hyper-V to vSphere, and either virtualization platform to the cloud (AWS, Azure (Zerto v5.0)). |
Managed with: | vSphere Client Plugin | vSphere Client Plugin and standalone browser UI |
Replication is performed with: | vSphere Replication | Zerto hypervisor-based replication through VRAs deployed to each host with protected VMs |
Feature Comparison: VMware Site Recovery Manager & Zerto Virtual Replication API Availability
The following table displays the availability, use cases, and capabilities of both the VMware Site Recovery Manager and Zerto Virtual Replication APIs for access, integration, and automation.
VMware Site Recovery Manager | Zerto Virtual Replication | |
---|---|---|
Availability |
|
|
Use Cases |
|
|
Capabilities |
|
|
Programming Environments/Supported Languages |
|
|
System Requirements
The following table outlines system requirements for both VMware Site Recovery Manager and Zerto Virtual Replication.
Requirement | VMware Site Recovery Manager 6.1 | Zerto Virtual Replication 4.5 U3 |
---|---|---|
Virtualization Management |
|
|
Hypervisor |
|
|
vSphere Replication Appliance |
|
|
Storage Replication Adapter |
|
|
Client |
|
|
vSphere Replication Appliance Resource Requirements (per site) |
|
|
Zerto Virtual Replication Appliance (VRA) |
|
|
Recovery Orchestration Provided By |
|
|
SRM6.1/ZVM 4.5U3 Server Requirements (1 per site) |
|
|
Supported Databases |
|
|
Bandwidth Requirements |
|
|
Number of Firewall Ports for Cross-site Communication, Replication, and Recovery |
|
|
Steps from Installation to Protection
The following table compares the high-level installation tasks/steps for VMware Site Recovery Manager and Zerto Virtual Replication. These steps assume necessary pre-requisites such as vCenter installation and firewall rules have been created.
Please note that SRM appears to have many more steps because it supports array-based replication in addition to vSphere Replication. If you use only one of the two, the number of steps decreases dramatically. In my test environment, both features were tested, and because of that, SRM has more steps.
VMware Site Recovery Manager | Zerto Virtual Replication |
---|---|
|
|
Protection Workflow
The following workflows have been created to illustrate the process involved in protecting virtual workloads using VMware Site Recovery Manager with vSphere Replication, and Zerto Virtual Replication.
Individual files for each protection workflow in full-size view are here:
In the above images, SRM is on the left and Zerto is on the right; visually, you can see that SRM clearly has many more steps, performed in multiple places, compared to Zerto. The majority of the additional steps in the SRM protection workflow deal with the multiple layers where protection is configured via the vSphere Web Client for a single VM using vSphere Replication. On the right side (Zerto), you see that most of the steps (if not all) for protecting virtual workloads take place at the top layer, which is the Zerto Virtual Manager UI.
In SRM, protecting a single VM using vSphere Replication involves selecting the VM, enabling vSphere Replication, going into Site Recovery, building a protection group and configuring it, and then creating a recovery plan and configuring it. The recovery plan portion is where customization such as boot priority and IP address changes is completed.
In Zerto, protecting a single VM is as easy as logging into the ZVM UI, creating a VPG, and providing protection and recovery settings all within one wizard.
Recovery Workflow
The following workflows have been created to illustrate the process involved in recovering from a site failure using VMware Site Recovery Manager with vSphere Replication, and Zerto Virtual Replication.
Individual files for each recovery workflow in full-size view are here:
In the above images, SRM is on the left and Zerto is on the right; visually, you can see that the steps to recovery are fairly similar, with the exception that recovery in SRM is performed via the vSphere Web Client, while recovery from Zerto is performed from the ZVM UI (recovery performed at the recovery site in both scenarios). The most complex part about recovering in any scenario is the organization of admins/engineers/business stakeholders to recover, re-configure, and validate the recovery process. Of course, if routine recovery testing has been taking place, a real failure should basically mimic a recovery test, although at that point it is a commitment rather than an exercise.
In SRM, there really is one place to take care of a recovery, and that is in Site Recovery > Recovery Plans. Locate the recovery plan for the application(s) you want to recover, and click the red button – it’s a no-brainer!
In the Zerto UI home screen, toggle the failover type from test to “live”, and click the recover button. When you click the button, you will be presented with a 3-step wizard, where you select the VPG(s) to recover; select the checkpoint to recover from, set the commit policy, and choose whether to re-protect; and click the “start failover” button. Recovery and re-protection all in 1 place. The re-protection process in either product is straightforward; however, if there isn’t already a site built to re-protect to, there will be some work to do (in either case).
Implementation Time and Complexity
Planning, designing, and implementing either of these two products shouldn’t be difficult for anyone, but several pre-requisites take time: change management processes and schedules to follow, and firewall rules to create and verify. With SRM, I’ve found that because the product ties so closely into vSphere and version matching is a requirement, this could delay anyone who doesn’t have a version-aligned environment or doesn’t have experience with vSphere or SRM. The biggest requirement for SRM? vSphere – you will have to have a fully functional vSphere deployment, at an exact minimum version in both sites, in order to deploy SRM successfully. Zerto doesn’t care if the vCenter/ESXi versions on both sites match, as long as the minimum supported version is in use.
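The difference between the two version rules can be sketched in a few lines. This is only a model of the two policies described above ("both sites must match" versus "each site must meet a minimum"), not either vendor's actual compatibility check. The function names and the version numbers are hypothetical; always consult the official interoperability matrices.

```python
def parse_version(text: str) -> tuple:
    """Turn '6.0' or '5.5.1' into a 3-part tuple so versions compare numerically."""
    parts = [int(part) for part in text.split(".")]
    return tuple(parts + [0] * (3 - len(parts)))

def strict_match_ok(site_a: str, site_b: str) -> bool:
    """Model of a 'both sites must run the same version' rule."""
    return parse_version(site_a) == parse_version(site_b)

def minimum_version_ok(site_a: str, site_b: str, minimum: str) -> bool:
    """Model of an 'each site only has to meet a minimum version' rule."""
    floor = parse_version(minimum)
    return parse_version(site_a) >= floor and parse_version(site_b) >= floor

# Hypothetical versions for the two sandbox sites.
seattle, denver = "6.0", "5.5"

print(strict_match_ok(seattle, denver))            # mismatched sites fail the strict rule
print(minimum_version_ok(seattle, denver, "5.0"))  # both sites clear a 5.0 minimum
```

Under the strict rule, one site lagging behind blocks the deployment until it is upgraded; under the minimum rule, the same pair of sites is acceptable.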
Granular requirements can make for administrative overhead and total team collaboration in the case of upgrades, maintenance, recovery, etc… because SRM relies heavily on version compatibility (as do other VMware products). In cases like this, there are specific orders of operations required for upgrades or power-on operations. These requirements are out of scope, but it pays to understand that they exist; so be sure to do some research, and if you can, test it before performing in production.
When installing Zerto, what took the most time was building the Windows VMs (a few hours x 2) to house ZVM in each site… that and firewall rules (about 2 weeks in my case, following approval, change management, and implementation). Once the VMs were built and the firewall rules were in place, the actual time taken to install Zerto was about 10-15 minutes per ZVM, and approximately 10 minutes to deploy each VRA, which can also be bulk scripted. Zerto works as long as the hypervisor and vCenter are at a minimum version supported by Zerto, but it can protect across versions, or even hypervisors (VMware vSphere & Microsoft Hyper-V)! VPG creation time can vary, depending on how many VMs per VPG you want to protect and how much you customize the options, with one of the more time-consuming items being recovery and test IP settings. That’s it. Once you have a VPG created, initial synchronization starts, and as soon as the sites are in sync, you’re ready to test, recover, or migrate and re-protect.
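As a rough worked example, the install figures above can be added up for the sandbox. The sketch assumes one ZVM per site and one VRA per host (so 2 ZVMs and 4 VRAs for 2 sites with 2 hosts each), and that everything is installed one after another; bulk-scripted VRA deployment would shorten this. The function name is made up for the illustration.

```python
def install_minutes(zvms: int, vras: int, zvm_minutes=(10, 15), vra_minutes=10):
    """Return a (low, high) estimate in minutes for sequential installs."""
    low = zvms * zvm_minutes[0] + vras * vra_minutes
    high = zvms * zvm_minutes[1] + vras * vra_minutes
    return low, high

# Sandbox: 2 sites, 1 ZVM per site, 2 hosts per site with 1 VRA each.
print(install_minutes(zvms=2, vras=4))  # (60, 70)
```

That works out to roughly an hour of hands-on install time, which is small next to the few hours of VM builds and the weeks of firewall change management.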
Monitoring and Reporting
Monitoring and Reporting with VMware Site Recovery Manager
VMware Site Recovery Manager provides monitoring and reporting; however, what you see is limited depending on where you are in the object hierarchy (but the data is there!):
- number of replicated VMs per host
- amount of data transferred
- number of RPO violations
- replication count
- number of sites successfully connected
These reports can also be expanded to show more detail, and the date range can be modified. In my experience during testing, monitoring replication status and information isn’t as intuitive and centrally located as you would expect. There are several different places to monitor protection status and get additional information.
Some of this is at the VM level, where you will see replication status, last sync point, target site, quiescing (enabled/disabled), network compression (enabled/disabled), RPO, point-in-time recovery (enabled/disabled), and disk status.
Monitoring at the VM Object
At the VM (protected VM) level, you can monitor replication performance; however, it is limited to 2 counters, which are:
- Replication Data Receive Rate (Average in KBps)
- Replication Data Transmit Rate (Average in KBps)
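Because these counters are reported in KBps while links are usually quoted in Mb/s or Gb/s, a quick conversion helps put a reading in context. This sketch assumes KBps means kilobytes per second with 1 KB = 1000 bytes (if the counter uses 1024-byte kilobytes the result is about 2.4% higher), and the sample reading is hypothetical.

```python
def kbps_to_mbit_per_s(kilobytes_per_second: float) -> float:
    """Convert a KBps counter value to megabits per second (1 KB = 1000 bytes assumed)."""
    return kilobytes_per_second * 8 / 1000

def link_share(kilobytes_per_second: float, link_gbit_per_s: float = 10.0) -> float:
    """Fraction of a link (default 10GbE) that the replication traffic occupies."""
    return kbps_to_mbit_per_s(kilobytes_per_second) / (link_gbit_per_s * 1000)

reading = 12500  # hypothetical transmit rate in KBps
print(kbps_to_mbit_per_s(reading))  # megabits per second
print(link_share(reading))          # share of a 10GbE host uplink
```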
Monitoring at the Site Recovery > Sites Level
At the site level, you can monitor things like issues, recovery plan history, and also get basic protection group and recovery plan information for Array Based Replication, Protection Groups, and Recovery Plans:
Monitoring at the Protection Group Level
At the protection group level, the summary tab will give you information such as status, number of VMs that are in the protection group, configuration status of those VMs, and any replication warnings (not clickable for more detail):
Selecting a protection group gives you a list of recovery plans, and VMs, and general protection information, but no logging or reporting.
Monitoring at the Recovery Plan Level
At the recovery plan level, when you select a recovery plan you see the plan status, VM status, and recent history if the recovery plan has been run for testing or failover:
Digging deeper into a recovery plan, you have the ability to see recovery plan steps, history, protection group general protection information, and virtual machine general protection information:
Monitoring vSphere Replication at the vCenter Level
One more place that I was able to find monitoring and reporting is at the vSphere Replication level. Go to vSphere Replication in the vSphere Web Client, then click on a vCenter. From there, going to the Monitor tab and clicking on vSphere Replication will take you to the screen in the image below, where you can monitor Outgoing Replications, Incoming Replications, View Reports, and Cloud Recovery Settings. The reports section looks to contain the most information; however, there isn’t a way in the UI to export reports if a customer requests a report to show the history of their replication jobs.
Monitoring Outgoing Replications (per vCenter)
This section displays any Point in Time snapshots that can be recovered to if it has been configured, and replication information (although very general) such as:
- Status
- VM
- Target Site
- vR Server used
- Configured Disks
- Last Instance Sync Point
- Last Sync Duration
- Last Sync Size
- RPO
- Quiescing (enabled/disabled)
- Network Compression (enabled/disabled)
Monitoring Incoming Replications (per vCenter)
This section displays Point in Time Snapshots, Recovery history, and Replication information (again all general) such as:
- Status
- VM (when a VM is selected above)
- Target Site
- vR Server
- Configured Disks
- What manages the incoming replications (in this case, it’s SRM)
- Last instance sync point
- Last sync duration
- Last sync size
- RPO
- Quiescing (enabled/disabled)
- Network Compression (enabled/disabled)
Reporting for vSphere Replication (per vCenter)
This section contains statistical information that can be filtered by date range. This section is a little more detailed (my favorite view), and actually contains numbers on graphs. It contains information such as:
- Count of replicated vs non-replicated VMs
- Replicated VMs by host(s)
- Transferred bytes
- RPO violations
- Replications Count
- Site connectivity status
- vR Server Connectivity (not pictured)
While this is great information, there is no way from the interface to export the reports if needed.
Cloud Recovery Testing
This section contains general information on any replications to the cloud. Since we are not replicating to the public cloud, this section is empty, but I have shown it to display what detail it contains.
Based on the findings for monitoring vSphere Replication and SRM, as shown above, there are multiple places to look for information, statistics, and reports. The problem here is that monitoring any ongoing replication jobs and/or recoveries and performance is a multi-tiered approach, and there is no centralization of information that is exportable for review. There are too many places to look for information, and it would be too tedious to effectively monitor protection jobs, recoveries, and performance out-of-the-box.
Monitoring and Reporting in Zerto Virtual Replication
Monitoring protection status in Zerto has been intuitive, detailed, and centralized. Zerto has separated the two functions into “tabs” within the UI: one tab for monitoring (includes tasks and alerts), and one tab for reporting. The ability to set Zerto up to alert via e-mail and send reports at a regular, scheduled interval is natively built into the product. The product doesn’t stop at 1 e-mail destination, as it also allows for multiple recipients, separated by commas or semicolons, in the site settings. In the resource reports, you can set the sampling rate and the sampling time interval. In terms of BC/DR solutions, it is far preferable to receive more information than necessary than to wait for a problem to surface. Nothing is more embarrassing or resume-generating than finding out at the point of a failure that your replication product hasn’t been replicating much or hasn’t been able to meet your RPO/RTO.
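If you keep the alert recipient list in your own documentation or scripts, the comma-or-semicolon format mentioned above is easy to sanity-check before pasting it into the site settings. The sketch below is a generic string-handling example, not Zerto code; the function name and the addresses are placeholders.

```python
import re

def split_recipients(field: str) -> list:
    """Split a recipient string on commas or semicolons and drop empty entries."""
    return [addr.strip() for addr in re.split(r"[;,]", field) if addr.strip()]

recipients = split_recipients("dr-team@example.com; noc@example.com,oncall@example.com")
print(recipients)
# ['dr-team@example.com', 'noc@example.com', 'oncall@example.com']
```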
In the Zerto UI, monitoring alerts, events, and tasks is as simple as clicking on the “monitoring” tab. You can search for specific events or alerts (or both), and also modify the timeframe that you are targeting. In the reporting tab, you can get reports for the following items, and you can select any of them per VPG, or for all VPGs (and customize the reporting dates).
- VPG Performance (RPO in seconds, IOPS, Throughput (MB/s), and WAN traffic (MB/s))
- Outbound Protection Over Time (data in GB) – for each recovery site
- Protection Over Time by Site (Journal Usage in GB, VMs protected by count)
- Recovery Reports by VPG, type, and/or status
- Resource Report – shows resources used by protected VMs, which is required by Zerto to ensure recovery capability. (Exports to Excel)
- Usage – exports to CSV, PDF, or ZIP
Conclusion
In conclusion, both products work as advertised, and deciding which product to go with may come down to trust, flexibility, simplicity, scalability, monitoring & reporting, re-protection capabilities, and of course, cost. When considering the cost of either solution, be sure to also include the cost of human hours required to successfully deploy and support either one. Both products have their benefits and quirks, but the bottom line is that THEY BOTH WORK GREAT!
Since I also went through the entire process from design to implementation, to protection, testing, and recovery – it took a considerable amount of time for VMware Site Recovery Manager to become usable due to some external problems we were having, so that sort of left a bad taste in my mouth (it was frustrating – but that was specific to my environment). Because Zerto wasn’t affected by those existing problems in terms of being prevented from working, it felt much simpler, but don’t get me wrong, you still have to plan for your deployment. The time that it took to deploy and have both products functioning varied considerably, with Zerto coming in as the winner in terms of time to protection versus Site Recovery Manager in my experience (again related to the underlying problems in my environment).
Array-based replication is an optional feature of SRM, and once we figured out what was needed on the SAN side for this to work properly, it actually runs nicely. This method has historically been an expensive route to go due to the requirement of needing to have the same storage (vendor at least) in each site (protected and recovery). This also introduces another layer of complexity in configuration, administration, maintenance, and support alignment, which will involve SAN administrators. vSphere Replication, on the other hand, is easy to set up and you can be replicating VMs using this method in a short period of time.
Scalability is another area I researched, and I determined that both products can protect up to 5000 VMs per vCenter instance (refer to comparison tables).
vSphere Replication (without Site Recovery Manager) has a limitation of 1 vSphere Replication appliance per vCenter instance. When leveraging the additional vSphere Replication Servers (limit of 9 per vSphere Replication appliance), you can protect up to 2000 VMs – see here for details. When pairing vSphere Replication with Site Recovery Manager and array-based replication, you can achieve protection of up to 5000 VMs per vCenter instance. (SRM Operation Limits)
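The vSphere Replication numbers above imply a simple sizing rule. Taking 1 appliance plus 9 additional servers as 10 replication servers for 2000 VMs gives about 200 VMs per server; that per-server figure is derived from the totals quoted here and assumes VMs are spread evenly, so treat it as an estimate and check VMware's operational limits for your version. The function name is hypothetical.

```python
import math

def vr_servers_needed(vms: int, vms_per_server: int = 200, max_servers: int = 10) -> int:
    """Estimate how many replication servers (appliance included) a VM count requires."""
    needed = math.ceil(vms / vms_per_server)
    if needed > max_servers:
        raise ValueError(f"{vms} VMs exceeds the {max_servers * vms_per_server} VM limit")
    return needed

print(vr_servers_needed(750))   # 4: the appliance plus 3 additional servers
print(vr_servers_needed(2000))  # 10: the appliance plus all 9 additional servers
```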
Zerto can scale out to take advantage of cluster resources by deploying a VRA (virtual replication appliance) to each host in a cluster where you are protecting VMs. The VRAs come at no additional cost (both products are licensed per VM being protected) and can be sized as needed for best performance. When deploying Zerto VRAs, you will need IP addresses, so that’s one downside to having one per host, especially in large environments. On the plus side, you can deploy all those VRAs from one screen and their deployments can be automated, so that saves time.
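Since every VRA needs its own IP address, it is worth checking up front whether the target subnet has room. Here is a minimal sketch using Python's standard `ipaddress` module; the subnet, the number of addresses already in use, and the function names are all hypothetical, and the usable-address math assumes a conventional IPv4 subnet of /30 or larger.

```python
import ipaddress

def vra_ips_needed(hosts_with_protected_vms: int) -> int:
    """One VRA per host, and each VRA needs its own IP address."""
    return hosts_with_protected_vms

def subnet_has_room(cidr: str, addresses_in_use: int, needed: int) -> bool:
    """Check whether an IPv4 subnet (/30 or larger) has enough free addresses."""
    network = ipaddress.ip_network(cidr)
    usable = network.num_addresses - 2  # exclude network and broadcast addresses
    return usable - addresses_in_use >= needed

# One sandbox site: 2 hosts, so 2 VRAs and 2 IP addresses.
needed = vra_ips_needed(hosts_with_protected_vms=2)
print(needed)
print(subnet_has_room("10.10.20.0/28", addresses_in_use=10, needed=needed))
```

In a large environment the same check scales to hundreds of hosts, which is where the one-IP-per-host requirement starts to matter.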
Compatibility of each product and their requirements vary as well, with SRM having more requirements in both sites (protected and recovery). Since Zerto is basically deployed on top of a virtualization infrastructure, it is not tightly integrated into the base vSphere product, nor does it rely on the same version requirements as SRM. Zerto is very flexible in versioning for both protected and recovery sites, and it can also protect and recover to/from vSphere and Microsoft Hyper-V, or cloud providers.
Lastly, while I’m not a seasoned programmer or script guru, at a high level both products can be programmatically managed, and both support PowerShell (with SRM requiring the PowerCLI add-on from VMware). Both products can also leverage vRealize Orchestrator, allowing workflow automation for protection tasks. Both products include support for multiple scripting/programming languages and have their APIs documented; however, in the case of SRM, the creation of recovery plans and forced failovers cannot be automated (per the API documentation). Zerto can be managed through a feature-rich RESTful API that allows management of pretty much every aspect of the product and its capabilities, and its documentation is clear and full of example scripts in each of the supported languages for everyday tasks.
I hope this information has been helpful for those who are trying to decide which product to go with, and as always, comments or questions are welcome! And if you find this to be useful information, please share it!