Zerto 10 Role-based Access Controls via Keycloak

If you’re still on Zerto 9.7 or lower on the Windows Zerto Virtual Manager and have been asking for better role-based access controls (RBAC) for Zerto, then you need to get migrated over to the new Zerto Virtual Manager Appliance (ZVMA)!

About the Zerto Virtual Manager Appliance

The Linux-based Zerto Virtual Manager Appliance (ZVMA) made its debut in Zerto 9.5, and has since become the standard going forward with Zerto, as the last Windows version (of the ZVM) was 9.7. In Zerto 10, there is no Windows ZVM, so migration is now on the table and I’d highly recommend going that route to to prevent being left behind (and I will go more into detail about that in another blog post).

In addition to the underlying OS changing, came a modernization of how the ZVM has been architected. Instead of running everything as a single (or maybe a few) Windows services, Zerto has been built to run as containers on top of MicroK8s on a hardened Debian 11 virtual appliance. Please also note that because it’s Debian 11, the minimum vSphere version that supports it is vSphere 7.x.

That said – there is no separate software package to download and install; the ZVMA is now a fully-packaged OVF that you just deploy in vSphere. The best part is once it’s deployed, you’re ready to use it. This fundamental change on how Zerto has been built also introduced the ability to provide more frequent updates (quarterly) and virtually no disruption as each container can be updated independently without having to disrupt the entire functionality of the ZVM.

Now back to why you’re here…

While in the older versions of Zerto, there were some basic role-based access controls, they relied on vSphere roles, which meant that anyone who needed to log into Zerto would need to have credentials to log onto the vCenter client. This has all changed once you’ve entered the world of the Linux ZVM.

Instead of relying on vSphere permissions for each user, Zerto now has it’s own authentication services built on Keycloak (https://www.keycloak.org/), which provides you with a more secure posture when it comes to safeguarding your ability to recover from something as disruptive as a ransomware attack.

By removing the reliance on vSphere logins (which have typically been integrated to Active Directory), the chances of an elevated AD account becoming compromised will not affect Zerto’s operation because there is no dependency on those logins to get into Zerto. Not even the service account Zerto uses to manage API calls to vCenter can affect Zerto, because it’s not even managed by Zerto. While we’re on that subject, the ZVMA also supports MFA for added security. Additionally, you get to keep tighter grips on who actually has access and can log into vSphere while making sure your recovery environment stays protected/isolated.

Configure Role-based Access Controls in Zerto 10

In this section, I’ll cover what the role-based access controls looks like, what roles and permissions are involved, and how to set a user up and grant the correct roles, because when I first went through this, I didn’t find it as intuitive; so hopefully this helps if anyone reading finds themselves in a similar situation.

Note that before doing this, the assumption is that you’re already familiar with deploying the Linux Zerto Virtual Manager (OVF deployment via vCenter) and have already gone through and changed default passwords as well as paired to your vCenter. If you haven’t done that and need the information to do so, visit https://help.zerto.com for the deployment guide.

Also, this is not the guide for configuring Keycloak for any other integration such as Active Directory or Okta, for example. This is simply using accounts local to the ZVMA (in Keycloak). For other supported integration, visit the Zerto documentation at: https://help.zerto.com

Enable Roles and Permissions

Once you’ve completed the pre-requisite steps above, log onto the Zerto Management page at https://[yourZVMAIPAddress]/management. You must do this in order to leverage the Zerto Roles and Permissions through Keycloak.

  1. In the management interface, click on Security & RBAC on the left navigation bar.
  2. Enable the radio button for “No Access” under Roles & Permissions

    Enabling Roles & Permissions

Create a Keycloak User and Configure Permissions

  1. Log onto the Keycloak administration UI at https://[yourZVMAIPAddress]/auth.
  2. Once logged in, click on the realm dropdown menu and switch from master to zerto.

    Changing the realm to zerto realm in Keycloak
  3. Click on Users on the left navigation bar, and then click the Add user button.

    Add a Keycloak user to the zerto realm
  4. In the create user window, set actions as needed, such as update password (change password upon initial logon) or any other options you require. Click Create when done.

    Keycloak create user dialog
  5. You should now see the user details and several tabs across the top. Click on Role mapping.

    Role mapping in user details in Keycloak
  6. Click the Assign role button

    Assign role in Keycloak
  7. At first glance, don’t worry if you don’t see any Zerto roles. (This is what got me and wasn’t clearly identified in the documentation). Click on the filter dropdown menu on the top left, and select Filter by clients.

    Filter by clients selection in Keycloak
  8. You will now see a full list and a section tagged zerto-client. From that section, select the required roles for your user, and click the Assign button at the bottom.

    Zerto roles listed in Keycloak
  9. You will now see the role(s) assigned to the user.

    Assigned role to user in Keycloak
  10. Finally, before the user can try logging in, click on the Credentials tab at the top, and set the password.

    Set the user's password in Keycloak

Managing Zerto Roles by Using Groups

Maybe you don’t want to manage roles and permissions on a per-user basis, especially at scale. Besides, it’s a best practice to use groups for role management so you can simply add users to them down the road without having to repeat the steps above for each user.

So, if your preferred method to manage roles is by group, you can skip the steps above, and follow these steps and be on your way. Just remember, when you set users up, you still have to set the initial password and other options before they can login.

  1. If you’re not already logged into Keycloak, login at https://[yourZVMAIPAddress]/auth.
  2. Change from the master realm (dropdown on the top left) to the zerto realm.
  3. Click on Groups under the Manage section on the left
  4. Click the Create group button.

    Create a group in Keycloak
  5. Provide a name for your group and click Create

    Create a group in Keycloak
  6. Click on the group you just created.

    Group Created in Keycloak
  7. Click on the Role mapping tab at the top, and click Assign Role

    Assign Role to group in Keycloak
  8. Click on the filter dropdown and select Filter by clients.

    Filter by clients in Keycloak
  9. Scroll down the list to the area tagged zerto-client and select the role(s) you wish to apply to the group you just created. When done, click Assign.

    zerto-client roles in Keycloak
  10. Now, add members to the group (if you have previously created users – otherwise, create users and then add them to the group). Click on the Members tab, and click Add member.

    Add members to group in Keycloak
  11. Select the users to add to the group as members, and click the Add button to finish.

Summary

Managing Zerto users in Zerto 10 via Keycloak doesn’t have to be difficult. It’s quite easy, actually, especially when assigning roles at the group level. By assigning different roles to different users depending on what they need access to be able to do, you’re not only exercising better access controls with Zerto, but you are also providing better security, able to create responsibilities for others without giving them any vSphere permissions, and also reducing your own operational/administrative overhead.

Now the question is whether or not to integrate with Active Directory – that is totally up to you. I’m going to leave you with this piece of advice though. Zerto 10 was built with Keycloak to isolate authentication and provide better security when it comes to recovering from cyberthreats. By choosing not to integrate with AD, there is no other way for bad actors to access Zerto, therefore giving you a better chance at quickly turning the tables on them and recovering to a point in time before any malware/ransomware took over. Zerto 10 also introduced in-line encryption detection, so your protected workloads will have a built-in early warning system, so you’ll be able to not only react faster, but be notified before all hell breaks loose.

Let me know your thoughts in the comments, and feel free to ask me any questions about what was shared here.

I will be working on additional Zerto 10 content, so stay tuned!

Share This:

Configuring AWS for Zerto Virtual Replication

By now, it’s no secret that the IT Resilience Platform that Zerto has come to be known as offers complete flexibility when it comes to multi-cloud agility.  This agility allows businesses to accelerate their digital transformation and truly take advantage of what the public cloud platform offers – ensuring even more freedom to choose your cloud and to be able to replicate workloads to, from, and even between public clouds.  As there have been great improvements in Zerto’s any-to-any story, one in particular I’d like to focus on in this article is AWS (Amazon Web Services).

Starting with Zerto Virtual Replication 6.0, customers now have:

  • Orchestration allowing not only targeting AWS for DR or for workload migration, but now the ability to come back out of AWS to on-premises datacenters, or even the ability to replicate between public cloud providers (AWS, Microsoft Azure, IBM Public Cloud) and Cloud Service Providers (CSPs).
  • Zerto Analytics visibility between all sites, including public cloud, now with network statistics and 30-day history.

Now, while these improvements are exciting and offer even more cloud agility to customers, one can’t help but realize that before you can actually start taking advantage of ZVR 6.0 to achieve a hybrid cloud architecture or DR in the cloud (specifically AWS), there are some pre-requisites to complete before doing so.  That said, meeting those requirements may not seem as intuitive as you’d hope at first glance.

While having a cloud use-case is usually the first step, and is determined by business requirements – the challenge lies within understanding what exactly needs to be configured in AWS for ZVR functionality, and how to accomplish it. If you take a look below, the workflow itself is a multi-step process that may not be very easy to perform, until now.

ZVR AWS Workflow
Figure 1: Configuring AWS for ZVR – Workflow

In my usual fashion of wanting to know exactly how things are done and then sharing it with everyone else, I’ve written a how-to document for configuring AWS for Zerto Virtual Replication, which I am happy to say has been turned into an official Zerto whitepaper and is now available for download!

>> Whitepaper – Configuring AWS for Zerto Virtual Replication <<

As usual, feedback, is welcomed with open arms. If you find this useful, please share and be social!

Share This:

Zerto Virtual Manager Outage, Replication, and Self-Healing

I’ve decided to explore what happens when a ZVM (Zerto Virtual Manager) in either the protected site or the recovery site is down for a period of time, and what happens when it is back in service, and most importantly, how an outage of either ZVM affects replication, journal history, and the ability to recover a workload.

Before getting in to it, I have to admit that I was happy to see how resilient the platform is through this test, and how the ability to self-heal is a built in “feature” that rarely gets talked about.

Questions:

  • Does ZVR still replicate when a ZVM goes down?
  • How does a ZVM being down affect checkpoint creation?
  • What can be recovered while the ZVM is down?
  • What happens when the ZVM is returned to service?
  • What happens if the ZVM is down longer than the configured Journal History setting?

Acronym Decoder & Explanations

ZVMZerto Virtual Manager
ZVRZerto Virtual Replication
VRAVirtual Replication Appliance
VPGVirtual Protection Group
RPORecovery Point Objective
RTORecovery Time Objective
BCDRBusiness Continuity/Disaster Recovery
CSPCloud Service Provider
FOTFailover Test
FOLFailover Live

Does ZVR still replicate when a ZVM goes down?

The quick answer is yes.  Once a VPG is created, the VRAs handle all replication.    The ZVM takes care of inserting and tracking checkpoints in the journal, as well as automation and orchestration of Virtual Protection Groups (VPGs), whether it be for DR, workload mobility, or cloud adoption.

In the protected site, I took the ZVM down for over an hour via power-off to simulate a failure.  Prior to that, I made note of the last checkpoint created.  As the ZVM went down, within a few seconds, the protected site dashboard reported RPO as 0 (zero), VPG health went red, and I received an alert stating “The Zerto Virtual Manager is not connected to site Prod_Site…”

The Zerto Virtual Manager is not connected to site Prod_Site

 

Great, so the protected site ZVM is down now and the recovery site ZVM noticed.  The next step for me was to verify that despite the ZVM being down, the VRA continued to replicate my workload.  To prove this, I opened the file server and copied the fonts folder (C:\Windows\Fonts) to C:\Temp (total size of data ~500MB).

As the copy completed, I then opened the performance tab of the sending VRA and went straight to see if the network transmit rate went up, indicating data being sent:

VRA Performance in vSphere, showing data being transmitted to remote VRA in protected site.

Following that, I opened the performance monitor on the receiving VRA and looked at two stats: Data receive rate, and Disk write rate, both indicating activity at the same timeframe as the sending VRA stats above:

Data receive rate (Network) on receiving/recovery VRA Disk write rate on receiving/recovery VRA

As you can see, despite the ZVM being down, replication continues, with caveats though, that you need to be aware of:

  • No new checkpoints are being created in the journal
  • Existing checkpoints up to the last one created are all still recoverable, meaning you can still recover VMs (VPGs), Sites, or files.

Even if replication is still taking place, you will only be able to recover to the latest (last recorded checkpoint) before the ZVM went down.  When the ZVM returns, checkpoints are once again created, however, you will not see checkpoints created for the entire time that ZVM was unavailable.  In my testing, the same was true for if the recovery site ZVM went down while the protected site ZVM was still up.

How does the ZVM being down affect checkpoint creation?

If I take a look at the Journal history for the target workload (file server), I can see that since the ZVM went away, no new checkpoints have been created.  So, while replication continues on, no new checkpoints are tracked due to the ZVM being down, since one of it’s jobs is to track checkpoints.

Last checkpoint created over 30 minutes ago, right before the ZVM was powered off.

 

What can be recovered while the ZVM is down?

Despite no new checkpoints being created – FOT or FOL – VPG Clone, Move, and File Restore services are still available for the existing journal checkpoints.  Given this was something I’ve never tested before, this was really impressive.

One thing to keep in mind though is that this will all depend on how long your Journal history is configured for, and how long that ZVM is down.  I provide more information about this specific topic further down in this article.

What happens when the ZVM is returned to service?

So now that I’ve shown what is going on when the ZVM is down, let’s see what happens when it is back in service.  To do this, I just need to power it back up, and allow the services to start, then see what is reported in the ZVM UI on either site.

As soon as all services were back up on the protected site ZVM, the recovery site ZVM alerted that a Synchronization with site Prod_Site was initiated:

Synchronizing with site Prod_Site

Recovery site ZVM Dashboard during site synchronization.

The next step here is to see what our checkpoint history looks like.  Taking a look at the image below, we can see when the ZVM went down, and that there is a noticeable gap in checkpoints, however, as soon as the ZVM was back in service, checkpoint creation resumed, with only the time during the outage being unavailable.

Checkpoints resume

 

What happens if the ZVM is down longer than the configured Journal History setting?

In my lab, for the above testing, I set the VPG history to 1 hour.  That said, if you take a look at the last screen shot, older checkpoints are still available (showing 405 checkpoints).  When I first tried to run a failover test after this experiment, I was presented with checkpoints that go beyond an hour.  When I selected the oldest checkpoint in the list, a failover test would not start, even if the “Next” button in the FOT wizard did not gray out.  What this has lead me to believe is that it may take a minute or two for the journal to be cleaned up.

Because I was not able to move forward with a failover test (FOT), I went back in to select another checkpoint, and this time, the older checkpoints were gone (from over an hour ago).  Selecting the oldest checkpoint at this time, allowed me to run a successful FOT because it was within range of the journal history setting.  Lesson learned here – note to self: give Zerto a minute to figure things out, you just disconnected the brain from the spine!

Updated Checkpoints within Journal History Setting

Running a failover test to validate successful usage of checkpoints after ZVM outage:

File Server FOT in progress, validating fonts folder made it over to recovery site.

And… a recovery report to prove it:

Recovery Report - Successful FOT Recovery Report - Successful FOT

 

Summary and Next Steps

So in summary, Zerto is self-healing and can recover from a ZVM being down for a period of time.  That said, there are some things to watch out for, which include known what your configured journal setting is, and how a ZVM being down longer than the configured history setting affects your ability to recover.

You can still recover, however, you will start losing older checkpoints as time goes on while the ZVM is down.  This is because of the first-in-first-out (FIFO) nature of how the journal works.  You will still have the replica disks and journal checkpoints committing to it as time goes on, so losing history doesn’t mean you’re lost, you will just end up breaching your SLA for history, which will re-build over time as soon as the ZVM is back up.

As a best practice, it is recommended you have a ZVM in each of your protected sites, and in each of your recovery sites for full resilience.  Because after all, if you lose one of the ZVMs, you will need at least either the protected or recovery site ZVM available to perform a recovery.  The case is different if you have a single ZVM.  If you must have a single ZVM, put it into the recovery site, and not on the protected site, because chances are, your protected site is what you’re accounting for going down in any planned or unplanned event.  It makes most sense to have the single ZVM in the recovery site.

In the next article, I’ll be exploring this very example of a single ZVM and how that going down affects your resiliency.  I’ll also be testing some ways to potentially protect that single ZVM in the event it is lost.

Thanks for reading!  Please comment and share, because I’d like to hear your thoughts, and am also interested in hearing how other solutions handle similar outages.

Share This:

Zerto Automation with PowerShell and REST APIs

Zerto is simple to install and simple to use, but it gets better with automation!  While performing tasks within the UI can quickly become second nature, you can quickly find yourself spending a lot of time repeating the same tasks over and over again.  I get it, repetition builds memory, but it gets old.  As your environment grows, so does the amount of time it takes to do things manually.  Why do things manually when there are better ways to spend your time?

Zerto provides great documentation for automation via PowerShell and REST APIs, along with Zerto Cmdlets that you can download and install to add-on to  PowerShell to be able to do more from the CLI.  One of my favorite things is that the team has provided functional sample scripts that are pretty much ready to go; so you don’t have to develop them for common tasks, including:

  • Querying and Reporting
  • Automating Deployment
  • Automating VM Protection (including vRealize Orchestrator)
  • Bulk Edits to VPGs or even NIC settings, including Re-IP and PortGroup changes
  • Offsite Cloning

For automated failover testing, Zerto includes an Orchestrator for vSphere, which I will cover in a separate set of posts.

To get started with PowerShell and RESTful APIs, head over to the Technical Documentation section of My Zerto and download the Zerto PowerShell Cmdlets (requires MyZerto Login) and the following guides to get started, and stay tuned for future posts where I try these scripts out and offer a little insight to how to run them, and also learn how I’ve used them!

  • Rest APIs Online Help – Zerto Virtual Replication
    • The REST APIs provide a way to automate many DR related tasks without having to use the Zerto UI.
  • REST API Reference Guide – Zerto Virtual Replication
    • This guide will help you understand how to use the ZVR RESTful APIs.
  • REST API Reference Guide – Zerto Cloud Manager
    • This guide explains how to use the ZCM RESTful APIs.
  • PowerShell Cmdlets Guide – Zerto Virtual Replication
    • Installation and use guide for the ZVR Windows PowerShell cmdlets.
  • White Paper – Automating Zerto Virtual Replication with PowerShell and REST APIs
    • This document includes an overview of how to use ZVR REST APIs with PowerShell to automate your virtual infrastructure.  This is the document that also includes several functional scripts that take the hard work out of everyday tasks.

If you’ve automated ZVR using PowerShell or REST APIs, I’d like to hear how you’re using it and how it’s changed your overall BCDR strategy.

I myself am still getting started with automating ZVR, but am really excited to share my experiences, and hopefully, help others along the way!  In fact, I’ve already been working with bulk VRA deployment, so check back or follow me on twitter @EugeneJTorres for updates!

Share This: