Scripted Redis Enterprise installation in Azure

TLDR

I have created a scripted installation for Redis Enterprise on Azure - without a lot of “mandatory tech” that you might find in other solutions. If you want to know why, you’ll find my decisions and the opinions they’re based on in the text below.

Repo is freely available on Github here: https://github.com/jayonthenet/azdemo

Why this thing?

First question I got from a colleague when I showed these little scripts. I had an answer at the ready, but getting this as a first question made me think a little longer, because questioning the existence of something already expresses a strong opinion. So here are some explanation models:

  1. I wanted to have some practice with technology xyz …
  2. I wanted to have a quick and dirty setup that leaves me in control
  3. Redis Enterprise architecture is opinionated and I honor this with my deployment method
  4. Many of my customers are just where they are with their cloud adoption journey and using a full-blown hybrid scenario based on immutable infra tooling is just not helping them any
  5. I wanted to play around with technology xyz … 😅

Let’s dig into the most relevant ones - 3️⃣ and 4️⃣.

The main trigger for me to have an installation at the ready in this form was 4️⃣ - why? Because I want to have something at my fingertips when I start a customer conversation. Something I can put on “share screen” and show. Something that does not depend on a whole lot of frameworks or technologies. At Redis we of course have a sample involving Terraform that is cloud agnostic and very sophisticated, but that is not what I want to explain when there’s no prior knowledge present, or when the main scope is just Azure. I want to talk about Redis. So the assumptions are:

  • Everyone who works with Azure and is either in Dev or Ops has at least seen the Azure CLI
  • Everyone who develops in modern times has seen some shell scripts, or can pick up the logic quickly as it looks like a “programming language”

The other reason that pushed me over the edge was 3️⃣: there is no way to make this a “one step” thing with the installer that Redis ships. So what does that mean?

Let’s assume for a moment that we want to install Redis in Azure and we want to run Redis Enterprise ourselves, without any containers, on plain VMs. From the point of view of the installer, this is exactly the same case as running Redis Enterprise on-prem on a bare-metal box - the installer assumes that you already acquired “hardware” (virtual of any kind is fine, of course) and installed a supported OS on it. Depending on the OS there might be some additional preparation steps, e.g. shutting down the pre-installed DNS service of Ubuntu, as we want that port free for us to claim.
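
On Ubuntu, for example, it is the stub resolver of systemd-resolved that sits on port 53. A minimal preparation sketch could look like this (the upstream resolver is an illustrative choice, and you’d call this from your provisioning step):

```shell
#!/usr/bin/env bash
# Sketch: free up port 53 on Ubuntu before installing Redis Enterprise.
# Assumes systemd-resolved is the pre-installed DNS stub (Ubuntu default).
set -euo pipefail

free_port_53() {
  # Stop and disable the stub resolver so port 53 is free to claim
  sudo systemctl disable --now systemd-resolved
  # Point resolv.conf at a real resolver instead of the (now gone) stub;
  # 1.1.1.1 is just an illustrative choice
  echo "nameserver 1.1.1.1" | sudo tee /etc/resolv.conf >/dev/null
}

# Call free_port_53 from your provisioning script on Ubuntu hosts.
```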

The installer will take over at exactly that point, install the Redis Enterprise software, enable the processes and the process-watching daemon, and be done. This leaves you with a node installed in a “shared nothing” architecture, which can be used to either start or join a cluster that is then able to serve databases. Oh! A “cluster” in this case can even be one machine only.

With that in mind, it is clear that the process to set up Redis Enterprise consists of three main steps:

```mermaid
%%{init: {'theme':'dark'}}%%
flowchart TD
    one --> two
    two --> three
    subgraph one [1: Acquire hardware]
        direction TB
        aa1(Get virtual hardware) --> a1
        aa2(Get physical hardware) --> a1
        a1(Install OS) --> a2(Perform additional steps depending on the OS)
    end
    subgraph two [2: Installation]
        direction TB
        b1(Download Redis Enterprise installer) --> b2(Run installer)
        b2 --> b3(Add additional users to 'redis' group on OS level)
    end
    subgraph three [3: Start or join cluster]
        direction TB
        cc1(Start a new cluster) --> c1
        cc2(Join an existing cluster) --> c1
        c1(Node is now 'secure' as it is subject to a clusters RBAC and policy enforcement)
        c1 --> c2(Open ports to make the node accessible from the outside)
    end
```

The thing with (again) number 3️⃣ is that before being able to start or join a cluster, you need to start a new user session. The installer prepares all components of the software and makes them available to every new shell that comes along, by adding the right paths and variables to the user’s environment. None of that is populated in the current shell; it only becomes available in the next interactive shell started after the installation completes.
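
One way around this, which my scripts use the spirit of, is to run the cluster bootstrap through a fresh login shell over SSH. A hedged sketch - host, credentials and the exact `rladmin` arguments are placeholders here, not copied from the repo:

```shell
#!/usr/bin/env bash
# Sketch: bootstrap the cluster in a *new* login shell, because the installer
# only populates PATH/env for shells started after it finishes.
# The node address, FQDN and credentials are hypothetical placeholders.
set -euo pipefail

bootstrap_cluster() {
  local node="$1" fqdn="$2" user="$3" pass="$4"
  # 'bash -l -c' forces a login shell, which sources the profile the
  # Redis Enterprise installer extended - the current shell would not see it
  ssh "$node" "bash -l -c 'rladmin cluster create name $fqdn username $user password $pass'"
}

join_cluster() {
  local node="$1" peer_ip="$2" user="$3" pass="$4"
  ssh "$node" "bash -l -c 'rladmin cluster join nodes $peer_ip username $user password $pass'"
}
```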

When solving the problem with the very sophisticated “work everywhere” solution, Terraform is utilized to summon the “hardware” and all necessary infrastructure components like virtual networks and switches, so we can start the next step. That would be using a config management tool (non-exhaustive list: Puppet, Chef, SaltStack or Ansible) to perform the actual configuration of the OS and the installation of the Redis Enterprise software. Given that these config management tools are able to execute arbitrarily complex workflows and sequences of scripts, the next step of starting or joining the cluster is basically just one more tool execution away - done.

But to understand the whole process, I suddenly need to understand the lifecycle of the resources I summon, the execution model of the tools involved - say Terraform and Ansible for a moment - and all the syntax and language included in the tooling. Shall we make this little “explanatory Zoom session” before we can start talking about Redis, say… 4 hours?

Given that my assumptions about Azure CLI and shell scripting are right, I can summon a three node cluster in Azure with my scripts in about 4 minutes - and explain what is happening while the script is running. Have some questions? Want deeper info on a few steps? 15 minutes and we’re good to jump into the rabbit hole of realtime data fun that is Redis.
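
To give an idea of the shape - not a copy of the repo scripts; resource group, names, image and size here are made up - summoning the nodes boils down to a loop over `az vm create` with a cloud-init file as custom data:

```shell
#!/usr/bin/env bash
# Sketch: create the base VMs for a three node Redis Enterprise cluster.
# Resource group, location, image and size are illustrative placeholders.
set -euo pipefail

RG="${RG:-redis-demo-rg}"
LOCATION="${LOCATION:-westeurope}"

create_nodes() {
  az group create --name "$RG" --location "$LOCATION"
  for i in 1 2 3; do
    # cloud-init.yaml carries everything between "VM up" and "join cluster"
    az vm create \
      --resource-group "$RG" \
      --name "redis-node-$i" \
      --image Ubuntu2204 \
      --size Standard_D4s_v5 \
      --custom-data cloud-init.yaml \
      --generate-ssh-keys
  done
}

# Run only when the Azure CLI is actually available
if command -v az >/dev/null 2>&1; then
  create_nodes
fi
```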

Don’t take my words for granted - take the scripts for a spin and see for yourself, or shoot me a DM via Twitter and get your own demo session with me!

What is this thing?

Now that we have the why out of the way, let’s spend some time on the what.

When you look at the GitHub repo, you’ll find a few shell scripts that are driven by a few fundamental additional components.

  • direnv 🔗 https://direnv.net/

    • This thing is just phenomenal! If there’s one thing you take away from this post, please let it be direnv.
    • cd to a directory and your environment variables are adjusted automagically to what is configured in your .envrc file
    • For my scripts I use direnv to provide all configuration, so that you don’t need to put subscription IDs, usernames, passwords, URLs, etc. - anything that should be confidential - into a file that is under revision control in Git. This reduces the chances for human error as much as possible.
  • Azure CLI 🔗 https://docs.microsoft.com/en-us/cli/azure/install-azure-cli

    • Well - all interaction with Azure as an IaaS is done via the CLI, which ensures maximum scriptability and readability
  • cloud-init 🔗 https://cloudinit.readthedocs.io/en/latest/

    • Cloud agnostic config management baked right into all major Linux images and integrated into all major cloud fabrics (including, of course, Azure)
    • Simple YAML-based configuration ensures maximum readability
    • Will carry out all necessary steps that need to happen between VM up and start or join cluster
      • Install/update packages
      • Carry out OS specific configurations like disabling swap and DNS servers
      • Download Redis Enterprise
      • Put your preferred user into the Redis admin group
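
The cloud-init steps above can be sketched like this - the download URL, group name and user name below are placeholders, not the repo’s real values:

```yaml
#cloud-config
# Sketch of the steps between "VM up" and "start/join cluster".
# URL, group and user names are illustrative placeholders.
package_update: true
package_upgrade: true

runcmd:
  # OS specific prep: disable swap and free port 53 (Ubuntu's DNS stub)
  - swapoff -a
  - systemctl disable --now systemd-resolved
  # Download and unpack Redis Enterprise (placeholder URL)
  - curl -fsSL -o /tmp/redislabs.tar https://example.com/redis-enterprise-installer.tar
  - mkdir -p /opt/redislabs-install
  - tar -xf /tmp/redislabs.tar -C /opt/redislabs-install
  # Put the preferred admin user into the Redis OS group
  - usermod -aG redislabs azureuser
```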

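As an example of the direnv side, an .envrc could look like this - the variable names and values are made up for illustration; the repo’s readme documents the real ones:

```shell
# .envrc - picked up automatically by direnv when you cd into the directory.
# All names and values below are illustrative placeholders.
export AZURE_SUBSCRIPTION_ID="00000000-0000-0000-0000-000000000000"
export RESOURCE_GROUP="redis-demo-rg"
export ADMIN_USER="admin@example.com"
export ADMIN_PASS="change-me"
export CLUSTER_FQDN="cluster.example.com"
```

Keep the .envrc out of Git (e.g. via .gitignore) - that is the whole point of the separation.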
Of course there are the scripts themselves. What they do is explained in more detail in the readme.md of the repo, but let me focus on two details here.

  1. Why not put the start or join cluster command into cloud-init itself?

    • Security - I don’t want that command, which includes user/pass in plain text, to be stored in the filesystem of the VM!
    • Script maintenance - I would need to re-login to consume the environment that the installer prepares. I cannot do that in cloud-init, so I would need to set up the right environment myself. If the next release of Redis Enterprise changes how the environment is built, it might break my scripts - I don’t want that.
  2. Why not open the ports on the VM during VM creation? You’re doing that for SSH anyway, right?

    • Security - I only want to open the ports that Redis Enterprise uses once the node is already part of a cluster and, due to that, subject to the cluster security. If I open the ports before that state, someone wearing a black hat has a window of opportunity to hijack my node and start their own cluster, locking me out. I know this would most probably be more of an inconvenience than a threat, but with security I always err on the side of caution.
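
In Azure CLI terms, opening the ports only after the node has joined could look like this - resource group, VM name and the exact port selection are illustrative, so check the Redis Enterprise docs for the full list:

```shell
#!/usr/bin/env bash
# Sketch: open Redis Enterprise ports only AFTER the node joined the cluster.
# Resource group / VM names and the port selection are placeholders.
set -euo pipefail

open_redis_ports() {
  local rg="$1" vm="$2" priority=900
  # 8443: cluster UI, 9443: REST API, 10000-19999: database endpoints
  for port in 8443 9443 10000-19999; do
    # each NSG rule needs its own priority, hence the counter
    az vm open-port --resource-group "$rg" --name "$vm" \
      --port "$port" --priority "$priority"
    priority=$((priority + 1))
  done
}

# Example (placeholders): open_redis_ports redis-demo-rg redis-node-1
```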

What about other clouds?

Convince me! 😄

I said that cloud-init is cloud agnostic, and so is the general order of steps and events in the scripts. What would need to change is swapping the Azure CLI calls for their GCP CLI or AWS CLI equivalents. I’ll probably do that as soon as there is demand in my team, or from my customers.
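
To illustrate what would have to change, here is roughly how the same “get me a VM” step looks per cloud - all names, images and IDs are placeholders, and each CLI has its own vocabulary, which is exactly what would hurt readability:

```shell
#!/usr/bin/env bash
# Sketch: the same "create a VM with cloud-init data" step in three cloud CLIs.
# All names, images and IDs are illustrative placeholders.
set -euo pipefail

create_vm() {
  local cloud="$1" name="$2"
  case "$cloud" in
    azure)
      az vm create --resource-group redis-demo-rg --name "$name" \
        --image Ubuntu2204 --custom-data cloud-init.yaml ;;
    gcp)
      gcloud compute instances create "$name" \
        --image-family ubuntu-2204-lts --image-project ubuntu-os-cloud \
        --metadata-from-file user-data=cloud-init.yaml ;;
    aws)
      aws ec2 run-instances --image-id ami-00000000 \
        --instance-type t3.medium --user-data file://cloud-init.yaml ;;
    *)
      echo "unknown cloud: $cloud" >&2
      return 1 ;;
  esac
}
```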

The real why is: readability.

One of the drivers behind this whole thing is easy readability and, with that, an easy-to-explain script that I use for tech intros. That would definitely suffer if I needed to switch CLI calls depending on the cloud environment. The config would also look much messier with options for all clouds.

Why is DNS missing?

This is another thing that is pretty specific. In my team of solution architects at Redis, there is no single source of DNS that was agreed on. I use Azure DNS, but I set my DNS records there with another script that is not part of this repo, as it wouldn’t help anyone right now. If you feel that having this would be beneficial - let me know and I might just include it in the next update.

Also you can learn more about getting Azure DNS and Redis Enterprise to be friends in this post here.

Conclusions

Sometimes there is more than meets the eye when making a technical decision. The motivation and context of why something was decided is as important as the outcome - even more so when looking at it after some time. Here are some deeper thoughts and resources on the topic: adr.github.io

I hope my ramblings have detailed why there is merit in having these scripts as they are, and how some implementation details came to pass.

If anything is left unclear, or you want to have a friendly techie banter - shoot me a DM via Twitter.