Getting Azure DNS to play nice with Redis Enterprise

Why is DNS a thing in Redis Enterprise?

Redis Enterprise (RE) differs from Redis OSS in quite a few ways - one being that RE runs its own DNS servers on all cluster nodes, while Redis OSS does not. RE uses DNS not only to expose named endpoints to databases, but also for cluster-related concerns like failover and service discovery. When a request is made against RE, the client only needs to know the FQDN, and DNS will solve the riddle of which cluster node to target. If that cluster node becomes unavailable at any point, it’s just a matter of changing the DNS entries to consistently fail over all traffic and keep the system available.

For that to work you need a few entries in DNS. Let’s have a look at my little experimental cluster sitting in Google Cloud, that we want to make known to the world via Azure DNS for a little multi-cloud touch.

I have three nodes (just enough for quorum in case I want to experiment with failover scenarios) sitting there with RE installed - those are cheap e2-medium machines at around $32 each per month. They won’t give me enough performance for load testing, but I don’t feel bad about leaving them running a bit longer sometimes.

You can grab your own demo version of RE here if you want to follow along. The next blog post will be about installing RE to make this easier.

cluster

So let’s call my friend “cluster”. My sandbox is not clearco.de but funnyco.de - which turns him into cluster.redis.funnyco.de :)

To roll that into Azure DNS, I go to the Azure portal and quickly create three DNS zones (docs here).

graph LR; A(funnyco.de.)-->B(redis); B-->C(cluster);

Why is Azure DNS unintuitive for RE?

Here we go. Three DNS zones. To make this work, each parent zone holds a nameserver (NS) entry for the zone below it, pointing “interested parties” to the right nameservers to ask for the contents of that zone. So funnyco.de. holds the nameservers for redis.

graph LR; A(funnyco.de.)-->B(redis); B-->C(cluster); A-->D(redis NS
ns1.azure...
ns2.azure...
ns3.azure...); D-->B; B-->E(cluster NS
ns1.azure...
ns2.azure...
ns3.azure...); E-->C; C-->F(A ns1 >> external IP); C-->G(A ns2 >> external IP ); C-->H(A ns3 >> external IP);

This goes one level further, so redis.funnyco.de. has the NS entries for cluster.redis.funnyco.de. - wait a moment! That collides with RE wanting to be the nameserver itself, so it can publicly announce where the entry point to the RE cluster is and do things like HA failovers via DNS.

No worries - I can fix that! So I enter the hostnames of my cluster nodes into the NS entry in redis.funnyco.de. that points to cluster - still broken! I now have my cluster nodes as targets, but the actual conversion of their names to IP addresses would happen in the “cluster” DNS zone in Azure, where no DNS client will ever ask, because my cluster nodes are responsible for that zone now.

#sigh

Then I tried the obvious and deleted my cluster DNS zone. My NS records in redis.funnyco.de. were still pointing at my nodes’ names. I tried creating ns1.cluster as an A record in redis.funnyco.de. (and likewise for ns2.cluster and ns3.cluster) and that was it - working!
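The difference between the broken and the working state can be modelled in a few lines of Python. This is only a sketch of the reasoning, not how Azure DNS or a resolver is implemented: the function name glue_for, the dictionary layout, and the 203.0.113.x addresses (documentation range, not my real node IPs) are all made up for illustration. It asks one question: for each NS target of the cluster, does the parent zone itself know an address?

```python
from typing import Dict, List, Optional, Tuple

Record = Tuple[str, str, str]      # (fully qualified name, type, value)
Zones = Dict[str, List[Record]]    # DNS zone name -> records stored in that zone

PARENT = "redis.funnyco.de."
CHILD = "cluster.redis.funnyco.de."
# Placeholder addresses from the documentation range, NOT the real node IPs.
NODES = {f"ns{i}.{CHILD}": f"203.0.113.{i}" for i in (1, 2, 3)}


def glue_for(zones: Zones, parent: str, child: str) -> Dict[str, Optional[str]]:
    """Map each NS target of `child` to the A record stored in `parent`, or None."""
    records = zones.get(parent, [])
    targets = [value for name, rtype, value in records
               if name == child and rtype == "NS"]
    addresses = {name: value for name, rtype, value in records if rtype == "A"}
    return {target: addresses.get(target) for target in targets}


ns_records = [(CHILD, "NS", host) for host in NODES]
a_records = [(host, "A", ip) for host, ip in NODES.items()]

# Broken: the A records sit in a separate "cluster" zone nobody is referred to.
broken: Zones = {PARENT: ns_records, CHILD: a_records}
# Working: the "cluster" zone is gone, the A records sit next to the NS records.
working: Zones = {PARENT: ns_records + a_records}

print(glue_for(broken, PARENT, CHILD))   # every nameserver maps to None
print(glue_for(working, PARENT, CHILD))  # every nameserver maps to an address
```

In the broken layout the addresses exist, but only in a zone that is reachable through the very nameservers whose addresses are being looked up.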

graph LR; A(funnyco.de.)-->B(redis); A-->D(redis NS
ns1.azure...
ns2.azure...
ns3.azure...); D-->B; B-->E(cluster NS
ns1.cluster.redis.funnyco.de
ns2.cluster.redis.funnyco.de
ns3.cluster.redis.funnyco.de); B-->F(A ns1.cluster >> external IP); B-->G(A ns2.cluster >> external IP ); B-->H(A ns3.cluster >> external IP);

Azure portal looks like this in my case:

azure

To save yourself the frustration of errors persisting too long in DNS while you get everything right, think about the TTL. I set it to 10 seconds for all entries that are “in flow”, so changes propagate quickly and I can experiment freely. Just be sure to reset it to a normal interval as soon as you’ve reached a stable state, to be a good citizen.
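To keep the record sets and their TTL in one place while experimenting, a small helper can render the entries of the redis.funnyco.de. zone as zone-file-style lines. This is a rough sketch under assumptions: delegation_lines is a hypothetical helper, the IPs are placeholders from the documentation range, and 3600 is just a stand-in for a “normal” interval. It does not talk to Azure at all; it only prints what would need to be entered.

```python
from typing import Dict, List


def delegation_lines(child: str, zone: str, node_ips: Dict[str, str], ttl: int) -> List[str]:
    """Render zone-file-style lines, relative to `zone`, delegating `child` to the nodes."""
    lines = []
    # One NS record per cluster node, all in the record set named after the child.
    for label in sorted(node_ips):
        lines.append(f"{child} {ttl} IN NS {label}.{child}.{zone}")
    # One A record per nameserver name, stored in the same (parent) zone.
    for label in sorted(node_ips):
        lines.append(f"{label}.{child} {ttl} IN A {node_ips[label]}")
    return lines


# Placeholder addresses (documentation range), NOT the real node IPs.
NODE_IPS = {"ns1": "203.0.113.1", "ns2": "203.0.113.2", "ns3": "203.0.113.3"}

# Short TTL while things are "in flow"; rerun with e.g. ttl=3600 once stable.
for line in delegation_lines("cluster", "redis.funnyco.de.", NODE_IPS, ttl=10):
    print(line)
```

Keeping the TTL as a single parameter makes it harder to forget one record set when switching back to the longer interval.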

How do I see if everything is okay?

  1. dig is your friend!

    • Querying for the cluster FQDN should yield an A record which corresponds to a cluster node
    •   ❯ dig cluster.redis.funnyco.de
      
        ; <<>> DiG 9.10.6 <<>> cluster.redis.funnyco.de
        ;; global options: +cmd
        ;; Got answer:
        ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 12326
        ;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
      
        ;; OPT PSEUDOSECTION:
        ; EDNS: version: 0, flags:; udp: 512
        ;; QUESTION SECTION:
        ;cluster.redis.funnyco.de.	IN	A
      
        ;; ANSWER SECTION:
        cluster.redis.funnyco.de. 15	IN	A	35.207.87.1
      
        ;; Query time: 113 msec
        ;; SERVER: 8.8.8.8#53(8.8.8.8)
        ;; WHEN: Wed Oct 27 11:18:59 CEST 2021
        ;; MSG SIZE  rcvd: 69
      
    • Querying for the nameservers of the cluster should return NS entries for all participating cluster nodes
    •   ❯ dig cluster.redis.funnyco.de NS
      
        ; <<>> DiG 9.10.6 <<>> cluster.redis.funnyco.de NS
        ;; global options: +cmd
        ;; Got answer:
        ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 60503
        ;; flags: qr rd ra; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 1
      
        ;; OPT PSEUDOSECTION:
        ; EDNS: version: 0, flags:; udp: 512
        ;; QUESTION SECTION:
        ;cluster.redis.funnyco.de.	IN	NS
      
        ;; ANSWER SECTION:
        cluster.redis.funnyco.de. 3600	IN	NS	ns3.cluster.redis.funnyco.de.
        cluster.redis.funnyco.de. 3600	IN	NS	ns2.cluster.redis.funnyco.de.
        cluster.redis.funnyco.de. 3600	IN	NS	ns1.cluster.redis.funnyco.de.
      
        ;; Query time: 38 msec
        ;; SERVER: 8.8.8.8#53(8.8.8.8)
        ;; WHEN: Wed Oct 27 11:19:08 CEST 2021
        ;; MSG SIZE  rcvd: 107
      
  2. You could also have a look at a small tool that we use at Redis to assess the health of a cluster’s DNS setup

    • Find it on GitHub –> DNStracer
    •   ❯ dnstracer -e redis-12466.cluster.redis.funnyco.de -d
        --------------------------------
                NS Record Test: OK
                Glue Record Test: OK
                NS Access Test: OK
                SOA Match Test: OK
                A Record Test: OK
        --------------------------------
        Results Debug:
        {ResultA:true ResultNS:true ResultGlue:true ResultAccess:true ResultSOAMatch:true}
        --------------------------------
        &{LocalA:[35.207.81.132] DNS2A:[35.207.81.132] DNS1A:[35.207.81.132] LocalNS:[ns1.cluster.redis.funnyco.de. ns2.cluster.redis.funnyco.de. ns3.cluster.redis.funnyco.de.] DNS2NS:[ns1.cluster.redis.funnyco.de. ns2.cluster.redis.funnyco.de. ns3.cluster.redis.funnyco.de.] DNS1NS:[ns1.cluster.redis.funnyco.de. ns2.cluster.redis.funnyco.de. ns3.cluster.redis.funnyco.de.] LocalGlue:[35.207.81.132 35.207.87.1 35.207.91.131] DNS2Glue:[35.207.81.132 35.207.87.1 35.207.91.131] DNS1Glue:[35.207.81.132 35.207.87.1 35.207.91.131] SOAMatch:true PublicMatchA:true LocalMatchA:true PublicMatchNS:true LocalMatchNS:true PublicMatchGlue:true LocalMatchGlue:true EndpointStatus:[true true true]}
        {ResultA:true ResultNS:true ResultGlue:true ResultAccess:true ResultSOAMatch:true}
        OK
      
  3. whatsmydns.net is a fine tool to quickly test against a lot of nameservers around the world
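If you want to automate the dig check from step 1, the answer section is easy to parse. The following is a minimal sketch, assuming the default dig output layout shown above; parse_answers and missing_nameservers are hypothetical helper names, and SAMPLE is a shortened copy of the NS query output from this post.

```python
from typing import List, Set, Tuple

Answer = Tuple[str, int, str, str]  # (name, ttl, record type, value)


def parse_answers(dig_output: str) -> List[Answer]:
    """Extract the records listed in the ANSWER SECTION of default dig output."""
    answers: List[Answer] = []
    in_answer = False
    for line in dig_output.splitlines():
        stripped = line.strip()
        if stripped.startswith(";; ANSWER SECTION"):
            in_answer = True
            continue
        if in_answer:
            # A blank line or the next ";" comment line ends the section.
            if not stripped or stripped.startswith(";"):
                break
            name, ttl, _cls, rtype, value = stripped.split(None, 4)
            answers.append((name, int(ttl), rtype, value))
    return answers


def missing_nameservers(dig_output: str, expected: Set[str]) -> Set[str]:
    """Return the expected NS targets that do not show up in the answer."""
    found = {value for _, _, rtype, value in parse_answers(dig_output) if rtype == "NS"}
    return expected - found


SAMPLE = """\
;; QUESTION SECTION:
;cluster.redis.funnyco.de. IN NS

;; ANSWER SECTION:
cluster.redis.funnyco.de. 3600 IN NS ns3.cluster.redis.funnyco.de.
cluster.redis.funnyco.de. 3600 IN NS ns2.cluster.redis.funnyco.de.
cluster.redis.funnyco.de. 3600 IN NS ns1.cluster.redis.funnyco.de.

;; Query time: 38 msec
"""

EXPECTED = {f"ns{i}.cluster.redis.funnyco.de." for i in (1, 2, 3)}
print(missing_nameservers(SAMPLE, EXPECTED))
```

An empty result means every cluster node is announced as a nameserver. To run it against live data, feed it the text captured from dig instead of SAMPLE.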

Conclusion

After wrapping my head around how to get the information into the right spot in Azure DNS, it seemed like a trivial thing to solve, but I thought it would be nice to post this somewhere in case someone else stumbles over the same issue.

As soon as it is working, the setup is bulletproof, and Azure DNS is once again keeping its place as my primary DNS solution for experimental or explorative work.

Are you hitting a brick wall? Shoot me a DM on Twitter and we’ll find a way!