
This post is an expert opinion on Business Continuity Planning (BCP) for IT applications, with a focus on the High Availability (HA) framework. This post is sponsored by Kuberneteslab. Kuberneteslab helps startups manage the horizontal layer of the application lifecycle, which includes DevOps, BCP, and app modernization.

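High Availability and Disaster Recovery are two distinct but equally crucial concepts for IT applications. High Availability keeps an application running through failures; Disaster Recovery restores it after a major outage.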

Enterprises adopt an HA (High Availability) framework to make sure that all mission-critical applications stay running and available to end users at all times. Netflix is an example: it runs across different regions so that it can recover properly from a disaster in one of them.

Let’s discuss the steps of the HA framework. We cannot make everything highly available in a single day; there is a process to follow. It is also better to adopt the top-down approach.

HA framework

There are four major steps to reach the highest level of availability and make disaster recovery easy. This is a top-down approach. Let’s go through the steps one by one.

1. Node Level HA: Everything runs on a node (aka machine, instance, VM), which can be a physical server or a virtual one. The first step towards HA is to make sure that each replica of a process/pod/container runs on a different node, so that losing one node takes down only one replica. Cluster orchestrators such as Kubernetes provide pod affinity/anti-affinity to achieve this.

Following is an example of a hard rule using Kubernetes pod anti-affinity. Because the rule is hard, a replica stays unscheduled (Pending) if no node without a matching pod is available.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: cache-app
spec:
  selector:
    matchLabels:
      app: store
  replicas: 3
  template:
    metadata:
      labels:
        app: store
    spec:
      affinity:
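        podAntiAffinity:
          # Hard rule: the scheduler must never place two "app: store"
          # pods on the same node.
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - store
            # Each node (hostname) counts as its own topology domain.
            topologyKey: "kubernetes.io/hostname"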
      containers:
      - name: cache-server
        image: cache-server:1.0

2. Rack Level HA: Once you have achieved Node Level HA, you should think about Rack Level HA. Why do you need it? Because all nodes in a rack can fail together, and even in a rack-level disaster you need to make sure everything keeps running smoothly.

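Following is an example that combines Node Level HA (a hard rule) with Rack Level HA (a soft rule). The rack label used as the topology key is not one of the well-known labels that Kubernetes sets on nodes by itself, so the example assumes the nodes have been labelled with their rack.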

apiVersion: apps/v1
kind: Deployment
metadata:
  name: cache-app
spec:
  selector:
    matchLabels:
      app: store
  replicas: 3
  template:
    metadata:
      labels:
        app: store
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - store
            topologyKey: "kubernetes.io/hostname"
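          # Soft rule: prefer a rack that does not already run an
          # "app: store" pod, but still schedule if no such rack is free.
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - store
              # Custom node label; each node must be labelled with its rack.
              topologyKey: "topology.kubernetes.io/rack"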
      containers:
      - name: cache-server
        image: cache-server:1.0

Rack Level HA is especially good to have for stateful applications. It keeps them available through a rack failure.

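3. Availability Zone Level HA: This is also called Zone Level HA. Here the processes/pods/containers are spread over the availability zones (AZs).
Following is a Kubernetes node affinity example for AWS. It restricts the pod to nodes in the listed AZs; `azname` is a custom node label here. Node affinity alone does not spread replicas across those AZs, so you can add the Rack and Node Level HA rules too.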

apiVersion: v1
kind: Pod
metadata:
  name: with-node-affinity
spec:
  affinity:
    nodeAffinity:
      # Hard rule: only nodes labelled azname=az1 or azname=az2 are eligible.
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: azname
            operator: In
            values:
            - az1
            - az2
      # Soft rule: among eligible nodes, prefer those carrying this label.
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: another-node-label-key
            operator: In
            values:
            - another-node-label-value
  containers:
  - name: with-node-affinity
    image: us.gcr.io/k8s-artifacts-prod/pause:2.0
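
Node affinity only limits which zones a pod may use. To actually spread the replicas across zones, one option is a topology spread constraint. The following is a minimal sketch, not a definitive setup: it reuses the hypothetical `cache-app` Deployment from the earlier steps and assumes that the nodes carry the well-known `topology.kubernetes.io/zone` label (cloud providers usually set it) and that the cluster version supports `topologySpreadConstraints`.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cache-app
spec:
  selector:
    matchLabels:
      app: store
  replicas: 3
  template:
    metadata:
      labels:
        app: store
    spec:
      topologySpreadConstraints:
      # Keep the number of "app: store" pods in any zone within 1 of
      # every other zone (assumes nodes carry the zone label).
      - maxSkew: 1
        topologyKey: "topology.kubernetes.io/zone"
        # Hard rule: leave the pod Pending rather than break the spread.
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app: store
      containers:
      - name: cache-server
        image: cache-server:1.0
```

With three replicas and three zones, this places one replica per zone. Changing `whenUnsatisfiable` to `ScheduleAnyway` turns it into a soft rule.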

4. Region Level HA: This is the highest level of availability for an application, short of the multi-cloud approach. Here two sets of the application run in two different regions, and load balancing between the regions is a crucial factor. This type of HA can later be extended to the multi-cloud approach. For stateful applications, however, extra configuration is needed to handle the state across regions. This level is challenging to achieve, and it is more costly to manage.
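
As a rough illustration of why a second region helps, here is a back-of-the-envelope calculation. It is a sketch under strong assumptions: each region has the same availability $a$, regions fail independently, and the load balancer fails over perfectly and instantly. The figure of 99% per region is a made-up example value, not a measured one.

```latex
% Availability of n independent regions, each with availability a:
% the application is down only when all n regions are down at once.
A_{\text{total}} = 1 - (1 - a)^{n}

% Worked example with a = 0.99 and n = 2:
A_{\text{total}} = 1 - (1 - 0.99)^{2} = 1 - 0.0001 = 0.9999
```

In practice the gain is smaller, because regional failures are not fully independent and failover itself takes time.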

Recommendations for the HA framework:

  1. Always treat Rack Level HA as the minimum level of availability.
  2. In Kubernetes, you can configure soft rules. It’s good to use a soft rule for Rack and Zone Level HA, so that scaling is not blocked when fewer racks are available or a zone is unavailable.
  3. Take extra care with stateful applications. For example, etcd/ZooKeeper spread over multiple zones needs to be tested for latency.
  4. When spreading the masters of the cluster across zones or regions, run proper latency and functionality tests under the highest load.
  5. Set up proper monitoring and alerting for both the application and the infrastructure.
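
The soft rules from recommendation 2 could look like the sketch below. It keeps the hard node-level rule and adds weighted preferences for zone and rack. The weights (100 and 50) are illustrative assumptions, the rack label is a custom label that the nodes are assumed to carry, and `cache-app` is the same hypothetical Deployment used in the earlier steps.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cache-app
spec:
  selector:
    matchLabels:
      app: store
  replicas: 3
  template:
    metadata:
      labels:
        app: store
    spec:
      affinity:
        podAntiAffinity:
          # Hard rule: never two replicas on the same node.
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app: store
            topologyKey: "kubernetes.io/hostname"
          # Soft rules: preferred, but scaling is never blocked by them.
          preferredDuringSchedulingIgnoredDuringExecution:
          # Strongest preference: a zone without an "app: store" pod.
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  app: store
              topologyKey: "topology.kubernetes.io/zone"
          # Weaker preference: a rack without an "app: store" pod
          # (custom label, assumed to be set on every node).
          - weight: 50
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  app: store
              topologyKey: "topology.kubernetes.io/rack"
      containers:
      - name: cache-server
        image: cache-server:1.0
```

If a zone or rack is missing, the scheduler still places the pod on any node that satisfies the hard node-level rule.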

Metrix for the HA framework:

Metrix for HA Framework

You can use the Metrix table to assess how well your application complies with the HA framework.
