Kubernetes Backup with Velero

Velero is an open source tool for safely backing up and restoring Kubernetes cluster resources and persistent volumes, performing disaster recovery, and migrating workloads between clusters. It can be set up quickly with Terraform on an EKS cluster and is simple to operate.

An example deployment including EKS can be cloned from here

Installation via Terraform

resource "aws_s3_bucket" "velero" {
  bucket = "eks-velero-backup-${var.environment_name}"
  acl    = "private"

  # Encrypt objects at rest with S3-managed keys.
  server_side_encryption_configuration {
    rule {
      apply_server_side_encryption_by_default {
        sse_algorithm = "AES256"
      }
    }
  }

  versioning {
    enabled = true
  }
}

resource "aws_s3_bucket_policy" "velero" {
  bucket = aws_s3_bucket.velero.id

  policy = jsonencode({
    Version = "2012-10-17"
    Id      = "velero-${var.environment_name}-bucket-policy"
    Statement = [
      {
        # Deny any request that is not sent over TLS.
        Sid       = "EnforceTls"
        Effect    = "Deny"
        Principal = "*"
        Action    = "s3:*"
        Resource = [
          "${aws_s3_bucket.velero.arn}/*",
          "${aws_s3_bucket.velero.arn}",
        ]
        Condition = {
          Bool = {
            "aws:SecureTransport" = "false"
          }
        }
      },
      {
        # Deny TLS versions older than 1.2. This is a separate statement because
        # condition operators inside one statement are ANDed together.
        Sid       = "EnforceTlsVersion"
        Effect    = "Deny"
        Principal = "*"
        Action    = "s3:*"
        Resource = [
          "${aws_s3_bucket.velero.arn}/*",
          "${aws_s3_bucket.velero.arn}",
        ]
        Condition = {
          NumericLessThan = {
            "s3:TlsVersion" = 1.2
          }
        }
      },
    ]
  })
}

module "velero" {
  source  = "DNXLabs/eks-velero/aws"
  version = "0.1.2"

  enabled = true

  cluster_name                     = module.eks.cluster_id
  cluster_identity_oidc_issuer     = module.eks.cluster_oidc_issuer_url
  cluster_identity_oidc_issuer_arn = module.eks.oidc_provider_arn
  aws_region                       = var.region
  create_bucket                    = false # the bucket is managed above
  bucket_name                      = "eks-velero-backup-${var.environment_name}"
  helm_chart_version               = "2.30.1"
}
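
Once `terraform apply` has finished, it is worth confirming that Velero is running and can reach the bucket. The following is a minimal sketch, not part of the module: it assumes the chart was installed into the `velero` namespace with a deployment named `velero`, and that `kubectl` and the `velero` CLI already point at the cluster. The function name `check_velero` is illustrative.

```shell
# Sketch: check that the Velero server is up and the backup location is usable.
# Assumption: namespace "velero" and deployment "velero" (adjust if they differ).
check_velero() {
  ns="${1:-velero}"
  # Wait for the Velero server deployment to finish rolling out.
  kubectl -n "$ns" rollout status deployment/velero --timeout=120s || return 1
  # List backup storage locations; the S3 location should be reported as available.
  velero backup-location get -n "$ns"
}
```

Calling `check_velero` with no argument uses the `velero` namespace; pass a different namespace as the first argument if the chart was installed elsewhere.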

When a new namespace is created, a daily backup schedule is created for it, for example:

velero schedule create "$NAMESPACE-backup" --schedule "0 7 * * *" --include-namespaces "$NAMESPACE"

➜  ~ (|arn:aws:eks:eu-west-2:123456789012:cluster/cluster-1:default) velero schedule get
NAME                 STATUS    CREATED                         SCHEDULE    BACKUP TTL   LAST BACKUP   SELECTOR
29681-sys-backup     Enabled   2022-07-28 15:31:07 +0100 BST   0 3 * * *   0s           6h ago        <none>
backup-demo-backup   Enabled   2022-07-28 08:19:28 +0100 BST   0 3 * * *   0s           6h ago        <none>
dev-backup           Enabled   2022-07-28 09:59:44 +0100 BST   0 3 * * *   0s           6h ago        <none>
dev-update-backup    Enabled   2022-07-28 09:38:01 +0100 BST   0 3 * * *   0s           6h ago        <none>
develop-backup       Enabled   2022-07-28 09:29:47 +0100 BST   0 3 * * *   0s           6h ago        <none>
pfs-retry-backup     Enabled   2022-07-28 17:17:55 +0100 BST   0 3 * * *   0s           6h ago        <none>
ui-backup            Enabled   2022-07-28 11:14:28 +0100 BST   0 3 * * *   0s           6h ago        <none>
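
To cover every namespace rather than one at a time, the schedule command can be wrapped in a loop. This is a sketch only. The `-backup` suffix follows the naming in the listing above, while the function names and the list of skipped system namespaces are assumptions. It uses `--include-namespaces` so that each schedule backs up only its own namespace, and it assumes `velero schedule get NAME` exits non-zero when the schedule does not exist.

```shell
# Sketch: create a daily Velero schedule for a namespace if one does not exist yet.
ensure_schedule() {
  ns="$1"
  cron="${2:-0 7 * * *}"
  name="${ns}-backup"
  # Assumption: "velero schedule get NAME" fails when the schedule is missing.
  if velero schedule get "$name" >/dev/null 2>&1; then
    echo "schedule $name already exists"
  else
    velero schedule create "$name" --schedule "$cron" --include-namespaces "$ns"
  fi
}

# Run ensure_schedule for every namespace except the Kubernetes system ones.
ensure_all_schedules() {
  kubectl get namespaces -o name | sed 's|^namespace/||' |
    grep -Ev '^(kube-system|kube-public|kube-node-lease)$' |
    while read -r ns; do
      ensure_schedule "$ns" </dev/null
    done
}
```

Running `ensure_all_schedules` from a cron job or a CI pipeline is one way to pick up newly created namespaces; a single namespace can be handled with `ensure_schedule dev`.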

Testing

Backup and recovery of a namespace were tested by restoring from S3 and EBS snapshots.

A deployment was created in the backup-demo namespace.

A schedule was created:

➜  ~ (|arn:aws:eks:eu-west-2:123456789012:cluster/cluster-1:default) velero schedule get

NAME                 STATUS    CREATED                         SCHEDULE    BACKUP TTL   LAST BACKUP   SELECTOR
29681-sys-backup     Enabled   2022-07-28 15:31:07 +0100 BST   0 3 * * *   0s           6h ago        <none>
backup-demo-backup   Enabled   2022-07-28 08:19:28 +0100 BST   0 3 * * *   0s           6h ago        <none>
dev-backup           Enabled   2022-07-28 09:59:44 +0100 BST   0 3 * * *   0s           6h ago        <none>
dev-update-backup    Enabled   2022-07-28 09:38:01 +0100 BST   0 3 * * *   0s           6h ago        <none>
develop-backup       Enabled   2022-07-28 09:29:47 +0100 BST   0 3 * * *   0s           6h ago        <none>
pfs-retry-backup     Enabled   2022-07-28 17:17:55 +0100 BST   0 3 * * *   0s           6h ago        <none>
ui-backup            Enabled   2022-07-28 11:14:28 +0100 BST   0 3 * * *   0s           6h ago        <none>

The available backups were listed:

➜  ~ (|arn:aws:eks:eu-west-2:123456789012:cluster/cluster-1:default) velero backup get

NAME                                  STATUS            ERRORS   WARNINGS   CREATED                         EXPIRES   STORAGE LOCATION   SELECTOR
29681-sys-backup-20220729030042       PartiallyFailed   1        0          2022-07-29 04:00:55 +0100 BST   29d       default            <none>
backup-demo-backup-20220729030042     PartiallyFailed   6        0          2022-07-29 04:01:30 +0100 BST   29d       default            <none>
backup-demo-schedule-20220729030042   Completed         0        0          2022-07-29 04:01:17 +0100 BST   29d       default            <none>
backup-demo-schedule-20220728030041   Completed         0        0          2022-07-28 04:00:41 +0100 BST   28d       default            <none>
dev-backup-20220729030042             Completed         0        0          2022-07-29 04:02:09 +0100 BST   29d       default            <none>
dev-update-backup-20220729030042      Completed         0        0          2022-07-29 04:01:56 +0100 BST   29d       default            <none>
develop-backup-20220729030042         Completed         0        0          2022-07-29 04:01:43 +0100 BST   29d       default            <none>
pfs-retry-backup-20220729030042       Completed         0        0          2022-07-29 04:01:05 +0100 BST   29d       default            <none>
ui-backup-20220729030042              Completed         0        0          2022-07-29 04:00:42 +0100 BST   29d       default            <none>
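
Two of the backups in this listing are `PartiallyFailed`, which is worth following up before relying on them. As a sketch, the names of all backups whose status is not `Completed` can be pulled out of the table with `awk`. The function name `unhealthy_backups` is illustrative.

```shell
# Sketch: read "velero backup get" output on stdin and print the names of
# backups whose STATUS column is anything other than "Completed".
unhealthy_backups() {
  awk '$1 != "NAME" && NF > 1 && $2 != "Completed" { print $1 }'
}
```

A typical use would be `velero backup get | unhealthy_backups`, followed by `velero backup describe NAME --details` or `velero backup logs NAME` for each name printed.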

The test namespace was deleted:

➜  ~ (|arn:aws:eks:eu-west-2:123456789012:cluster/cluster-1:default) k get ns | grep backup-demo
backup-demo              Active   2d18h

➜  ~ (|arn:aws:eks:eu-west-2:123456789012:cluster/cluster-1:default) k delete ns backup-demo
namespace "backup-demo" deleted

➜  ~ (|arn:aws:eks:eu-west-2:123456789012:cluster/cluster-1:default) k get ns | grep backup-demo

The backup was restored. Immediately after submission the restore is still in progress:

➜  ~ (|arn:aws:eks:eu-west-2:123456789012:cluster/cluster-1:default) velero restore create  --from-backup backup-demo-schedule-20220729030042
Restore request "backup-demo-schedule-20220729030042-20220729102625" submitted successfully.
Run `velero restore describe backup-demo-schedule-20220729030042-20220729102625` or `velero restore logs backup-demo-schedule-20220729030042-20220729102625` for more details.

➜  ~ (|arn:aws:eks:eu-west-2:123456789012:cluster/cluster-1:default) velero restore describe backup-demo-schedule-20220729030042-20220729102625

Name:         backup-demo-schedule-20220729030042-20220729102625
Namespace:    velero
Labels:       <none>
Annotations:  <none>

Phase:                                 InProgress
Estimated total items to be restored:  201
Items restored so far:                 11

Started:    2022-07-29 10:26:27 +0100 BST
Completed:  <n/a>

Backup:  backup-demo-schedule-20220729030042

Namespaces:
  Included:  all namespaces found in the backup
  Excluded:  <none>

Resources:
  Included:        *
  Excluded:        nodes, events, events.events.k8s.io, backups.velero.io, restores.velero.io, resticrepositories.velero.io
  Cluster-scoped:  auto

Namespace mappings:  <none>

Label selector:  <none>

Restore PVs:  auto

Existing Resource Policy:   <none>

Preserve Service NodePorts:  auto
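
Rather than re-running `velero restore describe` by hand, the restore phase can be polled. The sketch below reads `.status.phase` from the Restore custom resource in the `velero` namespace (the namespace shown in the output above). The function name, the number of polls and the polling interval are assumptions. `velero restore create` also accepts `--wait`, which blocks until the restore finishes.

```shell
# Sketch: poll a Velero restore until it reaches a terminal phase.
# Usage: wait_for_restore NAME [POLLS] [INTERVAL_SECONDS]
wait_for_restore() {
  name="$1"
  tries="${2:-60}"    # number of polls before giving up
  interval="${3:-5}"  # seconds between polls
  while [ "$tries" -gt 0 ]; do
    phase="$(kubectl -n velero get restores.velero.io "$name" -o jsonpath='{.status.phase}')"
    case "$phase" in
      Completed)
        echo "restore $name: $phase"
        return 0 ;;
      PartiallyFailed|Failed|FailedValidation)
        echo "restore $name: $phase"
        return 1 ;;
    esac
    tries=$((tries - 1))
    sleep "$interval"
  done
  echo "restore $name: timed out"
  return 1
}
```

For the restore above this would be called as `wait_for_restore backup-demo-schedule-20220729030042-20220729102625`.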

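Restore completed. The namespace and its pods are back, with warnings only for CRDs that already existed in the cluster: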

➜  ~ (|arn:aws:eks:eu-west-2:123456789012:cluster/cluster-1:default) velero restore describe backup-demo-schedule-20220729030042-20220729102625

Name:         backup-demo-schedule-20220729030042-20220729102625
Namespace:    velero
Labels:       <none>
Annotations:  <none>

Phase:                       Completed
Total items to be restored:  197
Items restored:              197

Started:    2022-07-29 10:26:27 +0100 BST
Completed:  2022-07-29 10:27:03 +0100 BST

Warnings:
  Velero:     <none>
  Cluster:  could not restore, CustomResourceDefinition "certificaterequests.cert-manager.io" already exists. Warning: the in-cluster version is different than the backed-up version.
            could not restore, CustomResourceDefinition "certificates.cert-manager.io" already exists. Warning: the in-cluster version is different than the backed-up version.
            could not restore, CustomResourceDefinition "ciliumendpoints.cilium.io" already exists. Warning: the in-cluster version is different than the backed-up version.
            could not restore, CustomResourceDefinition "ciliumnetworkpolicies.cilium.io" already exists. Warning: the in-cluster version is different than the backed-up version.
            could not restore, CustomResourceDefinition "orders.acme.cert-manager.io" already exists. Warning: the in-cluster version is different than the backed-up version.
            could not restore, CustomResourceDefinition "schedules.velero.io" already exists. Warning: the in-cluster version is different than the backed-up version.
            could not restore, CustomResourceDefinition "secretagentconfigurations.secret-agent.secrets.forgerock.io" already exists. Warning: the in-cluster version is different than the backed-up version.
  Namespaces: <none>

Backup:  backup-demo-schedule-20220729030042

Namespaces:
  Included:  all namespaces found in the backup
  Excluded:  <none>

Resources:
  Included:        *
  Excluded:        nodes, events, events.events.k8s.io, backups.velero.io, restores.velero.io, resticrepositories.velero.io
  Cluster-scoped:  auto

Namespace mappings:  <none>

Label selector:  <none>

Restore PVs:  auto

Existing Resource Policy:   <none>

Preserve Service NodePorts:  auto

➜  ~ (|arn:aws:eks:eu-west-2:123456789012:cluster/cluster-1:default) k get po -n backup-demo

NAME                                  READY   STATUS    RESTARTS   AGE
admin-ui-b89f6f748-zf649              2/2     Running   0          58s
am-dd47498c5-pbbzh                    1/2     Running   0          58s
consent-and-auth-ui-f9486bbc4-vc5q2   3/3     Running   0          58s
ds-cts-0                              2/2     Running   0          58s
ds-cts-1                              2/2     Running   0          58s
ds-cts-2                              2/2     Running   0          57s
ds-idrepo-0                           2/2     Running   0          57s
ds-idrepo-1                           1/2     Running   0          57s
ds-umarepo-0                          2/2     Running   0          57s
end-user-ui-759f8bb9d7-pnr72          2/2     Running   0          57s
idm-b478d46cb-vvvcf                   1/2     Running   0          57s
ig-c4fc95547-tdcxh                    1/2     Running   0          56s
login-ui-7c8f994cb6-2rlqm             2/2     Running   0          56s
rcs-agent-6f6657cdbb-2tmdb            2/2     Running   0          56s
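
Several pods in this listing report `1/2` in the READY column less than a minute after the restore, meaning the pods have been recreated but not all of their containers are ready yet. A small sketch that lists the pods which are not fully ready, based on `kubectl get po` output, could look like this (the function name `not_ready_pods` is illustrative):

```shell
# Sketch: read "kubectl get po" output on stdin and print the names of pods
# whose READY column (ready/total containers) is not yet complete.
not_ready_pods() {
  awk '$1 != "NAME" && NF >= 2 {
    split($2, r, "/")
    if (r[1] != r[2]) print $1
  }'
}
```

It would be used as `kubectl get po -n backup-demo | not_ready_pods`. To block until everything is ready instead, `kubectl wait --for=condition=Ready pod --all -n backup-demo --timeout=300s` is an alternative.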