AWS Deployment using S3

Note

Chef Automate 4.10.1 released on 6th September 2023 includes improvements to the deployment and installation experience of Automate HA. Please read the blog to learn more about key improvements. Refer to the pre-requisites page (On-Premises, AWS) and plan your usage with your customer success manager or account manager.

Note

If you set backup_config to s3 in config.toml, backup is already configured during deployment and the steps below are not required. If you left backup_config blank, you need to configure backup manually.
Encrypted S3 buckets are supported only with Amazon S3 managed keys (SSE-S3).

Overview

To communicate with Amazon S3, you need an IAM role with the required policy.

Attach the IAM role to all the OpenSearch nodes and frontend nodes.

Note

If you are using the managed AWS service, you need to create a snapshot-role for OpenSearch.

Configuration in Provision host

Create a .toml file, for example automate.toml.

Refer to the content for the automate.toml file below:

[global.v1]
  [global.v1.external.opensearch.backup]
    enable = true
    location = "s3"

  [global.v1.external.opensearch.backup.s3]

    # bucket (required): The name of the bucket
    bucket = "bucket-name"

    # base_path (optional):  The path within the bucket where backups should be stored
    # If base_path is not set, backups will be stored at the root of the bucket.
    base_path = "opensearch"

    # name of an s3 client configuration you create in your opensearch.yml
    # see https://www.open.co/guide/en/opensearch/plugins/current/repository-s3-client.html
    # for full documentation on how to configure client settings on your
    # OpenSearch nodes
    client = "default"

  [global.v1.external.opensearch.backup.s3.settings]
    ## The meaning of these settings is documented in the S3 Repository Plugin
    ## documentation. See the following links:
    ## https://www.open.co/guide/en/opensearch/plugins/current/repository-s3-repository.html

    ## Backup repo settings
    # compress = false
    # server_side_encryption = false
    # buffer_size = "100mb"
    # canned_acl = "private"
    # storage_class = "standard"
    ## Snapshot settings
    # max_snapshot_bytes_per_sec = "40mb"
    # max_restore_bytes_per_sec = "40mb"
    # chunk_size = "null"
    ## S3 client settings
    # read_timeout = "50s"
    # max_retries = 3
    # use_throttle_retries = true
    # protocol = "https"

  [global.v1.backups]
    location = "s3"

  [global.v1.backups.s3.bucket]
    # name (required): The name of the bucket
    name = "bucket-name"

    # endpoint (required): The endpoint for the region the bucket lives in for Automate Version 3.x.y
    # endpoint (required): For Automate Version 4.x.y, use this https://s3.amazonaws.com
    endpoint = "https://s3.amazonaws.com"

    # base_path (optional):  The path within the bucket where backups should be stored
    # If base_path is not set, backups will be stored at the root of the bucket.
    base_path = "automate"

  [global.v1.backups.s3.credentials]
    access_key = "<Your Access Key>"
    secret_key = "<Your Secret Key>"
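
The bucket name appears twice in the template above, once for OpenSearch and once for Automate. As a minimal sketch, a small shell function can write both entries from a single value so they stay in sync. The function name write_backup_toml and its arguments are hypothetical and not part of chef-automate; the keys and values are taken from the template above, and the [global.v1.backups.s3.credentials] block is left out of the sketch and would need to be added as shown in the template.

```shell
#!/usr/bin/env bash
# write_backup_toml is a hypothetical helper, not a chef-automate command.
# Usage: write_backup_toml <bucket-name> <output-file>
write_backup_toml() {
  local bucket="$1" out="$2"
  # The unquoted EOF lets the shell expand ${bucket} inside the template.
  cat > "$out" <<EOF
[global.v1]
  [global.v1.external.opensearch.backup]
    enable = true
    location = "s3"

  [global.v1.external.opensearch.backup.s3]
    bucket = "${bucket}"
    base_path = "opensearch"
    client = "default"

  [global.v1.backups]
    location = "s3"

  [global.v1.backups.s3.bucket]
    name = "${bucket}"
    endpoint = "https://s3.amazonaws.com"
    base_path = "automate"
EOF
}
```

For example, write_backup_toml bucket-name automate.toml would produce a file that can then be completed with the credentials block and passed to the config patch command.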

Run the command below to patch the configuration and trigger the deployment:
```
chef-automate config patch --frontend automate.toml
```

Note

IAM Role: Assign the IAM Role to all the OpenSearch instances in the cluster created above.

Backup and Restore

Backup

To create a backup, run the backup command from the bastion host:
```
chef-automate backup create
```

Restore

To restore backed-up data of the Chef Automate High Availability (HA) using External AWS S3, follow the steps given below:

Check the status of all Chef Automate and Chef Infra Server front-end nodes by executing the chef-automate status command.
Log in to the same instance of the Chef Automate front-end node from which the backup was taken.
Execute the restore command from the bastion host: chef-automate backup restore s3://bucket_name/path/to/backups/BACKUP_ID --skip-preflight --s3-access-key "Access_Key" --s3-secret-key "Secret_Key".
In an air-gapped environment, execute this restore command from the bastion host instead: chef-automate backup restore <object-storage-bucket-path>/backups/BACKUP_ID --airgap-bundle </path/to/bundle> --skip-preflight.

Note

If you are restoring the backup from an older version, you need to provide the --airgap-bundle </path/to/current/bundle> flag.
If you did not configure the S3 access and secret keys during deployment, or if the backup was taken on a different bucket, you need to provide the --s3-access-key <Access_Key> and --s3-secret-key <Secret_Key> flags.
Large Compliance Report is not supported in Automate HA.
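
Because the key flags are only needed in some cases, it can help to assemble the restore command from its parts before running it. The following is a minimal sketch, assuming a hypothetical helper named build_restore_cmd that only prints the command and never runs chef-automate. The flags are the ones shown in the restore steps; the argument order is an assumption made for illustration.

```shell
#!/usr/bin/env bash
# build_restore_cmd is a hypothetical helper; it only prints the command.
# Usage: build_restore_cmd <bucket> <path/to/backups> <backup-id> [access-key secret-key]
build_restore_cmd() {
  local bucket="$1" path="$2" backup_id="$3"
  local access_key="${4:-}" secret_key="${5:-}"
  local cmd="chef-automate backup restore s3://${bucket}/${path}/${backup_id} --skip-preflight"
  # Append the key flags only when both keys are supplied.
  if [ -n "$access_key" ] && [ -n "$secret_key" ]; then
    cmd="${cmd} --s3-access-key \"${access_key}\" --s3-secret-key \"${secret_key}\""
  fi
  printf '%s\n' "$cmd"
}
```

Calling build_restore_cmd bucket_name path/to/backups BACKUP_ID prints the command without key flags; passing the two keys as the fourth and fifth arguments adds them. Review the printed command before running it from the bastion host.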

Troubleshooting

Try these steps if Chef Automate returns an error while restoring data.

Check the Chef Automate status.
```
chef-automate status
```
Check the status of your Habitat service on the Automate node.
```
hab svc status
```
If the deployment services are not healthy, reload them.
```
hab svc load chef/deployment-service
```

Now check the status of the Automate node and then try running the restore command from the bastion host.

How to Change the base_path or path

The steps for the file system backup are as follows:

At the time of deployment, the default value of backup_mount is /mnt/automate_backups.
If you modify backup_mount in config.toml before deployment, the deployment process applies the configuration with the updated value.
If you change the backup_mount value after deployment, you need to patch the configuration manually on all the frontend and backend nodes. For example, suppose you change backup_mount to /bkp/backps.

Update the frontend (FE) nodes with the template below, using the command chef-automate config patch fe.toml --fe

   [global.v1.backups]
      [global.v1.backups.filesystem]
         path = "/bkp/backps"
   [global.v1.external.opensearch.backup]
      [global.v1.external.opensearch.backup.fs]
         path = "/bkp/backps"

Update the OpenSearch nodes with the template below, using the command chef-automate config patch os.toml --os

[path]
   repo = "/bkp/backps"

Run the following curl request on one of the Automate frontend nodes:

curl localhost:10144/_snapshot?pretty

If the response is empty ({}), no further action is needed.
If the response has JSON output, check the location value of each entry. It should start with the backup_mount value, which is /bkp/backps in this example. The sample response below shows the default /mnt/automate_backups value.

{
  "chef-automate-es6-event-feed-service" : {
    "type" : "fs",
    "settings" : {
      "location" : "/mnt/automate_backups/opensearch/automate-elasticsearch-data/chef-automate-es6-event-feed-service"
    }
  },
  "chef-automate-es6-compliance-service" : {
    "type" : "fs",
    "settings" : {
      "location" : "/mnt/automate_backups/opensearch/automate-elasticsearch-data/chef-automate-es6-compliance-service"
    }
  },
  "chef-automate-es6-ingest-service" : {
    "type" : "fs",
    "settings" : {
      "location" : "/mnt/automate_backups/opensearch/automate-elasticsearch-data/chef-automate-es6-ingest-service"
    }
  },
  "chef-automate-es6-automate-cs-oc-erchef" : {
    "type" : "fs",
    "settings" : {
      "location" : "/mnt/automate_backups/opensearch/automate-elasticsearch-data/chef-automate-es6-automate-cs-oc-erchef"
    }
  }
}
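
The prefix check on the location values can be scripted without jq. The following is a minimal sketch, assuming the curl response has been saved to a file and is pretty-printed with one "location" key per line, as in the sample above. The function name check_location_prefix is hypothetical.

```shell
#!/usr/bin/env bash
# check_location_prefix is a hypothetical helper.
# Usage: check_location_prefix <expected-backup_mount> <response-file>
# Prints every "location" value that does not start with
# <expected-backup_mount>/ and returns 1 if any such value was found.
check_location_prefix() {
  # sed extracts the value of each "location" key, one per line;
  # awk flags values that do not begin with the expected prefix.
  sed -n 's/.*"location"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$2" |
    awk -v prefix="$1" '
      index($0, prefix "/") != 1 { print "mismatch: " $0; bad = 1 }
      END { exit bad + 0 }
    '
}
```

For example, after saving the response with curl to a file named snapshot.json (a hypothetical file name), check_location_prefix /bkp/backps snapshot.json lists the entries that still point at the old mount. An empty response ({}) produces no output and a zero exit status.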

If the prefix of the location value does not match backup_mount, you need to delete the existing snapshots. Use the script below to delete the snapshots from one of the Automate frontend nodes.

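   # List the registered snapshot names, one per line (requires jq)
   snapshot=$(curl -s -XGET 'http://localhost:10144/_snapshot?pretty' | jq -r 'keys[]')
   for name in $snapshot; do
       # Delete each entry so it can be recreated with the new path
       curl -XDELETE "localhost:10144/_snapshot/${name}?pretty"
   done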

The above script requires jq to be installed. You can install it from the airgap bundle. Use the following command on one of the Automate frontend nodes to locate the jq package:

ls -ltrh /hab/cache/artifacts/ | grep jq

-rw-r--r--. 1 ec2-user ec2-user  730K Dec  8 08:53 core-jq-static-1.6-20220312062012-x86_64-linux.hart
-rw-r--r--. 1 ec2-user ec2-user  730K Dec  8 08:55 core-jq-static-1.6-20190703002933-x86_64-linux.hart

If multiple jq versions are present, install the latest one. Use the command below to install the jq package on the Automate frontend node:

hab pkg install /hab/cache/artifacts/core-jq-static-1.6-20190703002933-x86_64-linux.hart -bf

The steps for object storage as a backup option are as follows:

At the time of deployment, backup_config is set to object_storage.
To use object_storage, the following template is used at the time of deployment:

   [object_storage.config]
    google_service_account_file = ""
    location = ""
    bucket_name = ""
    access_key = ""
    secret_key = ""
    endpoint = ""
    region = ""

If you configured this before deployment, no further action is needed.
If you want to change the bucket or base_path, use the template below for the frontend nodes:

[global.v1]
  [global.v1.external.opensearch.backup.s3]
    bucket = "<BUCKET_NAME>"
    base_path = "opensearch"
  [global.v1.backups.s3.bucket]
    name = "<BUCKET_NAME>"
    base_path = "automate"

You can choose any value for base_path. The base_path patch is only required on the frontend nodes.
To apply the above template, use the command chef-automate config patch frontend.toml --fe
After patching the configuration, use the following curl request to validate it:
```
curl localhost:10144/_snapshot?pretty
```
If the response is empty ({}), no further action is needed.

If the response has JSON output, it should have the correct value for base_path:

{
    "chef-automate-es6-event-feed-service" : {
      "type" : "s3",
      "settings" : {
        "bucket" : "MY-BUCKET",
        "base_path" : "opensearch/automate-elasticsearch-data/chef-automate-es6-event-feed-service",
        "readonly" : "false",
        "compress" : "false"
      }
    },
    "chef-automate-es6-compliance-service" : {
      "type" : "s3",
      "settings" : {
        "bucket" : "MY-BUCKET",
        "base_path" : "opensearch/automate-elasticsearch-data/chef-automate-es6-compliance-service",
        "readonly" : "false",
        "compress" : "false"
      }
    },
    "chef-automate-es6-ingest-service" : {
      "type" : "s3",
      "settings" : {
        "bucket" : "MY-BUCKET",
        "base_path" : "opensearch/automate-elasticsearch-data/chef-automate-es6-ingest-service",
        "readonly" : "false",
        "compress" : "false"
      }
    },
    "chef-automate-es6-automate-cs-oc-erchef" : {
      "type" : "s3",
      "settings" : {
        "bucket" : "MY-BUCKET",
        "base_path" : "opensearch/automate-elasticsearch-data/chef-automate-es6-automate-cs-oc-erchef",
        "readonly" : "false",
        "compress" : "false"
      }
    }
}

If the base_path value does not match, you need to delete the existing snapshots. Refer to the steps for the file system backup above.

For Disaster Recovery or an AMI upgrade, when running the restore in a secondary cluster that is in a different region, follow the steps given below.

Make a curl request on any OpenSearch node: curl -XGET https://localhost:9200/_snapshot?pretty --cacert /hab/svc/automate-ha-opensearch/config/certificates/root-ca.pem --key /hab/svc/automate-ha-opensearch/config/certificates/admin-key.pem --cert /hab/svc/automate-ha-opensearch/config/certificates/admin.pem -k
Check the response of the curl request. If the region does not match the primary cluster, follow the steps below:

Modify the region on the frontend nodes by patching the configuration below with the command chef-automate config patch <file-name>.toml --fe
```
[global.v1.external.opensearch.backup.s3.settings]
  region = "<FIRST-CLUSTER-REGION>"
```
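
Whether this region patch is needed can be decided by comparing the region values in the _snapshot response with the primary cluster's region. The following is a minimal sketch, assuming the response has been saved to a file, is pretty-printed, and includes a "region" key in each entry's settings. The function name regions_match is hypothetical.

```shell
#!/usr/bin/env bash
# regions_match is a hypothetical helper.
# Usage: regions_match <primary-region> <response-file>
# Returns 0 only if every "region" value in the saved response equals the
# primary region; prints each distinct region that differs.
# If the response contains no "region" key, nothing is compared and 0 is returned.
regions_match() {
  sed -n 's/.*"region"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$2" |
    sort -u |
    awk -v primary="$1" '
      $0 != primary { print "differs: " $0; bad = 1 }
      END { exit bad + 0 }
    '
}
```

A non-zero exit status from regions_match <FIRST-CLUSTER-REGION> followed by the saved response file suggests that the region patch and the PUT request in the next step are required.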

Make a PUT request in an OpenSearch node by running this script:

# Snapshot names to register against the primary cluster's bucket
indices=(
chef-automate-es6-automate-cs-oc-erchef
chef-automate-es6-compliance-service
chef-automate-es6-event-feed-service
chef-automate-es6-ingest-service
)
for index in "${indices[@]}"; do
  # The unquoted EOF lets the shell expand $index inside the JSON body
  curl -XPUT -k -H 'Content-Type: application/json' "https://<IP>:9200/_snapshot/$index" --data-binary @- << EOF
{
  "type" : "s3",
  "settings" : {
    "bucket" : "<YOUR-PRIMARY-CLUSTER-BUCKET-NAME>",
    "base_path" : "elasticsearch/automate-elasticsearch-data/$index",
    "region" : "<YOUR-PRIMARY-CLUSTER-REGION>",
    "role_arn" : " ",
    "compress" : "false"
  }
}
EOF
done