Data Lifecycle

Data Lifecycle manages the retention of events, service groups, Chef Infra Client runs, compliance reports, and scans in Chef Automate. Chef Automate stores data from the ingest-service, event-feed-service, compliance-service and applications-service in OpenSearch or PostgreSQL. Over time, you may wish to remove that data from Chef Automate by using the data lifecycle settings.

Data Lifecycle

Navigate to Settings > Data Lifecycle and adjust any settings you would like to change. After making changes, use the Save Changes button to apply your changes.

Users with dataLifecycle:* IAM access can see the data lifecycle job statuses, configure jobs or run jobs. While configuring the data lifecycle in a number of days, please refer to the section on Impact On OpenSearch Shards to set the appropriate maximum shards per node value.

Data Lifecycle

Event Feed

The Event Feed Data Lifecycle settings allow you to remove all event feed data and Chef Infra Server actions after a set amount of days. The default is to remove event feed data after 7 days, and Chef Infra Server actions after 30 days.

Service Groups

The Service Group Data Lifecycle settings allow you to label health check reports as disconnected and automatically remove them after a set amount of time. The default is to label health check reports as disconnected after 5 minutes, and remove disconnected services after 7 days.

Client Runs

The Client Runs data lifecycle settings allow you to remove data after a set amount of days. They also allow you to label nodes as missing and automatically remove them after a set amount of days. The default is to remove Chef Infra Client run data after 30 days, to label nodes as missing after 1 day, and to remove nodes labeled as missing after 30 days.

Compliance

The Compliance data lifecycle settings allow you to remove compliance reports and compliance scans after a set amount of days. The default is to remove compliance reports after 60 days, and to remove compliance scans after 60 days.

Impact on OpenSearch Shards

The data retention period set in the data lifecycle settings impacts the number of shards in OpenSearch.

If the number of shards per node grows higher than the max shards per node configured in OpenSearch, you may see failures while ingesting data to Automate. For more information, refer to the error troubleshoot document.

The default value for max shards per node in OpenSearch is 1000.

Refer to the following table to calculate the max shards based on your data retention requirement.

Days	Number of Shards Per Node for Client Run	Number of Shards Per Node for Compliance	Total Number of Shards Per Node
30	350	650	1000
60	700	1300	2000
90	1100	1900	3000
365	4000	8000	12000

To set the value of max shards per node:

Create the following configuration in a .toml file.

  [opensearch.v1.sys.cluster]
  max_shards_per_node = 1000

Patch the configuration

  chef-automate config patch </path/to/your-file.toml>

Data Lifecycle API

Chef Automate stores data from the ingest-service, event-feed-service, compliance-service and applications-service in OpenSearch or PostgreSQL.

The data-lifecycle API allows configuring and running lifecycle jobs by data type:

infra - Chef Infra Server actions and Chef Infra Client converge data
compliance - Chef InSpec reports and Chef Compliance scans
event-feed - Event metadata that powers the operational visibility and query language
services - Chef Habitat Services data

To see the data lifecycle job statuses, configure jobs, or run jobs requires an api token with dataLifecycle:* IAM access.

Status

To see the combined status and configuration for all data lifecycle jobs, you can use the global status endpoint:

These examples use the Unix/Linux curl command with the options:

Short Option	Long Option	Definition
-s	–silent	Silent or quiet mode, does not show progress meter or error messages.
-S	–show-error	When used with -s, –silent, it makes curl show an error message if it fails.
-k	–insecure	Instructs curl to proceed and operate even for server connections otherwise considered insecure.
-H	–header	Sends a header with the request. In this case, the header is your API token.

curl -s -S shows errors but does not show the progress meter:

curl -s -S -k -H "api-token: $TOKEN" https://automate.example.com/api/v0/data-lifecycle/status

To see individual statuses by data type, you can access the data type sub-status endpoints:

curl -s -S -k -H "api-token: $TOKEN" https://automate.example.com/api/v0/data-lifecycle/event-feed/status

Swap event-feed for infra or compliance or services to see their corresponding jobs.

The status is the total of the job configuration, details about its next scheduled run, and details about any previous runs.

Configuration

Configure the data lifecycle job settings by creating a JSON file with the desired configuration.

{ "infra": {
    "job_settings": [
      { "name":"delete_nodes",
        "disabled": true,
        "recurrence": "FREQ=MINUTELY;DTSTART=20191106T180240Z;INTERVAL=15",
        "threshold": "365d"
      },
      { "name":"missing_nodes",
        "disabled": false,
        "recurrence": "FREQ=MINUTELY;DTSTART=20191106T180240Z;INTERVAL=15",
        "threshold": "1d"
      },
      { "name":"missing_nodes_for_deletion",
        "disabled": false,
        "recurrence": "FREQ=MINUTELY;DTSTART=20191106T180240Z;INTERVAL=15",
        "threshold": "30d"
      },
      { "name":"periodic_purge_timeseries",
        "disabled": false,
        "recurrence": "FREQ=DAILY;DTSTART=20191106T180240Z;INTERVAL=1",
        "purge_policies": {
          "opensearch": [
            {
              "policy_name": "actions",
              "older_than_days": 30,
              "disabled": false
            },
            {
              "policy_name": "converge-history",
              "older_than_days": 30,
              "disabled": false
            }
          ]
        }
      }
    ]
  },
  "compliance": {
    "job_settings": [
      {
        "name": "periodic_purge",
        "disabled": false,
        "recurrence": "FREQ=DAILY;DTSTART=20191106T180323Z;INTERVAL=1",
        "purge_policies": {
          "opensearch": [
            {
              "policy_name": "compliance-reports",
              "older_than_days": 100,
              "disabled": false
            },
            {
              "policy_name": "compliance-scans",
              "older_than_days": 100,
              "disabled": false
            }
          ]
        }
      }
    ]
  },
  "event_feed": {
    "job_settings": [
      {
        "name": "periodic_purge",
        "disabled": false,
        "recurrence": "FREQ=DAILY;DTSTART=20191106T180243Z;INTERVAL=2",
        "purge_policies": {
          "opensearch": [
            {
              "policy_name": "feed",
              "older_than_days": 90,
              "disabled": false
            }
          ]
        }
      }
    ]
  },
  "services": {
    "job_settings": [
      {
        "name": "disconnected_services",
        "disabled": false,
        "recurrence": "FREQ=SECONDLY;DTSTART=20200612T182105Z;INTERVAL=60",
        "threshold": "5m"
      },
      {
        "name": "delete_disconnected_services",
        "disabled": false,
        "recurrence": "FREQ=SECONDLY;DTSTART=20200612T182105Z;INTERVAL=60",
        "threshold": "7d"
      }
    ]
  }
}

Configure the jobs by sending the JSON payload to the config endpoint.

Note

The data sent to the config endpoint intentionally follows a different format than the data returned from the status endpoint. You cannot read the data on the status endpoint, change some values, and feed the modified data back on the config endpoint.

Save the JSON file as config.json in the current working directory:

curl -s -S -k -H "api-token: $TOKEN" -X PUT --data "@config.json" https://automate.example.com/api/v0/data-lifecycle/config

If you wish to configure a specific endpoint, you can specify the job_settings for that data type and configure it using data types sub-resource. For example, if you want to configure compliance settings, create a smaller JSON payload:

{ "job_settings": [
    {
      "name": "periodic_purge",
      "disabled": false,
      "recurrence": "FREQ=DAILY;DTSTART=20191106T180323Z;INTERVAL=1",
      "purge_policies": {
        "opensearch": [
          {
            "policy_name": "compliance-reports",
            "older_than_days": 100,
            "disabled": false
          },
          {
            "policy_name": "compliance-scans",
            "older_than_days": 100,
            "disabled": false
          }
        ]
      }
    }
  ]
}

And update the specific endpoint using the compliance sub-resource:

curl -s -S -k -H "api-token: $TOKEN" -X PUT --data "@config.json" https://automate.example.com/api/v0/data-lifecycle/compliance/config

Job Settings

All jobs have the following options:

recurrence (string) - A recurrence rule that determines how often, at what interval, and when to initially start a scheduled job. Any valid recurrence rule as defined in section 4.3.10 of RFC 2445 is valid in this field.
disabled (bool) - True or false if this job is enabled.

Infra node lifecycle jobs have the following options:

threshold (string) - Setting that allows the user to use 1w style notation to denote how long before the Infra job triggers.

Purge jobs have the following options:

purge_polices (map) - Configures how old the corresponding data must be in the configured storage before purging occurs.
- opensearch (array) - An array of OpenSearch purge policies
  - disabled (bool) - True or false if this job is enabled.
  - policy_name (string) - The name of the purge policy you wish to update.
  - older_than_days (int) - The threshold for what qualifies for deletion.

Services jobs have the following options:

threshold (string) - Setting that allows the user to use 1m style notation to select the services the task operates on.

Infra Job Settings

The infra data type has four data lifecycle jobs: three are for node lifecycle and one is for purge job with two OpenSearch purge policies.

{ "job_settings": [
    { "name":"delete_nodes",
      "disabled": true,
      "recurrence": "FREQ=MINUTELY;DTSTART=20191106T180240Z;INTERVAL=15",
      "threshold": "365d"
    },
    { "name":"missing_nodes",
      "disabled": false,
      "recurrence": "FREQ=MINUTELY;DTSTART=20191106T180240Z;INTERVAL=15",
      "threshold": "1d"
    },
    { "name":"missing_nodes_for_deletion",
      "disabled": false,
      "recurrence": "FREQ=MINUTELY;DTSTART=20191106T180240Z;INTERVAL=15",
      "threshold": "30d"
    },
    { "name":"periodic_purge_timeseries",
      "disabled": false,
      "recurrence": "FREQ=DAILY;DTSTART=20191106T180240Z;INTERVAL=1",
      "purge_policies": {
        "opensearch": [
        {
          "policy_name": "actions",
          "older_than_days": 30,
          "disabled": false
        },
        {
          "policy_name": "converge-history",
          "older_than_days": 30,
          "disabled": false
        }
        ]
      }
    }
  ]
}

delete_nodes - How long a node can exist before deletion.
missing_nodes - How long between a node’s last check-in before marked as missing.
missing_nodes_for_deletion - How long a node can be missing before deletion
periodic_purge_timeseries - How often to run the purge job
- actions - Chef Infra Server actions
- converge-history - Chef Infra Client converge data

Compliance Job Settings

The compliance data type has one compliance purge job with two OpenSearch purge policies.

{ "job_settings": [
    {
      "name": "periodic_purge",
      "disabled": false,
      "recurrence": "FREQ=DAILY;DTSTART=20191106T180323Z;INTERVAL=1",
      "purge_policies": {
        "opensearch": [
          {
            "policy_name": "compliance-reports",
            "older_than_days": 100,
            "disabled": false
          },
          {
            "policy_name": "compliance-scans",
            "older_than_days": 100,
            "disabled": false
          }
        ]
      }
    }
  ]
}

periodic_purge - How often to run the purge job
- compliance-reports - Chef InSpec reports
- compliance-scans - Chef Compliance scans

Event Feed Job Settings

The event_feed data type has one event feed purge job with one OpenSearch purge policy.

{ "job_settings": [
    { "name": "periodic_purge",
      "disabled": false,
      "recurrence": "FREQ=DAILY;DTSTART=20191106T180243Z;INTERVAL=2",
      "purge_policies": {
        "opensearch": [
        {
          "policy_name": "feed",
          "older_than_days": 90,
          "disabled": false
        }
        ]
      }
    }
  ]
}

periodic_purge - How often to run the purge job
- feed - Queryable event feed

Services Job Settings

The services data type has two jobs, one to mark services as disconnected when the elapsed time since Chef Automate last received a health check message exceeds the threshold, and one to delete services when the time since the last health check exceeds the threshold.

{  "job_settings": [
    {
      "name": "disconnected_services",
      "disabled": false,
      "recurrence": "FREQ=SECONDLY;DTSTART=20200612T182105Z;INTERVAL=61",
      "threshold": "5m"
    },
    {
      "name": "delete_disconnected_services",
      "disabled": false,
      "recurrence": "FREQ=SECONDLY;DTSTART=20200612T182105Z;INTERVAL=61",
      "threshold": "7d"
    }
  ]
}

Run

As with status and configure, you can run data lifecycle jobs globally across all data or by using the data type sub-resource.

To run all data lifecycle jobs, run the following command:

curl -s -S -k -H "api-token: $TOKEN" -X POST https://automate.example.com/api/v0/data-lifecycle/run

To run jobs for a specific data type, you can make the request to the sub-resource:

curl -s -S -k -H "api-token: $TOKEN" -X POST https://automate.example.com/api/v0/data-lifecycle/infra/run

Swap infra for event-feed or compliance to run their corresponding jobs.