Skip to main content

aws_glue_crawler Resource

Use the aws_glue_crawler InSpec audit resource to test properties of a single AWS Glue crawler.

The AWS::Glue::Crawler resource specifies an AWS Glue crawler.

For additional information, including details on parameters and properties, see the AWS documentation on Glue Crawler.

Install

This resource is available in the Chef InSpec AWS resource pack.

For information on configuring your AWS environment for Chef InSpec and creating an InSpec profile that uses the InSpec AWS resource pack, see the Chef InSpec documentation on the AWS cloud platform.

Syntax

Ensure that a crawler name exists.

describe aws_glue_crawler(name: 'CRAWLER_NAME') do
  it { should exist }
end

Parameters

name (required)

The name of the crawler.

Properties

name
The name of the crawler.
role
The ARN of an IAM role that’s used to access customer resources, such as Amazon Simple Storage Service (Amazon S3) data.
target
A collection of targets to crawl.
database_name
The name of the database in which the crawler’s output is stored.
description
A description of the crawler.
classifier
A list of UTF-8 strings that specify the custom classifiers that are associated with the crawler.
recrawl_policy
A policy that specifies whether to crawl the entire dataset again, or to crawl only folders that were added since the last crawler run.
schema_change_policy
The policy that specifies update and delete behaviors for the crawler.
lineage_configuration
A configuration that specifies whether data lineage is enabled for the crawler.
state
Whether the crawler is running, or whether a run is pending.
table_prefix
The prefix added to the names of tables that are created.
schedule
For scheduled crawlers, the schedule when the crawler runs.
crawl_elapsed_time
If the crawler is running, contains the total time elapsed since the last crawl began.
creation_time
The time that the crawler was created.
last_updated
The time that the crawler was last updated.
last_crawl
The status of the last crawl, and potentially error information if an error occurred.
version
The version of the crawler.
configuration
Crawler configuration information. This versioned JSON string allows users to specify aspects of a crawler’s behavior.
crawler_security_configuration
The name of the SecurityConfiguration structure to be used by this crawler.

Examples

Ensure a crawler name is available.

describe aws_glue_crawler(name: 'CRAWLER_NAME') do
  its('name') { should eq 'CRAWLER_NAME' }
end

Verify the database name in the crawler.

describe aws_glue_crawler(name: 'CRAWLER_NAME') do
    its('database_name') { should eq 'CRAWLER_DATABASE_NAME' }
end

Matchers

For a full list of available matchers, see our Universal Matchers page.

This resource has the following special matchers.

exist

Use should to test that the entity exists.

describe aws_glue_crawler(name: 'crawler_name') do
  it { should exist }
end

Use should_not to test the entity does not exist.

describe aws_glue_crawler(name: 'dummy') do
  it { should_not exist }
end

be_available

Use should to check if the work_group name is available.

describe aws_glue_crawler(name: 'crawler_name') do
  it { should be_available }
end

AWS Permissions

Your Principal will need the EC2:Client:GetCrawlerResponse action with Effect set to Allow.

Edit this page on GitHub

Thank you for your feedback!

×