AWS Glue DataBrew

★ Only Publicly Available OpenAPI Document | Analytics | Data Pipelines | AWS Signature v4 (HMAC) | 44 Endpoints | REST

For Agents

Programmatically manage DataBrew datasets, recipes, projects, and jobs to clean and transform data at scale.

Quickstart

Get started with AWS Glue DataBrew in minutes using your preferred integration method.

# Add to your MCP client config (Claude Desktop, Cursor, Windsurf)
{
  "jentic": {
    "url": "https://api.jentic.com/mcp",
    "auth": "oauth"
  }
}

# Then ask your agent:
"schedule a recurring data preparation job"

# → Jentic returns the POST /schedules (CreateSchedule) tool with its parameter schema, and the agent executes it.

Capabilities

What an agent can do with AWS Glue DataBrew API.

Create datasets that point to S3, Redshift, JDBC, or Data Catalog sources

Author and version transformation recipes with hundreds of built-in steps

Run recipe jobs and profile jobs to apply transformations and assess data quality

Schedule recurring jobs and manage their concurrency and outputs

GET STARTED

Start building with AWS Glue DataBrew API

Use for: I need to clean a CSV file in S3 with a DataBrew recipe; I want to schedule a recurring data preparation job; Create a DataBrew project for a sales dataset; Run a profile job to assess data quality

Not supported: ad-hoc SQL querying, real-time streaming, or warehouse modeling. Use AWS Glue DataBrew for visual, recipe-based data preparation only.

Jentic publishes the only available OpenAPI document for AWS Glue DataBrew, keeping it validated and agent-ready.

AWS Glue DataBrew is a visual data preparation service for analysts and data scientists. The API manages datasets, projects, recipes (transformations), recipe jobs, profile jobs, schedules, and rulesets that enforce data quality. Recipes record cleaning, transformation, and feature-engineering steps, chosen from hundreds of built-in transformations, as code, and jobs apply them at scale to data in S3, Redshift, Snowflake, and JDBC sources.

Use Cases

Patterns agents use AWS Glue DataBrew API for, with concrete tasks.

★ Self-Service Data Cleaning for Analysts

Analysts need to clean and reshape datasets without writing Spark code. AWS Glue DataBrew lets them build a recipe visually, publish a numbered version of it with PublishRecipe, and run it programmatically with CreateRecipeJob and StartJobRun. Recipes capture every step as code, making the work reproducible and reviewable.

Call CreateRecipeJob with a job Name, a RoleArn, the DatasetName, RecipeReference {Name, RecipeVersion}, and an S3 output; then call StartJobRun and poll DescribeJobRun until the run reaches a terminal state.
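As a minimal sketch, the request body for CreateRecipeJob (POST /recipeJobs) can be assembled as a plain dictionary using the API's field names. The helper name `build_create_recipe_job_body` and every value in the usage lines (job name, role ARN, bucket, recipe name and version "1.0") are hypothetical placeholders, and only a single S3 output is modelled.

```python
from typing import Any, Dict


def build_create_recipe_job_body(
    name: str,
    role_arn: str,
    dataset_name: str,
    recipe_name: str,
    recipe_version: str,
    bucket: str,
    key: str,
    output_format: str = "CSV",
) -> Dict[str, Any]:
    """Assemble a CreateRecipeJob request body with one S3 output (hypothetical helper)."""
    return {
        "Name": name,
        # IAM role that DataBrew assumes to read the dataset and write the output.
        "RoleArn": role_arn,
        "DatasetName": dataset_name,
        # Which published recipe version the job applies.
        "RecipeReference": {"Name": recipe_name, "RecipeVersion": recipe_version},
        "Outputs": [
            {
                "Location": {"Bucket": bucket, "Key": key},
                "Format": output_format,
            }
        ],
    }


# Hypothetical values for illustration only.
body = build_create_recipe_job_body(
    name="clean-sales-csv",
    role_arn="arn:aws:iam::123456789012:role/DataBrewJobRole",
    dataset_name="sales-raw",
    recipe_name="sales-cleaning",
    recipe_version="1.0",
    bucket="example-output-bucket",
    key="cleaned/sales/",
)
```

The same keys are accepted as keyword arguments by boto3's DataBrew client, so `boto3.client("databrew").create_recipe_job(**body)` is one way to send it; through Jentic the dictionary is the input for the loaded CreateRecipeJob schema.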

Profile Jobs for Data Quality Assessment

Data engineering teams profile incoming datasets to detect schema drift, null spikes, and distribution changes before downstream pipelines consume them. AWS Glue DataBrew's profile jobs sample data, compute statistics, and emit reports that integrate with rulesets for pass/fail data-quality gates.

Call CreateProfileJob with a job Name, a RoleArn, the dataset, and an S3 output location; call StartJobRun; then poll DescribeJobRun until the state is SUCCEEDED and read the profile statistics from the output location.
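A small sketch of the create, start, and poll loop. The terminal states are DataBrew's job-run states that mean a run has finished (SUCCEEDED, FAILED, STOPPED, TIMEOUT). The `describe` callable is an assumption: it stands in for whatever performs DescribeJobRun (for example a wrapper around boto3's `describe_job_run`), so the loop can be exercised without AWS access. Helper names and defaults are hypothetical.

```python
import time
from typing import Any, Callable, Dict

# DataBrew job-run states after which the run will not change again.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "STOPPED", "TIMEOUT"}


def build_create_profile_job_body(
    name: str, role_arn: str, dataset_name: str, bucket: str, key: str
) -> Dict[str, Any]:
    """Minimal CreateProfileJob body; the profile report is written under OutputLocation."""
    return {
        "Name": name,
        "RoleArn": role_arn,
        "DatasetName": dataset_name,
        "OutputLocation": {"Bucket": bucket, "Key": key},
    }


def wait_for_job_run(
    describe: Callable[[str, str], Dict[str, Any]],
    job_name: str,
    run_id: str,
    poll_seconds: float = 30.0,
    timeout_seconds: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll DescribeJobRun (via `describe`) until a terminal state; return that state."""
    deadline = clock() + timeout_seconds
    while True:
        state = describe(job_name, run_id)["State"]
        if state in TERMINAL_STATES:
            return state
        if clock() >= deadline:
            raise TimeoutError(f"{job_name}/{run_id} still {state} after {timeout_seconds}s")
        sleep(poll_seconds)
```

With boto3 this might be wired as `run_id = client.start_job_run(Name=job)["RunId"]` followed by `wait_for_job_run(lambda n, r: client.describe_job_run(Name=n, RunId=r), job, run_id)`. Only a SUCCEEDED result means the profile report is ready to read.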

Scheduled Daily Data Preparation

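ETL operations need recurring transformation jobs that prepare data for analytics every day. AWS Glue DataBrew's CreateSchedule attaches a cron expression to one or more existing jobs, and DataBrew starts a run of each job whenever the expression fires. Retries, capacity, and timeouts are set on each job (MaxRetries, MaxCapacity, Timeout), and job state changes can feed notifications through Amazon EventBridge.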

Call CreateSchedule with a Name, a CronExpression like 'cron(0 6 * * ? *)', and JobNames=['daily-clean'] to run the job every morning at 06:00 UTC.
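The CronExpression uses AWS's six-field form (minutes, hours, day-of-month, month, day-of-week, year), where exactly one of day-of-month and day-of-week must be `?`. A rough sketch that checks only that structure before building a CreateSchedule body; it is not a full cron parser, and the schedule name is a hypothetical placeholder.

```python
import re
from typing import Any, Dict, List

_CRON = re.compile(r"^cron\((.+)\)$")


def check_aws_cron(expr: str) -> List[str]:
    """Structural check of an AWS cron(...) expression; returns its six fields."""
    match = _CRON.match(expr.strip())
    if not match:
        raise ValueError("expected the form cron(...)")
    fields = match.group(1).split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}")
    day_of_month, day_of_week = fields[2], fields[4]
    # Exactly one of the two day fields must be '?'.
    if (day_of_month == "?") == (day_of_week == "?"):
        raise ValueError("exactly one of day-of-month and day-of-week must be '?'")
    return fields


def build_create_schedule_body(name: str, cron_expression: str, job_names: List[str]) -> Dict[str, Any]:
    """CreateSchedule body: a schedule Name, its CronExpression, and the jobs it starts."""
    check_aws_cron(cron_expression)
    return {"Name": name, "CronExpression": cron_expression, "JobNames": list(job_names)}


body = build_create_schedule_body("daily-6am-utc", "cron(0 6 * * ? *)", ["daily-clean"])
```

Catching a malformed expression locally is cheaper than a failed API call, but the service remains the authority on what it accepts.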

Agent-Driven Data Quality Gates

An AI agent supporting a data team can run profile jobs, compare statistics against thresholds, and trigger downstream workflows. Through Jentic, the agent loads each operation by intent and chains CreateProfileJob, StartJobRun, DescribeJobRun, and CreateRuleset without writing boto3 plumbing.

Run a profile job, call CreateRuleset on the dataset with rules asserting completeness > 99% on key columns, attach the ruleset to the profile job's ValidationConfigurations, and re-run the profile to validate.
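When the agent compares statistics against thresholds itself, the completeness check reduces to a share of non-missing values per key column. A minimal sketch, assuming the agent has already extracted a missing-value count and a row count per column from the profile report; the `ColumnCounts` shape and column names are hypothetical and do not mirror DataBrew's report format.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ColumnCounts:
    # Hypothetical per-column numbers pulled from a profile report.
    missing: int
    rows: int


def completeness_failures(
    stats: Dict[str, ColumnCounts], key_columns: List[str], threshold: float = 0.99
) -> Dict[str, float]:
    """Return key columns whose completeness (non-missing share) is not above threshold."""
    failures: Dict[str, float] = {}
    for column in key_columns:
        counts = stats[column]
        # An empty column is treated as 0% complete.
        completeness = (1.0 - counts.missing / counts.rows) if counts.rows else 0.0
        if completeness <= threshold:
            failures[column] = completeness
    return failures
```

An empty result means the gate passes and downstream workflows can be triggered; otherwise the returned columns and their completeness values explain why the gate failed.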

Key Endpoints

44 endpoints in total; the core create operations are listed below.

METHOD  PATH          DESCRIPTION
POST    /datasets     Create a dataset
POST    /projects     Create a project binding a dataset to a recipe
POST    /recipes      Create a recipe
POST    /recipeJobs   Create a recipe job
POST    /profileJobs  Create a profile job
POST    /schedules    Create a schedule
POST    /rulesets     Create a data-quality ruleset

Why Jentic?

Three things that make agents converge on Jentic-routed access.

Credential isolation

AWS SigV4 (HMAC) credentials for AWS Glue DataBrew are stored encrypted in the Jentic vault. Agents receive scoped, short-lived access through Jentic's MAXsystem instead of holding the raw AWS access key ID and secret access key in their context.

Intent-based discovery

Agents search Jentic with intents like 'schedule a recurring data preparation job' and Jentic returns the matching AWS Glue DataBrew operation with its input schema, so the agent can call the correct endpoint without browsing the AWS service reference.

Time to first call

Direct integration: 2-4 days for SigV4 request signing, IAM policy setup, and error handling across AWS Glue DataBrew operations. Through Jentic: under 1 hour (search by intent, load the schema, execute).

Related APIs

Alternatives and complements available in the Jentic catalogue.

Complementary

Amazon Athena

Query the cleaned data DataBrew writes to S3.

Pair DataBrew (prep) with Athena (query) for a serverless analytics stack.

Alternative

Snowflake API

Cloud data platform with its own visual prep tooling and marketplace.

Choose Snowflake when prep and analytics are warehouse-native rather than AWS-native.

FAQs

Specific to using AWS Glue DataBrew API through Jentic.

Why is there no official OpenAPI spec for AWS Glue DataBrew?

AWS does not publish an OpenAPI specification. Jentic generates and maintains this spec so that AI agents and developers can call AWS Glue DataBrew via structured tooling. It is validated against the live API and kept up to date. Get started at https://app.jentic.com/sign-up.

What authentication does the AWS Glue DataBrew API use?

AWS Signature Version 4 (HMAC-SHA256). The caller needs IAM permissions for the databrew:* actions it uses, plus iam:PassRole for the service role that jobs run under; that role in turn needs access to the S3 buckets, Redshift clusters, or JDBC endpoints the dataset points to. Jentic stores the AWS credentials in its vault and signs each request.
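For readers integrating directly, a compact sketch of SigV4 signing for a DataBrew REST call with no query string. It follows the documented steps (canonical request, string to sign, HMAC-chained signing key, Authorization header). Assumptions: the endpoint host pattern `databrew.<region>.amazonaws.com`, signing service name `databrew`, long-term credentials (temporary credentials would also need an `X-Amz-Security-Token` header), and simple path segments. In production, botocore's built-in signer is the safer choice.

```python
import hashlib
import hmac
from datetime import datetime, timezone
from typing import Dict, Optional
from urllib.parse import quote


def _hmac_sha256(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def derive_signing_key(secret_key: str, date_stamp: str, region: str, service: str) -> bytes:
    # SigV4 key derivation: chain HMACs over date, region, service, and a fixed terminator.
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")


def sign_request(
    method: str,
    path: str,
    body: str,
    access_key: str,
    secret_key: str,
    region: str,
    service: str = "databrew",
    now: Optional[datetime] = None,
) -> Dict[str, str]:
    """Return request headers, including Authorization, for a call without a query string."""
    now = now or datetime.now(timezone.utc)
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    date_stamp = now.strftime("%Y%m%d")
    host = f"{service}.{region}.amazonaws.com"  # assumed regional endpoint pattern

    headers = {"content-type": "application/json", "host": host, "x-amz-date": amz_date}
    names = sorted(headers)
    canonical_headers = "".join(f"{name}:{headers[name]}\n" for name in names)
    signed_headers = ";".join(names)
    payload_hash = hashlib.sha256(body.encode("utf-8")).hexdigest()

    # Method, URI, query string (empty), headers, signed-header list, payload hash.
    canonical_request = "\n".join(
        [method, quote(path), "", canonical_headers, signed_headers, payload_hash]
    )
    scope = f"{date_stamp}/{region}/{service}/aws4_request"
    string_to_sign = "\n".join(
        [
            "AWS4-HMAC-SHA256",
            amz_date,
            scope,
            hashlib.sha256(canonical_request.encode("utf-8")).hexdigest(),
        ]
    )
    key = derive_signing_key(secret_key, date_stamp, region, service)
    signature = hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()

    signed = dict(headers)
    signed["Authorization"] = (
        f"AWS4-HMAC-SHA256 Credential={access_key}/{scope}, "
        f"SignedHeaders={signed_headers}, Signature={signature}"
    )
    return signed
```

This is the plumbing that Jentic's vault-side signing removes from the agent's context: the secret key never has to leave the vault.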

Can I run a DataBrew recipe job programmatically?

Yes. Call CreateRecipeJob with the dataset, recipe reference, and output location, then StartJobRun. Track progress with DescribeJobRun and ListJobRuns; the run completes asynchronously.

What are the rate limits for the AWS Glue DataBrew API?

Per-account, per-region request-rate (TPS) quotas apply, and there is an adjustable quota on concurrent job runs per account. Quotas vary by operation: StartJobRun and the Describe* operations allow higher request rates than the Create* operations.
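Clients that hit a rate quota typically retry with capped exponential backoff and jitter. A generic sketch of the "full jitter" variant; which exceptions count as throttling for DataBrew is an assumption left to the caller's `is_retryable` predicate, and the helper name and defaults are hypothetical.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def call_with_backoff(
    fn: Callable[[], T],
    is_retryable: Callable[[Exception], bool],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 20.0,
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> T:
    """Call fn, retrying retryable errors with capped exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    rng = rng or random.Random()
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as exc:
            if attempt == max_attempts - 1 or not is_retryable(exc):
                raise
            # Full jitter: wait a random time between 0 and the capped exponential delay.
            sleep(rng.uniform(0, min(max_delay, base_delay * 2 ** attempt)))
    raise AssertionError("unreachable")
```

botocore also ships configurable retry modes (for example `Config(retries={"mode": "standard"})`), which may make a hand-rolled loop unnecessary when calling through boto3.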

How do I schedule a daily DataBrew job through Jentic?

Search Jentic for 'schedule a recurring DataBrew job', load the CreateSchedule schema, and execute it with a CronExpression and the JobNames to run. Schedules apply to one or more existing jobs.

What sources does AWS Glue DataBrew support?

DataBrew reads from Amazon S3 (CSV, JSON, Parquet, Excel), AWS Glue Data Catalog (S3, Redshift, JDBC), Snowflake (via JDBC), and Amazon Redshift directly. Recipe job outputs land in S3 in the chosen format, and recipe jobs can also write to Data Catalog tables or JDBC databases.
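For the most common source, an S3 object, a CreateDataset (POST /datasets) body names the bucket and key and declares the file format. A minimal sketch restricted to the four S3 formats listed above; the CSV `FormatOptions` block, the helper name, and all example values are assumptions for illustration.

```python
from typing import Any, Dict

# S3 file formats named above; the API may accept others.
S3_FORMATS = {"CSV", "JSON", "PARQUET", "EXCEL"}


def build_s3_dataset_body(
    name: str, bucket: str, key: str, fmt: str = "CSV", has_header: bool = True
) -> Dict[str, Any]:
    """Assemble a CreateDataset body for a single S3 object (hypothetical helper)."""
    fmt = fmt.upper()
    if fmt not in S3_FORMATS:
        raise ValueError(f"unsupported format for this sketch: {fmt}")
    body: Dict[str, Any] = {
        "Name": name,
        "Format": fmt,
        "Input": {"S3InputDefinition": {"Bucket": bucket, "Key": key}},
    }
    if fmt == "CSV":
        # Assumed CSV options: comma delimiter and whether the first row is a header.
        body["FormatOptions"] = {"Csv": {"Delimiter": ",", "HeaderRow": has_header}}
    return body


dataset = build_s3_dataset_body("sales-raw", "example-input-bucket", "raw/sales.csv")
```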