For Agents
Programmatically manage DataBrew datasets, recipes, projects, and jobs to clean and transform data at scale.
Get started with AWS Glue DataBrew in minutes using your preferred integration method.
# Add to your MCP client config (Claude Desktop, Cursor, Windsurf)
{
  "jentic": {
    "url": "https://api.jentic.com/mcp",
    "auth": "oauth"
  }
}
# Then ask your agent:
"schedule a recurring data preparation job"
# → Jentic returns the POST /schedules (CreateSchedule) tool with its parameter schema, and the agent executes it.
What an agent can do with AWS Glue DataBrew API.
Create datasets that point to S3, Redshift, JDBC, or Data Catalog sources
Author and version transformation recipes with hundreds of built-in steps
Run recipe jobs and profile jobs to apply transformations and assess data quality
Schedule recurring jobs and manage their concurrency and outputs
Use for: "I need to clean a CSV file in S3 with a DataBrew recipe", "I want to schedule a recurring data preparation job", "Create a DataBrew project for a sales dataset", "Run a profile job to assess data quality"
Not supported: Does not handle ad-hoc SQL querying, real-time streaming, or warehouse modeling - use AWS Glue DataBrew for visual recipe-based data preparation only.
Jentic publishes the only available OpenAPI document for AWS Glue DataBrew, keeping it validated and agent-ready.
AWS Glue DataBrew is a visual data preparation service for analysts and data scientists. The API manages datasets, projects, recipes (transformations), recipe jobs, profile jobs, schedules, and rulesets that enforce data quality. Recipes capture hundreds of cleaning, transformation, and feature-engineering steps as code, and jobs apply them at scale to data in S3, Redshift, Snowflake, and JDBC sources.
Validate data with rulesets that codify data-quality assertions
Tag, list, and delete projects, datasets, jobs, and recipes for housekeeping
Patterns agents use AWS Glue DataBrew API for, with concrete tasks.
★ Self-Service Data Cleaning for Analysts
Analysts need to clean and reshape datasets without writing Spark code. AWS Glue DataBrew lets them build a recipe visually, then save it via PublishRecipe and run it programmatically with CreateRecipeJob and StartJobRun. Recipes capture every step as code, making the work reproducible and reviewable.
Call CreateRecipeJob with DatasetName, RecipeReference {Name, RecipeVersion}, and an S3 output, then StartJobRun and poll DescribeJobRun until the run completes.
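The create, start, and poll sequence above can be sketched in Python as follows. This is a minimal sketch, assuming a boto3-style DataBrew client (for example `boto3.client("databrew")`) is passed in; the function name, the CSV output format, the polling interval, and the set of terminal states checked are illustrative choices, not something this page prescribes.

```python
import time


def run_recipe_job(client, job_name, dataset_name, recipe_name, recipe_version,
                   bucket, role_arn, poll_seconds=30, max_polls=120):
    """Create a recipe job, start one run, and poll until the run ends.

    `client` is assumed to expose the boto3 DataBrew methods
    create_recipe_job, start_job_run, and describe_job_run.
    """
    client.create_recipe_job(
        Name=job_name,
        DatasetName=dataset_name,
        RecipeReference={"Name": recipe_name, "RecipeVersion": recipe_version},
        # Assumption: a single CSV output written to the given bucket.
        Outputs=[{"Location": {"Bucket": bucket}, "Format": "CSV"}],
        RoleArn=role_arn,
    )
    run_id = client.start_job_run(Name=job_name)["RunId"]

    # States after which the run will not change any further.
    terminal_states = {"SUCCEEDED", "FAILED", "STOPPED", "TIMEOUT"}
    for _ in range(max_polls):
        state = client.describe_job_run(Name=job_name, RunId=run_id)["State"]
        if state in terminal_states:
            return run_id, state
        time.sleep(poll_seconds)
    raise TimeoutError(f"run {run_id} of job {job_name} did not finish in time")
```

Passing the client in as a parameter keeps the helper independent of how credentials are supplied, so the same code works with a direct boto3 client or a stand-in used in tests.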
Profile Jobs for Data Quality Assessment
Data engineering teams profile incoming datasets to detect schema drift, null spikes, and distribution changes before downstream pipelines consume them. AWS Glue DataBrew's profile jobs sample data, compute statistics, and emit reports that integrate with rulesets for pass/fail data-quality gates.
Call CreateProfileJob with the dataset and an S3 output location, call StartJobRun, then poll DescribeJobRun until the run state is SUCCEEDED and read the profile statistics from the S3 output location.
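As a rough illustration of the profile-job call, the sketch below builds the CreateProfileJob request and starts a run. It assumes a boto3-style DataBrew client is supplied by the caller; the helper names and the example bucket, key prefix, and role ARN are hypothetical.

```python
def profile_job_request(job_name, dataset_name, bucket, key_prefix, role_arn):
    """Build keyword arguments for a CreateProfileJob call."""
    return {
        "Name": job_name,
        "DatasetName": dataset_name,
        # The profile report is written under this S3 location.
        "OutputLocation": {"Bucket": bucket, "Key": key_prefix},
        "RoleArn": role_arn,
    }


def start_profile(client, request):
    """Create the profile job, start a run, and return the run ID.

    `client` is assumed to expose the boto3 DataBrew methods
    create_profile_job and start_job_run.
    """
    client.create_profile_job(**request)
    return client.start_job_run(Name=request["Name"])["RunId"]
```

Separating request construction from the call makes the payload easy to inspect or log before anything is sent.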
Scheduled Daily Data Preparation
ETL operations need recurring transformation jobs that prepare data for analytics every day. AWS Glue DataBrew's CreateSchedule attaches a cron expression to one or more jobs, and the service handles concurrency, retries, and notifications.
Call CreateSchedule with a CronExpression like 'cron(0 6 * * ? *)' and JobNames=['daily-clean'] to run the job every morning at 06:00 UTC.
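The scheduling call can be sketched as below. This is a minimal example, assuming a boto3-style DataBrew client is passed in; `daily_cron` and `create_daily_schedule` are hypothetical helper names, and the schedule name is whatever the caller chooses.

```python
def daily_cron(hour_utc, minute=0):
    """Return a cron expression that fires once a day at the given UTC time."""
    if not (0 <= hour_utc <= 23 and 0 <= minute <= 59):
        raise ValueError("hour must be 0-23 and minute 0-59")
    return f"cron({minute} {hour_utc} * * ? *)"


def create_daily_schedule(client, schedule_name, job_names, hour_utc):
    """Attach a daily schedule to one or more existing jobs.

    `client` is assumed to expose the boto3 DataBrew method create_schedule.
    """
    return client.create_schedule(
        Name=schedule_name,
        JobNames=list(job_names),
        CronExpression=daily_cron(hour_utc),
    )
```

For instance, `create_daily_schedule(client, "every-morning", ["daily-clean"], 6)` would send the same cron expression quoted above.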
Agent-Driven Data Quality Gates
An AI agent supporting a data team can run profile jobs, compare statistics against thresholds, and trigger downstream workflows. Through Jentic, the agent loads each operation by intent and chains CreateProfileJob, StartJobRun, DescribeJobRun, and CreateRuleset without writing boto3 plumbing.
Run a profile job, then call CreateRuleset with rules asserting completeness > 99% on key columns, and re-run the profile to validate.
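The threshold comparison an agent might apply to profile results can be illustrated with a small gate function. This is a sketch only: the input shape (a mapping of column name to `missing` and `total` counts) is a simplified, hypothetical summary rather than the actual DataBrew profile report format, and the function name is made up for illustration.

```python
def completeness_gate(column_stats, key_columns, min_completeness=0.99):
    """Check that each key column is more than `min_completeness` complete.

    `column_stats` maps column name -> {"missing": int, "total": int}
    (a hypothetical, simplified summary of a profile report).
    Returns (passed, failures), where failures maps each failing column
    to its completeness, or to None when no usable statistics exist.
    """
    failures = {}
    for column in key_columns:
        stats = column_stats.get(column)
        if stats is None or stats["total"] == 0:
            failures[column] = None  # no data to judge the column on
            continue
        completeness = 1 - stats["missing"] / stats["total"]
        # The task asks for completeness strictly greater than the threshold.
        if completeness <= min_completeness:
            failures[column] = completeness
    return (not failures), failures
```

An agent could run this after each profile run and trigger the downstream workflow only when the first element of the result is true.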
44 endpoints
METHOD  PATH          DESCRIPTION
POST    /datasets     Create a dataset
POST    /projects     Create a project binding a dataset to a recipe
POST    /recipes      Create a recipe
POST    /recipeJobs   Create a recipe job
POST    /profileJobs  Create a profile job
POST    /schedules    Create a schedule
POST    /rulesets     Create a data-quality ruleset
Three things that make agents converge on Jentic-routed access.
Credential isolation
AWS SigV4 (HMAC) credentials for the AWS Glue DataBrew API are stored encrypted in the Jentic vault. Agents receive scoped, short-lived access via Jentic's MAX system rather than holding the raw AWS access key ID and secret access key in their context.
Intent-based discovery
Agents search Jentic with intents like 'schedule a recurring data preparation job' and Jentic returns the matching AWS Glue DataBrew operation with its input schema, so the agent can call the correct endpoint without browsing the AWS service reference.
Time to first call
Direct integration: 2-4 days for SigV4 request signing, IAM policy setup, and error handling across AWS Glue DataBrew operations. Through Jentic: under 1 hour - search by intent, load the schema, execute.
Alternatives and complements available in the Jentic catalogue.
Amazon Athena
Serverless SQL over data in S3, including DataBrew outputs and Data Exchange exports.
Pair Athena with the source API to query the prepared or delivered data.
Amazon Athena
Query the cleaned data DataBrew writes to S3.
Pair DataBrew (prep) with Athena (query) for a serverless analytics stack.
Snowflake API
Cloud data platform with its own visual prep tooling and marketplace.
Choose Snowflake when prep and analytics are warehouse-native rather than AWS-native.
Specific to using AWS Glue DataBrew API through Jentic.
Why is there no official OpenAPI spec for AWS Glue DataBrew?
AWS does not publish an OpenAPI specification. Jentic generates and maintains this spec so that AI agents and developers can call AWS Glue DataBrew via structured tooling. It is validated against the live API and kept up to date. Get started at https://app.jentic.com/sign-up.
What authentication does the AWS Glue DataBrew API use?
AWS Signature v4 (HMAC) with IAM permissions on databrew:* and pass-through permissions on the S3 buckets, Redshift clusters, or JDBC endpoints the dataset points to. Jentic stores the AWS credentials in its vault and signs each request.
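To give a sense of what the HMAC part of Signature v4 involves, the sketch below derives the signing key by chaining HMAC-SHA256 over the date, region, and service. It covers only the key derivation step, not the canonical request or the final signature, and the use of "databrew" as the signing service name is an assumption.

```python
import hashlib
import hmac


def _hmac_sha256(key: bytes, message: str) -> bytes:
    """Return the raw HMAC-SHA256 digest of `message` under `key`."""
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).digest()


def derive_signing_key(secret_access_key, date_stamp, region, service="databrew"):
    """Derive a SigV4 signing key.

    `date_stamp` is the request date as YYYYMMDD. The default service
    name is an assumption for DataBrew.
    """
    k_date = _hmac_sha256(("AWS4" + secret_access_key).encode("utf-8"), date_stamp)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")
```

Because the key is scoped to a date, region, and service, a leaked signing key is less damaging than a leaked secret access key, which is part of why the raw secret is best kept out of an agent's context.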
Can I run a DataBrew recipe job programmatically?
Yes. Call CreateRecipeJob with the dataset, recipe reference, and output location, then StartJobRun. Track progress with DescribeJobRun and ListJobRuns; the run completes asynchronously.
What are the rate limits for the AWS Glue DataBrew API?
Per-account, per-region TPS limits apply, and there is a soft cap on concurrent job runs per account. StartJobRun and the Describe* operations have higher TPS limits than create operations.
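One common way to stay within TPS limits is to retry throttled calls with exponential backoff and jitter. The helper below is a generic sketch, not a DataBrew-specific mechanism: the function name, the retry count, and the delays are illustrative, and how a throttling error is recognised is left to the caller (with boto3 this might mean inspecting the error code on a `ClientError`).

```python
import random
import time


def call_with_backoff(operation, is_throttled, max_attempts=5,
                      base_delay=0.5, sleep=time.sleep):
    """Call `operation()` and retry throttled failures with jittered backoff.

    `is_throttled(exc)` decides whether an exception is a throttling error.
    Any other exception, or a throttling error on the last attempt,
    is re-raised to the caller.
    """
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception as exc:
            if not is_throttled(exc) or attempt == max_attempts - 1:
                raise
            # Full jitter: wait a random time up to base_delay * 2^attempt.
            sleep(random.uniform(0, base_delay * 2 ** attempt))
```

The `sleep` parameter exists so the waiting behaviour can be replaced in tests or by an event loop's own delay function.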
How do I schedule a daily DataBrew job through Jentic?
Search Jentic for 'schedule a recurring DataBrew job', load the CreateSchedule schema, and execute it with a CronExpression and the JobNames to run. Schedules apply to one or more existing jobs.
What sources does AWS Glue DataBrew support?
DataBrew reads from Amazon S3 (CSV, JSON, Parquet, Excel), AWS Glue Data Catalog (S3, Redshift, JDBC), Snowflake (via JDBC), and Amazon Redshift directly. Outputs land in S3 in the chosen format.