What are the three generations of metadata?

The three generations of metadata are catalog and discovery, governance and trust, and outcome intelligence for AI agents. The first generation helped humans find data. The second helped organizations govern data. The third helps AI systems decide which data to trust and act on for specific outcomes.

What is Gen 3 metadata?

Gen 3 metadata is outcome-connected metadata for AI agents and model training pipelines. It does not only describe data; it connects data regions to AI behavior, reliability, compliance, and model outcomes.

How does EKIP close the Gen 3 metadata gap?

EKIP closes the Gen 3 metadata gap by adding knob intelligence on top of existing catalog, governance, and quality signals. Knobs encode which data to use, which data to create or augment, and which controls must be enforced for reliable AI behavior.

How are knobs related to metadata?

Metadata describes data identity, lineage, ownership, quality, and governance. Knobs translate those signals into controllable, outcome-connected decisions that influence AI training, evaluation, and runtime behavior.

Why do AI agents need outcome-connected metadata?

AI agents do not browse catalogs or interpret governance documents like humans. They execute actions. They need metadata that already encodes trust, constraints, and expected outcome impact for a specific AI task.

AI Summary

DataKnobs describes the three generations of metadata: Gen 1 catalog and discovery for humans finding data, Gen 2 governance and trust for compliance and stewardship, and Gen 3 outcome intelligence for AI agents acting on data. EKIP, the Enterprise Knob Intelligence Platform, introduces knobs as controllable, outcome-connected metadata primitives that help AI systems decide which data to trust, use, synthesize, govern, and optimize.

CDO Perspective

The three generations of metadata

Humans found data.
Humans governed data.
AI agents act on data.

Each change in who consumes metadata requires a different infrastructure. The first two generations are solved. The third is not. The deficiency lies not in cataloging or governance but in outcome-linked intelligence: understanding which data influences AI behavior and deliberately managing that relationship.

Gen 1 — Catalog Era

"Where is the data?"

Analysts used to spend countless hours asking engineers for table names until tools like DataHub, Alation, and Atlan put an end to this inefficiency.

Gen 2 — Governance Era

"Is this data compliant?"

GDPR, CCPA, and HIPAA made lineage tracking, classification standards, and robust data stewardship necessary within the platform.

Gen 3 — AI Agent Era

"Which data should I trust for this outcome?"

AI agents do not spend time browsing catalogs or reading governance documents; instead, they focus on executing tasks. Therefore, they require metadata that is linked to outcomes, rather than just being descriptive.

Generation Analysis

What changed, and what each generation left unsolved

Every generation addressed a tangible issue and inadvertently created a blind spot that only became apparent with shifts in metadata consumption.

Generation 1

Catalog & Discovery

~2015 – 2019

Consumer

Data analysts & engineers

Core question

Where is the data stored? What information is in this table? Who developed this pipeline?

What was built

Search interfaces, schema harvesting, dataset descriptions, ownership records, popularity signals.

What it solved

Removed the bottleneck of relying on an engineer to provide the table name, making self-service data discovery a reality.

Gap it left

Locating data is not the same as relying on it. Analysts could find tables but had no indicators of their quality, timeliness, or trustworthiness.

Generation 2

Governance & Trust

~2019 – 2023

Consumer

Compliance, legal & data stewards

Core question

Is the data compliant? Who is accountable? Can we track its movement and access history through an audit?

What was built

PII tagging, lineage graphs, retention rules, stewardship workflows, data quality monitors, and business glossaries.

What it solved

Enabled data teams to demonstrate data provenance and enforce policies at scale, as GDPR, CCPA, and HIPAA require.

Gap it left

Governance focuses on avoiding risks rather than optimizing outcomes, specifying what data is off-limits rather than guiding the selection of data to train a dependable model.

Generation 3

Outcome Intelligence

2024 – present

Consumer

AI agents & model training pipelines

Core question

Which data regions produce consistent, precise, compliant AI behavior? How much data is needed from each region, and under what conditions?

What needs to be built

Causal signal density, links from data regions to model results, outcome-annotated metadata, and trust knobs for task-specific data selection.

Why it's different

AI agents do not read documentation or exercise judgment. They require metadata they can act on directly, not just descriptions.

What EKIP provides

Knobs, as the Gen 3 metadata primitive, are controllable variables that are causally linked to AI outcomes, and are defined, governed, and manipulated through EKIP.

The Fundamental Shift

The consumer changed. The metadata didn't.

Metadata platforms were originally built with a human in mind: someone who can understand, interpret, and make decisions. AI agents have no such capability, and systems built for human understanding do not inherently support machine action.

The Gen 3 Gap

What existing metadata can't give an AI agent

A Generation 1 or Generation 2 metadata platform can tell an AI agent that a table exists, who owns it, and whether it contains PII. It cannot tell the agent whether training on that table will yield dependable results, or why.

The missing primitive: outcome-connected metadata

Human metadata consumers use their judgment to decide whether a low freshness score matters for their specific use case. AI agents have no such judgment layer. The metadata must encode the decision in advance, not as a hint for human interpretation but as a structured answer to the question: "Is this data reliable for this specific AI task?"

The metadata platforms of Gen 1 and Gen 2 were not designed to address this issue. They focus on data identity and policy enforcement, rather than modeling the causal connection between data regions and AI results, which is the gap in Gen 3.
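One way to picture the difference is as two record types. The sketch below is illustrative only: the class names, fields, and the `can_use` helper are hypothetical and are not EKIP's API. It assumes a descriptive record of the Gen 1/2 kind and an outcome record that stores a precomputed verdict for one task, including a freshness bound tied to that task.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DescriptiveMetadata:
    """Gen 1/2 style facts: identity, ownership, policy, freshness."""
    table: str
    owner: str
    contains_pii: bool
    hours_since_refresh: float


@dataclass(frozen=True)
class OutcomeMetadata:
    """Hypothetical Gen 3 record: a verdict computed in advance for one task."""
    table: str
    task: str
    reliable: bool
    max_staleness_hours: float  # assumed task-specific freshness bound
    reason: str


def can_use(desc: DescriptiveMetadata, outcome: OutcomeMetadata, task: str) -> bool:
    """Answer 'is this data reliable for this task?' without human judgment."""
    if outcome.table != desc.table or outcome.task != task:
        return False  # no verdict recorded for this table/task pair
    return outcome.reliable and desc.hours_since_refresh <= outcome.max_staleness_hours


# Illustrative values only.
desc = DescriptiveMetadata("support.tickets", "Finance", True, 6.0)
verdict = OutcomeMetadata("support.tickets", "complaint_detection", True, 4.0,
                          "confidence degrades when data is stale")
decision = can_use(desc, verdict, "complaint_detection")
```

The agent never interprets the freshness value itself; it only checks it against a bound that was attached to the task ahead of time.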

Gen 1 & 2 Metadata gives you

This table is owned by the Finance team

This column contains PII — GDPR applies

This dataset was last refreshed 6 hours ago

This pipeline has 3 upstream dependencies

"Churn" is defined as lost customer within 90 days

Gen 3 — what AI agents also need

The accuracy of complaint detection is causally linked to these three data regions.

Using this column in training reduces regulatory compliance by 18%

Below 4-hour freshness, model confidence in this task degrades measurably

Removing this upstream dependency improves low-resource performance by 2.3×

The churn definition with the highest model accuracy involves a 60-day window, not 90.

EKIP's Role

How EKIP closes the Gen 3 gap

EKIP does not supersede the metadata infrastructure of Gen 1 or Gen 2; rather, it enhances it by leveraging existing platforms' catalog, governance, and quality signals while also incorporating the essential causal outcome layer required by AI agents.

Selection Knobs

Which data regions to use

Selection Knobs determine the subsets of a dataset that are causally linked to reliable outcomes for a particular task by utilizing catalog and quality signals from Gen 1/2 platforms. They go beyond simply identifying the presence and freshness of data to ensuring that the data leads to accurate performance in the specified task.
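A minimal sketch of what a selection step could look like, assuming each data region already carries a quality score from a Gen 1/2 platform and an estimated effect on the task metric. The `Region` type, the field names, and the thresholds are hypothetical; how such an effect would be estimated is outside this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    name: str
    quality_score: float   # Gen 1/2 signal, assumed to lie in [0, 1]
    outcome_effect: float  # assumed estimate of the region's effect on the task metric


def select_regions(regions, min_quality=0.8, min_effect=0.0):
    """Keep regions that pass the quality bar and have a positive estimated effect."""
    chosen = [
        r for r in regions
        if r.quality_score >= min_quality and r.outcome_effect > min_effect
    ]
    # Strongest estimated contributors first.
    return sorted(chosen, key=lambda r: r.outcome_effect, reverse=True)


# Illustrative values only.
regions = [
    Region("recent_tickets", 0.95, 0.12),
    Region("legacy_tickets", 0.90, -0.03),  # good quality, but hurts the task
    Region("scraped_forum", 0.40, 0.20),    # helpful, but fails the quality bar
    Region("chat_transcripts", 0.85, 0.05),
]
selected = [r.name for r in select_regions(regions)]
```

Quality alone would keep `legacy_tickets`; the outcome estimate is what excludes it.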

Creation Knobs

Which data to synthesize or augment

Creation Knobs determine what data needs to be generated when there is limited existing data in critical areas such as low-resource languages, rare regulatory scenarios, and edge-case behaviors. These knobs pinpoint the frontier, where the importance of the outcome is high compared to the amount of available information.
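The frontier idea can be sketched as a ranking. The scoring rule below, outcome importance divided by the number of available examples plus one, is an assumption chosen for simplicity and is not taken from EKIP; the region names and numbers are likewise made up.

```python
def frontier(regions, top_k=2):
    """Rank regions by outcome importance relative to available examples."""
    def score(region):
        # +1 avoids division by zero for regions with no data at all.
        return region["importance"] / (region["examples"] + 1)

    ranked = sorted(regions, key=score, reverse=True)
    return [r["name"] for r in ranked[:top_k]]


# Illustrative values only.
regions = [
    {"name": "common_english", "importance": 0.9, "examples": 100_000},
    {"name": "low_resource_language", "importance": 0.9, "examples": 50},
    {"name": "rare_regulatory_scenario", "importance": 0.8, "examples": 10},
]
to_synthesize = frontier(regions)
```

Regions with plenty of data fall to the bottom even when they matter, because generating more of them adds little.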

Control Knobs

What conditions must hold for AI to act on data

Control Knobs convert governance metadata, such as PII tags, GDPR classifications, and retention rules, into constraints for training. Gen 2 platforms flag the risk; Control Knobs enforce compliance during model training, bridging the gap between policy and action.
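As a rough sketch of turning governance tags into training constraints, the example below assumes columns are tagged with plain strings such as "pii" and that a retention rule is expressed in days. The function names and the constraint format are hypothetical, not an EKIP interface.

```python
from datetime import date, timedelta


def training_constraints(columns, retention_days, today):
    """Derive constraints from tags; columns maps column name -> set of tags."""
    excluded = sorted(name for name, tags in columns.items() if "pii" in tags)
    return {
        "exclude_columns": excluded,
        "min_record_date": today - timedelta(days=retention_days),
    }


def apply_constraints(rows, constraints):
    """Drop rows past retention and strip excluded columns before training."""
    kept = []
    for row in rows:
        if row["record_date"] < constraints["min_record_date"]:
            continue  # outside the retention window
        kept.append({k: v for k, v in row.items()
                     if k not in constraints["exclude_columns"]})
    return kept


# Illustrative values only.
columns = {"email": {"pii", "gdpr"}, "ticket_text": set(), "record_date": set()}
rows = [
    {"email": "a@example.com", "ticket_text": "late delivery", "record_date": date(2024, 5, 1)},
    {"email": "b@example.com", "ticket_text": "refund", "record_date": date(2020, 1, 1)},
]
constraints = training_constraints(columns, retention_days=365, today=date(2024, 6, 1))
training_rows = apply_constraints(rows, constraints)
```

The policy is applied to the data the training job actually sees, rather than left as a tag for someone to read.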

Data Flywheel

Metadata that improves with every training cycle

Gen 1/2 metadata is mostly static: it describes what data is, not how it performs. EKIP creates a feedback loop in which outcome signals from training and evaluation refine knob definitions, so the metadata becomes more accurate over time, not just more comprehensive.
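The feedback loop can be illustrated with a simple running estimate. The sketch below uses an exponential moving average to blend each newly observed outcome into a region's stored effect estimate. This technique is a stand-in chosen for brevity; the source does not say how EKIP updates knob definitions, and `update_effect` and `learning_rate` are hypothetical names.

```python
def update_effect(current, observed, learning_rate=0.3):
    """Blend a newly observed outcome into the stored estimate (moving average)."""
    if not 0.0 <= learning_rate <= 1.0:
        raise ValueError("learning_rate must be between 0 and 1")
    return (1 - learning_rate) * current + learning_rate * observed


# Each training/evaluation cycle reports an observed effect for the region.
estimate = 0.0
history = []
for observed in [0.10, 0.12, 0.08]:  # illustrative values only
    estimate = update_effect(estimate, observed)
    history.append(estimate)
```

Each cycle moves the stored estimate toward what was actually measured, which is the sense in which the metadata improves rather than merely accumulates.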
