Data Knobs | Experiment through Orthogonal Knobs


Orthogonal Data Knobs

In many domains, companies have to run thousands of experiments to find plausible candidates. A data science team has to code each experiment, but a team can realistically manage only 5-10, or perhaps 50, experiments. Running hundreds of experiments and comparing models becomes unmanageable. Moreover, the experiments stay hidden on individual data scientists' desks: when they leave and new people join, the whole effort starts over again.

Problems that require a large number of experiments should be managed through dials, or knobs. Using knobs, data scientists can apply their statistics and domain knowledge to validate or invalidate hypotheses. The outcome of each experiment is recorded, so even when results are not fruitful, they add to the knowledge base.

Knobs for experimentation

We can define the experimentation problem as follows: we are given a pool of preprocessing methods, feature transformations, ML algorithms, and hyperparameters. Our goal is to select the combination of these data processing and ML methods that produces the best model for a given data set.

The system should deal with the messiness and complexity of data, automate feature selection, and select a machine learning (ML) algorithm to train a model. It should do this in a way that is efficient and robust, and that considers constraints not only on accuracy but also on memory, compute time, data needs, etc.

Because data patterns will continue to change, and because you want data scientists to make the decisions about features and model parameters, we define a solution in which data scientists can interact and explore in a semi-automated manner using orthogonal dials, or knobs.

Orthogonal knobs are dials that a data scientist or domain expert can tune. They can choose different features or normalize features in different ways; they can choose a different algorithm or a different loss function.

They are similar to model hyperparameters, but model hyperparameters apply only to the model algorithm and are tied to the algorithm's code.

The philosophy behind data knobs is that these are parameters generated bottom-up, based on the data and on how that data is used in the process. They are a superset of hyperparameters: they let you choose features, feature transformations, data sources, loss function computations, etc.
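As a concrete illustration, here is a minimal sketch of what a knob configuration could look like; the knob names and values are hypothetical, not a fixed schema:

```python
# Hypothetical knob configuration: each entry is an orthogonal dial a data
# scientist can turn without editing experiment code.
knobs = {
    "data_source": "sales_2023.csv",            # which dataset to ingest (illustrative)
    "features": ["price", "region", "month"],   # feature subset to use
    "feature_transform": "zscore",              # or "minmax", "log", "none"
    "algorithm": "gradient_boosting",           # or "logistic_regression", ...
    "loss": "log_loss",                         # loss / objective to optimize
    "hyperparameters": {"n_estimators": 200, "max_depth": 3},
}
```

Because each dial is independent of the others, changing the feature transform does not require touching the algorithm choice, and vice versa.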

The problem can be mathematically represented as:

Model (M)

Input
  • Dataset {Xᵢ, Yᵢ}
  • Objective function J(f) to evaluate model performance
  • Constraints: data scientist time, accuracy, etc.

Output
  • A trained model of the form y = f(x)
  • More precisely, y = f(x; α), where α = [α₀, α₁, α₂, …, αₙ] are the parameters of the model

Processing

Consider a vector θ that includes all possible operations on data (e.g. ingestion, transformation, feature engineering, modeling, hyperparameter tuning):

θ = [θ₁, θ₂, …, θₙ]

Note: For simplicity, we treat each θᵢ as a simple element operation. In more elaborate settings, trees and graphs can be used to represent dependencies/hierarchy among operations.
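In code, one simple and purely illustrative way to represent θ is as a set of candidate operations per stage; every combination then corresponds to one experiment:

```python
from itertools import product

# Candidate operations for each stage of the pipeline (illustrative pool).
theta = {
    "imputation":  ["mean", "median"],
    "scaling":     ["zscore", "minmax"],
    "feature_set": ["base", "base_plus_interactions"],
    "algorithm":   ["logreg", "gbm"],
}

# Each combination is one experiment; training it produces the parameters α.
experiments = [dict(zip(theta, combo)) for combo in product(*theta.values())]
print(len(experiments))  # 2 * 2 * 2 * 2 = 16 candidate experiments
```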
    Refined Problem Statement

We can define the problem statement as follows: we have a pool of preprocessing methods, feature transformation methods, ML algorithms, and hyperparameters. The goal is to select the combination of knobs that produces the best results, and to identify these knobs so that different settings can be used when data patterns change.

Goal
  • Efficiently find the set of elements in θ that produces the best α
  • Enable building orthogonal knobs O

Steps to implement
  • Intelligently and efficiently determine a set of values in θ that will produce results
  • Automate execution of the θ vector to produce α and evaluate the result
  • Enable creating higher-level θs and build the dials O[] that control them

Once the θ vector is defined, modeling and data science work become simpler: data scientists and domain experts can focus on validating hypotheses instead of worrying about whether someone took a shortcut in a feature transformation or made a mistake. A sketch of what executing one knob setting could look like follows.
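Here is a minimal sketch using scikit-learn, assuming a tabular dataset X, y and the knob names from the earlier examples; only the scaling and algorithm dials are wired up, the rest follow the same pattern:

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

def run_experiment(knobs, X, y):
    """Translate one point in θ (a knob setting) into a pipeline, train it,
    and return the objective J(f) as the mean cross-validated score."""
    scaler = StandardScaler() if knobs["scaling"] == "zscore" else MinMaxScaler()
    model = (LogisticRegression(max_iter=1000) if knobs["algorithm"] == "logreg"
             else GradientBoostingClassifier())
    pipeline = Pipeline([("scale", scaler), ("model", model)])
    return cross_val_score(pipeline, X, y, cv=5).mean()
```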

You get the following benefits:

  • Ability to run a large number of experiments. Most experiments do not require code changes; you only change knob settings.
  • Ability to run reproducible experiments.
  • Ability to log experiment outcomes in a meaningful manner - the set of knobs and the outcome. If someone in the organization has already run an experiment, others will know it, and the team builds on each other's experiments (a logging sketch follows this list).
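A minimal logging sketch, assuming a JSONL file as the shared experiment record (the format and path are illustrative):

```python
import json
from datetime import datetime, timezone

def log_experiment(knobs, score, path="experiments.jsonl"):
    """Append one record (knob settings + outcome) so anyone on the team
    can see what has already been tried and reproduce it."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "knobs": knobs,
        "score": score,
    }
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")
```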
Infographic: The Three Faces of Orthogonality in AI


    In AI, one word has three powerful meanings. It's a strategy for engineers, a tool for researchers, and a warning for philosophers. Understanding these "orthogonal dials" is the key to building, managing, and reasoning about intelligent systems.

    🛠️

    The Engineer's Dial

    A pragmatic strategy for debugging complex models. It provides independent "dials" to fix specific problems, making development systematic and efficient.

    🔬

    The Researcher's Toolkit

    A mathematical instrument for building robust models. It uses linear algebra to create stable, interpretable, and fair model architectures.

    🧠

    The Philosopher's Thesis

    A foundational concept in AI safety. It posits that an AI's intelligence level is independent of its ultimate goals, creating the alignment problem.

    1. The Engineer's Dial

    This is a step-by-step flowchart for debugging supervised learning models. By tackling problems in sequence, you can apply the right "knob" without creating side effects.

PROBLEM: Poor performance on training data (High Bias)
ORTHOGONAL KNOBS:
  • Use a bigger neural network
  • Switch to a better optimization algorithm (e.g., Adam)

PROBLEM: Poor performance on dev set (High Variance)
ORTHOGONAL KNOBS:
  • Apply regularization (L2, dropout)
  • Acquire a larger training set

PROBLEM: Poor performance on test set (Dev Set Overfit)
ORTHOGONAL KNOBS:
  • Acquire a larger development set

PROBLEM: Poor real-world performance (Mismatched Data/Metric)
ORTHOGONAL KNOBS:
  • Change the dev/test set to reflect reality
  • Change the cost function or evaluation metric
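The sequence above can be expressed as simple decision logic; the thresholds below are illustrative, not prescriptive:

```python
def suggest_knob(train_err, dev_err, test_err, target_err=0.05, gap=0.02):
    """Map the error pattern to the next orthogonal knob to turn."""
    if train_err > target_err:
        return "High bias: try a bigger network or a better optimizer (e.g., Adam)"
    if dev_err - train_err > gap:
        return "High variance: add regularization (L2, dropout) or more training data"
    if test_err - dev_err > gap:
        return "Dev set overfit: acquire a larger development set"
    return "Check for mismatched data/metric: revisit the dev/test set or the evaluation metric"
```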

    2. The Researcher's Toolkit

    This is about using orthogonality as a mathematical tool inside models. Different techniques offer trade-offs between computational cost, stability, and how strictly they enforce independence.

[Radar chart: compares techniques for orthogonalizing neural network weights; a larger area indicates a more robust but often more expensive method.]
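As one simple example of the idea (not the specific methods compared in the chart), a weight matrix can be orthogonalized with a QR decomposition:

```python
import numpy as np

def orthogonalize(W):
    """Replace W's columns with an orthonormal basis via QR decomposition."""
    Q, _ = np.linalg.qr(W)
    return Q

W = np.random.randn(128, 64)             # toy weight matrix
Q = orthogonalize(W)
print(np.allclose(Q.T @ Q, np.eye(64)))  # columns are orthonormal -> True
```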

    3. The Philosopher's Thesis

    The thesis states that an agent's intelligence is independent of its final goals. A smarter AI won't automatically be a "good" AI; its values must be explicitly designed.

[Quadrant chart - x-axis: Intelligence, y-axis: Goals. Quadrants: Superintelligent Paperclip Maximizer, Human-Aligned Superintelligence, Apathetic Chess AI, Limited Harmful Agent.]
    This illustrates the core idea: any level of intelligence (x-axis) can be paired with any type of goal (y-axis). High intelligence doesn't prevent a harmful or bizarre objective.

    Frontiers of Application

    Today, orthogonality is a critical tool being used to solve cutting-edge problems in generative AI, causal inference, and large language models.

    Preserving Knowledge (OFT)

    Orthogonal Finetuning (OFT) adapts large models to new tasks by rotating their weights, not changing them. This preserves their vast pretrained knowledge and prevents "catastrophic forgetting."
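A toy sketch of the underlying intuition (not the actual OFT implementation): multiplying weights by an orthogonal matrix preserves the norms and pairwise angles of the learned representations:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((16, 16))                    # toy "pretrained" weights
R, _ = np.linalg.qr(rng.standard_normal((16, 16)))   # a random orthogonal matrix

W_adapted = R @ W   # adapt by rotation only, never by overwriting entries

# The Gram matrix (norms and pairwise inner products of columns) is unchanged,
# which is the intuition for why rotation limits catastrophic forgetting.
print(np.allclose(W.T @ W, W_adapted.T @ W_adapted))  # True
```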

    Finding True Cause (DML)

    Double Machine Learning (DML) uses orthogonalization to statistically remove the influence of confounding variables, allowing researchers to estimate the true causal effect of an intervention from messy, real-world data.
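A toy sketch of the orthogonalization step on synthetic data, using cross-fitted nuisance models as DML prescribes (the data and model choices are illustrative):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
X = rng.standard_normal((2000, 5))                        # confounders
T = X[:, 0] + 0.5 * rng.standard_normal(2000)             # treatment depends on X
Y = 2.0 * T + 3.0 * X[:, 0] + rng.standard_normal(2000)   # true effect of T is 2.0

# Orthogonalize: remove the part of T and Y explained by the confounders,
# using out-of-fold predictions (cross-fitting) to avoid overfitting bias.
T_res = T - cross_val_predict(RandomForestRegressor(), X, T, cv=3)
Y_res = Y - cross_val_predict(RandomForestRegressor(), X, Y, cv=3)

# Residual-on-residual regression recovers the causal effect (close to 2.0).
effect = LinearRegression().fit(T_res.reshape(-1, 1), Y_res).coef_[0]
print(round(effect, 2))
```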

    Steering LLMs (Self-Control)

    New frameworks allow for real-time control of LLMs during inference. By calculating gradients in the latent space, we can create orthogonal "dials" to steer the model towards truthfulness or a specific tone without retraining.
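A toy sketch of one way to keep such dials independent (purely illustrative, not any specific framework's API): project one steering direction to be orthogonal to another before applying it:

```python
import numpy as np

def orthogonal_to(steer, protect):
    """Remove from `steer` its component along `protect` (a Gram-Schmidt step),
    so turning the `steer` dial does not also move the protected attribute."""
    unit = protect / np.linalg.norm(protect)
    return steer - (steer @ unit) * unit

truthfulness = np.random.randn(4096)   # hypothetical latent-space directions
tone = np.random.randn(4096)
truth_only = orthogonal_to(truthfulness, tone)
print(abs(truth_only @ tone) < 1e-6)   # ~0: steering truthfulness leaves tone alone
```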

