Data Knobs | Experiment through orthogonal knobs


Orthogonal Data Knobs

In many domains, companies have to run thousands of experiments to find plausible candidates. A data science team has to code each experiment, but a team can only manage 5-10, or maybe 50, experiments. Running hundreds of experiments and comparing models becomes unmanageable. Moreover, the experiments stay hidden at the data scientist's desk. When that person leaves and a new one joins, the whole thing starts over again.

Problems that require a large number of experiments should be managed through dials or knobs. Using knobs, data scientists can apply their statistics and domain knowledge to validate or invalidate hypotheses. The outcomes of experiments are recorded, and even if results are not fruitful, they increase the knowledge base.

Knobs for experimentation

We can define the experimentation problem as follows: we are given a pool of preprocessing methods, feature transformations, ML algorithms, and hyperparameters. Our goal is to select the combination of these data processing and ML methods that produces the best model result for a given data set.
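To get a feel for how quickly this pool grows, the sketch below enumerates every combination in a small search space. The pool contents and the names `SEARCH_SPACE` and `enumerate_combinations` are assumptions made up for illustration, not a fixed catalogue.

```python
from itertools import product

# Hypothetical pools; the option names are placeholders for illustration.
SEARCH_SPACE = {
    "preprocessing": ["drop_missing", "impute_mean"],
    "feature_transform": ["none", "standardize", "log"],
    "algorithm": ["logistic_regression", "random_forest"],
    "learning_rate": [0.01, 0.1],
}

def enumerate_combinations(space):
    """Yield every combination of options as a dict of knob name -> value."""
    names = list(space)
    for values in product(*(space[name] for name in names)):
        yield dict(zip(names, values))

combinations = list(enumerate_combinations(SEARCH_SPACE))
print(len(combinations))  # 2 * 3 * 2 * 2 = 24 candidate experiments
print(combinations[0])
```

Even four small pools give 24 candidate experiments, which is why hand-coding each one does not scale.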

The system should deal with the messiness and complexity of data, automate feature selection, and select a machine learning (ML) algorithm to train a model. It should do so in a manner that is efficient and robust, and that considers constraints not only on accuracy but also on memory, compute time, data needs, etc.

Because data patterns will continue to change and you want data scientists to make the decisions (features, model parameters), we define a solution in which data scientists can interact and explore in a semi-automated manner using orthogonal dials or knobs.

Orthogonal knobs are dials that a data scientist or domain expert can tune. They can choose different features or normalize a feature in a different manner; they can choose a different algorithm or a different loss function.

They are similar to model hyperparameters, but model hyperparameters apply only to model algorithms and depend on the algorithm's code.

The philosophy behind data knobs is that these parameters are generated bottom-up, based on the data and how that data is used in the process. They are a superset of hyperparameters, as they let you choose features, feature transformations, data sources, loss function computations, etc.
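One way to picture knobs as a superset of hyperparameters is a settings record in which the algorithm-specific hyperparameters are just one field among several. This is a minimal sketch; the class `KnobSettings`, its field names, and the example values are all hypothetical.

```python
from dataclasses import dataclass, field, asdict, replace

@dataclass(frozen=True)
class KnobSettings:
    """One experiment's settings. Only `hyperparameters` is algorithm-specific."""
    data_source: str = "sales_2023"            # hypothetical dataset name
    features: tuple = ("price", "region")      # hypothetical feature names
    feature_transform: str = "standardize"
    algorithm: str = "gradient_boosting"
    loss: str = "squared_error"
    hyperparameters: dict = field(default_factory=dict)

baseline = KnobSettings(hyperparameters={"max_depth": 3})

# Turning one knob leaves every other setting untouched.
variant = replace(baseline, loss="absolute_error")

changed = {k for k, v in asdict(variant).items() if asdict(baseline)[k] != v}
print(changed)  # {'loss'}
```

Because each knob is an independent field, two experiments can be compared by listing exactly which knobs differ.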

The problem can be mathematically represented as:

Model(M)

Input
  • Dataset {Xi, Yi}
  • Objective function J(f) to evaluate model performance
  • Constraints: Data scientist time, accuracy, etc

Output
  • A trained model in the form y = f(x)
  • We can describe this in the form y = f(x; α)
  • where the set α = [α₀, α₁, α₂, …, αₙ] holds the parameters of the model

Processing

Consider a vector θ. It includes all possible operations on data (e.g. ingestion, transformation, feature engineering, modeling, hyperparameter tuning).

θ = [θ₁, θ₂, …, θₙ]

Note: For simplicity, we can consider each θᵢ as a simple element operation. In elaborate settings, trees and graphs can be used to represent dependencies/hierarchy of operations.
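In the simple flat case described in the note, θ can be sketched as an ordered list of operations applied one after another. The operations `drop_missing`, `add_ratio`, and `keep_features`, and the row format, are hypothetical stand-ins chosen for illustration.

```python
from functools import reduce

# Each hypothetical operation takes a list of rows (dicts) and returns a new list.
def drop_missing(rows):
    return [r for r in rows if all(v is not None for v in r.values())]

def add_ratio(rows):
    return [{**r, "ratio": r["a"] / r["b"]} for r in rows]

def keep_features(rows):
    return [{"ratio": r["ratio"], "y": r["y"]} for r in rows]

# theta as a flat, ordered vector of element operations.
theta = [drop_missing, add_ratio, keep_features]

def run(theta, rows):
    """Apply every operation in theta, in order, to the data."""
    return reduce(lambda data, op: op(data), theta, rows)

raw = [{"a": 4.0, "b": 2.0, "y": 1}, {"a": None, "b": 1.0, "y": 0}]
processed = run(theta, raw)
print(processed)  # [{'ratio': 2.0, 'y': 1}]
```

A tree or graph representation would replace the flat list when some operations depend on the output of several others.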

Refined Problem Statement

We can define the problem statement as follows: we have a pool of preprocessing methods, feature transformation methods, ML algorithms, and hyperparameters. The goal is to select the combination of knobs that produces the best results, and to identify these knobs so that one can use different settings when the data pattern changes.
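One possible way to write this statement with the symbols defined earlier is shown below. It rests on assumptions not stated in the text: that a lower value of J is better, that executing θ on the dataset yields the model parameters (written here as α(θ)), and that Θ_C denotes the combinations of operations that satisfy the constraints.

```latex
\theta^{*} = \arg\min_{\theta \in \Theta_{C}} J\bigl(f(\,\cdot\,;\alpha(\theta))\bigr),
\qquad
\alpha(\theta) = \text{parameters obtained by executing } \theta \text{ on } \{X_i, Y_i\}
```

If J measures something to be maximized, such as accuracy, the arg min becomes an arg max.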

Goal
  • Efficiently find the set of elements in θ that produces the best α
  • Enable building orthogonal knobs O

Steps to implement
  • Intelligently and efficiently determine a set of values in θ that will produce results.
  • Automate execution of the θ vector to produce α and evaluate the result.
  • Enable creating higher-level θs and build the O[] dial controls.

Once we define the θ vector, it simplifies modeling and data science work. Data scientists and domain experts can focus on validating hypotheses; they do not have to worry about whether someone took a shortcut in feature transformation or made a mistake.
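The first two steps can be sketched as a small loop: enumerate candidate θ settings, execute each one to obtain α, score it with J, and keep the best. This is a toy example under stated assumptions: the data follows y = x², J is mean squared error on held-out points (lower is better), and the exhaustive search stands in for whatever smarter search a real system would use. All names are hypothetical.

```python
from itertools import product

# Toy data following y = x^2; a stand-in for a real dataset {Xi, Yi}.
train = [(x, x * x) for x in [1.0, 2.0, 3.0, 4.0]]
test = [(x, x * x) for x in [5.0, 6.0]]

TRANSFORMS = {"identity": lambda x: x, "square": lambda x: x * x}

def fit_line(points):
    """Least-squares fit of y = a*z + b; returns alpha = (a, b)."""
    n = len(points)
    mean_z = sum(z for z, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    var_z = sum((z - mean_z) ** 2 for z, _ in points)
    a = sum((z - mean_z) * (y - mean_y) for z, y in points) / var_z
    return a, mean_y - a * mean_z

def fit_mean(points):
    """Constant predictor, written as a line with zero slope."""
    return 0.0, sum(y for _, y in points) / len(points)

MODELS = {"line": fit_line, "mean": fit_mean}

def run_experiment(theta):
    """Execute one theta: transform, train to get alpha, score with J (MSE)."""
    transform = TRANSFORMS[theta["transform"]]
    a, b = MODELS[theta["model"]]([(transform(x), y) for x, y in train])
    j = sum((a * transform(x) + b - y) ** 2 for x, y in test) / len(test)
    return (a, b), j

results = []
for t, m in product(TRANSFORMS, MODELS):
    theta = {"transform": t, "model": m}
    alpha, j = run_experiment(theta)
    results.append((j, theta, alpha))

best_j, best_theta, best_alpha = min(results, key=lambda r: r[0])
print(best_theta, best_alpha, best_j)
```

No experiment in the loop needs new code; each one is a different setting of the same two knobs.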

You get the following benefits:

  • Ability to run a large number of experiments. Most experiments do not require code changes; you change knob settings.
  • Ability to run reproducible experiments.
  • Ability to log experiment outcomes in a meaningful manner: the set of knobs and the outcome. If someone in the organization has run an experiment before, others will know about it. Teams build on each other's experiments.
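The logging benefit can be sketched as a store keyed by the knob settings, so that a repeated experiment is detected before it is rerun. This is one hypothetical design: the class `ExperimentLog`, the in-memory storage, and the example knob names and outcome values are invented for illustration.

```python
import hashlib
import json

class ExperimentLog:
    """In-memory log keyed by a hash of the knob settings (a hypothetical design)."""

    def __init__(self):
        self._records = {}

    @staticmethod
    def key(knobs):
        # sort_keys makes the key independent of dict ordering.
        canonical = json.dumps(knobs, sort_keys=True)
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def record(self, knobs, outcome):
        self._records[self.key(knobs)] = {"knobs": knobs, "outcome": outcome}

    def lookup(self, knobs):
        """Return the earlier record for these settings, or None if never run."""
        return self._records.get(self.key(knobs))

log = ExperimentLog()
# Illustrative outcome values, not real results.
log.record({"transform": "log", "algorithm": "ridge"}, {"rmse": 0.42, "useful": False})

# Same settings in a different key order are recognised as the same experiment.
previous = log.lookup({"algorithm": "ridge", "transform": "log"})
print(previous["outcome"])
print(log.lookup({"transform": "none", "algorithm": "ridge"}))
```

Unfruitful outcomes are recorded alongside successful ones, so later team members can see what has already been ruled out.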


Why knobs matter

Knobs are levers with which you manage output

See the Drivetrain approach for building data products and AI products. It has 4 steps, and levers are key to success. Knobs are an abstract mechanism on the input that you can control.
