Data Knobs | Experimentation through Orthogonal Knobs
Orthogonal Data Knobs
In many domains, companies have to run thousands of experiments to find plausible candidates. A data science team has to code each experiment, but a team can realistically manage only 5-10, or perhaps 50, experiments. Running hundreds of experiments and comparing the resulting models becomes unmanageable. Moreover, the experiments stay hidden behind a data scientist's desk: when that person leaves and a new one joins, the whole effort starts over. Problems that require a large number of experiments should be managed through dials, or knobs. Using knobs, data scientists can apply their statistics and domain knowledge to validate or invalidate hypotheses. The outcomes of experiments are recorded, and even when the results are not fruitful, they grow the knowledge base.

Knobs for experimentation
We can define the experimentation problem as follows: we are given a pool of preprocessing methods, feature transformations, ML algorithms, and hyperparameters. Our goal is to select the combination of these data processing and ML methods that produces the best model for a given data set. The system should deal with the messiness and complexity of data, automate feature selection, and select the machine learning (ML) algorithm to train a model. It should do so in a manner that is efficient and robust, and that considers constraints not only on accuracy but also on memory, compute time, data needs, etc. Because data patterns will continue to change and you want data scientists to make the decisions about features and model parameters, we define a solution in which data scientists can interact and explore in a semi-automated manner using orthogonal dials, or knobs. Orthogonal knobs are dials that a data scientist or domain expert can tune: they can choose different features or normalize a feature in a different manner, choose a different algorithm, or choose a different loss function.
Data knobs are similar to model hyperparameters, but model hyperparameters apply only to model algorithms and are tied to the model algorithm's code. The philosophy behind data knobs is that they are parameters generated bottom-up from the data and from how that data is used in the process. They are a superset of hyperparameters: they let you choose features, feature transformations, data sources, loss function computations, etc.
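As a minimal illustration of what such a knob space can look like (the names and options below are hypothetical examples, not an API from any specific product), a plain configuration can span data sources, features, transformations, algorithms, and loss functions, with classic hyperparameters as just one subset:

```python
# A hypothetical knob space: a superset of hyperparameters that also covers
# data sources, feature choices, transformations, and loss functions.
knob_space = {
    "data_source":     ["warehouse_daily", "warehouse_weekly"],          # which data to ingest
    "features":        [["price", "volume"], ["price", "volume", "region"]],
    "feature_scaling": ["none", "standardize", "min_max"],               # feature transformation knob
    "algorithm":       ["logistic_regression", "gradient_boosting"],
    "loss":            ["log_loss", "hinge"],                            # loss function knob
    "hyperparameters": {"learning_rate": [0.01, 0.1], "max_depth": [3, 6]},
}
```

A data scientist or domain expert turns these dials; the experimentation system records each setting and its outcome so results remain comparable and reusable.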
The problem can be represented mathematically as follows.

Model: M
Input: the data set and the pool of data operations
Constraints: data scientist time, accuracy, etc.
Output: the selected combination of operations and the resulting model

Consider a vector θ. It includes all possible operations on data (e.g., ingestion, transformation, feature engineering, modeling, hyperparameter tuning).
θ = [θ₁, θ₂, …, θₙ]

Note: For simplicity, we can consider each θᵢ as a simple element operation. In more elaborate settings, trees and graphs can be used to represent dependencies and hierarchies of operations.

Refined problem statement
We have a pool of preprocessing methods, feature transformation methods, ML algorithms, and hyperparameters. The goal is to select the combination of knobs that produces the best result, and to identify these knobs so that different settings can be used when the data pattern changes.

Goal
Once we define the θ vector, it simplifies modeling and data science work. Data scientists and domain experts can focus on validating hypotheses; they no longer have to worry about whether someone took a shortcut in a feature transformation or made a mistake.
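One way to make the θ vector concrete is to treat each pipeline step as one component of θ and search over combinations. The sketch below uses scikit-learn with illustrative knob values; it is one possible realization under these assumptions, not the specific search implementation described here.

```python
# Minimal sketch: each pipeline step is one component of theta, and a grid
# search selects the best combination for a chosen metric.
from sklearn.datasets import make_classification
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

pipeline = Pipeline([
    ("scale", StandardScaler()),          # theta_1: feature transformation
    ("model", LogisticRegression()),      # theta_2: ML algorithm
])

theta_grid = [
    {"scale": [StandardScaler(), MinMaxScaler()],
     "model": [LogisticRegression(max_iter=500)],
     "model__C": [0.1, 1.0, 10.0]},       # theta_3: hyperparameters
    {"scale": [StandardScaler(), MinMaxScaler()],
     "model": [GradientBoostingClassifier()],
     "model__max_depth": [2, 3]},
]

search = GridSearchCV(pipeline, theta_grid, scoring="accuracy", cv=3)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```

In a real setting the scoring function would also reflect the constraints above (memory, compute time, data needs), not accuracy alone.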
You get the following benefits: experiments become reproducible and comparable at scale, every outcome is recorded so the knowledge base grows even when a result is not fruitful, the work survives team turnover instead of restarting when a data scientist leaves, and data scientists spend their time validating hypotheses rather than rechecking plumbing.
The Three Faces of Orthogonality
In AI, one word has three powerful meanings. It's a strategy for engineers, a tool for researchers, and a warning for philosophers. Understanding these "orthogonal dials" is the key to building, managing, and reasoning about intelligent systems.
🛠️ The Engineer's Dial
A pragmatic strategy for debugging complex models. It provides independent "dials" to fix specific problems, making development systematic and efficient.
🔬 The Researcher's Toolkit
A mathematical instrument for building robust models. It uses linear algebra to create stable, interpretable, and fair model architectures.
🧠 The Philosopher's Thesis
A foundational concept in AI safety. It posits that an AI's intelligence level is independent of its ultimate goals, creating the alignment problem.

1. The Engineer's Dial
This is a step-by-step flowchart for debugging supervised learning models. By tackling problems in sequence, you can apply the right "knob" without creating side effects. The flowchart steps through four problems in order, each with its own orthogonal knobs (a decision-sequence sketch follows the list):

PROBLEM: Poor performance on training data (high bias). ORTHOGONAL KNOBS: e.g., a bigger model, longer training, a different architecture.
↓
PROBLEM: Poor performance on the dev set (high variance). ORTHOGONAL KNOBS: e.g., more training data, regularization.
↓
PROBLEM: Poor performance on the test set (dev set overfit). ORTHOGONAL KNOBS: e.g., a larger dev set.
↓
PROBLEM: Poor real-world performance (mismatched data/metric). ORTHOGONAL KNOBS: e.g., change the dev set or the evaluation metric.
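A minimal sketch of this decision sequence is below. The error numbers and the 5% thresholds are illustrative assumptions; in practice they come from evaluating the model on each split and from real-world monitoring.

```python
# Hypothetical error rates for illustration only.
train_error, dev_error, test_error, human_error = 0.12, 0.19, 0.21, 0.02

if train_error - human_error > 0.05:
    print("High bias: turn bias knobs (bigger model, train longer, new architecture).")
elif dev_error - train_error > 0.05:
    print("High variance: turn variance knobs (more data, regularization).")
elif test_error - dev_error > 0.05:
    print("Dev set overfit: use a larger dev set.")
else:
    print("If real-world performance is still poor: change the dev set or the metric.")
```

The point of the sequence is orthogonality: each knob addresses one problem without undoing the fix for the previous one.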
2. The Researcher's Toolkit
This is about using orthogonality as a mathematical tool inside models. Different techniques offer trade-offs between computational cost, stability, and how strictly they enforce independence. [Radar chart: compares techniques for orthogonalizing neural network weights; a larger area indicates a more robust but often more expensive method.]
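As one concrete, minimal example of this toolkit (standard linear algebra, not the specific methods compared in the chart): a weight matrix can be orthogonalized directly with a QR decomposition, and a soft penalty ‖WᵀW − I‖² can be added to the training loss to encourage orthogonality.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 64))   # a raw weight matrix

# Hard orthogonalization: replace W with the orthogonal factor of its QR decomposition.
Q, _ = np.linalg.qr(W)

# Soft orthogonality penalty ||W^T W - I||_F^2, usable as a regularizer during training.
def orth_penalty(M):
    d = M.shape[1]
    return np.linalg.norm(M.T @ M - np.eye(d), ord="fro") ** 2

print(orth_penalty(W))   # large for a random matrix
print(orth_penalty(Q))   # ~0 for the orthogonalized matrix
```

Hard constraints give the strictest independence at higher computational cost; the soft penalty is cheaper but only approximately orthogonal, which is the trade-off the chart alludes to.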
3. The Philosopher's Thesis
The thesis states that an agent's intelligence is independent of its final goals. A smarter AI won't automatically be a "good" AI; its values must be explicitly designed. [Chart: intelligence on the x-axis, goals on the y-axis; labels include Superintelligent, Human-Aligned, Apathetic, Limited.] This illustrates the core idea: any level of intelligence (x-axis) can be paired with any type of goal (y-axis). High intelligence doesn't prevent a harmful or bizarre objective.

Frontiers of Application
Today, orthogonality is a critical tool for solving cutting-edge problems in generative AI, causal inference, and large language models.

Preserving Knowledge (OFT)
Orthogonal Finetuning (OFT) adapts large models to new tasks by rotating their weights, not changing them. This preserves their vast pretrained knowledge and prevents "catastrophic forgetting."

Finding True Cause (DML)
Double Machine Learning (DML) uses orthogonalization to statistically remove the influence of confounding variables, allowing researchers to estimate the true causal effect of an intervention from messy, real-world data. A minimal sketch of the idea appears below.

Steering LLMs (Self-Control)
New frameworks allow for real-time control of LLMs during inference. By calculating gradients in the latent space, we can create orthogonal "dials" to steer the model toward truthfulness or a specific tone without retraining.
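Returning to DML, here is a minimal sketch of the orthogonalization step on synthetic data (the data-generating process and the true effect of 2.0 are constructed for illustration; this is the core residual-on-residual idea, not a full estimator with inference).

```python
# Double ML sketch: partial out the confounders X from both treatment T and
# outcome Y, then regress the Y residuals on the T residuals.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 5))                 # confounders
T = X[:, 0] + rng.normal(size=n)            # treatment depends on confounders
Y = 2.0 * T + X[:, 0] + rng.normal(size=n)  # true causal effect of T on Y is 2.0

# Out-of-fold nuisance predictions (cross-fitting) for treatment and outcome.
t_hat = cross_val_predict(RandomForestRegressor(n_estimators=100), X, T, cv=5)
y_hat = cross_val_predict(RandomForestRegressor(n_estimators=100), X, Y, cv=5)
t_res, y_res = T - t_hat, Y - y_hat

# Orthogonalized (residual-on-residual) regression recovers the causal effect.
effect = LinearRegression().fit(t_res.reshape(-1, 1), y_res).coef_[0]
print(effect)   # should be close to 2.0
```

The residualization makes the effect estimate insensitive (orthogonal) to small errors in the nuisance models, which is exactly the property that lets flexible ML learners be used for the confounders.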
Know more about differential privacy at the Differential privacy blog.
Learn about algorithms: K-Anonymization, T-Closeness, L-Diversity, Delta Presence.
Learn about a framework to apply differential privacy using data knobs.