In Hierarchical Reinforcement Learning (HRL), an agent can break a complex task down into smaller ones. Q-learning is a common algorithm used within hierarchical learning, but it struggles with very large state spaces: in a continuous space, exploring all states is infeasible, and learning a policy is even harder. The key to success is whether you can create an abstract world (dataset) that lets you learn good policies.
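To make the limitation concrete, here is a minimal tabular Q-learning sketch on a hypothetical toy corridor environment (the environment, constants, and helper names are illustrative, not from the original). It works precisely because the state space is tiny and enumerable; the same table-based approach cannot cover a large or continuous space.

```python
import random

# Minimal tabular Q-learning sketch on a toy 1-D corridor
# (a hypothetical environment chosen for illustration).
# The agent starts at state 0 and is rewarded for reaching state N_STATES - 1.

N_STATES = 5          # tiny, enumerable state space -- where tabular Q-learning works
ACTIONS = [-1, +1]    # move left or right
ALPHA, GAMMA, EPS = 0.5, 0.9, 0.1

def step(state, action):
    """Deterministic corridor dynamics: reward 1.0 only at the right end."""
    next_state = max(0, min(N_STATES - 1, state + action))
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    done = next_state == N_STATES - 1
    return next_state, reward, done

def train(episodes=500, seed=0):
    random.seed(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # Epsilon-greedy exploration over the two actions.
            if random.random() < EPS:
                a = random.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda x: q[(s, x)])
            s2, r, done = step(s, a)
            # Standard Q-learning update: bootstrap from the best next action.
            best_next = max(q[(s2, x)] for x in ACTIONS)
            q[(s, a)] += ALPHA * (r + GAMMA * best_next - q[(s, a)])
            s = s2
    return q

q = train()
# Greedy policy from every non-terminal state should point right (+1).
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)}
```

The table `q` has one entry per (state, action) pair; with millions of states, or continuous ones, that table (and the exploration needed to fill it) blows up, which is the scaling problem the note describes.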
Identify where generated data diverges significantly from real data and may lead to compliance issues, e.g. in legal, health, insurance, or finance domains.
Test and debug machine learning model behavior: discover the paths a neural network may take in production, and surface errors, unexpected behavior, and the impact of rare or novel inputs introduced through generative AI.
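One way to act on this idea is to generate synthetic "rare" variants of known inputs and flag cases where the model's decision flips. The sketch below is a hedged illustration: the linear `model` is a stand-in for a trained network, and `generate_rare_variants` stands in for a generative sampler; both names are assumptions for this example.

```python
import numpy as np

# Sketch of probing model behavior with generated inputs before production:
# feed synthetic variants of a seed input through a stand-in model and
# flag cases where its decision flips.

rng = np.random.default_rng(2)

def model(x):
    """Stand-in for a trained classifier: a fixed linear decision rule."""
    w = np.array([1.0, -0.5, 0.25])
    return int(x @ w > 0)

def generate_rare_variants(x, n=50, scale=0.5):
    """Generative stand-in: sample perturbed variants around a seed input."""
    return x + scale * rng.normal(size=(n, x.size))

seed_input = np.array([1.0, 1.0, 1.0])
baseline = model(seed_input)
variants = generate_rare_variants(seed_input)
flips = [v for v in variants if model(v) != baseline]
flip_rate = len(flips) / len(variants)
# A high flip rate flags a region of input space to review before release.
```

In practice the variants would come from a trained generative model rather than Gaussian noise, but the debugging loop — generate, run, compare against the baseline decision — is the same.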
From the blog
Generative AI and an RL agent can work together to build a robust system. A Variational Autoencoder (VAE) learns a latent space and then generates future trajectories; an RL agent can learn from these and build an optimal policy. Using knobs, one can keep the state space limited and focus on the abstractions required for learning.
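The pipeline can be sketched structurally as follows. This is not a trained model: the weight matrices are random stand-ins, and all names and dimensions are assumptions chosen for illustration. What it shows is the data flow the note describes: encode an observation, sample a latent code via the reparameterization trick, and decode it into a synthetic trajectory for the RL agent.

```python
import numpy as np

# Structural sketch (not a trained model) of the VAE -> RL data pipeline.
# In practice the encoder/decoder would be trained neural networks; here
# random linear maps stand in for them to show the shapes and data flow.

rng = np.random.default_rng(0)
OBS_DIM, LATENT_DIM, HORIZON = 8, 2, 5   # LATENT_DIM acts as the "knob"

W_enc = rng.normal(size=(OBS_DIM, 2 * LATENT_DIM))    # outputs mean and log-variance
W_dec = rng.normal(size=(LATENT_DIM, OBS_DIM * HORIZON))

def encode(obs):
    """Map an observation to a latent mean and log-variance."""
    h = obs @ W_enc
    return h[:LATENT_DIM], h[LATENT_DIM:]

def reparameterize(mu, log_var):
    """Sample z = mu + sigma * eps (keeps sampling differentiable in training)."""
    return mu + np.exp(0.5 * log_var) * rng.normal(size=mu.shape)

def decode_trajectory(z):
    """Decode a latent code into a future trajectory of observations."""
    return (z @ W_dec).reshape(HORIZON, OBS_DIM)

obs = rng.normal(size=OBS_DIM)
mu, log_var = encode(obs)
z = reparameterize(mu, log_var)
trajectory = decode_trajectory(z)   # synthetic experience for the RL agent
```

Shrinking `LATENT_DIM` compresses the abstract state the agent sees; growing it preserves more detail at the cost of a larger space to explore.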
Discovering abstractions that reduce the amount of experience or thinking time an RL agent requires to find a good solution is key to RL success. Knobs let you control how large or small a state space you use.
Using knobs, you can manage the trade-off between compression of states and faithful representation of good behavior.
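This compression/fidelity trade-off can be illustrated with PCA as a simple stand-in for a learned latent space (an analogy, not the VAE itself): fewer retained components means more compression but higher reconstruction error, i.e. a poorer representation of the underlying behavior.

```python
import numpy as np

# Illustration of the compression/fidelity trade-off using PCA as a
# stand-in for a learned latent space: a smaller "knob" k compresses
# more but reconstructs the data less faithfully.

rng = np.random.default_rng(1)
# Synthetic 10-D data whose variance lives mostly in 3 directions.
latent = rng.normal(size=(200, 3))
mixing = rng.normal(size=(3, 10))
data = latent @ mixing + 0.01 * rng.normal(size=(200, 10))

def reconstruction_error(X, k):
    """Project X onto its top-k principal components and measure the loss."""
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    approx = U[:, :k] @ np.diag(S[:k]) @ Vt[:k]
    return float(np.mean((Xc - approx) ** 2))

# Turning the knob: error shrinks as k grows (less compression).
errors = {k: reconstruction_error(data, k) for k in (1, 2, 3, 5)}
```

Picking `k` here plays the same role as sizing the VAE latent space above: small enough that the agent can learn quickly, large enough that the behavior worth learning is still represented.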