Some Next Steps
More Questions
There are two questions to solve, one easy and one hard:
- The easy one is to do constrained optimization on model-free algorithm families.
- The hard one is to reformulate this problem theoretically.
Harder question to solve:
Can I bake the idea of having a world model into the algorithm itself, not just by using networks? There is a theoretical perspective from EM, and a practical perspective on how to do it with constrained optimization. Here are a few questions that I want to answer:
We need to change the fundamental constrained-optimization formulation from MOMPO. What if \(q\) is treated not as an action distribution (as in MOMPO) but as a latent distribution in a VAE (representing the model of the environment), and we then add a KL constraint on the VAE latent distribution at each iteration, gradually constructing \(q\)? Can we still derive an ELBO for it (a new min-max Lagrangian duality optimization problem)?
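One way to start writing this down, offered as a sketch under stated assumptions rather than a finished derivation. The first line is the standard MPO E-step that MOMPO builds on (MOMPO uses one such KL bound per objective). The second line is a hypothetical latent-space analogue in which the decoder log-likelihood \(\log p_\theta(x \mid z)\) plays the role of \(Q\) and the previous iterate \(q_{\text{old}}\) plays the role of the old policy. The symbols \(p_\theta\), \(q_{\text{old}}\), \(\mathcal{D}\), \(\epsilon\), and \(\eta\) are assumptions introduced here, not notation from these notes.

\[
\begin{aligned}
\text{MPO E-step:}\quad & \max_{q}\; \mathbb{E}_{s \sim \mu}\Big[\mathbb{E}_{a \sim q(\cdot \mid s)}\big[Q(s,a)\big]\Big]
\quad \text{s.t.}\quad \mathbb{E}_{s \sim \mu}\Big[\mathrm{KL}\big(q(\cdot \mid s)\,\|\,\pi_{\text{old}}(\cdot \mid s)\big)\Big] \le \epsilon \\
\text{Latent analogue:}\quad & \max_{q}\; \mathbb{E}_{x \sim \mathcal{D}}\Big[\mathbb{E}_{z \sim q(\cdot \mid x)}\big[\log p_\theta(x \mid z)\big]\Big]
\quad \text{s.t.}\quad \mathbb{E}_{x \sim \mathcal{D}}\Big[\mathrm{KL}\big(q(\cdot \mid x)\,\|\,q_{\text{old}}(\cdot \mid x)\big)\Big] \le \epsilon \\
\text{Lagrangian:}\quad & \max_{q}\,\min_{\eta \ge 0}\; \mathbb{E}_{x \sim \mathcal{D}}\Big[\mathbb{E}_{z \sim q(\cdot \mid x)}\big[\log p_\theta(x \mid z)\big]\Big]
+ \eta\Big(\epsilon - \mathbb{E}_{x \sim \mathcal{D}}\Big[\mathrm{KL}\big(q(\cdot \mid x)\,\|\,q_{\text{old}}(\cdot \mid x)\big)\Big]\Big)
\end{aligned}
\]

For a fixed \(\eta > 0\), the inner maximization over \(q\) has the closed form \(q(z \mid x) \propto q_{\text{old}}(z \mid x)\, p_\theta(x \mid z)^{1/\eta}\), mirroring \(q(a \mid s) \propto \pi_{\text{old}}(a \mid s)\exp(Q(s,a)/\eta)\) in MPO. If \(q_{\text{old}}\) is replaced by a fixed prior \(p(z)\) and \(\eta = 1\), the Lagrangian equals the standard ELBO up to the constant \(\epsilon\), so the ELBO appears as a special case. Whether the iterative version with a moving \(q_{\text{old}}\) still gives a valid bound is the part that remains open.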
Need to revise the mathematical formulation. Attempt to derive from MOMPO here.
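While the formulation is being revised, the two numerical ingredients of a KL-constrained latent update can be prototyped on their own. The following is a minimal sketch, assuming diagonal-Gaussian latents and a plain projected gradient step on the multiplier; the function names `kl_diag_gaussians` and `dual_step`, the learning rate, and the bound `epsilon` are all hypothetical and are not taken from MOMPO or from this code base.

```python
import numpy as np


def kl_diag_gaussians(mu_q, logvar_q, mu_p, logvar_p):
    """KL(q || p) between two diagonal Gaussians, summed over latent dimensions."""
    mu_q, logvar_q = np.asarray(mu_q, float), np.asarray(logvar_q, float)
    mu_p, logvar_p = np.asarray(mu_p, float), np.asarray(logvar_p, float)
    var_q, var_p = np.exp(logvar_q), np.exp(logvar_p)
    return 0.5 * np.sum(
        logvar_p - logvar_q + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0
    )


def dual_step(eta, kl, epsilon, lr=0.1):
    """One projected gradient-descent step on the multiplier eta >= 0.

    Assumed Lagrangian: L(q, eta) = objective(q) + eta * (epsilon - kl),
    maximized over q and minimized over eta, so dL/d(eta) = epsilon - kl.
    """
    return max(0.0, eta - lr * (epsilon - kl))


# Hypothetical numbers: the new latent posterior moved one unit away from the old one.
kl = kl_diag_gaussians([1.0], [0.0], [0.0], [0.0])
eta = dual_step(eta=1.0, kl=kl, epsilon=0.1)
```

When the KL exceeds `epsilon`, the step raises `eta`, which penalizes large moves of the latent distribution in the next iteration; when the constraint is slack, `eta` shrinks toward zero.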
Biologically related questions
- Does establishing a world model, similar to the cerebellum's function, facilitate motor action execution by providing a motor plan derived from previous motor control experiences for additional guidance (compared to pure sensory feedback, as in model-free RL)? Moreover, can this new motor learning process be incorporated into the GDP for future motor controls?
- See if such a biologically inspired strategy (mechanistic insight) improves performance.
- See if the Forward Model resembles the functionality and behavior of the cerebellum (for example, showing gradual learning of new motor skills).
With a change in the understanding of the rules of the world, can the algorithm still find a sub-optimal point in this training world such that, in the other world, it still works fine or even better than an agent trained solely on one world model?
Code Base & Experiments:
Implementation notes in here. We need to consider what phenomena we would see if our idea actually works. We need a good testing paradigm.
- Easier and easier to learn new things? The model should develop an intuition of the environment and learn better when seeing a similar environment? Learning a combined skill set from previous experiences of smaller, transferable skills should be easier?
- Being able to transfer back and have memory retention?
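The two phenomena above map onto quantities commonly reported in the continual-learning literature: forward transfer (does earlier training help on a task not yet trained on?) and backward transfer (is performance on earlier tasks retained?). A minimal sketch of how they could be computed, assuming a hypothetical performance matrix `R` where `R[i][j]` is the score on task `j` after training on tasks `0..i`, and a hypothetical `baseline[j]` for an untrained agent; none of these names or numbers come from actual experiments.

```python
import numpy as np


def transfer_metrics(R, baseline):
    """Forward and backward transfer from a task-by-task performance matrix.

    R[i][j]: performance on task j after training on tasks 0..i (in order).
    baseline[j]: performance of an untrained agent on task j.
    """
    R = np.asarray(R, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    n_tasks = R.shape[0]
    # Forward transfer: score on task j just before training on it, minus the baseline.
    forward = np.mean([R[j - 1, j] - baseline[j] for j in range(1, n_tasks)])
    # Backward transfer: final score on each earlier task minus its score right after learning it.
    backward = np.mean(R[-1, :-1] - np.diag(R)[:-1])
    return {"forward_transfer": float(forward), "backward_transfer": float(backward)}


# Hypothetical scores for three tasks trained in sequence.
R = [
    [0.80, 0.30, 0.20],
    [0.70, 0.90, 0.40],
    [0.75, 0.85, 0.90],
]
metrics = transfer_metrics(R, baseline=[0.10, 0.10, 0.10])
```

Positive forward transfer would support the "easier and easier to learn" hypothesis, and backward transfer close to zero or above would indicate memory retention.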