Some Next Steps
More Questions
There are two questions to solve, one easy and one hard:
- The easy one is to apply constrained optimization to model-free algorithm families.
- The hard one is to reformulate this problem theoretically.
- Harder question to solve:
- Can I bake the idea of a world model into the algorithm itself, not just by using networks? There is a theoretical perspective from EM, and a practical perspective of how to do it through constrained optimization. Here are a few questions that I want to answer:
- We need to change the fundamental constrained-optimization formulation from MOMPO. What if \(q\) is treated not as an action distribution (as in MOMPO) but as a latent distribution in a VAE (representing the model of the environment), with a KL constraint on the VAE latent distribution at each iteration, so that \(q\) is constructed gradually? Can we still derive an ELBO for it (a new min-max Lagrangian duality optimization problem)?
- Need to revise the mathematical formulation. Attempt to derive from MOMPO here.
- Biologically related questions
- Does establishing a world model, similar to the cerebellum's function, facilitate motor action execution by providing a motor plan derived from previous motor control experiences as additional guidance (compared to pure sensory feedback, as in model-free RL)? Moreover, can this new motor learning process be incorporated into the GDP for future motor control?
- See if such a biologically inspired strategy (mechanistic insight) improves performance.
- See if the forward model resembles the functionality and behavior of the cerebellum (for example, showing gradual learning of new motor skills).
- When the rules of the world change, can the algorithm still find a sub-optimal point in the training world such that it performs as well as, or even better than, an agent trained solely on one world model when placed in the other world?
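For the latent-\(q\) question above, a toy discrete case shows what one KL-constrained step could look like. The sketch below is an assumption, not a derivation from MOMPO: it takes a discrete latent \(z\), a previous iterate \(q_k(z)\), and a fixed log-likelihood \(\log p_\theta(x \mid z)\), and maximizes \(\mathbb{E}_q[\log p_\theta(x \mid z)]\) subject to \(\mathrm{KL}(q \,\|\, q_k) \le \epsilon\). Under those assumptions the maximizer has the tilted form \(q^*(z) \propto q_k(z)\, p_\theta(x \mid z)^{1/\eta}\), where \(\eta\) is the Lagrange multiplier, and at \(\eta = 1\) the Lagrangian objective is the usual ELBO with \(q_k\) acting as the prior. The names `tilted_posterior` and `solve_eta` and all numbers are hypothetical.

```python
import math


def tilted_posterior(q_prev, log_lik, eta):
    """Return q*(z) proportional to q_prev(z) * exp(log_lik(z) / eta).

    q_prev must be strictly positive; the latent z is discrete (assumption).
    """
    logits = [math.log(q) + ll / eta for q, ll in zip(q_prev, log_lik)]
    m = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(l - m) for l in logits]
    total = sum(weights)
    return [w / total for w in weights]


def kl(p, q):
    """KL(p || q) for discrete distributions, skipping zero-mass terms of p."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def solve_eta(q_prev, log_lik, eps, lo=1e-3, hi=1e3, iters=100):
    """Find the multiplier eta at which KL(q* || q_prev) meets the bound eps.

    KL(q*_eta || q_prev) shrinks monotonically as eta grows, so bisection
    on a log scale is enough for this toy problem.
    """
    for _ in range(iters):
        mid = math.sqrt(lo * hi)  # geometric midpoint
        if kl(tilted_posterior(q_prev, log_lik, mid), q_prev) > eps:
            lo = mid  # constraint violated: need a larger eta
        else:
            hi = mid  # constraint satisfied: try a smaller eta
    return hi


# Hypothetical example: 4 latent states, uniform previous iterate.
q_prev = [0.25, 0.25, 0.25, 0.25]
log_lik = [-1.0, -2.0, -4.0, -0.5]  # made-up values of log p(x | z)
eps = 0.05

eta = solve_eta(q_prev, log_lik, eps)
q_new = tilted_posterior(q_prev, log_lik, eta)
print("eta:", eta)
print("q_new:", q_new)
print("KL(q_new || q_prev):", kl(q_new, q_prev))
```

Repeating this step with `q_prev = q_new` gives the "gradually constructing \(q\)" picture: a smaller \(\epsilon\) forces a larger \(\eta\) and a more conservative update. Whether the same structure survives with a parametric VAE encoder and a learned decoder is the open question.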
Code Base & Experiments:
Implementation notes in here. We need to consider what phenomena we would see if our idea actually works. We need a good testing paradigm.
- Is it easier and easier to learn new things? The model should develop an intuition for the environment and learn better when it sees a similar environment. Should learning a combined skill set be easier when it builds on transferable smaller skills from previous experience?
- Can it transfer back to earlier tasks and retain what it learned (memory retention)?
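One possible way to make these two questions measurable is to borrow the forward-transfer and backward-transfer scores commonly used in continual learning, plus a simple learning-speed count. The sketch below assumes a sequential protocol that is not specified in these notes: the agent trains on tasks `0..T-1` in order, and `R[i][j]` is its score on task `j` after finishing training on task `i`. The function names and all numbers are hypothetical placeholders, not results.

```python
def backward_transfer(R):
    """Average change on earlier tasks after training on all tasks.

    Negative values indicate forgetting; values near zero indicate retention.
    """
    T = len(R)
    return sum(R[T - 1][j] - R[j][j] for j in range(T - 1)) / (T - 1)


def forward_transfer(R, baseline):
    """Average head start on a task before training on it.

    baseline[j] is the score of an untrained agent on task j (assumption).
    """
    T = len(R)
    return sum(R[j - 1][j] - baseline[j] for j in range(1, T)) / (T - 1)


def episodes_to_threshold(curve, threshold):
    """Number of episodes until the learning curve first reaches threshold.

    Returns None if the threshold is never reached.
    """
    for i, value in enumerate(curve):
        if value >= threshold:
            return i + 1
    return None


# Made-up scores for three tasks, for illustration only.
R = [
    [0.90, 0.30, 0.20],  # after training on task 0
    [0.85, 0.90, 0.40],  # after training on task 1
    [0.80, 0.85, 0.90],  # after training on task 2
]
baseline = [0.10, 0.10, 0.10]

print("backward transfer:", backward_transfer(R))
print("forward transfer:", forward_transfer(R, baseline))
print("episodes to 0.8:", episodes_to_threshold([0.2, 0.5, 0.7, 0.85, 0.9], 0.8))
```

If the idea works, one would expect forward transfer to grow and `episodes_to_threshold` to shrink on later, similar tasks, while backward transfer stays close to zero. Comparing these numbers against a model-free baseline trained under the same protocol would be the actual test.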