Some Next Steps

More Questions

There are two questions to solve, one easy and one hard:

  • The easy one: apply constrained optimization to model-free algorithm families.
  • The hard one: reformulate this problem theoretically.

  • Harder question to solve:

    • Can I bake a world model into the algorithm itself, rather than only into the networks? There is a theoretical perspective from EM, and there is a practical perspective of how to do it through constrained optimization. Here are a few questions I want to answer:

    • We need to change the underlying constrained-optimization formulation of MOMPO. What if \(q\) is not an action distribution (as in MOMPO) but the latent distribution of a VAE that models the environment? We would then add a KL constraint on the VAE latent distribution at each iteration, gradually constructing \(q\). Can we still derive an ELBO for it, i.e., a new min-max Lagrangian dual optimization problem?

    • Revise the mathematical formulation: attempt a derivation starting from MOMPO here.

  • Biologically related questions:

    • Does establishing a world model, similar to the cerebellum's function, facilitate motor execution by providing a motor plan derived from previous motor-control experience as additional guidance (compared with pure sensory feedback, as in model-free RL)? Moreover, can this new motor learning process be incorporated into the GDP for future motor control?
    • See whether such a biologically inspired strategy (mechanistic insight) improves performance.
    • See whether the forward model resembles the functionality and behavior of the cerebellum (for example, by showing gradual learning of new motor skills).
  • When the rules of the world change, can the algorithm still find a sub-optimal point in the training world that still works well in the other world, or even better than an agent trained solely with a single world model?
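For reference when re-deriving the formulation, here is a minimal numpy sketch of the sample-based E-step that MOMPO inherits from MPO, i.e., the part where \(q\) is currently an action distribution. Assumptions: N actions are sampled from \(\pi_{\text{old}}\) per state, `q_values` holds hypothetical critic estimates \(Q_k(s,a)\) for one objective \(k\), and \(q_k(a|s) \propto \pi_{\text{old}}(a|s)\exp(Q_k(s,a)/\eta_k)\) with the temperature \(\eta_k\) chosen so that the state-averaged \(\mathrm{KL}(q_k \,\|\, \pi_{\text{old}})\) meets the per-objective budget \(\epsilon_k\). The helper names are hypothetical, and the temperature is found by bisection on the derivative of the convex dual rather than by the gradient-based dual minimization that MPO implementations typically use.

```python
import numpy as np


def weights_and_kl(q_values: np.ndarray, eta: float):
    """Nonparametric E-step weights q(a|s) ∝ exp(Q(s,a)/eta) over actions sampled
    from pi_old, plus the state-averaged KL(q || pi_old).

    q_values: hypothetical critic outputs, shape [num_states, num_samples].
    """
    logits = q_values / eta
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    w = np.exp(logits)
    w = w / w.sum(axis=1, keepdims=True)
    n = q_values.shape[1]
    # pi_old is represented by its samples, i.e. uniform weights 1/n per state.
    kl = np.sum(w * np.log(np.clip(w * n, 1e-300, None)), axis=1)
    return w, float(kl.mean())


def solve_temperature(q_values, epsilon, eta_min=1e-4, eta_max=1e4, iters=100):
    """Find eta so that mean KL(q_eta || pi_old) is just below epsilon.

    The MPO temperature dual g(eta) = eta*epsilon + eta*mean_s log mean_a exp(Q/eta)
    is convex with g'(eta) = epsilon - KL(q_eta || pi_old). KL decreases
    monotonically in eta, so bisecting (in log-space) on the sign of g'
    locates the minimizer.
    """
    lo, hi = np.log(eta_min), np.log(eta_max)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        _, kl = weights_and_kl(q_values, float(np.exp(mid)))
        if kl > epsilon:
            lo = mid  # q too greedy: raise the temperature
        else:
            hi = mid  # constraint satisfied: try a lower temperature
    eta = float(np.exp(hi))
    w, kl = weights_and_kl(q_values, eta)
    return eta, w, kl


# Usage with random stand-ins for two objectives (8 states, 16 sampled actions).
rng = np.random.default_rng(0)
objectives = {"task": (rng.normal(size=(8, 16)), 0.1),
              "effort": (rng.normal(size=(8, 16)), 0.01)}
for name, (q, eps) in objectives.items():
    eta, w, kl = solve_temperature(q, eps)
    print(f"{name}: epsilon={eps}, eta={eta:.4f}, KL={kl:.4f}")
```

A larger \(\epsilon_k\) yields a lower temperature and therefore a greedier \(q_k\), which is how MOMPO encodes a preference across objectives. Making \(q\) a VAE latent distribution would presumably change what plays the role of `q_values` and what the KL is measured against; that is the piece the new derivation would need to supply.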

Code Base & Experiments:

Implementation notes go here. What phenomena should we see if the idea actually works? We need a good testing paradigm.

  • Does learning new things get progressively easier? The model should develop an intuition for the environment and learn faster when it sees a similar environment. Should learning a combined skill set be easier when it builds on previously learned, transferable smaller skills?
  • Can the agent transfer back to earlier tasks and retain what it learned (memory retention)?
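One hedged way to turn the two bullets above into measurable quantities, assuming tasks are trained in sequence and every task is evaluated after each training stage: store the returns in a matrix \(R\), where \(R_{i,j}\) is the evaluation return on task \(j\) after finishing stage \(i\). Backward transfer (retention) and forward transfer below follow the definitions commonly used in continual learning (e.g., in the GEM paper), and steps-to-threshold per task tracks whether new skills get faster to learn. The function names, the baseline choice, and the threshold are all assumptions for illustration.

```python
from typing import Optional, Sequence

import numpy as np


def backward_transfer(R: np.ndarray) -> float:
    """Memory retention: mean change on earlier tasks after training on all tasks.

    R[i, j] = evaluation return on task j after training stage i (tasks trained
    in order 0..T-1). Negative values indicate forgetting.
    """
    T = R.shape[0]
    return float(np.mean([R[T - 1, j] - R[j, j] for j in range(T - 1)]))


def forward_transfer(R: np.ndarray, baseline: np.ndarray) -> float:
    """Zero-shot benefit on task j from having trained on tasks 0..j-1,
    relative to baseline[j] (e.g. the return of an untrained agent)."""
    T = R.shape[0]
    return float(np.mean([R[j - 1, j] - baseline[j] for j in range(1, T)]))


def steps_to_threshold(curve: Sequence[float], threshold: float) -> Optional[int]:
    """Index of the first logged evaluation whose return reaches threshold,
    or None if the curve never gets there."""
    hits = np.nonzero(np.asarray(curve) >= threshold)[0]
    return int(hits[0]) if hits.size else None


# Toy numbers for illustration only (not results): 3 tasks trained in sequence.
R = np.array([[0.9, 0.2, 0.1],
              [0.8, 0.9, 0.3],
              [0.7, 0.8, 0.9]])
print("BWT:", backward_transfer(R))  # ≈ -0.15, i.e. some forgetting
print("FWT:", forward_transfer(R, baseline=np.array([0.1, 0.1, 0.1])))  # ≈ 0.15
print("steps:", steps_to_threshold([0.1, 0.5, 0.8, 0.95], threshold=0.8))
```

A steps-to-threshold that keeps shrinking across similar tasks would be one signature of "easier and easier to learn", and a backward transfer near zero or positive would indicate retention.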