Inspirations

Biological

The cerebellum has long been theorized to play a crucial role in motor control and learning (forward modeling). Corollary discharge carries an efference copy of the motor command, which is processed to predict the consequences of actions before sensory feedback is available. Such a process would help us predict how the sensory state of our body will change and how actions should be performed, leading to better control performance.
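
As a minimal sketch of this idea (the module and dimension names below are hypothetical, not taken from any specific implementation), a learned forward model can take the current sensory state together with an efference copy of the motor command and predict the next sensory state before feedback arrives:

```python
import torch
import torch.nn as nn

class ForwardModel(nn.Module):
    """Predicts the next sensory state from the current state and an
    efference copy of the motor command (the action)."""

    def __init__(self, state_dim: int, action_dim: int, hidden_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, state_dim),
        )

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        # Predict the change in sensory state and add it to the current state.
        return state + self.net(torch.cat([state, action], dim=-1))

# Training signal: one-step prediction error against the state actually
# observed once sensory feedback arrives, e.g.
#   loss = ((model(s_t, a_t) - s_next) ** 2).mean()
```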

World Model (Environment Model)

We believe that a policy is specific to a task, but if we build a reward model or a world model, such a model may be agnostic to the environment or the task, serving the purpose of continual learning. The question becomes: how can we build such a representation? Through a latent space and architectural modifications? By modifying the training/learning algorithm and the fundamental way the agent learns?
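
One hedged reading of the "latent space" option (all names below are our own illustration, not a reference implementation) is to separate a reusable model of the environment from the task-specific policy: an encoder, a latent transition model, and a reward head that could in principle be carried across tasks.

```python
import torch
import torch.nn as nn

class LatentWorldModel(nn.Module):
    """Encodes observations into a latent space and predicts latent
    dynamics and reward; the policy itself stays task-specific."""

    def __init__(self, obs_dim: int, action_dim: int, latent_dim: int = 32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, latent_dim)
        )
        self.transition = nn.Sequential(
            nn.Linear(latent_dim + action_dim, 64), nn.ReLU(), nn.Linear(64, latent_dim)
        )
        self.reward_head = nn.Linear(latent_dim, 1)

    def forward(self, obs: torch.Tensor, action: torch.Tensor):
        z = self.encoder(obs)                                      # latent representation
        z_next = self.transition(torch.cat([z, action], dim=-1))   # predicted next latent
        r_pred = self.reward_head(z_next)                          # predicted reward
        return z, z_next, r_pred
```

Whether such a split actually transfers across tasks is exactly the open question raised above.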

Philosophical (Constraint Solving)

It is not about finding the best (optimal) solution each time, but rather a good enough one (optimal under our projection of the past trees that we have explored). Agents that learn in different worlds serve as constraints on each other. They explore the world from their own perspectives and “pull” on each other along the way to prevent any one of them from falling into the “local optimality illusion” that it sees in a given moment.

Optimality in one instance may not be optimal in the long run; our optimization landscape continuously changes over time, and the true surface is the one that includes all tasks' surfaces. Gradually, the surface of this sequential optimization task should reveal itself. It is a highly non-convex optimization; all we can do is trust the conservative tree we have built, in which previous experiences pull on each other: from each sub-optimality we hope to reach optimality.
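
One hedged way to make this concrete (the notation below is our own, not drawn from any specific method): write the true surface as the sum of all task surfaces, and let earlier tasks act as constraints that “pull” on the current optimization.

\[
\mathcal{L}_{\text{true}}(\theta) \;=\; \sum_{t=1}^{T} \mathcal{L}_t(\theta),
\qquad
\theta_t \;=\; \arg\min_{\theta} \mathcal{L}_t(\theta)
\quad \text{s.t.} \quad \mathcal{L}_k(\theta) \le \mathcal{L}_k(\theta_k) + \epsilon \;\; \text{for all } k < t.
\]

Each per-task solution is only “good enough”, but the accumulated constraints keep the sequence from committing to any single task's local optimum.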

  1. We can consider such a model as a constrained process, forcing the agent to learn while incorporating its previous experiences or sampling under the expectation of previous world models.
  2. The forward model's facilitation does not come from directly facilitating actions but rather from providing a better representation of the feature space \(\vec \phi(\vec x)\) that the agent has built up (see the sketch after this list).
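
A hedged sketch of point 2 (all names are hypothetical): the forward model shapes a feature map \(\vec \phi(\vec x)\) through dynamics prediction, and the policy reads those features rather than raw observations; the model never proposes actions itself.

```python
import torch
import torch.nn as nn

class PhiEncoder(nn.Module):
    """Feature map phi(x), assumed to be trained elsewhere via forward-model
    prediction."""

    def __init__(self, obs_dim: int, feat_dim: int = 64):
        super().__init__()
        self.phi = nn.Sequential(
            nn.Linear(obs_dim, 128), nn.ReLU(), nn.Linear(128, feat_dim)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.phi(x)

class Policy(nn.Module):
    """Acts on phi(x); the benefit comes from the representation, not from
    the forward model suggesting actions."""

    def __init__(self, encoder: PhiEncoder, feat_dim: int, action_dim: int):
        super().__init__()
        self.encoder = encoder
        self.head = nn.Linear(feat_dim, action_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():        # the policy loss does not reshape phi(x) here
            feat = self.encoder(x)
        return torch.tanh(self.head(feat))
```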

It is about operating in this imagined state. It is never about explicitly facilitating actions, but rather about implicitly making the model understand the dynamics and interactions in this space more comprehensively.