Sub-Optimality Optimization (S.O.O)

Introduction

The traditional perspective on continual learning (CL) has been to treat it as the process of balancing two goals: "how to preserve more when learning a new task (memory stability)" and "how to keep learning at all under this condition (memory plasticity)". However, I would like to view the problem of CL from a different perspective. I would define the problem of CL as:

Learning the best action to take given that we are in a specific condition.

Each task is chained together with the others to form a harder problem that we are trying to solve. With this formulation, we break the problem into two parts (following the overall workflow of the EM algorithm): one part (a VAE) focuses on creating a cohesive picture of the various environments (building a picture of the "bigger" environment), and the other (model-free RL) searches within a niche projection, or instance, of that "bigger" environment.

  1. VAE as construction: How do we learn a correct representation of the environment? What are the low-dimensional key features of each environment that we can learn? How should we represent these learned features of the environment? As a latent vector? As a distribution?
  2. RL as search: Given the features of the environment, how can we build a search algorithm that finds the optimal solution under the environmental projection and stores it as a generalized idea (notice that this understanding is not discrete)? Perhaps we need a highly parametrized network?
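
The construction/search split above can be sketched as a toy alternating loop. Everything here is a stand-in of my own devising, not the project's actual code: `construct` summarizes a task's observations as a Gaussian feature (a real implementation would fit a VAE), and `search` is a one-step greedy stand-in for model-free RL conditioned on those features.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Construction step (VAE stand-in): summarize each task's observations
# as a low-dimensional Gaussian "feature" of the environment.
def construct(observations):
    return observations.mean(axis=0), observations.std(axis=0)

# --- Search step (model-free RL stand-in): given the environment features,
# pick the action whose known effect best matches the task target.
def search(env_mean, actions, target):
    scores = [-np.linalg.norm(env_mean + a - target) for a in actions]
    return int(np.argmax(scores))

actions = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([-1.0, 0.0])]

# Tasks arrive sequentially; each contributes samples to one shared picture.
tasks = [(np.array([2.0, 0.0]), rng.normal([1.0, 0.0], 0.1, size=(50, 2))),
         (np.array([0.0, 2.0]), rng.normal([0.0, 1.0], 0.1, size=(50, 2)))]

for target, obs in tasks:
    mean, std = construct(obs)            # E-like step: model the environment
    best = search(mean, actions, target)  # M-like step: search under the model
    print("target", target, "-> action", best)
```

The point of the sketch is only the shape of the loop: each incoming task refreshes the environment summary, and the search step never sees raw observations, only the constructed features.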

This idea stems from thinking about how we learn, specifically how little kids learn. Mathematically, it draws on three different domains of RL: model-based, model-free, and RL as inference. Under this perspective, we are no longer dealing with a separate objective surface for each of a discrete number of tasks; instead, we look at a single objective surface that gradually updates as more tasks are involved (think of it as a "bigger" environment that is encoded gradually), with the goal being to find a "workable solution" on this highly non-convex surface, or, put differently, optimization under the projection of each environment instance. We can even say that the search algorithm is sampling under such an environmental model.
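
One way to write down this gradually updating surface (the notation here is mine, introduced for illustration): fold the per-task objectives into a single expectation over the distribution of environment instances seen so far,

```latex
J_t(\theta) \;=\; \mathbb{E}_{e \sim p_t(e)}\!\left[ J_e(\theta) \right],
\qquad p_t(e) \text{ updated as tasks } 1, \dots, t \text{ arrive,}
```

so that adding a task does not create a new objective but reshapes the one surface $J_t$ that the search operates on.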

Notably, unlike traditional model-based RL, where we try to model the environment's transition probabilities and perform "virtual imagination", we try to learn an environment-conditional distribution for each search problem (do we want to represent it in this way? Essentially, we are asking how likely each environment is, approximated with a mixture of many Gaussians).
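
A minimal numpy sketch of the "approximate with many Gaussians" idea: one diagonal-Gaussian component per task seen so far, used to score how likely a new observation is under each environment instance. The class name `EnvMixture`, the equal mixture weights, and the diagonal-covariance assumption are all simplifications of my own, not the project's design.

```python
import numpy as np

# Toy sketch: represent "how likely is this environment" with a mixture of
# Gaussians, one component per task seen so far.
class EnvMixture:
    def __init__(self):
        self.means, self.vars = [], []

    def add_task(self, observations):
        # One diagonal-Gaussian component summarizes each task's observations.
        self.means.append(observations.mean(axis=0))
        self.vars.append(observations.var(axis=0) + 1e-6)

    def log_likelihood(self, x):
        # Log-density of x under each component (diagonal Gaussian).
        lls = []
        for m, v in zip(self.means, self.vars):
            lls.append(-0.5 * np.sum(np.log(2 * np.pi * v) + (x - m) ** 2 / v))
        return np.array(lls)

    def which_env(self, x):
        # Most likely environment instance for observation x.
        return int(np.argmax(self.log_likelihood(x)))

rng = np.random.default_rng(1)
mix = EnvMixture()
mix.add_task(rng.normal(0.0, 1.0, size=(100, 3)))   # task A, centered at 0
mix.add_task(rng.normal(5.0, 1.0, size=(100, 3)))   # task B, centered at 5
print(mix.which_env(np.array([4.8, 5.1, 5.0])))     # prints 1 (task B)
```

Conditioning the search on the most likely component is the "projection" step: the policy only ever optimizes under one instance of the bigger environment at a time.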

The architecture of the networks may look something like this, where we try to establish an understanding of the environment:

schematics

This research project is an attempt to push a little further on building the theoretical construct of creating models of the world; the codebase is intended as an empirical test bed for the idea.

Core Problem Formulation

We need to focus on the core problem of this project: how to construct a good q-distribution that captures the overall environmental dynamics (1. How do we get enough samples in each subtask? 2. How can we gradually learn this cohesive picture, focusing not on forgetting but on building a holistic picture?) while developing a task-specific policy that works well enough given the current environmental state (1. How can we build such a general controller?).
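
For the q-distribution part, the natural starting point is the standard variational lower bound, where $q_\phi(z \mid x)$ is the learned environment representation and $p_\theta(x \mid z)$ the decoder (standard VAE notation, not symbols taken from this project):

```latex
\log p_\theta(x) \;\ge\;
\mathbb{E}_{q_\phi(z \mid x)}\!\left[\log p_\theta(x \mid z)\right]
\;-\; D_{\mathrm{KL}}\!\left(q_\phi(z \mid x) \,\Vert\, p(z)\right)
```

The open questions above then become questions about this bound: whether samples from each subtask cover $x$ well enough, and whether $q_\phi$ can be updated across tasks into one cohesive latent picture rather than a per-task one.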

As humans, we never receive the full signal of a picture; rather, we piece together whatever "energy" we can get and construct the holistic picture that we see. The question now is: how can we construct such a process in a machine? How can we build a bigger picture of the world from small, noisy samples of what we see?