Why Efficient-RPLH System? 🤔
RPLH-efficient is an improved version of the original RPLH system that prevents many of the pitfalls of RPLH-original, which hallucinated about many un-doable actions. We use the `instructor` package, which uses a finite state machine to adjust the probability output of the language model and achieve a specific output format (a minimal sketch follows the list below). This approach comes with trade-offs as well:
- Markovian memory and a limited prompt.
    - Language models tend to perform better when they are given limited and specific information (prompt); we designed the system specifically to exploit this characteristic of these models.
- Limited conversation ability of the LM agents, and hence less hallucination about future plans and attitudes.
- Structured responses are given only in a fixed format (see the `instructor` sketch after this list).
- Under this system, the judge also serves as a syntactic checker: it corrects any wrong syntax that the HCA agent has given.
- To keep the flow of information and decision making clear, when an action needs to be retaken, the LM making the new decision reuses the original functions of the HCA agent/local agent/judge, with additional feedback passed in through `functools.partial` (see the `functools.partial` sketch after this list).
- To still keep some conversation going, only the output agents (HCA and judge) use the strict output formatter on their LMs; the local agent and attitude agent output strings directly and are free to have more conversations.
- Extending standard reasoning, we can bake in agent-based reasoning because of the structured output.
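As an illustration of the structured-output setup, below is a minimal sketch of constraining an LM response with `instructor` and a Pydantic schema. The `HCACommand` fields, the model name, and the prompt are illustrative assumptions, not the actual RPLH schema.

```python
# Minimal sketch: constrained LM output via the instructor package.
# HCACommand's fields and the model name are illustrative assumptions.
import instructor
from openai import OpenAI
from pydantic import BaseModel


class HCACommand(BaseModel):
    """Structured command the HCA agent must emit."""

    reasoning: str            # brief justification for the plan
    actions: dict[str, str]   # hypothetical: agent name -> action string


# instructor patches the client so response_model constrains the output
client = instructor.from_openai(OpenAI())

command = client.chat.completions.create(
    model="gpt-4o-mini",
    response_model=HCACommand,  # parsed and validated into HCACommand
    messages=[{"role": "user", "content": "Plan the next box moves."}],
)
print(command.actions)
```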
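And a minimal sketch of the feedback pass-in with `functools.partial`; `hca_decide` and its arguments are hypothetical stand-ins for the real HCA/local agent/judge functions.

```python
# Minimal sketch: bind judge feedback into the original decision
# function with functools.partial. Names here are hypothetical.
from functools import partial


def hca_decide(state: str, feedback: str = "") -> str:
    """Stand-in for the original HCA decision function."""
    prompt = state
    if feedback:
        prompt += "\nJudge feedback: " + feedback
    return prompt  # in RPLH this prompt would be sent to the LM


# When an action must be retaken, reuse the same function with the
# judge's feedback bound in, instead of writing a new prompt path:
retry_decide = partial(hca_decide, feedback="move(box_red, target_red) is not doable")
print(retry_decide("Agent[0.5, 0.5] sees box_blue and target_blue"))
```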
Optimization Tricks
We use several optimization tricks:
- When a local agent has no boxes or targets in its block, or receives no action from the HCA agent, it is not considered a valid conversation source, since its view is limited.
- When the vote count (the number of consecutive agreements) surpasses half the number of agents, the HCA action is executed directly; if an agreement is not consecutive, the count is re-initialized (sketched after this list).
- If the local agent agrees with the HCA decision, no judge is involved and the HCA action is passed on directly to the next round of local agents or executed immediately.
- Instead of letting the HCA agent figure out attitudes, we limit the prompt length by assigning an attitude agent that judges the attitude of each agent on the field using only current-round information; only this attitude information is passed to the next round's HCA agent.
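Below is a minimal sketch of the consecutive-agreement counter described above; the function name and agent loop are assumptions, not the actual RPLH internals.

```python
# Minimal sketch: consecutive-agreement voting. All names are assumed.
def update_vote(agrees: bool, vote_count: int, num_agents: int) -> tuple[bool, int]:
    """One local-agent check; returns (execute_directly, new_vote_count)."""
    vote_count = vote_count + 1 if agrees else 0  # reset when not consecutive
    execute = vote_count > num_agents / 2         # surpass half the agents
    return execute, vote_count


# Example: with 4 agents, the third consecutive agreement triggers execution.
count = 0
for agrees in [True, True, True]:
    execute, count = update_vote(agrees, count, num_agents=4)
print(execute, count)  # True 3
```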
Because of these optimization tricks, execution happens very quickly in later trials.
Adversarial Reasoning System
To perform agent-based reasoning, we have implemented a few game rules for this adversarial environment:
1. There are extra fields in the response, namely `agent_model` and `spy_model`, to capture the LM's thoughts and explicitly build an agent-reasoning system:
    - These models are incrementally improved by the judge HCA, not by the HCA that is giving commands.
    - Each `agent_model` and `spy_model` is two-degree Markovian (though this can be a hyperparameter): when updating the models, each judge HCA can see the previous two versions of them (see the first sketch after this list).
    - A model may get overwritten by later interactions.
2. For the speed of our system, once a spy has no boxes to move or targets to match in its grid, the system reminds the HCA that one spy agent has been identified. Once all spy agents have been identified, the system reminds the HCA agent not to look for spies anymore (see the second sketch after this list).
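Below is a minimal sketch of the extra response fields and the two-degree Markovian window; everything except the `agent_model` and `spy_model` field names is an assumption.

```python
# Minimal sketch: agent_model / spy_model fields plus a two-version
# Markov window. Everything except the two field names is assumed.
from collections import deque
from pydantic import BaseModel


class JudgeResponse(BaseModel):
    justification: str
    agent_model: str  # the judge's running model of each agent's behavior
    spy_model: str    # the judge's running model of who the spy may be


# Keep only the two previous versions of each model; the window size
# could be exposed as a hyperparameter.
agent_model_history: deque[str] = deque(maxlen=2)
spy_model_history: deque[str] = deque(maxlen=2)


def record(response: JudgeResponse) -> None:
    """The judge HCA sees at most these two versions when updating."""
    agent_model_history.append(response.agent_model)
    spy_model_history.append(response.spy_model)
```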
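And a sketch of the spy-identification reminder from rule 2; the grid representation and the reminder strings are assumptions.

```python
# Minimal sketch: spy-identification reminders. The grid format is assumed.
def spy_reminder(grids: dict[str, dict], known_spies: set[str], num_spies: int) -> str:
    """Mark agents whose grid has nothing left and build the HCA reminder."""
    for name, grid in grids.items():
        if not grid.get("boxes") and not grid.get("targets"):
            known_spies.add(name)  # a spy with an empty grid is identified
    if len(known_spies) >= num_spies:
        return "All spy agents are identified; do not look for spies anymore."
    if known_spies:
        return f"Spy agents identified so far: {sorted(known_spies)}"
    return ""
```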