Hybrid Architecture for Metacognitive Learning - Year 2
Overview of scientific progress
The primary topics of research during the past year were the modeling of reflexive reasoning in the CIC task, the contribution of metacognitive / reflective skill to reflexive learning, and modifications of SHRUTI to handle reasoning about uncertainty. A knowledge base was derived from critical incident interviews with active-duty Naval officers and from analysis of experimental protocols of officers at the Surface Warfare Officers Schools and Naval Postgraduate School. The knowledge base was then encoded within a SHRUTI network. One of the core scenarios selected to serve as a testbed for CIC-related research (Korea) was also encoded within SHRUTI, for use in testing the effects of training.

A machine learning study was conducted using the CIC knowledge base and the Korea scenario. Scenarios for training and testing were generated by creating a probabilistic event tree, with branching alternatives for all relevant features of tactical situations similar to the Korea scenario. The event tree included general information about the context (e.g., the level of hostility between countries to which various platforms belonged, the appropriateness of different platforms for attacking an AEGIS cruiser, and the degree of danger a platform was in that might motivate protective action). The event tree also included different possible intents (e.g., intent to attack, intent to protect), and branches for observable actions that might result from those intents (e.g., whether or not a platform was closing on another platform, flying at high or low altitude, and flying at high or low speed). Encoding the information in these scenarios made critical use of the dynamic variable binding capabilities of SHRUTI. The scenarios generated in this way were randomly divided into training and test sets. Backpropagation in the SHRUTI network was used to adapt the weights in the knowledge base based on exposure to the training scenarios.
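The scenario-generation procedure can be sketched in miniature as follows. This is an illustrative reconstruction, not the actual study code: the feature names and branch values are hypothetical stand-ins for the event-tree alternatives described above, and the 80/20 split fraction is an assumption.

```python
import random

# Hypothetical event tree: each feature branches over its alternatives.
# Feature names and values are illustrative, not from the actual study.
EVENT_TREE = {
    "hostility_level": ["low", "high"],
    "intent": ["attack", "protect"],
    "closing_on_ownship": [True, False],
    "altitude": ["low", "high"],
    "speed": ["low", "high"],
}

def generate_scenario(rng):
    """Sample one branch for every feature of the event tree."""
    return {feature: rng.choice(values) for feature, values in EVENT_TREE.items()}

def generate_dataset(n, train_fraction=0.8, seed=0):
    """Generate n scenarios and randomly divide them into train/test sets."""
    rng = random.Random(seed)
    scenarios = [generate_scenario(rng) for _ in range(n)]
    rng.shuffle(scenarios)
    split = int(n * train_fraction)
    return scenarios[:split], scenarios[split:]

train, test = generate_dataset(100)
```

Each generated scenario is one complete path through the tree; the random split mirrors the division into training and test sets described above.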

Two conditions were compared: simple reflexive learning, and learning with a metacognitive "hint" midway through learning. The hint consisted of the suggestion that the network consider the possibility of non-hostile intent of an approaching platform. The hint by itself provided no evidence for or against any particular hypothesized intent. Its function was simply to shift the model's attention slightly, in order to overcome limitations on the spread of reflexive activation. The dependent variables in the experiment were the changes in knowledge base weights and the performance of the trained system on the Korea scenario. Both reflexive training and reflexive + metacognitive training changed the weights on rules for hostile intent. But only reflexive + metacognitive training changed the model's prior beliefs in the likelihood of non-hostile and hostile intent. In addition, the two training manipulations led to different results in the Korea scenario. There was significant support for the possibility of intent to protect in that scenario after reflexive + metacognitive training, but no support at all for intent to protect after reflexive training. Thus, the metacognitive hint would lead an officer to take more time before engaging in this test scenario.
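The logic of the two conditions can be caricatured in a few lines. This is a deliberately simplified sketch, not the SHRUTI implementation: the hypothesis names, the scalar update rule, and the training sequence are all assumptions chosen only to show how a hint that carries no evidence can still change which beliefs get updated.

```python
# Toy sketch: a "hint" midway through training widens the set of attended
# hypotheses without itself providing evidence. All names/values illustrative.
def train(scenarios, hint_at=None, lr=0.1):
    # Prior belief weights for the two competing intent hypotheses.
    weights = {"hostile": 0.5, "non_hostile": 0.5}
    attended = {"hostile"}  # reflexive activation initially reaches only this
    for i, supports_non_hostile in enumerate(scenarios):
        if hint_at is not None and i == hint_at:
            # The hint carries no evidence; it only shifts attention.
            attended.add("non_hostile")
        target = "non_hostile" if supports_non_hostile else "hostile"
        if target in attended:
            weights[target] += lr * (1.0 - weights[target])
    return weights

# Each boolean marks whether a training scenario supports non-hostile intent.
data = [True, False, True, True, False, True, True, True]
reflexive = train(data)           # simple reflexive learning
metacog = train(data, hint_at=4)  # metacognitive hint midway through
```

In this caricature, the hint-trained model revises its prior for non-hostile intent while the purely reflexive model leaves it untouched, echoing the qualitative pattern of results reported above.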

The following enhancements have been made to Shruti: encoding of schemas and their integration with rules; encoding of taxon facts -- distillations of prior observations and inferences; new computational encoding of instances, types, and sub- and super-ordinate relations; exception conditions for rules; abductive and defeasible inference (partially completed); and learning of strengths associated with rules and taxon facts (via backpropagation). In addition, work on integrating the Shruti architecture with adaptive critics (value function decompositions and reflexive planning) and support for metacognitive attention shifting is continuing.

Recent accomplishments
We have been working with a core scenario, common to two other projects funded under the same initiative, in which we have examined the structural changes that reflective reasoning can introduce into the Long-Term Memory (LTM) and the reflexive reasoning of an agent. In the simulations of the model, we explored how a training hint, modeled after metacognitive training conducted with experienced Naval officers, would influence the agent's perception of alternative explanations of evidence. Without the metacognitive training intervention, the computational agent would interpret a novel, but related, scenario as predominantly supporting the conclusion of hostile intent for an approaching air track. While the anomalous evidence in the scenario did lend some support to the negation of that conclusion, the agent was not able to explain the evidence in terms of an alternative hypothesis. This behavior is very close to that observed in more junior CIC officers, who tend to interpret evidence more readily in favor of hostile intent and give less consideration to alternative explanations and to the broader context of events (political motives and ramifications of actions).
With the metacognitive training intervention, the simulation learned to recognize an alternative explanation of the evidence in the novel scenario: that the approaching air track might be intending to provide protection and coordinate rescue for crew in a downed helicopter near ownship. By recognizing such alternative explanations, as do more experienced officers, the agent is able to focus its resources on the structural uncertainty in the various explanations and can attempt to construct a novel explanation that best fits both prior experience and the novel aspects of the current situation.
Perhaps the most interesting aspect of this result lies in how metacognition acts to structure LTM, and, in doing so, changes the reflexive (first blush) reasoning of the agent. One of the predictions of the neural model is that only a limited set of inferences can possibly be computed in the time scale of reflexive reasoning (~500ms). This has the effect, among others, that only knowledge within a limited inferential distance (in terms of chaining from one relation to another through learned patterns of causal relationships) can participate in reflexive reasoning. That is, while the greater body of knowledge in LTM is unable to participate in any given query, all inferentially / structurally close knowledge does participate. In the above example, this plays out where the more junior officer (and the agent without metacognitive training) fails to consider the role of the downed helo near ownship with respect to the intent of the approaching air track. However, the more experienced officer (agent with metacognitive training intervention) notices this relationship explicitly because LTM has been structurally changed to bring such relationships into a closer inferential proximity.
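The notion of a limited inferential distance can be illustrated with depth-bounded spreading activation over a relation graph. This is a toy sketch under stated assumptions: the relation names, the graph edges, and the depth bound of 2 are all hypothetical, standing in for learned causal links and the ~500ms reflexive time scale.

```python
from collections import deque

def reachable(graph, source, max_depth):
    """Relations reachable from `source` within `max_depth` inference steps
    (breadth-first spread, cut off at the depth bound)."""
    seen = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if seen[node] == max_depth:
            continue  # activation does not spread past the bound
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen[nxt] = seen[node] + 1
                queue.append(nxt)
    return set(seen)

# Before metacognitive restructuring: the downed-helo relation lies several
# inferential steps from the approaching-track query (illustrative edges).
before = {
    "track_approaching": ["intent_hostile"],
    "intent_hostile": ["attack_ownship"],
    "attack_ownship": ["downed_helo"],
}
# After restructuring: a direct link brings it within reflexive reach.
after = dict(before, track_approaching=["intent_hostile", "downed_helo"])
```

Under a depth bound of 2, `reachable(before, "track_approaching", 2)` excludes the downed-helo relation while `reachable(after, ...)` includes it, mirroring how structural change in LTM brings a relationship into closer inferential proximity.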
In related work, we are uniting reinforcement learning with the Shruti architecture by an appeal to the mathematical formalism underlying reinforcement learning (dynamic programming realized in approximate and incremental algorithms), and to the macro-scale of cortical brain structure, especially to the organization of the sensory and motor projections. At a gross structural level, cortical organization divides into sensory and motor circuits. As we move centripetally in brain structure, the brain exchanges overtly spatial representations for innately temporal organizations encoding situational awareness and intentionality. Also, we find increasing interconnection between the sensory and motor pathways.
Reinforcement learning is concerned with translating an estimate of expected future value (EFV) (or related measures, such as cost/benefit) into reflexive behavior that optimizes EFV over time. In situations that have inherent uncertainty, such as the decision making facing the CO/TAO in an Aegis CIC, both the situation estimate and the generated plans must be novel in response to the novel features of the environment. We have pursued an approach where abstract relations in the world, such as the causal linkage between an intention (to attack) and observed actions (localizing ownship, turning on targeting radar), are explicitly encoded within the reflexive reasoning mechanism of Shruti. Shruti then, in response to observations in its world, and in response to top-down priming concerning its own and others' intentions, reflexively elaborates patterns of activity that are both explanations of the evidence and predictions concerning the world. To integrate this mechanism with reflexive planning in such relational networks, we are working with the notion of value function decompositions (VFD).
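The incremental estimation of expected future value can be illustrated with a minimal temporal-difference (TD(0)) update, one of the standard approximate, incremental realizations of dynamic programming mentioned above. The states, rewards, and transition sequence here are hypothetical, not drawn from the CIC model.

```python
def td0(transitions, episodes=50, gamma=0.9, alpha=0.5):
    """Learn a tabular EFV estimate V by replaying a fixed episode.
    transitions: list of (state, reward, next_state) tuples."""
    V = {}
    for _ in range(episodes):
        for s, r, s2 in transitions:
            V.setdefault(s, 0.0)
            V.setdefault(s2, 0.0)
            # TD(0) update: move V(s) toward the bootstrapped target.
            V[s] += alpha * (r + gamma * V[s2] - V[s])
    return V

# Illustrative two-step episode: reward arrives on the final transition.
V = td0([("far", 0.0, "near"), ("near", 1.0, "done")])
```

With the discount factor of 0.9, the value estimate for the earlier state converges toward the discounted terminal reward, showing how EFV propagates backward along the experienced chain of states.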
A value function decomposition serves as a projection of an agent's intention into actions that seek to achieve desirable world states in service of that intent. We have already defined patterns of interconnection in the sensory circuit that encode causal relations and are concerned with modeling the world, that is, with belief. We extend this by generating, in the motor circuit, patterns of hierarchical relationships among those same relations that encode the value function decomposition. These patterns of interconnection serve to differentiate intention into goals, that is, the desire to make a relation true (or false).
We use the interconnections between the motor and sensory circuits to pass activity into the sensory system (world model), which automatically initiates the same reflexive reasoning process to uncover causal relations that bear on the goal and that can be exploited as plans to achieve the goal. Sub-goals are discovered for every causally relevant relation that bears on any active goal. Once stable patterns of activity are discovered, the elaborated goal (and plan components) can be actualized by removing top-down inhibition, reflecting the decision to act. This model of reflexive planning is adapted, in response to reinforcement, by tuning the weights on the centrifugal projections that are the value function decomposition.
One of the more interesting aspects of this approach is that it makes explicit the functional illusion of goal/sub-goal based planners. Relations, as used to model the world, become goals by being in a state in which they are highly activated in intention (in contrast with being activated in belief). That is, the differentiation of intention via the value function decomposition is such that an intentional state activates relations, as goals, in proportion to, and via, weights which are tuned by reinforcement learning. The process of sub-goaling is merely that of spreading activation (of differentiated intention) along causal chains and the interconnection patterns of the value function decomposition. As the intentional state shifts, or the situation estimate shifts, the system automatically, and reflexively, re-plans.
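Sub-goaling as spreading activation can be sketched as follows. This is an illustrative toy under assumptions, not the Shruti mechanism: the relation names, causal links, decay factor, and activation threshold are all hypothetical, and the decomposition weights that reinforcement learning would tune are reduced here to a single initial activation.

```python
# Hypothetical causal structure: effect -> relations that can bring it about.
CAUSES = {
    "threat_neutralized": ["track_turned_away", "track_engaged"],
    "track_turned_away": ["warning_issued"],
}

def elaborate_goals(intention_weights, threshold=0.3, decay=0.9):
    """Spread differentiated intention along causal chains: every causally
    relevant antecedent of an active goal becomes a sub-goal, with
    activation decaying as it spreads."""
    goals = {r: w for r, w in intention_weights.items() if w >= threshold}
    frontier = list(goals)
    while frontier:
        goal = frontier.pop()
        for sub in CAUSES.get(goal, ()):
            strength = goals[goal] * decay
            if strength >= threshold and sub not in goals:
                goals[sub] = strength
                frontier.append(sub)
    return goals

# An intentional state projected through the decomposition weights:
plan = elaborate_goals({"threat_neutralized": 1.0})
```

The resulting activation pattern is simultaneously the elaborated goal hierarchy and the plan; changing the initial intention weights and re-running the spread corresponds to the automatic, reflexive re-planning described above.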
The central limitation of this model, and, arguably, a limitation to which human decision makers are subject as well, is the limited scope of coherent and reflexive inference. This is the same issue discussed above with respect to the structural organization of LTM over time. The design responds to this issue by providing a reflective, or metacognitive, adaptive process that is concerned with shifting internal attention to elaborate and compare explanations and plans.
Copyright © 2000-2011 Cognitive Technologies, Inc.