How can an agent decide for itself to perform a sequence of actions to achieve a specific outcome?

There is no single agreed-upon definition of intelligence. A working definition that I like to use is the ability to adapt to an unknown environment. If the environment is fully known in advance, an agent can perform equally well, or better, by simply following hard-coded rules instead.

We can program an agent to adapt to a known environment, but how do we specify its behaviour for environments that are unknown to us? The whole reason for using an artificially intelligent agent rather than a conventional computer program is that its instructions do not need to be so explicitly defined. And as with humans, the more autonomy an agent has, the more work we can delegate to it. Autonomy means that an agent decides for itself, without explicit instruction from us.

An autonomous agent embodied within an environment is part of a sensori-motor loop: it senses its environment, this causes some internal change, and that change leads to an action which in turn alters the environment it is sensing. I use the word environment in its loosest sense. The key requirements are that there are some variables that the agent can both sense and change, and that these variables operate under their own rules.
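Here is a minimal sketch of such a loop in Python; the `Environment` and `Agent` classes are illustrative stand-ins, not part of any real framework:

```python
# A minimal sensori-motor loop: the agent senses, updates internal
# state, and acts; the action feeds back into the environment.
# Both classes are illustrative placeholders, not a real framework.

class Environment:
    def __init__(self):
        self.state = 0.0  # a variable the agent can both sense and change

    def sense(self):
        return self.state

    def apply(self, action):
        # the environment also runs under its own rules (slow decay here)
        self.state = 0.9 * self.state + action

class Agent:
    def __init__(self):
        self.internal = 0.0

    def step(self, stimulus):
        # sensing causes some internal change...
        self.internal += 0.1 * (stimulus - self.internal)
        # ...which leads to an action
        return -0.5 * self.internal

env, agent = Environment(), Agent()
for _ in range(100):
    stimulus = env.sense()
    action = agent.step(stimulus)
    env.apply(action)
```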

But how can an agent learn a sequence of actions? Pavlovian conditioning seemed the best place to start. For example, dogs can become conditioned to salivate in anticipation of receiving food when they hear a bell. I wondered if a sensory stimulus could become attractive in its own right because of the subsequent stimuli it can lead to, given an appropriate action. Money is the perfect example: it's attractive because of what it can be exchanged for, yet its utility ultimately comes from the satisfaction of intrinsic needs, such as the need to eat, drink, maintain homoeostasis and so on. But even though this could potentially provide the means, I still needed to understand why an agent would learn a sequence of actions.
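The intuition can be sketched with a one-step value-propagation rule in the spirit of temporal-difference learning; this toy is just one way to formalise the idea, not a description of my system:

```python
# Toy illustration: a neutral stimulus (the bell) acquires value
# because it reliably precedes an intrinsically rewarding one (food).
# Simple one-step value propagation, in the spirit of TD learning.

values = {"bell": 0.0, "food": 1.0}  # food is intrinsically rewarding
alpha = 0.1                          # learning rate

for _ in range(50):
    # each trial: the bell is followed by food
    values["bell"] += alpha * (values["food"] - values["bell"])

print(values["bell"])  # approaches 1.0: the bell is now attractive itself
```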

The functionality of emotions

I was lucky enough to be accepted to study this research question for a PhD at Stirling University. This gave me the opportunity to develop a framework and many of the tools that I use today. I created a test framework and used artificial evolution to configure the neural networks for me. I found that the only way a network would adapt as part of a sensori-motor loop was if it acted as a minimal-disturbance system. Input signals would act as a form of disturbance: the stronger the signal, the more it disturbed the self-organising system. Weaker signals disturbed the system less, so it naturally settled on outputs that reduced its input strength.
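In miniature, the principle looks like this; the environment function below is a made-up stand-in, not one of my evolved networks:

```python
import random

# Minimal-disturbance behaviour in miniature: the output is perturbed
# in proportion to the input strength, so the system keeps changing
# until it finds an output that quietens its input, then settles.

def environment(output):
    # input strength is lowest when the output is near the target 0.7
    return abs(output - 0.7)

output = 0.0
for _ in range(1000):
    stimulus = environment(output)
    # stronger input -> larger random disturbance of the output
    output += random.uniform(-1, 1) * stimulus

print(output)  # settles near 0.7, where the input is weakest
```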

I gained an understanding of how self-organising systems can be used to create intelligent agents that adapt autonomously to unknown environments without explicit instructions. Emotions allow an embodied agent to arbitrate between competing needs when it only has one body. For example, an animal that is equally hungry and thirsty will be more successful focusing on drinking if it is near some water. An intelligent agent must also be able to decide when to continue exploiting a learned behaviour and when it would be better off exploring new opportunities. Neuromodulators can be used to bias a neural network towards either exploration or exploitation. And if an agent behaves in a way that we find either desirable or undesirable, we can signal this through neuromodulators to influence its behaviour.
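One simple way to picture the exploration/exploitation bias is a softmax action selector whose temperature is set by a modulator level; the action values here are invented for illustration:

```python
import math, random

# Sketch of neuromodulation as a bias between exploring and exploiting:
# a single modulator level acts like a softmax temperature over actions.

def choose(action_values, modulator):
    # high modulator -> flatter distribution -> more exploration
    weights = [math.exp(v / max(modulator, 1e-6)) for v in action_values]
    total = sum(weights)
    r = random.uniform(0, total)
    for action, w in enumerate(weights):
        r -= w
        if r <= 0:
            return action
    return len(weights) - 1

values = [0.2, 0.9, 0.5]
print(choose(values, modulator=0.05))  # almost always action 1 (exploit)
print(choose(values, modulator=5.0))   # near-uniform choice (explore)
```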

Tools and framework

Any adaptive system has a set of parameters that need to be configured. While it's technically possible to choose these values manually, it is time-consuming and unlikely to yield solutions good enough to allow reliable comparisons. I did not know how to create the system I wanted, so I used artificial evolution, which is known to be good at finding novel solutions when the problem is not well understood.

I started off using genetic algorithms, but they did not provide any indication of how many times a potential solution had been tested. I was using a stochastic mapping between genotype and phenotype, so I had no way of knowing whether the performance of a genotype was due to its fitness or down to luck. But even with a deterministic mapping, a solution needs to be tested across multiple environments and starting conditions to reliably ascertain its fitness. So I developed my own evolutionary algorithm, inspired by simulated annealing, that keeps track of how often a solution has been tested and how far a run has progressed. The run starts with the entirety of the genotype being mutated, with progressively less of it mutated as the run progresses. By the end of the run only single parameters are being mutated.
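A stripped-down sketch of the annealing schedule follows; the toy fitness function and the simple accept/reject step are illustrative simplifications, not my exact algorithm:

```python
import random

# Stripped-down sketch of the annealed mutation schedule: early in the
# run most of the genotype is mutated, by the end only one parameter.
# Fitness is re-evaluated and averaged to separate luck from merit.

def fitness(genotype):
    # noisy evaluation: the same genotype scores differently each time
    return -sum(g * g for g in genotype) + random.gauss(0, 0.1)

def evaluate(genotype, trials=5):
    # average over several tests to estimate true fitness
    return sum(fitness(genotype) for _ in range(trials)) / trials

n_params, generations = 10, 200
best = [random.uniform(-1, 1) for _ in range(n_params)]
best_score = evaluate(best)

for gen in range(generations):
    progress = gen / generations
    # anneal: mutate the whole genotype at first, one parameter at the end
    n_mutations = max(1, round(n_params * (1 - progress)))
    child = best[:]
    for i in random.sample(range(n_params), n_mutations):
        child[i] += random.gauss(0, 0.3)
    score = evaluate(child)
    if score > best_score:
        best, best_score = child, score
```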

Another problem was how to encode the parameters of an adaptive system as a genotype. Genetic algorithms typically use a flat, one-dimensional array, and mapping a single dimension onto a hierarchical phenotype can be a complex process. I decided to use hierarchical genotypes so I could better match the structure of the phenotype to the genotype.
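For illustration, a hierarchical genotype can be as simple as a nested structure that mutation walks recursively; the field names here are invented:

```python
import random

# Sketch of a hierarchical genotype: nested containers whose shape
# mirrors the phenotype (a network of layers, each with its own
# parameters), so mutation can respect the structure.

genotype = {
    "learning_rate": 0.05,
    "layers": [
        {"size": 8, "weights": [random.uniform(-1, 1) for _ in range(8)]},
        {"size": 4, "weights": [random.uniform(-1, 1) for _ in range(4)]},
    ],
}

def mutate(node, rate=0.1):
    # walk the hierarchy, perturbing float leaves in place;
    # structural integers such as layer sizes are left untouched
    items = node.items() if isinstance(node, dict) else enumerate(node)
    for key, value in items:
        if isinstance(value, (dict, list)):
            mutate(value, rate)
        elif isinstance(value, float) and random.random() < rate:
            node[key] += random.gauss(0, 0.1)

mutate(genotype)
```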

Once I had evolved a system, I had to debug a black box without any idea of how it actually worked. Looking at lists of values over time gave me no sense of how it was behaving internally, so I developed the means to visualise the neural networks. This took significant effort but turned out to be a critical part of the development process.
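My visualiser was custom-built, but the basic idea can be sketched today with off-the-shelf tools such as networkx and matplotlib:

```python
import matplotlib.pyplot as plt
import networkx as nx

# Draw a small network with edge thickness proportional to connection
# strength, so the internal behaviour becomes something you can see.
# The weights below are invented for illustration.

weights = {("in1", "h1"): 0.8, ("in2", "h1"): -0.3,
           ("h1", "out"): 1.2, ("in2", "out"): 0.4}

G = nx.DiGraph()
for (src, dst), w in weights.items():
    G.add_edge(src, dst, weight=w)

pos = nx.spring_layout(G, seed=1)
widths = [3 * abs(G[u][v]["weight"]) for u, v in G.edges()]
nx.draw(G, pos, with_labels=True, width=widths, node_color="lightgrey")
plt.show()
```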

Scaling up

I also wanted to move beyond using a three-layer neural network. Both natural and artificial evolution take a long time and generate complex systems that can be extremely difficult to understand. If I were to rely solely on artificial evolution to create a complete system, I would be limited by the available processing power and still have no greater understanding of how it worked.

My idea was that it is relatively easy to evolve a simple neural network with known properties, and that these could be used as building blocks for a larger system. So I experimented with connecting multiple copies of the same evolved neural network to feed into one another. I found, though, that it was extremely difficult to work with neural networks in this way. I decided instead that I needed an alternative to neural networks with greater explanatory power. So I dropped biological plausibility and opted to create a model that abstracted how self-organising systems settle into stable states.

Dynamical systems

I started this whilst working as a Research Assistant at Stirling University. Development was slow, though: not only was I doing this in my free time, but I often had to wait many weeks or even months for evolutionary runs to finish. Using a network of dynamical systems that modulated each other's parameters, I was able to demonstrate an agent learning to perform short sequences of actions.
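The flavour of the approach can be sketched with two coupled units whose states set each other's parameters; the specific equations below are illustrative, not my actual model:

```python
# Two simple dynamical units whose states modulate each other's
# parameters, rather than just feeding forward activations: each
# system's parameters are themselves dynamical variables.

def step(x, y, dt=0.01):
    decay_x = 1.0 + y        # y modulates how quickly x settles
    decay_y = 1.0 + x * x    # x modulates y's stability
    x += dt * (-decay_x * x + 0.5)
    y += dt * (-decay_y * y + x)
    return x, y

x, y = 0.0, 0.0
for _ in range(5000):
    x, y = step(x, y)
print(x, y)  # the coupled pair settles into a stable joint state
```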

I've since gained a better understanding of the capabilities of self-organising systems. For example, you can take a self-organising system evolved for one task and apply it to a completely different task, and it will still adapt. This is because they are black boxes: nothing inside them is explicitly coupled to their usage or to the external environment.

Simplifying and applying

I have reimplemented the architecture for use on microcontrollers such as Arduinos so that I can use it for mobile robotics. I plan to test whether a robot can learn to stand up by itself, without training or any internal logic specific to the task. If it can do this, I'll try adding more signals to see if it can walk, and then eventually roller skate. After that I intend to reimplement my adaptive system yet again, this time for NVIDIA CUDA.