How can an agent decide for itself to perform a sequence of actions to achieve a specific outcome?

There is no single agreed-upon definition of intelligence. A working definition that I like to use is the ability to adapt to an unknown environment. If the environment is fully known in advance, an agent can perform equally well, or better, by simply following hard-coded rules instead.

We can program an agent to adapt to a known environment, but how do we specify its behaviour for environments that are unknown to us? The whole reason for using an artificially intelligent agent rather than a conventional computer program is that its instructions do not need to be so explicitly defined. And as with humans, the more autonomy an agent has, the more work we can delegate to it. Autonomy means that an agent decides for itself, without explicit instruction from us.

An autonomous agent embodied within an environment is part of a sensori-motor loop: it senses its environment, this causes some internal change, and that change leads to an action which in turn alters the environment it is sensing. I use the word environment in its loosest sense. The key requirements are that there are some variables that the agent can both sense and change, and that these variables operate under their own rules.
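Here is a minimal sketch of such a loop in Python; the `Environment` and `Agent` classes are illustrative stand-ins, not part of any real framework:

```python
# A minimal sensori-motor loop: the agent senses, updates internal
# state, and acts; the action feeds back into the environment.
# Both classes are illustrative placeholders, not a real framework.

class Environment:
    def __init__(self):
        self.state = 0.0  # a variable the agent can both sense and change

    def sense(self):
        return self.state

    def apply(self, action):
        # the environment also runs under its own rules (slow decay here)
        self.state = 0.9 * self.state + action

class Agent:
    def __init__(self):
        self.internal = 0.0

    def step(self, stimulus):
        # sensing causes some internal change...
        self.internal += 0.1 * (stimulus - self.internal)
        # ...which leads to an action
        return -0.5 * self.internal

env, agent = Environment(), Agent()
for _ in range(100):
    stimulus = env.sense()
    action = agent.step(stimulus)
    env.apply(action)
```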

But how can an agent learn a sequence of actions? Pavlovian conditioning seemed the best place to start. For example, dogs can become conditioned to salivate in anticipation of receiving food when they hear a bell. I wondered if a sensory stimulus could become attractive in its own right because of the subsequent stimuli it can lead to, given an appropriate action. Money is the perfect example: it's attractive because of what it can be exchanged for, yet its utility ultimately comes from the satisfaction of intrinsic needs, such as the need to eat, drink, maintain homoeostasis and so on. But even though this could potentially provide the means, I still needed to understand why an agent would learn a sequence of actions.
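The intuition can be sketched with a one-step value-propagation rule in the spirit of temporal-difference learning; this toy is just one way to formalise the idea, not a description of my system:

```python
# Toy illustration: a neutral stimulus (the bell) acquires value
# because it reliably precedes an intrinsically rewarding one (food).
# Simple one-step value propagation, in the spirit of TD learning.

values = {"bell": 0.0, "food": 1.0}  # food is intrinsically rewarding
alpha = 0.1                          # learning rate

for _ in range(50):
    # each trial: the bell is followed by food
    values["bell"] += alpha * (values["food"] - values["bell"])

print(values["bell"])  # approaches 1.0: the bell is now attractive itself
```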

The functionality of emotions

I was lucky enough to be accepted to study this research question for a PhD at Stirling University. This gave me the opportunity to develop a framework and many of the tools that I use today. I created a test framework and used artificial evolution to configure the neural networks for me. I found that the only way a network would adapt as part of a sensori-motor loop was if it acted as a minimal-disturbance system. Input signals would act as a form of disturbance: the stronger the signal, the more it disturbed the self-organising system. Weaker signals disturbed the system less, so it naturally settled on outputs that reduced its input strength.
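In miniature, the principle looks like this; the environment function below is a made-up stand-in, not one of my evolved networks:

```python
import random

# Minimal-disturbance behaviour in miniature: the output is perturbed
# in proportion to the input strength, so the system keeps changing
# until it finds an output that quietens its input, then settles.

def environment(output):
    # input strength is lowest when the output is near the target 0.7
    return abs(output - 0.7)

output = 0.0
for _ in range(1000):
    stimulus = environment(output)
    # stronger input -> larger random disturbance of the output
    output += random.uniform(-1, 1) * stimulus

print(output)  # settles near 0.7, where the input is weakest
```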

I gained an understanding of how self-organising systems can be used to create intelligent agents that adapt autonomously to unknown environments without explicit instructions. Emotions allow an embodied agent to arbitrate between competing needs when it only has one body. For example, an animal that is equally hungry and thirsty will be more successful focusing on drinking if it is near some water. An intelligent agent must also be able to decide when to continue exploiting a learned behaviour and when it would be better off exploring new opportunities. Neuromodulators can be used to bias a neural network towards either exploration or exploitation. And if an agent behaves in a way that we find either desirable or undesirable, we can signal this through neuromodulators to influence its behaviour.
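One simple way to picture the exploration/exploitation bias is a softmax action selector whose temperature is set by a modulator level; the action values here are invented for illustration:

```python
import math, random

# Sketch of neuromodulation as a bias between exploring and exploiting:
# a single modulator level acts like a softmax temperature over actions.

def choose(action_values, modulator):
    # high modulator -> flatter distribution -> more exploration
    weights = [math.exp(v / max(modulator, 1e-6)) for v in action_values]
    total = sum(weights)
    r = random.uniform(0, total)
    for action, w in enumerate(weights):
        r -= w
        if r <= 0:
            return action
    return len(weights) - 1

values = [0.2, 0.9, 0.5]
print(choose(values, modulator=0.05))  # almost always action 1 (exploit)
print(choose(values, modulator=5.0))   # near-uniform choice (explore)
```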

Tools and framework

Any adaptive system has a set of parameters that need to be configured. While it's technically possible to choose these values manually, it is time-consuming and unlikely to yield solutions good enough to allow reliable comparisons. I did not know how to create the system I wanted, so I used artificial evolution, which is known to be good at finding novel solutions when the problem is not well understood.

I started off using genetic algorithms, but they did not provide any indication of how many times a potential solution had been tested. I was using a stochastic mapping between genotype and phenotype, so I had no way of knowing whether the performance of a genotype was due to its fitness or down to luck. But even with a deterministic mapping, a solution needs to be tested across multiple environments and starting conditions to reliably ascertain its fitness. So I developed my own evolutionary algorithm, inspired by simulated annealing, that keeps track of how often a solution has been tested and how far a run has progressed. The run starts with the entirety of the genotype being mutated, with progressively less of it mutated as the run progresses. By the end of the run only single parameters are being mutated.
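A stripped-down sketch of the annealing schedule follows; the toy fitness function and the simple accept/reject step are illustrative simplifications, not my exact algorithm:

```python
import random

# Stripped-down sketch of the annealed mutation schedule: early in the
# run most of the genotype is mutated, by the end only one parameter.
# Fitness is re-evaluated and averaged to separate luck from merit.

def fitness(genotype):
    # noisy evaluation: the same genotype scores differently each time
    return -sum(g * g for g in genotype) + random.gauss(0, 0.1)

def evaluate(genotype, trials=5):
    # average over several tests to estimate true fitness
    return sum(fitness(genotype) for _ in range(trials)) / trials

n_params, generations = 10, 200
best = [random.uniform(-1, 1) for _ in range(n_params)]
best_score = evaluate(best)

for gen in range(generations):
    progress = gen / generations
    # anneal: mutate the whole genotype at first, one parameter at the end
    n_mutations = max(1, round(n_params * (1 - progress)))
    child = best[:]
    for i in random.sample(range(n_params), n_mutations):
        child[i] += random.gauss(0, 0.3)
    score = evaluate(child)
    if score > best_score:
        best, best_score = child, score
```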

Another problem was how to encode the parameters of an adaptive system as a genotype. Genetic algorithms typically use a flat, one-dimensional array, and mapping a single dimension onto a hierarchical phenotype can be a complex process. I decided to use hierarchical genotypes so I could better match the structure of the phenotype to the genotype.
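For illustration, a hierarchical genotype can be as simple as a nested structure that mutation walks recursively; the field names here are invented:

```python
import random

# Sketch of a hierarchical genotype: nested containers whose shape
# mirrors the phenotype (a network of layers, each with its own
# parameters), so mutation can respect the structure.

genotype = {
    "learning_rate": 0.05,
    "layers": [
        {"size": 8, "weights": [random.uniform(-1, 1) for _ in range(8)]},
        {"size": 4, "weights": [random.uniform(-1, 1) for _ in range(4)]},
    ],
}

def mutate(node, rate=0.1):
    # walk the hierarchy, perturbing float leaves in place;
    # structural integers such as layer sizes are left untouched
    items = node.items() if isinstance(node, dict) else enumerate(node)
    for key, value in items:
        if isinstance(value, (dict, list)):
            mutate(value, rate)
        elif isinstance(value, float) and random.random() < rate:
            node[key] += random.gauss(0, 0.1)

mutate(genotype)
```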

Once I had evolved a system, I had to debug a black box without any idea of how it actually worked. Looking at lists of values over time gave me no sense of how it was behaving internally, so I developed the means to visualise the neural networks. This took significant effort but turned out to be a critical part of the development process.
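My visualiser was custom-built, but the basic idea can be sketched today with off-the-shelf tools such as networkx and matplotlib:

```python
import matplotlib.pyplot as plt
import networkx as nx

# Draw a small network with edge thickness proportional to connection
# strength, so the internal behaviour becomes something you can see.
# The weights below are invented for illustration.

weights = {("in1", "h1"): 0.8, ("in2", "h1"): -0.3,
           ("h1", "out"): 1.2, ("in2", "out"): 0.4}

G = nx.DiGraph()
for (src, dst), w in weights.items():
    G.add_edge(src, dst, weight=w)

pos = nx.spring_layout(G, seed=1)
widths = [3 * abs(G[u][v]["weight"]) for u, v in G.edges()]
nx.draw(G, pos, with_labels=True, width=widths, node_color="lightgrey")
plt.show()
```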

Scaling up

I also wanted to move beyond using a three-layer neural network. Both natural and artificial evolution take a long time and generate complex systems that can be extremely difficult to understand. If I were to rely solely on artificial evolution to create a complete system, I would be limited by the available processing power and still have no greater understanding of how it worked.

My idea was that it is relatively easy to evolve a simple neural network with known properties, and that these could be used as building blocks for a larger system. So I experimented with connecting multiple copies of the same evolved neural network to feed into one another. I found, though, that it was extremely difficult to work with neural networks in this way. I decided instead that I needed an alternative to neural networks with greater explanatory power. So I dropped biological plausibility and opted to create a model that abstracted how self-organising systems settle into stable states.

Dynamical systems

I started this whilst working as a Research Assistant at Stirling University. Development was slow, though: not only was I doing this in my free time, but I often had to wait many weeks or even months for evolutionary runs to finish. Using a network of dynamical systems that modulated each other's parameters, I was able to demonstrate an agent learning to perform short sequences of actions.
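The flavour of the approach can be sketched with two coupled units whose states set each other's parameters; the specific equations below are illustrative, not my actual model:

```python
# Two simple dynamical units whose states modulate each other's
# parameters, rather than just feeding forward activations: each
# system's parameters are themselves dynamical variables.

def step(x, y, dt=0.01):
    decay_x = 1.0 + y        # y modulates how quickly x settles
    decay_y = 1.0 + x * x    # x modulates y's stability
    x += dt * (-decay_x * x + 0.5)
    y += dt * (-decay_y * y + x)
    return x, y

x, y = 0.0, 0.0
for _ in range(5000):
    x, y = step(x, y)
print(x, y)  # the coupled pair settles into a stable joint state
```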

I've since gained a better understanding of the capabilities of self-organising systems. For example, you can take a self-organising system evolved for one task and apply it to a completely different task, and it will still adapt. This is because they are black boxes: nothing inside them is explicitly coupled to their usage or to the external environment.

Simplifying and applying

I have reimplemented the architecture for use on microcontrollers such as Arduinos so that I can use it for mobile robotics. I plan to test whether a robot can learn to stand up by itself, without training or any internal logic specific to the task. If it can do this, I'll try adding more signals to see if it can walk, and then eventually roller skate. After that I intend to reimplement my adaptive system yet again, this time for NVIDIA CUDA.