Intra-Learning Networks

Dr Michael Reiss, formerly of King's College London

(First draft 22 Dec 2006)

Abstract

The paper Hierarchical Temporal Memory – Concepts, Theory and Terminology describes a system in which each neuron learns to respond to a set of (perhaps disparate) frequently occurring input patterns. I suggest that it is implausible that a biological neuron could perform such a function at all and, even if it could, implausible that it could be made to learn it. In this paper I propose an alternative approach which is biologically plausible in both function and learning.

Why is this paper on the web but not in a refereed journal?

Introduction

Just after finishing my PhD at King's College London in 1992, I started experiments on a new network, which I shall call the Intra-Learning Network (ILN). However, before I had time to fully develop it and write it up, I left academia to pursue “computer go”. I stopped following developments in neural networks, partly because it appeared that the entire field was in a bit of a cul-de-sac at the time. Then, just by chance, in late 2006 I came across a video of a lecture by Jeff Hawkins and a related paper, Hierarchical Temporal Memory – Concepts, Theory and Terminology, which made me realise that neural networks had moved on and that my theory could perhaps complement the latest ideas in neural nets.

I liked the overall thinking behind the Hierarchical Temporal Memory (HTM) paper, but when I read in more detail about what the individual "neurons" were doing in the system, I suspected that the required functionality could not be implemented biologically.

HTM architecture, and the operation of its neurons

The HTM system comprises layers of neurons. The first layer is fed inputs from the outside world, be they visual, auditory or from any other sense. Each subsequent layer consists of nodes, each of which is fed a local subset of inputs from the preceding layer. Say the input space of a node is a sample receptive field of 100 inputs from a larger input array carrying some outside-world stimulus, let's say a visual one. What you would expect to see is some temporal consistency over short periods as the “eyes” scan around or as objects come and go in the field of vision. Undoubtedly the patterns that occur in our sample receptive field will be extremely varied, but some combinations of inputs will occur more often than others. A node in the Hierarchical Temporal Memory paper is asked to learn to respond to a set (of size n) of the most frequently occurring sample patterns. The method described in the paper for achieving this makes no attempt to be biologically realistic; instead, the node is simply allowed to have a database and a lookup mechanism.

After reading this section I realised that a system I devised many years ago could fit in right here. In my system, instead of having a single node learn to respond to n frequently occurring patterns, I would have n nodes, each of which would learn to respond to a different one of those n patterns. Note that this doesn’t amount to exactly the same thing, but it can be made to do so by the very simple (and easily biologically implementable) process of having an OR operation act on my n nodes, combining them into a single output. However, I believe the HTM network may work perfectly well with n nodes giving separate outputs.
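The difference between n separate outputs and one OR-ed output can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that each node is an exact-match detector for one stored binary pattern (a real threshold unit would be more tolerant); the names pattern_detector, or_combine and the three example patterns are hypothetical.

```python
from typing import Callable, List, Sequence


def pattern_detector(pattern: Sequence[int]) -> Callable[[Sequence[int]], int]:
    """Return a node that outputs 1 only when its input equals `pattern`.

    Exact matching is an illustrative assumption, not part of the ILN proposal.
    """
    stored = tuple(pattern)

    def node(inputs: Sequence[int]) -> int:
        return 1 if tuple(inputs) == stored else 0

    return node


def or_combine(outputs: Sequence[int]) -> int:
    """Combine the outputs of n nodes into a single output with a logical OR."""
    return 1 if any(outputs) else 0


# Three hypothetical frequently occurring patterns on a 4-input receptive field.
frequent_patterns = [(1, 0, 0, 1), (0, 1, 1, 0), (1, 1, 0, 0)]
detectors = [pattern_detector(p) for p in frequent_patterns]


def separate_outputs(inputs: Sequence[int]) -> List[int]:
    """n nodes, n outputs: says which pattern occurred."""
    return [d(inputs) for d in detectors]


def single_output(inputs: Sequence[int]) -> int:
    """n nodes OR-ed together: says only that one of the patterns occurred."""
    return or_combine(separate_outputs(inputs))
```

The separate outputs carry strictly more information than the OR-ed output, which is one reason a later layer might work just as well, or better, with them.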

How does this Intra-Learning Network (ILN) work?

The neurons in an ILN are of the simple threshold-and-fire variety, with connections between them of variable strength. The neurons are arranged in layers. The inputs for a neuron in an ILN come from randomly selected local neurons in all directions, i.e. earlier layers, later layers and the same layer. The key feature of an ILN is that each neuron in the network uses another nearby neuron (which could be in any direction: earlier, later or in the same layer) as a "target" neuron that it will attempt to emulate. It learns to copy its target neuron using a simple delta rule, with one extra modification: every time a neuron fires at the same time as its target, a signal is sent to nearby neurons suppressing their learning around that time. This is to prevent local neurons all learning to respond to the same things. The thresholds of the neurons should be initialised so that, in general, only a small fraction of the nodes in the entire system fire at any one time: perhaps <1%, perhaps <<1%.
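One learning step of this kind can be sketched in Python as follows. This is an illustration under stated assumptions, not the code used in my original experiments: the units are binary, the threshold is fixed, suppression lasts for the current step only, and a suppressed neuron has its learning rate multiplied by a factor (0.0 here, i.e. switched off). The names iln_step, neighbours and suppression are hypothetical.

```python
from typing import List, Sequence


def fire(weights: Sequence[float], inputs: Sequence[float], threshold: float) -> int:
    """Simple threshold-and-fire unit: 1 if the weighted sum reaches the threshold."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= threshold else 0


def iln_step(
    weights: List[List[float]],
    inputs: Sequence[Sequence[float]],
    targets: Sequence[int],
    neighbours: Sequence[Sequence[int]],
    threshold: float = 0.5,
    rate: float = 0.1,
    suppression: float = 0.0,
) -> List[int]:
    """Apply one delta-rule update to every neuron, with local suppression.

    weights[i]    -- input weights of neuron i (updated in place)
    inputs[i]     -- activities currently arriving at neuron i
    targets[i]    -- current activity (0 or 1) of neuron i's target neuron
    neighbours[i] -- indices of the neurons considered "nearby" to neuron i
    """
    outputs = [fire(w, x, threshold) for w, x in zip(weights, inputs)]

    # A "hit" is a neuron firing at the same time as its target.
    hits = [o == 1 and t == 1 for o, t in zip(outputs, targets)]

    for i, (w, x) in enumerate(zip(weights, inputs)):
        # Learning is suppressed if any nearby neuron scored a hit this step.
        suppressed = any(hits[j] for j in neighbours[i])
        effective_rate = rate * (suppression if suppressed else 1.0)
        error = targets[i] - outputs[i]
        for k in range(len(w)):
            w[k] += effective_rate * error * x[k]  # delta rule
    return outputs
```

For example, if neuron 0 already fires with the shared target and neuron 1 does not, neuron 1's weights stay unchanged when neuron 0 is listed as its neighbour, and move towards the pattern when it is not.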

I have run some very limited simulations of early versions of the system on synthetic training data. These simulations were not thorough enough to prove conclusively that the system works, but they were encouraging, and have led me to believe that the system would indeed behave in much the same way as an HTM network. That is to say, the earlier layers would become populated with neurons that detect low-level, frequently occurring patterns in the input space, while later layers would become detectors of increasingly high-level features.

Temporal features of HTM/ILN

Another aspect of the neurons described in the HTM paper is that they learn to respond to sequences of patterns. That is to say, if pattern A is frequently followed by pattern B, then the neuron records this in its database and so learns to respond to the sequence A-B. Again, this is very difficult to imagine being implemented by a single biological neuron. However, in section 3.3 of my PhD I describe "crumbling leaky integrators" (CLIs), a computationally useful way of representing a temporal history of events within a network. If CLIs were added to the ILN and the outputs of the CLI nodes used as additional inputs to the ILN nodes, this would result in similar (and in my opinion improved) functionality to the temporal learning aspects of the HTM system. There are many ways in which CLI nodes could be added to an ILN. The simplest to implement would be to insert a group of CLI nodes for every node in the entire system. This way the normal ILN learning nodes would have amongst their inputs not only information about the current activity of other local nodes, but also the history of their activity.
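The idea of feeding a history of activity back in as extra inputs can be illustrated with a small Python sketch. The definition of a crumbling leaky integrator is not reproduced in this paper, so the sketch substitutes an ordinary leaky integrator (trace = decay * trace + activity) as a stand-in; it is not the CLI itself. The names update_traces and run, and the decay value 0.5, are hypothetical.

```python
from typing import List, Sequence


def update_traces(
    traces: Sequence[Sequence[float]],
    activities: Sequence[float],
    decays: Sequence[float],
) -> List[List[float]]:
    """Advance every node's group of traces by one time step.

    traces[i][k] is the k-th trace attached to node i; each trace decays at
    its own rate and is topped up by the node's current activity.
    NOTE: plain leaky integrators, used here as a stand-in for CLIs.
    """
    return [
        [d * t + a for t, d in zip(node_traces, decays)]
        for node_traces, a in zip(traces, activities)
    ]


def run(sequence: Sequence[Sequence[float]], decays: Sequence[float]) -> List[float]:
    """Present a sequence of activity patterns and return the final input vector:
    the current activities followed by every node's history traces."""
    n_nodes = len(sequence[0])
    traces: List[List[float]] = [[0.0] * len(decays) for _ in range(n_nodes)]
    for activities in sequence:
        traces = update_traces(traces, activities, decays)
    history = [t for node_traces in traces for t in node_traces]
    return list(sequence[-1]) + history


# Two nodes, A and B. The same two patterns in opposite order give
# different input vectors, so a learning node can tell A-B from B-A.
a_then_b = run([(1, 0), (0, 1)], decays=(0.5,))
b_then_a = run([(0, 1), (1, 0)], decays=(0.5,))
```

Using several decay rates per node gives a coarse picture of how long ago each node was last active, at the cost of more inputs per learning node.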

A note on dimensionality reduction

I have seen many papers on computer vision in which the stated goal of the proposed system is to reduce the dimensionality of the input space from thousands of raw pixels down to a handful of nodes which represent named objects. Whilst I may agree with the idea of the overall dimensionality being lower in the final layer than in the input layer, I do not think that the dimensionality should be reduced from every layer to the next. I would suggest that it is probably beneficial to greatly expand the dimensionality in the intermediate layers. For example, it may be of benefit in the first layer of neurons to have multiple nodes for each pixel, each of which would act as a detector of a different local feature.

Simulations

The complete system as described in this paper has yet to be fully simulated. However, I can report on a couple of simulations run around 1993/4 that give a flavour of the ideas expressed here.

Simulation 1. Spectrogram analysis for Rolls Royce.

This is a very simple simulation that illustrates the principle of self prediction (or intra learning) as a useful image processing tool. Note that this particular version of intra-learning is NOT the "full" system. Simulation 2 is closer to that.

A standard test Rolls Royce performed on their engines was to run them starting from idling speed up to maximum revs and record the sound as a spectrogram. This spectrogram could be considered as a greyscale image, with sound frequency on one axis, engine revs on the other and then the amplitude encoded as the brightness. This image could then be examined for abnormalities.

I was given the task of designing a neural network to assist engineers in detecting abnormalities.

The network I designed was as follows:

Each pixel in the spectrogram image had its own node. The pixel's intensity would act as the target value for its node. Each node took inputs directly (no hidden nodes) from a small annulus (ring) of local pixels. During the training phase, the set of known good spectrograms was repeatedly presented to the network, and each node would adjust its weights so as to more accurately "predict" the intensity of its target pixel at the centre of the annulus.

Testing the network consisted of presenting a novel spectrogram and displaying a new corresponding image in which the intensity of each pixel was given by the error of the corresponding node's prediction of its target. I will call this an "error map".
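The training and testing procedure can be sketched in Python as below. This is a reconstruction for illustration only, not the original code: it assumes linear nodes with a bias term, a plain delta rule, pixels outside the image read as 0.0, and a toy 5 x 5 "spectrogram". The class name AnnulusPredictor, the radii, the learning rate and the number of epochs are all hypothetical.

```python
import math
from typing import List, Tuple

Image = List[List[float]]


def annulus_offsets(r_in: float, r_out: float) -> List[Tuple[int, int]]:
    """Pixel offsets (dy, dx) whose distance from the centre lies in [r_in, r_out]."""
    r = int(math.ceil(r_out))
    return [
        (dy, dx)
        for dy in range(-r, r + 1)
        for dx in range(-r, r + 1)
        if r_in <= math.hypot(dy, dx) <= r_out
    ]


def ring_inputs(img: Image, y: int, x: int, offsets: List[Tuple[int, int]]) -> List[float]:
    """Intensities on the annulus around (y, x); out-of-image pixels read as 0.0."""
    h, w = len(img), len(img[0])
    return [
        img[y + dy][x + dx] if 0 <= y + dy < h and 0 <= x + dx < w else 0.0
        for dy, dx in offsets
    ]


class AnnulusPredictor:
    """One linear node per pixel, predicting that pixel from its surrounding ring."""

    def __init__(self, height: int, width: int, r_in: float = 1.0, r_out: float = 1.5):
        self.offsets = annulus_offsets(r_in, r_out)
        n = len(self.offsets)
        self.weights = [[[0.0] * n for _ in range(width)] for _ in range(height)]
        self.bias = [[0.0] * width for _ in range(height)]

    def predict(self, img: Image, y: int, x: int) -> float:
        xs = ring_inputs(img, y, x, self.offsets)
        return self.bias[y][x] + sum(w * v for w, v in zip(self.weights[y][x], xs))

    def train(self, images: List[Image], epochs: int = 50, rate: float = 0.1) -> None:
        """Repeatedly present the known good images and apply the delta rule."""
        for _ in range(epochs):
            for img in images:
                for y in range(len(img)):
                    for x in range(len(img[0])):
                        xs = ring_inputs(img, y, x, self.offsets)
                        ws = self.weights[y][x]
                        pred = self.bias[y][x] + sum(w * v for w, v in zip(ws, xs))
                        err = img[y][x] - pred  # the pixel itself is the target
                        for k, v in enumerate(xs):
                            ws[k] += rate * err * v
                        self.bias[y][x] += rate * err

    def error_map(self, img: Image) -> Image:
        """Absolute prediction error at every pixel of a novel image."""
        return [
            [abs(img[y][x] - self.predict(img, y, x)) for x in range(len(img[0]))]
            for y in range(len(img))
        ]


# Toy usage: train on a uniform "good" image, then test on one with a bright spot.
good = [[0.5] * 5 for _ in range(5)]
net = AnnulusPredictor(5, 5)
net.train([good])
odd = [row[:] for row in good]
odd[2][2] = 1.0
emap = net.error_map(odd)
```

In this toy case the error map is largest at the altered pixel, because that pixel's node sees an unchanged ring and so still predicts the normal intensity.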

Results

No objective measure of the performance of the network was ever devised, but subjectively I can say that the error map did appear to correspond very neatly with the regions where there were abnormalities in the spectrogram, and the Rolls Royce engineer I was liaising with was very happy with the results.

Simulation 2. The automatic creation of orientated line detectors.

During my PhD I heard about neurons in the visual system that are sensitive to edges in an image at a particular orientation (sorry, no refs). I also heard that they are arranged in patterns, with nearby nodes being sensitive to similar orientations: if the "favoured" orientation of each node were marked with a small line at the appropriate angle, you would see smoothly changing patterns of orientations. I also knew that this arrangement of orientational preferences is a learned phenomenon, and that animals (cats?) brought up in an artificial environment with only one orientation of lines do not develop their orientated line detectors in the same way.

I set out to model the learning of the orientated line detectors.

First of all I thought that *unorientated* edge detection would be available to the visual system at some point and I figured that this would be the "input" layer to the system for learning orientated lines.

The network architecture: The network was simply two layers, both two-dimensional sheets of neurons and both with the same number of nodes. The input layer was simply the unorientated edge detection. Each node in the output layer was connected to a local annulus of input nodes, with the centre of the annulus being the output node's corresponding input node.

The learning: The learning process consisted of presenting random lines on the input layer. The lines could be at any orientation and any offset. At each presentation the task of each output node was to predict the activity of its corresponding input node (intra-learning). It would learn by adjusting the weights of its input connections at a certain "learnrate". A vital part of the system was that the learnrate was modulated by other local output nodes, i.e. nodes that successfully predicted an input would locally suppress the learnrate of their neighbours. This was a kind of competition process to prevent the output nodes all learning the same thing.
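Two ingredients of this procedure, the random line stimulus and the local modulation of the learnrate, can be sketched in Python. These are illustrative guesses rather than the original code: the line is one pixel wide on a binary grid, "nearby" means within a square neighbourhood of the given radius, a suppressed node has its learnrate multiplied by a factor (0.0 here), and which nodes count as having "successfully predicted" is left to the caller. The names random_line and modulated_rates are hypothetical.

```python
import math
import random
from typing import List


def random_line(size: int, rng: random.Random) -> List[List[float]]:
    """Binary size x size image containing one line at a random orientation and offset."""
    theta = rng.uniform(0.0, math.pi)        # orientation of the line
    c = (size - 1) / 2.0                     # centre of the sheet
    offset = rng.uniform(-c, c)              # signed distance of the line from the centre
    nx, ny = -math.sin(theta), math.cos(theta)  # unit normal to the line
    return [
        [
            # A pixel is active if its centre lies within half a pixel of the line.
            1.0 if abs((x - c) * nx + (y - c) * ny - offset) <= 0.5 else 0.0
            for x in range(size)
        ]
        for y in range(size)
    ]


def modulated_rates(
    success: List[List[bool]],
    base_rate: float,
    radius: int = 1,
    factor: float = 0.0,
) -> List[List[float]]:
    """Per-node learnrates for one presentation.

    success[y][x] is True where the output node predicted its input well.
    Every other node within `radius` of a successful node has its learnrate
    scaled by `factor`; the successful node's own rate is not changed by its
    own success (two adjacent successful nodes do suppress each other).
    """
    h, w = len(success), len(success[0])
    rates = [[base_rate] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if not success[y][x]:
                continue
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy, xx = y + dy, x + dx
                    if (dy or dx) and 0 <= yy < h and 0 <= xx < w:
                        rates[yy][xx] = base_rate * factor
    return rates
```

The per-node rates returned by modulated_rates would then take the place of a single global learnrate in each output node's weight update for that presentation.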

The results

The result of this learning process was that the output nodes did indeed become sensitive to preferred orientations of line. Not only that, but the arrangement of preferred orientations made similar looking patterns to those seen in nature.

Conclusions

If this theory were proven correct, it would suggest that the most common task being performed by neurons in real brains is to continuously attempt to learn to predict the activities of other neurons, but based on different inputs.

Finally...

Please email me mick@reiss.demon.co.uk with any comments and suggestions.