Working Memory

Working Memory

So if you haven’t come across the concept of working memory, you really haven’t been paying attention. Understanding Working Memory provides the panacea for all your teaching ills:

  • David Didau (@DavidDidau) has used the concept to discuss problems that might come from “Reading Along“.
  • Greg Ashman (@greg_ashman) champions its use because it underlies “Cognitive Load Theory”.
  • Baddeley’s interpretation of Working Memory provides a theoretical underpinning of Mayer’s “Dual Coding” – you learn more from the spoken word with pictures – which is all over the place at the moment.

It is a straightforward concept – we can only give attention to so much at any one time.

It seems clear to me that theories which aim to understand and allow for what could be a major bottle-neck in the cognitive process have real potential to inform teaching practice. In this post I discuss my interpretation of evidence for and the models of Working Memory. [Note that the evidence discussed here is from psychology. There is starting to be a lot more evidence about memory from neuro-imaging which I begin to discuss here.]

Any discussion of the application of WM to theories of teaching will have to wait for a later post.

Baddeley’s Original Model

Academic papers on Working Memory usually start by referencing Baddeley & Hitch’s 1974 paper; this proposed three elements to working memory:

  • The Visuo-spatial sketch pad (a memory register for objects and position)
  • The Phonological Loop (a memory register for spoken language)
  • and a Central Executive (control)

The working memory (WM) concept grew out of “dual memory” models, the idea that Short Term Memory (STM) and Long term Memory (LTM) were very different things.  Baddeley and Hitch were working on experiments to test the interface between the two. In an autobiographical review from 2012 Alan Baddeley describes how their three element model arose from experiments such as trying to break down students’ memory for spoken numbers with visual tasks:

In one study, participants performed a visually presented grammatical reasoning task while hearing and attempting to recall digit sequences of varying length. Response time increased linearly with concurrent digit load. However, the disruption was far from catastrophic: around 50% for the heaviest load, and perhaps more strikingly, the error rate remained constant at around 5%. Our results therefore suggested a clear involvement of whatever system underpins digit span, but not a crucial one. Performance slows systematically but does not break down. We found broadly similar results in studies investigating both verbal LTM and language comprehension, and on the basis of these, abandoned the assumption that WM comprised a single unitary store, proposing instead the three-component system shown in Figure 1 (Baddeley & Hitch 1974).

In the 1970s “cognitive science” had begun to influence “cognitive psychology”, and cognitive science is very much about using our ideas about computers to help understand the brain, and our ideas about the brain to advance computing (AI). Those of us old enough to remember the days when computing was severely limited by the capabilities of the hardware will recognise the Cog. Sci. ideas that informed this hypothesis. The limitations of the very small memory which held the bits of information your CPU was directly working on could slow down the whole process (even today, with the great speed of separate computer memory, processors still have “on-chip” working memory known as the “scratch pad”). Equally, a similar limitation will be familiar to anyone who makes a habit of doing long multiplication in their head; there’s only so much room for the numbers you are trying to remember so they can be added up later limiting your ability to perform the task. As such the limited Working Memory idea has instinctive appeal; the WM is limited, the LTM is not, but the LTM is not immediately accessible.

That there are two separate memory registers connected to different senses gives rise to all sorts of opportunities for designing learning with the two being used in parallel – dual coding. But does beg the question why stop there? A well cited 90s paper uses PET scans to identify three brain areas which might represent separate Working Memory caches for language, objects and position, leading them to propose a four way model just for visual information:

And in his 2012 review Baddeley himself proposes that something more complex might form the basis for future research:

Who knows, a model such as the one Baddeley speculates about above might even provide a theoretical basis for learning styles!

Evidence for a Limited Working Memory

Experimental studies into Working Memory generally take the form of exercises whereby a task such as correcting grammar or doing sums is used to suppress rehearsal. If you are doing something else you can’t be repeating what you need to remember over and over. Subjects are then tested on how many letters, words, numbers, shapes, spacings they can remember at the same time as doing this second task. The number varies a little with the task and with the model the researcher favours, but 4-7 is typical.

Experiments find working memory capacity to be consistent over time with the same subjects, and reasonably consistent when the mode of the test, language rather than maths, for example, is changed.

Scores in these tasks (dubbed Working Memory Capacity, WMC) have also been found to not only correlate with g a measure of “fluid intelligence”, but also correlate with a range of intellectual skills:

“Performance on WM span tasks correlates with a wide range of higher order cognitive tasks, such as reading and listening comprehension, language comprehension, following oral and spatial directions, vocabulary learning from context, note taking in class, writing, reasoning, hypothesis generation, bridge playing, and complex-task learning.” (in-text references removed for clarity)

The consistency of Working Memory Span Task outcomes and the breadth of skills that they correlate with strongly indicates that something important is being measured. That there is a limit to “working memory” seems hard to question. My reading has not led me to lots of evidence that there are two different and separate registers (Visuo and Phonological), but that may be a consequence of my paper selection being skewed by the firewalls that restrict access to research.

Other Models

A prolific author of papers on Working Memory and acknowledged by Baddeley, even though he puts a quite different emphasis on the conclusions that can be drawn from the tasks discussed above, is Nelson Cowan. Cowan views the WM Span tests as measuring a subject’s ability to maintain focused attention in the face of distractions. He models the Working Memory as portions of the Long Term Memory that have been activated by that focus upon them, but my reading of a little of his work is that he is less concerned with the mechanisms of memory than with “focus of attention” as a generalised capability:

“A great deal of recent research has converged on the importance of the control of attention in carrying out the standard type of WM task involving separate storage and processing components….. (1) These WM tasks correlate highly with aptitudes even when the domain of the processing task (e.g., arithmetic or spatial manipulation) does not match the domain of the aptitude test (e.g., reading). That is to be expected if the correlations are due to the involvement of processes of attention that cut across content domains. (2) An alternative account of the correlations based entirely on knowledge can be ruled out. Although acquired knowledge is extremely important for both WM tasks and aptitude tasks, correlations between WM tasks and aptitude tasks remain even when the role of knowledge is measured and controlled for. (3) On tasks involving memory retrieval, dividing attention impairs performance in individuals with high WM spans but has little effect on individuals with low WM spans” (in-text references removed for clarity)

 Cowan’s model is at the lower end for the number of items that can be stored in the working memory – four, but as Baddeley says “Importantly, however, this is four chunks or episodes, each of which may contain more than a single item“.

The nice thing about Cowan’s model is that it addresses the link to Long Term Memory, which as teachers is the thing that we hope to impress new ideas upon. Fledmann Barette et al who follow this “control of focus of attention” model for differences in WMC between individuals state:

WM is likely related to the ability to incorporate new or inconsistent information into a pre-existing representation of an object

i.e. WM is likely central to learning.

 What Cowan’s model does not include is any suggestion of different registers for different kinds of information. This may, however, be because his work is about focus of attention, not memory structure.

Related to Cowan’s model and also acknowledged by Baddeley in his 2012 review is Oberauer’s work which says:

“1. The activated part of long-term memory can serve, among other things, to memorize information over brief periods for later recall.

  1. The region of direct access holds a limited number of chunks available to be used in ongoing cognitive processes.
  2. The focus of attention holds at any time the one chunk that is actually selected as the object of the next cognitive operation.”

Which is nicely explained in his figure:

A 2014 paper, suggests current thinking is that a better fit for experimental data might be WM as a limited resource that can be spread thinly (with more recall errors) or be more tightly focussed, rather than as register with slots for somewhere between 4 and 7 things. I suspect that if this turned out to be the case it would not much bother Oberauer because it may be no more than the oval in his diagram being drawn tighter or more widely spread.

Baddeley’s Revised Model

The experimental evidence for a Working Memory Capacity discussed above caused Baddeley to revise his original model. Again from his 2012 review…

Such results were gratifying in demonstrating the practical significance of WM, but embarrassing for a model that had no potential for storage other than the limited capacities of the visuo-spatial and phonological subsystems. In response to these and related issues, I decided to add a fourth component, the episodic buffer

 “The characteristics of the new system are indicated by its name; it is episodic in that it is assumed to hold integrated episodes or chunks in a multidimensional code. In doing so, it acts as a buffer store, not only between the components of WM, but also linking WM to perception and LTM. It is able to do this because it can hold multidimensional representations, but like most buffer stores it has a limited capacity. On this point we agree with Cowan in assuming a capacity in the region of four chunks.

This change brings Baddeley’s model much closer to Cowan’s. Both have an unspecialised, four slot buffer or register, each slot (according to Baddeley, in both models) capable of holding more complex ideas than just a word or number, or as Baddeley puts it “multidimensional representations”. Baddeley regards this as a separate entity to the long term memory, and individual differences presumably follow from differences in the architecture of this entity, where Cowan sees it as an activated portion of the LTM and individual differences stem from differences in the ability to maintain focus on this activated area despite cognitive distractions.

Final Thoughts

The evidence for working memory being restricted is convincing. I have to say that my reading has me less convinced that working memory encompasses “multidimensional representations”, because all the evidence that I’ve seen has been for the retention of words, shapes and numbers.

I am more drawn to Cowan’s and in particular Oberauer’s model and like the idea that the differences these memory span tests show up are differences in focus rather than memory architecture. I don’t think I would be misrepresenting Conway (lead author on a couple of papers referenced here and a co-author with Cowan) in saying that he regards an ability to focus on what is important, in the face of requests to your internal executive for cognitive resources to be diverted elsewhere, is far more likely to correlate with all the intellectual skills that WMC has been shown to correlate with, than having one more space to remember things in.

Cowan’s model and Baddeley’s revised model downplay the structural differences in different forms of short term memory linked to different senses, in favour of a more generalised but limited tool that has direct links to Long Term Memory.