You say Schemas, I say Schemata

Well actually I don’t, I say Schemas. I honestly didn’t realise, until I started researching them, that schemata was the plural of schema!

Anyway, why are they significant? Well, in my post about Working Memory I said Baddeley asserted that amongst the limited number of items being held in the Working Memory, and therefore available for manipulation and change, could be complex, interrelated ideas: schemas.

Allowing more complex memory constructs to be accessed is particularly important for Cognitive Load Theory, because if working memory is really limited to just four simple things, numbers or letters, then Working Memory cannot be central to the learning process in a lesson. What is going on in a lesson is a much richer thing than could be captured in four numbers. The escape clause for this is to allow the items stored in, or accessed by, Working Memory to be schemas (or schemata).

A schema is then a “data structure for representing generic concepts stored in memory” and they “represent knowledge at all levels of abstraction” (Rumelhart: Schemata The Building Blocks of Cognition 1978).

How We Use Schemas

To show what a schema is and how it is used, we can take an example from Anderson (when researching this I was pointed at Anderson by Sue Gerrard), who developed the schema idea in the context of using pre-existing schemas to decode text.

In 1984 Anderson and Pearson gave an example containing British ship building and industrial action, something only those of us from that vintage are likely to appreciate. The key elements that Anderson and Pearson think someone familiar with christening a ship would have built up into a schema are shown in the figure:

They then discuss how that schema would be used to decode a newspaper article:

‘Suppose you read in the newspaper that,

“Queen Elizabeth participated in a long-delayed ceremony in Clydebank, Scotland yesterday. While there is bitterness here following the protracted strike, on this occasion a crowd of shipyard workers numbering in the hundreds joined dignitaries in cheering as the HMS Pinafore slipped into the water.”

It is the generally good fit of most of this information with the SHIP CHRISTENING schema that provides the confidence that (part of) the message has been comprehended. In particular, Queen Elizabeth fits the <celebrity> slot, the fact that Clydebank is a well known ship building port and that shipyard workers are involved is consistent with the <dry dock> slot, the HMS Pinafore is obviously a ship, and the information that it “slipped into the water” is consistent with the <just before launching> slot. Therefore, the ceremony mentioned is probably a ship christening. No mention is made of a bottle of champagne being broken on the ship’s bow, but this “default” inference is easily made.’

They expand by discussing how more ambiguous pieces of text with less clear cut fits to the “SHIP CHRISTENING” schema might be interpreted.

Returning to Rumelhart, we find the same view of schemas as active recognition devices “whose processing is aimed at the evaluation of their goodness of fit to the data being processed.”
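
Since Rumelhart calls a schema a “data structure”, the slot-filling described above can be sketched as one. This is only my toy illustration, not Anderson and Pearson's or Rumelhart's model: the `Slot` and `Schema` classes, the cue words and the simple "fraction of slots matched" score are all assumptions made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Slot:
    name: str                      # e.g. "celebrity"
    cues: set[str]                 # hypothetical words taken as evidence for the slot
    default: Optional[str] = None  # value assumed when the text says nothing


@dataclass
class Schema:
    name: str
    slots: list[Slot] = field(default_factory=list)

    def interpret(self, text: str) -> tuple[float, dict[str, str]]:
        """Return (goodness of fit, slot fillers) for a piece of text."""
        lowered = text.lower()
        fillers: dict[str, str] = {}
        matched = 0
        for slot in self.slots:
            hits = sorted(cue for cue in slot.cues if cue in lowered)
            if hits:
                fillers[slot.name] = hits[0]
                matched += 1
            elif slot.default is not None:
                # Nothing in the text, so fall back on the "default" inference.
                fillers[slot.name] = f"{slot.default} (default inference)"
        # Toy fit score: fraction of slots supported by the text itself.
        return matched / len(self.slots), fillers


ship_christening = Schema(
    "SHIP CHRISTENING",
    [
        Slot("celebrity", {"queen", "president"}),
        Slot("dry dock", {"shipyard", "clydebank"}),
        Slot("ship", {"hms"}),
        Slot("just before launching", {"slipped into the water"}),
        Slot("bottle broken on bow", {"champagne", "bottle"},
             default="champagne bottle"),
    ],
)

article = (
    "Queen Elizabeth participated in a long-delayed ceremony in Clydebank, "
    "Scotland yesterday. A crowd of shipyard workers joined dignitaries in "
    "cheering as the HMS Pinafore slipped into the water."
)

fit, fillers = ship_christening.interpret(article)
print(fit)
print(fillers)
```

Four of the five slots are supported by the article, and the champagne bottle is filled in by default, which is the pattern of "good fit plus default inference" that the quoted passage describes. A real account would need something far richer than keyword matching.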

Developing Schemas

In his review Rumelhart states that:

‘From a logical point of view there are three basically different modes of learning that are possible in a schema-based system.

  1. Having understood some text or perceived some event, we can retrieve stored information about the text or event. Such learning corresponds roughly to “fact learning”. Rumelhart and Norman have called this learning accretion.
  2. Existing schemata may evolve or change to make them more in tune with experience….this corresponds to elaboration or refinement of concepts. Rumelhart and Norman have called this sort of learning tuning.
  3. Creation of new schemata…. There are, in a schema theory, at least two ways in which new concepts can be generated: they can be patterned on existing schemata, or they can in principle be induced from experience. Rumelhart and Norman call learning of new schemata restructuring.’

Rumelhart expands on accretion:

‘thus, as we experience the world we store, as a natural side effect of comprehension, traces that can serve as the basis of future recall. Later, during retrieval, we can use this information to reconstruct an interpretation of the original experience - thereby remembering the experience.’

And tuning:

‘This sort of schema modification amounts to concept generalisation – making a schema more generally applicable.’

He is not keen on “restructuring”, however, because he regards a mechanism to recognise that a new schema is necessary as something beyond schema theory, and regards the induction of a new schema as likely to be a rare event.
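
To keep the three modes apart in my own head, I find it helps to write them as three operations on a toy schema. This is a sketch under my own assumptions, not anything taken from Rumelhart and Norman: `ToySchema`, its slot representation (a set of accepted values per slot) and the method names are all hypothetical, and `restructure` only covers the "patterned on existing schemata" route, not induction from experience.

```python
from __future__ import annotations

import copy
from dataclasses import dataclass, field


@dataclass
class ToySchema:
    name: str
    slots: dict[str, set[str]]  # slot name -> values the slot currently accepts
    traces: list[dict[str, str]] = field(default_factory=list)

    def fits(self, episode: dict[str, str]) -> bool:
        """True if every slot accepts the value the episode offers for it."""
        return all(episode.get(s) in allowed for s, allowed in self.slots.items())

    def accrete(self, episode: dict[str, str]) -> None:
        """Accretion: store a trace of the episode; the schema itself is unchanged."""
        self.traces.append(dict(episode))

    def tune(self, episode: dict[str, str]) -> None:
        """Tuning: widen existing slots so the schema also covers this episode."""
        for slot, value in episode.items():
            if slot in self.slots:
                self.slots[slot].add(value)

    def restructure(self, new_name: str, changes: dict[str, set[str]]) -> ToySchema:
        """Restructuring: create a new schema patterned on this one."""
        new = ToySchema(new_name, copy.deepcopy(self.slots))
        for slot, allowed in changes.items():
            new.slots[slot] = set(allowed)
        return new


christening = ToySchema(
    "SHIP CHRISTENING", {"celebrity": {"queen"}, "object": {"ship"}}
)

# Accretion: remember one particular christening.
christening.accrete({"celebrity": "queen", "object": "ship"})

# Tuning: a mayor launches a ship, so the celebrity slot becomes more general.
mayoral = {"celebrity": "mayor", "object": "ship"}
fitted_before = christening.fits(mayoral)
christening.tune(mayoral)
fitted_after = christening.fits(mayoral)

# Restructuring: a new schema copied from the old one with one slot replaced.
opening = christening.restructure("BRIDGE OPENING", {"object": {"bridge"}})
print(fitted_before, fitted_after, opening.slots)
```

Notice that the hard part Rumelhart worries about, recognising that a new schema is needed at all, is exactly what this sketch leaves out: the call to `restructure` is made by hand.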

A nice illustration (below) of Rumelhart’s tuning – schemas adapting and becoming more general as overlapping episodes are encountered – comes from Ghosh and Gilboa’s paper which was shared by J-P Riodan. Ghosh and Gilboa’s work looks at how the psychological concept of schema as developed in the 70s and 80s might map onto modern neuroimaging studies, as well as giving a succinct historical overview.

The figures within each circle indicate episode strengths on the normal distributions that G & G postulate occur within the encompassing schema and line thicknesses indicate associative strengths.

If accretion, tuning and restructuring are starting to sound a bit Piagetian to you (“assimilation and accommodation” for “tuning and restructuring”), then you would be right. Smith’s Research Starter on Schema Theory (I’ve accessed Research Starters through my College of Teaching Membership) says that Piaget coined the term schema in 1926!

It would be a lovely irony if the idea that allowed Cognitive Load Theory to flourish were attributable to the father of constructivists. However, Derry (see below) states that there is little overlap in the citations of Cognitive Schema theorists, like Anderson and later Sweller, and the Piaget school of schema thought, so Sweller probably owes Piaget for little except the name.

Types of Schema

In her 1996 review of the topic Derry finds three kinds of Schema in the literature:

  • Memory Objects. The simplest memory objects are named p-prims by diSessa (and have been the subject of discussion amongst science teachers, see for example E=mc2andallthat). However, above p-prims are “more integrated memory objects hypothesized by Kintsch and Greeno, Marshall and Sweller and Cooper, and others, these objects are schemas that permit people to recognise and classify patterns in the external world so they can respond with the appropriate mental or physical actions.” (Unfortunately I can’t find any access to the Sweller and Cooper piece.)
  • Mental Models. “Mental model construction involves mapping active memory objects onto components of the real-world phenomenon, then reorganising and connecting those objects so that together they form a model of the whole situation. This reorganizing and connecting process is a form of problem solving. Once constructed, a mental model may be used as a basis for further reasoning and problem solving. If two or more people are required to communicate about a situation, they must each construct a similar mental model of it.” (Derry connects mental models to constructivist teaching and again references DiSessa, “Toward an Epistemology of Physics”, 1993.)
  • Cognitive Fields. “…. a distributed pattern of memory activation that occurs in response to a particular event.” “The cognitive field activated in a learning situation .. determines which previously existing memory objects and object systems can be modified or updated by an instructional experience.”

Now it is not clear to me that these three are different things; they seem to represent only differences in emphasis or abstraction. CLT would surely be interested in Cognitive Fields, but Derry associates Sweller with Memory Objects, while Mental Models seems to be as much about the tuning of the schema as the schema itself.

DiSessa does, however, have a different point of view when it comes to science teaching. Where most science teachers would probably see misconceptions as problems built-in at the higher schema level, DiSessa sees us as possessing a stable number of p-prims whose organisation is at fault, and that by building on the p-prims themselves we can alter their organisation.

Evidence for Schemas

There is some evidence for the existence of schemas. Because interpretation of text is where schema theory really developed, there are experiments which test interpretation or recall of stories or text in a schema context. For example, Bransford and Johnson (1972) conducted experiments that suggest that pre-triggering the correct Cognitive Field before listening to a passage can aid its recall and understanding.

Others have attempted to show that we arrive at answers to problems not via logic but by triggering a schema. One approach to this is via Wason’s Selection Task, which only 10% of people answer correctly (I get it wrong every time). For example, Cheng et al found that providing a context that made sense to people led them to the correct answer, and argue that this is because the test subjects are thus able to invoke a related schema. They term the higher level schemas used for reasoning, but still rooted in their context, “pragmatic reasoning schemas”.

I am not sure that any of the evidence is robust, but schemas/schemata continue to be accepted because the idea that knowledge is organised and linked at more and more abstract levels offers a lot of explanatory power in thinking about what is happening in one’s own head, and what is happening in learning more generally.

Schema theory also fits with the current reaction against the promotion of the teaching of generic skills. If we can only think successfully about something if we have a schema for it, and schemas start off at least as an organisation of memory objects, then thought is ultimately rooted in those memory objects: thought is context specific. I suspect that this may be what Cheng et al’s work is confirming for us.


Working Memory

So if you haven’t come across the concept of working memory, you really haven’t been paying attention. Understanding Working Memory provides the panacea for all your teaching ills:

  • David Didau (@DavidDidau) has used the concept to discuss problems that might come from “Reading Along”.
  • Greg Ashman (@greg_ashman) champions its use because it underlies “Cognitive Load Theory”.
  • Baddeley’s interpretation of Working Memory provides a theoretical underpinning of Mayer’s “Dual Coding” – you learn more from the spoken word with pictures – which is all over the place at the moment.

It is a straightforward concept – we can only give attention to so much at any one time.

It seems clear to me that theories which aim to understand and allow for what could be a major bottle-neck in the cognitive process have real potential to inform teaching practice. In this post I discuss my interpretation of evidence for and the models of Working Memory. [Note that the evidence discussed here is from psychology. There is starting to be a lot more evidence about memory from neuro-imaging which I begin to discuss here.]

Any discussion of the application of WM to theories of teaching will have to wait for a later post.

Baddeley’s Original Model

Academic papers on Working Memory usually start by referencing Baddeley & Hitch’s 1974 paper; this proposed three elements to working memory:

  • The Visuo-spatial sketch pad (a memory register for objects and position)
  • The Phonological Loop (a memory register for spoken language)
  • and a Central Executive (control)

The working memory (WM) concept grew out of “dual memory” models, the idea that Short Term Memory (STM) and Long Term Memory (LTM) were very different things. Baddeley and Hitch were working on experiments to test the interface between the two. In an autobiographical review from 2012 Alan Baddeley describes how their three element model arose from experiments such as trying to break down students’ memory for spoken numbers with visual tasks:

In one study, participants performed a visually presented grammatical reasoning task while hearing and attempting to recall digit sequences of varying length. Response time increased linearly with concurrent digit load. However, the disruption was far from catastrophic: around 50% for the heaviest load, and perhaps more strikingly, the error rate remained constant at around 5%. Our results therefore suggested a clear involvement of whatever system underpins digit span, but not a crucial one. Performance slows systematically but does not break down. We found broadly similar results in studies investigating both verbal LTM and language comprehension, and on the basis of these, abandoned the assumption that WM comprised a single unitary store, proposing instead the three-component system shown in Figure 1 (Baddeley & Hitch 1974).

In the 1970s “cognitive science” had begun to influence “cognitive psychology”, and cognitive science is very much about using our ideas about computers to help understand the brain, and our ideas about the brain to advance computing (AI). Those of us old enough to remember the days when computing was severely limited by the capabilities of the hardware will recognise the Cog. Sci. ideas that informed this hypothesis. The limitations of the very small memory which held the bits of information your CPU was directly working on could slow down the whole process (even today, with the great speed of separate computer memory, processors still have “on-chip” working memory known as the “scratch pad”). Equally, a similar limitation will be familiar to anyone who makes a habit of doing long multiplication in their head; there’s only so much room for the numbers you are trying to remember so they can be added up later, and that limits your ability to perform the task. As such the limited Working Memory idea has instinctive appeal; the WM is limited, the LTM is not, but the LTM is not immediately accessible.

That there are two separate memory registers connected to different senses gives rise to all sorts of opportunities for designing learning with the two being used in parallel (dual coding). But it does raise the question: why stop there? A well cited 90s paper uses PET scans to identify three brain areas which might represent separate Working Memory caches for language, objects and position, leading them to propose a four way model just for visual information:

And in his 2012 review Baddeley himself proposes that something more complex might form the basis for future research:

Who knows, a model such as the one Baddeley speculates about above might even provide a theoretical basis for learning styles!

Evidence for a Limited Working Memory

Experimental studies into Working Memory generally take the form of exercises whereby a task such as correcting grammar or doing sums is used to suppress rehearsal. If you are doing something else you can’t be repeating what you need to remember over and over. Subjects are then tested on how many letters, words, numbers, shapes, spacings they can remember at the same time as doing this second task. The number varies a little with the task and with the model the researcher favours, but 4-7 is typical.

Experiments find working memory capacity to be consistent over time with the same subjects, and reasonably consistent when the mode of the test, language rather than maths, for example, is changed.

Scores in these tasks (dubbed Working Memory Capacity, WMC) have been found not only to correlate with g, a measure of “fluid intelligence”, but also to correlate with a range of intellectual skills:

“Performance on WM span tasks correlates with a wide range of higher order cognitive tasks, such as reading and listening comprehension, language comprehension, following oral and spatial directions, vocabulary learning from context, note taking in class, writing, reasoning, hypothesis generation, bridge playing, and complex-task learning.” (in-text references removed for clarity)

The consistency of Working Memory Span Task outcomes and the breadth of skills that they correlate with strongly indicate that something important is being measured. That there is a limit to “working memory” seems hard to question. My reading has not led me to lots of evidence that there are two different and separate registers (Visuo and Phonological), but that may be a consequence of my paper selection being skewed by the paywalls that restrict access to research.

Other Models

Nelson Cowan is a prolific author of papers on Working Memory and is acknowledged by Baddeley, even though he puts a quite different emphasis on the conclusions that can be drawn from the tasks discussed above. Cowan views the WM Span tests as measuring a subject’s ability to maintain focused attention in the face of distractions. He models the Working Memory as portions of the Long Term Memory that have been activated by that focus upon them, but my reading of a little of his work is that he is less concerned with the mechanisms of memory than with “focus of attention” as a generalised capability:

“A great deal of recent research has converged on the importance of the control of attention in carrying out the standard type of WM task involving separate storage and processing components….. (1) These WM tasks correlate highly with aptitudes even when the domain of the processing task (e.g., arithmetic or spatial manipulation) does not match the domain of the aptitude test (e.g., reading). That is to be expected if the correlations are due to the involvement of processes of attention that cut across content domains. (2) An alternative account of the correlations based entirely on knowledge can be ruled out. Although acquired knowledge is extremely important for both WM tasks and aptitude tasks, correlations between WM tasks and aptitude tasks remain even when the role of knowledge is measured and controlled for. (3) On tasks involving memory retrieval, dividing attention impairs performance in individuals with high WM spans but has little effect on individuals with low WM spans” (in-text references removed for clarity)

Cowan’s model is at the lower end for the number of items that can be stored in the working memory (four), but as Baddeley says “Importantly, however, this is four chunks or episodes, each of which may contain more than a single item”.
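
The difference between four items and four chunks is easy to see with a bit of arithmetic. The snippet below is just an illustration of that point: the `chunk` helper, the digit string and the fixed group size of three are my own assumptions (real chunks would depend on what is meaningful to the person, not on a fixed size).

```python
def chunk(items, size):
    """Group a flat sequence into consecutive chunks of `size` items."""
    return [tuple(items[i:i + size]) for i in range(0, len(items), size)]


CAPACITY = 4  # Cowan's estimate, counted in chunks rather than single items

digits = list("020794618355")   # twelve separate digits
as_chunks = chunk(digits, 3)    # the same digits grouped in threes

fits_unchunked = len(digits) <= CAPACITY    # twelve items against four slots
fits_chunked = len(as_chunks) <= CAPACITY   # four chunks against four slots
print(len(digits), len(as_chunks), fits_unchunked, fits_chunked)
```

Twelve loose digits overflow a four-item limit, but the same digits held as four chunks of three do not, which is all that Baddeley's caveat is saying.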

The nice thing about Cowan’s model is that it addresses the link to Long Term Memory, which as teachers is the thing that we hope to impress new ideas upon. Feldman Barrett et al, who follow this “control of focus of attention” model for differences in WMC between individuals, state:

WM is likely related to the ability to incorporate new or inconsistent information into a pre-existing representation of an object

i.e. WM is likely central to learning.

 What Cowan’s model does not include is any suggestion of different registers for different kinds of information. This may, however, be because his work is about focus of attention, not memory structure.

Related to Cowan’s model and also acknowledged by Baddeley in his 2012 review is Oberauer’s work which says:

  1. “The activated part of long-term memory can serve, among other things, to memorize information over brief periods for later recall.
  2. The region of direct access holds a limited number of chunks available to be used in ongoing cognitive processes.
  3. The focus of attention holds at any time the one chunk that is actually selected as the object of the next cognitive operation.”

Which is nicely explained in his figure:

A 2014 paper suggests current thinking is that a better fit for experimental data might be WM as a limited resource that can be spread thinly (with more recall errors) or be more tightly focussed, rather than as a register with slots for somewhere between 4 and 7 things. I suspect that if this turned out to be the case it would not much bother Oberauer because it may be no more than the oval in his diagram being drawn tighter or more widely spread.
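
Going back to Oberauer's three levels, they nest neatly enough to write down as a toy data structure. This is my own sketch, not Oberauer's formal model: the class name, the capacity of four for the region of direct access and the rule that the oldest chunk is displaced first are all assumptions made for illustration (the quote says only "a limited number of chunks").

```python
class OberauerSketch:
    """Toy nesting: activated LTM > region of direct access > focus of attention."""

    def __init__(self, long_term_memory, access_capacity=4):
        self.ltm = set(long_term_memory)
        self.activated = set()      # activated part of long-term memory
        self.direct_access = []     # limited number of chunks, oldest first
        self.focus = None           # the one chunk selected for the next operation
        self.access_capacity = access_capacity  # assumed value, not from the source

    def activate(self, chunk):
        if chunk not in self.ltm:
            raise KeyError(f"{chunk!r} is not in long-term memory")
        self.activated.add(chunk)

    def bring_into_access(self, chunk):
        self.activate(chunk)
        if chunk in self.direct_access:
            self.direct_access.remove(chunk)
        self.direct_access.append(chunk)
        if len(self.direct_access) > self.access_capacity:
            # Assumed rule: the oldest chunk drops out but stays activated.
            dropped = self.direct_access.pop(0)
            if self.focus == dropped:
                self.focus = None

    def attend(self, chunk):
        self.bring_into_access(chunk)
        self.focus = chunk


wm = OberauerSketch(long_term_memory="abcdef")
for item in "abcde":
    wm.bring_into_access(item)
wm.attend("c")

print(sorted(wm.activated), wm.direct_access, wm.focus)
```

After five chunks have been brought in, "a" has dropped out of direct access but is still activated, and only "c" is in the focus of attention. A resource-style model of the kind the 2014 paper suggests would presumably replace the hard `access_capacity` limit with something graded, which this sketch does not attempt.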

Baddeley’s Revised Model

The experimental evidence for a Working Memory Capacity discussed above caused Baddeley to revise his original model. Again from his 2012 review…

“Such results were gratifying in demonstrating the practical significance of WM, but embarrassing for a model that had no potential for storage other than the limited capacities of the visuo-spatial and phonological subsystems. In response to these and related issues, I decided to add a fourth component, the episodic buffer.”

“The characteristics of the new system are indicated by its name; it is episodic in that it is assumed to hold integrated episodes or chunks in a multidimensional code. In doing so, it acts as a buffer store, not only between the components of WM, but also linking WM to perception and LTM. It is able to do this because it can hold multidimensional representations, but like most buffer stores it has a limited capacity. On this point we agree with Cowan in assuming a capacity in the region of four chunks.”

This change brings Baddeley’s model much closer to Cowan’s. Both have an unspecialised, four slot buffer or register, each slot (according to Baddeley, in both models) capable of holding more complex ideas than just a word or number, or as Baddeley puts it “multidimensional representations”. Baddeley regards this as a separate entity to the long term memory, and individual differences presumably follow from differences in the architecture of this entity, where Cowan sees it as an activated portion of the LTM and individual differences stem from differences in the ability to maintain focus on this activated area despite cognitive distractions.

Final Thoughts

The evidence for working memory being restricted is convincing. I have to say that my reading has me less convinced that working memory encompasses “multidimensional representations”, because all the evidence that I’ve seen has been for the retention of words, shapes and numbers.

I am more drawn to Cowan’s and in particular Oberauer’s model and like the idea that the differences these memory span tests show up are differences in focus rather than memory architecture. I don’t think I would be misrepresenting Conway (lead author on a couple of papers referenced here and a co-author with Cowan) in saying that he regards an ability to focus on what is important, in the face of requests to your internal executive for cognitive resources to be diverted elsewhere, as far more likely to correlate with all the intellectual skills that WMC has been shown to correlate with than having one more space to remember things in.

Cowan’s model and Baddeley’s revised model downplay the structural differences in different forms of short term memory linked to different senses, in favour of a more generalised but limited tool that has direct links to Long Term Memory.