Assessment for Learning is never far from a UK teacher’s mind. We all know of the “purple pen of pain” madness that SLTs can impose in the hope that their feedback method will realise AfL’s potential and, most importantly, satisfy the inspectors.

I’ve read a couple of posts recently wondering why AfL has failed to deliver on the big improvements that its authors hoped for. AfL’s authors themselves said:

if the substantial rewards promised by the evidence are to be secured, each teacher must find his or her own ways of incorporating the lessons and ideas that are set out above into his or her own patterns of classroom work. Even with optimum training and support, such a process will take time.

but it has now been 20 years since Black and Wiliam wrote those words in “Inside the Black Box” and formative feedback has been a big push in UK schools ever since.

The usual assumption is that we have still not got formative assessment correct. An example of this can be found in the recent work of eminent science teachers who have blogged about how formative assessment is not straightforward in science. Adam Boxer started the series with an assessment of AfL’s failure to deliver, and his has been followed by thoughtful pieces about assessment and planning in science teaching.

Thinking about AfL brought me back to the concerns I expressed here about the supposedly huge benefits from feedback and made me realise that I’ve never looked at any of the underlying evidence for feedback’s efficacy.

The obvious starting place was Black and Wiliam’s research review which had an entire issue of “Assessment in Education” devoted to it:

Black and Wiliam start out by stating that they have drawn evidence from four previous reviews, one by Natriello (which I can only find access to using my College of Teaching login), one by Crooks and two by Bangert-Downs and the Kuliks, one of which I found, the other I haven’t found access to, but its findings (“generally weak effects of feedback on achievement”) are discussed by Shute in her 2007 review of formative feedback.
What struck me very clearly when looking through Black and Wiliam’s paper, and through the major reviews that they referenced, is that the positive effect sizes frequently came from studies where, in order to give feedback, more frequent low stakes tests were given than was the case for the control groups.

Black had moved from a Physics background into education research, and had a specific interest in designing courses that had formative processes built into their assessment scheme. Courses and ideas which were wiped out by the evolution of GCSEs in the 1990s. As they put it “as part of this effort to re-assert the importance of formative assessment” Black and Wiliam were commissioned to conduct a review of the research on formative assessment, and they used their experience of working with teachers to write “Inside the Black Box” for a wider audience.

I have to wonder, would someone else looking at the same research, without “formative assessment” as their commissioned topic arrive at the same conclusions? Would they instead conclude that “frequent low-stakes testing is very effective” was the important finding of the research on feedback literature? Certainly testing frequency’s importance is clear to the authors who Black and Wiliam cite. In fact one of the B-D and Kuliks papers is entitled  “Effects of Frequent Classroom Testing” and contains this graph:

Which is a regression fit of the effect sizes that they found for different test frequencies.

The low-stakes testing effect was pretty well established then, it is very well established now, for example just the contents of “Ten Benefits of Testing and Their Applications to Educational Practice” makes the benefits pretty clear:


Where would we be today if Black and Wiliam had promoted low-stakes testing twenty years ago rather than formative assessment? Quite possibly nothing would have changed, they themselves profess to be puzzled as to why they had such a big impact, maybe formative assessment was just in tune with the zeitgeist of the time and if it were not Black and Wiliam it would have been someone else. But just possibly, if my interpretation is correct – without the testing effect the evidence for feedback is pretty weak -, we might be further on than we are right now.



Leave a Reply

Your email address will not be published. Required fields are marked *