AfL, Feedback and the low-stakes testing effect

 

Assessment for Learning is never far from a UK teacher’s mind. We all know of the “purple pen of pain” madness that SLTs can impose in the hope that their feedback method will realise AfL’s potential and, most importantly, satisfy the inspectors.

I’ve read a couple of posts recently wondering why AfL has failed to deliver on the big improvements that its authors hoped for. AfL’s authors themselves said:

if the substantial rewards promised by the evidence are to be secured, each teacher must find his or her own ways of incorporating the lessons and ideas that are set out above into his or her own patterns of classroom work. Even with optimum training and support, such a process will take time.

but it has now been 20 years since Black and Wiliam wrote those words in “Inside the Black Box” and formative feedback has been a big push in UK schools ever since.

The usual assumption is that we have still not got formative assessment right. One example is the recent work of eminent science teachers who have blogged about why formative assessment is not straightforward in science. Adam Boxer started the series with an assessment of AfL’s failure to deliver, and his post has been followed by thoughtful pieces about assessment and planning in science teaching.

Thinking about AfL brought me back to the concerns I expressed here about the supposedly huge benefits of feedback, and made me realise that I had never looked at any of the underlying evidence for feedback’s efficacy.

The obvious starting place was Black and Wiliam’s research review, to which an entire issue of “Assessment in Education” was devoted.

 
Black and Wiliam start by stating that they have drawn evidence from four previous reviews: one by Natriello (which I can only access using my College of Teaching login), one by Crooks, and two by Bangert-Drowns and the Kuliks. I found one of the latter pair but have not been able to access the other; its findings (“generally weak effects of feedback on achievement”) are, however, discussed by Shute in her 2007 review of formative feedback.
 
What struck me very clearly when looking through Black and Wiliam’s paper, and through the major reviews they referenced, is that the positive effect sizes frequently came from studies where, in order to give feedback, the treatment groups sat more frequent low-stakes tests than the control groups did.
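
For anyone less familiar with the term, an effect size in these reviews is a standardised difference between the treatment and control groups’ mean scores. The minimal sketch below computes two common versions, Cohen’s d (pooled standard deviation) and Glass’s Δ (control-group standard deviation only); which variant each cited review actually used is an assumption I have not checked, and the scores are made up purely for illustration.

```python
from math import sqrt
from statistics import mean, stdev

def cohens_d(treatment, control):
    """Standardised mean difference using the pooled sample standard deviation."""
    n1, n2 = len(treatment), len(control)
    s1, s2 = stdev(treatment), stdev(control)
    pooled_sd = sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))
    return (mean(treatment) - mean(control)) / pooled_sd

def glass_delta(treatment, control):
    """Variant that standardises by the control group's standard deviation only."""
    return (mean(treatment) - mean(control)) / stdev(control)

# Made-up scores: a hypothetical frequently-tested group vs a control group.
frequent_tests = [62, 70, 68, 75, 65]
control = [60, 64, 58, 66, 62]
d = cohens_d(frequent_tests, control)
delta = glass_delta(frequent_tests, control)
```

Note that either number only measures how far apart the groups ended up; if the treatment group also sat more tests, any testing effect is bundled into it.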
 

Black had moved from a physics background into education research and had a specific interest in designing courses with formative processes built into their assessment schemes, courses and ideas that were wiped out by the evolution of GCSEs in the 1990s. As they put it, it was “as part of this effort to re-assert the importance of formative assessment” that Black and Wiliam were commissioned to conduct a review of the research on formative assessment, and they used their experience of working with teachers to write “Inside the Black Box” for a wider audience.

I have to wonder: would someone else looking at the same research, without “formative assessment” as their commissioned topic, arrive at the same conclusions?

Might someone else instead conclude that “frequent low-stakes testing is very effective” was the important finding of the feedback literature? Certainly the importance of testing frequency is clear to the authors whom Black and Wiliam cite. In fact, one of the Bangert-Drowns and Kulik papers is entitled “Effects of Frequent Classroom Testing” and contains a graph showing a regression fit of the effect sizes they found for different test frequencies.
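
A fit of effect size against test frequency can be sketched as an ordinary least-squares line. This is an assumption about the method: the paper’s own model may differ (for instance a curve showing diminishing returns), and regressing on the logarithm of frequency is one hypothetical way to sketch that. The points below are invented and lie exactly on a line, so the fit simply recovers it; they are not the paper’s data.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (intercept a, slope b)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Invented points (number of tests, effect size) lying on y = 0.1 + 0.02x.
tests = [1, 5, 10, 20]
effects = [0.1 + 0.02 * t for t in tests]
a, b = fit_line(tests, effects)

# Hypothetical diminishing-returns variant: regress on ln(number of tests) instead.
a_log, b_log = fit_line([math.log(t) for t in tests], effects)
```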

The low-stakes testing effect was fairly well established then, and it is very well established now. For example, the contents page alone of “Ten Benefits of Testing and Their Applications to Educational Practice” makes the benefits pretty clear.

 

Where would we be today if Black and Wiliam had promoted low-stakes testing twenty years ago rather than formative assessment? Quite possibly nothing would have changed. They themselves profess to be puzzled as to why they had such a big impact; maybe formative assessment was simply in tune with the zeitgeist of the time, and if it had not been Black and Wiliam it would have been someone else. But just possibly, if my interpretation is correct (that without the testing effect the evidence for feedback is pretty weak), we might be further on than we are right now.

 

 

Testing the Wave Equation in a Gratnells Tray

Ripple tank experiments are not really class practicals; they are usually demos. But we do a block of work on waves in Y8 and again in Y10, so I wanted some practical work to go with it. The fact that the GCSE has a required waves practical that is really only a demo added impetus to my thinking.

I started with the classic AQA A-level ISA experiment (PHY-3T-Q09), in which a wave is timed over multiple crossings of a Gratnells tray and its speed is calculated, and tried to build something from there.
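
The calculation behind the multiple-crossings method is just average speed, v = nL/t, for n crossings of a tray of length L in a total time t. A minimal sketch, assuming L is the inside length of the tray along the direction of travel and that repeat runs are averaged; the function names and the numbers are hypothetical.

```python
from statistics import mean

def speed_from_crossings(tray_length_m, crossings, total_time_s):
    """Average wave speed v = n * L / t over n crossings of a tray of length L."""
    if crossings <= 0 or total_time_s <= 0:
        raise ValueError("crossings and time must both be positive")
    return crossings * tray_length_m / total_time_s

def mean_speed(tray_length_m, runs):
    """Average the speed over repeat runs, each given as (crossings, total_time_s)."""
    return mean(speed_from_crossings(tray_length_m, n, t) for n, t in runs)

# Hypothetical repeat runs in a 0.40 m tray: (crossings, seconds).
runs = [(5, 4.0), (5, 4.2), (6, 4.9)]
v = mean_speed(0.40, runs)
```

Timing several crossings rather than one spreads the reaction-time error in starting and stopping the stopwatch over a longer interval, which is the usual reason for doing it this way.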

By adding clockwork dippers made from chattering teeth toys to make a wave train, we got a set of practicals that work quite well. However, the match between the wave speed we measure and the speed calculated from the wave equation is far from perfect. I think there are probably two reasons: the dipper frequency varies quite a lot (we could extend the practical by getting students to measure their own dipper’s frequency, rather than demonstrating the measurement on one dipper as we have done so far), and I suspect that single waves actually travel a bit faster than the wave trains made by the dipper.
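
The comparison described above can be sketched as follows: the dipper frequency from a count of dips over a timed interval (f = N/t), the wavelength from a span measured across a whole number of wavelengths, the wave-equation prediction v = fλ, and the percentage gap from the speed measured by timing. Repeating the frequency count and reporting its spread is one way for students to see how much the dipper’s variation alone could move the prediction. All names and numbers here are hypothetical.

```python
from statistics import mean, stdev

def frequency_from_count(dips, time_s):
    """Dipper frequency f = N / t from N dips counted in t seconds."""
    return dips / time_s

def wavelength_from_span(span_m, n_wavelengths):
    """Wavelength from a span measured across a whole number of wavelengths."""
    return span_m / n_wavelengths

def wave_equation_speed(frequency_hz, wavelength_m):
    """Predicted speed from the wave equation, v = f * lambda."""
    return frequency_hz * wavelength_m

def percent_difference(measured, predicted):
    """Signed percentage difference of a measured value from a predicted one."""
    return 100 * (measured - predicted) / predicted

# Hypothetical repeat counts for one dipper: (dips, seconds).
counts = [(20, 10.0), (18, 10.0), (22, 10.0)]
freqs = [frequency_from_count(n, t) for n, t in counts]
f_mean, f_sd = mean(freqs), stdev(freqs)

wavelength = wavelength_from_span(0.30, 3)  # hypothetical: 3 wavelengths across 0.30 m
v_predicted = wave_equation_speed(f_mean, wavelength)
# Predictions with the frequency one standard deviation either side of the mean.
v_low = wave_equation_speed(f_mean - f_sd, wavelength)
v_high = wave_equation_speed(f_mean + f_sd, wavelength)

v_measured = 0.25  # hypothetical speed from timing, in m/s
gap = percent_difference(v_measured, v_predicted)
```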

The worksheets (follow the link below) are for four or five lessons and are deliberately tough. We leave higher groups pretty much to their own devices with them and give more help to lower groups. We’ve tried it with several groups now and think it has some value.

Waves Practicals

I’ve left in the notes for the two speed-of-sound experiments, which we only do a bit later in the course, and only if we have time.

If more of us take this up, I hope someone will come up with a good idea for making a more reliable dipper, or even persuade a manufacturer to make one.