Percentages are rubbish (sample datasets are cool)

We use percentages all the time in planning. It’s in the DNA – percentage compliance with a national indicator is what makes many planners get up in the morning. But they are useless for understanding what is going on.

Take this output from a report I was looking at today, pretending I was from Salford. It’s section 2a – a breakdown of approval rates by category of development.


As you’d expect, most of the approval rates are 90%+, so perhaps you should investigate what’s happening with prior approvals and certificates of lawfulness? But hold on. A percentage tells you that there is a difference, but gives you no idea of the *impact* of that difference.

Enter the sample dataset. We have always offered a sample report which trims each council’s data so that everyone has the same count of applications. Why not extend this idea a little so that everyone has, say, exactly 1000 applications picked at random from their dataset?

Then, rather than calculating a percentage, we can count the actual number of applications that are refused. Like this:


We don’t need to do any mental jiggery-pokery. For every 1000 applications we refuse 10 conditions applications (this is more than our peers). Majors are important and, while they are small in number, again I’d want to understand why.
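If you want to try the idea on your own data, here is a rough sketch in Python. It is not how the report itself is built; the function name, the `(category, decision)` layout and the figures are all made up for illustration. It draws a fixed-size random sample and counts refusals per category, so the numbers can be compared directly between councils.

```python
import random
from collections import Counter


def refusals_per_sample(applications, sample_size=1000, seed=None):
    """Draw a fixed-size random sample and count refusals per category.

    `applications` is a list of (category, decision) pairs. This layout is
    an assumption for the sketch, not the report's real data format.
    """
    rng = random.Random(seed)
    # Sampling without replacement: each application is picked at most once.
    sample = rng.sample(applications, sample_size)
    return Counter(
        category for category, decision in sample if decision == "refused"
    )


# Invented figures: householders are approved 95% of the time,
# certificates of lawfulness only 80% of the time.
applications = (
    [("householder", "approved")] * 4750
    + [("householder", "refused")] * 250
    + [("certificate of lawfulness", "approved")] * 80
    + [("certificate of lawfulness", "refused")] * 20
)

counts = refusals_per_sample(applications, sample_size=1000, seed=1)
for category, refused in counts.most_common():
    print(f"{category}: {refused} refused per 1000 applications")
```

With these invented figures the certificates have the worse approval rate, but there are so few of them that the householder refusals should dominate the count, which is the point the percentage hides.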

So – percentages told me to look at the wrong thing. Sample datasets helped me understand what is important. Sample datasets kick percentages’ ass. I’m working this idea through the report now in the test area – let me know if you want a sneak peek and you can tell me how to make it all better still.