As a mathematician, I care a lot about statistics. The right calculation can give you exactly the information you need, right when you need it. In general, statistics are useful tools for tracking progress, so you can see how far you've come, how quickly you got there, and how your results compare against others'. That's why it pisses me off when I see them being misused on purpose.

This is your only warning: this article has a bunch of math in it, so if that's not your cup of tea, you'd better go back to your porn. I won't judge you, but all your friends and family will. Clear your browser history.

I'm sure you've figured out by now that I try to give out as many references as possible when I write these, and this article's going to be no different. Well, it'll be a little different. I'm not so much going to cite sources as I'm going to point to small abuses of statistics so you can join me in mentally pissing on their graves. Why small ones instead of relevant ones like global warming or how many rapes go unreported? Because I only want bite-sized examples here. Gotta make the information accessible to those in my readership who are functionally retarded. The funny part is they don't know who they are.

A quick game of pool

The first is a game of pool. The game calculates your shot percentage as the number of shots on which you sink at least 1 ball / the total number of shots you take. It doesn't matter how many balls you sink, and it doesn't matter if your miss was a scratch; all that matters is shots with balls sunk / total shots. That means if I sink 2 balls on each of 5 consecutive shots and then miss, and you sink 6 balls one at a time and then miss, I'll have more pocketed balls (10 to your 6), but you'll have a higher shot percentage (6/7, or 86%, to my 5/6, or 83%). What the fuck?
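For anyone who wants to poke at the numbers, here's a minimal Python sketch of the calculation as described above. The game's actual code isn't public as far as I know, so the function name and the shot logs are made up for illustration; each log is just a list of how many balls went down on each shot.

```python
def game_shot_pct(balls_per_shot):
    """Shot % as the game is described as computing it:
    shots that sank at least one ball / total shots taken.
    (Hypothetical helper; not the game's real code.)"""
    if not balls_per_shot:
        return 0.0  # assumption: no shots taken reads as 0%
    scoring_shots = sum(1 for balls in balls_per_shot if balls >= 1)
    return scoring_shots / len(balls_per_shot)

# Made-up shot logs matching the scenario in the text.
mine = [2, 2, 2, 2, 2, 0]      # 2 balls on each of 5 shots, then a miss
yours = [1, 1, 1, 1, 1, 1, 0]  # 6 balls one at a time, then a miss

print(sum(mine), round(game_shot_pct(mine) * 100))    # 10 83
print(sum(yours), round(game_shot_pct(yours) * 100))  # 6 86
```

More balls down, lower percentage. The number of balls sunk never enters the calculation at all.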

In a game of straight pool, by definition the person who wins must have a higher shot percentage than the person who loses, because they fucking won. If I can win the game with a lower shot percentage, then your calculation is fucking wrong.

I contacted the author of the game and told him about this, and that I had a better formula worked out, and he politely told me he's not changing it because it doesn't affect who wins. That's not the fucking point. The point is to give an accurate indication of how well you're doing, and it doesn't do that, therefore it needs to be changed. Here's the formula I came up with:

(pots - (fouls/2)) / (pots + misses + fouls)

That takes into account how many balls are sunk, whether there are fouls, and can be calculated from existing information. So in the same scenarios as above, my shot% would be 91% and yours would be 86%. And if your miss was a foul, yours would be 79%, because fouls should fucking matter.
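If you want to check that arithmetic yourself, here's a minimal Python sketch of the formula. The function and argument names are mine, not anything from the game. It assumes a missed shot that fouls is counted under fouls only and not under misses too, which is what the 79% figure implies, and it returns 0 when no shots have been taken.

```python
def proposed_shot_pct(pots, misses, fouls):
    """(pots - fouls/2) / (pots + misses + fouls)

    pots   = total balls potted
    misses = shots that potted nothing and were not fouls (assumption)
    fouls  = shots that ended in a foul
    """
    attempts = pots + misses + fouls
    if attempts == 0:
        return 0.0  # assumption: no shots taken reads as 0%
    return (pots - fouls / 2) / attempts

# The three scenarios from the text.
print(round(proposed_shot_pct(pots=10, misses=1, fouls=0) * 100))  # 91
print(round(proposed_shot_pct(pots=6, misses=1, fouls=0) * 100))   # 86
print(round(proposed_shot_pct(pots=6, misses=0, fouls=1) * 100))   # 79
```

Every input here is a running total the game already displays, so nothing new has to be tracked.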

Pool Stats

Here's the coup de grâce for the formula the game uses: at any time you can see your shot stats, which give you your total balls potted, shots taken, and fouls. But since "shots on which a ball was potted" isn't tracked, there's no way to get from the stats presented to the shot% it says you have. My current stats are to the right, and I defy you to find a logical way to get a 67% accuracy out of them.

The saving grace of the game is that if you ignore the stats, it's still a rather good game: the physics are pretty solid, and as far as I know it hasn't been replicated anywhere else. Not so with my second example, which is one of the world's most popular logic games, Sudoku.

Here's the offending party. On the surface it's just like any other Sudoku game: fill in the numbers, get to the end, win a mental cookie. However, if you solve it without making any mistakes along the way, you can compare your time with other people's times to see how you hold up against them. So if you're a logic champion and you want to push yourself against other champions, you can just do a bunch of Evil puzzles and see how you've done against other people who've done the same puzzles, right? Wrong.

Instead of actually comparing you against anyone else, all they've done is manufacture fake graphs with arbitrary data points. Don't believe me? On the left are direct links to their graphs for each difficulty. That's right, there's one graph per difficulty level. You may also notice they placed the top of their curve for Evil puzzles at under 10 minutes (each line is 7.5 minutes: 12 lines for 90 minutes); I've done a fair number of their Evil puzzles, and I'm really good at Sudoku, and my average time is well over 20 minutes. So even if the graphs were based on real data, they wouldn't give you any idea at all of how you did on the puzzle you just finished, because each one would have the data for every other puzzle of that difficulty mixed in too.


They have a feedback area, so I sent them this before I noticed the graphs were in fact different but faked:

A Sudoku graph
It saddens me that your curve for solving speed doesn't take difficulty of puzzle into account. I would very much prefer if there were 4 different curves for the 4 different difficulty levels, or even better if it were tracked on a per-puzzle basis. When I finish an Evil puzzle in under 10 minutes, there's no way that's slower than 30% of all other people solving an Evil puzzle, and it's highly unlikely that it's slower than 30% of all other people solving that specific puzzle.

I am including my email address with this feedback, and unless I receive a response saying one of my suggestions has been implemented for casual play (i.e. not having to register an account), I shall not return to your site.

And their reply:

Thank you for your feedback.

Your suggestion will be taken into account when the Web Sudoku web site is next updated.

Best regards,

Ilana Gilbert
Web Sudoku

I have a hard time believing that's not a form response they send everyone. However, since that's just the kind of guy I am, I'm going to give them the benefit of the doubt that they actually care. And to make sure they care, I want each of you to send them feedback telling them that if they're going to have statistics on their site, they'd better fucking make sure they're accurate.

