As someone who spends a lot of time dealing with maths (the joys of data vis development!), I spend a lot of time entrenched in statistics. Whilst that’s great fun, or so I like to think, I’m always aware that there are a lot of people out there who were never really taught the why behind a lot of maths, and so I’ve decided to write a short series of introductory posts on statistics and the part they have to play in the life of a modern marketer. However, first, a disclaimer.

### Disclaimer

Once you actually understand statistics, you’ll become deeply annoyed by the vast bulk of really bad statistical reporting out there. I make no apologies for this.

### The Outline

In this third post, we’re going to look at skewness and kurtosis, which are the start of the more complex numbers we’ll be dealing with in this little series. As always, don’t worry if you’ve not come across these terms before; we’ll break down each one to look at what they do, and why they matter.

### Skewness and Kurtosis

I personally have a bit of an issue with two of the classic numbers for describing dataset shape. Skewness and kurtosis both suffer from two problems:

- You need a huge amount of data for the values to become accurate
- They describe, and are influenced by, extreme values more than the bulk of the data

As a result, I tend to prefer describing a set of data with the other available values, namely the location and dispersion values computed from it, but for the sake of completeness, I’m going to cover all four moments, including skewness and kurtosis.
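To make the first of those problems a bit more concrete, here’s a rough sketch in Python. It repeatedly draws samples from a symmetric, normally distributed source (so the true skewness is 0) and measures how much the skewness estimate wobbles from sample to sample. The function names, the sample sizes, the number of trials and the seed are all my own assumptions for illustration, and the skewness used is the plain moment-based version rather than anything specific to a particular tool.

```python
import random
import statistics

def skewness(values):
    """Moment-based skewness: the average cubed z-score (population form)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)

def skewness_spread(sample_size, trials=200, seed=1):
    """Standard deviation of the skewness estimate across repeated samples.

    Every sample is drawn from a normal distribution, whose true skewness is 0,
    so any spread here is pure sampling noise.
    """
    rng = random.Random(seed)  # fixed seed (an arbitrary choice) for repeatability
    estimates = [
        skewness([rng.gauss(0, 1) for _ in range(sample_size)])
        for _ in range(trials)
    ]
    return statistics.pstdev(estimates)

small = skewness_spread(20)
large = skewness_spread(2000)
print(f"spread of skewness estimates with n=20:   {small:.3f}")
print(f"spread of skewness estimates with n=2000: {large:.3f}")
```

The spread for the small samples should come out several times larger than for the big ones, which is the practical reason to be wary of a skewness figure computed from a couple of dozen data points.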

### Skewness: The Wonky Factor

The first of our two numbers, skewness, does pretty much what it says on the tin. With skewness we can tell what the shape of a graph of data looks like, which is to say, if plotted, would the graph be skewed to the right or left.

As a result, skewness can be positive or negative (because your data could be skewed towards either the low or high end, or just not skewed at all). To know which it is, if you imagine you bisected your data down the modal value, you’d have two halves, one on the left, one on the right. If the left tail is longer (so the mode is on the right hand side of the graph), then that’s a negative skew, whilst a long right hand tail, or mode on the left side of the graph is a positive skew.

On the other hand, if a dataset is equally long on both the left and right, then it has no skewness. This can mean it’s what’s known as a *normal distribution*. However, to tell if that’s the case, you have to know what the normal distribution of your data would look like. So let’s just take a moment to define that first…

#### The Normal Distribution

A normal distribution has a few major characteristics. Firstly, as we’ve just said, it must be symmetrical on both sides. Then, there’s three rules that will describe the shape of your data:

- About 68% of the area under the curve of your data, when graphed, falls within 1 standard deviation of the mean.
- About 95% of the area under the curve falls within 2 standard deviations of the mean.
- About 99.7% of the area under the curve falls within 3 standard deviations of the mean.
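If you’d like to check those three figures rather than take them on trust, a minimal sketch using Python’s standard library looks like this. `NormalDist` is part of the built-in `statistics` module; the helper name `share_within` is just something I’ve made up for the example.

```python
from statistics import NormalDist

# A standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Share of the area under the curve within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k):.1%}")
# within 1 standard deviation(s): 68.3%
# within 2 standard deviation(s): 95.4%
# within 3 standard deviation(s): 99.7%
```

Because the rule is stated in standard deviations, the same percentages hold for any mean and any spread, not just the standard case used here.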

This is a bit hard to visualise, so let’s throw in some charts to make it live a little. A normal distribution could look something like this:

The important thing to be aware of here is that it’s not the width of the curve of the data that’s important, as it could be squashed together or stretched out much more than it is here. What matters is that it’s symmetrical and conforms to the rules we established earlier.

Now let’s look at some examples of positive, neutral and negative skewness!

#### Examples

For some things that are positively skewed, you can think of:

- Income distribution (the modal wage is vastly closer to 0 than the highest value)
- Number of children in highly developed nations (most families have only one or two children, with very few having five or six or more)

…and here’s some typical negative skews:

- Retirement age (most people retire when they’re older rather than younger)
- Global temperature trends (the Earth’s mean surface temperature is warmer now than it used to be)

To interpret the result, you can use these rules of thumb (shamelessly lifted from M.G. Bulmer’s *Principles of Statistics*):

- If skewness is < -1 or > +1, your data’s distribution is *highly* skewed.
- If skewness is between -1 and -0.5 or between +0.5 and +1, then it is *moderately* skewed.
- If skewness is between -0.5 and +0.5, then it’s approximately symmetric.
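As a quick illustration of those rules of thumb, here’s a small Python sketch. The datasets are invented purely for the example, the function names are my own, and the skewness is the simple moment-based version, so a spreadsheet’s result may differ slightly if it applies a small-sample correction.

```python
import statistics

def skewness(values):
    """Moment-based skewness: the average cubed z-score (population form)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)

def describe_skew(value):
    """Apply the rule-of-thumb thresholds to a skewness value."""
    if abs(value) > 1:
        return "highly skewed"
    if abs(value) >= 0.5:
        return "moderately skewed"
    return "approximately symmetric"

# Hypothetical data: mostly low values with a long right-hand tail.
right_tailed = [1, 1, 2, 2, 2, 3, 3, 4, 6, 15]
# The mirror image of the same data, so the long tail is on the left.
left_tailed = [-v for v in right_tailed]
# Evenly spread values with no tail on either side.
symmetric = [1, 2, 3, 4, 5, 6, 7, 8, 9]

for name, data in [("right_tailed", right_tailed),
                   ("left_tailed", left_tailed),
                   ("symmetric", symmetric)]:
    value = skewness(data)
    print(f"{name}: skewness {value:+.2f} ({describe_skew(value)})")
```

The long right-hand tail gives a positive skewness well above 1, the mirrored data gives the same size of value with a negative sign, and the evenly spread data lands on zero.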

With that out the way, on to our second value…

### Kurtosis: How Extreme is Extreme

If skewness tells us how wonky our data is, our second value, kurtosis, tells us how concentrated the data is into a single area. It shows us how clustered the data is around the modal value.

So again, we can have positive or negative kurtosis (because our data could chart a graph where the modal value is barely higher than the tails, or one that’s almost completely flat except for an incredibly tall set of data around the mode).

What we refer to as kurtosis though is more accurately called *excess kurtosis*. This is because we tend to subtract 3 from the ordinary kurtosis value, so that the kurtosis of a normal distribution is 0. This then gives a nice central reference point, and brings the value of kurtosis more in line with skewness. Data that has a positive value is then known as *leptokurtic*, while datasets yielding negative excess kurtosis are known as *platykurtic*.

Anyway, enough theory, let’s look at a positive and negative example, and see what they look like on a chart.

As we said earlier, these values are heavily biased by extreme values, so our positive kurtosis example is positive because of the extreme peak in the centre of the data, whereas our negative set has a low value because there’s no real cluster or spike of values anywhere.

Again, for a quick rule of thumb when dealing with excess kurtosis: a positive value means the data is likely to have extreme values a long way from the mean, a negative value implies a fairly uniform dataset with no outliers, and a value close to zero is what you’d expect from a normal distribution.
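Here’s that rule of thumb as a minimal Python sketch. Both datasets are made up for the example, the function name is my own, and the calculation is the plain moment-based excess kurtosis (the average fourth power of the z-scores, minus 3), which won’t exactly match tools that apply a small-sample correction.

```python
import statistics

def excess_kurtosis(values):
    """Moment-based excess kurtosis: the average fourth-power z-score, minus 3."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    fourth_moment = sum(((v - mean) / sd) ** 4 for v in values) / len(values)
    return fourth_moment - 3  # subtract 3 so a normal distribution scores 0

# Hypothetical data: tightly clustered around 0, plus two extreme values.
with_outliers = [-10, -1, -1, 0, 0, 0, 0, 1, 1, 10]
# Hypothetical data: evenly spread, with no cluster and no outliers.
evenly_spread = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

print(f"with outliers: {excess_kurtosis(with_outliers):+.2f}")
print(f"evenly spread: {excess_kurtosis(evenly_spread):+.2f}")
```

The first dataset comes out positive because two values sit a long way from the mean, while the evenly spread one comes out negative. Raising the z-scores to the fourth power is what makes the extreme values dominate the result.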

### A Quick Note on Calculations

I’m not going to go into how to calculate these values. If you want to learn how, there are plenty of tutorials around, and they aren’t hard to do, but for most people, I’ll be happy enough if they simply know these values exist, and that they can get them from Excel with SKEW and KURT.

Are the examples the wrong way around?

Either I’ve just become massively confused or things like retirement ages are positively skewed rather than negative?

They skew towards the old, which assuming you’re charting from young to old, would put the bulk of people on the right hand side of that graph, which is a negative skew.

Indeed. I think the skew example labels just need to be flipped: negative or “left” skew should show a longer tail on the left, which is the green line. This would always confuse me:-)