Day 5

A Little Speech Before We Begin

Before we dive into more math, let’s take a moment to talk about motivation. Over the fast few days, we started our journey with some basic stats—nothing too overwhelming, right? But today, you might be wondering, Why should I keep going? Why do I need all this math in the first place?

The short answer: because it’s the foundation. You’re not just learning random numbers and formulas; you’re gaining the keys to understanding how AI works. Think of it like building a house. Sure, it might feel tedious to lay down the foundation, but without it, your house would crumble.

And let’s be honest, it’s easy to get discouraged. I mean, who actually enjoys staring at graphs and equations all day? It’s like trying to read a novel written in an alien language. But here’s the thing: once you start getting the hang of it, the picture begins to clear up. That’s when you realize—this stuff isn’t so scary after all. It’s actually kind of fun. And who doesn’t like feeling like a wizard who can decode the world?

So don’t sweat the small stuff. We’re in this together, and every step forward, no matter how tiny, brings you closer to that Aha! moment. Stay curious. Keep asking questions. The motivation will come. I promise.

Now that we’ve got the pep talk out of the way, let’s tackle some more stats and dive deeper into what makes AI tick. Ready to go? Let’s do this!

Beyond the Average: Understanding Variance and Standard Deviation

You probably already know what the mean is, right? It’s that classic average we’ve all encountered since middle school. But today, we’re going to shine a spotlight on the less glamorous cousins of the mean: variance and standard deviation. Don’t worry—once you understand them, they’ll be your best friends.

A Quick Refresh on Mean

Let’s briefly revisit the mean just to keep things fresh. If you have a set of numbers—say, 1, 2, 3, 4, and 5—the mean is the sum of those numbers divided by how many numbers you have. So:

Mean

=

\frac{1+2+3+4+5}{5}

\frac{1+2+3+4+5}{5}

Diving Deep into Variance

Imagine you’re playing a game where everyone has to throw a ball at a target. If everyone’s throws land close to the bullseye, the variance is small because the throws are clustered tightly around the target. But if the throws are scattered all over the place—some near the bullseye, some far away—the variance is large.

So, in simple terms, variance tells you how much the numbers in your dataset are spread out. A small variance means most numbers are close to the average (like the throws near the target), while a large variance means the numbers are more spread out and not as close to the average.

In this figure you can get an understand. As there is low variance, the lines are more compressed, while the spread is pretty large in high variance, i.e., most points are more far away from the mean.

Here’s how it’s calculated:

Find the mean of your data.
Subtract the mean from each data point.
Square the result of each subtraction.
Find the average of these squared differences

Why subtract from mean? Well, we want to find the deviation of the points from our mean, so we can get the spread.

Why squaring the terms? Well,

It ensures all differences are positive (example: 5 - mean = 5 - 10 = -5). We can't work with negative values as they'll cancel each other out. -5 + 5 = 0.
It gives more weight to larger deviations. This is because exponential functions, as the name suggest, grow very quickly.

As you can see in the figure, 2^x, grows pretty fast.

Now, consider this: when there are larger deviations from the mean, we want to penalize those differences more. Intuitively, values that are close to the mean and values that are far away shouldn’t be treated the same. By squaring the differences, we emphasize the larger deviations, making them stand out more than smaller ones. This way, the model can account for extreme values more effectively.

Why average these squared differences? The reason is to make the variance a more standardized measure. Let me explain:
Imagine you have two datasets. One dataset has 5 numbers, and another has 100 numbers. If you just added up the squared differences, the dataset with 100 numbers would automatically have a much larger sum, just because there are more numbers in it. This would make the variance unfairly large.

By averaging the squared differences, we normalize the result. It makes sure that the variance is based on the number of data points in the dataset, rather than just the total size. This way, you can compare datasets of different sizes on an equal footing.

Now let's calculate variance for our test scores:

Consider two datasets:

Dataset 1: 2, 4, 6, 8, 10
Dataset 2: 1, 4, 6, 8, 12

The means of both datasets are 6.

For Dataset 1, you’d subtract the mean (6) from each number and square the result:

$\frac{(2-6)^2 + (4-6)^2 + (6-6)^2 + (8-6)^2 + (10-6)^2}{5} = 8$

For Dataset 2, you'd do the same,

$\frac{(1-6)^2 + (4-6)^2 + (6-6)^2 + (8-6)^2 + (12-6)^2}{5} = 13.8$

Standard Deviation: The User-Friendly Cousin

If variance is a bit like the nerdy kid who talks in equations, the standard deviation is the friendly cousin who puts it all in simpler terms.

Standard deviation is just the square root of the variance. Why? Because the variance squared everything, so to get back to the original scale of the data, you take the square root. It’s like translating a book from math-speak back into regular language.

When we calculate variance, we square the differences from the mean. This squaring step is important because it helps emphasize larger deviations, but it also causes a problem: the units of the variance are no longer the same as the original data.

For example, if you’re working with data in meters, then the variance ends up in square meters. If your data is in seconds, the variance is in square seconds. This isn’t very intuitive—how do you interpret a measurement in square seconds? It’s hard to make sense of.

This is where standard deviation comes to the rescue. The standard deviation is simply the square root of the variance. By taking the square root, we return the units to their original form—so if our data is in meters, the standard deviation will also be in meters. This makes it much easier to interpret because the standard deviation tells you, on average, how far the data points are from the mean in the same units as your original data.

So, let’s say the variance of a dataset is 8 (like in our first example). The standard deviation would be:

$\sqrt{\smash[b]{8}} = 2.828..$

Now, instead of saying, The variance is 8, you can say, On average, the data points are about 2.83 units away from the mean. This makes it much easier to interpret.

CONGRATULATIONS!!!

You have just completed Day 5. Now do re-read the whole thing again. Until you can understand every concept. Take a pen and paper; and make notes. Revise. And remember, nothing is tough. You just need to have the hunger for knowledge.