
Standard deviation calculation

Mathematicians and statisticians devised a more reliable indicator, although for a slightly different purpose: the average linear deviation (also called the mean absolute deviation). It characterizes how widely the values of a data set are scattered around their mean.

To measure the scatter of data, you first have to decide what the scatter is measured against. Usually this is the mean. Next, you calculate how far each value of the data set lies from the mean. Each value has its own deviation, but we want an overall assessment that covers the entire population, so the deviations are averaged with the usual arithmetic mean formula. There is a catch, though. To average the deviations we first have to add them up, and positive and negative deviations cancel each other out: deviations from the mean always sum to exactly zero. To avoid this, we take the absolute value of every deviation, so all negative numbers become positive. The average of these absolute deviations then gives a general measure of the spread. As a result, the average linear deviation is calculated using the formula:

$$a = \frac{\sum_{i=1}^{n} |x_i - \bar{x}|}{n}$$

where: a – average linear deviation,

x_i – the values of the analyzed indicator; $\bar{x}$ (x with a bar above) – the mean value of the indicator,

n – number of values in the analyzed data set.

I hope the summation operator doesn't scare anyone.

The average linear deviation calculated with this formula is the average absolute deviation of the values from the mean of the data set.

In the picture, the red line is the mean. Small arrows show how far each observation deviates from the mean. The absolute values of these deviations are summed, and the total is divided by the number of values.

To complete the picture, here is an example. Suppose a company produces shovel handles. Each handle should be 1.5 meters long and, more importantly, all handles should be the same length, or at least within plus or minus 5 cm. However, careless workers cut some handles to 1.2 m and others to 1.8 m, and the summer residents who buy them are unhappy. The company director decided to analyze the handle lengths statistically. He selected 10 handles, measured their lengths, found the mean and calculated the average linear deviation. The mean turned out to be exactly what was needed: 1.5 m. But the average linear deviation was 0.16 m, so each handle is on average 16 cm longer or shorter than required. There is something to discuss with the workers. In fact, I have never seen this indicator used in practice, so I made up the example myself. Nevertheless, the indicator does exist in statistics.
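The director's calculation can be sketched in a few lines of Python. This is a minimal illustration, not the article's own method. The function name `mean_absolute_deviation` is hypothetical, and the six lengths below are invented for illustration, because the director's ten actual measurements are not given in the text.

```python
from statistics import fmean

def mean_absolute_deviation(values):
    """Average linear deviation: the mean of |x_i - mean|."""
    mean = fmean(values)
    return sum(abs(x - mean) for x in values) / len(values)

# Hypothetical handle lengths in metres (NOT the director's real measurements).
lengths = [1.3, 1.7, 1.5, 1.5, 1.6, 1.4]
print(round(mean_absolute_deviation(lengths), 3))
```

Here the mean is exactly 1.5 m. The handles still deviate from it by about 0.1 m on average, and that scatter is what the indicator exposes.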

Variance

Like the average linear deviation, variance also reflects the extent of the spread of data around the mean value.

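The formula for calculating variance looks like this:

$$\sigma^2 = \frac{\sum_{i=1}^{k} (x_i - \bar{x})^2 f_i}{\sum_{i=1}^{k} f_i}$$

(for variation series: weighted variance)

$$\sigma^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}$$

(for ungrouped data: simple variance)

where: σ² – variance, x_i – the analyzed indicator (attribute value), $\bar{x}$ – the mean value of the indicator, f_i – the frequency of the value x_i, n – the number of values in the analyzed data set.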

Variance is the mean of the squared deviations.

First, the mean is calculated. Then the difference between each original value and the mean is taken, squared and multiplied by the frequency of the corresponding attribute value. These products are added up, and the sum is divided by the number of values in the population.
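The weighted (grouped-data) procedure just described can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `weighted_variance` and the small frequency table are hypothetical, and the result is cross-checked against Python's `statistics.pvariance` applied to the same data written out value by value.

```python
from statistics import pvariance

def weighted_variance(values, freqs):
    """Weighted variance: sum((x_i - mean)^2 * f_i) / sum(f_i)."""
    total = sum(freqs)
    mean = sum(x * f for x, f in zip(values, freqs)) / total
    return sum((x - mean) ** 2 * f for x, f in zip(values, freqs)) / total

# Hypothetical variation series: value 1 occurs twice, 2 three times, 3 five times.
values = [1, 2, 3]
freqs = [2, 3, 5]

# The same data as an ungrouped list, to check against the simple formula.
expanded = [x for x, f in zip(values, freqs) for _ in range(f)]
print(weighted_variance(values, freqs), pvariance(expanded))
```

Both numbers should agree (about 0.61). Weighting by frequency is simply a shortcut for listing every repeated value separately.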

However, unlike the arithmetic mean or an index, variance is rarely used on its own. It is an auxiliary, intermediate indicator used in other types of statistical analysis.

A simplified way to calculate variance
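One common shortcut computes the variance as "the mean of the squares minus the square of the mean": σ² = (Σx_i²)/n − x̄². The same relation appears later in this article for random variables. It is assumed here that this is the simplified method the heading refers to. The sketch below (hypothetical function name `variance_shortcut`) applies it to the dog heights used later in the article and compares the result with `statistics.pvariance`.

```python
from statistics import pvariance

def variance_shortcut(values):
    """Population variance via mean(x^2) - mean(x)^2."""
    n = len(values)
    mean = sum(values) / n
    mean_of_squares = sum(x * x for x in values) / n
    return mean_of_squares - mean ** 2

heights = [600, 470, 170, 430, 300]  # dog heights in mm from the later example
print(variance_shortcut(heights), pvariance(heights))
```

The shortcut saves one pass over the data. However, when the values are large compared with their spread, subtracting two nearly equal numbers can lose floating-point precision, so the definition-based formula is the safer default in code.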

Standard deviation

To use the variance in data analysis, we take its square root. The result is the so-called standard deviation.

By the way, standard deviation is also called sigma - from the Greek letter that denotes it.

The standard deviation also characterizes the dispersion of the data. Unlike the variance, however, it is expressed in the same units as the original data and can be compared with them directly. In statistics, root-mean-square measures are generally considered more accurate than linear ones. For that reason, the standard deviation is regarded as a more accurate measure of dispersion than the average linear deviation.

In this article I will explain how to find the standard deviation. This material is important for a full understanding of the subject, so a math tutor should devote a separate lesson, or even several, to it.

The standard deviation makes it possible to evaluate the spread of the values obtained by measuring a parameter. It is denoted by σ (the Greek letter "sigma").

The formula is quite simple: the standard deviation is the square root of the variance. So the next question is: what is variance?

What is variance

Variance is defined as the arithmetic mean of the squared deviations of the values from their mean.

To find the variance, perform the following calculations in order:

  • Determine the mean (the simple arithmetic average of the values).
  • Subtract the mean from each value and square the result (the squared difference).
  • Calculate the arithmetic mean of these squared differences (why squares are used is explained below).
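The three steps above can be sketched in Python as follows. This is a minimal illustration, not the article's own code. The function name `population_variance` is hypothetical, and it divides by the number of values (the population form). The data are the dog heights from the example in this article.

```python
import math

def population_variance(values):
    # Step 1: the simple arithmetic mean.
    mean = sum(values) / len(values)
    # Step 2: subtract the mean from each value and square the difference.
    squared_diffs = [(x - mean) ** 2 for x in values]
    # Step 3: the arithmetic mean of the squared differences.
    return sum(squared_diffs) / len(squared_diffs)

heights_mm = [600, 470, 170, 430, 300]
var = population_variance(heights_mm)
sd = math.sqrt(var)  # standard deviation = square root of the variance
print(var, round(sd))  # 21704.0 147
```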

Let's look at an example. Suppose you and your friends measure the heights of your dogs at the withers (in millimeters) and get the following results: 600 mm, 470 mm, 170 mm, 430 mm and 300 mm.

Let's calculate the mean, variance and standard deviation.

First, let's find the mean. As you already know, to do this you add up all the measured values and divide by the number of measurements:

$$\bar{x} = \frac{600 + 470 + 170 + 430 + 300}{5} = \frac{1970}{5} = 394 \text{ mm}.$$

So, the average (arithmetic mean) is 394 mm.

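Now we determine how far the height of each dog deviates from the mean (in mm):

$$600 - 394 = 206,\quad 470 - 394 = 76,\quad 170 - 394 = -224,\quad 430 - 394 = 36,\quad 300 - 394 = -94.$$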

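Finally, to calculate the variance, we square each of these differences and find the arithmetic mean of the results:

$$\sigma^2 = \frac{206^2 + 76^2 + (-224)^2 + 36^2 + (-94)^2}{5} = \frac{42436 + 5776 + 50176 + 1296 + 8836}{5} = \frac{108520}{5} = 21704 \text{ mm}^2.$$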

Thus, the variance is 21704 mm².

How to find standard deviation

So how do we calculate the standard deviation once we know the variance? As we remember, we take its square root. That is, the standard deviation is:

$$\sigma = \sqrt{21704} \approx 147 \text{ mm}$$

(rounded to the nearest whole millimeter).

Using the standard deviation, we can see that some dogs (for example, Rottweilers) are very big, while others are very small (for example, dachshunds, but you shouldn't tell them that).

The most interesting thing is that the standard deviation carries useful information. We can now show which of the measured heights lie within one standard deviation of the mean, that is, in the interval from 394 − 147 = 247 mm to 394 + 147 = 541 mm.

In other words, the standard deviation gives us a "standard" way of deciding which values are normal (statistically typical) and which are unusually large or unusually small.
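The check "which heights lie within one standard deviation of the mean" can be written as a short sketch. It uses `statistics.pstdev` (the population form, dividing by 5), because here the five dogs are treated as the whole population. The variable names are illustrative.

```python
from statistics import fmean, pstdev

heights_mm = [600, 470, 170, 430, 300]
mean = fmean(heights_mm)   # 394.0
sd = pstdev(heights_mm)    # about 147.3
low, high = mean - sd, mean + sd

within = [h for h in heights_mm if low <= h <= high]
outside = [h for h in heights_mm if not (low <= h <= high)]
print(within, outside)  # [470, 430, 300] [600, 170]
```

The unrounded bounds (about 246.7 mm and 541.3 mm) classify the dogs the same way as the rounded interval in the text.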

Sample standard deviation

But things are slightly different if we analyze sample data. In our example we considered the entire population: our 5 dogs were the only dogs in the world that interested us.

But if the data are a sample (values selected from a larger population), the calculation must be done differently.

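If the sample contains N values, the sum of the squared deviations is divided by N − 1 instead of N:

$$s^2 = \frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N - 1}$$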

All other calculations are carried out similarly, including the determination of the average.

For example, if our five dogs are just a sample of the population of dogs (all dogs on the planet), we must divide by 4, not 5, namely:

Sample variance:

$$s^2 = \frac{108520}{4} = 27130 \text{ mm}^2.$$

In this case, the sample standard deviation is $s = \sqrt{27130} \approx 165$ mm (rounded to the nearest whole number).
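Python's standard `statistics` module implements both conventions, so the population and sample results for the five dogs can be checked directly. `pvariance`/`pstdev` divide by N, and `variance`/`stdev` divide by N − 1. This is only a cross-check of the arithmetic above, not part of the original article.

```python
import statistics

heights_mm = [600, 470, 170, 430, 300]

# Population: the five dogs are all we care about (divide by N = 5).
print(statistics.pvariance(heights_mm), round(statistics.pstdev(heights_mm)))  # variance 21704, SD about 147

# Sample: the five dogs stand in for all dogs (divide by N - 1 = 4).
print(statistics.variance(heights_mm), round(statistics.stdev(heights_mm)))    # variance 27130, SD about 165
```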

We can say that we have applied a "correction" for the case where our values are only a sample. It is known as Bessel's correction.

Note. Why squared differences?

But why do we take squared differences when calculating the variance? Suppose that when measuring some parameter, the deviations of the values from the mean are 4, 4, −4 and −4. If we simply add the deviations together, the negative values cancel out the positive ones:

$$\frac{4 + 4 - 4 - 4}{4} = 0.$$

So this option is useless. Then maybe we should use the absolute values of the deviations?

$$\frac{|4| + |4| + |-4| + |-4|}{4} = \frac{16}{4} = 4.$$

At first glance this works well (the resulting value, by the way, is called the mean absolute deviation), but not in all cases. Let's try another example. Suppose the deviations are 7, 1, −6 and −2. Then the mean absolute deviation is:

$$\frac{|7| + |1| + |-6| + |-2|}{4} = \frac{16}{4} = 4.$$

Wow! Again we got 4, although these deviations are much more spread out.

Now let's see what happens if we square the differences (and take the square root of their mean at the end).

For the first example:

$$\sqrt{\frac{4^2 + 4^2 + (-4)^2 + (-4)^2}{4}} = \sqrt{\frac{64}{4}} = 4.$$

For the second example:

$$\sqrt{\frac{7^2 + 1^2 + (-6)^2 + (-2)^2}{4}} = \sqrt{\frac{90}{4}} \approx 4.74.$$

Now it's a completely different matter! The greater the spread of the differences, the greater the standard deviation, which is exactly what we were aiming for.
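The comparison above is easy to reproduce. The sketch below (hypothetical helper names `mean_abs` and `root_mean_square`) takes the two sets of deviations and prints their plain sum, their mean absolute value and their root mean square.

```python
import math

def mean_abs(devs):
    """Mean absolute deviation of a list of deviations."""
    return sum(abs(d) for d in devs) / len(devs)

def root_mean_square(devs):
    """Square root of the mean of the squared deviations."""
    return math.sqrt(sum(d * d for d in devs) / len(devs))

for devs in ([4, 4, -4, -4], [7, 1, -6, -2]):
    # plain sum, mean absolute deviation, root mean square
    print(devs, sum(devs), mean_abs(devs), round(root_mean_square(devs), 2))
```

Both plain sums are 0 and both mean absolute deviations are 4.0. Only the root mean square (4.0 versus about 4.74) separates the more scattered set from the tighter one.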

In fact, this method uses the same idea as calculating the (Euclidean) distance between points, only applied in a different way.

From a mathematical point of view, squares and square roots are also more useful than absolute values: they make the standard deviation applicable to other mathematical problems.

Sergey Valerievich told you how to find the standard deviation

Expectation and variance

Suppose we measure a random variable N times. For example, we measure the wind speed ten times and want to find its average value. How is the average value related to the distribution function?

Let's roll a die a large number of times. The number of points on each roll is a random variable that can take any natural value from 1 to 6. The arithmetic mean of the points over all rolls is also a random variable, but for large N it tends to a very specific number, the mathematical expectation M(x). In this case M(x) = 3.5.

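How is this value obtained? Suppose that in N trials, 1 point comes up $n_1$ times, 2 points come up $n_2$ times, and so on. Then the average number of points is

$$\bar{x} = \frac{1 \cdot n_1 + 2 \cdot n_2 + \ldots + 6 \cdot n_6}{N}.$$

As N → ∞, the share of outcomes in which one point was rolled tends to its probability: $n_1/N \to 1/6$. Similarly, $n_2/N \to 1/6, \ldots, n_6/N \to 1/6$. Hence

$$M(x) = \frac{1 + 2 + 3 + 4 + 5 + 6}{6} = 3.5.$$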
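A small simulation shows the running average approaching the expectation. This sketch assumes a fair die modelled with `random.Random.randint(1, 6)`. The seed is fixed only so that the run is reproducible, and the exact averages printed depend on it. The exact value is computed with `fractions.Fraction`.

```python
import random
from fractions import Fraction

# Exact expectation of a fair die: each face has probability 1/6.
expectation = sum(Fraction(face, 6) for face in range(1, 7))
print(expectation)  # 7/2

rng = random.Random(2024)  # fixed seed for a reproducible run
for n in (10, 1_000, 100_000):
    rolls = [rng.randint(1, 6) for _ in range(n)]
    print(n, sum(rolls) / n)  # drifts toward 3.5 as n grows
```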

Model 4.5. Dice

Now suppose we know the distribution law of the random variable x. That is, we know that x can take the values $x_1, x_2, \ldots, x_k$ with probabilities $p_1, p_2, \ldots, p_k$.

The mathematical expectation M(x) of the random variable x equals:

$$M(x) = x_1 p_1 + x_2 p_2 + \ldots + x_k p_k = \sum_{i=1}^{k} x_i p_i.$$

Answer. 2.8.

The mathematical expectation is not always a reasonable estimate of a random variable. For example, to estimate a typical salary it is more reasonable to use the median: the value such that the number of people earning less than it equals the number earning more.

The median of a random variable x is a number $x_{1/2}$ such that $p(x < x_{1/2}) = 1/2$.

In other words, the probability $p_1$ that x is smaller than $x_{1/2}$ and the probability $p_2$ that x is greater than $x_{1/2}$ are equal, and both are 1/2. The median is not uniquely determined for every distribution.

Let's return to the random variable x, which can take the values $x_1, x_2, \ldots, x_k$ with probabilities $p_1, p_2, \ldots, p_k$.

The variance D(x) of a random variable x is the average value of the squared deviation of x from its mathematical expectation:

$$D(x) = M\big[(x - M(x))^2\big] = \sum_{i=1}^{k} \big(x_i - M(x)\big)^2 p_i.$$

Example 2

Under the conditions of the previous example, calculate the variance and standard deviation of the random variable x.

Answer. 0.16, 0.4.

Model 4.6. Shooting at a target

Example 3

Find the probability distribution of the number of points on a single roll of a die, as well as its median, mathematical expectation, variance and standard deviation.

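Each face is equally likely to come up, so the distribution looks like this:

$$\begin{array}{c|cccccc} x & 1 & 2 & 3 & 4 & 5 & 6 \\ \hline p & 1/6 & 1/6 & 1/6 & 1/6 & 1/6 & 1/6 \end{array}$$

The median is not unique here: any number $x_{1/2}$ with $3 < x_{1/2} \le 4$ satisfies $p(x < x_{1/2}) = 1/2$. The expectation and the variance are

$$M(x) = \frac{1 + 2 + \ldots + 6}{6} = 3.5, \qquad D(x) = \frac{(1 - 3.5)^2 + (2 - 3.5)^2 + \ldots + (6 - 3.5)^2}{6} = \frac{35}{12} \approx 2.92.$$

Standard deviation: $\sigma = \sqrt{35/12} \approx 1.71$. The deviation of the values from the mean is clearly very large.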
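The die's expectation, variance and standard deviation can be verified with exact fractions. The sketch below computes the variance both from the definition D(x) = Σ(x_i − M(x))²p_i and from the identity D(x) = M(x²) − [M(x)]². It is an illustrative check, not part of the original example.

```python
import math
from fractions import Fraction

faces = range(1, 7)
p = Fraction(1, 6)  # each face is equally likely

mean = sum(x * p for x in faces)                          # M(x)
variance = sum((x - mean) ** 2 * p for x in faces)        # D(x) by definition
variance_alt = sum(x * x * p for x in faces) - mean ** 2  # M(x^2) - M(x)^2
sd = math.sqrt(variance)

print(mean, variance, variance_alt, round(sd, 2))  # 7/2 35/12 35/12 1.71
```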

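Properties of mathematical expectation:

  • The mathematical expectation of a sum of random variables equals the sum of their mathematical expectations (this holds even without independence):

$$M(x + y) = M(x) + M(y).$$

  • For independent random variables, the mathematical expectation of the product equals the product of their mathematical expectations:

$$M(xy) = M(x)\,M(y).$$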

Example 4

Find the mathematical expectation of the sum and product of points rolled on two dice.

In Example 3 we found that for one die M(x) = 3.5. So for two dice

$$M(x + y) = 3.5 + 3.5 = 7, \qquad M(xy) = 3.5 \cdot 3.5 = 12.25.$$
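Because two dice have only 36 equally likely outcomes, both results can be confirmed by enumerating every pair. This sketch uses `itertools.product` and exact fractions. The variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))  # all 36 equally likely pairs
p = Fraction(1, len(outcomes))

mean_sum = sum((a + b) * p for a, b in outcomes)    # M(x + y)
mean_product = sum(a * b * p for a, b in outcomes)  # M(xy)
print(mean_sum, mean_product)  # 7 49/4
```

The value 49/4 is 12.25, matching 3.5 · 3.5, as expected for independent dice.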

Properties of variance:

  • The variance of a sum of independent random variables equals the sum of their variances:

$$D(x + y) = D(x) + D(y).$$

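Let y be the total number of points in N rolls of a die, and let x be the number of points on a single roll. Since the rolls are independent, $D(y) = N\,D(x)$. The average number of points per roll, $y/N$, therefore has the variance and standard deviation

$$D\!\left(\frac{y}{N}\right) = \frac{D(y)}{N^2} = \frac{D(x)}{N}, \qquad \sigma_{y/N} = \frac{\sigma_x}{\sqrt{N}}.$$

This result is true not only for dice rolls. In many cases it determines how accurately the mathematical expectation can be measured empirically. As the number of measurements N increases, the spread of the average around the expectation (that is, its standard deviation) decreases in proportion to $1/\sqrt{N}$.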
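The 1/N law for the variance of the average can be checked exactly for small N by enumerating every outcome of N fair dice. The function name `variance_of_mean` is hypothetical, and full enumeration is only practical for small N (6^N outcomes).

```python
from fractions import Fraction
from itertools import product

def variance_of_mean(n_rolls):
    """Exact variance of the average of n_rolls fair dice, by full enumeration."""
    outcomes = list(product(range(1, 7), repeat=n_rolls))
    p = Fraction(1, len(outcomes))
    means = [Fraction(sum(o), n_rolls) for o in outcomes]
    m = sum(x * p for x in means)
    return sum((x - m) ** 2 * p for x in means)

single = Fraction(35, 12)  # D(x) for one die
for n in (1, 2, 3):
    print(n, variance_of_mean(n), single / n)  # the two columns agree
```

Because the variance shrinks as 1/N, the standard deviation of the average shrinks as 1/√N.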

The variance of a random variable is related to the mathematical expectation of its square by the following relation:

$$D(x) = M(x^2) - [M(x)]^2.$$

To prove it, expand the square: $(x - M(x))^2 = x^2 - 2x\,M(x) + [M(x)]^2$. Let's find the mathematical expectations of both sides of this equality. By definition, the expectation of the left side is the variance D(x).

By the properties of mathematical expectation, the expectation of the right side equals

$$M(x^2) - 2M(x)\,M(x) + [M(x)]^2 = M(x^2) - [M(x)]^2.$$

Standard deviation

The standard deviation equals the square root of the variance:

$$\sigma = \sqrt{D(x)}.$$

When the standard deviation is determined for a sufficiently large population (n > 30), the following formulas are used:



The square root of the variance is called the standard deviation from the mean. For ungrouped data it is calculated as follows:

$$\sigma = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}}$$

An elementary algebraic transformation brings the standard deviation formula to the following form:

$$\sigma = \sqrt{\frac{\sum_{i=1}^{n} x_i^2}{n} - \bar{x}^2}$$

This formula often turns out to be more convenient in calculation practice.

The standard deviation, like the average linear deviation, shows how much, on average, individual values of an attribute deviate from their mean. The standard deviation is never smaller than the average linear deviation. For distributions close to normal, the two are related as follows:

$$\sigma \approx 1.25\,a.$$

Knowing this ratio, you can estimate an unknown indicator from a known one, for example compute σ from a, and vice versa. The standard deviation measures the absolute size of the variability of an attribute and is expressed in the same units as the attribute values (rubles, tons, years, etc.). It is an absolute measure of variation.
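The factor 1.25 matches the exact ratio √(π/2) ≈ 1.2533 that holds for a normal distribution. The sketch below checks this on simulated normal data, drawn with `random.Random.gauss` under a fixed seed chosen arbitrarily for reproducibility, and also confirms σ ≥ a on the two small deviation sets used earlier in this article. The helper names are hypothetical.

```python
import math
import random

def mean_abs_dev(values):
    """Average linear deviation a: the mean of |x - mean|."""
    m = sum(values) / len(values)
    return sum(abs(x - m) for x in values) / len(values)

def pop_std(values):
    """Standard deviation sigma, dividing by n."""
    m = sum(values) / len(values)
    return math.sqrt(sum((x - m) ** 2 for x in values) / len(values))

rng = random.Random(7)  # fixed seed for a reproducible run
normal_sample = [rng.gauss(0.0, 1.0) for _ in range(100_000)]
ratio = pop_std(normal_sample) / mean_abs_dev(normal_sample)
print(round(ratio, 3), round(math.sqrt(math.pi / 2), 3))  # both close to 1.25

# sigma is never smaller than a; they are equal only when all |deviations| are equal.
for devs in ([4, 4, -4, -4], [7, 1, -6, -2]):
    print(pop_std(devs) >= mean_abs_dev(devs))
```

For clearly non-normal data the ratio can differ noticeably from 1.25, so the conversion is only a rough estimate.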

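For binary (alternative) attributes, for example having or not having higher education or insurance, the variance and standard deviation are:

$$\sigma^2 = pq = p(1 - p), \qquad \sigma = \sqrt{pq},$$

where p is the share of units that have the attribute and q = 1 − p is the share that do not.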
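Coding an alternative attribute as 1 (present) and 0 (absent) shows where σ² = pq comes from: the ordinary population variance of those 0/1 values equals p(1 − p). The survey answers below are hypothetical.

```python
from statistics import pvariance

# Hypothetical survey: 1 = has higher education, 0 = does not.
answers = [1, 1, 1, 0, 0]

p = sum(answers) / len(answers)  # share of units with the attribute
q = 1 - p                        # share of units without it
print(pvariance(answers), p * q)  # both about 0.24
```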

Let us show how the standard deviation is calculated using a discrete series that describes the age distribution of students at one university faculty (Table 6.2).

Table 6.2.

The results of auxiliary calculations are given in columns 2–5 of Table 6.2.

The average age of a student, in years, is determined by the weighted arithmetic mean formula (column 2):

Columns 3–4 contain the deviations of individual students' ages from the mean and their squares, and column 5 contains the products of the squared deviations and the corresponding frequencies.

We find the variance of the students' ages using formula (6.2):

Then $\sigma = \sqrt{3.43} \approx 1.85$ years, i.e. individual students' ages deviate from the mean by 1.85 years on average.

Coefficient of variation

In absolute terms, the standard deviation depends not only on the degree of variation of the attribute but also on the absolute levels of the values and of the mean. Therefore, the standard deviations of variation series with different mean levels cannot be compared directly. To make such a comparison, you need to express the average deviation (linear or quadratic) as a percentage of the arithmetic mean, i.e. calculate relative measures of variation.

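The linear coefficient of variation is calculated by the formula

$$V_a = \frac{a}{\bar{x}} \cdot 100\%$$

The coefficient of variation is determined by the following formula:

$$V_\sigma = \frac{\sigma}{\bar{x}} \cdot 100\%$$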

Coefficients of variation eliminate not only the incomparability caused by different units of measurement of the attribute, but also the incomparability caused by different arithmetic means. In addition, they characterize the homogeneity of the population: a population is considered homogeneous if the coefficient of variation does not exceed 33%.
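Both coefficients and the 33% homogeneity threshold can be sketched as follows. The function names are hypothetical. The sketch uses the population standard deviation, and the three ages are invented for illustration, since the data of Table 6.2 are not reproduced here.

```python
from statistics import fmean, pstdev

def coefficient_of_variation(values):
    """V_sigma = sigma / mean * 100, using the population standard deviation."""
    return pstdev(values) / fmean(values) * 100

def linear_coefficient_of_variation(values):
    """V_a = a / mean * 100, where a is the average linear deviation."""
    mean = fmean(values)
    a = fmean([abs(x - mean) for x in values])
    return a / mean * 100

ages = [18, 20, 22]  # hypothetical ages, NOT the data of Table 6.2
v_sigma = coefficient_of_variation(ages)
v_a = linear_coefficient_of_variation(ages)
print(f"V_sigma = {v_sigma:.1f}%, V_a = {v_a:.1f}%, homogeneous: {v_sigma <= 33}")
```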

Using Table 6.2 and the results obtained above, we determine the coefficient of variation, in percent, by formula (6.3):

A coefficient of variation above 33% indicates that the population being studied is heterogeneous. The value obtained in our case indicates that the population of students is homogeneous by age. Thus, an important function of summary measures of variation is to assess how reliable the averages are: the smaller a, σ and V, the more homogeneous the set of phenomena and the more reliable the resulting average.

According to the "three sigma rule" of mathematical statistics, in normally distributed (or nearly normal) series, deviations from the arithmetic mean not exceeding ±3σ occur in 997 cases out of 1000. Thus, knowing $\bar{x}$ and σ, you can get a general initial idea of a variation series. For example, suppose the average salary of an employee in a company is 25,000 rubles and σ equals 100 rubles. Then we can say, with probability close to certainty, that the wages of the company's employees lie within 25,000 ± 3 × 100 rubles, i.e. from 24,700 to 25,300 rubles.
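The salary example and the "997 out of 1000" figure can both be reproduced in a few lines. The probability comes from `statistics.NormalDist`, which assumes an exactly normal distribution.

```python
from statistics import NormalDist

mean_salary = 25_000  # rubles, from the example above
sigma = 100           # rubles

low, high = mean_salary - 3 * sigma, mean_salary + 3 * sigma
share = NormalDist().cdf(3) - NormalDist().cdf(-3)  # P(|x - mean| <= 3 sigma)
print(low, high, round(share, 4))  # 24700 25300 0.9973
```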

Standard Deviation is a classic indicator of variability from descriptive statistics.

Standard deviation (also called root-mean-square deviation or sample standard deviation; abbreviated STD or StDev) is a very common measure of dispersion in descriptive statistics. Because technical analysis is closely related to statistics, this indicator can (and should) be used in technical analysis to measure how widely the price of the analyzed instrument is dispersed over time. It is denoted by the Greek letter sigma, σ.

Thanks to Carl Friedrich Gauss and Karl Pearson for giving us the standard deviation.

When we use the standard deviation in technical analysis, we turn this "dispersion indicator" into a "volatility indicator", keeping the meaning but changing the terminology.

What is standard deviation

Beyond its role in intermediate calculations, the standard deviation is perfectly suitable as an indicator in its own right and has its own applications in technical analysis. As burdock, an active reader of our magazine, noted: "I still don't understand why the standard deviation is not included in the set of standard indicators of domestic dealing centers."

Indeed, the standard deviation measures the variability of an instrument in a classic, "pure" way. Unfortunately, this indicator is not widely used in securities analysis.
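As a volatility indicator, the standard deviation is usually computed over a sliding window of recent prices. The sketch below is an illustration, not any platform's actual indicator. The function name `rolling_std`, the window length and the prices are hypothetical, and it assumes the population form (dividing by the window length), whereas real platforms may use n − 1.

```python
import math

def rolling_std(prices, period):
    """Standard deviation of each window of `period` consecutive prices.

    Assumption: divides by `period` (population form); platforms may differ.
    """
    if period < 1:
        raise ValueError("period must be at least 1")
    result = []
    for end in range(period, len(prices) + 1):
        window = prices[end - period:end]
        mean = sum(window) / period
        result.append(math.sqrt(sum((p - mean) ** 2 for p in window) / period))
    return result

closes = [1.10, 1.12, 1.11, 1.15, 1.20, 1.18]  # hypothetical closing prices
print([round(s, 4) for s in rolling_std(closes, period=3)])
```

A rising value means the price has become more scattered around its recent average, i.e. more volatile.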

Applying standard deviation

Calculating the standard deviation by hand is not very exciting, but it builds useful experience. The standard deviation can be expressed by the formula

$$STD = \sqrt{\frac{\sum (x - \bar{x})^2}{n}},$$

that is, the square root of the sum of the squared differences between the sample elements and the mean, divided by the number of elements in the sample.

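If the data are a sample from a larger population rather than the whole population, the denominator under the root is n − 1 instead of n (Bessel's correction). The difference is noticeable for small samples and becomes negligible for large ones; a common rule of thumb puts the threshold at about 30 elements.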

Step-by-step standard deviation calculation:

  1. calculate the arithmetic mean of the data sample
  2. subtract this mean from each sample element
  3. square all the resulting differences
  4. sum up all the resulting squares
  5. divide the sum by the number of elements in the sample, or by n − 1 if the data are a sample from a larger population (this quotient is the variance)
  6. take the square root of the quotient to obtain the standard deviation
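The six steps map directly onto code. In this minimal sketch, the function name `std_dev` and its `divide_by_n_minus_1` flag are hypothetical, and the prices are invented. The result is compared with `statistics.stdev` (divides by n − 1) and `statistics.pstdev` (divides by n).

```python
import math
import statistics

def std_dev(sample, divide_by_n_minus_1=True):
    n = len(sample)
    mean = sum(sample) / n                           # 1. arithmetic mean
    diffs = [x - mean for x in sample]               # 2. subtract the mean from each element
    squares = [d * d for d in diffs]                 # 3. square the differences
    total = sum(squares)                             # 4. sum the squares
    denominator = n - 1 if divide_by_n_minus_1 else n
    variance = total / denominator                   # 5. divide: this quotient is the variance
    return math.sqrt(variance)                       # 6. square root: the standard deviation

closes = [1.10, 1.12, 1.11, 1.15, 1.20]  # hypothetical closing prices
print(std_dev(closes), statistics.stdev(closes))
print(std_dev(closes, divide_by_n_minus_1=False), statistics.pstdev(closes))
```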