

Lesson No. 4

Topic: "Descriptive statistics. Indicators of trait diversity in a population"

The main criteria of the diversity of a characteristic in a statistical population are the limit, the amplitude, the standard deviation, the coefficient of oscillation and the coefficient of variation. As discussed in the previous lesson, average values give only a generalized characteristic of the trait being studied and do not take into account the values of its individual variants: the minimum and maximum values, values above and below the average, and so on.

Example. The averages of the two different number sequences -100; -20; 100; 20 and 0.1; -0.2; 0.1 are identical and equal to 0. However, the ranges of scatter of the data around these identical means are very different.

The listed criteria of the diversity of a trait are determined primarily with regard to its value in the individual elements of the statistical population.

Indicators measuring the variation of a trait are either absolute or relative. The absolute indicators of variation include the range of variation, the limit, the standard deviation and the variance. The coefficient of variation and the coefficient of oscillation are relative measures of variation.

Limit (lim) is a criterion determined by the extreme values of the variants in a variation series. In other words, this criterion is bounded by the minimum and maximum values of the attribute: lim = (a_min; a_max).

Amplitude (Am), or range of variation, is the difference between the extreme variants. It is calculated by subtracting the minimum value of the attribute from its maximum value, which allows the degree of scatter of the variants to be estimated: Am = a_max - a_min.
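As a quick illustration, both criteria can be computed in a few lines of Python (the data here are hypothetical):

```python
# Limit and amplitude (range of variation) of a variation series.
data = [12, 15, 11, 18, 14, 20, 13]

lim = (min(data), max(data))       # limit: the extreme values of the series
amplitude = max(data) - min(data)  # amplitude: maximum minus minimum

print(lim)        # (11, 20)
print(amplitude)  # 9
```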

The disadvantage of the limit and the amplitude as criteria of variability is that they depend entirely on the extreme values of the characteristic in the variation series; fluctuations of the attribute values within the series are not taken into account.

The most complete description of the diversity of a trait in a statistical population is provided by the standard deviation (sigma, σ), which is a general measure of the deviation of the variants from their average value. The mean square deviation is often simply called the standard deviation.

The standard deviation is based on comparing each variant with the arithmetic mean of the given population. Since the population will always contain variants both smaller and larger than the mean, the sum of the deviations with the sign "+" is canceled out by the sum of the deviations with the sign "-", i.e. the sum of all the deviations is zero. To avoid the influence of the signs of the differences, the squared deviations from the arithmetic mean are taken, i.e. (a - M)². The sum of the squared deviations is not equal to zero. To obtain a coefficient capable of measuring variability, the average of the sum of squares is taken; this quantity is called the variance:

D = Σ (a - M)² / n

In essence, the variance is the average square of the deviations of the individual values of a characteristic from its mean value. The variance is the square of the standard deviation.

Variance is a dimensional (named) quantity. Thus, if the variants of a number series are expressed in meters, the variance is in square meters; if the variants are expressed in kilograms, the variance is in the square of that unit (kg²), and so on.

The standard deviation is the square root of the variance: σ = √D.

If the number of observations is less than 30 (a small sample), then when calculating the variance and the standard deviation, n - 1 must be put in the denominator of the fraction instead of n.

The calculation of the standard deviation can be divided into six stages, which must be carried out in a certain sequence:

1. Calculate the arithmetic mean of the series.
2. Find the deviation of each variant from the mean.
3. Square each deviation.
4. Sum the squared deviations.
5. Divide the sum by the number of observations (by n - 1 when n < 30) to obtain the variance.
6. Extract the square root of the variance.
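The six-stage calculation can be sketched in Python; the sample values below are hypothetical, and since n < 30 the denominator n - 1 is used:

```python
# The six stages of computing the standard deviation, carried out in order.
from math import sqrt

data = [3, 5, 7, 9, 11]
n = len(data)

mean = sum(data) / n                    # 1) arithmetic mean
deviations = [a - mean for a in data]   # 2) deviations from the mean
squares = [d ** 2 for d in deviations]  # 3) squared deviations
total = sum(squares)                    # 4) sum of squared deviations
variance = total / (n - 1)              # 5) divide by n - 1 (since n < 30)
sigma = sqrt(variance)                  # 6) square root gives sigma

print(round(sigma, 3))
```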

Application of standard deviation:

a) for judging the variability of variation series and comparative assessment of the typicality (representativeness) of arithmetic averages. This is necessary in differential diagnosis when determining the stability of traits.

b) to reconstruct a variation series, i.e. to restore its frequency distribution on the basis of the three sigma rule. In the interval (M ± 1σ) lie 68.3% of all the variants of the series, in the interval (M ± 2σ) 95.5%, and in the interval (M ± 3σ) 99.7% of the variants (Fig. 1).

c) to identify outlying ("pop-up") variants

d) to determine the parameters of norm and pathology using sigma estimates

e) to calculate the coefficient of variation

f) to calculate the average error of the arithmetic mean.

To characterize any population that has a normal type of distribution, it is enough to know two parameters: the arithmetic mean and the standard deviation.

Figure 1. Three Sigma rule

Example.

In pediatrics, the standard deviation is used to assess the physical development of children by comparing the data of a particular child with the corresponding standard indicators. The arithmetic mean of the physical development of healthy children is taken as the standard. Comparison of indicators with the standards is carried out using special tables in which the standards are given along with their corresponding sigma scales. It is believed that if the child's physical development indicator is within the standard (mean) ± σ, then the child's physical development (according to this indicator) corresponds to the norm. If the indicator is within the standard ± 2σ, there is a slight deviation from the norm. If the indicator goes beyond these limits, the child's physical development differs sharply from the norm (pathology is possible).

In addition to variation indicators expressed in absolute values, statistical research uses variation indicators expressed in relative values. The oscillation coefficient is the ratio of the range of variation to the average value of the trait. The coefficient of variation is the ratio of the standard deviation to the average value of the trait. Both values are usually expressed as percentages.

Formulas for calculating the relative variation indicators: the oscillation coefficient K = (Am / M) × 100%; the coefficient of variation V = (σ / M) × 100%.

From the above formulas it is clear that the closer the coefficient V is to zero, the smaller the variation of the values of the characteristic; the larger V, the more variable the trait.

In statistical practice, the coefficient of variation is used most often. It is used not only for a comparative assessment of variation, but also to characterize the homogeneity of the population. The population is considered homogeneous if the coefficient of variation does not exceed 33% (for distributions close to normal). Arithmetically, taking the ratio of σ to the arithmetic mean neutralizes the influence of the absolute magnitude of these characteristics, and the percentage ratio makes the coefficient of variation a dimensionless (unnamed) value.
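A minimal sketch of both relative indicators and of the 33% homogeneity check, on hypothetical data:

```python
# Coefficient of variation and oscillation coefficient as relative measures.
from statistics import mean, pstdev

data = [62, 68, 70, 71, 74, 75, 80]
m = mean(data)
sigma = pstdev(data)  # population standard deviation

v = sigma / m * 100                              # coefficient of variation, %
oscillation = (max(data) - min(data)) / m * 100  # oscillation coefficient, %

homogeneous = v <= 33  # homogeneity rule for near-normal distributions
print(round(v, 1), round(oscillation, 1), homogeneous)
```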

The resulting value of the coefficient of variation is estimated in accordance with the approximate gradations of the degree of diversity of the trait:

Weak - up to 10%

Average - 10 - 20%

Strong - more than 20%

The use of the coefficient of variation is advisable in cases where it is necessary to compare characteristics that are different in size and dimension.

The difference between the coefficient of variation and the other scatter criteria is clearly demonstrated by the following example.

Table 1

Composition of industrial enterprise workers

Based on the statistical characteristics given in the example, we can draw a conclusion about the relative homogeneity of the age composition and educational level of the enterprise's employees, given the low professional stability of the surveyed contingent. It is easy to see that an attempt to judge these social trends by the standard deviation would lead to an erroneous conclusion, and an attempt to compare the characteristics "work experience" and "age" with the characteristic "education" would be altogether incorrect because of the heterogeneity of these characteristics.

Median and percentiles

For ordinal (rank) distributions, where the criterion for the middle of the series is the median, the standard deviation and dispersion cannot serve as characteristics of the dispersion of the variant.

The same is true for open variation series. This is because the deviations from which the variance and σ are calculated are measured from the arithmetic mean, which is not calculated in open variation series or in distribution series of qualitative characteristics. Therefore, for a compressed description of distributions, another scatter parameter is used, the quantile (synonym: "percentile"), suitable for describing qualitative and quantitative characteristics with any form of distribution. This parameter can also be used to convert quantitative characteristics into qualitative ones. In this case, ranks are assigned depending on the order of the quantile to which a particular variant corresponds.

In the practice of biomedical research, the following quantiles are most often used:

Me – the median (the 0.5 quantile);

Q1, Q3 – the quartiles (quarters), where Q1 is the lower quartile and Q3 is the upper quartile.

Quantiles divide the range of possible values of a variation series into intervals. The median (the 0.5 quantile) is the variant standing in the middle of a variation series, dividing the series into two equal halves (0.5 and 0.5). The quartiles divide the series into four parts: the lower quartile (Q1) separates the variants whose numerical values do not exceed 25% of the maximum possible in the given series; the middle quartile (the median) separates variants with numerical values up to 50% of the maximum possible; the upper quartile (Q3) separates variants up to 75% of the maximum possible values.

In the case of a distribution that is asymmetric relative to the arithmetic mean, the median and quartiles are used to characterize the variable. In this case the average value is displayed in the form Me (Q1; Q3). For example, the trait under study, "the age at which the child began to walk independently", has an asymmetric distribution in the study group. The lower quartile (Q1) corresponds to a walking onset of 9.5 months, the median to 11 months, and the upper quartile (Q3) to 12 months. Accordingly, the average trend of the attribute is reported as 11 (9.5; 12) months.
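The Me (Q1; Q3) form of reporting can be sketched with the standard library; the ages below are hypothetical illustration data, not the study data:

```python
# Median and quartiles for an asymmetric distribution, reported as Me (Q1; Q3).
from statistics import median, quantiles

months = [9, 9.5, 10, 10.5, 11, 11, 11.5, 12, 12.5, 14, 16]
q1, me, q3 = quantiles(months, n=4, method="inclusive")

print(f"{median(months)} ({q1}; {q3})")
```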

Assessing the statistical significance of the study results

The statistical significance of data is understood as the degree to which it corresponds to the displayed reality, i.e. statistically significant data are those that do not distort and correctly reflect objective reality.

Assessing the statistical significance of the research results means determining with what probability the results obtained from the sample population can be transferred to the entire population. Assessing statistical significance is necessary in order to understand to what extent a part of a phenomenon can be used to judge the phenomenon as a whole and its patterns.

The assessment of the statistical significance of the research results consists of:

1. errors of representativeness (errors of average and relative values) - m;

2. confidence limits of average or relative values;

3. reliability of the difference in average or relative values ​​according to the criterion t.

The standard error of the arithmetic mean, or representativeness error, characterizes the fluctuation of the mean. It should be noted that the larger the sample size, the smaller the spread of the sample means. The standard error of the mean is calculated by the formula: m = σ / √n.

In the modern scientific literature, the arithmetic mean is written together with the representativeness error, M ± m,

or together with the standard deviation, M ± σ.

As an example, consider data on 1,500 city clinics in the country (the general population). The average number of patients served in a clinic is 18,150 people. Random selection of 10% of the sites (150 clinics) gives an average number of patients equal to 20,051 people. The sampling error, obviously due to the fact that not all 1,500 clinics were included in the sample, is equal to the difference between these means: the general mean (M_gen) and the sample mean (M_sample). If we formed another sample of the same size from our population, it would give a different error value. With sufficiently large samples, and with a sufficiently large number of repeated samples of the same size from the general population, all these sample means are distributed normally around the general mean. The standard error of the mean m is this inevitable spread of the sample means around the general mean.

In the case when the research results are presented in relative quantities (for example, percentages), the standard error of the fraction is calculated:

m = √( P (100 - P) / n ),

where P is the indicator in % and n is the number of observations.

The result is displayed as (P ± m)%. For example, the percentage of recovery among patients was (95.2±2.5)%.
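Assuming the usual formula m = √(P(100 − P)/n), the (95.2 ± 2.5)% figure above can be reproduced in Python; the sample size n = 73 is an assumption chosen for the illustration, not a value given in the text:

```python
# Standard error of a proportion expressed in percent: m = sqrt(P*(100-P)/n).
from math import sqrt

P = 95.2  # recovery rate, %, from the example above
n = 73    # assumed number of observations (hypothetical)

m = sqrt(P * (100 - P) / n)
print(f"({P} ± {round(m, 1)})%")  # (95.2 ± 2.5)%
```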

If the number of elements of the population is less than 30, then when calculating the standard errors of the mean and of the fraction, n - 1 must be put in the denominator instead of n.

For a normal distribution (and the distribution of sample means is normal), we know what portion of the population falls within any interval around the mean. In particular, 68.3% of the values lie within M ± σ, 95.5% within M ± 2σ and 99.7% within M ± 3σ.

In practice, the problem is that the characteristics of the general population are unknown to us, and the sample is made precisely in order to estimate them. This means that if we draw samples of the same size n from the general population, then in 68.3% of cases the interval M_sample ± m will contain the value M (in 95.5% of cases it will be in the interval M_sample ± 2m, and in 99.7% of cases in the interval M_sample ± 3m).

Since only one sample is actually taken, this statement is formulated in terms of probability: with a probability of 68.3%, the average value of the attribute in the population lies in the interval M_sample ± m; with a probability of 95.5%, in the interval M_sample ± 2m, and so on.

In practice, an interval is built around the sample value such that, with a given (sufficiently high) probability, the confidence probability, it "covers" the true value of the parameter in the general population. This interval is called the confidence interval.

The confidence probability P is the degree of confidence that the confidence interval will actually contain the true (unknown) value of the parameter in the population.

For example, if the confidence probability P is 90%, this means that 90 samples out of 100 will give a correct estimate of the parameter in the population. Accordingly, the probability of error, i.e. of an incorrect estimate of the general mean from the sample, is equal, in percent, to 100% - 90% = 10%. For this example, this means that 10 samples out of 100 will give an incorrect estimate.

Obviously, the degree of confidence (confidence probability) depends on the size of the interval: the wider the interval, the higher the confidence that an unknown value for the population will fall into it. In practice, at least twice the sampling error is used to construct a confidence interval to provide at least 95.5% confidence.

Determining the confidence limits of averages and relative values ​​allows us to find their two extreme values ​​- the minimum possible and the maximum possible, within which the studied indicator can occur in the entire population. Based on this, confidence limits (or confidence interval)- these are the boundaries of average or relative values, beyond which due to random fluctuations there is an insignificant probability.

The confidence interval can therefore be written as M ± t·m, where t is the confidence criterion.

The confidence limits of the arithmetic mean in the general population are determined by the formula:

M_gen = M_sample ± t·m_M

for a relative value:

P_gen = P_sample ± t·m_P

where M_gen and P_gen are the values of the mean and the relative value in the general population; M_sample and P_sample are the values of the mean and the relative value obtained from the sample population; m_M and m_P are the errors of the mean and of the relative value; t is the confidence criterion (an accuracy criterion established when planning the study, which can be equal to 2 or 3); t·m is the confidence interval, or Δ, the maximum error of the indicator obtained in the sample study.

It should be noted that the value of the criterion t is related, to a certain extent, to the probability of an error-free forecast (p), expressed in %. It is chosen by the researcher, guided by the need to obtain the result with the required degree of accuracy. Thus, for a probability of an error-free forecast of 95.5% the value of the criterion t is 2, and for 99.7% it is 3.

The given estimates of the confidence interval are acceptable only for statistical populations with more than 30 observations. For a smaller population size (small samples), special tables are used to determine the criterion t. In these tables, the desired value is found at the intersection of the row corresponding to the size of the population (n - 1) and the column corresponding to the probability level of an error-free forecast (95.5%; 99.7%) chosen by the researcher. In medical research, when establishing confidence limits for any indicator, the probability of an error-free forecast of 95.5% or more is accepted. This means that the value of the indicator obtained from the sample population must be found in the general population in at least 95.5% of cases.
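For n > 30, the confidence limits M_gen = M_sample ± t·m can be sketched as follows; the sample figures are hypothetical:

```python
# Confidence limits of the mean with t = 2 (P = 95.5%) and t = 3 (P = 99.7%).
from math import sqrt

m_sample = 118.0  # sample arithmetic mean (hypothetical)
sigma = 12.0      # standard deviation (hypothetical)
n = 100           # number of observations, n > 30

m = sigma / sqrt(n)  # standard error of the mean
for t, p in ((2, 95.5), (3, 99.7)):
    low, high = m_sample - t * m, m_sample + t * m
    print(f"P = {p}%: from {round(low, 1)} to {round(high, 1)}")
```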

Questions on the topic of the lesson:

1. Relevance of indicators of trait diversity in a statistical population.
2. General characteristics of absolute variation indicators.
3. Standard deviation: calculation, application.
4. Relative measures of variation.
5. Median, quartile score.
6. Assessing the statistical significance of study results.
7. Standard error of the arithmetic mean: calculation formula, example of use.
8. Calculation of the proportion and its standard error.
9. The concept of confidence probability, an example of use.
10. The concept of a confidence interval, its application.

Test tasks on the topic with standard answers:

1. ABSOLUTE INDICATORS OF VARIATION INCLUDE

1) coefficient of variation

2) oscillation coefficient

4) median

2. RELATIVE INDICATORS OF VARIATION INCLUDE

1) dispersion

4) coefficient of variation

3. CRITERION WHICH IS DETERMINED BY THE EXTREME VALUES OF AN OPTION IN A VARIATION SERIES

2) amplitude

3) dispersion

4) coefficient of variation

4. THE DIFFERENCE OF EXTREME OPTIONS IS

2) amplitude

3) standard deviation

4) coefficient of variation

5. THE AVERAGE SQUARE OF DEVIATIONS OF INDIVIDUAL VALUES OF A CHARACTERISTIC FROM ITS AVERAGE VALUES IS

1) oscillation coefficient

2) median

3) dispersion

6. THE RATIO OF THE RANGE OF VARIATION TO THE AVERAGE VALUE OF A CHARACTERISTIC IS

1) coefficient of variation

2) standard deviation

4) oscillation coefficient

7. THE RATIO OF THE STANDARD DEVIATION TO THE AVERAGE VALUE OF A CHARACTERISTIC IS

1) dispersion

2) coefficient of variation

3) oscillation coefficient

4) amplitude

8. THE OPTION THAT IS IN THE MIDDLE OF THE VARIATION SERIES AND DIVIDES IT INTO TWO EQUAL PARTS IS

1) median

3) amplitude

9. IN MEDICAL RESEARCH, WHEN ESTABLISHING CONFIDENCE LIMITS FOR ANY INDICATOR, THE PROBABILITY OF AN ERROR-FREE PREDICTION IS ACCEPTED

10. IF 90 SAMPLES OUT OF 100 GIVE THE CORRECT ESTIMATE OF A PARAMETER IN THE POPULATION, THIS MEANS THAT THE CONFIDENCE PROBABILITY P IS EQUAL TO

11. IF 10 SAMPLES OUT OF 100 GIVE AN INCORRECT ESTIMATE, THE PROBABILITY OF ERROR IS EQUAL

12. THE LIMITS OF AVERAGE OR RELATIVE VALUES, GOING BEYOND WHICH DUE TO RANDOM FLUCTUATIONS HAS ONLY A SMALL PROBABILITY, ARE CALLED

1) confidence interval

2) amplitude

4) coefficient of variation

13. A SAMPLE IS CONSIDERED SMALL IF

1) n is less than or equal to 100

2) n is less than or equal to 30

3) n is less than or equal to 40

4) n is close to 0

14. FOR THE PROBABILITY OF AN ERROR-FREE FORECAST 95% CRITERION VALUE t IS

15. FOR THE PROBABILITY OF AN ERROR-FREE FORECAST 99% CRITERION VALUE t IS

16. FOR DISTRIBUTIONS CLOSE TO NORMAL, THE POPULATION IS CONSIDERED HOMOGENEOUS IF THE COEFFICIENT OF VARIATION DOES NOT EXCEED

17. THE VARIANT SEPARATING THE VARIANTS WHOSE NUMERICAL VALUES DO NOT EXCEED 25% OF THE MAXIMUM POSSIBLE IN A GIVEN SERIES IS

2) lower quartile

3) upper quartile

4) quartile

18. DATA THAT DOES NOT DISTORT AND CORRECTLY REFLECTS OBJECTIVE REALITY IS CALLED

1) impossible

2) equally possible

3) reliable

4) random

19. ACCORDING TO THE "THREE SIGMA" RULE, WITH A NORMAL DISTRIBUTION OF A CHARACTERISTIC, WITHIN M ± 1σ WILL BE LOCATED

1) 68.3% of the variants

Values ​​obtained from experience inevitably contain errors due to a wide variety of reasons. Among them, one should distinguish between systematic and random errors. Systematic errors are caused by reasons that act in a very specific way, and can always be eliminated or taken into account quite accurately. Random errors are caused by a very large number of individual causes that cannot be accurately accounted for and act in different ways in each individual measurement. These errors cannot be completely excluded; they can only be taken into account on average, for which it is necessary to know the laws that govern random errors.

We will denote the measured quantity by A, and the random error in the measurement by x. Since the error x can take on any value, it is a continuous random variable, which is fully characterized by its distribution law.

The simplest law, and the one that most accurately reflects reality (in the vast majority of cases), is the so-called normal law of error distribution:

φ(x) = (1 / (σ √(2π))) · e^(−x² / (2σ²))

This distribution law can be obtained from various theoretical premises, in particular from the requirement that the most probable value of an unknown quantity, for which a series of values with the same degree of accuracy has been obtained by direct measurement, is the arithmetic mean of these values. The quantity σ² is called the variance of this normal law.

Average

Determination of the variance from experimental data. If for some quantity A, n values a_i are obtained by direct measurement with the same degree of accuracy, and if the errors of A obey the normal distribution law, then the most probable value of A is the arithmetic mean:

a = (1/n) Σ a_i, where

a - arithmetic mean,

a i - measured value at the i-th step.

The deviation of the observed value a_i (for each observation) from the arithmetic mean is a_i - a.

To determine the variance of the normal error distribution law in this case, the following formula is used:

σ² = Σ (a_i - a)² / (n - 1), where

σ² - variance,
a - arithmetic mean,
n - number of parameter measurements,
a_i - measured value at the i-th step.

Standard deviation

The standard deviation shows the absolute deviation of the measured values from the arithmetic mean. In accordance with the formula for the measure of accuracy of a linear combination, the mean square error of the arithmetic mean is determined by the formula:

σ_a = √( Σ (a_i - a)² / (n (n - 1)) ), where

σ_a - mean square error of the arithmetic mean,
a - arithmetic mean,
n - number of parameter measurements,
a_i - measured value at the i-th step.

The coefficient of variation

The coefficient of variation characterizes the relative measure of the deviation of the measured values from the arithmetic mean:

V = (σ / a) × 100%, where

V - coefficient of variation,
σ - standard deviation,
a - arithmetic mean.

The larger the value of the coefficient of variation, the relatively greater the scatter and the lower the uniformity of the studied values. If the coefficient of variation is less than 10%, the variability of the variation series is considered insignificant; from 10% to 20%, average; more than 20% but less than 33%, significant; and if the coefficient of variation exceeds 33%, this indicates heterogeneity of the data and the need to exclude the largest and smallest values.

Average linear deviation

One of the indicators of the scope and intensity of variation is the average linear deviation (the mean absolute deviation) from the arithmetic mean. The average linear deviation is calculated by the formula:

d = Σ |a_i - a| / n, where

d - average linear deviation,
a - arithmetic mean,
n - number of parameter measurements,
a_i - measured value at the i-th step.
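The average linear deviation is straightforward to compute; the values below are hypothetical:

```python
# Average linear deviation: the mean of the absolute deviations from the mean.
data = [4, 7, 8, 10, 11]
n = len(data)

mean = sum(data) / n
avg_linear = sum(abs(a - mean) for a in data) / n  # mean absolute deviation

print(avg_linear)  # 2.0
```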

To check whether the studied values comply with the normal distribution law, the ratio of the asymmetry (skewness) indicator to its error and the ratio of the kurtosis indicator to its error are used.

Asymmetry indicator

The asymmetry indicator (A) and its error (m_a) are calculated using the following formulas:

A = Σ (a_i - a)³ / (n σ³), m_a ≈ √(6 / n), where

A - asymmetry indicator,
σ - standard deviation,
a - arithmetic mean,
n - number of parameter measurements,
a_i - measured value at the i-th step.

Kurtosis indicator

The kurtosis indicator (E) and its error (m_e) are calculated using the following formulas:

E = Σ (a_i - a)⁴ / (n σ⁴) - 3, m_e ≈ 2 √(6 / n),

where the notation is the same as above.

Expectation and variance

Let us measure a random variable N times; for example, we measure the wind speed ten times and want to find the average value. How is the average value related to the distribution function?

Let us roll a die a large number of times. The number of points that appears on the die with each throw is a random variable and can take any natural value from 1 to 6. The arithmetic average of the points, calculated over all the throws, is also a random variable, but for large N it tends to a very specific number, the mathematical expectation M_x. In this case M_x = 3.5.

How is this value obtained? Suppose that in N trials 1 point comes up N_1 times, 2 points come up N_2 times, and so on. Then as N → ∞ the fraction N_i / N of the outcomes in which i points were rolled tends to p_i = 1/6. Hence M_x = 1·(1/6) + 2·(1/6) + ... + 6·(1/6) = 3.5.
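This limiting behavior is easy to check empirically; a short simulation (with a fixed random seed for reproducibility) shows the average of many rolls settling near 3.5:

```python
# Empirical check that the average of dice rolls tends to M_x = 3.5 as N grows.
import random

random.seed(1)  # fixed seed so the run is reproducible
N = 100_000
total = sum(random.randint(1, 6) for _ in range(N))
average = total / N

print(abs(average - 3.5) < 0.05)  # the sample mean is close to 3.5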


Let us now assume that we know the distribution law of the random variable x, that is, we know that x can take the values x_1, x_2, ..., x_k with probabilities p_1, p_2, ..., p_k.

The expected value M_x of the random variable x equals:

M_x = x_1 p_1 + x_2 p_2 + ... + x_k p_k


The mathematical expectation is not always a reasonable estimate of some random variable. Thus, to estimate the average salary, it is more reasonable to use the concept of median, that is, such a value that the number of people receiving a salary lower than the median and a higher one coincide.

The median of a random variable is a number x_{1/2} such that p(x < x_{1/2}) = 1/2.

In other words, the probability p_1 that the random variable x is smaller than x_{1/2} and the probability p_2 that it is greater than x_{1/2} are identical and equal to 1/2. The median is not uniquely determined for all distributions.

Let's return to the random variable x, which can take values x 1 , x 2 , ..., x k with probabilities p 1 , p 2 , ..., p k.

The variance D_x of a random variable x is the average value of the squared deviation of the random variable from its mathematical expectation:

D_x = M( (x - M_x)² )

Example 2

Under the conditions of the previous example, calculate the variance and standard deviation of the random variable x.

Answer. 0,16, 0,4.


Example 3

Find the probability distribution of the number of points that appear on the dice on the first throw, the median, the mathematical expectation, the variance and the standard deviation.

Any face is equally likely to come up, so the distribution is p_i = 1/6 for each of the values 1, ..., 6:

The standard deviation is σ = √D_x ≈ 1.71. It can be seen that the deviation of the value from its average is quite large.
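The figures of this example can be computed directly from the distribution p_i = 1/6:

```python
# Expectation, variance and standard deviation of one die throw,
# computed directly from the distribution p_i = 1/6.
from math import sqrt

values = [1, 2, 3, 4, 5, 6]
p = 1 / 6

m_x = sum(x * p for x in values)               # expectation, 3.5
d_x = sum((x - m_x) ** 2 * p for x in values)  # variance, 35/12
sigma = sqrt(d_x)

print(round(m_x, 2), round(d_x, 2), round(sigma, 2))
```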

Properties of mathematical expectation:

  • The mathematical expectation of the sum of independent random variables is equal to the sum of their mathematical expectations:

Example 4

Find the mathematical expectation of the sum and product of points rolled on two dice.

In Example 3 we found that for one die M(x) = 3.5. So for two dice M(x + y) = 3.5 + 3.5 = 7, and since the throws are independent, M(x·y) = 3.5 · 3.5 = 12.25.

Dispersion properties:

  • The variance of the sum of independent random variables is equal to the sum of the variances:

D_{x+y} = D_x + D_y.

Let y be the total number of points in N throws of the die. Then D_y = N·D_x and σ_y = σ_x √N.

This result is true not only for dice throws. In many cases it determines the accuracy of measuring the mathematical expectation empirically: as the number of measurements N increases, the spread of the average value around the expectation, that is, the standard deviation of the mean, decreases in proportion to 1/√N.

The variance of a random variable is related to the mathematical expectation of the square of this random variable by the following relation:

D_x = M(x²) - (M_x)²

Indeed, by definition D_x = M((x - M_x)²) = M(x² - 2x·M_x + (M_x)²). Taking the mathematical expectation of the right-hand side and using the properties of mathematical expectation, we obtain M(x²) - 2(M_x)² + (M_x)² = M(x²) - (M_x)².

Standard deviation

The standard deviation is equal to the square root of the variance: σ = √D. When determining the standard deviation for a sufficiently large volume of the studied population (n > 30), the formula σ = √( Σ (a_i - a)² / n ) is used.



The Excel program is highly valued by both professionals and amateurs, because users of any skill level can work with it. For example, anyone with minimal Excel skills can draw a simple graph, make a decent table, and so on.

At the same time, this program allows you to perform various kinds of calculations, although this requires a slightly different level of training. However, if you have only just begun a close acquaintance with this program and are interested in everything that will help you become a more advanced user, this article is for you. Today I will tell you what the standard deviation formula in Excel is, why it is needed at all and, strictly speaking, when it is used. Let's go!

What it is

Let's start with the theory. The standard deviation is usually defined as the square root of the arithmetic mean of the squared deviations of all the available values from their arithmetic mean. This value is usually denoted by the Greek letter sigma (σ). In Excel the standard deviation is calculated with the STDEV function, so the program does the work for the user itself. The point of this concept is to identify the degree of variability of an instrument; that is, it is, in its own way, an indicator from descriptive statistics: it identifies changes in the volatility of an instrument over a certain period of time. The STDEV functions can be used to estimate the standard deviation of a sample, ignoring logical and text values.

The formula that helps calculate the standard deviation is provided automatically in Excel. To find it, open the formulas section in Excel and select the function called STDEV; it is that simple.

After this, a window will appear in front of you in which you need to enter the data for the calculation: in the special fields, enter the numbers (or a range of cells), after which the program itself will calculate the standard deviation for the sample.

Undoubtedly, mathematical formulas and calculations are a rather complex issue, and not all users can cope with it straight away. However, if you dig a little deeper and look at the issue in a little more detail, it turns out that not everything is so sad. I hope you are convinced of this using the example of calculating the standard deviation.


Dispersion. Standard deviation

Dispersion (variance) is the arithmetic mean of the squared deviations of each value of the attribute from the overall mean. Depending on the source data, the variance can be unweighted (simple) or weighted.

The variance is calculated using the following formulas:

· for ungrouped data: D = Σ (x - x̄)² / n

· for grouped data: D = Σ (x - x̄)² f / Σ f

The procedure for calculating the weighted variance:

1. determine the arithmetic weighted average

2. deviations of the variant from the average are determined

3. square the deviation of each option from the average

4. multiply the squares of deviations by weights (frequencies)

5. summarize the resulting products

6. divide the resulting sum by the sum of the weights
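The six steps above can be sketched directly in Python; the variants and frequencies are hypothetical:

```python
# The six steps for the weighted variance on grouped data (x, f).
from math import sqrt

x = [2, 4, 6, 8]  # variants
f = [1, 3, 4, 2]  # weights (frequencies)

total_f = sum(f)
mean = sum(xi * fi for xi, fi in zip(x, f)) / total_f  # 1) weighted mean
dev = [xi - mean for xi in x]                          # 2) deviations
sq = [d ** 2 for d in dev]                             # 3) squared deviations
weighted = [s * fi for s, fi in zip(sq, f)]            # 4) times frequencies
total = sum(weighted)                                  # 5) sum the products
variance = total / total_f                             # 6) divide by sum of weights

print(round(variance, 2), round(sqrt(variance), 2))
```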

The formula for determining the simple variance can be converted into the following form:

D = Σ x² / n - (x̄)²

The procedure for calculating the simple variance by the transformed formula:

1. determine the arithmetic mean

2. square the arithmetic mean

3. square each option in the row

4. find the sum of squares option

5. divide the sum of squares by their number, i.e. determine the mean square

6. determine the difference between the mean square of the characteristic and the square of the average

Similarly, the formula for determining the weighted variance can be converted into the following form:

D = Σ x² f / Σ f - (x̄)²

i.e. the variance is equal to the difference between the mean of the squared values of the attribute and the square of the arithmetic mean. When the transformed formula is used, the additional step of calculating the deviations of the individual values of the characteristic from x̄ is eliminated, as is the calculation error associated with rounding the deviations.
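That the transformed formula gives the same result as the direct definition can be checked on hypothetical data:

```python
# Variance: the mean of squares minus the square of the mean equals
# the direct definition as the mean squared deviation.
data = [3, 6, 7, 9, 10]
n = len(data)

mean = sum(data) / n
direct = sum((a - mean) ** 2 for a in data) / n      # by definition
shortcut = sum(a * a for a in data) / n - mean ** 2  # transformed formula

print(abs(direct - shortcut) < 1e-9)  # the two formulas agree
```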

Dispersion has a number of properties, some of which make it easier to calculate:

1) the dispersion of a constant value is zero;

2) if all the variants of the attribute values are decreased by the same number, the variance does not change;

3) if all the variants of the attribute values are decreased by the same factor of k times, the variance decreases by a factor of k².

The standard deviation S is the square root of the variance:

· for ungrouped data: S = √( Σ (x - x̄)² / n )

· for a variation series: S = √( Σ (x - x̄)² f / Σ f )

The range of variation, linear mean and standard deviation are named quantities. They have the same units of measurement as the individual characteristic values.

Variance and standard deviation are the most widely used measures of variation. This is explained by the fact that they are included in most theorems of probability theory, which serves as the foundation of mathematical statistics. In addition, the variance can be decomposed into its component elements, which make it possible to evaluate the influence of various factors that determine the variation of a trait.

The calculation of the variation indicators for banks grouped by the amount of profit is shown in the table.

Profit, mln rub. | Number of banks, f | Midpoint, x | x·f | x - x̄ | |x - x̄|·f | (x - x̄)²·f
3.7 - 4.6 | 2 | 4.15 | 8.30 | -1.935 | 3.870 | 7.489
4.6 - 5.5 | 4 | 5.05 | 20.20 | -1.035 | 4.140 | 4.285
5.5 - 6.4 | 6 | 5.95 | 35.70 | -0.135 | 0.810 | 0.109
6.4 - 7.3 | 5 | 6.85 | 34.25 | +0.765 | 3.825 | 2.926
7.3 - 8.2 | 3 | 7.75 | 23.25 | +1.665 | 4.995 | 8.317
Total | 20 | | 121.70 | | 17.640 | 23.126
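The table can be verified in Python; the frequencies f below are recovered from the x·f column (x·f divided by the midpoint x), and the rest follows from the grouped-data formulas:

```python
# Re-computing the banks table: mean, average linear deviation d and S.
from math import sqrt

x = [4.15, 5.05, 5.95, 6.85, 7.75]  # interval midpoints
f = [2, 4, 6, 5, 3]                 # number of banks in each interval

n = sum(f)                                       # 20 banks in total
mean = sum(xi * fi for xi, fi in zip(x, f)) / n  # 121.70 / 20
d = sum(abs(xi - mean) * fi for xi, fi in zip(x, f)) / n
s = sqrt(sum((xi - mean) ** 2 * fi for xi, fi in zip(x, f)) / n)

print(round(mean, 3), round(d, 3), round(s, 3))
```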

The average linear deviation and the standard deviation show by how much, on average, the value of the characteristic fluctuates among the units of the population under study. Thus, in this case the average fluctuation of the amount of profit is 0.882 million rubles by the average linear deviation and 1.075 million rubles by the standard deviation. The standard deviation is always greater than the average linear deviation. If the distribution of the characteristic is close to normal, there is a relationship between S and d: S ≈ 1.25d, or d ≈ 0.8S. The standard deviation shows how the bulk of the units of the population are located relative to the arithmetic mean. Regardless of the shape of the distribution, at least 75% of the attribute values fall into the interval x̄ ± 2S, and at least 89% of all values fall into the interval x̄ ± 3S (P. L. Chebyshev's theorem).