Statistical significance: definition, concept, significance, regression equations and hypothesis testing

Anonim

Statistics has long been an integral part of life. People encounter it everywhere. Statistics are used to draw conclusions about which diseases are common and where, and about what is most in demand in a particular region or among a certain segment of the population. Even the political programs of candidates for government bodies are built on statistical data. Retail chains use the same data when purchasing goods, and manufacturers are guided by them in what they offer.

Statistics plays an important role in the life of society and affects each of its members, even in small things. For example, if statistics show that most people in a particular city or region prefer dark-colored clothes, then finding a bright yellow raincoat with a floral print in local outlets will be extremely difficult. But what quantities make up the data that have such an impact? For example, what is “statistically significant”? What exactly is meant by this term?

What is this?

Statistics as a science is built from a combination of different quantities and concepts. One of them is the concept of "statistical significance". A result is called statistically significant when the probability that it arose purely by chance is negligible.

[Figure: Calculation of statistical indicators]

For example, suppose 9 out of 10 people put on rubber boots for a morning mushroom-picking walk in the autumn forest after a rainy night. The probability that 8 of those 10 would instead put on canvas moccasins is negligible. In this example, the clear preference for rubber footwear is what is called “statistically significant.”
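To put a number on this example, one can ask how likely a 9-out-of-10 split would be if footwear were chosen at random. The sketch below is only an illustration. It assumes, which the article does not state, that each person independently picks rubber footwear with probability 0.5; the function name `tail_probability` is hypothetical.

```python
from math import comb

def tail_probability(successes, trials, p=0.5):
    """P(X >= successes) for X ~ Binomial(trials, p)."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )

# Assumption: with no real preference, each choice is a fair coin flip.
# Chance of seeing 9 or more rubber-boot wearers out of 10 in that case.
one_sided = tail_probability(9, 10)
print(f"P(at least 9 of 10) = {one_sided:.4f}")
```

Under that assumption the probability is about 1%, so a split this lopsided is unlikely to be a coincidence.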

Accordingly, to develop this practical example further, shoe stores buy rubber boots toward the end of the summer season in greater quantities than at other times of the year. In this way, a statistically significant result has an impact on ordinary life.

Of course, complex calculations, such as predicting the spread of viruses, take a large number of variables into account. But the essence of determining whether a statistical result is significant stays the same, regardless of the complexity of the calculations and the number of variables.

How is it calculated?

Statistical significance is calculated with equations. That is, it can be argued that in this case everything is decided by mathematics. The simplest calculation is a chain of mathematical operations that involves the following parameters:

  • two sets of results obtained from surveys or from objective data, such as purchase amounts, denoted by a and b;
  • the sample size of each group - n;
  • the pooled sample proportion - p;
  • the standard error - SE.

The next step is to compute the overall test statistic, t, and compare its absolute value with 1.96. This number is the critical value that bounds the central 95% of the distribution; for large samples, Student's t-distribution gives practically the same value as the normal distribution.
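A common form of this calculation is the pooled two-proportion test. It is given here as an assumption, since the article does not spell the formula out. Here p_1 and p_2 are the shares observed in the two groups and n_1 and n_2 are the group sizes:

```latex
\[
p = \frac{p_1 n_1 + p_2 n_2}{n_1 + n_2}, \qquad
SE = \sqrt{p\,(1 - p)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}, \qquad
t = \frac{p_1 - p_2}{SE}
\]
```

The difference between the groups is treated as significant at the 5% level when |t| exceeds 1.96.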

[Figure: Formula for simple calculation]

The question often arises of how the values of n and p differ. An example makes this easy to clarify. Let's say we are calculating the statistical significance of the difference in loyalty to a product or brand between men and women.

In this case, the letters stand for the following:

  • n - the number of respondents;
  • p - the share of respondents satisfied with the product.

The number of women interviewed is designated n1 and the number of men n2. The subscripts "1" and "2" on the symbol p have the same meaning.

Comparing the test statistic with the critical value from Student's t-tables is what determines "statistical significance".
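The loyalty example can be sketched in code. This is a minimal illustration, not the article's own procedure. It assumes the pooled two-proportion form of the test, and the function name `two_proportion_test` and the survey counts are made up.

```python
from math import sqrt

def two_proportion_test(satisfied_1, n_1, satisfied_2, n_2):
    """Return (t, SE, pooled p) for a pooled two-proportion test."""
    p_1 = satisfied_1 / n_1          # share of satisfied in group 1
    p_2 = satisfied_2 / n_2          # share of satisfied in group 2
    p = (satisfied_1 + satisfied_2) / (n_1 + n_2)   # pooled share
    se = sqrt(p * (1 - p) * (1 / n_1 + 1 / n_2))    # standard error
    t = (p_1 - p_2) / se
    return t, se, p

# Hypothetical survey: 60 of 100 women and 45 of 100 men are satisfied.
t, se, p = two_proportion_test(60, 100, 45, 100)
significant = abs(t) > 1.96          # 5% level, two-sided
print(f"p = {p:.3f}, SE = {se:.4f}, t = {t:.3f}, significant: {significant}")
```

With these made-up counts t comes out a little above 2.1. That is greater than 1.96, so the difference in loyalty would be called statistically significant at the 5% level.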

What is meant by verification?

The results of any mathematical calculation can always be checked; children are taught this in primary school. It is logical to assume that since statistics are determined by a chain of calculations, they can be checked as well.

However, testing for statistical significance is not just math. Statistics deals with a large number of variables and probabilities, and these are far from always amenable to calculation. Take the rubber boots example from the beginning of the article. The statistical reasoning that store buyers rely on can be upset by dry, hot weather that is not typical for autumn. Fewer people will then buy rubber boots, and outlets will suffer losses. A mathematical formula cannot foresee a weather anomaly, of course. Such a deviation is called an “error”.

[Figure: Tools for statistical data visualization]

It is precisely the probability of such errors that a check of the calculated significance takes into account. The check considers the calculated indicators, the accepted significance levels, and the statements conventionally called hypotheses.

What is the significance level?

The concept of "level" is one of the main criteria of statistical significance. It is used in applied and practical statistics. It is a value that accounts for the likelihood of possible deviations or errors.

The level is based on identifying differences between samples and makes it possible to establish whether those differences are significant or, conversely, random. The concept has not only numerical values but also interpretations that explain how each value should be understood. The level itself is determined by comparing the result with a tabulated critical value, which reveals how reliable the differences are.

[Figure: Discussion of statistics]

Thus, the concept of a level can be put simply: it is the acceptable probability of error in the conclusions drawn from the statistical data.

What levels of significance are used?

In practice, three basic significance levels are used.

The first level is the 5% threshold. That is, the probability of error does not exceed 5%, which means that conclusions drawn from the statistical research data can be relied on with 95% confidence.

The second level is the 1% threshold. Accordingly, the data obtained from the statistical calculations can be relied on with 99% confidence.

The third level is 0.1%. At this value the probability of an error is a fraction of a percent, so errors are practically eliminated.
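Each level corresponds to its own threshold for the test statistic, in the same way that 5% corresponds to 1.96. The sketch below assumes a two-sided test and the large-sample normal approximation, neither of which the article states explicitly. The helper name `critical_value` is hypothetical.

```python
from statistics import NormalDist

def critical_value(alpha):
    """Two-sided critical value of the standard normal distribution."""
    # Assumption: large samples, so the normal distribution stands in for
    # Student's t-distribution.
    return NormalDist().inv_cdf(1 - alpha / 2)

for alpha in (0.05, 0.01, 0.001):
    print(f"level {alpha:.1%}: reject when |t| > {critical_value(alpha):.3f}")
```

The thresholds come out at roughly 1.96, 2.58 and 3.29, so a stricter level demands a larger test statistic before a difference is accepted as significant.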

What is a hypothesis in statistics?

Errors are divided into two kinds, depending on whether the null hypothesis is accepted or rejected. A hypothesis, by definition, is a statement about a set of survey results or other data; that is, a description of the probability distribution of something related to the subject of the statistical study.

[Figure: Statistical significance of the regression]

Simple calculations involve two hypotheses: the null and the alternative. The null hypothesis holds that there are no fundamental differences between the samples being compared. The alternative hypothesis is its direct opposite: it holds that the samples differ significantly.
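In the notation of the loyalty example, the two hypotheses can be written compactly. This assumes that the quantity being compared is the share of satisfied respondents in each group:

```latex
\[
H_0 : p_1 = p_2 \qquad \text{versus} \qquad H_1 : p_1 \neq p_2
\]
```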

What are the mistakes?

Errors in statistics are tied directly to which hypothesis is accepted as true. They can be divided into two types:

  • the first type is rejecting the null hypothesis when it is actually true, that is, accepting the alternative by mistake;
  • the second type is accepting the null hypothesis when it is actually false.

[Figure: Viewing statistical graphs]

The first type of error is called false positive and is quite common in all areas where statistics are used. Accordingly, the error of the second type is called a false negative.
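A small simulation shows why false positives are common. When two samples are drawn from the same population, the null hypothesis is true by construction, yet some comparisons still cross the 1.96 threshold. This is a sketch under stated assumptions: the pooled two-proportion statistic, a two-sided 5% level, and hypothetical names (`pooled_t`, `false_positive_rate`) and parameters.

```python
import random
from math import sqrt

def pooled_t(hits_1, n_1, hits_2, n_2):
    """Pooled two-proportion test statistic (0.0 if the SE is zero)."""
    p = (hits_1 + hits_2) / (n_1 + n_2)
    se = sqrt(p * (1 - p) * (1 / n_1 + 1 / n_2))
    return 0.0 if se == 0 else (hits_1 / n_1 - hits_2 / n_2) / se

def false_positive_rate(trials=1000, n=100, share=0.5, seed=0):
    """Fraction of trials that look significant although H0 is true."""
    rng = random.Random(seed)
    false_positives = 0
    for _ in range(trials):
        # Both groups come from the same population, so H0 holds.
        hits_1 = sum(rng.random() < share for _ in range(n))
        hits_2 = sum(rng.random() < share for _ in range(n))
        if abs(pooled_t(hits_1, n, hits_2, n)) > 1.96:
            false_positives += 1
    return false_positives / trials

rate = false_positive_rate()
print(f"observed false positive rate: {rate:.3f}")
```

The observed rate should land near the chosen 5% level. That is what the level means: the share of false positives one agrees to tolerate.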

Why do we need regression in statistics?

The statistical significance of a regression shows how well a model of dependencies, calculated from the data, corresponds to reality. It also reveals whether enough factors have been taken into account to draw conclusions.

The significance of a regression is determined by comparing the results with the values listed in Fisher tables, or by using analysis of variance. Regression indicators matter in complex statistical studies and calculations that involve a large number of variables, random data and probable changes.
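For the simplest case, a regression with a single factor, the comparison with a Fisher table can be sketched as follows. The data, the function name `regression_f_statistic` and the choice of the 5% level are all assumptions for illustration. The critical value 5.32 is the tabulated F value for 1 and 8 degrees of freedom at that level.

```python
def regression_f_statistic(x, y):
    """F statistic for a simple linear regression y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    predicted = [intercept + slope * xi for xi in x]
    # Variation explained by the model versus variation left unexplained.
    ss_regression = sum((pi - mean_y) ** 2 for pi in predicted)
    ss_residual = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))
    # One predictor: 1 and n - 2 degrees of freedom.
    return (ss_regression / 1) / (ss_residual / (n - 2))

# Hypothetical data: 10 observations of a factor x and a response y.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 20.2]
f_stat = regression_f_statistic(x, y)
F_CRITICAL = 5.32  # tabulated Fisher value for (1, 8) d.f. at the 5% level
print(f"F = {f_stat:.1f}, significant: {f_stat > F_CRITICAL}")
```

If the computed F exceeds the tabulated value, the regression is considered statistically significant. Otherwise the apparent dependence may be due to chance.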
