Statistical power
Latest revision as of 05:00, 18 November 2023
Statistical power is the probability of rejecting the null hypothesis when that hypothesis is in reality false. It is a concept from statistical hypothesis testing. There are two primary types of errors when testing a binary hypothesis[1]:
- type I error, the so-called false positive: the null hypothesis is rejected even though it is in reality true, i.e. the test gives a positive result where there is no effect. For example, an antivirus program flags a file as infected, but in reality the file is clean.
- type II error, the so-called false negative: the null hypothesis is not rejected even though it is in reality false, i.e. the test gives a negative result where an effect exists. For example, a pregnancy test gives a negative result, but in reality the patient is pregnant.
Statistical power is equal to 1 - β, where β is the probability of a type II error. The greater the statistical power of a study, the more likely it is to detect an effect that really exists, and the more reliable and trustworthy its results are[2].
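As a minimal sketch of the relation power = 1 - β, the Python function below computes β and the power of a one-sided, one-sample z-test. It assumes normally distributed data with a known standard deviation and a standardized effect size d; the function name `z_test_power` and the example values (d = 0.5, n = 25, α = 0.05) are illustrative assumptions, not taken from the cited source.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution N(0, 1)


def z_test_power(effect_size: float, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Return (beta, power) for a one-sided, one-sample z-test.

    Hypothetical setup: normal data, known standard deviation, and a true
    standardized effect size d = (mu_true - mu_0) / sigma > 0.
    """
    # Critical value: H0 is rejected when the z statistic exceeds z_crit.
    z_crit = STD_NORMAL.inv_cdf(1 - alpha)
    # Under the alternative, the z statistic is normal with mean d * sqrt(n).
    shift = effect_size * n ** 0.5
    # beta = P(do not reject H0 | H1 is true) = P(Z + shift <= z_crit)
    beta = STD_NORMAL.cdf(z_crit - shift)
    power = 1 - beta
    return beta, power


beta, power = z_test_power(effect_size=0.5, n=25)
print(f"beta = {beta:.3f}, power = {power:.3f}")
```

With these values the power comes out at about 0.80, so roughly one in five such studies would miss a real effect of this size.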
Direct factors
Factors that directly influence statistical power are[3]:
- the sample size used in the study,
- the size of the bias (effect) being looked for: a large bias is much easier to detect than a small one,
- measurement error: some outcomes are much easier to measure than others. For example, it is much easier to determine the result of flipping a coin than to verify the result of a medical test.
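The three factors can be illustrated with a small Monte Carlo sketch built around a coin-flipping experiment: it repeatedly simulates experiments with a coin that lands heads with probability `p_heads`, tests each one against the null hypothesis of a fair coin with a one-sided exact binomial test, and counts how often the bias is detected. Everything here is an illustrative assumption, including the function names, the 5% significance level and the hypothetical `misread` parameter, which models measurement error as the probability of recording the wrong side of the coin.

```python
import math
import random


def critical_heads(n: int, alpha: float = 0.05) -> int:
    """Smallest k with P(X >= k) <= alpha when X ~ Binomial(n, 0.5).

    Observing at least k heads leads to rejecting H0 "the coin is fair"
    in favour of "the coin is biased towards heads" (one-sided exact test).
    """
    tail = 0.0  # holds P(X >= k + 1) at the start of each iteration
    for k in range(n, -1, -1):
        p_k = math.comb(n, k) / 2 ** n
        if tail + p_k > alpha:
            return k + 1
        tail += p_k
    return 0


def simulated_power(n, p_heads, misread=0.0, alpha=0.05, trials=1000, seed=0):
    """Fraction of simulated experiments in which the bias is detected.

    `misread` is a hypothetical measurement-error rate: each flip is
    recorded as the wrong side with this probability.
    """
    rng = random.Random(seed)
    k_crit = critical_heads(n, alpha)
    # Probability that a flip is *recorded* as heads after measurement error.
    p_recorded = p_heads * (1 - misread) + (1 - p_heads) * misread
    detected = 0
    for _ in range(trials):
        heads = sum(rng.random() < p_recorded for _ in range(n))
        if heads >= k_crit:
            detected += 1
    return detected / trials


print("baseline      ", simulated_power(n=100, p_heads=0.60))
print("larger sample ", simulated_power(n=1000, p_heads=0.60))
print("smaller bias  ", simulated_power(n=100, p_heads=0.55))
print("noisy records ", simulated_power(n=100, p_heads=0.60, misread=0.2))
```

Compared with the baseline, the estimated power should rise with the larger sample and fall with the smaller bias or the noisier records; the exact figures depend on the random seed and the number of trials.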
Mammography problem
Statistical quantities such as statistical power or the significance threshold have a major impact in many areas. A well-known illustration is the mammography problem. Assume that 0.8% of women examined by mammography have breast cancer. The statistical power of a mammogram is estimated at 90% (only an estimate, because it is very hard to tell how many cancers go undetected during examination). On the other hand, about 7% of women without cancer receive a false positive result.
If a mammogram is positive, what is the probability that the woman actually has breast cancer? In a group of 1000 women, based on the initial assumptions, only 8 have breast cancer, and with 90% power the test detects about 7 of them (one case goes undetected). Among the roughly 992 women without cancer, about 7%, i.e. about 70, receive a false positive result. In total, the test is positive about 77 times, and only 7 of these 77 positive results correspond to actual breast cancer, so a positive result indicates cancer in only about 9% of cases. Based on these studies, many countries developed recommendations regarding the minimum age of examined women and started recommending mammography only for women older than 50. In that population the risk of breast cancer is significantly higher, so positive test results reflect reality much more often[4].
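The mammography calculation is an application of Bayes' theorem. A minimal Python sketch, using the prevalence (0.8%), power or sensitivity (90%) and false positive rate (7%) from the example; the function name is illustrative, the false positive rate is assumed to apply to women without cancer (which matches about 70 false positives among roughly 992 healthy women), and the 2% prevalence in the last line is a hypothetical value, not a real statistic.

```python
def positive_predictive_value(prevalence: float, sensitivity: float,
                              false_positive_rate: float) -> float:
    """P(disease | positive test), computed with Bayes' theorem."""
    # Share of all examined women who are ill and correctly detected.
    true_positives = prevalence * sensitivity
    # Share of all examined women who are healthy but flagged anyway.
    false_positives = (1 - prevalence) * false_positive_rate
    return true_positives / (true_positives + false_positives)


# Figures from the mammography example.
ppv = positive_predictive_value(prevalence=0.008, sensitivity=0.90,
                                false_positive_rate=0.07)
print(f"P(cancer | positive mammogram) = {ppv:.3f}")  # → 0.094

# Hypothetical higher-risk population: same test, higher prevalence.
print(f"with 2% prevalence: {positive_predictive_value(0.02, 0.90, 0.07):.3f}")
```

The second line shows why restricting screening to a higher-risk group helps: with the same test, a higher prevalence makes a positive result considerably more informative.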
Examples of Statistical power
- A classroom study that seeks to determine whether student performance depends on the teaching method used. The power of the study is the probability that the researchers detect a meaningful difference in performance between the two groups, if such a difference really exists.
- A pharmaceutical company conducting clinical trials to determine the efficacy of a new drug. The power of the study is the probability of detecting a difference in outcomes between the test group and the control group when the drug really has an effect.
- A study examining whether a new marketing strategy is more effective than an existing one. The power of the study is the probability of detecting a real difference in sales and customer satisfaction between the two strategies.
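Each of these examples compares two groups. A minimal Monte Carlo sketch makes the definition concrete: power is the proportion of repeated, identically designed studies in which a real difference of a given size is detected. The normal model with standard deviation 1, the two-sided z-test, the standardized effect size of 0.5, the 64 participants per group and the function name are all illustrative assumptions, not figures from any of the studies described.

```python
import random
from statistics import NormalDist, fmean


def simulate_two_group_power(effect_size, n_per_group, alpha=0.05,
                             trials=2000, seed=1):
    """Estimate power as the share of simulated studies that detect a difference.

    Hypothetical setup: both groups are normal with standard deviation 1, the
    treatment group's mean is shifted by `effect_size`, and each study is
    analysed with a two-sided two-sample z-test (sigma assumed known).
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    standard_error = (2 / n_per_group) ** 0.5  # SE of the difference in means
    detected = 0
    for _ in range(trials):
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        treated = [rng.gauss(effect_size, 1.0) for _ in range(n_per_group)]
        z = (fmean(treated) - fmean(control)) / standard_error
        if abs(z) > z_crit:
            detected += 1
    return detected / trials


print(simulate_two_group_power(effect_size=0.5, n_per_group=64))
print(simulate_two_group_power(effect_size=0.0, n_per_group=64))
```

With these settings the first estimate should land close to 0.8. The second run has no real difference, so every "detection" is a type I error and the estimate should land close to α = 0.05.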
Advantages of Statistical power
The advantages of statistical power include:
- Increased reliability of the statistical analysis due to the reduction of type II errors (false negatives). This allows researchers to draw more accurate conclusions from their data.
- More accurate interpretation of the data: in a well-powered study, a statistically significant result is more likely to reflect a true effect of a variable than a false positive.
- Increased confidence in the results, as the larger sample size usually needed for high power gives a more accurate representation of the population.
- More precise estimates of effect size, which allow researchers to gauge the strength of relationships between variables.
- Enhanced ability to identify subtle but meaningful differences between groups.
Limitations of Statistical power
Statistical power is an important concept when designing and interpreting hypothesis tests, but it has its limitations. The main limitations of statistical power are:
- Insufficient sample size: statistical power depends strongly on the sample size. If the sample is too small, the power of the test will be low and its results may not be reliable.
- Assumption of normality: power calculations for many common tests assume that the data are normally distributed. If the data are not normal, the actual power of the test can be lower than the calculated value.
- Unclear effect size: statistical power depends on the effect size, the magnitude of the true difference between the situation described by the null hypothesis and the one described by the alternative hypothesis. If the effect size is not known, it is difficult to determine the power of the test.
- Multiple testing: when many tests are performed on the same data, the significance level of each test is usually made stricter (for example, the Bonferroni correction divides α by the number of tests) to keep the overall rate of false positives under control. The stricter threshold reduces the power of each individual test.
- Complexity of the test: statistical power depends on the test and study design used. For complex designs, power is harder to calculate, and it may be low if many parameters have to be estimated from a limited sample.
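The cost of multiple testing can be quantified. As a sketch, assume a two-group comparison analysed with a two-sided z-test (normal approximation) and ten tests on the same data whose significance levels are Bonferroni-adjusted; the effect size of 0.5, the 64 observations per group, the number of tests and the function name are hypothetical choices for illustration.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def two_sample_power(effect_size, n_per_group, alpha):
    """Approximate power of a two-sided two-sample z-test (normal approximation)."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = effect_size * (n_per_group / 2) ** 0.5  # mean of z under H1
    # Probability of landing in either rejection region under H1.
    return STD_NORMAL.cdf(shift - z_crit) + STD_NORMAL.cdf(-shift - z_crit)


alpha, n_tests = 0.05, 10  # hypothetical: ten outcomes tested on the same data
single = two_sample_power(0.5, 64, alpha)
bonferroni = two_sample_power(0.5, 64, alpha / n_tests)
print(f"power with alpha = {alpha}: {single:.2f}")
print(f"power with Bonferroni alpha = {alpha / n_tests}: {bonferroni:.2f}")
```

With these hypothetical numbers, the power of each individual test drops from about 0.81 to about 0.51 once the correction is applied.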
Statistical power, the probability of rejecting a null hypothesis when it is in fact false, is closely related to several other quantities used in hypothesis testing:
- Effect size: a measure of the strength of the relationship between two variables, or of the size of a difference between groups. Together with the alpha level and the desired power, it is used to determine the sample size a study requires.
- Alpha level: the probability of rejecting the null hypothesis when it is true, i.e. the probability of a type I error. It is usually set at 0.05 and is also used to calculate the required sample size for a study.
- Sample size: the number of observations in a study. It is usually chosen so that the study achieves the desired power.
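The relationship between these quantities can be made concrete with the common normal-approximation formula for comparing two group means. The sketch below is a hypothetical helper rather than a standard library function: it computes how many observations per group are needed for a given standardized effect size d, alpha level and desired power. A calculation based on the t-test gives slightly larger numbers.

```python
import math
from statistics import NormalDist


def sample_size_per_group(effect_size, alpha=0.05, power=0.80):
    """Observations per group for a two-sided two-sample comparison.

    Normal-approximation formula:
        n = 2 * (z_{1 - alpha/2} + z_{power})**2 / d**2
    where d is the standardized effect size (difference in means / SD).
    """
    z = NormalDist().inv_cdf
    n = 2 * (z(1 - alpha / 2) + z(power)) ** 2 / effect_size ** 2
    return math.ceil(n)  # round up to a whole number of participants


for d in (0.8, 0.5, 0.2):
    print(f"d = {d}: {sample_size_per_group(d)} per group")
```

For d = 0.8, 0.5 and 0.2 this gives 25, 63 and 393 observations per group: halving the effect size roughly quadruples the required sample.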
In summary, statistical power is the probability of rejecting a null hypothesis when it is in fact false, and it is closely tied to effect size, alpha level and sample size.
Footnotes
[1] Banerjee A., Chitnis U.B., Jadhav S.L., Bhawalkar J.S., Chaudhury S., Hypothesis testing, type I and type II errors, 2009, p. 127-131
[2] Banerjee A., Chitnis U.B., Jadhav S.L., Bhawalkar J.S., Chaudhury S., Hypothesis testing, type I and type II errors, 2009, p. 127-131
[3] Reinhart A., Statistics Done Wrong, 2015, p. 17
[4] Reinhart A., Statistics Done Wrong, 2015, p. 42-43
References
- Banerjee A., Chitnis U.B., Jadhav S.L., Bhawalkar J.S., Chaudhury S., (2009), Hypothesis testing, type I and type II errors, Industrial Psychiatry Journal
- Campelo F., Takahashi F., (2018), Sample size estimation for power and accuracy in the experimental comparison of algorithms, Universidade Federal de Minas Gerais
- Park H.M., (2008), Hypothesis Testing and Statistical Power of a Test, The Trustees of Indiana University
- Shen Y., Winget M., Yuan Y., (2018), The impact of false positive breast cancer screening mammograms on screening retention: A retrospective population cohort study in Alberta, Canada, Canadian Journal of Public Health
Author: Agata Skalska