Primary Tool
Hypothesis Test Calculator
A dedicated calculator is fitting here because the page is about interpreting one output in context. The user should see the statistic, the p-value, and the decision rule together.
What the number means
A p-value answers a conditional question: if the null hypothesis were true, how surprising would this statistic, or something at least this extreme, be?
That means the p-value is about the compatibility of the observed data with the null model. It is not a direct probability assigned to the null hypothesis itself.
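The conditional reading can be made concrete with a small simulation. The sketch below is illustrative only and is not how the calculator on this page is implemented: it assumes a null model of standard normal data with true mean 0, a two-sided test, and hypothetical function names. It repeatedly draws samples under the null and counts how often the simulated statistic is at least as extreme as the observed one.

```python
import math
import random
from statistics import NormalDist


def simulated_p_value(z_obs, n, trials=20000, seed=0):
    """Estimate a two-sided p-value by simulating samples under the null model."""
    rng = random.Random(seed)
    extreme = 0
    for _ in range(trials):
        # Draw one sample of size n from the assumed null model (mean 0, sd 1).
        sample_mean = sum(rng.gauss(0.0, 1.0) for _ in range(n)) / n
        # Scale by the standard error of the mean, 1 / sqrt(n).
        z_sim = sample_mean / (1.0 / math.sqrt(n))
        # Count outcomes at least as extreme as the observed statistic.
        if abs(z_sim) >= abs(z_obs):
            extreme += 1
    return extreme / trials


def analytic_p_value(z_obs):
    """Two-sided p-value from the standard normal distribution."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(z_obs)))


z_obs = 2.0
print("simulated:", simulated_p_value(z_obs, n=25))
print("analytic: ", analytic_p_value(z_obs))
```

The two numbers should agree closely. Both are frequencies of extreme outcomes among data generated with the null already assumed true, so neither can be read as the probability that the null holds.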
What the number does not mean
A small p-value does not prove the alternative hypothesis, and a large p-value does not prove the null. It only reports how extreme the observed result looks under one model.
It also does not tell you whether the effect size matters in practice. Statistical significance and practical significance are separate questions.
How to read the calculator output
The test statistic measures how far the observed summary is from the null value after scaling by the expected variability. The calculator then turns that statistic into a p-value using the assumed null model.
The decision line, such as 'reject' or 'fail to reject,' is a reporting convenience built from comparing the p-value with the significance level α. It is not an extra theorem beyond that comparison.
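The three outputs can be traced in a few lines of code. This is a minimal sketch under stated assumptions, not the calculator's actual implementation: it assumes a two-sided one-sample z-test with a known population standard deviation, and the function name, parameter names, and example numbers are all hypothetical.

```python
import math
from statistics import NormalDist


def z_test(sample_mean, null_mean, sigma, n, alpha=0.05):
    """Two-sided one-sample z-test with a known population standard deviation."""
    # Expected variability of the sample mean under the null model.
    standard_error = sigma / math.sqrt(n)
    # Distance from the null value, scaled by that variability.
    z = (sample_mean - null_mean) / standard_error
    # Probability, under the null, of a statistic at least this extreme.
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    # The decision line is only a comparison of p_value with alpha.
    decision = "reject" if p_value <= alpha else "fail to reject"
    return z, p_value, decision


z, p, decision = z_test(sample_mean=103.0, null_mean=100.0, sigma=15.0, n=50)
print(f"z = {z:.3f}, p = {p:.4f}, decision: {decision}")
```

With these made-up inputs the decision is 'fail to reject' at α = 0.05. Choosing a different `alpha` can flip the decision without changing the statistic or the p-value, which is why the decision line carries no information beyond the comparison.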
Why interpretation errors are so common
The language around tests sounds causal even when the procedure is conditional. That is why people slide from 'surprising under the null' to 'unlikely that the null is true,' even though those are different statements.
A good habit is to say the result in full: 'if the null were true, this outcome would be this surprising.' That wording keeps the logic of the test intact.
Common Pitfall
A p-value is not the probability that the null hypothesis is true. It is the probability, assuming the null model, of seeing a result this extreme or more extreme.
Try a Variation
Keep the same sample mean but increase the sample size in the calculator. How does the p-value change, and what does that say about evidence versus sample size?
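One way to explore this variation outside the calculator is sketched below. It assumes a two-sided z-test with a known standard deviation, which may differ from the test the calculator uses. The sample mean, null value, standard deviation, and sample sizes are made-up illustration values.

```python
import math
from statistics import NormalDist


def two_sided_p(sample_mean, null_mean, sigma, n):
    """Two-sided z-test p-value for a fixed sample mean and sample size n."""
    z = (sample_mean - null_mean) / (sigma / math.sqrt(n))
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


# Same observed mean (101 against a null value of 100), growing sample size.
sample_sizes = [25, 100, 400, 1600]
p_values = [two_sided_p(101.0, 100.0, 15.0, n) for n in sample_sizes]

for n, p in zip(sample_sizes, p_values):
    print(f"n = {n:5d}  p = {p:.4f}")
```

Under these assumptions the p-value shrinks as the sample size grows, even though the observed difference of one unit never changes. The standard error gets smaller, so the same difference counts as more extreme under the null. Whether a one-unit difference matters in practice remains a separate question.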
Related Pages