• The null hypothesis, also known as the conjecture, is the initial claim about a population (or data generating process).
  • The t-test produces a test statistic called the t-value.
  • This calculated t-value is then compared against a critical value obtained from a critical value table (the t-distribution table).
  • If the calculated t-value (ignoring its sign, for a two-tailed test) is greater than the table value at the chosen significance level, the null hypothesis can be rejected. The difference between the groups is then unlikely to be due to chance alone (it falls outside the range expected from chance).
T-Test Assumptions
  1. Scale of measurement – data on a continuous or ordinal scale.
  2. Simple random sampling from a representative subset of the population (each member of the subset has an equal probability of being chosen).
  3. A sufficiently large sample size, so that the distribution of results approaches a normal bell-shaped curve.
  4. Normal distribution – when a normal distribution is assumed, one can specify a level of probability (alpha level, or level of significance) as the criterion for deciding whether to reject the null hypothesis. In most cases, a 5% value is used.
  5. Homogeneity of variance (equal variance) – exists when the standard deviations of the samples are approximately equal.

Calculating t-Tests

Calculating a t-test requires three key data values.

  1. Mean difference (difference between the mean values from each data set)
  2. Standard deviation of each group, and
  3. Number of data values of each group.
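The three inputs above can be turned into the two outputs (t-value and degrees of freedom). The sketch below assumes the unequal-variance (Welch) form of the independent two-sample t-test, which is only one of the variants discussed later; the function name `welch_t_test` and the summary statistics in the example are hypothetical.

```python
import math

def welch_t_test(mean1, sd1, n1, mean2, sd2, n2):
    """Return (t, df) for Welch's unequal-variance t-test from summary statistics."""
    # Squared standard error of each group's mean
    se1 = sd1 ** 2 / n1
    se2 = sd2 ** 2 / n2
    # t = mean difference / standard error of that difference
    t = (mean1 - mean2) / math.sqrt(se1 + se2)
    # Welch-Satterthwaite approximation: df is usually not a whole number
    df = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    return t, df

# Hypothetical summary statistics, for illustration only
t, df = welch_t_test(mean1=10.0, sd1=2.0, n1=15, mean2=12.0, sd2=3.0, n2=15)
print(f"t = {t:.4f}, df = {df:.2f}")
```

A negative t simply means the first mean is smaller than the second; the non-integer df is why a table lookup needs rounding.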

The t-test produces two values as its output:

  1. t-value 
  2. degrees of freedom
  • Suppose the computed t-value is -2.24787. For a two-tailed test, the minus sign can be ignored when comparing the two t-values, so the value used is 2.24787.
  • Suppose the degrees of freedom value is 24.38 (a non-integer df, as produced by Welch's formula for unequal variances). For the table lookup it is rounded down to 24, because tables list integer df and rounding down gives a slightly larger, more conservative critical value.
  • Using 24 degrees of freedom and a 5% significance level (two-tailed), the t-distribution table gives a critical value of 2.064.
  • The computed value of 2.24787 is greater than the table value at the 5% significance level. Therefore, the null hypothesis that there is no difference between the means can be rejected: the difference between the populations is unlikely to be due to chance alone.
  • The t-value is the ratio of the difference between the means of the two sample sets to the variation that exists within the sample sets.
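The decision steps of the worked example can be sketched as follows. The dictionary holds only a small excerpt of standard two-tailed critical t-values at α = 0.05; `decide` and `T_CRIT_TWO_TAILED_05` are illustrative names, not part of any library, and a real analysis would use a full table or a statistics package.

```python
import math

# Excerpt of two-tailed critical t-values at alpha = 0.05 (standard t-table values)
T_CRIT_TWO_TAILED_05 = {20: 2.086, 21: 2.080, 22: 2.074, 23: 2.069, 24: 2.064, 25: 2.060}

def decide(t_calc, df, table=T_CRIT_TWO_TAILED_05):
    """Hypothetical helper: compare |t| with the table value for floor(df)."""
    df_table = math.floor(df)          # round a non-integer df down (conservative)
    t_crit = table[df_table]
    reject = abs(t_calc) > t_crit      # two-tailed: the sign of t is ignored
    return df_table, t_crit, reject

df_table, t_crit, reject = decide(t_calc=-2.24787, df=24.38)
print(df_table, t_crit, "reject H0" if reject else "fail to reject H0")
# prints: 24 2.064 reject H0
```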

*A large absolute t-score indicates that the groups are different; a small t-score indicates that they are similar.

Type I & Type II errors

Type I error (α): rejection of a true null hypothesis (“false positive”).

Type II error (β): retention of a false null hypothesis (“false negative”).

Probability of a type I error = α = significance level (often 0.05).

  • If the significance level set for the hypothesis test is 0.05, there is a 5% chance of making a type I error when the null hypothesis is actually true.
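To see what a 5% chance of a type I error means in practice, the sketch below repeatedly samples from a population in which the null hypothesis is true and counts how often it is (wrongly) rejected. As a simplifying assumption it uses a z-test with a known standard deviation of 1 instead of a t-test, so only Python's standard library is needed; `simulate_type_i_rate` is a hypothetical name. The rejection rate should come out close to α.

```python
import random
import statistics
from statistics import NormalDist

def simulate_type_i_rate(alpha=0.05, n=20, trials=4000, seed=1):
    """Estimate how often a true H0 (mean = 0) is rejected by a two-tailed z-test."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    rejections = 0
    for _ in range(trials):
        # The data really come from N(0, 1), so H0: mu = 0 is true
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # z statistic, assuming the population SD (1) is known
        z = statistics.fmean(sample) / (1.0 / n ** 0.5)
        if abs(z) > z_crit:
            rejections += 1   # every rejection here is a false positive
    return rejections / trials

rate = simulate_type_i_rate()
print(f"observed type I error rate: {rate:.3f}")
```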

Examples of Type I Errors

  • For example, consider the trial of an accused criminal. The null hypothesis is that the person is innocent, while the alternative is that the person is guilty. A Type I error in this case would mean that the person is found guilty and sent to jail despite actually being innocent.
  • In medical testing, say a lab is researching a new cancer drug. Their null hypothesis might be that the drug does not affect the growth rate of cancer cells. If the cancer cells stop growing after the drug is applied, the researchers would reject their null hypothesis. However, if something else during the test, rather than the administered drug, caused the growth to stop, this would be an example of a Type I error.

* For a fixed sample size, taking steps that reduce the chances of a type II error tends to increase the chances of a type I error.

  • Probability of a type II error (β) = 1 − power of the test.
  • The desired power level is typically 0.80, but the researcher performing a power analysis can specify a higher level, such as 0.90, which means that, if a true effect of the specified size exists, there is a 90% probability of detecting it (a 10% probability of a type II error).

Power of a test = probability of rejecting a false null hypothesis (desired power level is typically 0.80).
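Power analysis is often used to choose a sample size. A common normal-approximation formula, assumed here for a two-tailed one-sample z-test with known σ, is n = ((z₁₋α/₂ + z₁₋β) · σ / δ)², where δ is the true difference to be detected and 1 − β is the target power. The helper `required_n` is hypothetical, the formula ignores the negligible opposite tail, and a t-test would need a slightly larger n.

```python
import math
from statistics import NormalDist

def required_n(delta, sigma, alpha=0.05, power=0.80):
    """Approximate n for a two-tailed one-sample z-test to reach the target power."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # critical value for alpha
    z_beta = NormalDist().inv_cdf(power)            # power = 1 - beta
    return math.ceil(((z_alpha + z_beta) * sigma / delta) ** 2)

n = required_n(delta=0.5, sigma=1.0)
print(n)  # prints: 32
```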

How to increase the power of a hypothesis test?

  1. Use a larger sample.
  2. Use a higher significance level (α, the probability of a type I error).
  3. Use a directional hypothesis (a one-tailed test).
  4. Choose a larger value for the difference to be detected (effect size).
  5. Improve your process (reduce the variability, i.e., the standard deviation of the data).
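The effect of each item in the list can be checked numerically. The sketch assumes a one-sample z-test with known σ (a normal approximation to the t-test) and a true difference in the hypothesized direction; the function `power` and the baseline numbers are illustrative. Each scenario changes one lever relative to the baseline.

```python
from statistics import NormalDist

PHI = NormalDist().cdf        # standard normal CDF
Z = NormalDist().inv_cdf      # standard normal quantile

def power(n, delta, sigma=1.0, alpha=0.05, tails=2):
    """Approximate power of a one-sample z-test for a true difference delta > 0."""
    shift = delta * n ** 0.5 / sigma
    if tails == 1:
        return PHI(shift - Z(1 - alpha))
    z_crit = Z(1 - alpha / 2)
    return PHI(shift - z_crit) + PHI(-shift - z_crit)

baseline = power(n=20, delta=0.5)
scenarios = {
    "larger sample (n=40)": power(n=40, delta=0.5),
    "higher alpha (0.10)": power(n=20, delta=0.5, alpha=0.10),
    "one-tailed test": power(n=20, delta=0.5, tails=1),
    "larger difference (0.8)": power(n=20, delta=0.8),
    "less variability (sigma=0.7)": power(n=20, delta=0.5, sigma=0.7),
}
print(f"baseline: power = {baseline:.3f}, beta = {1 - baseline:.3f}")
for name, p in scenarios.items():
    print(f"{name}: power = {p:.3f}")
```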

One-tailed vs Two-tailed t-Test

  • A one-tailed test is a statistical hypothesis test set up to show that the sample mean would be either higher or lower than the population mean, but not both (it tests for the possibility of a relationship in one direction of interest and completely disregards the possibility of a relationship in the other direction).
  • The significance level used in a one-tailed test is usually 1%, 5% or 10%, although any other probability can be used at the discretion of the analyst or statistician. The p-value is calculated with the assumption that the null hypothesis is true.
  • A two-tailed test is a method in which the critical area of a distribution is two-sided; it tests whether a sample is greater than or less than a certain range of values.
  • By convention, two-tailed tests are used to determine significance at the 5% level, meaning each side of the distribution is cut at 2.5%.
  • Examples of one- and two-sample t-tests are linked below.


https://www.pinterest.es/pin/566609196840168575/ https://www.statisticshowto.com/confidence-level/
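The difference between one- and two-tailed tests shows up in the critical values: a one-tailed test puts all of α in one tail, a two-tailed test splits it. The sketch uses standard normal (z) critical values as an approximation; critical t-values behave the same way but also depend on df. `critical_z` is a hypothetical helper.

```python
from statistics import NormalDist

def critical_z(alpha=0.05):
    """Return (one-tailed, two-tailed) standard normal critical values."""
    std_normal = NormalDist()
    one_tailed = std_normal.inv_cdf(1 - alpha)       # all of alpha in one tail
    two_tailed = std_normal.inv_cdf(1 - alpha / 2)   # alpha / 2 in each tail
    return one_tailed, two_tailed

one, two = critical_z(0.05)
print(f"one-tailed: {one:.3f}, two-tailed: {two:.3f}")
# prints: one-tailed: 1.645, two-tailed: 1.960
```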
  • The lower the p-value, the stronger the evidence against the null hypothesis (i.e., the stronger the evidence in favor of the alternative hypothesis); a low p-value means the observed difference is not easily explained by chance alone.
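As a sketch of the "lower p-value, stronger evidence" relationship, the two-sided p-value for a z statistic can be computed with the standard library. This assumes a normal reference distribution; for a t statistic the t-distribution is needed instead (in SciPy, for example, `2 * scipy.stats.t.sf(abs(t), df)`). `two_sided_p` is a hypothetical helper.

```python
from statistics import NormalDist

def two_sided_p(z):
    """P(|Z| >= |z|) under a standard normal null distribution."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

for z in (1.0, 1.96, 3.0):
    print(f"z = {z}: p = {two_sided_p(z):.4f}")
```

Larger test statistics (further from 0) give smaller p-values.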

Making a decision based on confidence interval (CI)

  • Confidence level (CL): if a poll/test/survey were repeated over and over again, the proportion of the resulting confidence intervals that would contain the true population parameter.
  • Confidence level = 1 – alpha. 
  • Confidence interval (CI): a range of values, computed from an experiment or survey, that would be expected to contain the population parameter of interest.
  • For a one-sample t-test, if the CI is given in terms of the mean difference (μ − μ₀), e.g., (−0.0235, 0.0021), it can be converted to a CI for the mean (μ) by adding the test value μ₀ to both the lower and upper endpoints. If μ₀ = 4, the CI becomes (3.9765, 4.0021). Because the CI for (μ − μ₀) includes zero, and equivalently the CI for μ contains the test value (4 here), the null hypothesis cannot be rejected at the given confidence level (95% here): there is not enough evidence that the population mean differs from the fixed/specified/known value (4 here) treated as the “gold standard” against which the mean was compared.
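The endpoint conversion described above is simple arithmetic; the sketch reproduces the numbers from the example. `ci_for_mean` and `contains` are hypothetical helper names.

```python
def ci_for_mean(diff_ci, mu0):
    """Shift a CI for (mu - mu0) into a CI for mu by adding the test value mu0."""
    lower, upper = diff_ci
    return lower + mu0, upper + mu0

def contains(ci, value):
    lower, upper = ci
    return lower <= value <= upper

diff_ci = (-0.0235, 0.0021)
mean_ci = ci_for_mean(diff_ci, mu0=4)
print(tuple(round(x, 4) for x in mean_ci))   # prints: (3.9765, 4.0021)

# Zero inside the difference CI <=> the test value inside the mean CI
print(contains(diff_ci, 0), contains(mean_ci, 4))   # prints: True True
```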
T Distribution
  • The T distribution, also known as the Student’s t-distribution, is a type of probability distribution.
  • The T distribution, like the normal distribution, is bell-shaped and symmetric, but it has heavier tails and higher kurtosis, which means it is more likely to produce values that fall far from its mean.
  • Tail heaviness is determined by a parameter of the T distribution called the degrees of freedom (df).
  • Smaller df values give heavier tails, and higher df values make the T distribution resemble the standard normal distribution (mean 0, standard deviation 1).
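The heavier tails can be seen by evaluating the T-distribution density directly. The sketch implements the standard Student's t density, using `math.lgamma` to avoid overflow for large df, and compares it with the standard normal density from `statistics.NormalDist`; `t_pdf` is a hypothetical helper name.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Student's t density with df degrees of freedom."""
    log_coef = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_coef) * (1 + x * x / df) ** (-(df + 1) / 2)

normal_pdf = NormalDist().pdf

# Density far out in the tail (x = 3): smaller df puts more weight there
for df in (2, 5, 30, 1000):
    print(f"df={df:>4}: t pdf(3) = {t_pdf(3, df):.5f}")
print(f"normal:    pdf(3) = {normal_pdf(3):.5f}")
```

As df grows, the tail density shrinks toward the normal value.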
Degrees of Freedom
  • df is the maximum number of logically independent values in the data sample that are free to vary.
  • Consider a data sample consisting of five positive integers.
  • Four of the numbers in the sample are {3, 8, 5, 4}, and the average of the entire data sample is revealed to be 6.
  • This means that the fifth number has to be 10 (the five numbers must sum to 5 × 6 = 30). It can be nothing else; it does not have the freedom to vary.
  • So the degrees of freedom for this data sample is 4.
  • Degrees of freedom = sample size − 1 (when the mean is estimated from the same sample, as in a one-sample t-test).
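The same reasoning in code: once the mean and the sample size are fixed, the last value is forced. Variable names are illustrative.

```python
import statistics

known = [3, 8, 5, 4]   # the four values that were free to vary
n = 5                  # sample size
mean = 6               # revealed average of all five values

fifth = n * mean - sum(known)   # the total is fixed at n * mean, so this is forced
df = n - 1                      # only n - 1 values were free to vary

print(fifth, df)   # prints: 10 4
```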

Standard deviations

  • Because variance is expressed in squared units and can be difficult to interpret, the standard deviation is often used instead.
  • Standard deviations are usually easier to picture and apply. The standard deviation is expressed in the same unit of measurement as the data, whereas the variance is expressed in squared units.
  • The square root of the variance is the standard deviation (σ).
  • The biggest drawback of using the standard deviation is that it can be affected by outliers and extreme values.
  • 68–95–99.7 rule:
  • If the data follow a normal curve, about 68% of the data points fall within one standard deviation of the mean, about 95% within two, and about 99.7% within three.
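The percentages in the 68–95–99.7 rule can be checked with the standard normal distribution in Python's `statistics` module:

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)
shares = {}
for k in (1, 2, 3):
    # Probability of falling within k standard deviations of the mean
    shares[k] = std_normal.cdf(k) - std_normal.cdf(-k)
    print(f"within {k} SD: {shares[k]:.4f}")
# prints: 0.6827, 0.9545 and 0.9973 for k = 1, 2, 3
```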


  • Variance (σ²) is a measure of spread or variability from the mean (how far each number is from the mean and therefore from every other number in the data set).
  • A variance of zero indicates that all values within the data set are identical.
  • When calculating a sample variance to estimate a population variance, the denominator of the variance equation becomes N − 1 so that the estimate is unbiased and does not underestimate the population variance.
  • One drawback of variance is that it gives added weight to outliers, the numbers that are far from the mean; squaring these deviations can skew the measure.
  • The advantage of variance is that it treats all deviations from the mean the same regardless of their direction. Variance is the average of the squared deviations from the mean, and a squared value is never negative.
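A small sketch contrasting the population variance (divide by N) with the sample variance (divide by N − 1), using the five-value sample from the degrees-of-freedom example. The manual calculation is checked against Python's `statistics.pvariance` and `statistics.variance`.

```python
import math
import statistics

data = [3, 8, 5, 4, 10]
mean = statistics.fmean(data)
squared_devs = [(x - mean) ** 2 for x in data]

pop_var = sum(squared_devs) / len(data)            # divide by N
sample_var = sum(squared_devs) / (len(data) - 1)   # divide by N - 1 (Bessel's correction)

# The statistics module gives the same results
assert math.isclose(pop_var, statistics.pvariance(data))
assert math.isclose(sample_var, statistics.variance(data))

sample_sd = math.sqrt(sample_var)   # standard deviation = square root of variance
print(pop_var, sample_var, round(sample_sd, 3))   # prints: 6.8 8.5 2.915
```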

Coefficient of variation

  • The coefficient of variation (CV) is simply the standard deviation divided by the mean. It is a measure of relative variability.
  • The CV allows you to compare the amount of dispersion in two distributions that are on different scales. For instance, suppose you have data for students’ GPA (0 to 4) and SAT (0–800) and you want to know which is more dispersed.
  • When comparing the variation of two data sets with very different means, it is better to use the coefficient of variation, because it normalizes the standard deviation with respect to the mean. https://www.youtube.com/watch?v=Y_UB-XhkkMs
  • The CV should typically be used for ratio data, i.e., data that are continuous and have a meaningful zero.
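A sketch of the GPA-versus-SAT comparison. The scores below are made up for illustration, and `coefficient_of_variation` is a hypothetical helper that uses the sample standard deviation.

```python
import statistics

def coefficient_of_variation(values):
    """CV = sample standard deviation / mean (meaningful for ratio data)."""
    return statistics.stdev(values) / statistics.fmean(values)

# Hypothetical scores, for illustration only
gpa = [2.8, 3.2, 3.5, 3.9, 3.0]
sat = [520, 610, 700, 760, 580]

print(f"CV of GPA: {coefficient_of_variation(gpa):.3f}")
print(f"CV of SAT: {coefficient_of_variation(sat):.3f}")

# Rescaling the data (e.g., reporting SAT / 10) leaves the CV unchanged
scaled = [s / 10 for s in sat]
print(abs(coefficient_of_variation(scaled) - coefficient_of_variation(sat)) < 1e-12)
```

With these made-up numbers the SAT scores are relatively more dispersed, even though their raw standard deviation is on a completely different scale from the GPA one.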

Accuracy Vs Bias

  • Accuracy is a qualitative term referring to whether there is agreement between a measurement made on an object and its true (target or reference) value. 
  • Bias is a quantitative term describing the difference between the average of measurements made on the same object and its true value.
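Because bias is quantitative, it can be computed directly: the average of repeated measurements of one object minus its reference value. The reference value and measurements below are hypothetical.

```python
import statistics

reference_value = 10.0                          # hypothetical true/reference value
measurements = [10.2, 10.1, 10.3, 10.2, 10.2]   # hypothetical repeated measurements of one object

bias = statistics.fmean(measurements) - reference_value
print(f"bias = {bias:.2f}")   # prints: bias = 0.20
```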

Determining the Correct t-Test to Use

  1. Correlated (or Paired, or Dependent) T-Test: samples typically consist of matched pairs of similar units, or repeated measures on the same units (the samples are related in some manner or have matching characteristics, as in a comparative analysis involving children, parents or siblings).
  2. Equal Variance (or Pooled) Independent T-Test: used when the number of samples in each group is the same, or the variances of the two data sets are similar.
  3. Unequal Variance (Welch’s t-test) Independent T-Test: used when the number of samples in each group is different and the variances of the two data sets are also different.
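The paired and pooled variants can be sketched with the standard library. The paired test is computed as a one-sample t-test on the within-pair differences, and the pooled test uses a weighted average of the two sample variances. `paired_t` and `pooled_t` are hypothetical names and the data are made up; SciPy offers `scipy.stats.ttest_rel` and `scipy.stats.ttest_ind` (with its `equal_var` parameter) for real analyses.

```python
import math
import statistics

def paired_t(before, after):
    """Paired (dependent) t-test: one-sample t-test on the within-pair differences."""
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    t = statistics.fmean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t, n - 1

def pooled_t(x, y):
    """Equal-variance (pooled) independent t-test."""
    n1, n2 = len(x), len(y)
    # Pooled variance: the two sample variances weighted by their df
    sp2 = ((n1 - 1) * statistics.variance(x) + (n2 - 1) * statistics.variance(y)) / (n1 + n2 - 2)
    t = (statistics.fmean(x) - statistics.fmean(y)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

# Hypothetical data: the same four units measured before and after a change
paired_result = paired_t(before=[10, 12, 9, 11], after=[12, 13, 11, 13])

# Hypothetical data: two unrelated groups
pooled_result = pooled_t([1, 2, 3], [4, 5, 6])
print(paired_result, pooled_result)
```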

# Equality of variances can be checked by graphing box plots and visually comparing the spread of the boxes (not a foolproof method), or by comparing the two standard deviations in the “Group Statistics” table and checking the result of Levene’s test.
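Levene's test compares the groups' average absolute deviations from their own centers. The sketch computes the original mean-based Levene W statistic with the standard library; a p-value would come from an F distribution with (k − 1, N − k) degrees of freedom, which the standard library does not provide. For reference, SciPy's `scipy.stats.levene` computes this version with `center='mean'`, while its default `center='median'` is the Brown–Forsythe variant. `levene_w` and the data are hypothetical.

```python
import statistics

def levene_w(*groups):
    """Levene's W statistic using absolute deviations from each group's mean."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviation of every value from its own group's mean
    z = [[abs(v - statistics.fmean(g)) for v in g] for g in groups]
    z_means = [statistics.fmean(zi) for zi in z]
    z_grand = statistics.fmean([v for zi in z for v in zi])
    # Between-group and within-group variation of those absolute deviations
    between = sum(len(zi) * (m - z_grand) ** 2 for zi, m in zip(z, z_means))
    within = sum((v - m) ** 2 for zi, m in zip(z, z_means) for v in zi)
    return (n_total - k) / (k - 1) * between / within

same_spread = levene_w([1, 2, 3], [11, 12, 13])     # same spread, different locations
different_spread = levene_w([1, 2, 3], [0, 5, 10])  # second group far more spread out
print(same_spread, round(different_spread, 4))
```

A W near zero is consistent with equal variances; larger values point toward unequal variances and, in the terms above, toward Welch's t-test.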