Saturday, 28 March 2020

Descriptive statistics & Inferential statistics


Descriptive statistics
Descriptive statistics is one of the two main branches of statistics.
Descriptive statistics provide a concise summary of data. We can summarize data numerically or graphically. For example, the manager of a fast food restaurant tracks the wait times for customers during the lunch hour for a week. Then, the manager summarizes the data.
Numeric descriptive statistics
The manager calculates numeric descriptive statistics for the wait times, such as the mean, median, standard deviation, and range.
Graphical descriptive statistics
The manager examines graphs, such as a histogram of the wait times, to visualize how the values are distributed.
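For illustration, here is a minimal sketch (in Python, using numpy) of the kind of numeric summary the manager might compute; the wait-time values are invented for this example.

import numpy as np

# Hypothetical lunch-hour wait times (in minutes) recorded over one week
wait_times = np.array([4.2, 5.1, 6.3, 3.8, 7.0, 5.5, 4.9, 6.1, 5.0, 4.4])

# Numeric descriptive statistics: center and spread of the wait times
print("Mean:              ", np.mean(wait_times))
print("Median:            ", np.median(wait_times))
print("Std. deviation:    ", np.std(wait_times, ddof=1))  # sample standard deviation
print("Minimum / Maximum: ", wait_times.min(), "/", wait_times.max())

A histogram or boxplot of the same values would serve as the graphical summary.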

Inferential statistics

Inferential statistics is one of the two main branches of statistics. It uses a random sample of data taken from a population to describe and make inferences about that population. Inferential statistics are valuable when examining each member of an entire population is not convenient or possible. For example, it is impractical to measure the diameter of every nail manufactured in a mill, but we can measure the diameters of a representative random sample of nails and use that information to make generalizations about the diameters of all of the nails.
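As a sketch of the nail example, one common inferential tool is a confidence interval for the population mean computed from a random sample. The code below (Python with scipy) uses invented diameter values purely for illustration.

import numpy as np
from scipy import stats

# Hypothetical diameters (in mm) of a random sample of 20 nails
sample = np.array([2.01, 1.98, 2.03, 2.00, 1.99, 2.02, 1.97, 2.01, 2.00, 1.98,
                   2.04, 1.99, 2.02, 2.01, 2.00, 1.98, 2.03, 2.00, 1.99, 2.01])

mean = sample.mean()
sem = stats.sem(sample)  # standard error of the mean

# 95% confidence interval for the population mean diameter
low, high = stats.t.interval(0.95, df=len(sample) - 1, loc=mean, scale=sem)
print(f"Sample mean: {mean:.3f} mm")
print(f"95% CI for the population mean: ({low:.3f}, {high:.3f}) mm")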

Differences between descriptive and inferential statistics:
1.     Descriptive statistics uses the data to provide descriptions of the population, either through numerical calculations or graphs or tables. Inferential statistics makes inferences and predictions about a population based on a sample of data taken from the population in question.
2.     Descriptive statistics consists of the collection, organization, summarization, and presentation of data. Inferential statistics consists of generalizing from samples to populations, performing estimations and hypothesis tests, determining relationships among variables, and making predictions.


Parametric Tests
Parametric tests assume that the population values are normally distributed.
Reasons to Use Parametric Tests
Reason 1: Parametric tests can perform well with skewed and nonnormal distributions
This may come as a surprise, but parametric tests can perform well with continuous data that are nonnormal if you satisfy the sample size guidelines below.
Sample size guidelines for nonnormal data (parametric analyses):
  • 1-sample t test: the sample size should be greater than 20.
  • 2-sample t test: each group should be greater than 15.
  • One-Way ANOVA: with 2-9 groups, each group should be greater than 15; with 10-12 groups, each group should be greater than 20.
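To illustrate the first guideline (a 1-sample t test with more than 20 observations), here is a sketch in Python with scipy run on simulated skewed data; the data and the hypothesized mean are made up.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Simulated right-skewed (exponential) data, n = 30 > 20,
# so the 1-sample t test is still reasonable per the guideline above
data = rng.exponential(scale=5.0, size=30)

# Test whether the population mean differs from a hypothesized value of 5
t_stat, p_value = stats.ttest_1samp(data, popmean=5.0)
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")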
Reason 2: Parametric tests can perform well when the spread of each group is different
While nonparametric tests don't assume that our data follow a normal distribution, they do have other assumptions that can be hard to meet. For nonparametric tests that compare groups, a common assumption is that the data for all groups must have the same spread (dispersion). If our groups have different spreads, the nonparametric tests might not provide valid results.
On the other hand, if we use the 2-sample t test or One-Way ANOVA, we can simply uncheck the Assume equal variances option in the analysis dialog, and the test no longer requires equal spreads.
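The Assume equal variances option described above refers to a menu-driven statistics package; as a rough equivalent sketch in Python, scipy's 2-sample t test runs Welch's version when equal_var=False. The two groups below are simulated with deliberately different spreads.

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Two simulated groups with clearly different spreads (standard deviations 1 and 4)
group_a = rng.normal(loc=10, scale=1, size=25)
group_b = rng.normal(loc=12, scale=4, size=25)

# equal_var=False gives Welch's t test, which does not assume equal variances
t_stat, p_value = stats.ttest_ind(group_a, group_b, equal_var=False)
print(f"Welch's t = {t_stat:.3f}, p = {p_value:.4f}")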
Reason 3: Statistical power
Parametric tests usually have more statistical power than nonparametric tests. Thus, we are more likely to detect a significant effect when one truly exists.

Nonparametric Tests
Nonparametric tests are also called distribution-free tests because they don't assume that our data follow a specific distribution. We should use nonparametric tests when our data don't meet the assumptions of a parametric test, especially the assumption about normally distributed data.
It’s safe to say that most people who use statistics are more familiar with parametric analyses than nonparametric analyses.
  • Parametric analyses test group means.
  • Nonparametric analyses test group medians.
Hypothesis Tests of the Mean and Median
Nonparametric tests are like a parallel universe to parametric tests: each parametric test of means has a nonparametric counterpart that tests medians, as shown below.
Parametric tests (means) and their nonparametric counterparts (medians):
  • 1-sample t test: 1-sample Sign test, 1-sample Wilcoxon test
  • 2-sample t test: Mann-Whitney test
  • One-Way ANOVA: Kruskal-Wallis test, Mood's median test
  • Factorial DOE with one factor and one blocking variable: Friedman test
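As a sketch of one of the pairings above, the code below (Python with scipy) runs a 2-sample t test and its nonparametric counterpart, the Mann-Whitney test, on the same pair of simulated skewed samples.

import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Two simulated right-skewed samples
sample1 = rng.lognormal(mean=1.0, sigma=0.6, size=40)
sample2 = rng.lognormal(mean=1.3, sigma=0.6, size=40)

# Parametric: 2-sample t test (compares means)
t_stat, t_p = stats.ttest_ind(sample1, sample2)

# Nonparametric counterpart: Mann-Whitney test (compares medians/distributions)
u_stat, u_p = stats.mannwhitneyu(sample1, sample2, alternative="two-sided")

print(f"2-sample t test:   p = {t_p:.4f}")
print(f"Mann-Whitney test: p = {u_p:.4f}")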
Reasons to Use Nonparametric Tests

Reason 1: Our area of study is better represented by the median
This is my favorite reason to use a nonparametric test and the one that isn’t mentioned often enough! The fact that we can perform a parametric test with nonnormal data doesn’t imply that the mean is the best measure of the central tendency for our data.
For example, the center of a skewed distribution, like income, can be better measured by the median where 50% are above the median and 50% are below. If we add a few billionaires to a sample, the mathematical mean increases greatly even though the income for the typical person doesn’t change.
When our distribution is skewed enough, the mean is strongly affected by changes far out in the distribution's tail, whereas the median continues to more closely reflect the center of the distribution. For two such distributions, a random sample of 100 from each can produce means that are significantly different but medians that are not significantly different.
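A small numeric sketch of the income example (values invented): adding one billionaire-scale income pulls the mean up dramatically while the median barely moves.

import numpy as np

# Hypothetical annual incomes, in thousands
incomes = np.array([32, 41, 38, 55, 47, 60, 44, 39, 52, 48])
print("Before: mean =", np.mean(incomes), " median =", np.median(incomes))

# Add one billionaire-scale income (1,000,000 thousand = 1 billion)
incomes_with_outlier = np.append(incomes, 1_000_000)
print("After:  mean =", round(np.mean(incomes_with_outlier), 1),
      " median =", np.median(incomes_with_outlier))

With these numbers the mean jumps from about 46 to about 91,000, while the median moves only from 45.5 to 47, staying close to the income of the typical person.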
Reason 2: We have a very small sample size
If we don’t meet the sample size guidelines for the parametric tests and we are not confident that we have normally distributed data, we should use a nonparametric test. When we have a really small sample, we might not even be able to ascertain the distribution of our data because the distribution tests will lack sufficient power to provide meaningful results.
In this scenario, we are in a tough spot with no valid alternative. Nonparametric tests have less power to begin with, and it's a double whammy when we add a small sample size on top of that!
Reason 3: We have ordinal data, ranked data, or outliers that we can’t remove
Typical parametric tests can only assess continuous data, and their results can be significantly affected by outliers. Conversely, some nonparametric tests can handle ordinal data and ranked data without being seriously affected by outliers. Be sure to check the assumptions of the nonparametric test, because each one has its own data requirements.
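As one illustration, the Mann-Whitney test from the table above works on ranks, so ordinal scores and the occasional extreme value do not distort it the way they would a t test; the satisfaction ratings below are invented for this sketch.

from scipy import stats

# Hypothetical ordinal data: customer satisfaction ratings (1-5)
# from two service channels, including a few extreme scores
channel_a = [4, 5, 3, 4, 5, 4, 3, 5, 4, 4, 1]
channel_b = [2, 3, 2, 3, 1, 2, 3, 2, 3, 2, 5]

# The test ranks all values, so only their order matters
u_stat, p_value = stats.mannwhitneyu(channel_a, channel_b, alternative="two-sided")
print(f"Mann-Whitney U = {u_stat}, p = {p_value:.4f}")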

