Statistics 101 - Day 2
- supriyamalla
- Jun 20, 2021
Multivariate Data
Scatterplots are used to show the relationship (correlation) between two variables.
Types of association (each can be positive or negative):
Linear (line-like)
Quadratic (parabola-like)
No association (random scatter)
Associations also vary in strength: weak, moderate, or strong.
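Correlation is commonly measured with the Pearson correlation coefficient, r, which ranges from -1 to 1: its sign gives the direction of the association and its magnitude gives the strength. Note that r captures only linear association, so a strong quadratic (parabola-like) pattern can still have r close to 0.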

Credits: Coursera
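As a rough illustration of how r is computed, here is a minimal pure-Python sketch (the helper name `pearson_r` and the sample values are made up for this example). It follows the usual definition: the sum of products of deviations from the means, divided by the square root of the product of the sums of squared deviations. The three calls check a positive linear, a negative linear and a parabola-shaped relationship; the last one shows that r only picks up linear association.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient r for two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Numerator: sum of products of deviations from the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator: square roots of the sums of squared deviations
    sx = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sy = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return cov / (sx * sy)


xs = [-3, -2, -1, 0, 1, 2, 3]
print(round(pearson_r(xs, [2 * x + 1 for x in xs]), 3))  # 1.0  (positive linear)
print(round(pearson_r(xs, [-0.5 * x for x in xs]), 3))   # -1.0 (negative linear)
print(round(pearson_r(xs, [x ** 2 for x in xs]), 3))     # 0.0  (quadratic)
```

On Python 3.10 or later, the standard library's `statistics.correlation(x, y)` computes the same Pearson coefficient and can serve as a cross-check.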
BUT! CAUTION: Correlation doesn't imply causation. Even if one variable is strongly correlated with another, other factors may be in play, so one does not necessarily cause the other.
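To see how a hidden third factor can produce a strong correlation, here is a small sketch with made-up numbers (the temperature, ice-cream and sunburn series are hypothetical). Both series are built from temperature plus a little fixed noise; neither is computed from the other, yet they come out strongly correlated.

```python
from statistics import mean, pstdev


def pearson_r(xs, ys):
    """Pearson r: population covariance divided by the product of population std devs."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return cov / (pstdev(xs) * pstdev(ys))


# Hypothetical daily temperatures (the confounding factor)
temperature = [18, 21, 24, 27, 30, 33]

# Each series depends only on temperature plus small fixed "noise" terms
ice_cream_sales = [10 * t + e for t, e in zip(temperature, [3, -2, 4, -1, 2, -3])]
sunburn_cases = [0.5 * t + e for t, e in zip(temperature, [1, -1, 0, 1, -1, 0])]

r = pearson_r(ice_cream_sales, sunburn_cases)
print(f"r(ice cream, sunburn) = {r:.2f}")
```

Here r comes out above 0.9, even though ice-cream sales play no part in generating sunburn cases; temperature drives both.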
Outliers are points that lie far away from the overall trend line.
Another phenomenon I read about was Simpson's Paradox, an interesting phenomenon in statistics.
What it essentially means is that the association between two variables in the aggregated data can differ from, and even reverse, the association seen when the data is split into sub-populations. This also serves as a reminder that correlation doesn't imply causation.
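Here is a tiny made-up example of Simpson's Paradox (the two groups and their values are hypothetical). Within each sub-population, x and y are positively correlated, but pooling the groups and ignoring the group label yields a negative correlation.

```python
from statistics import mean, pstdev


def pearson_r(xs, ys):
    """Pearson r: population covariance divided by the product of population std devs."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return cov / (pstdev(xs) * pstdev(ys))


# Hypothetical (x, y) measurements for two sub-populations
group_a = {"x": [1, 2, 3, 4], "y": [10, 11, 13, 14]}
group_b = {"x": [6, 7, 8, 9], "y": [1, 3, 4, 5]}

for name, g in [("A", group_a), ("B", group_b)]:
    print(f"group {name}: r = {pearson_r(g['x'], g['y']):.2f}")  # positive in each group

# Pool both groups and ignore the group label
all_x = group_a["x"] + group_b["x"]
all_y = group_a["y"] + group_b["y"]
print(f"pooled: r = {pearson_r(all_x, all_y):.2f}")  # negative overall
```

The reversal happens because group B has larger x values but much smaller y values than group A, so pooling folds that between-group difference into the correlation. In real data, the grouping variable acts as the lurking factor.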