Statistics 101 - Day 2

  • Writer: supriyamalla
  • Jun 20, 2021


Multivariate Data

A scatterplot shows the relationship (correlation) between 2 variables.


Types of association (can be positive/negative):

  1. Linear (line like)

  2. Quadratic (parabola like)

  3. No Association (random)

An association also has a strength: weak, moderate or strong.


Correlation is measured by the Pearson correlation coefficient, written "r". It lies between -1 and 1: the sign gives the direction of a linear association and the magnitude gives its strength.
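To make the range and sign concrete, here is a minimal sketch of the usual Pearson formula in plain Python. The helper name `pearson_r` and the toy data are made up for illustration and are not from the course material.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation of two equal-length sequences (illustrative helper)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length (at least 2 points)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (proportional to the covariance)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (proportional to the variances)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / math.sqrt(var_x * var_y)

x = [-2, -1, 0, 1, 2]
r_pos = pearson_r(x, [2 * v + 1 for v in x])   # perfect positive line: r is 1
r_neg = pearson_r(x, [-3 * v for v in x])      # perfect negative line: r is -1
r_quad = pearson_r(x, [v * v for v in x])      # symmetric parabola: r is 0

print(r_pos, r_neg, r_quad)
```

Note the last case: the parabola is a perfect quadratic association, yet r comes out as 0, because r only captures the linear part of a relationship.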


Credits: Coursera


BUT! CAUTION: Correlation doesn't imply causation. Just because one variable is strongly correlated with another doesn't mean one causes the other; there can be other factors in play.


Outliers are points that lie far away from the trend line.


Another phenomenon I read about was Simpson's Paradox.

Simpson's Paradox is an interesting phenomenon in statistics.

What it essentially means is that the association between two variables at the aggregate level can differ from, or even reverse, the association seen within each sub-population. This also serves as a reminder that correlation doesn't always imply causation.
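Here is a small made-up illustration of the paradox. The two groups, their values, and the helper `pearson_r` are all hypothetical; the point is only that each group trends upward while the combined data trends downward.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation of two equal-length sequences (illustrative helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Two invented sub-populations, each with a perfectly increasing trend
group_a_x, group_a_y = [1, 2, 3], [10, 11, 12]
group_b_x, group_b_y = [6, 7, 8], [3, 4, 5]

r_a = pearson_r(group_a_x, group_a_y)   # positive within group A
r_b = pearson_r(group_b_x, group_b_y)   # positive within group B

# Pool both groups and ignore the group label
r_all = pearson_r(group_a_x + group_b_x, group_a_y + group_b_y)

print(r_a, r_b)   # both positive
print(r_all)      # negative: the aggregate trend is reversed
```

Group B has larger x values but smaller y values overall, so pooling the data hides the upward trend inside each group. The group label acts as the "other factor in play".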






©2020 by Learn Data Science with me.