ANOVA testing – Week1 Assignment

Hello All,

Today’s post is the weekly assignment for another Coursera venture: Data Visualization Tools from Wesleyan University.

The topic is as below:

Run an analysis of variance (ANOVA) with a quantitative response variable and a categorical explanatory variable. Run a post hoc test if the categorical variable has greater than two levels.

For this round of assignments, I decided to use a dataset I had already been exploring, rather than the ones provided by the university. (Note: the OOL dataset from the previous course has very few quantitative variables suitable for a meaningful analysis.)

Dataset details:

  • Bike sharing program data from the University of Porto, with 17389 instances and 16 attributes. The dataset link is here.
  • Counts of registered, casual and total bike rental users are provided, broken down by month, season, weather and hour of day.


I chose to perform two one-way ANOVAs using SAS programming:

  1.  Relationship between the number of casual renters and season. (analysis 1)
    • Categorical variable = “season” with 4 levels,
      • 1:spring,
      • 2:summer,
      • 3:fall,
      • 4:winter.
    • Quantitative variable = “casual”, i.e. the number of unregistered members or casual renters, ranging from 2 to 3410.
  2. Relationship between the total number of bike renters and weather. (analysis 2)
    • Categorical variable = “weathersit” with 3 levels,
      • 1: Clear or partially cloudy, referred to henceforth as “sunny”
      • 2: Misty and cloudy, referred to as “cloudy”
      • 3: Light snow or light rain, referred to as “harsh”.
    • Quantitative variable = “cnt”, i.e. the total user count, ranging from 22 to 8714.
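Both analyses follow the same pattern: the categorical variable splits the quantitative variable into one group of values per level, and the ANOVA then compares those groups. Below is a minimal Python sketch of that grouping step. It assumes each record is a dict keyed by the dataset's column names (`season`, `weathersit`, `casual`, `cnt`); the helper `group_by_label` and the sample rows are hypothetical and are not taken from the real data.

```python
# Code-to-label maps from the variable descriptions above.
SEASON_LABELS = {1: "spring", 2: "summer", 3: "fall", 4: "winter"}
WEATHER_LABELS = {1: "sunny", 2: "cloudy", 3: "harsh"}


def group_by_label(rows, code_field, value_field, labels):
    """Collect the response values for each labelled level of a categorical field."""
    groups = {}
    for row in rows:
        label = labels[row[code_field]]
        groups.setdefault(label, []).append(row[value_field])
    return groups


# Hypothetical rows for illustration only; NOT values from the real dataset.
rows = [
    {"season": 1, "weathersit": 2, "casual": 120, "cnt": 1500},
    {"season": 3, "weathersit": 1, "casual": 1400, "cnt": 6000},
    {"season": 3, "weathersit": 3, "casual": 300, "cnt": 1800},
    {"season": 4, "weathersit": 1, "casual": 700, "cnt": 4500},
]

by_season = group_by_label(rows, "season", "casual", SEASON_LABELS)
by_weather = group_by_label(rows, "weathersit", "cnt", WEATHER_LABELS)
print(by_season)   # {'spring': [120], 'fall': [1400, 300], 'winter': [700]}
print(by_weather)  # {'cloudy': [1500], 'sunny': [6000, 4500], 'harsh': [1800]}
```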


Program Code:

The complete code is available at this link: w1-asgt-code. The core of the ANOVA code is given below:

For analysis 1:

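/* Analysis 1: one-way ANOVA of casual by season, with Duncan's multiple range post hoc test */
PROC ANOVA; CLASS season; MODEL casual=season; MEANS season/duncan; RUN;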

For analysis 2:

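/* Analysis 2: one-way ANOVA of cnt by weathersit, with Duncan's multiple range post hoc test */
PROC ANOVA; CLASS weathersit; MODEL cnt=weathersit; MEANS weathersit/duncan; RUN;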
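The F value that PROC ANOVA reports compares the variability between group means with the variability within groups. The Python sketch below shows that calculation on made-up numbers. `one_way_anova` is a hypothetical helper, not SAS code, and it stops at the F statistic and its degrees of freedom (no p value and no Duncan post hoc test).

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    groups: list of lists, one list of response values per category level.
    """
    k = len(groups)                      # number of category levels
    n = sum(len(g) for g in groups)      # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n
    # Between-group sum of squares: spread of the group means around the grand mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of each value around its own group mean.
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical toy data: three levels with three observations each.
groups = [[4, 5, 6], [7, 8, 9], [1, 2, 3]]
f, df1, df2 = one_way_anova(groups)
print(f"F({df1}, {df2}) = {f:.2f}")  # prints F(2, 6) = 27.00
```

Note that the second degree of freedom is n - k, which is why it grows with the number of observations rather than with any group mean.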


Results & Interpretation:

The complete results are also available in this pdf file: DV_w1-results (2)

Analysis for relation 1:

  1. The ANOVA, together with Duncan's post hoc test, revealed that significantly more casual users rented bikes during fall (mean = 1202.61) than during winter (mean = 729.11) or spring (mean = 334.93). There was little difference between fall and summer (mean = 1106.10).
  2. F(3, N - 4) = 80.80, p < .0001, where 3 is the model (between-season) degrees of freedom and N - 4 is the error degrees of freedom listed in the ANOVA table (N = number of observations). 80.80 is the F value from that table, and the p value is so small that SAS reports it simply as <.0001.
  3. There was no statistically significant difference between the summer and fall counts of casual users.
  4. The count of casual bike renters was much lower in winter and spring than in summer and fall, which is plausible since the colder months bring snow and weather conditions unsuitable for cycling.

Analysis for relation 2:

  1. The ANOVA revealed that significantly more users (both casual and registered) rented bikes in sunny (mean = 4876.78) and cloudy weather (mean = 4035.86) than in harsh weather (mean = 1803.28).
  2. F(2, N - 3) = 40.07, p < .0001, where 2 is the model (between-weather) degrees of freedom and N - 3 is the error degrees of freedom from the ANOVA table. 40.07 is the F value from that table, and the p value is again small enough to be listed as <.0001.
  3. The Duncan post hoc test showed that the mean counts for all three weather conditions differed significantly from one another.
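The two numbers in parentheses after each F are its degrees of freedom. They follow from the standard one-way ANOVA decomposition with k groups and N observations in total:

```latex
F = \frac{SS_{\text{between}} / (k - 1)}{SS_{\text{within}} / (N - k)},
\qquad F \sim F_{k-1,\; N-k} \ \text{under } H_0
```

For the four seasons (k = 4) this gives F(3, N - 4) = 80.80, and for the three weather conditions (k = 3) it gives F(2, N - 3) = 40.07.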

Graphical interpretation:

For easier and more intuitive understanding, graphical results of the two analyses are shown below:


Casual renters versus seasons (spring, summer, fall and winter)


Average number of total renters based on weather (sunny, cloudy or harsh)
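The same comparisons can be sketched as plain-text bar charts built from the group means reported in the results above, with each bar scaled to the largest mean in its analysis. This is an illustrative Python sketch; `text_bar_chart` is a hypothetical helper and not the code used to draw the original charts.

```python
# Group means copied from the Results & Interpretation section above.
season_means = {"spring": 334.93, "summer": 1106.10, "fall": 1202.61, "winter": 729.11}
weather_means = {"sunny": 4876.78, "cloudy": 4035.86, "harsh": 1803.28}


def text_bar_chart(means, width=40):
    """Return one text line per group, with bar length scaled to the largest mean."""
    top = max(means.values())
    lines = []
    for label, value in means.items():
        bar = "#" * round(width * value / top)
        lines.append(f"{label:<7}{bar} {value:.2f}")
    return lines


for chart in (season_means, weather_means):
    print("\n".join(text_bar_chart(chart)))
    print()
```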

Thank you for taking a look at my analysis. Please feel free to add any suggestions for improvement or other feedback in the comments section.




