Moderator variable with Chi-Square test

Today's post is the assignment exercise for week 4 for the Coursera class on Data Visualization Tools from Wesleyan University.

The topic is as below:

Run an ANOVA, Chi-Square Test or correlation coefficient that includes a moderator.

For this round of assignments, I’m using the Outlook on Life dataset provided for the course, as available here. Today I am going to test whether confidence in achieving a secure retirement (var = W1_F4_B) depends on income group (INCOME, calculated from the given var = PPINCIMP). The moderator variable is marital status (MARIT, computed from PPMARIT).

I am using the chi-square test for this assignment.

The hypothesis for this assignment is as follows:

  1. H0 = There is no relationship between INCOME and W1_F4_B.
  2. H1 = There is a significant relationship between the above two variables.
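
As a reminder of what the test computes, the standard Pearson chi-square statistic compares the observed cell counts with the counts expected if H0 were true. The symbols below are generic textbook notation (O for observed counts, E for expected counts, R and C for row and column totals, N for the sample size) and are not taken from the SAS output:

```latex
\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}},
\qquad
E_{ij} = \frac{R_i \, C_j}{N},
\qquad
df = (r - 1)(c - 1)
```

With the 2 response levels and 5 income levels described in the procedure below, the full table has df = (2 - 1)(5 - 1) = 4, while each pairwise posthoc comparison is a 2 x 2 table with df = 1.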


Procedure for Chi-Square test:

  1. The INCOME variable has 5 levels:
    • 20 => income between 0 and 19,999
    • 40 => income between 20,000 and 39,999
    • 60 => income between 40,000 and 59,999
    • 80 => income between 60,000 and 99,999
    • 100 => income greater than 99,999
  2. MARIT variable has 4 levels:
    • 1 => Married or living with partner
    • 2 => widowed
    • 3 => separated or divorced
    • 4 => never married.
  3. W1_F4_B is modified to have only 2 levels :
    • 1 = Very hard or somewhat hard
    • 4 = Very easy or somewhat easy.
  4. The code for this program is located in my GitHub SAS folder. The essence of the code is:


    TABLES W1_F4_B*INCOME/chisq;


  5. There are 5 levels in INCOME, so we need to make 10 pairwise comparisons (5 × 4 / 2). Hence the Bonferroni-adjusted significance threshold is 0.05 / 10 = 0.005.
  6. Code with moderator in the posthoc test comparisons:

    /* comparison set 1 */
    /* the BY statement expects temp_chk to be sorted by MARIT */
    DATA COMPARISON1; SET temp_chk;
    TITLE 'Comparison range 20 & 40';
    IF INCOME=20 OR INCOME=40;
    PROC FREQ; TABLES W1_F4_B*INCOME/chisq;
    BY MARIT;
    RUN;

  7. Code without moderator in the posthoc test comparisons:

    /* comparison set 1 */
    DATA COMPARISON1; SET temp_chk;
    TITLE 'Comparison range 20 & 40';
    IF INCOME=20 OR INCOME=40;
    PROC FREQ; TABLES W1_F4_B*INCOME/chisq;
    RUN;
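
Repeating the comparison step by hand for all 10 income pairs gets tedious, so here is a minimal sketch of how it could be wrapped in a macro. This is not the code from my GitHub folder: the macro name %pairwise, its parameters lo and hi, and the dataset names temp_sorted and comparison are all made up for illustration, and the sketch assumes temp_chk already contains the recoded INCOME, MARIT and W1_F4_B variables.

```sas
/* Sketch only: %pairwise, temp_sorted and comparison are hypothetical names. */

/* BY-group processing needs the data sorted by the BY variable. */
PROC SORT DATA=temp_chk OUT=temp_sorted;
  BY MARIT;
RUN;

%MACRO pairwise(lo, hi);
  /* Keep only the two income groups being compared. */
  DATA comparison;
    SET temp_sorted;
    IF INCOME = &lo OR INCOME = &hi;
  RUN;

  /* One chi-square test per marital-status group (the moderator). */
  TITLE "Comparison range &lo and &hi";
  PROC FREQ DATA=comparison;
    TABLES W1_F4_B*INCOME / CHISQ;
    BY MARIT;
  RUN;
  TITLE;
%MEND pairwise;

/* All 10 pairs of the 5 income levels. */
%pairwise(20, 40)
%pairwise(20, 60)
%pairwise(20, 80)
%pairwise(20, 100)
%pairwise(40, 60)
%pairwise(40, 80)
%pairwise(40, 100)
%pairwise(60, 80)
%pairwise(60, 100)
%pairwise(80, 100)
```

Each resulting p-value is then compared against the Bonferroni threshold of 0.05 / 10 = 0.005 rather than 0.05. Dropping the BY MARIT statement from the macro would give the version without the moderator.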


Results & Interpretation:

The complete results are also available in this file: W4-INCOME-WEALTH-MODVAR-MARIT-POSTHOC-MARIT

Based on the output, the following conclusions can be inferred:

  1. For the main chi-square test, we can reject the null hypothesis in favor of H1 only for MARIT = 1 (married or living with a partner). So we accept an association between retirement confidence and income only for respondents who are married or living with a partner. For all other marital statuses, we fail to reject the null hypothesis.
  2. The other main trend is that the majority of survey respondents show little confidence in achieving their wealth goals and a secure financial retirement. At lower incomes this is overwhelmingly so, but even at the highest income levels only about 35-40% of respondents remain positive.


    [Figure: % of respondents who show high/low confidence in achieving a secure financial retirement]

  3. Thus we see major differences in answers between the lowest and highest income groups only for marital status MARIT = 1 (married or living with a partner).
  4. Based on the adjusted threshold (p < 0.005), we see a statistically significant difference for income pairs 20 & 100, 40 & 100, and 60 & 100. [Figure: comparison of income groups 20 & 100]
  5. I ran the posthoc tests both with and without marital status as a moderator. With the moderator, the trend is again seen only for MARIT = 1. Without the moderator, only one additional comparison is statistically significant (income groups 20 & 80).

Thank you for taking a look at my analysis. Please feel free to add any suggestions for improvement or other feedback in the comments section.



