Week3 assignment – data management

This week’s assignment was to clean up the data and account for missing values. The codebook for the data is attached for reference as anu-ool-pds-codebook-new. The program code is in AssignmentW3_pgm_code. The final results are in the PDF file anu_w3_asgt2-results (1).

Variables:

As in last week’s assignment, frequency distributions are run for the following five variables:

  • W1_F3: A basic American belief has been that if you work hard you can get ahead and reach the goals you set and more. Is this true or false today? The answer options were Extremely true (1), Moderately true (2), Slightly true (3), Neither (4), Slightly false (5), Moderately false (6), Extremely false (7) and Refused (-1).
  • W1_F4_B: [To have a financially secure retirement] For yourself and people like you, how easy or hard is it to reach these goals? Answer options were Very hard (1), Somewhat hard (2), Somewhat easy (3), Very easy (4), Refused (-1).
  • W1_F4_D: [To become wealthy] For yourself and people like you, how easy or hard is it to reach these goals? Answer options were Very hard (1), Somewhat hard (2), Somewhat easy (3), Very easy (4), Refused (-1).
  • W1_F5_A: [To own a home] For your children or the children of people like yourself, how easy or hard will it be for your children to reach these goals in the future? Answer options were Very hard (1), Somewhat hard (2), Somewhat easy (3), Very easy (4), Refused (-1).
  • W1_F6: How far along the road to your American Dream do you think you will ultimately get on a 10-point scale where 1 is not far at all and 10 nearly there? The participants could also refuse this question (-1).

Data Management:

In terms of data management, the following two steps are performed:

  1. Coding unknown values as missing values
  2. Creating a new variable

Missing Values:

For all five variables chosen for this research question, the value -1 stands for “Refused” and has been coded as a missing value. The code snippet for the first variable, W1_F3, is as follows:

IF W1_F3 = -1 THEN W1_F3 = .;

The frequency distribution for W1_F3 before and after this code is applied is given below in Table 1 and Table 2. In Table 1, the distribution includes the value “-1” even though it means the respondent did not answer the question. In Table 2, all 6 of these responses are coded as missing, so the percentages are computed over the 283 valid responses instead of all 289. The other variables are recoded the same way (see the program code file).

Table 1 Before coding missing values

Belief in achieving American Dream

W1_F3        Frequency  Percent  Cumulative Frequency  Cumulative Percent
-1                   6     2.08                     6                2.08
1                   48    16.61                    54               18.69
2                   67    23.18                   121               41.87
3                   79    27.34                   200               69.20
4                   44    15.22                   244               84.43
5                   20     6.92                   264               91.35
6                   11     3.81                   275               95.16
7                   14     4.84                   289              100.00

Table 2 After coding missing values

Belief in achieving American Dream

W1_F3        Frequency  Percent  Cumulative Frequency  Cumulative Percent
1                   48    16.96                    48               16.96
2                   67    23.67                   115               40.64
3                   79    27.92                   194               68.55
4                   44    15.55                   238               84.10
5                   20     7.07                   258               91.17
6                   11     3.89                   269               95.05
7                   14     4.95                   283              100.00
Frequency Missing = 6
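
Since the same recode applies to all five variables, it could also be written once with an array instead of five separate IF statements. The following is only a minimal sketch and not the code from AssignmentW3_pgm_code; the dataset names ool_pds and new are placeholders assumed for illustration.

```sas
/* Sketch: recode Refused (-1) to missing for all five variables.  */
/* ool_pds and new are assumed dataset names, not taken from the   */
/* assignment's program file.                                      */
DATA new;
  SET ool_pds;
  ARRAY survey_vars {*} W1_F3 W1_F4_B W1_F4_D W1_F5_A W1_F6;
  DO i = 1 TO DIM(survey_vars);
    IF survey_vars{i} = -1 THEN survey_vars{i} = .;
  END;
  DROP i;  /* the loop counter is not needed in the output dataset */
RUN;
```

Adding a sixth variable would then only require extending the ARRAY statement.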

New variables:

A new variable, “Wealth_conf”, is created as the sum of the two variables W1_F4_B and W1_F4_D, since both questions are very similar (outlook on achieving a financially secure retirement and on becoming wealthy, respectively).

WEALTH_CONF = SUM (OF W1_F4_B W1_F4_D);

Additionally, because the SUM function ignores missing arguments, the program statement below ensures that if the value of either of the two variables is missing, then Wealth_conf is also coded as “.”, i.e. missing:

IF W1_F4_B = . OR W1_F4_D = . THEN WEALTH_CONF = .;
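
As a possible alternative, the plain + operator returns missing whenever either operand is missing, so the sum and the missing-value check could be collapsed into a single assignment. The sketch below assumes the recoded data sit in a dataset called new and writes to new2; both names are hypothetical. The label text is the title shown on the Wealth_conf table in this post.

```sas
/* Sketch: build WEALTH_CONF with +, which propagates missing values. */
/* new and new2 are assumed dataset names.                            */
DATA new2;
  SET new;
  /* Result is missing (.) if either W1_F4_B or W1_F4_D is missing */
  WEALTH_CONF = W1_F4_B + W1_F4_D;
  LABEL WEALTH_CONF = 'Confidence to achieve wealth and secure retirement';
RUN;

/* Frequency distribution of the new variable */
PROC FREQ DATA=new2;
  TABLES WEALTH_CONF;
RUN;
```

With this form SAS writes a note to the log that missing values were generated, which is expected here. The two-statement version used in the assignment avoids that note and makes the rule explicit.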

Results:

Finally, frequency distributions for all five original variables and the new variable were run after applying the data management steps explained above. The results for Wealth_conf and two other variables are given below; the rest are included in the results file.

Table 3 Wealth_conf frequency distribution

Confidence to achieve wealth and secure retirement

WEALTH_CONF  Frequency  Percent  Cumulative Frequency  Cumulative Percent
2                   79    28.42                    79               28.42
3                   68    24.46                   147               52.88
4                   72    25.90                   219               78.78
5                   32    11.51                   251               90.29
6                   15     5.40                   266               95.68
7                    6     2.16                   272               97.84
8                    6     2.16                   278              100.00
Frequency Missing = 11

Table 4 W1_F4_B frequency distribution

Achieving financially secure retirement

W1_F4_B      Frequency  Percent  Cumulative Frequency  Cumulative Percent
1                   93    33.10                    93               33.10
2                  129    45.91                   222               79.00
3                   48    17.08                   270               96.09
4                   11     3.91                   281              100.00
Frequency Missing = 8

Table 5 W1_F3 frequency distribution

Belief in achieving American Dream

W1_F3        Frequency  Percent  Cumulative Frequency  Cumulative Percent
1                   48    16.96                    48               16.96
2                   67    23.67                   115               40.64
3                   79    27.92                   194               68.55
4                   44    15.55                   238               84.10
5                   20     7.07                   258               91.17
6                   11     3.89                   269               95.05
7                   14     4.95                   283              100.00
Frequency Missing = 6

As in last week’s post, each value in a frequency distribution table stands for an answer option in the survey (very hard, very easy, etc.), as listed at the beginning of this post. The number of missing values is shown at the bottom of each table, and it includes the “Refused” responses.
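
To read the tables without referring back to the variable list, the answer labels could be attached to the numeric codes with a format. This is a minimal sketch, assuming the recoded data are in a dataset named new (a placeholder) and using a hypothetical format name hardeasy; the labels are the answer options listed at the beginning of this post.

```sas
/* Sketch: show answer labels instead of numeric codes.            */
/* hardeasy is a made-up format name; new is an assumed dataset.   */
PROC FORMAT;
  VALUE hardeasy
    1 = 'Very hard'
    2 = 'Somewhat hard'
    3 = 'Somewhat easy'
    4 = 'Very easy';
RUN;

PROC FREQ DATA=new;
  TABLES W1_F4_B W1_F4_D W1_F5_A;
  /* Apply the same labels to the three easy/hard questions */
  FORMAT W1_F4_B W1_F4_D W1_F5_A hardeasy.;
RUN;
```

The format only changes how the values are displayed; the stored codes stay numeric, so Wealth_conf can still be computed from them.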

Please also view the code in the file AssignmentW3_pgm_code and the rest of the results in the PDF file anu_w3_asgt2-results (1) before submitting your peer review. Thanks!
