Cheatsheet – Selecting Graphs for Statistical Analysis

One of the first steps in any statistical analysis, whether for hypothesis testing, predictive analytics or even a Kaggle competition, is checking the relationships between variables, that is, checking whether a pattern exists.

Graphs are a fantastic, visual way of identifying such relationships.

[Figure: a graph created with Matplotlib]

However, many readers kept getting stuck when selecting graphs for categorical variables, and many friends asked whether there was a standard rule for graph selection. With that in mind, below is a cheatsheet for choosing graphs for both quantitative (numeric) and categorical (character: gender, disease type, etc.) variables.

No. | Axis 1                   | Axis 2         | Chart type
1   | Single quantitative      | (none)         | Histogram, density plot, box plot
2   | Single categorical       | (none)         | Bar chart (frequency/count), pie chart (frequency/count/%)
3   | Categorical              | Quantitative   | Bar chart, pie chart, frequency table, line chart
4   | Quantitative             | Quantitative   | Scatterplot
5   | Categorical              | Categorical    | Stacked column chart, combination chart (typical bar chart with trendlines)
6   | 2 categorical            | Quantitative   | Stacked or side-by-side bar charts, heat maps. Any basic graph, with a color/shape code for one of the categorical variables.
7   | 1 categorical            | 2 quantitative | Stacked or side-by-side bar charts, scatter plots. Any basic graph, with a color/shape code for the categorical variable.
8   | 3+ variables of any type |                | Check whether you really need so many variables in a single graph. Side-by-side graphs may be a better option, or graphs with filters (if your programming language supports them).
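
The table can be made concrete as a lookup keyed on how many quantitative and categorical variables you want to plot. Below is a minimal Python sketch of that idea. The function name `suggest_charts`, the labels "quant"/"cat" and the chart names are illustrative assumptions, not part of any library.

```python
from collections import Counter

# Hypothetical encoding of the cheatsheet table. Keys are
# (number of quantitative variables, number of categorical variables).
CHEATSHEET = {
    (1, 0): ["histogram", "density plot", "box plot"],
    (0, 1): ["bar chart (frequency/count)", "pie chart (frequency/count/%)"],
    (1, 1): ["bar chart", "pie chart", "frequency table", "line chart"],
    (2, 0): ["scatterplot"],
    (0, 2): ["stacked column chart", "combination chart"],
    (1, 2): ["stacked or side-by-side bar charts", "heat map"],
    (2, 1): ["stacked or side-by-side bar charts", "scatter plot"],
}

# Advice for any other combination of 3+ variables (row 8 of the table).
FALLBACK = ["side-by-side graphs", "graphs with filters"]


def suggest_charts(kinds):
    """Suggest chart types for a list of variable kinds: "quant" or "cat"."""
    if not kinds:
        raise ValueError("need at least one variable")
    counts = Counter(kinds)
    unknown = set(counts) - {"quant", "cat"}
    if unknown:
        raise ValueError(f"unknown variable kinds: {sorted(unknown)}")
    key = (counts["quant"], counts["cat"])
    # Return a copy so callers cannot modify the shared table.
    return list(CHEATSHEET.get(key, FALLBACK))


print(suggest_charts(["cat", "quant"]))
# ['bar chart', 'pie chart', 'frequency table', 'line chart']
print(suggest_charts(["quant", "quant", "quant"]))
# ['side-by-side graphs', 'graphs with filters']
```

Counting variable kinds instead of checking their order means that ("cat", "quant") and ("quant", "cat") map to the same row, just as the table treats either axis assignment the same way.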

These are merely guidelines and are language-agnostic, so you may implement them in the programming language of your choice (R, Python, SAS, MATLAB, etc.). If you prefer, code implementations in R and Python are provided in the links below:

  • Charts in R:
  • Charts in Python:
    • This link contains code and images for creating stunning graphs (box plots, histograms, heatmaps, bubble charts, etc.) with the Matplotlib library, like the one shown above.

Hope you find this cheatsheet useful! Feel free to share your thoughts and comments. Adieu!

June Projects

Another quick update: the projects for the month of June are now uploaded on the website.

This month’s projects include:

  1. Kaggle project – SFO Crime classification
  2. Interactive R presentations

A. Kaggle Project:

In this Kaggle competition, the goal is to predict the probability of each crime category (arson, larceny, vehicle theft, drugs, etc.) based on factors such as x/y coordinates, district, dates, streets and others.

The GitHub folder contains complete code for data exploration, graphical analysis and, of course, predictive analytics. Feature engineering is an important part of scoring well on the Kaggle leaderboard, but to do it well you need to know which variables are important. Hence I’ve included graphical charts and chi-square hypothesis tests to help identify them.
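
A chi-square test of independence checks whether two categorical variables, such as district and crime category, are related. The project's own tests are written in R. As an illustration only, here is a minimal stdlib-only Python sketch of the Pearson chi-square statistic. The function name and the counts are made up.

```python
def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table.

    `table` is a list of rows of observed counts, e.g. one row per district
    and one column per crime category. Every row and column total must be
    non-zero, otherwise an expected count would be zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence: row total * column total / N.
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return stat, dof


# Made-up counts: 2 districts x 2 crime categories.
stat, dof = chi_square_independence([[10, 20], [20, 10]])
print(round(stat, 3), dof)  # 6.667 1
```

With 1 degree of freedom, the 5% critical value is about 3.841. A statistic of 6.667 would therefore suggest the two variables are not independent.

In practice, `scipy.stats.chi2_contingency` computes the statistic and a p-value directly. Note that it applies Yates' continuity correction to 2x2 tables by default, so its statistic will differ from this uncorrected one.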

The readme.md file briefly explains each program and its function, but here is a short summary:

  • relations.R = chi-square tests to check dependencies between variables, plus code for correlation visualization.
  • heatmaps.R = graphical analysis of crime categories by district (which was the most significant factor).
  • multinom_pgm.R = R program that calculates the predicted probabilities using multinomial regression, which is well suited to this kind of multi-class probability problem.
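
multinom_pgm.R does the actual model fitting in R. To show what a fitted multinomial model does at prediction time, here is a minimal Python sketch that assumes the coefficients are already estimated. Each class gets a linear score, and the softmax function turns the scores into probabilities that sum to 1. The coefficients, features and class labels below are invented for illustration.

```python
import math


def softmax(scores):
    """Convert per-class scores into probabilities that sum to 1."""
    top = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(features, coefficients, intercepts):
    """Class probabilities from a multinomial logistic model.

    coefficients[k] is the weight vector for class k and intercepts[k] its
    intercept. Fixing one class's weights and intercept at zero gives the
    baseline-category form commonly reported by R's nnet::multinom.
    """
    scores = [
        b + sum(w * x for w, x in zip(weights, features))
        for weights, b in zip(coefficients, intercepts)
    ]
    return softmax(scores)


# Hypothetical model: 3 crime categories, 2 numeric features.
classes = ["ARSON", "LARCENY/THEFT", "VEHICLE THEFT"]
probs = predict_proba(
    features=[0.5, -1.0],
    coefficients=[[0.0, 0.0], [1.2, -0.4], [0.3, 0.8]],
    intercepts=[0.0, 0.5, -0.2],
)
for name, p in zip(classes, probs):
    print(f"{name}: {p:.3f}")
```

Subtracting the largest score before exponentiating leaves the probabilities unchanged but avoids overflow when scores are large.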

 

B. Attractive presentations:

No matter how well you code, managers still expect presentations to show their bosses what you actually did, or how you found a particular pattern in the data that everyone else missed! So you’ll often need to supplement your presentation with the code, charts and analytical work that you’ve so painstakingly completed, without the luxury of running the code live! (Ughhh…) But have no fear! This is where recent additions to the R ecosystem, namely R Markdown documents and RPubs, come in handy.

They let you create attractive PowerPoint-style presentations in which you can choose to hide or show your R code. You can also add HTML for headers, tables and bulleted lists; essentially everything a webpage can have, plus the ability to share your R work with people who don’t have RStudio! 🙂 Did I mention you can even embed Shiny web apps? (A minute please, while I dance with joy!)

Basic R Markdown documents are available on my RPubs account (free for all!) at http://rpubs.com/anupamaprv, while a presentation with an embedded Shiny app is linked to my shinyapps.io account here.

Hope you find these projects useful and worthy additions to your own online portfolios. If you have any feedback or questions, please do leave a comment. If you just want help with your own projects, share those questions too or connect with me through the contact form and I’d be glad to help out.

‘Shiny’ World of R

Hello All,

Today’s post is a quick update to let you know that my Projects page now includes this month’s projects. This month’s work is all about Shiny, an incredibly easy way to build web applications that showcase your data analysis projects. If you are not yet comfortable with the basics, the Shiny tutorial on the RStudio website is an amazing place to start.

This month’s code projects are available here, including:

  1. immi-sal: interactive web charts displaying the average salaries of highly skilled immigrant workers. Users can select views based on state, job title and visa status.
  2. Diamonds-explorer: interactive scatter plots using the diamonds dataset.
  3. Shiny basics: code for different types of widgets and interactive displays. In the two-file layout, Shiny expects the UI file to be named ui.R (next to server.R), so after downloading the code, please rename the files accordingly (e.g., save ui_img.R as ui.R before running the app).
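
The selection logic behind an app like immi-sal comes down to filtering the data on the chosen state, job title and visa status, then averaging the salaries. The app itself is written in R with Shiny. As an illustration only, here is that filter-and-average step as a minimal Python sketch, with hypothetical field names and made-up salary figures.

```python
from statistics import mean

# Hypothetical records; field names and values are made up for illustration.
RECORDS = [
    {"state": "CA", "job_title": "Data Scientist", "visa": "H-1B", "salary": 120000},
    {"state": "CA", "job_title": "Data Scientist", "visa": "H-1B", "salary": 100000},
    {"state": "CA", "job_title": "Software Engineer", "visa": "H-1B", "salary": 130000},
    {"state": "TX", "job_title": "Data Scientist", "visa": "Green Card", "salary": 90000},
]


def average_salary(records, **filters):
    """Average salary of records matching every field=value filter.

    Returns None when nothing matches (e.g. an empty selection in the app).
    """
    salaries = [
        r["salary"]
        for r in records
        if all(r.get(field) == value for field, value in filters.items())
    ]
    return mean(salaries) if salaries else None


print(average_salary(RECORDS, state="CA", job_title="Data Scientist"))
print(average_salary(RECORDS, visa="Green Card"))
print(average_salary(RECORDS, state="NY"))
```

In a Shiny app, a step like this would typically sit in the server function and rerun whenever the user changes one of the selection widgets.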

 

If you want to take the code further or simply play with Shiny, then here are some other resources to help with your programming:

  1. Add a map using ‘leaflet’ functions. A great example is provided on the R-bloggers website.
  2. Add more data tables, using this tutorial for help.
  3. Create attractive maps of your social networks using D3 JavaScript, as explained in this blog post.

Happy Coding! 🙂