June Projects

Another quick update: the projects for the month of June are now uploaded on the website.

This month’s projects include:

  1. Kaggle project – SFO Crime classification
  2. Interactive R presentations.


A. Kaggle Project:

In this Kaggle competition, the goal is to predict the probability of each crime category (arson, larceny, vehicle theft, drugs, etc.) based on factors like x/y coordinates, district, dates, streets and others.

The GitHub folder contains complete code for data exploration, graphical analysis and, of course, predictive analytics. Feature engineering is an important part of scoring well on the Kaggle leaderboard, but to do it well you need to know which variables are important. Hence I’ve included graphical charts and chi-square hypothesis tests to help check exactly that.
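For readers who want to see the arithmetic behind a chi-square test of independence, here is a minimal sketch. It is written in plain Python rather than R, it is not the code from the GitHub folder, and the district/category records are made-up toy data, not the competition data.

```python
from collections import Counter


def chi_square_statistic(pairs):
    """Return (statistic, degrees_of_freedom) for (factor, category) pairs."""
    pairs = list(pairs)
    total = len(pairs)
    observed = Counter(pairs)
    row_totals = Counter(factor for factor, _ in pairs)
    col_totals = Counter(category for _, category in pairs)

    statistic = 0.0
    for factor, row_total in row_totals.items():
        for category, col_total in col_totals.items():
            # Count we would expect in this cell if the two variables were independent.
            expected = row_total * col_total / total
            statistic += (observed[(factor, category)] - expected) ** 2 / expected

    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof


# Hypothetical toy records: (district, crime category).
records = (
    [("NORTHERN", "LARCENY")] * 30 + [("NORTHERN", "DRUGS")] * 10
    + [("TENDERLOIN", "LARCENY")] * 10 + [("TENDERLOIN", "DRUGS")] * 30
)

statistic, dof = chi_square_statistic(records)
print(f"chi-square = {statistic:.2f}, degrees of freedom = {dof}")
```

A large statistic relative to the chi-square critical value for the given degrees of freedom (about 3.84 at the 5% level for one degree of freedom) suggests the factor and the crime category are not independent, which makes the factor a candidate for feature engineering.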

A short explanation of the programs and their functions is given in the readme.md file, but here is a quick summary:

  • relations.R = chi-square tests to check dependencies and code for correlation visualization.
  • heatmaps.R = graphical analysis of crime categories by district (which was the most significant factor).
  • multinom_pgm.R = R program to calculate the predictive probabilities using multinomial regression, which is a natural choice for multi-class problems like this one.
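
To give a feel for how multinomial regression turns inputs into class probabilities, here is a minimal sketch in plain Python (it is not the code from multinom_pgm.R). The coefficients, intercepts and feature encoding are hypothetical values chosen for illustration, not fitted to the competition data.

```python
import math


def softmax(scores):
    """Turn per-class linear scores into probabilities that sum to 1."""
    highest = max(scores.values())  # subtracting the max avoids overflow in exp
    exps = {label: math.exp(score - highest) for label, score in scores.items()}
    total = sum(exps.values())
    return {label: value / total for label, value in exps.items()}


def predict_proba(features, weights, intercepts):
    """Score each class as intercept + dot(weights, features), then apply softmax."""
    scores = {
        label: intercepts[label] + sum(w * x for w, x in zip(weights[label], features))
        for label in weights
    }
    return softmax(scores)


# Hypothetical coefficients for three crime categories and two one-hot
# district indicators (is_northern, is_tenderloin). These are not fitted values.
weights = {
    "LARCENY": [1.2, -0.4],
    "DRUGS": [-0.8, 1.5],
    "ARSON": [0.0, 0.0],
}
intercepts = {"LARCENY": 0.5, "DRUGS": 0.1, "ARSON": -1.0}

probabilities = predict_proba([0.0, 1.0], weights, intercepts)  # a Tenderloin record
for label, p in sorted(probabilities.items(), key=lambda item: -item[1]):
    print(f"{label}: {p:.3f}")
```

Each record gets one probability per category, which is the shape of output the competition asks for; a fitted model would learn the weights from the training data instead of using hand-picked numbers.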


B. Attractive presentations:

No matter how well you code, managers still expect presentations to show their bosses what you actually did or how you came up with a particular pattern in the data that everyone else missed! So you’ll often need to supplement your presentation with the code, charts and analytical work that you’ve so painstakingly completed, without the luxury of running the code! (Ughhh…) However, have no fear! This is where recent additions to the R ecosystem, namely RMarkdown documents and RPubs, come in handy.

They allow you to create attractive PowerPoint-style presentations where you have the option of hiding or showing your R code. You can also add HTML for headers, tables and bulleted lists: essentially everything a webpage can have, plus the benefit of being able to run your R program without having RStudio! 🙂 Did I mention you can even embed Shiny webapps? (A minute please, while I dance with joy!)

Basic RMarkdown documents are provided on my RPubs account (free for all!) at http://rpubs.com/anupamaprv, while a presentation with an embedded Shiny app is linked to my Shiny.io account here.



Hope you find these projects useful and worthy additions to your own online portfolios. If you have any feedback or questions, please do leave a comment. If you just want help with your own projects, share those questions too or connect with me through the contact form and I’d be glad to help out.


‘Shiny’ World of R

Hello All,

Today’s post is a quick update to let you know that my Projects page now has the projects for this month. This month’s work is all about Shiny, an incredibly easy way to make web applications that showcase your data analysis projects. If you are not yet comfortable with the basics, the Shiny RStudio website has an amazing tutorial to get you started.

This month’s code projects are available here, including:

  1. immi-sal: interactive web charts displaying average salaries of highly skilled immigrant workers. Users can select views based on state, job title and visa status.
  2. Diamonds-explorer: interactive scatter plots using the diamonds dataset.
  3. Shiny basics – code for different types of widgets and interactive displays. Shiny expects the UI file to be named ui.R, so when you download the code, please remember to rename the files accordingly (e.g., ui_img.R should be saved as ui.R before running the app).
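
If you switch between several of the downloaded UI files, the renaming step can be scripted. Below is a minimal sketch in Python; the helper name activate_ui and the folder layout are my own assumptions for illustration, and the demo uses a throwaway folder with a placeholder file rather than the real download.

```python
import shutil
import tempfile
from pathlib import Path


def activate_ui(app_dir, ui_variant):
    """Copy the chosen UI file (e.g. ui_img.R) to ui.R inside app_dir."""
    app_dir = Path(app_dir)
    source = app_dir / ui_variant
    if not source.is_file():
        raise FileNotFoundError(f"{source} does not exist")
    target = app_dir / "ui.R"
    shutil.copyfile(source, target)  # copy, so the original file is kept
    return target


# Demo in a throwaway folder, with a placeholder standing in for the download.
with tempfile.TemporaryDirectory() as demo_dir:
    placeholder = Path(demo_dir) / "ui_img.R"
    placeholder.write_text("# placeholder for the downloaded ui_img.R\n")
    created = activate_ui(demo_dir, "ui_img.R")
    demo_ok = created.name == "ui.R" and created.read_text() == placeholder.read_text()
    print(created.name, "created:", demo_ok)
```

Copying rather than renaming keeps the original file around, so you can switch to another variant later by calling the helper again with a different file name.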


If you want to take the code further or simply play with Shiny, then here are some other resources to help with your programming:

  1. Add a map using ‘leaflet’ functions. A great example is provided on the R-bloggers website.
  2. Add more data tables, using this tutorial for help.
  3. Create attractive maps of your social networks using D3 JavaScript, as explained in this blog post.


Happy Coding! 🙂

10 Free Resources to learn Python

Hello All,

Quite a few of my friends and colleagues recently raved about the merits of Python, since it can be used for basic software development and not just data analysis. Meanwhile, some others lamented the lack of tutorials for learning Python other than Codecademy.

I am not convinced there are sites better than Codecademy for learning basic Python, but here is a quick list of links anyway, and all of them are FREE!

Learn Basics of Python:

  1. Codecademy Python course – I’d like to reiterate that this (my favorite site) has an amazing interactive editor, helpful hints and excellent content for a range of programming languages. However, the Python course does require a time commitment of about 13 hours. That is only one weekend if you are really eager to get going! Unfortunately, it covers only the basics, not the data analysis libraries.
  2. Google Python Class – Yup! Learn Python from one of the world’s biggest tech companies.
  3. LearnPython.org – Similar to Codecademy, this site also teaches other scripting and programming languages like C, Java, PHP, etc.
  4. CodeMentor.io – Another site similar to Codecademy.
  5. Kaggle – An excellent tutorial that quickly walks you through Python basics (some, not much), the NumPy and pandas packages, and how to apply standard machine learning algorithms using the Titanic passengers dataset. A great resource if you want to learn practical applications, not merely the syntax.
  6. Coursera – Quite a few of the popular courses on this site are now paid ONLY (~$79/course), but there are still FREE courses remaining, as listed below. So enroll while you still can! 🙂
    • Learn to Program: The Fundamentals, from the Univ. of Toronto
    • An Introduction to Interactive Programming in Python (set of 2 courses), from Rice University. Free if you take the audit-only option, without access to the quizzes. Links here for course1 and course2.
  7. EdX – Both courses listed below allow free enrollment, but you can add a verified certificate for $49.
    • Introduction to Python for Data Science, a hands-on course from Microsoft. Link here.
    • CS For All: Introduction to Computer Science and Python Programming. Course from Harvey Mudd College, but it starts only on Jun 7, 2016. You can view details here.
  8. Udemy – Only listing two free courses, i.e., those with standardized content as well as good feedback from students.
    • Introduction To Python Programming, by Avinash Jain. Link here.
    • Introduction to Computer Science with Python Programming, by Mybringback Edutube. Link provided here.
  9. Others – If quick tutorials for various Python libraries (scikit-learn, NumPy, matplotlib, etc.) are all you need, then this link provides a great curated list.
  10. If you know some amount of basic Python syntax, and only want a targeted tutorial for data analysis, then here’s a great step-by-step tutorial.


So get started with Python! Happy Coding!