Want to learn programming for data analysis but don’t know where to start? Or just need a refresher? Or perhaps you are preparing for a BIG interview and want to brush up on the basics again.
Whatever your reason, here are some great resources for learning the basics of data analysis.
Resources for R:
- Coursera/JHU – They offer an amazing course on the basics of R, taught by the legendary Roger Peng. It’s no longer available as a free course, but totally worth the $49 (IMHO) and provides a great way to get started with R. Their data science specialization is really hands-on and well organized too!
- Coursera’s ESSEC Business School – They offer a “marketing analytics” course in R, which walks through data analysis for the marketing domain, such as calculating customer lifetime value (CLV), customer segmentation, etc. The content is fairly comprehensive. It is FREE if you do not want a statement of accomplishment. [Open disclaimer – I took the paid version, to get the verified certificate.]
- “swirl” package – Another way to learn R for free, in an interactive fashion similar to Codecademy. You simply install the “swirl” package from within RStudio. It was created by Roger Peng’s team (JHU) and it’s a nice way of learning from scratch. The lessons, however, teach only the fundamentals of R, so they are not very useful for intermediate or experienced users.
- edX – Another free MOOC by Microsoft, titled Introduction to R Programming.
- R in Action: Data Analysis and Graphics with R. Author – Robert Kabacoff. This one is my absolute favorite, something I refer to all the time.
- An Introduction to Statistical Learning. Authors – James, Witten, Hastie & Tibshirani. Mainly statistical theory, but there are interesting practice programs in R at the end of each chapter.
- R programming for Data Science. Author – Roger Peng.
Resources for Python:
- Codecademy.com – It is my favorite resource to learn or brush up on programming skills. It is FREE, interactive and has a great UI! It is so good, one of my previous employers listed it as a resource for skill-building! Python is not the only course listed there, so do check out the others.
- Wesleyan University – Offers a couple of in-depth data analysis courses on the Coursera platform, where the assignments have to be coded in either SAS or Python. The first week of the course walks you through the basics of both.
- Coursera – There are Python programming courses on Coursera too, but they teach overall programming, not necessarily the data analysis libraries. I haven’t personally enrolled in those classes, but I have heard only good reviews from friends and colleagues. Make sure you get up to speed on “pandas”, “NumPy” and “scikit-learn” – these are the most popular libraries for data science projects.
- Python for Data Science – Another course by Microsoft on the edX platform.
- Python for Data Analysis. Author – Wes McKinney. (This guy basically created the “pandas” library, so no point looking for a different book.)
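To give a flavour of what these libraries look like in practice, here is a minimal sketch that assumes pandas and NumPy are installed. The `orders` table and its values are made up purely for illustration and do not come from any of the courses above.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: a few purchases by three customers.
orders = pd.DataFrame(
    {
        "customer": ["ann", "bob", "ann", "cat", "bob", "ann"],
        "amount": [20.0, 35.0, 15.0, 50.0, 5.0, 25.0],
    }
)

# pandas: group rows by customer and summarise each group.
summary = (
    orders.groupby("customer")["amount"]
    .agg(total="sum", average="mean", n_orders="count")
    .sort_values("total", ascending=False)
)

# NumPy: vectorised arithmetic on the whole column, no explicit loop.
amounts = orders["amount"].to_numpy()
z_scores = (amounts - amounts.mean()) / amounts.std()

print(summary)
print(np.round(z_scores, 2))
```

The group-then-summarise pattern is roughly what most introductory pandas material starts with, so it is a reasonable first thing to practice.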
[Note: there is an endless online debate on R versus Python. My advice to newbies – pick either one and get the basics right; you can learn the other later. Both are equally good.]
Resources for SQL:
SQL is not a programming language made specifically for data analysis. But the fact that nearly every corporation has databases, together with SQL’s long tenure, makes it a useful (and practically mandatory) skill to have.
- Codecademy – This heads my list again, especially after they added a fabulous course called SQL for Business Metrics.
- w3schools.com – This is like an inferior copy of Codecademy, but it works just as well.
- Learning SQL. Author – Alan Beaulieu. Slightly older book, but like all O’Reilly publications, well worth the money.
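If you want to try SQL without setting up a database server, one option is Python’s built-in `sqlite3` module. The sketch below is only an illustration: the `orders` table, its rows and the spending threshold of 40 are all invented for the example.

```python
import sqlite3

# An in-memory SQLite database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("ann", 20.0), ("bob", 35.0), ("ann", 15.0), ("cat", 50.0), ("bob", 5.0)],
)

# Classic analysis query: aggregate per customer, keep the bigger spenders.
rows = conn.execute(
    """
    SELECT customer, COUNT(*) AS n_orders, SUM(amount) AS total
    FROM orders
    GROUP BY customer
    HAVING SUM(amount) >= 40
    ORDER BY total DESC
    """
).fetchall()
conn.close()

for customer, n_orders, total in rows:
    print(customer, n_orders, total)
```

SELECT, GROUP BY, HAVING and ORDER BY cover a large share of everyday reporting queries, so they are worth practicing first.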
Excel – silent overachiever:
Most people don’t equate Excel with programming, but it does have some “excellent” capabilities, especially if your datasets are not too big. Newer versions allow some complicated statistical processing at the touch of a button. Not to mention the beautiful charts it can create. In my experience, IT pros love to hate this simple tool, while most senior-level executives still love it!
Anyway, if you are interested, the following resources can help you master some of the more advanced features:
- Excel 2013 for Dummies.
- Statistical analysis with Excel for Dummies.
- Mastering Data analysis with Excel – another Coursera gem, this time from Duke University.
Although the title of the post mentions only programming resources, I decided to include basic statistics in this list, too. After all, how would you decide whether to use a random forest or a bootstrapping algorithm if you did not know what each one does? This is where a strong stats background comes in – to make computations and algorithm selection easier. Currently, there are loads of amazing courses that couple statistics and programming. My top favorites are given below:
- Stanford MOOC – Here assignments are coded in R. Enrollment is still open, so do check it out. They are also giving away a free book!
- http://www.statmethods.net/advstats/glm.html – A useful website I often use to look up R functions and statistics.
- DataCamp.com – This is another great platform like Coursera and Codecademy. I have not personally used it, but I’ve heard 5-6 passionately positive reviews recently from close contacts.
- http://stattrek.com/tutorials/statistics-tutorial.aspx – This is a recent find that I stumbled upon while looking for websites that teach high-school stats in a non-boring way. Very basic stuff, but the interactive tutorial is a nice way to dust off concepts we learned years ago.
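As a small taste of why the stats matter, here is a rough sketch of the bootstrapping idea mentioned above, using only Python’s standard library. It uses the simple percentile method on a made-up sample; the function name `bootstrap_mean_ci` and its defaults are my own choices for illustration, not taken from any of the resources listed.

```python
import random
import statistics


def bootstrap_mean_ci(data, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for the mean of `data`."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    n = len(data)
    # Resample WITH replacement, same size as the original sample,
    # and record the mean of every resample.
    means = sorted(
        statistics.fmean(rng.choices(data, k=n)) for _ in range(n_resamples)
    )
    # Take the alpha/2 and 1 - alpha/2 percentiles of those means.
    lower = means[round((alpha / 2) * n_resamples)]
    upper = means[round((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Hypothetical measurements, invented for this example.
sample = [12.1, 9.8, 11.4, 10.3, 13.0, 9.5, 10.9, 11.7, 12.4, 10.1]
low, high = bootstrap_mean_ci(sample)
print(f"sample mean: {statistics.fmean(sample):.2f}")
print(f"95% bootstrap interval: {low:.2f} to {high:.2f}")
```

The code is only a few lines, but knowing when resampling is appropriate (and when it is not) is exactly the kind of judgment a stats course builds.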
Are there any other websites you liked better? If so, do share in the comments section.