Last week I met a colleague on the marketing team in Ireland. He is taking one of those popular online Data Analytics courses and needed help with a cluster analysis assignment. He was stuck, so I offered to take a look.
The exercise involved clustering insurance customers by risk factors. He was following the instructor's recipe and had gotten as far as running the k-means algorithm and creating some visuals. But then we got to the part where you have to interpret output like this:
This is raw output from R showing cluster centers. For this example I'm clustering 150 flowers into four groups based on four size measures. The table shows the average of each measure within each group. The idea is that flowers in the same group are more similar to each other on these size measures than to flowers in other groups.
His table had many more columns, and he was trying to make sense of the cluster centers to tell the story of each group. That's not easy to do in the ugly R console. So I suggested throwing the table into Excel and adding some conditional formatting so the patterns would jump out.
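For readers who want to reproduce that workflow, here is a minimal sketch. It assumes the 150 flowers are R's built-in `iris` data set (four size measures per flower) and uses base R's `kmeans()`; the seed, the `nstart` value, and the file name `cluster_centers.csv` are illustrative choices of mine, not something from the original assignment.

```r
# Cluster the 150 iris flowers into four groups on their four size measures.
# Assumption: iris stands in for the example shown above.
set.seed(42)                 # k-means starts from random centers; fix the seed so reruns match
iris_measures <- iris[, 1:4] # Sepal.Length, Sepal.Width, Petal.Length, Petal.Width
fit <- kmeans(iris_measures, centers = 4, nstart = 25)

# Cluster centers: one row per group, one column per measure (the group averages)
centers <- round(fit$centers, 2)
print(centers)

# Add cluster numbers and sizes so the table tells a fuller story
out <- data.frame(cluster = seq_len(nrow(centers)), size = fit$size, centers)

# Write a CSV that opens directly in Excel (file name is just an example)
write.csv(out, "cluster_centers.csv", row.names = FALSE)
```

Once the file is open in Excel, applying a color scale from Conditional Formatting to each measure column makes it easy to see which groups have the largest and smallest flowers on each dimension.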
But he didn't know what that was. It turns out he didn't really know Excel.
And then it hit me.
EXCEL IS NOT SEXY – BUT YOU NEED IT
Take a look at some of the popular analytics training courses out there. Excel is not a prerequisite and rarely is it part of the curriculum. There are promises of a hot job market with high salaries. But no mention of the most famous and most successful analysis software on the planet. Perhaps it’s implied?
For all the crap Microsoft gets, they've done at least one thing well: Excel is a pretty damn good application. The problem is that big data, machine learning, and data analytics are such sexy topics these days, and sadly Excel does not clear the sexiness threshold. Among R instructors you will find highly impressive PhDs and "data scientists" (I hate that title) who are deep into the cool things like Python, R, Hadoop, yada yada. At best, they don't consider Excel all that useful, and at worst they look down on it as an amateur's playground.
CRAWL BEFORE YOU RUN
Ever since starting my R for Excel Users project, I have helped hundreds of professionals and students who were frustrated trying to learn R. I find that a key source of their frustration is skipping steps and not laying a solid foundation. Take my colleague, who jumped into R without much experience working with data in the first place. Or people who try to build regression models before they have mastered data frames. Or who struggle with data frames because they don't truly understand vectors.
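To make the vectors-before-data-frames point concrete, here is a small sketch. The values and the names `sizes`, `species`, and `flowers` are hypothetical toy data chosen for illustration; the point it shows is that a data frame is a list of equal-length vectors, so vector skills carry straight over.

```r
# Toy data (hypothetical): two vectors of the same length
sizes   <- c(5.1, 4.9, 6.3)                      # numeric vector
species <- c("setosa", "setosa", "virginica")    # character vector

# A data frame bundles equal-length vectors as columns
flowers <- data.frame(size = sizes, species = species)

is.list(flowers)         # a data frame is a list under the hood
is.vector(flowers$size)  # each column comes back as an ordinary vector
length(flowers)          # number of columns, i.e. number of vectors

# Vector operations work column by column
flowers$size_mm <- flowers$size * 10   # vectorized arithmetic creates a new column
flowers[flowers$size > 5, ]            # logical vector used to filter rows
```

If `size * 10` or `flowers$size > 5` feels mysterious, that is usually a sign to spend more time with plain vectors before moving on to data frames and models.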
So my advice to folks trying to learn R: crawl before you run.
Go ahead, learn R, but take it one step at a time. Build your foundation. And don't forget about Excel. Used together, R and Excel can really jumpstart your entry into analytics with bigger and hairier data.