Contemplating a Career in Data Analysis? One Practitioner’s Perspective

By Jean-Patrick Tsang, PhD, MBA | October 4, 2017

Jean-Patrick Tsang, president of Bayser Consulting, highlights the power and influence of data.

Look around you: The world is awash in data, which powers more decisions than you might realize.

Take Uber, for instance. The car shows up in just a matter of minutes because historical data has established where people are most likely to hail a cab at any given time. Now, if you think the car assigned to you is the closest one, think again. The assignment algorithm takes into account not only proximity, but also the profile of the driver and the destination.

When you buy an airplane ticket, book a hotel room, or rent a car, the price you pay is determined by a data analysis algorithm meant to maximize the profit of the seller. When you follow your doctor’s orders and take your medications, it’s more likely you’ll get better. That’s because the medication has undergone one or more clinical trials to generate the data needed to substantiate the drug’s clinical claims.

Data-driven decision making is even more conspicuous in the online world. Search results are customized to each individual based on searches completed in the past. The pesky banners that follow you, the spam emails that fill up your inbox, and the annoying ads at the start of every other YouTube video are all interventions meant to boost your purchases, thanks to data analysis. Granted, they are not yet that sophisticated, but it won’t be long until their predictions reach “Minority Report” level.

The volume of data has exploded in the last decade and its growth shows no sign of abating, which means that the demand for data analysis will continue to skyrocket. This is true across all sectors, from financial, medical, and housing to e-commerce and social media. That’s why now is a great time to start a career in data analysis, and what better way to get started than with a graduate degree from a reputable institution?

Say you have an important business decision to make—one so important you decided to hire someone to help you think things through. Your administrative assistant was kind enough to arrange for you to meet three candidates:

Candidate A is very personable and smart, and understands the problem right away. He or she genuinely wants to help, but unfortunately does not have the technical background to run the simulations you want.
Candidate B is truly a statistics wizard and can definitely run your simulations and more. Candidate B even suggested some additional hypothesis-testing analyses that you did not fully comprehend. He or she is a good listener but seems to be interested only in the statistics part of your problem.
Candidate C has a good grasp of statistics and clearly understands the finer points of what you want to achieve. Candidate C can not only run your simulations, but also interpret the results and articulate the implications of each decision you envision.

Now, who would you hire? Suggestion: Be that person.

If you’re still unsure whether data analysis is right for you, there is a treasure trove of information on the web that you can check out. The content ranges from short videos by Khan Academy to longer online courses from Udacity, Coursera, Udemy, and edX. Northeastern also offers a data analytics bootcamp, Level, which you can take part-time, by participating in a mix of online and on-ground classes.

Examples of Data Analysis in Action

My own path has led me to data analysis. I have run a consulting firm, Bayser Consulting, for the past 25 years, and I absolutely love it. What I enjoy most is the opportunity to be creative. I am grateful to my clients for allowing me to invent new ways to solve problems.

In one of the very first engagements, we deployed genetic algorithms to identify which tests a bedside device should support to maximize the odds that what a physician orders can actually be carried out on the device. The physician typically orders a battery of tests that contains between one and 20 tests. If one of the tests ordered is not supported by the bedside device, it has to be run in the central lab, which means a significant time delay. Running one test in the central lab, it turns out, incurs the same delay as running all the tests in the central lab, so the device has to support all the tests the physician requests in order to be relevant. The device we came up with had a hit rate of 42 percent, while the very best device on the market at the time was around 25 percent.
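To make the hit-rate idea concrete, here is a minimal Python sketch. It is not the genetic algorithm used in the engagement: the test names and orders are invented for illustration, and the brute-force search in `best_menu` is a stand-in that only works for tiny menus. What it does show is the objective a search procedure would try to maximize, namely the share of orders in which every test is supported by the device.

```python
from itertools import combinations

def hit_rate(orders, supported):
    """Share of orders whose tests are ALL supported by the device."""
    orders = list(orders)
    if not orders:
        return 0.0
    hits = sum(1 for order in orders if set(order) <= set(supported))
    return hits / len(orders)

def best_menu(orders, menu_size):
    """Exhaustively try every menu of `menu_size` tests (illustrative only;
    a genetic algorithm or other heuristic is needed for realistic sizes)."""
    candidates = sorted(set().union(*orders))
    best = max(combinations(candidates, menu_size),
               key=lambda menu: hit_rate(orders, menu))
    return frozenset(best), hit_rate(orders, best)

# Hypothetical physician orders, each a battery of tests.
orders = [
    {"glucose"},
    {"glucose", "potassium"},
    {"glucose", "troponin"},
    {"glucose", "potassium", "sodium"},
]

device_a = {"glucose", "potassium"}
device_b = {"glucose", "potassium", "sodium"}

print(hit_rate(orders, device_a))  # 2 of the 4 orders are fully covered
print(hit_rate(orders, device_b))  # 3 of the 4 orders are fully covered
print(best_menu(orders, 2))
```

Note that an order counts as a hit only when the whole battery is covered, which mirrors the point above that a single unsupported test sends the entire order to the central lab.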

With new data sources come new techniques. Patient-level claims data appeared around 2001 and brought along different paradigms. We were lucky to be among the first to be exposed to this new type of data. What I realized right away is that patient-level data lends itself to identifying Key Opinion Leaders (KOLs). Indeed, the data describes how a patient moves from one physician to another just like a bee hops between flowers. The bee here is the patient and the flower, the physician. The ID of the patient is encrypted in keeping with HIPAA regulations. The ID of the physician, on the other hand, is fully exposed, which is essential for targeting. The technique we developed is now the standard way of doing KOL identification and sphere of influence analysis, outside of primary research.
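The bee-and-flower picture can be sketched in a few lines of Python. This is an assumption-laden illustration, not the technique described above: the record layout (patient token, physician ID, visit date), the sample visits, and the choice to rank physicians by the number of patient arrivals from other physicians are all hypothetical.

```python
from collections import defaultdict

def referral_edges(visits):
    """Count patient moves between physicians.

    visits: iterable of (patient_token, physician_id, iso_date) tuples.
    Returns {(from_physician, to_physician): number_of_moves}.
    """
    by_patient = defaultdict(list)
    for patient, physician, date in visits:
        by_patient[patient].append((date, physician))

    edges = defaultdict(int)
    for sequence in by_patient.values():
        sequence.sort()  # ISO dates sort chronologically as strings
        for (_, src), (_, dst) in zip(sequence, sequence[1:]):
            if src != dst:  # ignore repeat visits to the same physician
                edges[(src, dst)] += 1
    return dict(edges)

def inbound_moves(edges):
    """Total number of patient arrivals per physician."""
    totals = defaultdict(int)
    for (_, dst), count in edges.items():
        totals[dst] += count
    return dict(totals)

# Hypothetical claims: patient tokens are opaque, physician IDs are visible.
visits = [
    ("p1", "DrA", "2001-01-05"), ("p1", "DrK", "2001-02-01"),
    ("p2", "DrB", "2001-01-10"), ("p2", "DrK", "2001-03-02"),
    ("p3", "DrA", "2001-01-20"), ("p3", "DrB", "2001-02-15"),
    ("p3", "DrK", "2001-04-01"),
]

edges = referral_edges(visits)
ranking = sorted(inbound_moves(edges).items(), key=lambda kv: -kv[1])
print(edges)
print(ranking)  # physicians who receive the most patients from peers come first
```

A physician who keeps receiving patients from many colleagues is a natural candidate for further scrutiny; a real analysis would also need to consider specialty, time windows, and volume thresholds.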

In another project, we deployed a gravity model to estimate the sales potential of a hospital. We employed a generalized version of Newton’s Universal Law of Gravitation—a model which allows us to estimate the sales potential of a hospital while being sensitive to the presence of other hospitals in the neighborhood. Intuitively, the model accounts for the fact that a patient has the choice between going to the hospital we are interested in or another hospital in the area.
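One common way to write such a model down is a Huff-style formulation, in which each hospital pulls patients in proportion to its attractiveness divided by a power of the distance, and a patient’s choice probabilities are the normalized pulls. The sketch below assumes that form; the attractiveness scores, distances, patient counts, and the exponent `beta` are hypothetical and are not taken from the project described above.

```python
def gravity_shares(attractiveness, distances, beta=2.0):
    """Probability that a patient at one origin chooses each hospital.

    pull(h) = attractiveness[h] / distances[h] ** beta, then normalized
    so the shares across all hospitals sum to 1.
    """
    pulls = {h: attractiveness[h] / distances[h] ** beta for h in attractiveness}
    total = sum(pulls.values())
    return {h: pull / total for h, pull in pulls.items()}

def sales_potential(hospital, origins, attractiveness, beta=2.0):
    """Expected number of patients reaching `hospital` across all origins.

    origins: list of (patient_count, {hospital: distance}) pairs.
    """
    return sum(
        count * gravity_shares(attractiveness, distances, beta)[hospital]
        for count, distances in origins
    )

# Hypothetical inputs: attractiveness could be bed count, for example.
attractiveness = {"target": 400, "rival": 100}
origins = [
    (1000, {"target": 2.0, "rival": 1.0}),  # neighborhood close to the rival
    (500, {"target": 1.0, "rival": 2.0}),   # neighborhood close to the target
]

print(gravity_shares(attractiveness, origins[0][1]))
print(sales_potential("target", origins, attractiveness))
```

In the first neighborhood the target hospital is four times as attractive but twice as far away, so with `beta=2.0` the two hospitals split patients evenly. This is the sense in which the estimate stays sensitive to nearby competitors.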

In yet another project, we had to identify which accounts were good targets to purchase our client’s testing equipment. It turns out that accounts that had just received a financial grant from the National Institutes of Health (NIH) were likely to be customers. So, we went on to analyze abstracts of successful grant proposals that were submitted to the NIH. The text analysis we deployed discovered that the presence of certain phrases, such as “knocked-out mouse,” was highly predictive of good targets. Thanks to this project, we became very popular with the client.
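A simple way to check whether a phrase is predictive is to compare the customer rate among abstracts that contain it with the overall customer rate, a ratio often called lift. The sketch below is a hypothetical illustration of that idea only: the abstracts and labels are made up, and the actual text analysis in the project may have worked quite differently.

```python
def phrase_lift(abstracts, phrase):
    """Customer rate among abstracts containing `phrase`, divided by the
    overall customer rate. Returns None when the ratio is undefined.

    abstracts: list of (text, is_customer) pairs.
    """
    needle = phrase.lower()
    matches = [is_customer for text, is_customer in abstracts
               if needle in text.lower()]
    if not abstracts or not matches:
        return None
    base_rate = sum(is_customer for _, is_customer in abstracts) / len(abstracts)
    if base_rate == 0:
        return None
    return (sum(matches) / len(matches)) / base_rate

# Made-up abstracts with made-up outcomes, for illustration only.
abstracts = [
    ("Gene function studied in a knocked-out mouse model", True),
    ("Phenotyping a knocked-out mouse line for cardiac defects", True),
    ("A survey of clinic waiting times", False),
    ("Retrospective imaging study of hip fractures", False),
]

print(phrase_lift(abstracts, "knocked-out mouse"))  # above 1 means over-represented
print(phrase_lift(abstracts, "survey"))
```

A lift well above 1 on a large, real sample would flag the phrase as a useful targeting signal; on four toy records it only demonstrates the arithmetic.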

About 10 years ago, Google surprised the world by predicting flu trends in the U.S. with better accuracy than the Centers for Disease Control and Prevention (CDC). Although the predictions were off the mark in subsequent years, this got a client of ours wondering if we could use search data to predict drug prescription volume around the world. We built a model at the Metropolitan Statistical Area (MSA) level, which uses the words people type in search engines as the input and the Rx volume of the drug as the output. One interesting fact we quickly realized is that search words cluster around two needs: before the visit to the doctor and after the purchase of the drug. We calibrated the model for the U.S. and then applied it to the major cities around the world. The model lived up to expectations and provided very good estimates.

At this point, if you’re not 100 percent sure that data analysis is the way to go, look up in the sky just like Tycho Brahe and Johannes Kepler did some 400 years ago. Brahe knew the value of quality data and spent most of his life collecting it. Kepler then analyzed the data and gave us the laws of planetary motion, which most of us still study in school to this day. Without Brahe’s data, Kepler, despite his genius, would not have accomplished what he did.

There are two good lessons here. First, good data is essential to good data analysis. If your data isn’t clean, scrub it. Second, provide a modeling environment that allows the data to reveal its deeper truths to you. The more skilled you are, the faster the data will open up. And should the data be unwilling to cooperate, then I’ll let you in on a secret: Torture the data enough and it will confess.