So, let's start with a confession: data science and machine learning are fascinating, but beyond what I ran into tangentially through my background in computer science and economics, they never seemed relevant in my years of consulting with associations and membership organizations. They were interesting, but they always seemed either impractical for day-to-day use or overkill for the problems I wanted to help my customers solve. I was lucky to have someone as passionate and experienced in these areas as Thomas - the other co-founder of Tasio - to open my eyes as we got up and running, but it was very clear I had some homework to do!
The good news is that there are a ton of materials and opportunities to get up to speed; the bad news is there are a ton of materials available. Picking from the pile of highly reviewed books and websites offering introductions to data science looked like an overwhelming task. It was tempting to dive straight into my comfort zone by hitting code first, but Thomas pointed me towards a couple of great books as a better conceptual starting point, so I wouldn't blaze past the details of what we're doing and why.
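With his recommendations, this became a three-pronged approach: reading for the conceptual background, hands-on programming to put the concepts into practice, and talking it all through with people who know the field.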
So where to start reading? I had some exposure to the statistical techniques, less to the artificial intelligence, and basically no practical experience applying any of it, so I wanted to get an idea of what is possible and how it is generally being used. Our goal isn't to create new models from scratch but to leverage proven approaches and technology in effective ways, so a good review of what is out there, along with why each approach might be used to solve a particular problem, was the way to go. Thomas had a great first-step recommendation: Data Smart by John W. Foreman.
This book uses Excel to illustrate each of its examples, and it's pretty amazing how far you can go with even that common tool. Foreman provides a great background on statistical methods that have been around for a long time, like some clustering techniques, regression, and forecasting, but also mixes in some of the more recently developed techniques used in machine learning, like decision trees and boosting models. Regression has been used for a very long time and still has a ton of value for what we're trying to do in predicting behavior based on large data sets, but seeing practical implementations and examples of the machine learning algorithms really got me excited about the possibilities of what we could provide.
I've continued my reading with Real-World Machine Learning, which is more focused on actual implementations of these models along with the common pitfalls. After that? Well, I'm unlikely to get to the end of my reading list any time soon, but I will likely bounce around depending on what we're focusing on with our customers.
Seeing these examples really made me eager to dig into a project, but since the first book used Excel, the implementation process seemed... tedious. And despite how well Excel was able to illustrate these models, it still has plenty of limitations for data sets as large and complex as the ones associations are able to gather. That started to make me a bit nervous, but luckily I was pairing my theoretical research with some practical implementation of these models.
If you are brand new to machine learning and want to learn how it can actually be implemented in real-world environments, I can't recommend the content at Kaggle enough. The courses are fantastic starting points that get you up to speed on Python and the standard libraries associated with data analysis, and Kaggle is also a huge community of people focused on, and competing at, solving problems using machine learning. Working through their curriculum alongside the same topics I covered in my reading really helped me understand which tools I should use for each problem and how.
Doing the programming without the theoretical background would still have given me a good idea of what is involved in these models, but it probably would have made me more dangerous - more likely to throw technology at any given problem even when there was a better way to approach it. Doing the reading without the programming, on the other hand, might have given me the impression that this stuff is relatively hard to implement. The programming exercises showed just how much work has already been done in these areas and made it clear that, at this point, we're building on top of a deep and wide foundation as we pull together actionable strategies for our customers based on their data.
The last point on my list really was the key, though. In my case, I had Thomas to bounce ideas off of as I encountered new content. He was also able to gradually introduce me to new topics, and when he started walking me through some of the things we can do for our customers, I was able to see how they would work instead of dismissing them as magic or unbelievably expensive. There are communities all over the internet where similar people are happy to discuss and provide feedback as you go through your own journey. For programmers, I'd refer you to the message boards associated with the site I mentioned above (Kaggle) or Reddit (r/machinelearning or r/learnmachinelearning). For anyone more on the business side trying to get a better grasp on what's possible or how best to communicate your needs, I'd recommend finding a group local to your area - there are meetups (virtual and in person) of these groups all around the world and it's hard to beat a conversation with an enthusiastic expert!
So where did all of this leave me? Ultimately, while getting a deeper background in the models available and how to implement them was really important, all the different sources I've touched have ended with the same conclusion: technology can do remarkable things with your data but it still takes a lot of expertise to determine how to feed and leverage the technology properly. As I went through this process it was constantly reinforced how important it is to understand where you're applying these methods - the business, the data, and the people.
Given our experience working with associations for well over a decade, always seeking to understand all aspects of the business so we can implement the best solutions, we have built up significant expertise that we can leverage in our efforts to provide the most value to our customers. There are plenty of pitfalls and dangers in working with large data sets and they are extremely difficult to deal with if you don't know what you're looking at. Instead of spending days, weeks, or months trying to understand what might be important and how everything is related, we can move straight into the parts you're most interested in: actionable results pulled from your data.
Dray McFarlane