Getting good information is about more than static numbers that tell you what is happening in your association. Association leaders who are truly focused on growth will tell you that the ultimate goal of data is to help you improve business outcomes. That means the information you collect now should help you make better decisions both in the present AND in the future.
A 2021 data management survey of organizational leaders from Experian showed that:
So we all know that data is important. But without understanding the context behind what you need from data and why, odds are that you or someone on your association's leadership team is making decisions based on information that isn't helping you move toward your goals. And even with the right data, there's no guarantee they are making decisions that help your association's membership.
In order to make data work in your favor, you need to understand the context behind your data and have a plan for how that information will help you make decisions.
People think that good data is a silver bullet for good decision-making. In reality, data is only useful if it helps you do something new, faster, or better.
How then can you transform data into useful information? What is required to turn the digital information your association is collecting into a business asset?
To make data an asset, you need good data execution. There are four main ingredients to good data execution:
Without these four ingredients, data will provide limited returns on investment, which is something we hear association leaders complain about day in and day out.
The truth is, most organizations are underinvested in collecting the right data and empowering their people and processes to use it effectively for quick, powerful decision-making.
Good data execution is more than a nifty dashboard.
A cool dashboard may make you feel like you’re in a movie; there’s lots of shiny, blinking lights and sparkles, but it’s worthless if the information is not actionable.
What matters is:
In other words, what are the questions and challenges your organization is trying to answer using data? And is your data fit for these needs?
The first step is to make a list of things you want to know from your data, department by department. For example, you may want your data to answer things like:
Your different departments have diverse data demands and may need to use data in ways they can’t clearly pinpoint right off the bat. Meeting these challenges requires some brainstorming and team discussions to identify departmental data needs, limitations, and priorities.
Establishing your data needs requires us to start with the end user in mind and then use data to fill in the steps. Think of it as design thinking for your information.
Good questions to ask:
For example, you know that your members want to improve their skill level through your association. So, one big question you need your data to answer is which of your members have actually leveled up at their job or in their profession during their membership. Then, look at your data to see which members have improved over the last three years. Which have increased their earning potential, gotten raises, received recognitions or certifications, or advanced in their career path in some way? This analysis will help you see correlations—through data like members’ activity levels, CE courses attended, books they’ve purchased, etc.
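The kind of correlation analysis described above can be sketched in a few lines. This is purely illustrative: the column names (CE courses, events attended, books purchased, and a yes/no "advanced in their career" flag) are hypothetical stand-ins for whatever your AMS actually tracks.

```python
import pandas as pd

# Hypothetical member engagement data. Column names are illustrative,
# not a real schema: "advanced" = 1 if the member got a raise,
# certification, or promotion during the last three years.
members = pd.DataFrame({
    "ce_courses": [0, 2, 5, 1, 6, 3],
    "events":     [1, 3, 4, 0, 5, 2],
    "books":      [0, 1, 2, 0, 3, 1],
    "advanced":   [0, 0, 1, 0, 1, 1],
})

# Correlate each engagement signal with career advancement to see
# which activities track with members "leveling up".
correlations = (
    members.corr()["advanced"]
    .drop("advanced")
    .sort_values(ascending=False)
)
print(correlations)
```

With real data, the strongest correlations point to the activities worth building a member pathway around; with this toy data every signal trends positive, which is exactly the pattern you would be looking for.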
Once you've done all that leg work, you'll be able to design an action path for members who are on a similar trajectory. Instead of making decisions about courses, events, and membership fees based on static data, you are actually designing a pathway for growth for your members that will lead to higher engagement and more value for them personally.
Making good decisions and executing on data effectively requires us to keep our data human-centered (member-oriented) and aligned with our business objectives. To do so, we need to ensure our people are capable of balancing judgment and analysis, effectively taking the data we have and making decisions that are pointed towards action that leads to increased personal value to the members. It's also vital to have the right processes in place to keep pace with the information we collect and the shared goals we’re trying to achieve as an organization.
Learn more about data execution and how to best use your data to drive good decision-making in our recorded webinar, Is My Data Good Enough?
That is the question. Or, if not the question, at least an important one! Especially when paired with how should we impute? But before we go down that path and say the word impute another dozen times, maybe I should explain what it means in this context!
Basically, imputation is part of the answer to how we handle incomplete data. For any number of reasons, when we get a set of data we want to analyze, there are going to be some gaps: surveys or forms left incomplete, fields that don't apply to everyone, events that haven't occurred, and many other explanations. But regardless of the reason, our models aren't going to handle blanks very well.
So what are our options? Do we just trash that field and move on to ones that are complete? Even if you assume any data set will ever be complete - which might be a little optimistic - that can still leave you ignoring some really good, predictive information. If only a few records out of a large data set weren't filled in, we don't want to get rid of that element entirely. Do we trash just those records? Maybe! If it's a small number of records that wouldn't impact the model much, that's a reasonable option when you're more interested in aggregate results. But if you're attempting to predict behavior at an individual level, it would be less than ideal to just shrug your shoulders any time someone left off their birthday.
What else can we do if getting rid of it doesn't make sense? Here's where we get to imputation. Basically, we're going to populate those blank values with something so we can still include the record and the feature in our model.
So that's the first question - to impute or not to impute? If dropping the missing data makes sense in the context of what we're trying to achieve, maybe we don't need to impute; but if that means losing a significant amount of valuable data or risking not being able to produce any results at the level we're aiming for, let's impute.
And that takes us to the follow-up question: how do we actually fill in those blank values? What is going to be helpful in maintaining the integrity of the model and keeping that feature valuable? Well, context matters. Depending on the data, we get to choose between several different methods here.
We can use some numeric techniques - filling in blanks with the mean or median of our populated data - which work well if you can assume people generally fall within a range. This can work pretty well for something like a level of satisfaction on a scale of 1-10: if you don't have an answer, you end up assuming something fairly neutral.
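A minimal sketch of mean and median imputation using pandas. The satisfaction scores are made up, but they show how a blank gets replaced with a neutral middle value:

```python
import pandas as pd

# Hypothetical 1-10 satisfaction scores with two blanks.
scores = pd.Series([8, 6, None, 9, 7, None, 5])

# Fill blanks with the mean (or median) of the answers we do have;
# both land on a fairly neutral value for this data.
mean_filled = scores.fillna(scores.mean())
median_filled = scores.fillna(scores.median())

print(mean_filled.tolist())
```

Here the mean and median of the answered scores both happen to be 7, so either choice assumes the non-responders felt roughly neutral.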
What if we're looking at something like days since the last time a person attended a meeting? A mean or median value there could grossly misrepresent reality. The blank value there might intentionally mean that someone never attended a meeting and that has drastically different predictive value than putting any value in there between the people who have attended meetings. In this case, we might use a constant value instead. Maybe something very large would work here and still allow the feature to be useful in our model and results.
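For the days-since-last-meeting case, a constant sentinel is one line. The 9999 value is an arbitrary illustrative choice, picked only because it sits far outside the real range:

```python
import pandas as pd

# Blank "days since last meeting" likely means "never attended" -
# a mean or median would misrepresent that, so use a large sentinel.
days = pd.Series([12, 45, None, 3, None])
days_filled = days.fillna(9999)  # 9999 is an illustrative sentinel

print(days_filled.tolist())
```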
A constant value is a pretty easy fallback, but it requires you to pick the value yourself. A variation that is a bit of a compromise between the mean, median, and constant-value approaches is to use the most frequently occurring value (the mode). Rather than something in the middle like the mean or median, or some value you picked yourself like a constant, this still lets the data inform the choice, assuming that records generally share similar preferences when not otherwise stated.
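Most-frequent-value imputation is especially natural for categorical fields. A sketch with a hypothetical communication-preference field:

```python
import pandas as pd

# Hypothetical categorical preference field with blanks.
pref = pd.Series(["email", "email", None, "mail", "email", None])

# Fill blanks with the most common answer (the mode) - assuming
# that, unless stated otherwise, records share the majority preference.
pref_filled = pref.fillna(pref.mode()[0])

print(pref_filled.tolist())
```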
It feels like each of these methods requires a lot of context to make the right choice - and that's absolutely true. But there's good news here: with current tools we can pretty much try all of them! You need to understand the pitfalls so you don't produce fake results (just populating blanks with values you think will lead to something interesting is frowned upon), but once you're operating within good statistical practices, you can set up and train your models with each approach and compare to see which gives the best results.
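Trying every strategy and letting validation scores arbitrate is straightforward with scikit-learn. This is a sketch on synthetic data, not a recommended model for any particular association data set:

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Synthetic feature matrix with a binary target, then knock out
# roughly 20% of the values to simulate missing data.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + rng.normal(scale=0.5, size=200) > 0).astype(int)
X[rng.random(X.shape) < 0.2] = np.nan

# Cross-validate the same model under each imputation strategy.
results = {}
for strategy in ["mean", "median", "most_frequent"]:
    model = make_pipeline(SimpleImputer(strategy=strategy), LogisticRegression())
    results[strategy] = cross_val_score(model, X, y, cv=5).mean()

for strategy, score in results.items():
    print(f"{strategy}: {score:.3f}")
```

Because the imputer sits inside the pipeline, each cross-validation fold fits it only on training data, which is one of those "good statistical practices" that keeps the comparison honest.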
Once you have these basics spun up, you can start playing with more complex approaches, like adding new fields to identify when imputation has occurred. Maybe a very large constant for days since last meeting registration was ruining the predictive capabilities of that field on its own, but combining it with another field that is set to 1 whenever the first was blank fixes that. The model is now even smarter, since it knows when you were correcting for missing data.
As always, there is a bit of an art to this process and experience is very valuable. Working with similar data structures over and over will allow you to get to the most effective way to handle missing data quickly rather than having to go through all of this every time.
So, let's start with a confession: data science and machine learning are fascinating but beyond what I ran into tangentially with a background in computer science and economics, they never seemed relevant in my years of consulting with associations and membership organizations. While they were interesting, it always seemed like they weren't all that practical for day to day use or were overkill for the problems I wanted to help my customers solve. I was lucky to have someone as passionate and experienced in these areas as Thomas - the other co-founder of Tasio - to open my eyes as we got up and running, but it was very clear I had some homework to do!
The good news is that there are a ton of materials and opportunities to get up to speed; the bad news is there are a ton of materials available. Picking between the pile of highly reviewed books or websites offering introductions to data science looked like an overwhelming task. It was tempting to dive straight into my comfort area by hitting code first but Thomas pointed me towards a couple great books as a better conceptual starting point before I blazed past the details of what we're doing and why.
With his recommendations, this became a three-pronged approach:
So where to start reading? I had some exposure to the statistical techniques, less to the artificial intelligence, and basically no practical application of any of it so I wanted to get an idea of what is possible and how it is generally being used. Our goal isn't to create new models from scratch but instead to leverage proven approaches and technology in effective ways so a good review of what is out there along with why they might be leveraged to solve particular problems was the way to go. Thomas had a great first step recommendation: Data Smart by John W. Foreman.
This book uses Excel as a way to illustrate each of the examples and it's pretty amazing how far you can go with even that common tool. He provides a great background on statistical methods that have been around for a long time like some clustering techniques, regression, and forecasting but also mixes in some of the more recently developed techniques leveraged in machine learning like using decision trees and boosting models. Regression has been used for a very long time and still has a ton of value for what we're trying to do in predicting behavior based on large data sets but seeing practical implementations and examples of the machine learning algorithms really got me excited for the possibilities of what we could provide.
I've continued my reading with Real-World Machine Learning and that is more focused on actual implementations of these models along with the common pitfalls. After that? Well, I'm unlikely to get to the end of my reading list any time soon but will likely bounce around depending what we're focusing on with our customers.
Seeing these examples really made me enthusiastic to dig in to a project, but since the first book was using Excel the implementation process seemed... tedious. And despite the way Excel was able to illustrate these models, it still has plenty of limitations for data sets as large and complex as associations are able to gather. That started to make me a bit nervous, but luckily I was pairing my theoretical research with some practical implementation of these models.
If you are brand new to machine learning and want to learn how it can actually be implemented in real-world environments, I can't recommend the content at kaggle enough. The courses are fantastic starting points that get you up to speed on Python and the standard libraries associated with data analysis, but it's also a huge community of people focused on and competing to solve problems using machine learning. Working through their curriculum alongside the same topics I covered in my reading was fantastic to help understand what tools I should leverage for each problem and how.
Doing the programming without the theoretical background would have still given me a good idea of what is involved in these models, but it probably would have made me more dangerous - more likely to throw technology at any given problem even when there were better ways to approach them. Doing the reading without the programming might have given me the impression that this stuff is relatively hard to implement, though. The programming exercises showed just how much work has been done in these areas and made it clear that at this point, we're building on top of a deep and wide foundation as we pull together actionable strategies for our customers based on their data.
The last point on my list really was the key, though. In my case, I had Thomas to bounce ideas off of as I encountered new content. He was also able to gradually introduce me to new topics, and when he started walking me through some of the things we can do for our customers, I was able to see how they would work instead of dismissing them as magic or unbelievably expensive. There are communities all over the internet where similar people are happy to discuss and provide feedback as you go through your own journey. For programmers I'd refer you to the message boards associated with the site I mentioned above (kaggle) or reddit (r/machinelearning or r/learnmachinelearning). For anyone more on the business side trying to get a better grasp on what's possible or how best to communicate your needs, I'd recommend finding a group local to your area - there are meetups (virtual and in person) of these groups all around the world, and it's hard to beat a conversation with an enthusiastic expert!
So where did all of this leave me? Ultimately, while getting a deeper background in the models available and how to implement them was really important, all the different sources I've touched have ended with the same conclusion: technology can do remarkable things with your data but it still takes a lot of expertise to determine how to feed and leverage the technology properly. As I went through this process it was constantly reinforced how important it is to understand where you're applying these methods - the business, the data, and the people.
Given our experience working with associations for well over a decade, always seeking to understand all aspects of the business so we can implement the best solutions, we have built up significant expertise that we can leverage in our efforts to provide the most value to our customers. There are plenty of pitfalls and dangers in working with large data sets and they are extremely difficult to deal with if you don't know what you're looking at. Instead of spending days, weeks, or months trying to understand what might be important and how everything is related, we can move straight into the parts you're most interested in: actionable results pulled from your data.
What are organizations typically talking about when they discuss business intelligence? In my experience, they're generally referring to some way of reviewing and summarizing what they have collected - in various forms including databases, spreadsheets, and whatever else they use to track information - so people can readily understand what is. This is great! You can use this to help figure out what's happening right now, what you might need to pay more attention to, and even leverage it for some forecasting. Even without all that, I know I generally feel a warm glow as a really complex set of data produces a simple, easy to understand, and hopefully colorful chart.
However, when you start talking about what actions will be most effective in the future, this typical approach to business intelligence tends to require... creative interpretation. You can eyeball a good chart and - based on intuition, experience, third party consultants, or talking to a local fortune teller - declare what your priorities need to be to move that chart in a more positive direction the next time it gets run. But what do you do when you're dealing with so much complexity it's hard to get a good visual? What about finding gold in your data beyond what your experience plus instincts can determine?
Ultimately, BI is good at forecasting when expectations are pretty simple and general in nature. You probably expect meeting registration numbers to look pretty similar this year when compared to last year. You can even guess what the impact of major events might be and build that in, too! But are you able to answer really specific questions? Does an increase in e-learning consumption lead to a reduction in in-person meeting attendance? Are people more likely to renew if their renewal date falls near a particular deadline or event? Does that answer change based on another criterion, or even many others? Your data knows, but working with tools that are designed to tell you what is means you're going to have some work left to do after that initial rush of seeing those pretty charts and graphs has faded.
Good news, though! There are plenty of methods to take your information to that next step. Instead of just describing, all that information you've been working so hard to collect and keep clean can start doing the predictive work for you. Even beyond that, those predictions can start calling out patterns in a way that helps guide you toward a more effective approach to serving your constituents - whether that means getting better content into their hands, better access to their existing benefits, or identifying those who are likely to sever their connections to you so you can work with them closely on an individual basis.
Data analytics can leverage powerful, statistically proven models to help with this and more - maybe the biggest benefit is finding things you never would have looked for! Various approaches to optimization, forecasting, and clustering can be easy to implement and provide benefits very quickly, but then you can expand further into more complex areas that allow machine learning to do its thing and report back on details that can be nearly impossible to tease out yourself.
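Clustering is a good example of a method that can surface things you never would have looked for. This sketch groups members by made-up engagement features; the two segments fall out of the data without anyone defining them in advance:

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical engagement features per member:
# [events attended, courses taken]. Values are illustrative.
X = np.array([[1, 0], [2, 1], [1, 1], [8, 7], [9, 6], [7, 8]])

# Let k-means find two segments; nobody told it which members
# were "low engagement" vs "high engagement".
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

print(kmeans.labels_)
```

On real association data, picking the number of clusters and interpreting what each segment means is where the expertise comes in - but the mechanics really are this short.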
These topics can definitely feel intimidating if you don't have any background in data analytics, but we at Tasio are here to help! We will work with your organization to get all of your data together (and possibly bring in some external sources where they're useful) and run it through these models. The results will show us not just the business intelligence of what is, but also why and what we can expect next. From there, we will engage with your business users to identify concrete actions they can take to impact those outcomes, making sure to allocate limited resources like time and money where they will be most effective in furthering your missions.