It’s always easiest to underestimate the level of effort for things you don’t know how to do yourself. We’ve all asked questions like “how hard could it be?” without having any real clue what the answer is.
This phenomenon drives the “breathlessness” factor in AI and ML. Some vendors make claims like “Just give us your data. We’ll drop it into our ML box, and out will come insights!” As we all know, these “insights” are unlikely to inform decisions in any meaningful way.
The title of this post may not fully apply to you, but I’d bet a lot of money that you’re not ready to take full advantage of analytics. Below are three of the likely reasons.
You don’t have enough valuable data, or don’t understand the value inherent in your data
Not every enterprise is bursting with data, though many are. Very often, historical data was collected for record-keeping rather than for analysis. Some of those data sets are valuable; some aren’t. You may also have valuable data but no clear way to extract value from it. That second problem, not understanding the value in your data, often results from the first, or from any of the issues below.
In these cases, creating thoughtful, purpose-built data sets specifically intended for analysis, and structuring existing data to support that, are prerequisites for getting insights and value from data.
You don’t have the right people to get the most business value from your data
Here, “people” applies to both analytics/data science teams and decision-makers. We’ve seen several situations with Clients where the right analysis team is working against the wrong business questions (or none at all) and will likely never be able to provide value.
We’ve also seen analytics teams work hard to generate truly actionable and valuable insights that ultimately fall on deaf ears. Both situations are a little heartbreaking, because in each case the enterprise is so close to lasting success.
You’re missing data quality and/or reasonable privacy/governance programs
Garbage in, … well, you know what comes out. It’s surprising how much legacy and even current incoming data is missing or clearly wrong, or, worst of all, has been manipulated somehow before landing in your database without your knowledge. Data quality tools and programs exist to address needs ranging from lightweight (for much of the world’s data) to comprehensive (for some types of data protected by Federal regulations, such as healthcare claims).
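As a rough illustration of the lightweight end of that range, the Python sketch below flags records with missing required fields or obviously wrong values. The record layout, the field names, and the two validity rules (no future service dates, no negative paid amounts) are hypothetical assumptions; real rules depend on your data. Checks like these catch only the obvious problems, not data that was silently manipulated upstream.

```python
from datetime import date

# Hypothetical claim records; field names and values are illustrative only.
claims = [
    {"claim_id": "C1", "member_id": "M1", "service_date": date(2023, 3, 1), "paid_amount": 120.0},
    {"claim_id": "C2", "member_id": None, "service_date": date(2023, 3, 5), "paid_amount": 80.0},
    {"claim_id": "C3", "member_id": "M2", "service_date": date(2999, 1, 1), "paid_amount": -15.0},
]

# Assumed set of fields every record must populate.
REQUIRED_FIELDS = ("claim_id", "member_id", "service_date", "paid_amount")


def quality_issues(record, today):
    """Return a list of human-readable problems found in one record."""
    issues = []
    # Missing data: any required field that is absent or None.
    for field in REQUIRED_FIELDS:
        if record.get(field) is None:
            issues.append(f"missing {field}")
    # Clearly wrong data: hypothetical sanity rules.
    service_date = record.get("service_date")
    if service_date is not None and service_date > today:
        issues.append("service_date is in the future")
    paid = record.get("paid_amount")
    if paid is not None and paid < 0:
        issues.append("negative paid_amount")
    return issues


def quality_report(records, today):
    """Map each problem record's claim_id to its issues; clean records are omitted."""
    report = {}
    for record in records:
        issues = quality_issues(record, today)
        if issues:
            report[record.get("claim_id")] = issues
    return report


report = quality_report(claims, today=date(2024, 1, 1))
for claim_id, issues in report.items():
    print(claim_id, issues)
```

A report like this is a starting point for a conversation with the data’s owners about where the bad values come from, rather than a fix in itself.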
A more vexing issue is overly strict privacy policies at the enterprise level. No one questions the clear value of keeping data private when it should be. Federal and state law, public policy, common sense, and ethics all argue for keeping private data private. Privacy policies must be clear, communicated often, and enforced. That said, we’ve also seen organizations with draconian policies that hobble the use of data to gain insight and drive value. In these organizations, there’s often a feeling that it’s better to err on the conservative side.
I’d agree with that sentiment, but I’d argue that it’s better not to err on either side. Instead, establish a policy of carefully considering each request for data access on its merits, balancing the legitimate need to keep certain data private against the tremendous value the data could provide to the business.
Techniques like blinding, sampling, masking, alteration, and synthesis can support full compliance with privacy requirements such as HIPAA while also enabling robust analytics.
- In blinding, unique identifiers (such as member IDs and claim IDs in healthcare data) are masked, possibly by replacing each with a different random integer in a “crosswalk” approach, where the crosswalk (the mapping from real to substitute IDs) is not shared with the analyst.
- In some cases, it may be necessary to omit identifying characteristics from the data set.
- Date fields can be randomly shifted (obfuscated) in a way that preserves the relative intervals between dates.
- Selected fields can be synthesized by replacing each row’s value with one drawn from the field’s overall distribution using statistical techniques. This approach doesn’t suit every field (it will run, but the results may not plausibly mimic actual data), so it must be applied judiciously.
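To make a few of these techniques more concrete, here is a minimal Python sketch of crosswalk blinding, per-member date shifting, and distribution-based synthesis. The field names, sample rows, the ±180-day shift window, and the choice of a normal distribution for synthesis are all illustrative assumptions; this is a sketch of the ideas, not a compliance recipe.

```python
import random
import statistics
from datetime import date, timedelta


def build_crosswalk(ids, rng):
    """Blinding: map each distinct identifier to a unique random integer.

    The returned dict is the crosswalk; it stays with the data steward and is
    never shared with the analyst.
    """
    distinct = sorted(set(ids))
    # Sampling without replacement guarantees the surrogate integers are unique.
    surrogates = rng.sample(range(1, 10 * len(distinct) + 1), len(distinct))
    return dict(zip(distinct, surrogates))


def shift_dates(rows, rng, max_days=180):
    """Shift every date for a given member by that member's single random offset.

    Using one offset per member hides the real dates while preserving the
    intervals between that member's events. max_days is an assumed window.
    """
    offsets = {}
    shifted = []
    for row in rows:
        member = row["member_id"]
        if member not in offsets:
            offsets[member] = timedelta(days=rng.randint(-max_days, max_days))
        shifted.append({**row, "service_date": row["service_date"] + offsets[member]})
    return shifted


def synthesize_numeric(values, rng):
    """Synthesis: replace each value with a draw from a normal fit to the column.

    The normal fit is an illustrative assumption and needs at least two values;
    it can yield implausible results (e.g. negative amounts) for skewed fields.
    """
    mu = statistics.mean(values)
    sigma = statistics.stdev(values)
    return [rng.gauss(mu, sigma) for _ in values]


rng = random.Random(42)  # fixed seed only so the sketch is reproducible
rows = [  # hypothetical claim rows
    {"member_id": "A100", "service_date": date(2023, 1, 10), "paid_amount": 120.0},
    {"member_id": "A100", "service_date": date(2023, 2, 9), "paid_amount": 45.0},
    {"member_id": "B200", "service_date": date(2023, 1, 15), "paid_amount": 300.0},
]

crosswalk = build_crosswalk([r["member_id"] for r in rows], rng)
blinded = [{**r, "member_id": crosswalk[r["member_id"]]} for r in rows]
shifted = shift_dates(blinded, rng)
synthetic_paid = synthesize_numeric([r["paid_amount"] for r in shifted], rng)
released = [{**r, "paid_amount": round(p, 2)} for r, p in zip(shifted, synthetic_paid)]

for row in released:
    print(row)
```

Because the synthesis step here fits a normal distribution to a small, skewed column, it can produce negative paid amounts, which is exactly the kind of nonsensical result that calls for applying synthesis field by field and with care.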
We can help!
Trexin has encountered all of these issues across multiple Clients and has seen firsthand what works and what doesn’t. We can help you think through and execute a strategy to address each of these issues (plus three more discussed in my upcoming post, Six Reasons You’re Not Ready for Analytics, Part 2). If you’ve seen any of these three issues in your travels, I’d like to hear your opinion in the comments.