Just in the past few weeks, I’ve spoken with several customers about their Data Graveyard – the place where data goes to die and is never seen again.
That’s right, they have systems in place that seem to do what they need, but they can’t get any data out of them. Here we are in the year 2016, and we still focus our entire implementations on making sure the systems collect every bit of data and spend no time on how to get that data out. We used to call LIMS the Data Graveyard but, with the proliferation of acronyms, we now find plenty of people with ELNs and other types of Data Graveyards, as well.
I also heard someone talking about “unstructured data,” and this is where I think we’re still having a problem. I will go so far as to say this: When you put unstructured data into your system, you are building a Data Graveyard.
You have also probably heard the phrase “garbage in, garbage out” to indicate that you only get out of your system what you put into it.
For those of you doing research who are reading this, some of you are possibly getting a bit steamed up by what I said and thinking that you need your data to be unstructured so that you can do your work without restrictions.
Let me correct you on this – what you need isn’t “unstructured” but “flexible,” and here’s one place we still have a problem. If you start tossing unstructured data into a system, you’re not going to be able to find it. If nothing distinguishes one piece of data from another, a search has nothing to work with.
Think of it this way – at home, when you throw lots of things into your garbage can, suppose you realize you accidentally threw away something important – so important that you need to retrieve it. Possibly, you stick your arm in the can, aimlessly rooting around for what you need. Or, maybe you dump the can onto some newspaper to pick through everything. Regardless, the entire task is a nasty one where you’re pawing through everything to find what you need. That’s “unstructured.”
The bottom line is this: if you don’t find some way to help categorize data, possibly with some keyword, and if you make no effort at all to give some structure to finding that data, you won’t find it. No one I know goes back in to label things after the fact. And, like the garbage can, there will be a LOT in there to paw through – finding anything won’t be easy.
So, please, force every implementer to make your process flexible but still give your data enough structure that you can find it when you need it.
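To make the “flexible but structured” idea concrete, here is a minimal sketch in Python. It is only an illustration, not how any particular LIMS or ELN works: the `Entry` class, its `project` and `keywords` fields, and the `find` function are all hypothetical names invented for this example. The free-text `body` stays unrestricted, while a small amount of structure captured at entry time is what makes the data findable later.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """A hypothetical notebook entry: free-form text plus a little structure."""
    body: str                      # the flexible part: anything the researcher writes
    project: str                   # structure captured when the entry is created
    keywords: set[str] = field(default_factory=set)

    def __post_init__(self):
        # Store keywords in lowercase so searches are case-insensitive.
        self.keywords = {k.lower() for k in self.keywords}


def find(entries, *, project=None, keyword=None):
    """Return the entries that match the given project and/or keyword."""
    results = []
    for entry in entries:
        if project is not None and entry.project != project:
            continue
        if keyword is not None and keyword.lower() not in entry.keywords:
            continue
        results.append(entry)
    return results


# Made-up sample data, purely for illustration.
entries = [
    Entry("Ran assay at 37C, odd peak at 4.2 min.", "stability", {"HPLC", "assay"}),
    Entry("Buffer prep notes, pH drifted overnight.", "stability", {"buffer"}),
    Entry("Vendor call about column lot.", "method-dev", {"hplc"}),
]

hits = find(entries, project="stability", keyword="hplc")
for entry in hits:
    print(entry.body)
```

Without the `project` and `keywords` fields, the only option would be to read through every `body` by hand, which is the garbage-can search described above.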
The Bottom Line
If customers don’t force the implementers of software to do a better job of making it possible to get data back out, they just won’t do it. As for the software vendors who think that isn’t their job, they’re wrong. It’s their job to implement the system not just to manage the process but to allow data to be retrieved from it. It’s just that it’s hard and no one wants to do it – sometimes neither the customer nor the vendor.
And to the people who claim that they can’t do this because they don’t know what they’ll want in the future, I say that that’s not true. It’s hard but not impossible.
But here we are in 2016, still talking about the same issues we’ve talked about for years and years. Will there be a change? I don’t know the answer to that.
If you’re interested in reading more regarding the Data Graveyard, here are some past posts and articles:
LIMS: The Data Graveyard
LIMS – The Data Graveyard II
Laboratory Informatics Silos and Data Graveyards