Data: the final frontier. These are the voyages of big data and data analytics. Their mission: to explore strange new worlds, to find new ways to live and new ways to run civilizations, and to boldly go where no man has gone before. In many cases, this means exploring dark data. What is it? What can you use it for? What dangers lurk within it? Let’s set out on an exploration of the dark data hiding in the recesses of your systems.
Defining Dark Data
Dark data probably represents about 80 percent of all big data. It is data that organizations collect more or less unintentionally. It goes unprocessed and unanalyzed, partly because nobody really thinks about it and partly because it’s largely made up of unstructured data. Unstructured data doesn’t play nicely in the sandbox of the traditional database. Examples of dark data include:
- Documents (text documents, PowerPoint presentations, Excel or other spreadsheets, etc.) that are created in the ordinary course of conducting business
- Log files
- Employee records for workers who have left the company
- Raw survey data and other information gathered from customers
- Past financial statements and account information
- Historical and current email correspondence
- Project notes and early drafts
- Past marketing and advertising campaigns and related documents and information
- Information that companies typically hold on to but don’t actually do anything with
The Potential for Dark Data
Dark data usually resides in what techies call ‘data silos’. These are data reservoirs that are kept separate from, and can’t be reached through, the organization’s collective or shared databases. For example, some of it is likely stored on local hard drives. Other dark data might exist in print form only, never converted to a digital format. Much of your dark data isn’t even known outside of a single department or team, and sometimes not beyond a single employee. Yet it still holds promise and value. It holds information about your customers, your past products (plus the lessons you learned through developing, producing, marketing, and supporting those products), past failures (you need to know how to avoid these), past successes (you need to know how to reproduce these), and much more.
The Dangers of Dark Data
While all that dark data is just sitting around, delivering no benefit to your organization, it actually poses a risk. Much of your dark data likely contains regulated data, but it isn’t being handled as regulations require, because you don’t even know it’s there. Some of it could also be valuable in litigation, but unless it’s discovered and properly logged, it can’t help you. Dark data might also contain important or sensitive intellectual property, or information that could damage your brand’s reputation. But the most significant risk is the opportunities your organization gives up by neglecting all that dark data.
Accessing and utilizing dark data requires an offloading process, a data store capable of handling unstructured data (such as Hadoop, a NoSQL database, or a data lake), and the proper analytical solutions to make use of it all.
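Before anything can be offloaded, it has to be found. As a rough illustration of that first discovery step, here is a minimal Python sketch that walks a folder and lists the files nobody has modified in a long time, then totals them by file type. The function names (inventory_stale_files, summarize_by_extension), the one-year threshold, and the idea of using "not modified recently" as a stand-in for "dark" are all assumptions made for this example, not an established method or part of any product.

```python
import os
import time
from collections import Counter
from pathlib import Path


def inventory_stale_files(root, min_age_days=365, now=None):
    """Return one record per file under `root` not modified in `min_age_days`.

    Treating "old and untouched" as a proxy for dark data is an assumption
    made for illustration only.
    """
    now = time.time() if now is None else now
    cutoff = now - min_age_days * 86400  # seconds in a day
    records = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                stat = path.stat()
            except OSError:
                continue  # unreadable or vanished file; skip it
            if stat.st_mtime <= cutoff:
                records.append({
                    "path": str(path),
                    "extension": path.suffix.lower() or "(none)",
                    "size_bytes": stat.st_size,
                    "modified": stat.st_mtime,
                })
    return records


def summarize_by_extension(records):
    """Count stale files and total their bytes per file extension."""
    counts = Counter()
    sizes = Counter()
    for rec in records:
        counts[rec["extension"]] += 1
        sizes[rec["extension"]] += rec["size_bytes"]
    return {ext: {"files": counts[ext], "bytes": sizes[ext]} for ext in counts}
```

Pointing inventory_stale_files at a shared drive or a local folder and passing the result to summarize_by_extension gives a first picture of how many old documents, spreadsheets, and log files are sitting there. An inventory like this only locates candidates; deciding what to move into a data store and how to analyze it is a separate step.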