Bernard Marr’s article “Finding Value in all Data Types” is far more accurate than most of us realize. ‘Dark’ data is the massive amount of undocumented copy, emails, DMs and collected data that exists but is never analyzed. Consider how much information lives in your email alone, and you still have not grasped even a fraction of a percent of the untouched, valuable material that exists across all industries.
Unstructured data comprises about 85% of all data recorded annually. According to the IDC, the lack of consistent, efficient processes for handling dark data means that 90% of this unstructured data is never analyzed. It gets collected for compliance or security purposes, but usually amounts to little more than a precaution.
The sheer sprawl of data across different locations and systems means that businesses need to be intentional about developing strategies to analyze it and a plan for how it will be used. This is where Natural Language Processing (NLP) can play a key role in making unstructured data usable and readable. NLP works in tandem with artificial intelligence and machine learning to expedite data preparation, which took weeks or even months when it was a human task. The applications are wide-ranging and customizable: making sense of legal, health, scientific, and maintenance notes in contracts, regulatory documents, project schedules and PDFs, or scanning millions of pieces of social media content to locate similar and relevant material.
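To make the idea of locating similar and relevant material a little more concrete, here is a minimal sketch in Python. It is not how any particular product works; it simply assumes a classic TF-IDF weighting with cosine similarity, and the sample notes and function names (`tfidf_vectors`, `most_similar`) are made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector (a dict) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: in how many documents does each term appear?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    # Smoothed inverse document frequency (an assumed, common variant).
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: count * idf[t] for t, count in tf.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query_index, docs):
    """Rank every other document by similarity to docs[query_index]."""
    vectors = tfidf_vectors(docs)
    query = vectors[query_index]
    scored = [
        (cosine(query, vec), i)
        for i, vec in enumerate(vectors)
        if i != query_index
    ]
    scored.sort(reverse=True)
    return [(i, score) for score, i in scored]


# Hypothetical "dark data" snippets, invented for this example.
docs = [
    "Engine maintenance note: replaced the fuel pump after a pressure warning.",
    "Contract clause on payment terms and late fees.",
    "Maintenance log: fuel pump pressure warning, pump inspected and replaced.",
]

if __name__ == "__main__":
    for index, score in most_similar(0, docs):
        print(index, round(score, 3))
```

Run against the three sample notes, the second maintenance entry ranks above the contract clause, which shares no words with the query. Production NLP systems go well beyond word overlap, but the ranking step follows the same general shape.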
State-of-the-art NLP and AI tools like Plasticity visualize connections among words, phrases, sentences, and documents. Plasticity’s ability to recognize relationships and attributes helps machines “read” through terabytes of dark data at 6,000 sentences per second and make sense of information the way humans do, an exponential leap above existing data solutions. The creativity that developers invest in this kind of software pays off in the flexible, adaptive learning of NLP. Programming and data preparation require intensive creative effort, and the human-language connection behind NLP development is what allows us to have smarter, more complete solutions when we combine it with AI.
Consider the benefit of applying AI/NLP to electronic health records, legal documentation, fleet records, decisions around military budgets, and countless other areas. Not only are we performing true due diligence around our records and data; we are also drilling a hole into the dark and casting a light on information that can give us more complete histories. Now we can have the timeline and the material to understand why decisions were made, and learn the full story of who was in the room and what facts were known at the time.
We also need to recognize the value of AI/ML/NLP in its application for public safety and security. The cyber world is a modern battleground, where ordinary citizens and our government are being targeted and infiltrated. The DoD and Intelligence Community are actively embracing AI to support their efforts to crawl and scrape social media and the Internet at large in search of bad actors, terrorist cells and other threats to our government.
While many are taking the first important steps toward bringing dark data into the light, the path forward must also include intelligent solutions like NLP if we want to achieve actual breakthroughs and bring the speed and machine comprehension needed to harness dark data.