Over the past few weeks, at a number of events and speaking engagements, I’ve found myself talking about the multifaceted benefits of Data Profiling from the perspectives of:
- Complying with EU Data Protection regulations
- Ensuring Data Migrations actually succeed
- Enabling timely reporting of Regulatory risks
My mantra in these contexts seems to be distilling down to two bald statements:
- It’s the Information, Stupid.
- Profile early, profile often.
But what do I mean by “Data Profiling”? For the purposes of these conversations, I defined it as the analysis of the structure and content of a data set against pre-defined business rules and expectations. For example, we may want to know how many (or what percentage of) records in a data set are missing key data, how many contain inconsistencies, or how many are potential duplicates.
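To make that definition concrete, here is a minimal sketch in Python of the three checks just mentioned. Everything in it is an assumption made for illustration: the sample records, the field names, the `DIAL_PREFIX` lookup and the `profile` function are hypothetical, and a real profiling tool would let you define far richer rules than these.

```python
from collections import Counter

# Hypothetical sample data; field names and values are illustrative only.
records = [
    {"id": 1, "name": "Ann Byrne", "email": "ann@example.com", "country": "IE", "phone": "+353 1 555 0100"},
    {"id": 2, "name": "ann byrne ", "email": "ANN@example.com", "country": "IE", "phone": "+353 1 555 0100"},
    {"id": 3, "name": "Tom Daly", "email": "", "country": "IE", "phone": "+44 20 5555 0100"},
    {"id": 4, "name": "Una Kelly", "email": "una@example.com", "country": "UK", "phone": "+44 20 5555 0101"},
]

# Assumed business rule: a phone number should carry its country's dialling prefix.
DIAL_PREFIX = {"IE": "+353", "UK": "+44"}


def profile(rows, key_field="email"):
    """Count records that break each rule and express each count as a percentage."""
    total = len(rows)

    # Rule 1: key data is missing (empty or whitespace-only).
    missing = sum(1 for r in rows if not (r.get(key_field) or "").strip())

    # Rule 2: phone prefix is inconsistent with the country code.
    # Countries absent from DIAL_PREFIX are treated as consistent.
    inconsistent = sum(
        1 for r in rows
        if not r["phone"].startswith(DIAL_PREFIX.get(r["country"], ""))
    )

    # Rule 3: potential duplicates, judged here by normalised name alone.
    names = Counter(r["name"].strip().lower() for r in rows)
    duplicates = sum(count for count in names.values() if count > 1)

    counts = {
        "missing_key": missing,
        "inconsistent": inconsistent,
        "potential_duplicates": duplicates,
    }
    return {
        rule: {"count": n, "percent": round(100 * n / total, 1) if total else 0.0}
        for rule, n in counts.items()
    }


result = profile(records)
for rule, outcome in result.items():
    print(f"{rule}: {outcome['count']} records ({outcome['percent']}%)")
```

Matching duplicates on a normalised name is deliberately crude. It flags candidates for a human to review rather than proving that two records describe the same person.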
Why is this of benefit? While a journey of 1,000 miles starts with a single step, that journey must start from somewhere and be headed somewhere. The destination is encapsulated in the outcomes you expect from your business rules. These expectations are often defined by external factors such as Regulatory requirements (e.g. the need to keep information up to date under EU Data Protection principles, or the need to track bank accounts of minors in AML processes) or by the strategic objectives of the organisation. The profile, therefore, gives you a snapshot of your starting point: how close you are to (or how far you are from) your destination.
In my conversations, I advised people (none of whom were overly familiar with Information Quality principles or tools) that they should consider investing in a tool that allows them to build, edit, and maintain Data Profiling rules and run them automatically. Regular Information Quality geeks will probably guess that the next thing I told them was how the profile snapshots could provide a very clear dashboard of the State of Data in their organisations.
When we are embarking on our journey of 1,000 miles, it makes sense for us to regularly check our map against the landmarks to make sure we are heading in the right direction. The alternative is to meander down cul-de-sacs and dead-end trails, which equates in Information Management terms to wasted investment, scrap, and rework. So “profile early and profile often” seems to be a good philosophy to live by.
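Checking the map against the landmarks can be sketched as comparing successive profile snapshots against a target. In this rough Python illustration, the snapshot dates, the failure percentages, the `TARGET` value and the `check_course` function are all hypothetical, invented to show the idea rather than taken from any real profiling run or tool.

```python
from datetime import date

# Hypothetical snapshots: percentage of records failing a rule on each profiling run.
snapshots = [
    (date(2024, 1, 1), 12.0),
    (date(2024, 2, 1), 9.5),
    (date(2024, 3, 1), 10.5),
]

# Assumed destination: the failure rate the business rule is expected to reach.
TARGET = 2.0


def check_course(runs, target):
    """Report, for each run, the gap to the target and the trend since the previous run."""
    report = []
    previous = None
    for run_date, failure_pct in runs:
        if previous is None:
            trend = "baseline"
        elif failure_pct < previous:
            trend = "improving"
        elif failure_pct > previous:
            trend = "worsening"
        else:
            trend = "unchanged"
        report.append({
            "date": run_date,
            "gap": round(failure_pct - target, 2),
            "trend": trend,
        })
        previous = failure_pct
    return report


course = check_course(snapshots, TARGET)
for entry in course:
    print(f"{entry['date']}: {entry['gap']} points from target ({entry['trend']})")
```

A single early profile would have shown only the baseline. It is the repeated runs that reveal the third snapshot drifting away from the destination.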
By applying business rules that relate to your regulatory compliance, risk management, or data migration objectives, you can make Information Quality directly relevant to the goals of the organisation, increasing the likelihood of any changes you bring in becoming “part of the way things get done around here” rather than “yet another darned thing we have to do”. Quality for the sake of quality was a luxury even in the pre-recession period. In today’s economy it is more important than ever to demonstrate clear value.
And that is the real profundity of profiling. Without it you can’t actually know the true value of your Information Asset or determine whether your current course of action might turn your Asset into a Liability.
It’s the Information, Stupid. So Profile Early and Profile Often.