"The data lifecycles collides with the system lifecycles. It’s a classic."
Let’s talk about the paradoxes of data: Data Lifecycle, Search and Data Catalog!
What a fantastic chat Ole Olesen-Bagneux and I had! Ole is writing his O’Reilly book Enterprise Data Catalog and writes the newsletter Symphony of Search. He brings in a new perspective from Library and Information Science and is a great advocate for transforming the way we think about data and search.
Ole has worked as a specialist, as a leader and as an architect, and has an academic background with a PhD in Information Science from the University of Copenhagen.
Here are my key takeaways:
- The data lifecycle was first described as the POSMAD lifecycle:
  - Plan - plan for creation
  - Obtain - acquire data
  - Store - store it in a system
  - Share - expose it and make it accessible
  - Maintain - curate data, keep it accurate
  - Apply - use the data
  - Dispose - archive or delete
- Store, Share and Apply are the stages where the business value is derived.
- The points where you get value from data are normally not the same as the points where we manage data.
- The work that national archives, for example, do in cataloguing data and readying it for research happens at the very last stage of the lifecycle. But the value resides much earlier in the lifecycle.
- Data-driven innovation, data-driven culture… What these terms actually mean is that we need to get better at utilizing the value inside data.
- Intangible assets hold the highest value - data is the key to value creation.
- One potential of a data catalog is to push high-level data management (DM) activities to earlier stages of the lifecycle.
- Catalogs are pushing inventory activities from the Dispose phase to the Store and Share phases of the lifecycle.
- There is a huge difference between the perspective of an IT system lifecycle and that of a data lifecycle.
- Data always resides in a system, and that system has its own lifecycle. These lifecycles do not match.
- If you do not maintain data in your systems, any potential data migration becomes exponentially more difficult. What do we migrate, what do we keep, what do we delete?
- The solution can be a Data Catalog and/or metadata repository with retention policies for data.
- The distinction between searching in and searching for data has become really important due to the rise of data science.
- When you search for data, you are looking at data sources with potential value to search in.
- Metadata is key in searching for data - that means we have to manage the metadata lifecycle as well.
- A data catalog is basically just a search engine.
- Data Catalogs rely more and more on the same technology components as search engines for the web, e.g. knowledge graphs.
- The key capability of data catalogs is a metadata overview of the data in your company.
- Data catalogs have an untapped potential to ensure data lifecycle management.