Administrative Data Toolkit
Purpose
Administrative data are data that are routinely collected for operational purposes, such as administering public services, rather than for a specific research objective. They can be used effectively for evidence-based policymaking because they provide comprehensive, long-term information, more accurate research insights, and a faster turnaround of analyses and results. To this end, this toolkit is intended as a resource for enabling the effective and safe use of administrative data by strengthening administrative data systems and processes. The toolkit and its associated checklists are intended for central and state government stakeholders, such as departments and line ministries, that collect and digitally store administrative data.
In this toolkit, we center our processes and checklists on the life cycle of administrative data in a digital information system. A data life cycle is the sequence of stages that data go through in an information system, from initial generation or collection to archival or destruction.
This practical toolkit is an accompaniment to the white paper titled “Administrative Data for Research and Evidence-based Policy” which summarizes the advantages of using administrative data and discusses some of the best practices that can ensure better quality and usability of administrative data. It also highlights via case studies and examples how government institutions have adopted some of these best practices and developed effective and replicable solutions to use administrative data.
The following key sections of the toolkit may enable the readers to set up actionable processes and checklists while working with any administrative data:
1. Data Quality - The quality of data reflects the fitness of data for usage. It entails the following dimensions – relevance, accuracy, timeliness, accessibility, interpretability, and coherence[1].
2. Data Handling - Data handling ensures that data are stored, archived or disposed of in a safe and secure manner before, during, and after its usage.
3. Data Security - Data security is critical for protection of confidential data[2] and prevention of breach of information or its misuse towards unintended objectives.
4. Data Privacy - Maintaining data privacy protects personal or sensitive information while enabling data access by using techniques such as de-identification of data, masking, anonymisation or pseudonymisation[3].
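As an illustration of two of the privacy techniques named above, the following is a minimal sketch of pseudonymisation and masking, not a prescribed implementation. It assumes that a keyed hash (HMAC-SHA256) is an acceptable way to generate pseudonyms; the record, its field names, and the key are hypothetical, and a real key would need to be stored securely and separately from the data.

```python
import hashlib
import hmac


def pseudonymise(identifier: str, secret_key: bytes) -> str:
    """Replace a direct identifier with a keyed hash (HMAC-SHA256).

    The same identifier and key always give the same token, so records
    can still be linked, but the token cannot be reversed without the key.
    """
    digest = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


def mask(value: str, visible: int = 4) -> str:
    """Hide all but the last `visible` characters of a value."""
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]


# Hypothetical record and key, for illustration only.
record = {"national_id": "123456789012", "district": "North", "benefit_amount": 1500}
key = b"replace-with-a-securely-stored-key"

# Pseudonymised copy: the direct identifier is replaced by a token.
safe_record = {
    "person_token": pseudonymise(record["national_id"], key),
    "district": record["district"],
    "benefit_amount": record["benefit_amount"],
}

# Masked value, e.g. for display on a screen or in a printed report.
masked_id = mask(record["national_id"])  # "********9012"
```

Pseudonymised data can still be re-identified by anyone who holds the key, so the choice between masking, pseudonymisation, and full anonymisation depends on who will access the data and for what purpose.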
Stages of data life cycle
The themes of data standards defined above are not necessarily executed on administrative data in a sequential manner. Rather, they are interspersed across the stages of the data life cycle, and no data standard or theme (data quality, handling, security and privacy) functions independently at an individual stage. In practice, the themes overlap across stages and must be executed in parallel. The data life cycle generally comprises five phases: Collection, Storage, Usage, Sharing and Archiving. The overall efficacy of administrative data depends on how the four cross-cutting data standards are executed across each of these five phases.
The first phase of the data life cycle is the collection or capture of data. At this stage, it is important to document how the data are collected. Metadata standards should be uniformly applied and information on transformations of data should be well-documented.
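One way to keep collection details and transformations documented is to store them in a structured metadata record alongside the dataset. The sketch below is illustrative only: the class names, fields, and sample values are hypothetical and do not represent a prescribed metadata standard.

```python
from dataclasses import asdict, dataclass, field
from datetime import date


@dataclass
class Transformation:
    """One documented change applied to the dataset."""
    applied_on: date
    description: str


@dataclass
class DatasetMetadata:
    """Hypothetical metadata record describing how a dataset was collected."""
    name: str
    collecting_agency: str
    collection_method: str
    collection_start: date
    variables: dict[str, str]  # variable name -> plain-language definition
    transformations: list[Transformation] = field(default_factory=list)

    def log_transformation(self, applied_on: date, description: str) -> None:
        """Append a transformation so the processing history stays traceable."""
        self.transformations.append(Transformation(applied_on, description))


# Sample values are invented for illustration.
meta = DatasetMetadata(
    name="benefit_payments",
    collecting_agency="Example Department",
    collection_method="Entered by field offices at the time of payment",
    collection_start=date(2020, 4, 1),
    variables={"benefit_amount": "Amount paid per month, in local currency"},
)
meta.log_transformation(date(2021, 1, 15), "Removed duplicate payment rows")

# asdict() gives a plain dictionary that can be saved with the dataset.
meta_as_dict = asdict(meta)
```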
Storage is the second phase. Since administrative data contain identifiers such as personal information, socio-economic markers, or other sensitive information, it is essential that security and privacy measures are followed at this stage.
To be used effectively for statistical analysis or to inform forward-looking decision-making, administrative data must undergo processes, such as outlier detection and linking of datasets, that retain their quality and make them suitable for analysis. At the usage stage, ensuring the safe and appropriate use of data remains an important process consideration.
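As a small illustration of outlier detection, the sketch below flags values that fall outside the commonly used 1.5 x interquartile range (IQR) fences. This is one of several possible rules, chosen here only as an example; the function name and the sample payment values are hypothetical.

```python
import statistics


def flag_outliers(values: list[float], k: float = 1.5) -> list[float]:
    """Return the values lying outside the IQR fences.

    A value is flagged if it is below Q1 - k * IQR or above Q3 + k * IQR.
    """
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical monthly payment amounts; one entry looks like a data-entry error.
payments = [1200, 1250, 1300, 1280, 1220, 1310, 9800]
suspect = flag_outliers(payments)  # [9800]
```

A flagged value is a prompt for review against source records rather than grounds for automatic deletion, since an unusual value may still be correct.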
Inter-agency data sharing can enable new and innovative uses of data, both within and beyond administrative systems. It is important to share data in a way that protects privacy and confidentiality while making the data useful to inform decision-makers.
Data archiving is the stage at which data that are no longer actively used are cataloged for long-term retention. At this stage, it is important to have strong security and privacy measures in place.
When data agencies create high-quality policies and practices that govern the various phases of the administrative data life cycle, they can be confident they are on the right path to effectively and safely utilize administrative data to answer critical stakeholder questions and to inform decisions to support continuous improvement.