A Federated Data Infrastructure of Government Administrative Data

New scientific opportunities are emerging from increasingly effective data organization, access, and usage.  Many fields of study have been transformed by new tools and data infrastructure.  For example, the analysis of DNA sequence data has transformed how medical research is done.  Currently, however, most of these efforts are concentrated in the natural sciences, where data are generated by digital instruments (e.g., satellite and telescope data).  We need to push the frontier of the social sciences by doing the same with the digital data available about our society, so that we can gain fundamental insights into its many facets.  One of the key sources of information about all aspects of our society is government administrative data.  From birth to death, almost all of our activities leave footprints in various government data systems.  Birth, marriage, and death certificates are filed with the government, education records are kept by departments of public instruction, and traces of employment can be found in unemployment insurance (UI) wage data.  A well-integrated data system that encompasses much of this government data would hold the footprints of our society, our social genome.  The two main hurdles to building such a system and using it to transform the social sciences are (1) privacy concerns and the laws in place to protect individual confidentiality, and (2) the physiology of administrative data itself, which is fragmented, short-lived, limited in content, and of questionable reliability.

My research focuses on resolving these two barriers to building a federated data system of government administrative data. Once they are resolved, we can build a social genome center that could finally allow us to understand how current policies play out in our society and how to use information and knowledge to make better policies.