
The Problem of Dark Data

A March New York Times article sounded warning bells for researchers about the scourge of dark data. Dark data doesn’t refer to anything secret or illegal, but rather to data developed by the government and other organizations that is subject to loss. A more complete definition, often used in the corporate context, is “the information assets organizations collect, process and store during regular business activities, but generally fail to use for other purposes.” Concern over the loss of data that could lead to new discoveries has centered especially on scientific data stored by agencies and other organizations. Much of this data is stored on government servers, with no legal obligation that it remain available. The Trump administration’s proposed cuts to scientific research and agency funding have only increased the alarm felt by scientists and other researchers.

An additional problem is that dark data, by definition, is unknown. It can’t be verified if it can’t be found, even though we know it’s there. Somewhere. Right now, data.gov is the central repository for government-created databases, but it relies on agencies to self-report and, by many researchers’ estimates, holds only a fraction of the data created by the agencies. The use of proprietary code and data.gov’s practice of linking to data housed on websites, instead of the databases themselves, make access even more difficult for researchers.

While there does not seem to be any federal legislation prohibiting the destruction or decentralization of these types of data, several non-profits have formed to save this data from going dark by identifying and downloading data viewed as vulnerable to deletion.

To learn more about dark data, here are some resources to get you started:




Dark Web: Exploring and Data Mining the Dark Side of the Web, Hsinchun Chen

