The Problem of Dark Data

A March New York Times article sounded warning bells for researchers about the scourge of dark data. Dark data doesn’t refer to anything secret or illegal, but rather to data developed by the government and other organizations that is at risk of being lost. A more complete definition, often used in the corporate context, is “the information assets organizations collect, process and store during regular business activities, but generally fail to use for other purposes.” Concern over losing data that could lead to new discoveries has centered especially on scientific data stored by agencies and other organizations. Much of this data is stored on government servers, with no legal obligation that it remain available. The Trump administration’s proposed cuts to scientific research and agency funding have only increased the alarm felt by scientists and other researchers.

An additional problem is that dark data, by definition, is unknown. It can’t be verified if it can’t be found, even though we know it’s there. Somewhere. Right now, data.gov is the central repository for government-created databases, but it relies on agencies to self-report and, by many researchers’ estimates, holds only a fraction of the data the agencies create. The use of proprietary code and data.gov’s practice of linking to data housed on websites, instead of to the databases themselves, make the data even more difficult for researchers to reach.

While there does not seem to be any federal legislation prohibiting the destruction or decentralization of these types of data, several non-profits have formed to keep this data from going dark by identifying and downloading data viewed as vulnerable to deletion.

To learn more about dark data, here are some resources to get you started:

Dark Web: Exploring and Data Mining the Dark Side of the Web, Hsinchun Chen

Popular posts from this blog

Legal Research AI Gains Venture Capital

The legal research company Casetext has announced that it has acquired $12 million in venture capital to expand on its CARA ("Case Analysis Research Assistant") AI software, a virtual research assistant currently capable of scanning a legal brief and retrieving cases relevant to but not cited in the brief.

CARA is not alone in the world of legal AIs. When it was created last year, it joined the ranks of AIs including ROSS, an IBM Watson-based legal research AI; DoNotPay, a website founded in 2015 to automate the preparation of parking ticket appeals; and an amateur AI judge capable of predicting European Court of Human Rights decisions with 79% accuracy.

The Congressional Report on the Executive Authority to Exclude Aliens Released Days Before Immigration Ban

On January 27, President Donald Trump signed an Executive Order, Protecting the Nation from Foreign Terrorist Entry Into the United States. Four days earlier, on January 24, the Congressional Research Service released its own report: Executive Authority to Exclude Aliens: In Brief.
For those unfamiliar, the Congressional Research Service (CRS) is a federal legislative branch agency, housed inside the Library of Congress, charged with providing the United States Congress with non-partisan advice on issues that may come before it, including immigration.
Included in the report are in-depth discussions on the operation of sections of the Immigration and Nationality Act (INA) in the context of the executive power. Discussions of sections 212(f), 214(a)(1) and 215(a)(1) report on how the sections have been used by Presidents, along with relevant case law and precedents. Most interesting is the list of executive orders excluding some groups of aliens during past presidencies; the table all…