A March New York Times article sounded warning bells for
researchers about the scourge of dark data. Dark data doesn’t refer to anything
secret or illegal, but rather to data developed by the government and other
organizations that is at risk of being lost. A more complete definition, often used in the
corporate context, is “the information assets organizations collect, process and store during
regular business activities, but generally fail to use for other purposes.” Concern
over losing data that could lead to new discoveries has centered especially
on scientific data stored by agencies and other
organizations. Much of this data is stored on government servers, with no legal
requirement that it remain available. The Trump administration’s proposed cuts to
scientific research and agency funding have only increased the alarm felt by
scientists and other researchers.
An additional problem is that dark data, by
definition, is unknown. It can’t be verified if it can’t be found, even though
we know it’s there. Somewhere. Right now, data.gov
is the central repository for government-created databases, but it relies on
agencies to self-report and, by many researchers’ estimates, holds only a fraction
of the data created by the agencies. The use of proprietary code and data.gov’s practice of linking to data housed
on websites, instead of to the databases themselves, make access even more difficult
for researchers.
While there does not seem to be any federal
legislation prohibiting the destruction or decentralization of these types of
data, several non-profits have formed to save this data from going dark by identifying
and downloading data viewed as vulnerable to deletion.
To learn more about dark data, here are some
resources to get you started:
Politics: Turbulence Ahead (Nature)
A Decade of Discovery (podcast)