The Problem of Dark Data

A March New York Times article sounded warning bells for researchers: the scourge of dark data. Dark data doesn't refer to anything secret or illegal, but rather to data, developed by the government and other organizations, that is at risk of being lost. A more complete definition, often used in the corporate context, is "the information assets organizations collect, process and store during regular business activities, but generally fail to use for other purposes." Concern over losing data that could lead to new discoveries has centered especially on scientific data stored by agencies and other organizations. Much of this data is stored on government servers, and there is no legal obligation to keep it available. The Trump administration's proposed cuts to scientific research and agency funding have only increased the alarm felt by scientists and other researchers.

An additional problem is that dark data, by definition, is unknown. It can't be verified if it can't be found, even though we know it's there. Somewhere. Right now, data.gov is the central repository for government-created databases, but it relies on agencies to self-report and, by many researchers' estimates, holds only a fraction of the data the agencies create. The use of proprietary code, along with data.gov's practice of linking to websites that house the data instead of to the databases themselves, makes access even more difficult for researchers.

While there does not seem to be any federal legislation prohibiting the destruction or decentralization of these types of data, several non-profits have formed to keep this data from going dark by identifying and downloading data viewed as vulnerable to deletion.

To learn more about dark data, here are some resources to get you started:




Dark Web: Exploring and Data Mining the Dark Side of the Web, Hsinchun Chen
