"Nota Bene" means "note this well" or "take particular notice." We at the O'Quinn Law Library will be posting tips on legal research techniques and resources, developments in the world of legal information, happenings at the Law Library, and legal news reports that deserve your particular attention. We look forward to sharing our thoughts and findings and to hearing from you.

N.B.: Make a note to visit "Nota Bene" regularly.

-Spencer L. Simons, former Director, O'Quinn Law Library and Associate Professor of Law



Friday, May 19, 2017

LOC Makes 25 Million Catalog Records Available for Bulk Download


Earlier this week, the Library of Congress announced that it was making over 25 million of its catalog records available for free bulk download. These records will be available at data.gov and on the Library of Congress website at http://www.loc.gov/cds/products/marcDist.php. Previously, these records were available only individually or by subscription. This new free service from the LOC will be an invaluable resource for anyone doing bibliographic research.

The records are in MARC (Machine-Readable Cataloging) format, the international standard for bibliographic data. To learn more about MARC records, see this tutorial on the LOC website.

Thursday, May 11, 2017

The Problem of Dark Data

A March New York Times article sounded warning bells for researchers about the scourge of dark data. Dark data doesn’t refer to anything secret or illegal, but rather to data developed by the government and other organizations that is at risk of being lost. A more complete definition, often used in the corporate context, is “the information assets organizations collect, process and store during regular business activities, but generally fail to use for other purposes.” Concern over the loss of data that could lead to new discoveries has centered on scientific data stored by agencies and other organizations. Much of this data is stored on government servers, with no legal obligation that it remain available. The Trump administration’s proposed cuts to scientific research and agency funding have only increased the alarm felt by scientists and other researchers.

An additional problem is that dark data, by definition, is unknown. It can’t be verified if it can’t be found, even though we know it’s there. Somewhere. Right now, data.gov is the central repository for government-created databases, but it relies on agencies to self-report and, by many researchers’ estimates, holds only a fraction of the data created by the agencies. The use of proprietary code and data.gov’s practice of linking to data housed on websites, instead of to the databases themselves, make matters even more difficult for researchers.

While there does not seem to be any federal legislation prohibiting the destruction or decentralization of these types of data, several non-profits have formed to save this data from going dark by identifying and downloading data viewed as vulnerable to deletion.

To learn more about dark data, here are some resources to get you started:




Dark Web: Exploring and Data Mining the Dark Side of the Web, Hsinchun Chen