Monthly Archives: January 2012

SOPA, PIPA, and the Legal Content Industry

In honor of today’s web strike against two bills pending in the U.S. Congress, namely the Stop Online Piracy Act (“SOPA”) in the House and the Protect IP Act (“PIPA”) in the Senate, I thought I would write a little about how those bills would endanger the legal content provider market in particular. The uninformed could be forgiven for thinking these bills would not affect the publishing of legal information, because much of that information comes from the government and is not copyrighted, but they would be very wrong.

First, there are numerous situations in which the public domain status of government documents is not clear. Examples include Arkansas’ publishing its laws electronically only through a commercial publisher; claims of copyright made by creators of model laws; the copyrightability of foreign laws, pagination, and digitization; copyrighted material republished by the government (e.g., in an appendix to a decision); documents produced by government contractors; and many similar issues. One of the major problems with bills like SOPA and PIPA is that they assume copyright status is obvious. Someone may claim a copyright over something, but the claim alone does not resolve these open questions and does not even begin to deal with the situation in which an accused infringer claims fair use. These problems have always plagued copyright law, but they become devastating when combined with remedies that do not allow due process or that impose liability for linking to infringing material. In short, even information most people assume to be safe from infringement claims (state codes, bills, court opinions, etc.) would not be safe.

Paper Addresses the Evolution of Legal Language, Legal Memetics

I finally got around to reading Legal N-Grams? A Simple Approach to Track the ‘Evolution’ of Legal Language, which I mentioned in an earlier post about the launch of LegalLanguageExplorer.com, and I think the authors raise a number of interesting issues. The focus of the research and the related Legal Language Explorer tool seems to be the practical application of legal memetics, i.e., the study of the evolution and adoption of language and concepts in the legal culture. The authors cite, as an example, the development of Justice Holmes’ famous phrase “Clear and Present Danger” from Schenck v. United States, 249 U.S. 47 (1919).
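
To make the idea concrete, here is a minimal sketch of the kind of counting an n-gram tracker performs: how often a phrase appears in opinions from each year. It is not the authors' implementation. The function names and the (year, text) input format are my own assumptions, and the sample opinions are invented toy strings, not real court text.

```python
import re
from collections import Counter


def ngrams(text, n):
    """Return every run of n consecutive words in text, lowercased."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def phrase_counts_by_year(opinions, phrase):
    """Count occurrences of phrase per year.

    opinions is an iterable of (year, text) pairs (a hypothetical format).
    Years in which the phrase never appears are kept with a count of 0.
    """
    target = " ".join(re.findall(r"[a-z']+", phrase.lower()))
    n = len(target.split())
    counts = Counter()
    for year, text in opinions:
        counts[year] += ngrams(text, n).count(target)
    return dict(sorted(counts.items()))


# Invented toy data for illustration only.
toy_opinions = [
    (1919, "Words that create a clear and present danger."),
    (1925, "No clear and present danger; the clear and present danger test."),
    (1930, "Nothing relevant here."),
]
print(phrase_counts_by_year(toy_opinions, "Clear and Present Danger"))
```

A real tool would also normalize by the total number of n-grams per year, since the volume of opinions changes over time.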

The Legal Language Explorer appears to be the authors’ first step toward a more comprehensive system for the empirical study of legal memetics. The most interesting part of the paper was the authors’ speculation on the possible future directions in which this research could proceed. It seems to me there are two items that should be at the top of the authors’ list (one easy, one not so much): including data on judicial authorship and including data from secondary law sources.

Authorship: The authors specifically mention that judicial authorship data is available through the Supreme Court Database (at least for the period after the start of the Vinson Court). Integrating data on which judge authored particular opinions would more accurately relate legal concepts and language to the person actually using them. By the authors’ own description, the evolution of legal terminology and concepts is tied to jurists, not to the issuing body behind the document.

Secondary sources: I see one of the major problems of legal memetics as the fact that much of the cultural development of particular concepts and terms occurs in materials protected by intellectual property rights. Law journals, law reviews, treatises, and model laws are just a few examples. Next to court opinions, these sources are the places where legal memes are born or evolve. For example, how can you study the development of concepts and phrases related to product liability without tracking usage in the Restatement of Torts, Second (a model law published by a private company)? Obviously, the authors may not be able to offer a solution to this problem, but I would have liked the paper to acknowledge the exclusion of such sources and the impact of using only court opinions to study legal memetics.

The Competency of the Legal Information Industry at the “Entity Web”

Matthew Hurst (@ Data Mining) recently posted about the concept of the “entity web” to describe companies involved in web-based information retrieval that are evolving into more than search engines for retrieving textual documents. Hurst speculates about the corporate skill set that will be needed to deliver on this concept, which he terms the three competencies: understanding (1) the Web (e.g., HTML, CSS, AJAX, and other web technologies); (2) the world (i.e., the real-world relationships between data points, such as that a song has an artist); and (3) Web presence (e.g., how entities appear and interact on the web). Of course, competencies (2) and (3) include the ability to record and use this knowledge in some structured model. I characterize what Hurst is discussing as integrating semantic data into existing textual search services. I also think the term “entity” is a bit limited because it implies the data is focused only on the actors (people, organizations, websites, document sources, etc.) when the user’s information needs may not be focused on entities at all (e.g., asking a system how photosynthesis functions or the answer to 1 + 1). Whatever you label it, Hurst is right about the direction in which we seem to be headed, and when you think about how the traditional legal information industry measures up on these competencies, things do not look very good.

the Web.— Hurst comments that this is an area the broad market players (Google, Facebook, etc.) have largely mastered (but have room to improve). On the legal side, I would say that large legal publishers have suffered from many of the same problems as other older companies when it comes to embracing web technologies: namely, they tend to lag too far behind in adopting the newest ones. They also have a hard time building institutional knowledge in this area because they often outsource this type of work to vendors and let some departments have too much influence (e.g., marketing and communication, public relations). Overall, I would say the legal information industry is obviously not as competent as the big tech companies in this area, but it generally does well with deploying established web technologies and is on par with other older companies when it comes to adopting the newest technologies.

the World.— This is probably the area in which the traditional legal information industry is the most competent, but even here I think there are many reasons to be worried about the future. There is a high degree of competency in this area because traditional legal publishers have spent a long time developing institutional knowledge related to all the intricacies of government data and distribution. Other than perhaps law libraries, there are very few places that foster this kind of knowledge. I think this institutional knowledge is, however, at risk because many legal publishers have increasingly outsourced or automated the very functions that gave rise to this knowledge-building.

Web Presence.— This is probably the area in which traditional legal publishers are the weakest. In the legal field, a complete understanding of web presence would involve how all the various actors interact on the web (e.g., legislators, courts, state and federal agencies, lawyers, etc.). Although traditional legal publishers are most familiar with official entities involved in issuing documents (legislatures, courts, etc.), they are much less familiar with entities that discuss or debate the legal content (blogs/blawgs, social networking sites, law firms, political and legal discussions by non-professionals, etc.). A future entity web information retrieval system might need to track these sources to know that ‘Obamacare’ refers to the Affordable Care Act or that, while a particular judge has not ruled on an issue, his wife belongs to a group on Facebook against the issue.
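
As a rough illustration of the ‘Obamacare’ example, the simplest form of this knowledge is an alias table that maps informal names to a canonical entity. This is only a sketch under my own assumptions: the table contents, the name ALIASES, and the function canonical_entity are hypothetical, and a real system would have to learn such mappings from the web sources described above rather than hard-code them.

```python
# Hypothetical alias table: informal or abbreviated names mapped to one
# canonical entity name. Keys are stored lowercased.
ALIASES = {
    "obamacare": "Affordable Care Act",
    "aca": "Affordable Care Act",
    "ppaca": "Affordable Care Act",
}


def canonical_entity(mention, aliases=ALIASES):
    """Return the canonical name for a mention, or the mention itself
    (trimmed) when no alias is known."""
    cleaned = mention.strip()
    return aliases.get(cleaned.lower(), cleaned)


print(canonical_entity("Obamacare"))
print(canonical_entity("Clean Air Act"))
```

The fallback to the original mention matters: a search for an unknown term should still run as an ordinary text query rather than fail.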