
Deleting the Law

Back in May, Christine Kirchberger posted an interesting quote from 1968 relating to the growth in the size of the law. Ms. Kirchberger goes on to briefly argue that perhaps a formal system for identifying and deleting “non-relevant legislation and case-law” could help improve the performance of legal information retrieval (IR) systems (i.e., something akin to the delete movement in the area of privacy regulation).

Although I understand the frustration of dealing with an ever-growing mountain of data, I think the solution to this challenge lies in improving IR technology, not in forcibly reducing the amount of content to be indexed. Further, the assumption that non-current legal information can safely be excluded from IR systems is simply wrong. In a common law system especially, case law is never really non-relevant, no matter how much time has passed without it being cited or referenced. In addition, there are numerous research scenarios in which historical information (what the law used to be at a particular time in the past) is the goal (e.g., auditing, litigation over past actions, etc.). While I admit many laws could be simplified or reduced in size, much of the growth in the law is more likely due to an increasingly complex society and the incremental way in which the law grows.
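If the worry is that superseded material pollutes results for current-law queries, point-in-time filtering addresses it without deleting anything: every version stays indexed, and the query simply specifies a date. Here is a minimal sketch of the idea; the data model, citations, and dates are invented for illustration, not drawn from any real system.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class StatuteVersion:
    citation: str
    text: str
    effective: date             # first day this version was in force
    superseded: Optional[date]  # first day it no longer applied; None if current

# Hypothetical two-version history of a single provision
CORPUS = [
    StatuteVersion("Act 1, s. 2", "old wording", date(1970, 1, 1), date(1995, 7, 1)),
    StatuteVersion("Act 1, s. 2", "new wording", date(1995, 7, 1), None),
]

def as_of(corpus, citation, when):
    """Return the version of a provision that was in force on a given date."""
    for v in corpus:
        if (v.citation == citation and v.effective <= when
                and (v.superseded is None or when < v.superseded)):
            return v
    return None  # the provision did not exist on that date

print(as_of(CORPUS, "Act 1, s. 2", date(1990, 6, 15)).text)  # old wording
print(as_of(CORPUS, "Act 1, s. 2", date(2012, 1, 1)).text)   # new wording
```

Nothing is lost to the auditor or litigator researching past law; the currency question is handled at query time rather than by purging the index.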

Paper Addresses the Evolution of Legal Language, Legal Memetics

I finally got around to reading Legal N-Grams? A Simple Approach to Track the ‘Evolution’ of Legal Language, which I mentioned in an earlier post about the launch of LegalLanguageExplorer.com, and I think the authors raise a number of interesting issues. It seems the focus of the research and the related Legal Language Explorer tool is the practical application of legal memetics, i.e., the study of the evolution and adoption of language and concepts in the legal culture. The authors cite, as an example, the development of Justice Holmes’ famous phrase “Clear and Present Danger” from Schenck v. United States, 249 U.S. 47 (1919).
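The core mechanic behind a tool like Legal Language Explorer is straightforward: count a phrase’s occurrences across dated opinions and plot the counts by year. A toy sketch of that idea follows; this is not the authors’ actual pipeline, and the corpus snippets are invented stand-ins for the full Supreme Court corpus.

```python
from collections import Counter
import re

# Toy corpus of (decision_year, opinion_text) pairs; a real system would
# ingest the full corpus of dated court opinions.
OPINIONS = [
    (1919, "... whether the words create a clear and present danger ..."),
    (1951, "... the gravity of the evil ... clear and present danger ..."),
    (1969, "... incitement to imminent lawless action ..."),
]

def phrase_frequency_by_year(opinions, phrase):
    """Count case-insensitive occurrences of a phrase per decision year."""
    pattern = re.compile(re.escape(phrase), re.IGNORECASE)
    counts = Counter()
    for year, text in opinions:
        counts[year] += len(pattern.findall(text))
    return dict(counts)

print(phrase_frequency_by_year(OPINIONS, "clear and present danger"))
# {1919: 1, 1951: 1, 1969: 0} -- the meme rises, then falls out of use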

The Legal Language Explorer appears to be the authors’ first step toward a more comprehensive system for the empirical study of legal memetics. The most interesting part of the paper was the authors’ speculation on a number of possible future directions in which this research could proceed. It seems to me there are two items that should be at the top of the authors’ list (one easy, one not-so-much): including data on judicial authorship and including data from secondary law sources.

Authorship: The authors specifically mention that judicial authorship data is available through the Supreme Court Database (at least for the period after the start of the Vinson Court). Integrating data on which justice authored a particular opinion would more accurately relate legal concepts and language to the person actually using them; by the authors’ own description, the evolution of legal terminology and concepts is tied to jurists, not to the issuing body behind the document.
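For illustration, here is a sketch of what such an integration might look like. The records and field names below are hypothetical, not the Supreme Court Database’s actual schema.

```python
from collections import Counter

# Hypothetical opinion records with authorship metadata attached
OPINIONS = [
    {"case": "Schenck v. United States", "author": "Holmes", "year": 1919,
     "text": "a clear and present danger"},
    {"case": "Dennis v. United States", "author": "Vinson", "year": 1951,
     "text": "the clear and present danger test revisited"},
]

def phrase_counts_by_author(opinions, phrase):
    """Attribute each occurrence of a phrase to the authoring justice."""
    counts = Counter()
    for op in opinions:
        counts[op["author"]] += op["text"].lower().count(phrase.lower())
    return counts

print(phrase_counts_by_author(OPINIONS, "clear and present danger"))
# Counter({'Holmes': 1, 'Vinson': 1})
```

Grouping by author rather than by issuing court is what lets the frequency data track jurists, as the authors’ own framing suggests it should.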

Secondary sources: One of the major problems for legal memetics is that much of the cultural development of particular concepts and terms occurs in materials protected by intellectual property rights: law journals, law reviews, treatises, and model laws are just a few examples. Next to court opinions, these sources are the places where legal memes are born or evolve. For example, how can you study the development of concepts and phrases related to product liability without tracking usage in the Restatement of Torts, Second (a model law published by a private organization)? Obviously, the authors may not be able to offer a solution to this problem, but I would have liked the paper to acknowledge the exclusion of such sources and the impact of relying only on court opinions to study legal memetics.

The Competency of the Legal Information Industry at the “Entity Web”

Matthew Hurst (@ Data Mining) recently posted about the concept of the “entity web” to describe companies involved in web-based information retrieval that are evolving into more than search engines for retrieving textual documents. Hurst speculates about the corporate skill set that will be needed to deliver on this concept, which he terms the three competencies: understanding (1) the Web (e.g., HTML, CSS, AJAX, and other web technologies); (2) the World (i.e., the real-world relationships between data points, such as that a song has an artist); and (3) Web Presence (e.g., how entities appear and interact on the web). Of course, competencies (2) and (3) include the ability to record and use this knowledge in some structured model. I characterize what Hurst is discussing as integrating semantic data into existing textual search services. I also think the term “entity” is a bit limited because it implies the data is focused only on the actors (people, organizations, websites, document sources, etc.) when the user’s information needs may not be focused on entities at all (e.g., asking a system how photosynthesis functions or the answer to 1 + 1). Whatever you label it, Hurst is right about the direction in which we seem to be headed, and when you think about how the traditional legal information industry measures up on these competencies, things do not look very good.

the Web.— Hurst comments that this is an area that the broad market players (Google, Facebook, etc.) have largely mastered (though with room to improve). On the legal side, I would say that large legal publishers have suffered from many of the same problems as other older companies when it comes to embracing web technologies: namely, they tend to lag too far behind in adopting the newest web technologies. They also have a hard time building institutional knowledge in this area because they often outsource this type of work to vendors and let some departments have too much influence (e.g., marketing and communication, public relations). Overall, I would say the legal information industry is obviously not as competent as the big tech companies in this area, but it generally does well with deploying established web technologies and is on par with other older companies when it comes to adopting the newest ones.

the World.— This is probably the area in which the traditional legal information industry is the most competent, but even here I think there are many reasons to worry about the future. There is a high degree of competency in this area because traditional legal publishers have spent a long time developing institutional knowledge related to all the intricacies of government data and distribution. Other than perhaps among law librarians, there are very few places that foster this kind of knowledge. I think this institutional knowledge is, however, at risk because many legal publishers have increasingly outsourced or automated the very functions that gave rise to this knowledge-building.

Web Presence.— This is probably the area in which traditional legal publishers are the weakest. In the legal field, a complete understanding of web presence would cover how all the various actors interact on the web (e.g., legislators, courts, state and federal agencies, lawyers, etc.). Although traditional legal publishers are most familiar with the official entities involved in issuing documents (legislatures, courts, etc.), they are much less familiar with the entities that discuss or debate legal content (blogs/blawgs, social networking sites, law firms, political and legal discussions by non-professionals, etc.). A future entity web information retrieval system might need to track these sources to know that ‘Obamacare’ refers to the Affordable Care Act, or that, while a particular judge has not ruled on an issue, his wife belongs to a Facebook group opposing it.
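A sketch of the alias-resolution piece of that problem appears below. The alias table is hand-built here purely for illustration; a real entity web system would presumably mine such mappings from web-presence data rather than maintain them by hand.

```python
# Toy alias table mapping informal web usage to canonical legal entities.
ALIASES = {
    "obamacare": "Patient Protection and Affordable Care Act",
    "aca": "Patient Protection and Affordable Care Act",
    "dodd-frank": "Dodd-Frank Wall Street Reform and Consumer Protection Act",
}

def canonicalize(mention: str) -> str:
    """Resolve an informal mention to its canonical entity name, if known."""
    return ALIASES.get(mention.strip().lower(), mention)

print(canonicalize("Obamacare"))
# Patient Protection and Affordable Care Act
print(canonicalize("some unknown phrase"))
# some unknown phrase (passed through unchanged)
```

The hard part, of course, is not the lookup but building and maintaining the mapping from the messy, informal sources that traditional legal publishers know least well.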

The High Costs of Search Illiteracy for Lawyers

“Search Literacy” is the term I have decided to use to refer to a person’s ability to use search technology and interpret queries and results. Search technology has become so integrated into our everyday lives that being search literate should be considered a basic skill. A recent post by Christopher Danzig @ Above the Law touched on this concept, which has also come up in a few of my recent posts. Danzig’s post highlights a recent decision by a federal District Judge that allowed a large pharmaceutical company to avoid the enormous costs it would have faced for its own lack of search literacy. According to Danzig:

During discovery, I-Med agreed to a forensic keyword search of its computer networks, servers, and storage devices. I-Med made the mistake of not limiting the search to active files or particular time periods. The company also allowed search of “unallocated space,” where deleted and temporary files are kept.

The uber-broad search criteria turned out to be a problem, unsurprisingly. Just in the unallocated space alone, the term generated more than 64 million hits, which represented somewhere around 95 million pages of data. Yeesh. I-Med realized they shouldn’t have agreed to the search conditions, because conducting privileged review on that scale would cost so much money and time. The company asked the magistrate judge in the case for relief from the stipulation.

The company was allowed to avoid the consequences of its lack of search literacy, but the debacle obviously highlights the risk of a lawyer not being search literate. I don’t know if the lawyers responsible were held accountable in some way, but I would have liked to see the company face some consequences to create an incentive for others not to make such obvious mistakes. If a lawyer cost a client money because they failed to understand the accounting involved in a settlement agreement, I doubt a judge would be so understanding. The judge in the case, however, sounded more search literate than the lawyers for either party: his decision includes a footnote complaining about the way the parties made their arguments. Specifically, the judge noted that

“[I]t is troubling that the parties refer to the number of raw hits as though each represented a separate document. Given the volume of hits and search terms used … it stands to reason that at least some files mentioning product lines would make reference to more than one at the same time. Consequently, the [c]ourt is left to wonder whether the total hit and estimated page numbers are genuinely correct.”
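The footnote’s point is easy to demonstrate: when several search terms hit the same file, raw hit counts overstate the number of distinct documents that actually need review. A toy illustration (the files and terms below are invented):

```python
# Two documents, two search terms: three raw hits, but only two documents.
DOCS = {
    "memo1.txt": "product A pricing and product B recall",
    "memo2.txt": "product B shipment notes",
}
TERMS = ["product a", "product b"]

raw_hits = 0
matched_docs = set()
for name, text in DOCS.items():
    lowered = text.lower()
    for term in TERMS:
        n = lowered.count(term)  # hits for this term in this document
        raw_hits += n
        if n:
            matched_docs.add(name)

print(raw_hits)           # 3 raw hits across all terms ...
print(len(matched_docs))  # ... but only 2 distinct documents to review
```

Scaled up to 64 million hits, the gap between hits and distinct documents could change the cost estimate dramatically, which is exactly why the judge was left to wonder whether the page numbers were genuinely correct.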

This incident and the issues of search literacy raised in my previous posts seem to involve two types of people with search literacy problems. The first group tends to be younger and may be overly reliant on search technology, and therefore lack some other research skills, such as using proper judgment in assessing the source of information. The second group (likely those in the example above) tends to be older and may suffer from a lack of familiarity with search technology. I previously mentioned ways I think research platforms and other information retrieval systems can aid the first group, but features that specifically aid the second group are somewhat tougher to envision. It seems entirely likely that the lawyers who actually signed off on the above e-discovery stipulation were senior lawyers, who tend to delegate legal research and, therefore, have even more limited exposure to search technology than the average lawyer.