Matthew Hurst (@ Data Mining) recently posted about the concept of the “entity web,” which describes how companies involved in web-based information retrieval are evolving into more than search engines for retrieving textual documents. Hurst speculates about the corporate skill set needed to deliver on this concept, which he terms the three competencies: understanding (1) the Web (e.g., HTML, CSS, AJAX, and other web technologies); (2) the world (i.e., the real-world relationships between data points, such as that a song has an artist); and (3) Web presence (e.g., how entities appear and interact on the web). Of course, competencies (2) and (3) include the ability to record and use this knowledge in some structured model. I characterize what Hurst is discussing as integrating semantic data into existing textual search services. I also think the term “entity” is a bit limited because it implies the data is focused only on actors (people, organizations, websites, document sources, etc.) when the user’s information needs may not be focused on entities at all (e.g., asking a system how photosynthesis functions or what 1 + 1 equals). Whatever you label it, Hurst is right about the direction in which we seem to be headed. When you think about how the traditional legal information industry measures up on these competencies, things do not look very good.
The Web.— Hurst comments that this is an area the broad market players (Google, Facebook, etc.) have largely mastered, though they have room to improve. On the legal side, I would say that large legal publishers have suffered from many of the same problems as other older companies when it comes to embracing web technologies: namely, they tend to lag too far behind in adopting the newest ones. They also have a hard time building institutional knowledge in this area because they often outsource this type of work to vendors and let some departments (e.g., marketing and communications, public relations) have too much influence. Overall, the legal information industry is clearly not as competent as the big tech companies in this area, but it generally does well at deploying established web technologies and is on par with other older companies when it comes to adopting the newest ones.
The World.— This is probably the area in which the traditional legal information industry is the most competent, but even here I think there are many reasons to worry about the future. Competency is high because traditional legal publishers have spent a long time developing institutional knowledge of all the intricacies of government data and its distribution. Other than perhaps among law librarians, there are very few places that foster this kind of knowledge. I think this institutional knowledge is at risk, however, because many legal publishers have increasingly outsourced or automated the very functions that gave rise to it.
Web Presence.— This is probably the area in which traditional legal publishers are the weakest. In the legal field, a complete understanding of web presence would involve knowing how all the various actors (e.g., legislators, courts, state and federal agencies, lawyers, etc.) interact on the web. Although traditional legal publishers are most familiar with the official entities involved in issuing documents (legislatures, courts, etc.), they are much less familiar with the entities that discuss or debate the legal content (blogs/blawgs, social networking sites, law firms, political and legal discussions by nonprofessionals, etc.). A future entity web information retrieval system might need to track these sources to know that ‘Obamacare’ refers to the Affordable Care Act, or that while a particular judge has not ruled on an issue, his wife belongs to a Facebook group that opposes it.
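To make the ‘Obamacare’ example a little more concrete, here is a minimal Python sketch of the kind of structured model such a system might keep: an index that maps the informal names people use on the web to a canonical entity. The `Entity` and `EntityIndex` names and the alias list are my own illustrative assumptions, not anything Hurst or any publisher has described, and a real system would need far more than exact string matching.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class Entity:
    """A hypothetical canonical record for something users may refer to."""
    canonical_name: str
    aliases: Set[str] = field(default_factory=set)


class EntityIndex:
    """Hypothetical lookup table from informal mentions to canonical entities."""

    def __init__(self) -> None:
        self._by_alias: Dict[str, Entity] = {}

    def add(self, entity: Entity) -> None:
        # Register the canonical name and every alias, ignoring case.
        for name in {entity.canonical_name, *entity.aliases}:
            self._by_alias[name.casefold()] = entity

    def resolve(self, mention: str) -> Optional[Entity]:
        # Return the matching entity, or None if the mention is unknown.
        return self._by_alias.get(mention.strip().casefold())


index = EntityIndex()
# Illustrative aliases only; in practice these would be mined from blogs,
# social networking sites, and other unofficial sources.
index.add(Entity("Affordable Care Act", {"Obamacare", "ACA"}))

match = index.resolve("obamacare")
print(match.canonical_name if match else "no match")
```

The hard part, of course, is not the lookup but gathering and maintaining the aliases, which is exactly the web presence knowledge that traditional legal publishers tend to lack.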