Today LexisNexis announced via email that it is acquiring Knowledge Mosaic, a Seattle-based publisher that specializes in federal regulatory and disclosure information, and Thomson Reuters announced that it has entered into an agreement to acquire Practical Law Company, a UK-based legal publishing company that targets transactional lawyers.
I recently blogged (at the WK Intelligent Solutions blog) about Prof. Daniel Katz’s recent analysis of the impact of quantitative prediction on the legal services industry (Quantitative Legal Prediction – or – How I Learned to Stop Worrying and Start Preparing for the Data Driven Future of the Legal Services Industry, 62 Emory Law Journal ___ (2013) (working draft)). My write-up summarizes some of the important points from his analysis and attempts to extend the question by considering how these changes might also change information providers (e.g., legal publishers, data providers, etc.) that serve the legal market.
I recently blogged (at the WK Intelligent Solutions blog) about Dr. Adam Wyner’s recent study on the utility of a crowdsourcing model to mark up (or annotate) court decisions. My post attempts to detail some of the problems I expect Dr. Wyner will discover with this approach, including the use of a tool based on highlighting text, the specialization of annotators and of the law, the question of what data is worth annotating, and the incentives for annotators in a crowdsourced model. Despite these criticisms, I am excited to see this model explored in more detail.
Back in May, Christine Kirchberger posted an interesting quote from 1968 relating to the growth in the size of the law. Ms. Kirchberger goes on to briefly argue that perhaps a formal system of identifying and deleting “non-relevant legislation and case-law” could help improve the performance of legal information retrieval (IR) systems (i.e., something akin to the delete movement in the area of privacy regulation).
Although I understand the frustration of dealing with an ever-growing mountain of data, I think the solution to this challenge lies in improving IR technology, not in forcibly reducing the amount of content to be indexed. Further, the assumption that non-current legal information would be excluded from IR systems is simply wrong. In a common law system especially, case law is never really non-relevant, no matter how much time has passed without it being cited or referenced. In addition, there are numerous research scenarios in which historical information (what the law used to be at a particular time in the past) is the goal (e.g., auditing, litigation over past actions, etc.). While I admit many laws could be simplified or reduced in size, much of the growth in the law is more likely due to an increasingly complex society and the incremental way in which the law grows.
I recently attended the Conference on Internet Privacy, Social Networks, and Data Aggregation, which was held at my old law school, Illinois Institute of Technology (IIT), Chicago-Kent College of Law. The conference was hosted by the Center for Information, Society, and Policy on Friday, March 23, 2012. There were a number of interesting speakers, some of whom I have listed below (see the complete conference agenda), along with some of my thoughts on the issues they covered.
Today, I attended a presentation about government data hosted by Data Science Chicago, a Chicago-based meetup group. The presentation was interesting both because of the personal background of the speaker, Brett Goldstein, and because of the number of interesting projects discussed that use open government data. The speaker was the IT director for OpenTable before joining the Chicago Police Department.
During his presentation he explained how his role as a police officer led to founding the Chicago Police Department’s Predictive Analytics Group, an effort to use patterns in incident-level crime data to predict future incidents of crime. According to Mr. Goldstein, the group’s predictions were able to focus police patrols on 1-2% of the city (down to the census block level) in which murders or other violent crimes were likely to occur.
Mr. Goldstein is now the City of Chicago’s Chief Data Officer and, at the event, he talked about the city’s effort to make government data open to the public and a number of projects using that data. According to Mr. Goldstein, the city’s data portal has already released the incident-level crime data going back 10 years — the biggest such collection of open data in the world. His more recent efforts have focused on using MongoDB for spatially-focused time series data and the release of the city’s 311 data. The speaker also touched on a number of related topics, including the use of regression analysis, the treatment effect, the need for more useful geographical boundaries other than census blocks, and advice for aspiring data scientists on the skills needed to be effective.
Overall, it was an interesting presentation, and it made me want to take a closer look at the data sets available through these government portals. While I was already familiar with data.gov for federal-level data, I was surprised to find so much data available at my city, county, and state levels.
I have another blog post, titled Understanding Big Data in the Context of Legal Publishing, on Wolters Kluwer’s Intelligent Solutions Blog. This post tries to think about the big data trend and how it might apply in the context of a legal publisher, especially the dual role of government data as content and as data that can reveal patterns in a publisher’s customer market.
The glossary has been updated with the following terms: Big Data • Compliance • Concordance • Mash-Up • Natural Language Processing • Point-of-Need.
Back in December, I signed up for a free online class on natural language processing, taught by two Stanford University professors, Dan Jurafsky and Chris Manning, and offered through Coursera. While it was originally set to start in January, it was delayed and will now begin March 12, 2012 (registration is still open, I believe). I’ll likely post about some of the topics covered in the class, especially how they may relate to applications using legal content.
I had a blog post, titled Law as an App and the Terminology Maze, on Wolters Kluwer’s Intelligent Solutions Blog, addressing the concepts and terminology surrounding legal app development.