LexPredict Challenge at LexHacks 2015 (Winner!)

This weekend I entered LexHacks 2015 (June 6–7), a hackathon focused on the legal industry, and my team won (yay!) the LexPredict challenge, which was to build a parser that could scan a mix of unstructured contracts and identify/extract the names of the parties, the effective date, and the termination date or clause. The hackathon involved a series of specific challenges posted by the event sponsors. Attendees formed teams and, after the two days, submitted a solution.

I decided to focus on a $500 challenge posted by LexPredict, which involved scanning a corpus of over 30,000 unstructured contract documents and trying to extract the parties and certain dates. It seemed obvious from the beginning that the “proper” solution to the problem was to use natural language processing methods to extract the data. However, my team’s approach to having something usable within the two-day window of the hackathon involved mixing some basic NLP techniques with some standard text parsing (i.e., using Python).

The typical NLP workflow would have involved several steps, including sentence segmentation, word tokenization, part-of-speech tagging, named entity extraction, and relationship filtering. It would have also required establishing a training set (i.e., a subset of documents tagged with correct answers) and “training” the algorithms used for several of the steps (sentence segmentation, chunking/tokenizing, etc.). I quickly settled on trying to use the Natural Language Toolkit (NLTK 3) as part of my solution. I had previously (years ago) taken a Coursera course on NLP offered by Stanford professors, but I certainly am no NLP pro. While I expected to avoid using the NLP toolkit in favor of custom parsing rules in some places, I also expected to be able to rely on it for some items that were not especially unique to contracts (e.g., identifying names of people).

Luckily, I was able to form a team that included at least three other lawyers and two other developers. In the end, what we understood to be a difficult problem only seemed more so in practice. However, we managed to use the NLP toolkit to tokenize each contract and crafted several token parsing patterns that were able to find and extract some parties and dates. We found that without further parser training, the NLP toolkit did not extract people’s names as reliably as we wanted (too many false positives). The wide variety of documents in the corpus was a frequent source of pain in writing customized rules. Everyone on the team seemed to walk away with a deeper understanding of this type of NLP problem and their own ideas for how they might proceed – if they had more time.

By the end of the event, everyone seemed happy about what we were able to accomplish in only two days. Overall, it was an interesting event at which I met a lot of new people. Winning our challenge was just an extra bonus.

The team members were: Edward Bryant, Chase Hertel, Tomek Rabczak, Tetyana Rabczak, Bharat Lavania, and Jon Riley.

Starting New Online Course – HTML5 From W3C

I recently signed up to take my first edX course – Learn HTML5 from W3C. Although I felt like I knew HTML5 fairly well, I don’t have any formal training in web development, so I thought a course should be easy and give me something more concrete to put on my LinkedIn page. On top of that, I liked the idea of a course certified by the W3C itself. While I have taken many online courses, this is my first through edX. The first week of material was posted June 1 and I finished it already. My initial impression was that edX seems heavy on text materials (in contrast to the video-centric approach over at Udacity). I also discovered that there is always something else to learn about HTML5.

I knew about the new HTML elements and how to use them, but it turned out I didn’t know very much about microdata (for a good intro, see the chapter on extensibility in Dive into HTML5 by Mark Pilgrim). I had used the microformats approach before, but I wasn’t aware of the new microdata approach. I always wonder about the adoption rate for these types of extra semantic markup. However, I decided to take a closer look at microdata and try to find ways to add it to my HTML5 markup.

Chicago Legal Innovation and Technology Meetup

I recently attended a meetup by the Chicago Legal Innovation and Technology Meetup group (May 28, 2015), a local group that is part of a larger movement to incorporate more technology in the legal industry. The meetup, which was hosted at Chapman and Cutler, was a series of four speakers:

  • Jason Dirkx: White Glove to Brown Box: Overcoming Challenges in Developing Legal Products
  • Aileen Tien: Beyond Linear Review: Ideas for Advancing Technology within the Legal Industry
  • Michael Nogroski: Connection Not Reaction: How Real Teams Effectively Interact with Technology
  • Cathy Deng: Expunge.io: an Online Platform for Juvenile Expungement

For the most part, the material was all interesting but the mix of topics again made it clear that there are two distinct groups when it comes to technology applied to law. On one side, there are legal industry insiders who view technology as a way to improve the existing legal industry (e.g., law firms developing legal products, vendors selling software to law firms). On the other side are those developing technology with the aim of disrupting the legal industry, such as helping those the legal system has not served well (e.g., expunge.io app). This distinction is not limited to this meetup – it can be seen in the broader world of legal tech, but I thought it was interesting that you could see this divide among the meetup’s speakers. I thought the highlight of the night was when developer Cathy Deng called out lawyers for being good at focusing on accuracy and completeness when communicating, but not being very good (or very concerned) about clarity. She accurately characterized her own project (expunge.io) as more of a communication solution that uses technology – not a technology solution.

The group responsible for the meetup is (to their credit) also trying to move beyond presentations by hosting a hackathon (lexhacks.com) on June 6-7, 2015. I registered for the event and plan on attending, but I have some reservations about the specific tech challenges offered by the event’s sponsors (but more on that in another post).

Corporate Gauntlet Project Artwork, Prototype

So, while I was going through various old personal projects looking for work to include in my new portfolio, I came across the pixel art below. The project was an idea for a solitaire-type card game that combined the game mechanics of a collectible card game with humor about the corporate world. It was titled “Corporate Gauntlet” and what I built was too incomplete for my portfolio, but I thought I would post the card back artwork (which I still love!) and a link to an early incomplete prototype. I would still like to return to this project, and hopefully some day I will.

card back for old prototype game

ABA Article on Hackathons Depressingly Out-of-Touch

Although media coverage of hackathons designed to spur innovation in the legal industry would seem to be an entirely positive thing, an ABA article on the topic somehow manages to sound dated and out-of-touch. The article, ‘Hackathon’ movement may jump-start a legal career, depressingly spends half of its focus addressing the problem of the negative stigma surrounding the term “hacker.” This would indeed be a useful topic – about a decade ago. I am sure there are plenty of older lawyers who still view the term negatively, but they must be a minority (even in law) and it seems sad to waste half of the media coverage addressing the fears of a few uninformed lawyers. An article that offered more specifics on the utility of these events in this specific industry would have been a million times better. To me, this article was depressing because it read like a piece about sailing that wasted its time on the fear that people might sail off the horizon. In the end, if you are the type of lawyer that fears the term hacker, you probably will not be going to any hackathons anyway.

Oh Noooo! Hackers!

Why Hackathons May Not Be the Answer for Legal Technology

I just found out about the upcoming LexHacks event here in Chicago, which will be held June 6–7, 2015. It is a hackathon focused on the legal industry. In light of my background as both a lawyer and a developer, this event sits right at the center of my interests. However, I have mixed feelings about the potential for hackathons targeting the legal industry. It seems clear that the legal industry is likely to face several disruptions due to the impact of technology, but I think there are several reasons technological innovation has moved slowly in this particular industry and why hackathon-type events may do little to speed things along.

  1. Crossover with General Purpose Software: Although law is a very specialized field, many tasks within law are addressed by software designed for a more general audience (e.g., word processors, spreadsheet software, billing/invoicing, file storage, etc.).
  2. Too Many Closed Proprietary Systems: In the few areas that the legal industry relies on specialized software (e.g., practice management, legal research), the market is dominated by private closed systems – and lawyers have not shown a preference for adopting open systems which would allow for more innovation.
  3. Fragmented Market: The greatest need for specialized tools seems to be at the level of specific legal specialties, but the number of such specialties, their differing needs, and the size of each, would seem to make the business of serving them mostly unprofitable.
  4. Complex Problems: Most problems faced by the legal industry that seem ripe for a technological solution are complex and require more advanced technologies (e.g., Natural Language Processing), which are less likely to come out of a hackathon-type event.

On the upside, the market for software that goes around the traditional legal industry (e.g., RocketLawyer, Shake) seems to have the most potential.

Basic Project Demo Hosting with GitHub

Some of my fellow Udacity Nanodegree students were discussing ways to share GitHub projects created during the program, so I wrote up this short explanation of how to use gh-pages to host a project demo.

There are numerous ways to share work on GitHub with others. My (very) basic approach is to link directly to the repository page on GitHub when you want others to see the code itself, and use a gh-pages branch to host a demo version of any HTML components of a project. So, after a git repo is up and running (and after you successfully commit and push your changes), you can use:
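As a rough sketch, the sequence can be wrapped in a small shell function. The function name publish_demo is made up for illustration, and the sketch assumes the remote is called origin and the main branch is master; adjust both to match your own repo.

```shell
# Hypothetical helper (the name publish_demo is illustrative only).
# Assumes the remote is "origin" and the main branch is "master".
publish_demo() {
    # Switch to gh-pages; -b creates the branch and is only needed the first time.
    if git show-ref --verify --quiet refs/heads/gh-pages; then
        git checkout gh-pages || return 1
    else
        git checkout -b gh-pages || return 1
    fi
    # Replay gh-pages on top of master so it reflects the latest changes.
    git rebase master || return 1
    # Update the gh-pages branch on GitHub.
    git push origin gh-pages || return 1
    # Return to the master branch.
    git checkout master
}
```

Run publish_demo from inside the repo once your changes on master are committed.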

The checkout command changes to the gh-pages branch (the -b switch is only needed the first time). The rebase command causes the gh-pages branch to reflect any changes to the master branch. The push command updates your gh-pages branch on GitHub. Finally, the last command returns you to the master branch. Because of the need to keep the gh-pages branch up-to-date with changes, I generally wait until nearing the end of a project (if possible) before creating a gh-pages branch.

After running these commands you should find a link to your project in your repo’s settings on GitHub. However, please be aware that GitHub (by default) provides a link to index.html within your gh-pages branch. If you want to link to a different file, you will have to edit the URL accordingly.

Open Government, Chicago Municipal Code Meetup

I recently attended a meetup by Open Government Chicago(-land), a local open government / open data advocacy and networking group. The meetup was titled Process and Products Around the Chicago Municipal Code and it was probably one of the best meetups I have been to on these topics. The speakers included the Chicago city clerk, Susana Mendoza, and Carl Malamud, the open law data activist (PublicResource.org), as well as a number of others.

The highlight of the evening was probably hearing Carl Malamud blasting away (as he should) at the private entities that develop model codes and their attempts to fight the concept that a model code that is adopted as law becomes public domain. It was also interesting to hear Waldo Jaquith talk about his recent projects (with funding by the Knight Foundation), especially his new plan for a U.S. Open Data Institute.

While it was great to hear my local elected city clerk be supportive of the efforts of open government data activists, it was frustrating to hear them contradict themselves. On the one hand, they described their duty to publish the law and how crucial the task of codification was (i.e., how lawsuits have been won or lost on the placement of a comma). On the other hand, they vaguely claimed they could not possibly put a bulk version of the law on a website themselves and did not address why they chose to outsource codification of the code to a private vendor (American Legal Publishing Corporation). As someone who works at a private publisher that independently codifies the code of all 50 states (mostly because of the wait for official codifications to be released), I think I have a good understanding of the resources and cost involved in that type of work and I can’t see any reason the city could not do this work itself.

A Slight Turn in the Road

I have come to a bit of a turning point in my career and the focus of this blog will soon change slightly as a result. I received all my formal education in law or law-related areas (i.e., majoring in political science as an undergraduate and getting my law degree). However, technology, computer science, and programming were also things I was always passionate about. Where possible, I have tried to combine these interests: studying technology policy and intellectual property law, working in Silicon Valley before the dot-com crash, and ultimately working in a transforming legal publishing industry. Along the way, I taught myself an assortment of technology skills (HTML, XHTML, HTML5, CSS 2/3, PHP, JavaScript, Python, jQuery, XML/XSLT/XSD, RDF, etc.) and used those skills on my own personal projects and on those of friends.

Until recently, my legal day job and my technology background always stood apart from one another. In the last few years, I made a concerted effort to combine these interests by applying my technology background to my job in legal publishing, which included engaging in personal study of technologies important to publishing (NLP, search), taking on new projects at work that included programming, and blogging about publishing-related technology through a company blog and this personal blog. After some time, however, it has become clear that the legal publishing industry is simply the wrong place to apply or sharpen my skills.

As a result, this blog will start to focus more on programming and more specifically my current focus on studying front-end development technology. My current career goal is to move into a full-time front-end development position. I will, no doubt, discuss items related to law and areas where my interests in law and technology overlap. However, I expect to spend less time on these issues unless they relate to my own personal projects.

Do Lawyers Only Copy While Everyone Else is Original?

Jean O’Grady @ Dewey B Strategic posted about the recent dismissal of White v. West Publishing Corp., calling it a victory for fair use and common sense. I obviously disagree and I think her argument about why legal briefs should not be protected reveals one of my reasons why. She re-posted her earlier argument that:

Call me a cynic but aren’t a huge chunk of the legal opinions written in this country essentially “derivative works” based on other peoples arguments and analysis? Don’t judges (or their clerks) take whole paragraphs from briefs and drop them into opinions? Don’t lawyer’s draft briefs by selecting text from judicial opinions and legal memoranda?

Does she have a point? Yes, but that point could be made against all of copyright and is not at all unique to legal briefs. Couldn’t you say the same of programmers writing a program or marketing professionals putting together market research? Are these things also not due copyright protection? Name the last movie you saw with an original plot. Copyright reformers have been arguing for years that culture itself works in much the same manner as lawyers drafting legal briefs: it involves a lot of copying. Although it is seldom admitted openly, much of what we produce as a culture is the result of copying. If we are lucky, a few percent might be truly original expression. As Albert Einstein said, “The secret to creativity is knowing how to hide your sources.”

UPDATE: An additional problem with arguments focusing on the nature of legal briefs (i.e., the work) is that they are really arguing copyrightability, not fair use, which would have to focus on the activities of the alleged infringer (i.e., what Thomson Reuters and LexisNexis actually do with the works). From that view it is even harder to see how the commercial sale of these works jibes with the fair use factors. It still isn’t clear if this dismissal was based on the copyrightability of the work (is it eligible for protection) or on fair use (how it was copied was ok).

UPDATE 2: Now Joe Hodnicki over at The Law Librarian Blog has weighed in to agree with Jean O’Grady’s view. He, too, argues not that what the legal publishers did was fair use but that legal briefs do not deserve copyright protection. Again, I highly doubt there is any logical way to support the argument that a written legal argument is not copyrightable (at least one that would not invalidate all of copyright). I am also troubled by how little understanding there is about the difference between copyrightability and fair use.