Today, I attended a presentation about government data hosted by Data Science Chicago, a Chicago-based meetup group. The presentation was interesting both for the personal background of the speaker, Brett Goldstein, and for the number of projects discussed that use open government data. Mr. Goldstein was the IT director for OpenTable before joining the Chicago Police Department.
During his presentation, he explained how his role as a police officer led him to found the Chicago Police Department’s Predictive Analytics Group, an effort to use patterns in incident-level crime data to predict future incidents. According to Mr. Goldstein, the group’s predictions were able to focus police patrols on the 1-2% of the city (down to the census block level) in which murders or other violent crimes were likely to occur.
Mr. Goldstein is now the City of Chicago’s Chief Data Officer and, at the event, he talked about the city’s effort to make government data open to the public and about a number of projects using that data. According to Mr. Goldstein, the city’s data portal has already released incident-level crime data going back 10 years, which he described as the biggest such collection of open data in the world. His more recent efforts have focused on using MongoDB for spatially focused time series data and on the release of the city’s 311 data. He also touched on a number of related topics, including the use of regression analysis, the treatment effect, the need for geographical boundaries more useful than census blocks, and advice for aspiring data scientists on the skills needed to be effective.
Overall, it was an interesting presentation, and it made me want to take a closer look at the data sets available through these government portals. While I was already familiar with data.gov for federal-level data, I was surprised to find so much data available at the city, county, and state levels.