Conference on Internet Privacy, Social Networks and Data Aggregation (Mar. 23, 2012)

I recently attended the Conference on Internet Privacy, Social Networks, and Data Aggregation, which was held at my old law school, Illinois Institute of Technology (IIT), Chicago-Kent College of Law. The conference was hosted by the Center for Information, Society, and Policy on Friday, March 23, 2012. There were a number of interesting speakers, some of whom I have listed below (see the complete conference agenda), along with some of my thoughts on the issues they covered.

Prof. Harry Lewis (Prof. Computer Science, Harvard University): Prof. Lewis gave a presentation from a non-lawyer’s perspective about his concerns over large-scale data aggregation. He made an interesting point: there are many unregulated and unproblematic ways to collect data, yet the same information can become very problematic once it is aggregated. His primary example was the idea of a Where’s George-type project to collect license plate numbers and locations. While there do not seem to be any legal issues with a private citizen recording a license plate number seen in public, an aggregated collection would let spouses, parents, and others see where someone has been, and this kind of data tends to make people feel that their privacy has been invaded. Prof. Lewis’ example reminded me of something I stumbled across a few years ago in the Federal Register: a proposal to NHTSA to require an LED light on vehicles that would continuously broadcast the vehicle’s VIN, allowing detectors at certain intersections to identify stolen vehicles. NHTSA rejected the proposal because of privacy concerns, but Prof. Lewis’ scenario sidesteps those concerns because the data is collected by private individuals, who are not regulated by privacy laws in the same way as the government. Prof. Lewis did not advocate for any particular solution; he only highlighted some of the problems and the questions they raise.
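To make Prof. Lewis’ aggregation point concrete, here is a minimal Python sketch. Everything in it is hypothetical and mine, not from the talk: the `Sighting` record, the `aggregate_by_plate` helper, and the sample reports. Each report on its own is a harmless public observation made by a different person; simply grouping the reports by plate and sorting them by time turns them into a movement history for one vehicle.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Sighting:
    """One hypothetical public observation: a plate seen at a place and time."""
    plate: str
    location: str
    hour: int      # hour of day (0-23), kept coarse for illustration
    reporter: str  # the private citizen who logged it


def aggregate_by_plate(sightings):
    """Group every reported sighting by plate and sort each group by hour.

    No single record reveals much, but the grouped, time-ordered list is
    effectively a movement history for one vehicle.
    """
    history = defaultdict(list)
    for s in sightings:
        history[s.plate].append((s.hour, s.location))
    return {plate: sorted(events) for plate, events in history.items()}


# Hypothetical reports from three unrelated hobbyists.
reports = [
    Sighting("ABC123", "gym parking lot", 18, "alice"),
    Sighting("XYZ789", "grocery store", 10, "bob"),
    Sighting("ABC123", "office garage", 9, "bob"),
    Sighting("ABC123", "bar on Main St.", 22, "carol"),
]

for hour, place in aggregate_by_plate(reports)["ABC123"]:
    print(f"{hour:02d}:00  {place}")
```

None of the three reporters saw more than one or two data points, yet the output reads like a daily itinerary: office garage in the morning, gym in the evening, bar at night.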

Christopher Soghoian (Former FTC In-House Technologist, Division of Privacy and Identity Protection): Mr. Soghoian’s presentation, entitled Online Tracking 101, was very informative about the technologies on both sides of the battle between advertisers, data aggregators, browser makers, and privacy advocates, including cookies, Flash cookies, behavioral fingerprinting, opt-outs, and “do not track.” Although I found the presentation informative, it also made obvious how much of this debate has focused on an increasingly narrow slice of where personal data may be collected: the browser. Increasingly, the lines distinguishing “online” behavior and “personal data” have blurred. What is the equivalent of “do not track” for an Xbox, a DVR, a vehicle GPS, or an internet-enabled electric meter? All of these devices collect data and are susceptible to the same types of tracking; however, listening to Mr. Soghoian and the other speakers at the conference, one would think the issue is limited to internet browsers. I also cannot help but think that the opening up of government data (something I support) is aiding the erosion of privacy. I was recently at a Chicago Data Science presentation at which the impending release of Chicago’s 311 service call data was discussed. I don’t expect this data to contain callers’ names and phone numbers, but another point made by Mr. Soghoian and other speakers was how difficult it is to successfully de-identify any data set. Further, the 311 data release is yet another example of data collected outside of a browser.
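Why is de-identification so hard? One well-known reason is the linkage attack: stripping names and phone numbers still leaves coarse attributes (quasi-identifiers) that can be joined against some other, named data set. The Python sketch below illustrates the idea under loudly stated assumptions: the fields (`zip`, `birth_year`, `sex`, `complaint`), the records, the "roster," and the `reidentify` helper are all hypothetical, and nothing here reflects the actual schema of Chicago’s 311 release.

```python
from collections import Counter

# Hypothetical "de-identified" service-request records: names and phone
# numbers removed, but quasi-identifiers left in. Illustrative fields only,
# NOT the real 311 data schema.
requests = [
    {"zip": "60616", "birth_year": 1980, "sex": "F", "complaint": "noise"},
    {"zip": "60616", "birth_year": 1975, "sex": "M", "complaint": "pothole"},
    {"zip": "60657", "birth_year": 1980, "sex": "F", "complaint": "rodents"},
]

# Hypothetical public roster that does carry names.
roster = [
    {"name": "Jane Roe", "zip": "60616", "birth_year": 1980, "sex": "F"},
    {"name": "John Doe", "zip": "60616", "birth_year": 1975, "sex": "M"},
    {"name": "Rick Moe", "zip": "60616", "birth_year": 1975, "sex": "M"},
    {"name": "Ann Poe", "zip": "60657", "birth_year": 1980, "sex": "F"},
]

QUASI = ("zip", "birth_year", "sex")


def key(record):
    """Project a record onto its quasi-identifier columns."""
    return tuple(record[field] for field in QUASI)


def reidentify(released, public):
    """Link released rows to named rows that share the same quasi-identifiers.

    A released row counts as re-identified only when exactly one named
    person in the public data has a matching key.
    """
    counts = Counter(key(p) for p in public)
    names = {key(p): p["name"] for p in public}
    return {
        names[key(r)]: r["complaint"]
        for r in released
        if counts[key(r)] == 1
    }


print(reidentify(requests, roster))
```

Two of the three "anonymous" complaints link back to a single named person. The pothole complaint survives only because two people in the roster happen to share its attributes, which is the intuition behind requiring that every combination of quasi-identifiers match several people before data is released.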

Richard Warner (Prof. Law, Chicago-Kent) and Robert Sloan (Prof. Computer Science): In a presentation on tracking and behavioral advertising, Prof. Warner defined the pro-privacy goal as a desire for more control without giving up the advantages of tracking (relevance, efficiency, security, and personalization). Personally, I think the biggest advantage is the many free services that are subsidized by the value of user data. He asserted that people are generally willing to give up some control in exchange for these advantages (but want a better trade-off), yet too often users simply accept the deal that websites offer. Prof. Sloan compared the situation to the game-theory concept of a one-sided game of chicken, in which the rewards are greater for advertisers and aggregators and their downside is minimal. Prof. Warner then argued that a more nearly perfect technology (by a number of measures) for blocking tracking would create a greater downside for them and lead to a better trade-off. Prof. Warner thinks that “do not track” is close to perfect and that, if implemented correctly, it could give rise to a new set of social norms around privacy. I think, however, that the growing number of internet-enabled devices that do not rely on a browser has already made “do not track” ineffective. As a result, I think Prof. Warner’s estimates of how adaptable “do not track” is, and of how much enforcement it would take to implement, are way off. When I asked about the growth of non-browser tracking during the question-and-answer period, Prof. Warner replied that the new social norms would hopefully spread to these other situations. However, I still think this is a long shot given the slow pace at which social norms develop.

Jon M. Pena (Prof. Engineering and Public Policy / Electrical and Computer Engineering, Carnegie Mellon University): In advocating for a less adversarial relationship between privacy advocates on one side and advertisers and data aggregators on the other, Prof. Pena highlighted the benefits of personal data collection beyond targeted advertising. I think Prof. Pena is correct on this point. My recent experience in a course on natural language processing has made it clear how important good training data is to application development. For example, good data on the language used in real-life email messages (e.g., texting shorthand and leetspeak) has a direct impact on how well features such as spell-checkers function. The ability to opt out of data collection therefore has an obvious impact on applications that rely on accurate data about real-life usage, such as search engines, spell-checkers, and translation software.
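The dependence of a spell-checker on its training data is easy to see in a toy example. The Python sketch below is a deliberately simplified, frequency-based corrector (it only considers words one edit away and ignores context), not how any production spell-checker works; the helper names and both tiny corpora are hypothetical. The same code gives different answers for the texting shorthand “ur” depending only on whether that usage appears in the training text.

```python
import re
from collections import Counter

LETTERS = "abcdefghijklmnopqrstuvwxyz"


def train(text):
    """Count word frequencies in a (hypothetical) training corpus."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def edits1(word):
    """All strings one deletion, transposition, substitution, or insertion away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in LETTERS]
    inserts = [a + c + b for a, b in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)


def correct(word, counts):
    """Keep a known word; otherwise pick the most frequent known word one edit away."""
    if word in counts:
        return word
    candidates = [w for w in edits1(word) if w in counts]
    return max(candidates, key=counts.__getitem__) if candidates else word


formal_text = "Are you coming to our meeting? Our team will review your notes."
formal = train(formal_text)
casual = train(formal_text + " ur late lol, ur so funny")

print(correct("ur", formal))  # "corrects" the shorthand to "our"
print(correct("ur", casual))  # leaves "ur" alone
```

Trained only on formal prose, the checker "fixes" a perfectly intentional “ur”; trained on text that includes real-life shorthand, it leaves it alone. Scale that up and it becomes clear why opt-outs that thin out real usage data matter to these applications.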

Prof. Pena also talked about the extent to which users are in the dark about what data is being collected. I would add that many of the audience questions confirmed Prof. Pena’s assumptions about user perceptions. For example, one attendee seemed concerned about who was reading emails that failed to show up in her inbox. The nature of the conference (i.e., free CLE credit for attorneys) ensured that many people in the audience were not already informed about the issues presented, so it was somewhat interesting to see the audience’s reactions (e.g., shock that Google uses the words in your emails to serve ads). I think the extent of user ignorance further weakens Prof. Warner’s argument (see above) concerning the development of social norms on privacy. How can norms develop when most people are in the dark about what is actually happening with their data? Prof. Pena also presented a “systems view” and some projects at CMU that used network monitoring to detect spyware infections and P2P file sharing and to “gently” warn users away from certain behaviors.

Henry H. Perritt, Jr. (Prof. Law, Chicago-Kent): Prof. Perritt, for whom I worked as a research assistant in law school, presented a series of propositions on which he thinks the debate around online privacy should focus. These points were a mixed bag: I agreed with some and disagreed with others. I think his call for greater enforcement of self-imposed privacy policies would be counterproductive, because enforcement would create a strong disincentive for companies to voluntarily adopt strict or clear policies. I also think his call for consumers to take responsibility for the information they post online (citing the example of a Georgia high school teacher who was fired after posting images of herself drinking on vacation to her Facebook page) rests on several false assumptions. For one, many of the horror stories presented during the conference did not involve a user’s decision to post data online at all.