Princeton PhD candidate develops framework for measuring web privacy

Do we need more or less privacy?

Speaking at the first-ever PrivacyCon in Washington DC, Steven Englehardt, a PhD candidate at Princeton University, unveiled an open-source web measurement platform that he and his colleagues developed to measure the extent of browser tracking on the web.

In a blog post on Freedom to Tinker, Englehardt detailed the development of the platform, OpenWPM, explaining that it regularly collects data from a million websites with the aim of measuring privacy violations such as browser fingerprinting, cookie synchronisation and more.

According to Englehardt, OpenWPM has already been used by at least six other research groups, as well as by journalists, regulators, and students for class projects. He attributes the platform's success to its ability to let researchers track privacy violations over a long period of time, something he says has proved crucial for meaningful changes in policy that bring in new technical and legal solutions to improve privacy for all.

OpenWPM “makes it possible to run large-scale measurements with Firefox, a real consumer browser,” Englehardt explained. “Crawling with a real browser is important for two reasons: one, it's less likely to be detected as a bot, meaning we're less likely to receive different treatment from a normal user, and two, a real browser supports all the modern web features (e.g. WebRTC, HTML5 audio and video), plugins (e.g. Flash), and extensions (e.g. Ghostery, HTTPS Everywhere). Many of these additional features play a large role in the average user's privacy online.”

Englehardt noted that when his team at Princeton uses OpenWPM to carry out the measurements, it collects data in three categories:

  • Network traffic — all HTTP requests and response headers
  • Client-side state — cookies, Flash cookies
  • Execution traces — we trap and record targeted JavaScript API calls that have been known to be used for tracking
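
As a rough illustration of how the first two categories might be combined, the sketch below flags possible cookie synchronisation: a cookie value set by one host that later appears in a request URL sent to a different host. The record layout, the example hosts and the helper names (same_site, find_cookie_syncs) are hypothetical and are not OpenWPM's actual schema or API. The same-site check is deliberately naive.

```python
from urllib.parse import urlparse


def same_site(host, owner):
    """Naive same-site check: equal hosts, or host is a subdomain of owner.

    Assumption: a real analysis would compare registrable domains using a
    public suffix list instead of plain suffix matching.
    """
    owner = owner.lstrip(".")
    return host == owner or host.endswith("." + owner)


def find_cookie_syncs(cookies, request_urls, min_len=8):
    """Return (cookie_owner, receiving_host, value) triples where a cookie
    value set by one host appears in a request URL sent to another host."""
    syncs = []
    for cookie in cookies:
        value = cookie["value"]
        # Short values (language codes, flags) are unlikely to be identifiers.
        if len(value) < min_len:
            continue
        for url in request_urls:
            host = urlparse(url).hostname or ""
            if value in url and not same_site(host, cookie["host"]):
                syncs.append((cookie["host"], host, value))
    return syncs


# Hypothetical records standing in for crawl output.
cookies = [
    {"host": "tracker-a.example", "value": "uid-8f3a2c91d4"},
    {"host": "site.example", "value": "en"},
]
request_urls = [
    "https://sync.tracker-b.example/match?partner_uid=uid-8f3a2c91d4",
    "https://cdn.tracker-a.example/pixel?id=uid-8f3a2c91d4",
    "https://site.example/index.html",
]

print(find_cookie_syncs(cookies, request_urls))
```

Only the request to the second tracker is reported here, because the request to cdn.tracker-a.example stays within the site that set the cookie.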

“In addition to releasing all of the raw data collected during the census, we'll release the results of our own automated analysis,” he said.

To provide additional insight into the privacy threats faced by real users, Englehardt and his team have set up smaller, targeted measurements with privacy extensions such as Ghostery and AdBlock Plus to run alongside the one-million-site measurement carried out with OpenWPM.

Englehardt outlined several ways in which researchers and others with an interest in security can get involved: using the team's measurement data in their own tools; building their own analysis on top of the data collected during the measurements; using OpenWPM to collect and release their own data; and contributing to OpenWPM through pull requests.

In the meantime, Englehardt has invited researchers to check out the project's framework code, which can be found on GitHub and is accompanied by a paper with more technical details.