April 9, 2012
It isn't just privacy policies – what I'd envision is a browser plugin that displays, for each website you visit, a set of symbols summarizing the most important pieces of its legalese: e.g. your data will be shared with xyz users, kept for 6 months and then deleted; 14-day return policy; no 'delete your account' option; etc. However, the question is more complicated than this engineer portrays it.

See, 'personal information' is a legal term whose meaning differs depending on the country you are dealing with. The same goes for a range of other legal terms. This means you first have to identify the source country and only then filter out the legal nuances, and you need to know each country's laws prior to analysis. That's a lot of work.

I've started this as a personal trial project using deterministic heuristics for German sites, because German legalese is more restricted than US legalese: sites are required to display certain information. I haven't gotten far enough to show results yet – one day when I have time and nothing better to do!
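To give a rough idea of what "deterministic heuristics" could look like, here is a minimal Python sketch. It is not the code from my trial project. The rule names, the keyword patterns and the `analyze` function are all hypothetical and made up for illustration, and the keyword list is nowhere near a complete or legally reliable checklist. It only flags whether a few typical German terms appear on a page and pulls out time periods such as "6 Monaten" or "14 Tage".

```python
import re

# Hypothetical keyword rules for German pages (illustrative only,
# not a complete or legally reliable checklist).
RULES = {
    "has_impressum": re.compile(r"\bimpressum\b", re.IGNORECASE),
    "has_privacy_policy": re.compile(r"\bdatenschutz", re.IGNORECASE),
    "mentions_withdrawal": re.compile(r"\bwiderruf", re.IGNORECASE),
}

# Captures a number and a German time unit, e.g. "6 Monaten" or "14 Tage".
PERIOD = re.compile(r"(\d+)\s*(Tage?n?|Wochen?|Monate?n?|Jahre?n?)", re.IGNORECASE)


def analyze(text: str) -> dict:
    """Return which keyword rules matched, plus any time periods found."""
    findings = {name: bool(rx.search(text)) for name, rx in RULES.items()}
    findings["periods"] = [(int(n), unit) for n, unit in PERIOD.findall(text)]
    return findings


if __name__ == "__main__":
    sample = (
        "Impressum | Datenschutzerklärung. "
        "Ihre Daten werden nach 6 Monaten gelöscht. "
        "Widerrufsrecht: 14 Tage."
    )
    print(analyze(sample))
```

A plugin could then map each finding to a symbol. The hard part remains deciding what a match actually means legally, which pure keyword matching cannot do.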

This engineer scraped 5,000,000 privacy policies.

I've Read Every Privacy Policy on the Internet – This is What I've Learned

We’ve been working on a project to analyze and classify every privacy policy on the Internet. Yeah, that’s from the 5,000,000,000 websites included in AWS’s Common Crawl Corpus. We are classifying…

