See, 'personal information' is a legal term whose meaning depends on which country you are dealing with. The same goes for a whole range of legal terms. This means you have to identify the source country first and then filter out the legal nuances, so you need to know each country's laws before you start the analysis. That's a lot of work.
I've started this as a personal trial project using deterministic heuristics for German sites, because German legalese is more constrained than US legalese: sites are required to display certain information. I haven't gotten far enough to show results yet. One day, when I have time and nothing better to do!
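A minimal sketch of what such a deterministic heuristic might look like, assuming plain keyword matching on the page text. The marker list (`GERMAN_MARKERS`), the labels, and the function name `find_markers` are illustrative assumptions, not the project's actual rules and not a legally vetted set of terms.

```python
import re

# Hypothetical keyword patterns for German sites (illustrative assumption only).
# Each label maps to regexes that hint at a section or legal term being present.
GERMAN_MARKERS = {
    "legal_notice": [r"\bimpressum\b"],
    "privacy_policy": [r"\bdatenschutzerkl(?:ä|ae)rung\b", r"\bdatenschutz\b"],
    "personal_data": [r"\bpersonenbezogene[nrsm]?\s+daten\b"],
}


def find_markers(text, markers=GERMAN_MARKERS):
    """Return {label: bool} saying which marker groups occur in the text."""
    lowered = text.lower()
    return {
        label: any(re.search(pattern, lowered) for pattern in patterns)
        for label, patterns in markers.items()
    }


if __name__ == "__main__":
    sample = "Impressum | Datenschutzerklärung: Wir verarbeiten personenbezogene Daten."
    print(find_markers(sample))
```

A per-country table like this keeps the rules deterministic and easy to audit, but it only works once the source country is known, and every other jurisdiction would need its own marker set.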
Reshared post from +Carl Levinson
This engineer scraped 5,000,000 privacy policies.