The PrivaSeer corpus is a collection of 1,005,380 privacy policies, described in the following paper:
Mukund Srinath, Shomir Wilson and C. Lee Giles. Privacy at Scale: Introducing the PrivaSeer Corpus of Web Privacy Policies. In Proc. ACL 2021.
For technical questions about this data, please contact Mukund Srinath (mukund@psu.edu). For licensing questions, please contact Prof. Shomir Wilson (shomir@psu.edu).
For research, teaching, and scholarship purposes, the corpus is available under a CC BY-NC-SA license. Please contact us for any requests regarding commercial use.
Link to the corpus: https://git.psu.edu/hlt-lab/PrivaSeer-Corpus
Link to the privacy policy language model (PrivBERT): https://huggingface.co/mukund/privbert