PrivaSeer - Data & Code

Data and Code

PrivaSeer Corpus and Language Model (ACL, 2021)

The PrivaSeer corpus is a collection of 1,005,380 privacy policies, described in the following paper:

Mukund Srinath, Shomir Wilson and C. Lee Giles. Privacy at Scale: Introducing the PrivaSeer Corpus of Web Privacy Policies. In Proc. ACL 2021.

For technical questions about this data, please contact Mukund Srinath (mukund@psu.edu). For licensing questions, please contact Prof. Shomir Wilson (shomir@psu.edu).

For research, teaching, and scholarship purposes, the corpus is available under a CC BY-NC-SA license. Please contact us for any requests regarding commercial use.

Link to the corpus: https://git.psu.edu/hlt-lab/PrivaSeer-Corpus

Link to the privacy policy language model (PrivBERT): https://huggingface.co/mukund/privbert
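
As a rough illustration of how the model might be used, the sketch below loads PrivBERT with the Hugging Face transformers library and computes a mean-pooled embedding for a piece of policy text. The model ID is taken from the URL above. Everything else is an assumption for illustration, not a documented recipe from the authors: that the checkpoint loads through the generic AutoTokenizer and AutoModel classes, that torch and transformers are installed, the 512-token truncation limit, the choice of mean pooling, and the helper name embed_policy_text.

```python
MODEL_ID = "mukund/privbert"  # from the Hugging Face URL listed above


def embed_policy_text(text, model_id=MODEL_ID):
    """Return a mean-pooled embedding vector for `text` (hypothetical helper)."""
    # Imported lazily so that defining this function does not require
    # the dependencies to be installed or the model to be downloaded.
    import torch
    from transformers import AutoModel, AutoTokenizer

    # Assumption: the checkpoint is compatible with the generic Auto classes.
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModel.from_pretrained(model_id)
    model.eval()

    # Assumption: truncate long policies to 512 tokens.
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        outputs = model(**inputs)

    # Average the token vectors, ignoring padding positions.
    hidden = outputs.last_hidden_state                      # (1, seq_len, hidden)
    mask = inputs["attention_mask"].unsqueeze(-1).to(hidden.dtype)
    summed = (hidden * mask).sum(dim=1)
    return (summed / mask.sum(dim=1)).squeeze(0)            # (hidden,)
```

Calling embed_policy_text("We collect your email address to send you updates.") downloads the model on first use and returns a one-dimensional tensor. Mean pooling is only one simple way to get a fixed-size vector. For classification or other downstream tasks, fine-tuning the model is likely a better fit.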