Harvard releases massive public-domain book dataset for AI training, funded by tech giants.

Harvard University, with funding from Microsoft and OpenAI, has released a dataset of nearly one million public-domain books for training AI models. The project, run by Harvard's Institutional Data Initiative, aims to give smaller developers access to the kind of high-quality data that is typically available only to tech giants, leveling the playing field in AI development. The dataset includes books scanned through the Google Books project, and anyone can use it to train AI, from hobbyists to corporations.
