What You Will Find Here
Datasets of linguistic interest are provided in delimited text and spreadsheet formats. One goal is to bring these disparate datasets together in a common RDF representation.
- Parallel Corpus
- Parts of Speech
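
As a rough illustration of the RDF goal mentioned above, the sketch below turns rows of a delimited text file into N-Triples lines using only the Python standard library. It is not the repository's actual conversion process: the `http://example.org/` namespace, the `row/` and `column/` IRI patterns, the column names `word` and `pos`, and the function name `rows_to_ntriples` are all assumptions made for the example.

```python
import csv
import io
from urllib.parse import quote

# Hypothetical namespace; the repository does not define one.
BASE = "http://example.org/"


def escape_literal(value):
    """Escape the characters that N-Triples requires escaping in literals."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def rows_to_ntriples(text, delimiter="\t"):
    """Emit one triple per non-empty cell: (row IRI, column IRI, cell literal)."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    lines = []
    for index, row in enumerate(reader, start=1):
        subject = f"<{BASE}row/{index}>"
        for column, value in row.items():
            # Skip overflow fields (key None) and empty or missing cells.
            if column is None or not value:
                continue
            predicate = f"<{BASE}column/{quote(column, safe='')}>"
            lines.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return lines


# Made-up sample data, not taken from the datasets in this repository.
sample = "word\tpos\nhouse\tnoun\nrun\tverb\n"
triples = rows_to_ntriples(sample)
for triple in triples:
    print(triple)
```

A real conversion would likely map columns to an established vocabulary rather than to generated `column/` IRIs; the one-triple-per-cell layout is only the simplest starting point.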
Preliminary. Document artifacts are being collected, rediscovered, reviewed, and sorted from various old hard drives and backup media. Following this collection phase, the repository will most likely be split in two: one repository for datasets and one for e-books. Refinement and documentation of the assets will be ongoing.
Issues with the artifacts abound; specific defects are tracked in the archive's Issue Tracker. Please feel free to add to it.