Datasets
- load(dataset, *[, base, fetch, force, …]) : Load a dataset.
- load_ocr(**kwargs) : Convenience function for loading the OCR dataset.
- load_conll00(**kwargs) : Convenience function for loading the CoNLL00 dataset.
Pyconstruct provides methods for loading a number of datasets for standard tasks in structured-output prediction. The current list of available datasets includes:
- ocr : Ben Taskar’s OCR dataset
- conll00 : CoNLL 2000 Text Chunking dataset
- horseseg : HorseSeg dataset (coming soon)
- equations : OCR equations dataset
Datasets can be loaded using the load
function provided by this module. In
most cases, the dataset is downloaded the first time it is loaded and stored in
a local directory on your computer, so that later loads read the local copy
instead of downloading again (the actual directory depends on the operating
system). The data is preprocessed and made available in a format that can be
used directly for learning with any algorithm provided by Pyconstruct.
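
The download-once-then-reuse behaviour described above can be illustrated with a minimal sketch that uses only the Python standard library. This is not Pyconstruct's implementation: the helper `load_cached`, the `fake_fetch` stand-in for the download step, the pickle file format and the temporary cache directory are all assumptions made for illustration.

```python
import pickle
import tempfile
from pathlib import Path


def load_cached(name, fetch, base):
    """Return dataset `name`, calling `fetch` only if no cached copy exists.

    `fetch` is a hypothetical zero-argument callable standing in for the
    download-and-preprocess step; `base` is the cache directory.
    """
    base = Path(base)
    base.mkdir(parents=True, exist_ok=True)
    path = base / f"{name}.pkl"
    if path.exists():
        # Second load onwards: read the local copy and skip the download.
        with path.open("rb") as f:
            return pickle.load(f)
    # First load: fetch the data, then store it for later loads.
    data = fetch()
    with path.open("wb") as f:
        pickle.dump(data, f)
    return data


calls = []


def fake_fetch():
    # Stand-in for a real download; records each invocation.
    calls.append(1)
    return {"data": [[0, 1], [1, 0]], "target": ["a", "b"]}


cache_dir = tempfile.mkdtemp()  # assumed cache location for this sketch
first = load_cached("toy", fake_fetch, cache_dir)
second = load_cached("toy", fake_fetch, cache_dir)
```

Here `fake_fetch` runs only on the first call; the second call reads `toy.pkl` from the cache directory. With Pyconstruct itself, the equivalent step would be a call to load with one of the dataset names listed above, such as 'ocr'.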