Datasets

load(dataset, *[, base, fetch, force, …])    Load a dataset.
load_ocr(**kwargs)                           Convenience function for loading the OCR dataset.
load_conll00(**kwargs)                       Convenience function for loading the CoNLL00 dataset.

Pyconstruct provides methods for loading a number of datasets for standard tasks in structured-output prediction. The current list of available datasets includes:

  • ocr : Ben Taskar’s OCR dataset
  • conll00 : CoNLL 2000 Text Chunking dataset
  • horseseg : HorseSeg dataset (coming soon)
  • equations : OCR equations dataset

Datasets can be loaded using the load function provided by this module. In most cases, the dataset is downloaded the first time it is loaded and stored in a local directory on your computer, so that later loads are faster (the actual directory depends on the operating system). The data is preprocessed and made available in a format that can be used directly for learning with any algorithm provided by Pyconstruct.
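The download-once-then-reuse behaviour described above can be illustrated with a small standalone sketch. This is not Pyconstruct's implementation: the helpers default_cache_dir and load_cached, the parameters cache_dir and force, the directory layout, and the stand-in fake_fetch function are all hypothetical names chosen for illustration, and only the Python standard library is used.

```python
import os
import sys
import tempfile
from pathlib import Path


def default_cache_dir(app_name="example-datasets"):
    """Pick a per-user cache directory based on the operating system.

    Hypothetical layout for illustration; Pyconstruct may use another one.
    """
    home = Path.home()
    if sys.platform.startswith("win"):
        base = Path(os.environ.get("LOCALAPPDATA", home / "AppData" / "Local"))
    elif sys.platform == "darwin":
        base = home / "Library" / "Caches"
    else:
        base = Path(os.environ.get("XDG_CACHE_HOME", home / ".cache"))
    return base / app_name


def load_cached(name, fetch, cache_dir=None, force=False):
    """Return the raw bytes of dataset `name`.

    `fetch(name)` is called only if no cached copy exists, or if `force`
    is true; the result is then written to the cache directory.
    """
    cache_dir = Path(cache_dir) if cache_dir is not None else default_cache_dir()
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / (name + ".bin")
    if force or not path.exists():
        path.write_bytes(fetch(name))
    return path.read_bytes()


# Stand-in for a real download, so the sketch runs offline.
calls = []


def fake_fetch(name):
    calls.append(name)
    return b"payload for " + name.encode()


with tempfile.TemporaryDirectory() as tmp:
    first = load_cached("ocr", fake_fetch, cache_dir=tmp)   # cache miss: fetches
    second = load_cached("ocr", fake_fetch, cache_dir=tmp)  # cache hit: reads file

print(len(calls))  # number of times the stand-in download ran
```

The second call returns the same bytes without invoking the fetch function again, which is the effect the paragraph above describes for loading a dataset a second time.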