

    Datasets Machine Learning Artificial Intelligence















Datasets
Repositories of data used to test/validate machine learning algorithms.


  • - AdEater is a program that learns to remove Internet advertisements. The machine learning dataset is available from this page.
  • - The 20 Newsgroups dataset, widely used for text categorization.
  • - Contains 353 English word pairs along with human-assigned similarity judgements.
  • - A collection of over 500 time series, maintained by Rob Hyndman. Time series are organized by subject.
  • - The first free collection of patents, in XML format, containing over 75,000 manually-classified patent documents in English from 1998-2002.
  • - A dataset of face images for face recognition algorithms.
  • - A collection of data sets, each represented in first-order logic. Maintained at the University of Dortmund, Germany.
  • - A classic benchmark for text categorization algorithms.
  • - Provides access to a wide variety of astrophysics, space physics, solar physics, lunar and planetary data from NASA space flight missions, in addition to selected other data and some models and software.
  • - Provides a large number of diverse test collections for use in text categorization research.
  • - Text datasets used in information retrieval and learning in text domains.
  • - Data for Evaluating Learning in Valid Experiments (Delve): A standardized environment designed to evaluate the performance of methods that learn relationships based primarily on empirical data. Delve makes it possible for users to compare their learning methods with
  • - Archive of experimentally-determined, biological macromolecule 3-D structures from the Brookhaven National Laboratory.
  • - This NIST database of fingerprint images contains 2000 8-bit grayscale fingerprint image pairs.
  • - Repository of online information sources: test domains for information extraction and wrapper generation tools that learn extraction rules (extraction patterns).
  • - A repository of databases, domain theories and data generators that are used by the machine learning community for the empirical analysis of machine learning algorithms.
  • - A collection of public gene expression data sources maintained by A. Brazma.
  • - A corpus of parsed sentences. Used by many researchers for training data-driven parsing algorithms.
  • - Web pages partitioned into classes, with hyperlink data. The dataset has been used for text categorization and learning to extract symbolic knowledge from the World Wide Web.
  • - A freely downloadable, pre-classified dataset of HTML web pages sorted into 10 categories.
  • - A repository of datasets used in statistics and machine learning.
  • - Several hundred thousand economic time series, produced by the U.S. Government and distributed by the government in a variety of formats and media, have been put into a standard, highly efficient, easy-to-use form for personal computers.
  • - Datasets used for the experimental analysis of function approximation techniques and for training and demonstration by the machine learning and statistics communities.
  • - HS3D (Homo Sapiens Splice Sites Dataset) is a database of Homo sapiens exon, intron and splice regions extracted from GenBank primate sequences, Rel. 123. The aim of this data set is to give standardized material to train and to assess the prediction accu
  • - Datgen, formerly SCDS, is a computer program that generates data to systematically test programs that consume data. These synthetic datasets can be used to validate learning algorithms.



Copyright © 1995-2008 Internet Advertising Solutions, Inc.








