
Datasets for Enterprise Use Cases

1. Customer Event Histories (Transactions/Purchases)

Instacart - Market Basket Analysis

The dataset is anonymized and contains a sample of over 3 million grocery orders from more than 200,000 Instacart users.

Dunnhumby - Retail Transactions (The Complete Journey dataset)

Household-level transactions over two years from a group of 2,500 households who are frequent shoppers at a retailer. Includes all of a household’s purchases within the store, not just those from a limited number of categories. Demographics and direct marketing contact history are included for select households.

Give Me Some Credit

Historical data are provided on 250,000 borrowers.

KKBox - Churn Prediction Challenge

KKBOX offers a subscription-based music streaming service. The dataset includes user transaction and behavior features.

Olist - Brazilian E-Commerce Public Dataset

This is a public Brazilian e-commerce dataset of orders made at Olist Store. It has information on 100k orders placed from 2016 to 2018 at multiple marketplaces in Brazil.

UCI Online Retail

This is a transnational dataset containing all transactions occurring between 01/12/2010 and 09/12/2011 (DD/MM/YYYY) for a UK-based and registered non-store online retailer.
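
The date range above is written day-first (1 December 2010 to 9 December 2011), which is easy to misread as month-first. A minimal Python sketch of parsing it unambiguously follows; the helper name `parse_uk_date` is hypothetical and not part of the dataset or any library.

```python
from datetime import date, datetime


def parse_uk_date(text: str) -> date:
    """Parse a day-first DD/MM/YYYY string into a date."""
    return datetime.strptime(text, "%d/%m/%Y").date()


# The two endpoints quoted in the dataset description.
start = parse_uk_date("01/12/2010")
end = parse_uk_date("09/12/2011")

# Number of days between the first and last transaction dates.
span_days = (end - start).days
print(start.isoformat(), end.isoformat(), span_days)
```

Parsing with an explicit format string avoids relying on locale defaults, which may otherwise swap day and month.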

2. Product Recommendations

Yoochoose - RecSys Challenge 2015

The data represents six months of activity at a large e-commerce business in Europe selling a wide range of products, such as garden tools, toys, clothes, electronics and much more.

MovieLens - 100K Dataset

MovieLens 100K movie ratings. Stable benchmark dataset. 100,000 ratings from 1000 users on 1700 movies. Released 4/1998.

MovieLens - 1M Dataset

MovieLens 1M movie ratings. Stable benchmark dataset. 1 million ratings from 6000 users on 4000 movies. Released 2/2003.

MovieLens - 25M Dataset

MovieLens 25M movie ratings. Stable benchmark dataset. 25 million ratings and one million tag applications applied to 62,000 movies by 162,000 users. Includes tag genome data with 15 million relevance scores across 1,129 tags. Released 12/2019.
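
To compare the three MovieLens releases, it can help to look at how densely the user-movie rating matrix is filled. The sketch below uses only the rounded counts quoted in the descriptions above, so the results are approximate; the `density` helper is a hypothetical name introduced for illustration.

```python
# Rounded counts quoted in the descriptions above: (ratings, users, movies).
movielens = {
    "100K": (100_000, 1_000, 1_700),
    "1M": (1_000_000, 6_000, 4_000),
    "25M": (25_000_000, 162_000, 62_000),
}


def density(ratings: int, users: int, movies: int) -> float:
    """Fraction of the user x movie matrix that holds a rating."""
    return ratings / (users * movies)


for name, counts in movielens.items():
    print(f"{name}: {density(*counts):.2%} of user-movie pairs are rated")
```

With these rounded figures the densities come out to roughly 5.9%, 4.2%, and 0.25%, so the largest release is also by far the sparsest.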

Elo - Merchant Category Recommendation

This dataset was created by Elo, one of the largest payment brands in Brazil. The dataset contains up to 3 months' worth of transactions for every card.

Amazon - Review Data

This dataset includes reviews (ratings, text, helpfulness votes), product metadata (descriptions, category information, price, brand, and image features), and links (also viewed/also bought graphs).

3. Click Stream and Advertising

Criteo - Attribution Modeling for Bidding

This dataset represents a sample of 30 days of Criteo live traffic data. Each line corresponds to one impression (a banner) that was displayed to a user. For each banner we have detailed information about the context, if it was clicked, if it led to a conversion and if it led to a conversion that was attributed to Criteo or not. Data has been sub-sampled and anonymized so as not to disclose proprietary elements.
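
Because each line is one impression carrying click, conversion, and attribution flags, basic funnel counts can be computed in a single pass. The sketch below runs on a few synthetic rows made up for illustration; the column names (`uid`, `click`, `conversion`, `attributed`) are assumptions and may not match the actual file header.

```python
import csv
import io

# Synthetic rows for illustration only; the column names are hypothetical.
SAMPLE = """uid,click,conversion,attributed
u1,1,1,1
u2,0,0,0
u3,1,1,0
u4,1,0,0
"""


def summarize(lines):
    """Count impressions, clicks, conversions, and attributed conversions."""
    totals = {"impressions": 0, "clicks": 0, "conversions": 0, "attributed": 0}
    for row in csv.DictReader(lines):
        totals["impressions"] += 1  # one line is one displayed banner
        totals["clicks"] += int(row["click"])
        totals["conversions"] += int(row["conversion"])
        totals["attributed"] += int(row["attributed"])
    return totals


totals = summarize(io.StringIO(SAMPLE))
ctr = totals["clicks"] / totals["impressions"]  # click-through rate
print(totals, f"CTR={ctr:.2f}")
```

For the real file, the same function could be passed an open file handle instead of the in-memory sample, after adjusting the column names and delimiter to the actual header.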

4. Anomaly Detection

Turbofan Engine Degradation Simulation Data Set

Engine degradation simulation was carried out using C-MAPSS. Four different sets were simulated under different combinations of operational conditions and fault modes. The data record several sensor channels to characterize fault evolution. The data set was provided by the Prognostics CoE at NASA Ames.

5. Computer Vision

Clothing Dataset

Over 5,000 clothing images of 20 different classes.

Clothing Co-Parsing (CCP)

High-resolution street fashion photos with 59 tags in total. More than 1,000 images have pixel-level annotations.

MVTec Anomaly Detection Dataset (MVTec AD)

MVTec AD is a dataset for benchmarking anomaly detection methods with a focus on industrial inspection. It contains over 5000 high-resolution images divided into fifteen different object and texture categories.

6. Sales Data

Tableau User Group Superstore Sales Data