Awesome Datasets for Computer Vision

A list of datasets dedicated to face recognition and detection, OCR, object detection, GANs, SLAM, motion tracking and pose estimation, ReID, etc. Suggestions and pull requests are welcome.

repository

Computer Vision

picture

general

Traffic

Surgery

  • [lesions] A large collection of multi-source dermatoscopic images of pigmented lesions. Task: classification

cloth

  • [DeepFashion2] DeepFashion2 is a comprehensive fashion dataset. It contains 491K diverse images of 13 popular clothing categories from both commercial shopping stores and consumers. In total it has 801K clothing items, where each item in an image is labeled with scale, occlusion, zoom-in, viewpoint, category, style, bounding box, dense landmarks and per-pixel mask. There are also 873K commercial-consumer clothes pairs. Task: cloth classification & detection

people

Face Recognition

Face Detection

Face Landmark

  • LS3D-W: A large-scale 3D face alignment dataset constructed by annotating the images from AFLW, 300VW, 300W and FDDB in a consistent manner with 68 points using an automatic method [paper] [dataset]
  • AFLW: Annotated Facial Landmarks in the Wild: A Large-scale, Real-world Database for Facial Landmark Localization (25k faces with 21 landmarks) [paper] [benchmark]

Face Attribute

  • CelebA: Deep Learning Face Attributes in the Wild (10k people in 202k images with 5 landmarks and 40 binary attributes per image) [paper] [dataset]

text

  • ICDAR 2015: 1,000 training images and 500 testing images
  • ICDAR 2017: Competition on Multi-lingual Scene Text Detection and Script Identification
  • MLT 2017: 7,200 training and 1,800 validation images
  • COCO-Text (Computer Vision Group, Cornell): 63,686 images, 173,589 text instances, and 3 fine-grained text attributes
  • Synthetic Word Dataset (Oxford, VGG): 9 million images covering 90k English words
  • IIIT 5K-Word: 5,000 images from scene text and born-digital images (2k training and 3k testing). Each image is a cropped word image of scene text with a case-insensitive label
  • StanfordSynth: Small single-character images of 62 characters (0-9, a-z, A-Z)
  • MSRA-TD500
  • Street View Text (SVT): 100 images for training and 250 images for testing
  • KAIST Scene_Text: 3,000 images of indoor and outdoor scenes containing text
  • Chars74k: Small single-character images of 62 characters (0-9, a-z, A-Z). Over 74K images from natural images, as well as a set of synthetically generated characters

video

general

  • [LaSOT] A high-quality benchmark for large-scale single object tracking. Task: object tracking
  • [Moments in Time] One million videos for event understanding. Task: video understanding
  • [UCF101] An action recognition dataset of realistic action videos collected from YouTube, with 101 action categories. It extends the UCF50 dataset, which has 50 action categories. Task: action recognition
  • [DAVIS] DAVIS Challenge on Video Object Segmentation. Task: video object segmentation

sports

  • [Sports1M] Contains 1,133,158 video URLs annotated automatically with 487 sports labels using the YouTube Topics API. Task: video classification
  • [Kinetics] Kinetics consists of approximately 650,000 video clips covering 700 human action classes, with at least 600 clips per class. Each clip lasts around 10 seconds and is labeled with a single class. Task: video understanding

car

  • [CityFlow] A city-scale benchmark for multi-target multi-camera vehicle tracking and re-identification. Task: vehicle ReID
  • [Argoverse] 3D tracking and forecasting with rich maps. Task: object tracking

people

  • [CrowdPose] CrowdPose: Efficient Crowded Scenes Pose Estimation and a New Benchmark. Task: pose estimation
  • [JHMDB] J-HMDB is more than a dataset of human actions; it can also serve as a benchmark for pose estimation and human detection. Task: motion understanding (http://jhmdb.is.tue.mpg.de/dataset)
  • [Kinetics] Kinetics consists of approximately 650,000 video clips covering 700 human action classes, with at least 600 clips per class. Each clip lasts around 10 seconds and is labeled with a single class. Task: video understanding

Recommend