Thanks for submitting an application and welcome to Slingshot!
5
-
6
-
Our team will validate your entry shortly. To help us process this quickly, please ensure all questions are complete, you have a [Filecoin Slack](https://filecoin.io/slack/) account, and you have completed the [registration form](https://slingshot.filecoin.io/register-now/) on our Slingshot site.
7
-
8
-
firstPRMergeComment: |
9
-
:zap: Congratulations. Your application has been approved and merged as an official participant for the Slingshot competition! :zap:
10
-
11
-
It is your responsibility to follow the [rules](https://slingshot.filecoin.io/rules) and spirit of the competition to remain eligible for rewards.
12
-
13
-
Here are a few important resources to help you get started:
datasets.md (+34 -2)
@@ -3,7 +3,7 @@ Slingshot’s aim for using curated datasets is to ensure meaningful data is sto
3
3
4
4
There are a wide variety of public data sets that can be leveraged for this challenge - a sampling is shown in the table below.
5
5
6
-
If you would like to use a dataset that you don't see listed here, please submit a PR to add the dataset to this table. If you are using your own data that you are willing to make public but does not have a source URL, then feel free to write 'N/A' in the URL column.
6
+
If you would like to use a dataset that you don't see listed here, please submit an issue to add the dataset to this table. If you are using your own data that you are willing to make public but does not have a source URL, then feel free to write 'N/A' in the URL column.
7
7
8
8
9
9
@@ -64,4 +64,36 @@ If you would like to use a dataset that you don't see listed here, please submit
64
64
| TAO | TAO is a federated dataset for Tracking Any Object, containing 2,907 high resolution videos, captured in diverse environments, which are half a minute long on average. | 225G | video |http://taodataset.org/|
65
65
| OTW | The Out the Window (OTW) dataset is a crowdsourced activity dataset containing 5,668 instances of 17 activities from the NIST Activities in Extended Video (ActEV) challenge. | 48G | video |https://stresearch.github.io/otw/|
66
66
| Waymo | The Waymo Open Dataset is comprised of high resolution sensor data collected by Waymo self-driving cars in a wide variety of conditions. We are releasing this dataset publicly to aid the research community in making advancements in machine perception and self-driving technology. | 1.2T | point cloud, image |https://waymo.com/open/|
67
-
| IMDB-WIKI | IMDB-WIKI – 500k+ face images with age and gender labels | 276G | image |https://data.vision.ee.ethz.ch/cvl/rrothe/imdb-wiki/|
67
+
| IMDB-WIKI | IMDB-WIKI – 500k+ face images with age and gender labels | 276G | image |https://data.vision.ee.ethz.ch/cvl/rrothe/imdb-wiki/|
68
+
|Genomic Data Commons |Genomic, epigenomic, transcriptomic, and proteomic data from the National Genome Atlas Program |2.5 PB|JSON |https://portal.gdc.cancer.gov|
69
+
|OpenStreetMap |A collaborative project to create a free editable map of the world |40 GB|JSON |https://console.cloud.google.com/marketplace/product/openstreetmap/geo-openstreetmap?filter=solution-type%3Adataset&filter=category%3Atransportation&id=88e087d0-5f92-4407-8dcc-5577bd06d776|
70
+
|Wikipedia |A multilingual open-collaborative online encyclopedia created and maintained by a community of volunteer editors using a wiki-based editing system |18.9 GB|JSON |https://portal.gdc.cancer.gov|
71
+
|openFDA |Open datasets from the US Food and Drug Administration |N/A |JSON |https://open.fda.gov/data/downloads/|
72
+
|Amateur radio |Amateur Radio Software |60.0 GB TB|JSON |https://bigquery.cloud.google.com/table/dataproc-fun:wsprnet.all_wsprnet_data?pli=1&tab=details|
73
+
|Reddit |Collection of Reddit posts and comments |546 GB|JSON |https://console.cloud.google.com/bigquery?utm_source=bqui&utm_medium=link&utm_campaign=classic|
74
+
|Dota 2 |Open data around the Dota Game platform |500 GB|JSON |https://www.opendota.com|
75
+
|AVSpeech: Large-scale Audio-Visual Speech Dataset |Large-scale audio-visual dataset comprising speech video clips with no interfering background noise |1.50 TB GB|N/A |https://academictorrents.com/details/b078815ca447a3e4d17e8a2a34f13183ec5dec41|
76
+
|Google Open Images |9 million URLs to images that have been annotated with labels spanning over 6000 categories |456 GB|image |https://academictorrents.com/details/9e9194e21ce045deee8d811481b4cd676b20b06b|
|Functional Map of the World |Satellite images of the world |352 GB|image |https://academictorrents.com/details/9e9194e21ce045deee8d811481b4cd676b20b06b|
79
+
| NEAR-VI-Dataset | The NetEase AR Oriented Visual Inertial Dataset | 175G | gif |https://github.com/EZXR-Research/NEAR-VI-Dataset|
80
+
| Netease Cloud Music | Online music service offering playlists, social networking, brand recommendations, and music fingerprinting | - | Audio |https://music.163.com|
81
+
| Movie Heaven | Movie Heaven is a large online movie broadcasting platform in China | - | Video |https://www.dytt8.net|
82
+
| COCO | COCO is a large-scale object detection, segmentation, and captioning dataset. | - | ZIP |https://cocodataset.org|
83
+
| Google Cloud Public Datasets | Uncover new insights with high-demand public datasets | Varies | N/A |https://cloud.google.com/public-datasets|
84
+
| SmartCity Dataset | Noise data collected by fiber optic sensing equipment for research | 200T | WAV |http://api.sr2.glm2m.com/index.php?r=site/index|
85
+
| BigEarthNet | The BigEarthNet archive was constructed by the Remote Sensing Image Analysis (RSiM) Group and the Database Systems and Information Management (DIMA) Group at the Technische Universität Berlin (TU Berlin). | 66G | image |http://bigearth.net/#downloads|
86
+
| Mapillary | Train recognition models for street scenes | 150G | image |https://www.mapillary.com/datasets|
87
+
| Bair Robot Pushing | This data set contains roughly 44,000 examples of robot pushing motions, including one training set (train) and two test sets of previously seen (testseen) and unseen (testnovel) objects. This is the small 64x64 version. | 30G | video |https://www.tensorflow.org/datasets/catalog/bair_robot_pushing_small|
| Argoverse | Argoverse is provided free of charge under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public license. Argoverse code and APIs are provided under the MIT license. | 260G | fusion, point cloud, image |https://www.argoverse.org/data.html#download-link|
90
+
| Ford Autonomous Vehicle Dataset | We present a challenging multi-agent seasonal dataset collected by a fleet of Ford autonomous vehicles at different days and times during 2017-18. The vehicles were manually driven on a route in Michigan that included a mix of driving scenarios including the Detroit Airport, freeways, city-centers, university campus and suburban neighborhood. | 1.6T | image |https://avdata.ford.com/|
91
+
| PandaSet | High-quality open-source dataset for autonomous driving | 42G | fusion, point cloud, image |https://pandaset.org/|
| Celeb-DF | Celeb-DF (v2): A New Dataset for DeepFake Forensics | 9G | video |http://www.cs.albany.edu/~lsw/celeb-deepfakeforensics.html|
94
+
| LISA Traffic Sign | The LISA Traffic Sign Dataset is a set of videos and annotated frames containing US traffic signs. | 8G | image |http://cvrr.ucsd.edu/LISA/lisa-traffic-sign-dataset.html|
95
+
| InStereo2K | A large real dataset for stereo matching in indoor scenes | 8G | image |https://github.com/YuhuaXu/StereoDataset|
96
+
| CURE-TSD | Challenging Unreal and Real Environments for Traffic Sign Detection | 240G | video |https://github.com/olivesgatech/CURE-TSD|
97
+
| NightOwls | Pedestrians at night | 294G | image |https://www.nightowls-dataset.org/|
98
+
| Synscapes | How many cars are visible in a given image? Is the sky clear or cloudy? Synscapes provides a wide range of metadata which helps characterize each image. | 300G | image |https://synscapes.on.liu.se/index.html|
99
+
| Comma.ai driving | 7 and a quarter hours of largely highway driving. | 40G | video |https://github.com/commaai/research|