This repository has been archived by the owner on Jul 18, 2024. It is now read-only.

[DataCap Application] Kernelogic - World Bank - Light Every Night #840

Closed
kernelogic opened this issue Aug 25, 2022 · 31 comments

@kernelogic

kernelogic commented Aug 25, 2022

Large Dataset Notary Application

To apply for DataCap to onboard your dataset to Filecoin, please fill out the following.

Core Information

  • Organization Name: Fei Yan - Kernelogic
  • Website / Social Media: https://singularity-browser.kernelogic.ca/ Slack: Fei Yan
  • Total amount of DataCap being requested (between 500 TiB and 5 PiB): 5 PiB
  • Weekly allocation of DataCap requested (usually between 1-100TiB): 750 TiB
  • On-chain address for first allocation: f1mbgw2yiypfay4zeuw7pmvytyoie457ogrolwbva

Please respond to the questions below by replacing the text saying "Please answer here". Include as much detail as you can in your answer.

Project details

Share a brief history of your project and organization.

I have participated in every Slingshot phase and am probably the best-performing "small individual client".

Even though Slingshot v2 has ended, there is still strong demand from SPs to onboard useful data. This application is to onboard an open dataset from AWS.

I will provide a web UI that indexes all onboarded files and offers ways to retrieve them.

I have successfully completed several LDNs on other datasets, and my record shows that I have followed the decentralization rules and have done no self-dealing.

https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/60
https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/59
https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/46
https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/297
https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/298
https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/304

What is the primary source of funding for this project?

Self-funded, BigD exchange.

What other projects/ecosystem stakeholders is this project associated with?

enterprise-sp-wg, BigD exchange.

Use-case details

Describe the data being stored onto Filecoin

Light Every Night - World Bank Nighttime Light Data provides open access to all nightly imagery and data from the Visible Infrared Imaging Radiometer Suite Day-Night Band (VIIRS DNB) from 2012-2020 and the Defense Meteorological Satellite Program Operational Linescan System (DMSP-OLS) from 1992-2013.

Where was the data in this dataset sourced from?

AWS Open dataset

Can you share a sample of the data? A link to a file, an image, a table, etc., are good ways to do this.

https://registry.opendata.aws/wb-light-every-night/

Confirm that this is a public dataset that can be retrieved by anyone on the Network (i.e., no specific permissions or access rights are required to view the data).

AWS Open dataset

What is the expected retrieval frequency for this data?

Multiple times per year.

For how long do you plan to keep this dataset stored on Filecoin?

18 months.

DataCap allocation plan

In which geographies (countries, regions) do you plan on making storage deals?

All regions.

How will you be distributing your data to storage providers? Is there an offline data transfer process?

I will upload my prepared CAR files to a web server and coordinate with providers to download and propose offline deals.

Maximum 3 copies per SP entity and maximum of 10 copies for every pieceCID.
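
The two replica limits above can be checked mechanically before deals are proposed. The following is a minimal sketch, not the applicant's tooling: it assumes a hypothetical deal plan given as (piece CID, provider ID) pairs and a hypothetical provider-to-entity mapping, and it reads "3 copies per SP entity" as at most 3 copies of the same piece within one entity.

```python
from collections import Counter

MAX_COPIES_PER_ENTITY = 3   # limit stated in the application
MAX_COPIES_PER_PIECE = 10   # limit stated in the application

def check_replica_limits(deals, entity_of):
    """Return human-readable violations for a planned set of deals.

    deals: iterable of (piece_cid, provider_id) pairs (hypothetical plan format)
    entity_of: dict mapping provider_id -> owning entity (hypothetical mapping)
    """
    per_piece = Counter()
    per_piece_entity = Counter()
    for piece_cid, provider_id in deals:
        per_piece[piece_cid] += 1
        per_piece_entity[(piece_cid, entity_of[provider_id])] += 1

    violations = []
    for piece_cid, n in per_piece.items():
        if n > MAX_COPIES_PER_PIECE:
            violations.append(f"{piece_cid}: {n} copies in total")
    for (piece_cid, entity), n in per_piece_entity.items():
        if n > MAX_COPIES_PER_ENTITY:
            violations.append(f"{piece_cid}: {n} copies with entity {entity}")
    return violations

# Illustrative data only; these IDs are made up.
entity_of = {"sp-a1": "A", "sp-a2": "A", "sp-a3": "A", "sp-a4": "A", "sp-b1": "B"}
plan = [("piece-1", sp) for sp in ["sp-a1", "sp-a2", "sp-a3", "sp-a4", "sp-b1"]]
print(check_replica_limits(plan, entity_of))
```

In this made-up plan, entity A holds four copies of the same piece, so one violation is reported; the total of five copies stays under the per-piece limit.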

How do you plan on choosing the storage providers with whom you will be making deals? This should include a plan to ensure the data is retrievable in the future both by you and others.

Besides the SPs I have worked with previously, I also use BigD exchange to further decentralize the storage.

To name a few from the community that I deal with regularly: PIKNIK, Holon, CabrinaHuang, HarryM, BigBear, j1v, XinAn Xu, WillTechMusing.

From BigD exchange: Mog Li, Devin Chen, DSS Nathanial Marsh, Rabinovitch, Vin K, arockpool Tony

How will you be distributing deals across storage providers?

Evenly across all providers I propose deals to, as far as each can handle the volume. If a miner is itself a notary, that notary will receive no more than 20% of the total granted DataCap.

Do you have the resources/funding to start making deals as soon as you receive DataCap? What support from the community would help you onboard onto Filecoin?

I have all I need to start making deals.
@large-datacap-requests

Thanks for your request!

Heads up, you’re requesting more than the typical weekly onboarding rate of DataCap!

@large-datacap-requests

Thanks for your request!
Everything looks good. 👌

A Governance Team member will review the information provided and contact you back pretty soon.

@kernelogic
Author

Reasons to consider this application:

  1. This is an AWS public open dataset.
  2. I have a good track record of no self-dealing and of transparent distribution.
  3. I am one of the two developers of Singularity and am capable of onboarding a dataset at this scale.
  4. I will provide a web UI for retrievals.

@raghavrmadya
Collaborator

Datacap Request Trigger

Total DataCap requested

5PiB

Expected weekly DataCap usage rate

750TiB

Client address

f1mbgw2yiypfay4zeuw7pmvytyoie457ogrolwbva

@large-datacap-requests

DataCap Allocation requested

Multisig Notary address

f01858410

Client address

f1mbgw2yiypfay4zeuw7pmvytyoie457ogrolwbva

DataCap allocation requested

256TiB

@dannyob

dannyob commented Aug 30, 2022

Hey @kernelogic, this looks pretty exciting, and I'd like to approve this request. Could you point me to a source or code that shows the total size of this dataset? I've only been able to find a source that says it's "over 250 terabytes" (https://worldbank.github.io/OpenNightLights/tutorials/mod2_1_data_overview.html), and you're asking for more data than that.

@kernelogic
Author

Hi @dannyob, happy to answer this. I am using the s3 command to summarize the size of this bucket:
aws s3 ls s3://globalnightlight --no-sign-request --summarize --human-readable --recursive

The result comes back as 296.0 TiB across 4245327 files. Because I plan to store 10 replicas and expect some overhead from padding, I am applying for 5 PiB.
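
The sizing argument in this reply can be worked through as plain arithmetic. The sketch below uses only the figures quoted in the thread (296.0 TiB, 10 replicas, 5 PiB requested). The padding overhead is not quantified anywhere in the thread, so it is left as unexplained headroom here rather than guessed at.

```python
TIB_PER_PIB = 1024

dataset_tib = 296.0   # size reported by the `aws s3 ls --summarize` run
replicas = 10         # replicas the applicant plans to store
requested_pib = 5     # DataCap requested (also the form's stated upper bound)

raw_tib = dataset_tib * replicas                    # replicated size before padding
raw_pib = raw_tib / TIB_PER_PIB
headroom = requested_pib * TIB_PER_PIB / raw_tib    # requested vs. unpadded size

print(f"10 replicas, unpadded: {raw_tib:.0f} TiB ({raw_pib:.2f} PiB)")
print(f"requested / unpadded:  {headroom:.2f}x")
```

Ten unpadded replicas come to 2960 TiB, so the 5 PiB request leaves a factor of roughly 1.73 for padding and other overhead.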


Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzaceck7juh3fhqd5dmvyrs3geerdzsk7nvqkv7qfchck7anjb5prdebg

Address

f1mbgw2yiypfay4zeuw7pmvytyoie457ogrolwbva

Datacap Allocated

256.00TiB

Signer Address

f1fg6jkxsr3twfnyhdlatmq36xca6sshptscds7xa

You can check the status of the message here: https://filfox.info/en/message/bafy2bzaceck7juh3fhqd5dmvyrs3geerdzsk7nvqkv7qfchck7anjb5prdebg

@large-datacap-requests

DataCap Allocation requested

Request number 5

Multisig Notary address

f01858410

Client address

f1mbgw2yiypfay4zeuw7pmvytyoie457ogrolwbva

DataCap allocation requested

1.25PiB

@large-datacap-requests

Stats & Info for DataCap Allocation

Multisig Notary address

f01858410

Client address

f1mbgw2yiypfay4zeuw7pmvytyoie457ogrolwbva

Last two approvers

ipfscn & newwebgroup

Rule to calculate the allocation request amount

80% of total dc amount requested

DataCap allocation requested

1.25PiB

Total DataCap granted for client so far

3.75PiB

Datacap to be granted to reach the total amount requested by the client (5 PiB)

1.25PiB

Stats

Number of deals | Number of storage providers | Previous DC Allocated | Top provider | Remaining DC
107697 | 20 | 2PiB | 7.34 | 507.96TiB
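
The allocation amounts in this thread (256 TiB first, 3.75 PiB granted before request number 5, then 1.25 PiB) are consistent with a tranche schedule in which each request is the lesser of a share of the total and a multiple of the weekly rate, capped by what remains. The sketch below is a reconstruction under that assumption: only the "80% of total dc amount requested" rule is stated in the thread, and the earlier rows of `SCHEDULE` and the function name `tranches` are assumptions for illustration, not the bot's actual code.

```python
TIB_PER_PIB = 1024

# (percent of total requested, percent of weekly rate) per request number.
# Only the 80% rule appears in this thread; the other rows are an assumed schedule.
SCHEDULE = [(5, 50), (10, 100), (20, 200), (40, 400), (80, 800)]

def tranches(total_tib, weekly_tib):
    """Yield successive allocation sizes in TiB until the total is reached."""
    granted = 0
    n = 0
    while granted < total_tib:
        total_pct, weekly_pct = SCHEDULE[min(n, len(SCHEDULE) - 1)]
        amount = min(
            total_tib * total_pct // 100,    # share of the total request
            weekly_tib * weekly_pct // 100,  # multiple of the weekly rate
            total_tib - granted,             # never exceed what remains
        )
        yield amount
        granted += amount
        n += 1

allocs = list(tranches(5 * TIB_PER_PIB, 750))
print(allocs)
print(sum(allocs[:4]) / TIB_PER_PIB, allocs[4] / TIB_PER_PIB)
```

Under these assumptions the first tranche is 256 TiB, the first four sum to 3.75 PiB, and the fifth is capped at the remaining 1.25 PiB, matching the figures reported above.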


Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzaceauu4pe5wipkuem7aofoca262nkeguboj6333n2w32arqmkxgll5y

Address

f1mbgw2yiypfay4zeuw7pmvytyoie457ogrolwbva

Datacap Allocated

1.25PiB

Signer Address

f1e77zuityhvvw6u2t6tb5qlnsegy2s67qs4lbbbq

You can check the status of the message here: https://filfox.info/en/message/bafy2bzaceauu4pe5wipkuem7aofoca262nkeguboj6333n2w32arqmkxgll5y


Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzaceagitfsl5l2gns44hxziyqb2r22d3ruvqfu2vqabzisoicohb3jva

Address

f1mbgw2yiypfay4zeuw7pmvytyoie457ogrolwbva

Datacap Allocated

1.25PiB

Signer Address

f1q6bpjlqia6iemqbrdaxr2uehrhpvoju3qh4lpga

You can check the status of the message here: https://filfox.info/en/message/bafy2bzaceagitfsl5l2gns44hxziyqb2r22d3ruvqfu2vqabzisoicohb3jva

@large-datacap-requests

The issue reached the total datacap requested. This should be closed

@large-datacap-requests

Stats & Info for DataCap Allocation

Multisig Notary address

f01858410

Client address

f1mbgw2yiypfay4zeuw7pmvytyoie457ogrolwbva

Last two approvers

llifezou & newwebgroup

Rule to calculate the allocation request amount

total dc reached

DataCap allocation requested

0

Total DataCap granted for client so far

5PiB

Datacap to be granted to reach the total amount requested by the client (5 PiB)

0B

Stats

Number of deals | Number of storage providers | Previous DC Allocated | Top provider | Remaining DC
158010 | 25 | 1.25PiB | 6.05 | 310.71TiB

@filplus-checker

DataCap and CID Checker Report [1]

  • Organization: Fei Yan - Kernelogic
  • Client: f1mbgw2yiypfay4zeuw7pmvytyoie457ogrolwbva

Storage Provider Distribution

The table below shows the distribution of storage providers that have stored data for this client.

If this is the first time a provider has taken a verified deal, it is marked as new.

For most DataCap applications, the restrictions below should apply.

  • Storage provider should not exceed 25% of total datacap.
  • Storage provider should not be storing duplicate data for more than 20%.
  • Storage provider should have published its public IP address.
  • All storage providers should be located in different regions.

✔️ Storage provider distribution looks healthy.

Provider | Location | Total Deals Sealed | Percentage | Unique Data | Duplicate Deals
f0143858 | Clifton, New Jersey, US | 293.66 TiB | 5.77% | 293.66 TiB | 0.00%
f03223 | San Jose, California, US | 292.31 TiB | 5.74% | 292.31 TiB | 0.00%
f02301 | San Jose, California, US | 291.72 TiB | 5.73% | 291.72 TiB | 0.00%
f0240185 | Clifton, New Jersey, US | 291.28 TiB | 5.72% | 291.28 TiB | 0.00%
f01943663 | Hong Kong, Central and Western, HK | 287.47 TiB | 5.65% | 287.44 TiB | 0.01%
f01964132 | Bangkok, Bangkok, TH | 277.47 TiB | 5.45% | 277.47 TiB | 0.00%
f01907460 | Seattle, Washington, US | 271.79 TiB | 5.34% | 267.19 TiB | 1.69%
f01928097 | Hong Kong, Central and Western, HK | 262.91 TiB | 5.16% | 262.91 TiB | 0.00%
f01929565 | Sydney, New South Wales, AU | 238.16 TiB | 4.68% | 236.25 TiB | 0.80%
f01859603 | Shenzhen, Guangdong, CN | 233.22 TiB | 4.58% | 201.66 TiB | 13.53%
f01923787 | Shenzhen, Guangdong, CN | 218.55 TiB | 4.29% | 196.94 TiB | 9.89%
f01923786 | Hong Kong, Central and Western, HK | 215.00 TiB | 4.22% | 194.66 TiB | 9.46%
f01918046 | Kuala Lumpur, Kuala Lumpur, MY | 208.28 TiB | 4.09% | 168.44 TiB | 19.13%
f01909705 | Kuala Lumpur, Kuala Lumpur, MY | 206.03 TiB | 4.05% | 168.50 TiB | 18.22%
f01918045 | Kuala Lumpur, Kuala Lumpur, MY | 205.94 TiB | 4.05% | 168.44 TiB | 18.21%
f01938671 (new) | Hong Kong, Central and Western, HK | 187.70 TiB | 3.69% | 184.81 TiB | 1.54%
f01938674 (new) | Shenzhen, Guangdong, CN | 184.45 TiB | 3.62% | 177.16 TiB | 3.96%
f01927554 | Shenzhen, Guangdong, CN | 183.23 TiB | 3.60% | 183.23 TiB | 0.00%
f01928520 | Maywood Park, Oregon, US | 152.41 TiB | 2.99% | 148.72 TiB | 2.42%
f01938601 | Maywood Park, Oregon, US | 151.91 TiB | 2.98% | 149.94 TiB | 1.30%
f01222595 | Moscow, Moscow, RU | 95.66 TiB | 1.88% | 93.75 TiB | 1.99%
f01926686 | Hangzhou, Zhejiang, CN | 92.72 TiB | 1.82% | 92.72 TiB | 0.00%
f01970716 (new) | Shenzhen, Guangdong, CN | 70.53 TiB | 1.39% | 70.53 TiB | 0.00%
f01985775 | Dallas, Texas, US | 56.25 TiB | 1.10% | 56.25 TiB | 0.00%
f01985745 | Dallas, Texas, US | 56.22 TiB | 1.10% | 56.22 TiB | 0.00%
f033462 | Dallas, Texas, US | 55.56 TiB | 1.09% | 55.56 TiB | 0.00%
f01660795 | Shenzhen, Guangdong, CN | 7.84 TiB | 0.15% | 7.84 TiB | 0.00%
f047419 | North Prairie, Wisconsin, US | 2.59 TiB | 0.05% | 2.59 TiB | 0.00%
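
The first two restrictions listed before the table can be re-checked from its columns. The sketch below is not the checker's code. It assumes that the "Duplicate Deals" percentage is (total sealed minus unique data) divided by total sealed, which reproduces the reported values for the sample rows used here, and it covers only three rows copied from the table.

```python
# Sample rows from the table: (provider, total sealed TiB, share of DataCap %, unique TiB)
rows = [
    ("f0143858", 293.66, 5.77, 293.66),
    ("f01859603", 233.22, 4.58, 201.66),
    ("f01918046", 208.28, 4.09, 168.44),
]

MAX_SHARE_PCT = 25.0      # "should not exceed 25% of total datacap"
MAX_DUPLICATE_PCT = 20.0  # "duplicate data for more than 20%"

def duplicate_pct(total_tib, unique_tib):
    """Percentage of a provider's sealed data that is not unique (assumed formula)."""
    return 100.0 * (total_tib - unique_tib) / total_tib

for provider, total, share, unique in rows:
    dup = duplicate_pct(total, unique)
    ok = share <= MAX_SHARE_PCT and dup <= MAX_DUPLICATE_PCT
    print(f"{provider}: share {share:.2f}%, duplicate {dup:.2f}%, ok={ok}")
```

For f01918046 this gives about 19.13%, the highest duplicate rate in the table and still under the 20% limit.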

[Figure: Provider Distribution]

Deal Data Replication

The table below shows how much unique data is replicated across how many storage providers.

  • No more than 25% of unique data should be stored with fewer than 4 providers.

✔️ Data replication looks healthy.

Unique Data Size | Total Deals Made | Number of Providers | Deal Percentage
1.97 TiB | 1.97 TiB | 1 | 0.04%
4.25 TiB | 8.59 TiB | 2 | 0.17%
39.31 TiB | 119.75 TiB | 3 | 2.35%
6.38 TiB | 25.66 TiB | 4 | 0.50%
7.42 TiB | 37.27 TiB | 5 | 0.73%
9.39 TiB | 58.05 TiB | 6 | 1.14%
32.66 TiB | 252.06 TiB | 7 | 4.95%
98.03 TiB | 791.06 TiB | 8 | 15.54%
132.06 TiB | 1.20 PiB | 9 | 24.07%
60.41 TiB | 694.13 TiB | 10 | 13.63%
101.19 TiB | 1.09 PiB | 11 | 21.98%
7.06 TiB | 91.59 TiB | 12 | 1.80%
14.66 TiB | 202.75 TiB | 13 | 3.98%
29.59 TiB | 437.75 TiB | 14 | 8.60%
1.31 TiB | 22.72 TiB | 15 | 0.45%
192.00 GiB | 3.19 TiB | 16 | 0.06%
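
The replication rule can be re-derived from the first and third columns of the table. The sketch below assumes the share is measured by unique data size, as the wording of the rule suggests; how the checker actually computes it is not shown in the thread.

```python
# (unique data in TiB, number of providers), copied from the table.
# The last row is reported as 192.00 GiB and is converted to TiB here.
replication = [
    (1.97, 1), (4.25, 2), (39.31, 3), (6.38, 4), (7.42, 5), (9.39, 6),
    (32.66, 7), (98.03, 8), (132.06, 9), (60.41, 10), (101.19, 11),
    (7.06, 12), (14.66, 13), (29.59, 14), (1.31, 15), (192.0 / 1024, 16),
]

MIN_PROVIDERS = 4             # threshold from the rule
MAX_LOW_REPLICA_SHARE = 0.25  # "no more than 25% of unique data"

total_unique = sum(size for size, _ in replication)
low_replica = sum(size for size, n in replication if n < MIN_PROVIDERS)
share = low_replica / total_unique

print(f"unique data: {total_unique:.2f} TiB")
print(f"with fewer than {MIN_PROVIDERS} providers: {low_replica:.2f} TiB ({share:.1%})")
print("healthy" if share <= MAX_LOW_REPLICA_SHARE else "unhealthy")
```

About 45.53 TiB of roughly 545.88 TiB of unique data sits with fewer than 4 providers, around 8.3%, which is well inside the 25% limit.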

[Figure: Replication Distribution]

Deal Data Shared with other Clients

The table below shows how much unique data is shared with other clients.
Usually, different applications own different data, and their data should not resolve to the same CID.

✔️ No CID sharing has been observed.

Footnotes

  1. To manually trigger this report, add a comment with text checker:manualTrigger

@large-datacap-requests

Thanks for your request!
❗ We have found some problems in the information provided.
We could not find Organization Name field in the information provided
We could not find Website / Social Media field in the information provided
We could not find Total amount of DataCap being requested (between 500 TiB and 5 PiB) field in the information provided
We could not find Weekly allocation of DataCap requested (usually between 1-100TiB) field in the information provided
We could not find On-chain address for first allocation field in the information provided

Please, take a look at the request and edit the body of the issue providing all the required information.


13 participants