This repository has been archived by the owner on Jul 18, 2024. It is now read-only.

[DataCap Application] Baikal Seal Storage Technology #325

Closed
scharfstein opened this issue Apr 12, 2022 · 16 comments

scharfstein commented Apr 12, 2022

Large Dataset Notary Application

To apply for DataCap to onboard your dataset to Filecoin, please fill out the following.

Core Information

  • Organization Name: Seal Storage Technology
  • Website / Social Media: https://www.sealstorage.io/
  • Total amount of DataCap being requested (between 500 TiB and 5 PiB): 2 PiB
  • Weekly allocation of DataCap requested (usually between 1-100TiB): 400 TiB
  • On-chain address for first allocation: f1usscfxtogr5v4jmi32uzkckeql2mgvun72q37ga

Please respond to the questions below by replacing the text saying "Please answer here". Include as much detail as you can in your answer.

Project details

Share a brief history of your project and organization.

Our customer is a Dark Matter Group within UC Berkeley, and Seal is working with them on a project to store the outputs of their scientific experiments. They would like to upload data to a distributed platform so that researchers around the world can access it. We kicked off the project in early March 2022, with ingestion estimated to begin in mid-April 2022 via portable disk unit. The customer's data will not be encrypted, but access controls will be implemented. They are looking for storage for at least the next three years.

Seal is a carbon-neutral, decentralized cloud storage provider. Seal's technical leadership brings decades of experience from traditional enterprise storage companies including Seagate and Oracle, as well as world-class experience on the Filecoin Network. Today, Seal operates data centers across the US and Canada with enterprise-grade infrastructure and data policies.

What is the primary source of funding for this project?

Seal is funding the project.

What other projects/ecosystem stakeholders is this project associated with?

None at this time.

Use-case details

Describe the data being stored onto Filecoin

The data sets are the original outputs of scientific experiments.

Where was the data in this dataset sourced from?

The data sets have been created by dark matter-related experiments and instrumentation.

Can you share a sample of the data? A link to a file, an image, a table, etc., are good ways to do this.

Yes. A link will be added shortly.

Confirm that this is a public dataset that can be retrieved by anyone on the Network (i.e., no specific permissions or access rights are required to view the data).

The current data set requires permission-based access.

A goal of the pilot project is for Seal to work with our customer to provide a permission-based model for accessing data. Staged data for access will be supported on open-source tools: IPFS and SeaweedFS.

What is the expected retrieval frequency for this data?

Archival is the primary use. The data will be accessed by external collaborators and researchers.

For how long do you plan to keep this dataset stored on Filecoin?

Three years, at least.

DataCap allocation plan

In which geographies (countries, regions) do you plan on making storage deals?

We plan to store five copies of the 400 TiB data set [total of 2 PiB] in five different cities, in three different countries and across two continents.

How will you be distributing your data to storage providers? Is there an offline data transfer process?

Seal Storage has dual 100 Gbps internet connections. SPs will download data from Seal. Offline data transfer may be possible.

How do you plan on choosing the storage providers with whom you will be making deals? This should include a plan to ensure the data is retrievable in the future both by you and others.

We are currently discussing capabilities and performing due diligence with several SPs and have chosen three SPs for this project. We chose these based on their current storage capacity, compute capabilities, enterprise-grade DCs and bandwidth.

How will you be distributing deals across storage providers?

Holon, 400 TiB
ElioVP, 400 TiB
PikNik, 400 TiB
Seal, 800 TiB
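
The planned split can be checked against the request with a little arithmetic. The sketch below is illustrative only: the provider names and sizes are copied from the list above, while the variable and function names are assumptions made for this example.

```python
# Planned deal distribution in TiB, copied from the list above.
planned_tib = {
    "Holon": 400,
    "ElioVP": 400,
    "PikNik": 400,
    "Seal": 800,
}

TIB_PER_PIB = 1024


def summarize(plan):
    """Return the total in TiB, the total in PiB, and each provider's share in percent."""
    total_tib = sum(plan.values())
    shares = {name: 100 * size / total_tib for name, size in plan.items()}
    return total_tib, total_tib / TIB_PER_PIB, shares


total_tib, total_pib, shares = summarize(planned_tib)
print(f"total: {total_tib} TiB ({total_pib:.2f} PiB)")
for name, pct in shares.items():
    print(f"{name}: {pct:.0f}%")
```

The total comes to 2,000 TiB, which is five copies of the 400 TiB data set, or roughly 1.95 PiB against the 2 PiB requested.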

Seal will also be keeping a hot copy (400 TB) for the Customer available for access.

The data ingestion will follow this approximate schedule:

Right away: 55 TB
By the end of year 1: 5 TB
By the end of year 2: 50 TB
By the end of year 3: 290 TB
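
To see how this schedule adds up, the sketch below computes a running total. It treats each line as the amount added at that milestone, which is an interpretation of the schedule rather than something the application states. The variable names are illustrative.

```python
from itertools import accumulate

# Ingestion schedule from the list above: (milestone, TB added at that milestone).
# Reading each figure as an increment is an assumption.
schedule_tb = [
    ("right away", 55),
    ("end of year 1", 5),
    ("end of year 2", 50),
    ("end of year 3", 290),
]

# Running total after each milestone.
cumulative_tb = list(accumulate(tb for _, tb in schedule_tb))

for (milestone, added), total in zip(schedule_tb, cumulative_tb):
    print(f"{milestone}: +{added} TB, {total} TB cumulative")
```

Under that reading, the cumulative total reaches 400 TB at the end of year 3, the same size as the hot copy mentioned above.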

Do you have the resources/funding to start making deals as soon as you receive DataCap? What support from the community would help you onboard onto Filecoin?

Yes, we have the resources/funding to begin making deals once we receive DataCap. 

We currently have the support we need.
@large-datacap-requests

Thanks for your request!
Everything looks good. 👌

A Governance Team member will review the information provided and get back to you soon.

@galen-mcandrew
Collaborator

Multisig Notary requested

Total DataCap requested

2PiB

Expected weekly DataCap usage rate

100TiB

@large-datacap-requests

Multisig created and sent to RKH: f01838560

@large-datacap-requests

DataCap Allocation requested

Multisig Notary address

f01838560

Client address

f1usscfxtogr5v4jmi32uzkckeql2mgvun72q37ga

DataCap allocation requested

50TiB


dannyob commented May 17, 2022

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzacediiie2ljjmt5ippxnt2iht4hjtr2si37yb7mxdl75kak4qauf23y

Address

f1usscfxtogr5v4jmi32uzkckeql2mgvun72q37ga

Datacap Allocated

50.00TiB

Signer Address

f1k6wwevxvp466ybil7y2scqlhtnrz5atjkkyvm4a

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacediiie2ljjmt5ippxnt2iht4hjtr2si37yb7mxdl75kak4qauf23y


Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzaceahnlilxqzokl2km27u2vxc3kovsqyinjg2ijx7oxmcwo7bs3tceu

Address

f1usscfxtogr5v4jmi32uzkckeql2mgvun72q37ga

Datacap Allocated

50.00TiB

Signer Address

f1fkxkfxgopjf3ufnfg5i3m6qlwf73kp4w5zz7nnq

You can check the status of the message here: https://filfox.info/en/message/bafy2bzaceahnlilxqzokl2km27u2vxc3kovsqyinjg2ijx7oxmcwo7bs3tceu

@dkkapur
Collaborator

dkkapur commented Jun 3, 2022

This went through, clearing the warning.

[Screenshot: Screen Shot 2022-06-03 at 12 13 05 PM]

@large-datacap-requests

Thanks for your request!

Heads up, you’re requesting more than the typical weekly onboarding rate of DataCap!

@large-datacap-requests

Thanks for your request!
Everything looks good. 👌

A Governance Team member will review the information provided and get back to you soon.

@salstorage

@raghavrmadya waiting for auto bot allocation for next tranche

dkkapur self-assigned this on Nov 4, 2022
@dkkapur
Collaborator

dkkapur commented Nov 4, 2022

@salstorage -> as part of routine cleanup we were doing (cc @galen-mcandrew) this notary was actually deprecated and set to 0. see https://filplus.d.interplanetary.one/notaries?showInactive=true&filter=Baikal. this is likely because it got picked up in our filters for "inactive" notaries where we had latent DataCap. can you shed any light on recent progress for this application and we can get you started up again?

@galen-mcandrew @simonkim0515 @raghavrmadya what are your thoughts on getting this stood up via a new app following the latest guidelines (i.e., issue on notary governance + new app in this repo)? @kevzak is this a fit for E-Fil+ given private data?

@kevzak
Collaborator

kevzak commented Nov 6, 2022

I think if Seal already had notaries that supported this application, there's no need to change the path to DataCap. If they need to start over, then it might be worth considering E-Fil+.

@filplus-checker

DataCap and CID Checker Report [1]

  • Organization: Seal Storage Technology
  • Client: f1usscfxtogr5v4jmi32uzkckeql2mgvun72q37ga

Storage Provider Distribution

The table below shows the distribution of storage providers that have stored data for this client.

If this is the first time a provider has taken a verified deal, it will be marked as new.

For most DataCap applications, the restrictions below should apply.

  • A storage provider should not exceed 25% of total DataCap.
  • A storage provider should not store more than 20% duplicate data.
  • A storage provider should have published its public IP address.
  • All storage providers should be located in different regions.

⚠️ f01886710 has sealed 43.94% of total datacap.

⚠️ f01886710 has unknown IP location.

⚠️ f01873432 has sealed 30.95% of total datacap.

| Provider | Location | Total Deals Sealed | Percentage | Unique Data | Duplicate Deals |
| --- | --- | --- | --- | --- | --- |
| f01886710 | Unknown | 14.01 TiB | 43.94% | 14.01 TiB | 0.00% |
| f01873432 | Las Vegas, Nevada, US | 9.87 TiB | 30.95% | 9.87 TiB | 0.00% |
| f01157018 | Sydney, New South Wales, AU | 2.69 TiB | 8.43% | 2.69 TiB | 0.00% |
| f01157027 | Sydney, New South Wales, AU | 1.81 TiB | 5.68% | 1.81 TiB | 0.00% |
| f01156901 | Sydney, New South Wales, AU | 1.67 TiB | 5.23% | 1.67 TiB | 0.00% |
| f01156975 | Sydney, New South Wales, AU | 1.65 TiB | 5.18% | 1.65 TiB | 0.00% |
| f01345523 | Antwerpen, Flanders, BE | 192.00 GiB | 0.59% | 192.00 GiB | 0.00% |
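
The per-provider warnings above can be reproduced from the sealed sizes in this table. The sketch below is not the checker's actual implementation, which is not shown in this thread. It assumes each percentage is a provider's sealed size divided by the sum across all listed providers, which matches the reported figures to within rounding. All names in the code are illustrative.

```python
GIB_PER_TIB = 1024

# Sealed deal sizes in TiB, copied from the table above.
sealed_tib = {
    "f01886710": 14.01,
    "f01873432": 9.87,
    "f01157018": 2.69,
    "f01157027": 1.81,
    "f01156901": 1.67,
    "f01156975": 1.65,
    "f01345523": 192.00 / GIB_PER_TIB,  # 192.00 GiB
}

# 25% per-provider limit from the restrictions listed in the report.
MAX_SHARE_PCT = 25.0


def provider_shares(sealed):
    """Return each provider's percentage of the total sealed size."""
    total = sum(sealed.values())
    return {provider: 100 * size / total for provider, size in sealed.items()}


shares = provider_shares(sealed_tib)
over_limit = [p for p, pct in shares.items() if pct > MAX_SHARE_PCT]

for provider, pct in shares.items():
    flag = " (over limit)" if provider in over_limit else ""
    print(f"{provider}: {pct:.2f}%{flag}")
```

Only f01886710 and f01873432 exceed the 25% limit, consistent with the two percentage warnings in the report.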

[Chart: Provider Distribution]

Deal Data Replication

The table below shows how much unique data is replicated across storage providers.

  • No more than 25% of unique data should be stored with fewer than 4 providers.

⚠️ 97.65% of deals are for data replicated across fewer than 4 storage providers.

| Unique Data Size | Total Deals Made | Number of Providers | Deal Percentage |
| --- | --- | --- | --- |
| 4.86 TiB | 4.86 TiB | 1 | 15.24% |
| 2.39 TiB | 4.78 TiB | 2 | 15.00% |
| 7.16 TiB | 21.49 TiB | 3 | 67.41% |
| 192.00 GiB | 768.00 GiB | 4 | 2.35% |
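
The 97.65% figure in the warning can be recovered from the "Total Deals Made" column. The sketch below is a hedged reconstruction, not the checker's own code. It assumes the percentage is the deal volume on data held by fewer than 4 providers, divided by total deal volume. Function and variable names are illustrative.

```python
GIB_PER_TIB = 1024

# (number of providers, total deals made in TiB), copied from the table above.
replication_rows = [
    (1, 4.86),
    (2, 4.78),
    (3, 21.49),
    (4, 768.00 / GIB_PER_TIB),  # 768.00 GiB
]

MIN_PROVIDERS = 4


def under_replicated_pct(rows, min_providers):
    """Percentage of deal volume whose data sits with fewer than min_providers providers."""
    total = sum(size for _, size in rows)
    low = sum(size for providers, size in rows if providers < min_providers)
    return 100 * low / total


pct = under_replicated_pct(replication_rows, MIN_PROVIDERS)
print(f"{pct:.2f}% of deals are on data with fewer than {MIN_PROVIDERS} providers")
```

Passing a different `min_providers` value lets the same function be tried against other thresholds.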

[Chart: Replication Distribution]

Deal Data Shared with other Clients

The table below shows how much unique data is shared with other clients.
Usually, different applications own different data, and their data should not resolve to the same CID.

⚠️ CID sharing has been observed.

| Other Client | Application | Total Deals Affected | Unique CIDs | Verifier |
| --- | --- | --- | --- | --- |
| f1rovtu5m3gq7q5vu4kfh4oiiif643gqq7voi4ida | Seal Storage Technology | 6.38 TiB | 190 | LDN EFil+ |

Footnotes

  1. To manually trigger this report, add a comment with text checker:manualTrigger

@johansealstorage

checker:manualTrigger

@filplus-checker-app

DataCap and CID Checker Report [1]

  • Organization: Seal Storage Technology
  • Client: f1usscfxtogr5v4jmi32uzkckeql2mgvun72q37ga

Approvers

1 dannyob
1 TimWilliams00

Storage Provider Distribution

The table below shows the distribution of storage providers that have stored data for this client.

If this is the first time a provider has taken a verified deal, it will be marked as new.

For most DataCap applications, the restrictions below should apply.

Since this is the 2nd allocation, the following restrictions have been relaxed:

  • A storage provider should not exceed 90% of total DataCap.
  • A storage provider should not store more than 20% duplicate data.
  • A storage provider should have published its public IP address.
  • All storage providers should be located in different regions.

✔️ Storage provider distribution looks healthy.

| Provider | Location | Total Deals Sealed | Percentage | Unique Data | Duplicate Deals |
| --- | --- | --- | --- | --- | --- |
| f01157018 | Melbourne, Victoria, AU (Anycast Global Backbone) | 2.69 TiB | 8.43% | 2.69 TiB | 0.00% |
| f01157027 | Melbourne, Victoria, AU (Anycast Global Backbone) | 1.81 TiB | 5.68% | 1.81 TiB | 0.00% |
| f01156901 | Melbourne, Victoria, AU (Anycast Global Backbone) | 1.67 TiB | 5.23% | 1.67 TiB | 0.00% |
| f01156975 | Melbourne, Victoria, AU (Anycast Global Backbone) | 1.65 TiB | 5.18% | 1.65 TiB | 0.00% |
| f01345523 | Antwerpen, Flanders, BE (Cogent Communications) | 192.00 GiB | 0.59% | 192.00 GiB | 0.00% |
| f01886710 | Las Vegas, Nevada, US (GTT Communications Inc.) | 14.01 TiB | 43.94% | 14.01 TiB | 0.00% |
| f01873432 | Las Vegas, Nevada, US (PiKNiK & Company Inc.) | 9.87 TiB | 30.95% | 9.87 TiB | 0.00% |

[Chart: Provider Distribution]

Deal Data Replication

The table below shows how much unique data is replicated across storage providers.

Since this is the 2nd allocation, the following restrictions have been relaxed:

  • No more than 90% of unique data should be stored with fewer than 2 providers.

✔️ Data replication looks healthy.

| Unique Data Size | Total Deals Made | Number of Providers | Deal Percentage |
| --- | --- | --- | --- |
| 4.86 TiB | 4.86 TiB | 1 | 15.24% |
| 2.39 TiB | 4.78 TiB | 2 | 15.00% |
| 7.16 TiB | 21.49 TiB | 3 | 67.41% |
| 192.00 GiB | 768.00 GiB | 4 | 2.35% |

[Chart: Replication Distribution]

Deal Data Shared with other Clients

The table below shows how much unique data is shared with other clients.
Usually, different applications own different data, and their data should not resolve to the same CID.

However, CID sharing is possible if all of the clients below use the same software to prepare the exact same dataset, or if they belong to a series of LDN applications for the same dataset.

⚠️ CID sharing has been observed.

| Other Client | Application | Total Deals Affected | Unique CIDs | Approvers |
| --- | --- | --- | --- | --- |
| f1rovtu5m3gq7q5vu4kfh4oiiif643gqq7voi4ida | Seal Storage Technology | 24.62 TiB | 495 | 1 cryptowhizzard, 1 Fenbushi-Filecoin, 1 flyworker, 1 UnionLabs2020 |

Footnotes

  1. To manually trigger this report, add a comment with text checker:manualTrigger

@salstorage

@raghavrmadya @dkkapur @galen-mcandrew this application is in a deprecated state.
LDN Application #1212 is active and replaces application #325

Please close this application as inactive/void
Thanks
Sal - Seal Storage
