This repository has been archived by the owner on Jul 18, 2024. It is now read-only.

[DataCap Application] MongoStorage - CERN Opendata #1563

Closed
1 of 2 tasks
amughal opened this issue Jan 18, 2023 · 73 comments

@amughal

amughal commented Jan 18, 2023

Data Owner Name

CERN

Data Owner Country/Region

Switzerland

Data Owner Industry

Education & Training

Website

http://opendata.cern.ch/

Social Media

https://twitter.com/cernopendata

Total amount of DataCap being requested

1PiB

Weekly allocation of DataCap requested

100TiB

On-chain address for first allocation

f1vurpgwsgteoi5ipdjf5akpvxrz7zvs7zm2oplyi

Custom multisig

  • Use Custom Multisig

Identifier

No response

Share a brief history of your project and organization

MongoStorage is an emerging Filecoin service provider based in Southern California, USA, and is working through a plan to become an ESPA-certified provider soon. The founders have extensive experience in networks and systems and have attended multiple ESPA training sessions organized by PiKNiK in Las Vegas.

Is this project associated with other projects/ecosystem stakeholders?

No

If answered yes, what are the other projects/ecosystem stakeholders

No response

Describe the data being stored onto Filecoin

The CERN Open Data portal is the access point to a growing range of data produced through the research performed at CERN. It disseminates the preserved output from various research activities and includes accompanying software and documentation needed to understand and analyze the data.
The portal adheres to established global standards in data preservation and Open Science: the products are shared under open licenses; they are issued with a Digital Object Identifier (DOI) to make them citable objects.

Where was the data currently stored in this dataset sourced from

Other

If you answered "Other" in the previous question, enter the details here

CERN data centers in Geneva, Switzerland.

How do you plan to prepare the dataset

singularity

If you answered "other/custom tool" in the previous question, enter the details here

No response

Please share a sample of the data

http://opendata.cern.ch/record/4900
http://opendata.cern.ch/record/24442

Confirm that this is a public dataset that can be retrieved by anyone on the Network

  • I confirm

If you chose not to confirm, what was the reason

No response

What is the expected retrieval frequency for this data

Weekly

For how long do you plan to keep this dataset stored on Filecoin

More than 3 years

In which geographies do you plan on making storage deals

North America, South America, Europe

How will you be distributing your data to storage providers

HTTP or FTP server, IPFS

How do you plan to choose storage providers

Slack, Partners

If you answered "Others" in the previous question, what is the tool or platform you plan to use

No response

If you already have a list of storage providers to work with, fill out their names and provider IDs below

No response

How do you plan to make deals to your storage providers

Boost client, Singularity

If you answered "Others/custom tool" in the previous question, enter the details here

No response

Can you confirm that you will follow the Fil+ guideline

Yes

@large-datacap-requests

Thanks for your request!
Everything looks good. 👌

A Governance Team member will review the information provided and contact you back pretty soon.

@Sunnyiscoming
Collaborator

Can you provide a detailed description of your organization, MongoStorage, such as the website, established time, etc.?

@Sunnyiscoming
Collaborator

Relevant application
filecoin-project/filecoin-plus-client-onboarding#2662
@jamerduhgamer Hi, notary. If you have more information, please disclose it here.

@amughal
Author

amughal commented Jan 19, 2023

We are still working on our website, as it is new. We have secured 50K FIL (about 700 TiB raw) as collateral from Darma. PiKNiK earlier approved 100 TiB, and as an ESPA participant we are in the process of getting that onto the SP along with other Fil+ deals. As we are buying bigger storage units, this LDN request is a continuation of the earlier approval by James.

@amughal
Author

amughal commented Jan 19, 2023

Here is the miner id:
https://filfox.info/en/address/f01959735

@jamerduhgamer

Hi @Sunnyiscoming, thanks for the tag! I previously approved @amughal for 90 TiB of this dataset as a proof of concept. The client then told me that more than 100 TiB of data would need to be approved, so I recommended that they submit an LDN to cover the full dataset.

@amughal is an ESPA participant so I can verify that they are a trustworthy client and SP.

@herrehesse

herrehesse commented Jan 20, 2023

@amughal can you please update your application title to the format: Organization - Project Name

@amughal changed the title from "[DataCap Application] <Organization> - <Project Name>" to "[DataCap Application] MongoStorage - CERN Opendata" on Jan 20, 2023
@large-datacap-requests

Thanks for your request!
Everything looks good. 👌

A Governance Team member will review the information provided and contact you back pretty soon.

@amughal
Author

amughal commented Jan 20, 2023

@herrehesse updated, please take a look.

@herrehesse

Thank you!

@simonkim0515 self-assigned this on Jan 20, 2023
@simonkim0515
Collaborator

Datacap Request Trigger

Total DataCap requested

1PiB

Expected weekly DataCap usage rate

100TiB

Client address

f1vurpgwsgteoi5ipdjf5akpvxrz7zvs7zm2oplyi

@large-datacap-requests

DataCap Allocation requested

Multisig Notary address

f01858410

Client address

f1vurpgwsgteoi5ipdjf5akpvxrz7zvs7zm2oplyi

DataCap allocation requested

50TiB

Id

d2318a71-83c0-407c-8824-85c71723830a
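
To put the figures above in proportion, here is a small Python sketch of the arithmetic. It uses only the numbers stated in this application (1 PiB total, 100 TiB per week, 50 TiB first allocation); the helper name `weeks_to_use` is hypothetical, and the calculation assumes the weekly rate is used in full every week, which is an assumption rather than anything the thread confirms.

```python
import math

TIB_PER_PIB = 1024  # binary units: 1 PiB = 1024 TiB


def weeks_to_use(total_tib: float, weekly_tib: float) -> int:
    """Whole weeks needed to consume total_tib at weekly_tib per week."""
    return math.ceil(total_tib / weekly_tib)


total_tib = 1 * TIB_PER_PIB      # total DataCap requested
weekly_tib = 100                 # expected weekly usage rate
first_allocation_tib = 50        # first allocation requested

# Weeks needed if the full weekly rate is used every week (assumption).
print(weeks_to_use(total_tib, weekly_tib))

# First allocation as a percentage of the total request.
print(round(100 * first_allocation_tib / total_tib, 2))
```

Under that assumption, the first allocation covers about half a week of the requested usage rate.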

@amughal
Author

amughal commented Aug 10, 2023

CERN's datasets are organized by ID and were downloaded into individual numbered directories. These are mostly ROOT files, which are binary. A single large master tar file was created for all of those datasets, and Singularity was used to create the CAR files (930 in total). To verify that it is CERN's dataset, run "strings cern1.car > strings.out" against the file in your URL; patterns like this will appear:

TBasket>recoTracks_generalTracks__RECO.obj.hitPattern_.hitPattern_[25]
TBasket>recoTracks_generalTracks__RECO.obj.hitPattern_.hitPattern_[25]
TBasket>recoTracks_generalTracks__RECO.obj.hitPattern_.hitPattern_[25]

Further, running the command "grep root strings.out" gives the following:

root@storage:/bigdata-stor4# grep root strings.out
6048/98D1088B-866F-E211-864A-00304867908C.root
root
root
root
Merged.root
Merged.root
6048/DE30D494-B76E-E211-AD3A-0025905938D4.root
root
root
root
Merged.root
Merged.root
6048/221B16C0-076F-E211-98BA-003048FFD7A2.root
root
root
root
Merged.root
Merged.root
rroot
6048/56048C11-A16E-E211-B93D-00248C0BE014.root
root
root
root
Merged.root
Merged.root
6048/9A1B967B-C26E-E211-9CF5-0026189438C1.root
root
root
root
Merged.root
Merged.root
6048/6C7A2302-7E6E-E211-954A-0026189438A2.root
root
root
root
Merged.root
Merged.root
6048/D00F585C-E16E-E211-B8A8-002618943970.root
root
root
root
Merged.root
Merged.root
6048/8CAD6065-F36E-E211-A0DE-002618943894.root
root
root
root
Merged.root
Merged.root
rootYoOt
root@storage:/bigdata-stor4# 
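
The same check can be scripted. The following is a minimal Python sketch, not the procedure actually used above: it approximates `strings` by extracting runs of four or more printable ASCII bytes, then keeps the entries that end in `.root`. The function names and the sample bytes are hypothetical, and a real CAR file would be too large to load in one piece, so it would need to be scanned in chunks.

```python
import re

# Runs of 4 or more printable ASCII bytes; an approximation of what
# the `strings` utility reports with its default settings.
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{4,}")


def printable_strings(data: bytes) -> list[str]:
    """Return every printable ASCII run of length >= 4 found in data."""
    return [m.group().decode("ascii") for m in PRINTABLE_RUN.finditer(data)]


def root_file_names(strings: list[str]) -> list[str]:
    """Keep only the strings that look like ROOT file names."""
    return [s for s in strings if s.endswith(".root")]


# Hypothetical sample standing in for a slice of a CAR file.
sample = (
    b"\x00\x01" + b"TBasket>recoTracks" + b"\x00" + b"ab" + b"\x00"
    + b"6048/98D1088B-866F-E211-864A-00304867908C.root"
    + b"\x00\xff" + b"Merged.root" + b"\x00" + b"root" + b"\x00"
)

found = printable_strings(sample)
print(found)
print(root_file_names(found))
```

Matching on the `.root` suffix is narrower than `grep root`, which also reports bare `root` fragments like those in the output above.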

@github-actions

This application has not seen any responses in the last 10 days. This issue will be marked with Stale label and will be closed in 4 days. Comment if you want to keep this application open.

@github-actions bot added the Stale label on Aug 25, 2023
@github-actions

This application has not seen any responses in the last 14 days, so for now it is being closed. Please feel free to contact the Fil+ Gov team to re-open the application if it is still being processed. Thank you!

@github-actions bot closed this as not planned on Aug 25, 2023
@clriesco removed the Stale label on Aug 28, 2023
@clriesco
Collaborator

Removed stale label and reopened issue :)

@clriesco reopened this on Aug 28, 2023
@amughal
Author

amughal commented Aug 29, 2023

Thank you @clriesco

@amughal
Author

amughal commented Sep 7, 2023

checker:manualTrigger

@filplus-checker-app

DataCap and CID Checker Report Summary [1]

Retrieval Statistics

  • Overall Graphsync retrieval success rate: 79.56%
  • Overall HTTP retrieval success rate: 0.00%
  • Overall Bitswap retrieval success rate: 47.17%

Storage Provider Distribution

✔️ Storage provider distribution looks healthy.

Deal Data Replication

⚠️ 83.28% of deals are for data replicated across less than 3 storage providers.

Deal Data Shared with other Clients [2]

⚠️ CID sharing has been observed. (Top 3)

Full report

Click here to view the CID Checker report.
Click here to view the Retrieval Dashboard.
Click here to view the Retrieval report.

Footnotes

  1. To manually trigger this report, add a comment with text checker:manualTrigger

  2. To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...
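
The replication figure in the report can be read as the share of deals whose data is held by fewer than 3 distinct storage providers. The checker's actual method is not shown in this thread, so the following Python sketch is only one plausible reading: it counts deals (not bytes), identifies data by a piece identifier, and uses hypothetical names (`share_of_low_replica_deals`, the sample deal list) throughout.

```python
from collections import defaultdict


def share_of_low_replica_deals(deals, min_providers=3):
    """Percentage of deals whose piece is stored with fewer than
    min_providers distinct providers.

    deals is a list of (piece_id, provider_id) pairs, one per deal.
    """
    if not deals:
        return 0.0

    # Collect the distinct providers holding each piece.
    providers_by_piece = defaultdict(set)
    for piece_id, provider_id in deals:
        providers_by_piece[piece_id].add(provider_id)

    # Count the deals that belong to under-replicated pieces.
    low = sum(
        1
        for piece_id, _ in deals
        if len(providers_by_piece[piece_id]) < min_providers
    )
    return 100.0 * low / len(deals)


# Hypothetical deals: piece "p1" has 3 providers, "p2" has 2, "p3" has 1.
example_deals = [
    ("p1", "sp-a"), ("p1", "sp-b"), ("p1", "sp-c"),
    ("p2", "sp-a"), ("p2", "sp-b"),
    ("p3", "sp-a"),
]
print(share_of_low_replica_deals(example_deals))
```

A size-weighted variant would sum deal sizes instead of counting deals; which of the two the checker reports is not stated here.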

@github-actions

This application has not seen any responses in the last 10 days. This issue will be marked with Stale label and will be closed in 4 days. Comment if you want to keep this application open.

--
Commented by Stale Bot.

@amughal
Author

amughal commented Sep 19, 2023

Please keep it open. Thanks

@github-actions

This application has not seen any responses in the last 10 days. This issue will be marked with Stale label and will be closed in 4 days. Comment if you want to keep this application open.

--
Commented by Stale Bot.

@amughal
Author

amughal commented Sep 30, 2023

Please do not close the application.

@github-actions

This application has not seen any responses in the last 10 days. This issue will be marked with Stale label and will be closed in 4 days. Comment if you want to keep this application open.

--
Commented by Stale Bot.

@amughal
Author

amughal commented Oct 11, 2023

Please do not close this. Thank you

@github-actions

This application has not seen any responses in the last 10 days. This issue will be marked with Stale label and will be closed in 4 days. Comment if you want to keep this application open.

--
Commented by Stale Bot.

@amughal
Author

amughal commented Oct 23, 2023

Please keep it open. Thanks


github-actions bot commented Nov 3, 2023

This application has not seen any responses in the last 10 days. This issue will be marked with Stale label and will be closed in 4 days. Comment if you want to keep this application open.

--
Commented by Stale Bot.

@amughal
Author

amughal commented Nov 3, 2023

Please keep it open

@github-actions

This application has not seen any responses in the last 10 days. This issue will be marked with Stale label and will be closed in 4 days. Comment if you want to keep this application open.

--
Commented by Stale Bot.

@amughal
Author

amughal commented Nov 14, 2023

Please keep it open

@Sunnyiscoming
Collaborator

Hello @amughal, per filecoin-project/notary-governance#922 for Open, Public Dataset applicants, please complete the following Fil+ registration form to identify yourself as the applicant, and please also add the contact information of the SP entities you are working with to store copies of the data.

This information will be reviewed by the Fil+ Governance team to confirm validity, and then the application will be allowed to move forward for additional notary review.

@ghost

ghost commented Nov 22, 2023

Closing until a clear SP entity and distribution are provided.

@ghost closed this as completed on Nov 22, 2023
@amughal
Author

amughal commented Dec 4, 2023

checker:manualTrigger

@filplus-checker-app

DataCap and CID Checker Report Summary [1]

Storage Provider Distribution

✔️ Storage provider distribution looks healthy.

Deal Data Replication

⚠️ 81.60% of deals are for data replicated across less than 3 storage providers.

Deal Data Shared with other Clients [2]

⚠️ CID sharing has been observed. (Top 3)

Full report

Click here to view the CID Checker report.
Click here to view the Retrieval Dashboard.

Footnotes

  1. To manually trigger this report, add a comment with text checker:manualTrigger

  2. To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

@data-programs added the "kyc verified" label (User has passed KYC check) on Feb 20, 2024
@data-programs
Collaborator

KYC

This user’s identity has been verified through filplus.storage

This issue was closed.