DataDEX open grants application #304

Closed
gravity-link wants to merge 15 commits into w3f:master from datadex-trade:master
Conversation

@gravity-link

Grant Application Checklist

  • The application-template.md has been copied, renamed ("project_name.md") and updated.
  • A BTC or Ethereum (DAI) address for the payment of the milestones is provided inside the application.
  • The software of the project will be released under the Apache license version 2.0 as specified in the terms and conditions.
  • The total funding amount of the project is below USD $30k for initial grant applications and $100k for follow-up grants.
  • The initial PR contains only one commit (squash if needed before submitting your PR).
  • The grant will only be announced once we've successfully delivered the first milestone.

@CLAassistant

CLAassistant commented Mar 7, 2021

CLA assistant check
All committers have signed the CLA.

add license info
@gravity-link
Author

CLA assistant check
All committers have signed the CLA.

done.

@Noc2 Noc2 self-assigned this Mar 9, 2021
@Noc2 Noc2 added the changes requested The team needs to clarify a few things first. label Mar 9, 2021
Contributor

@Noc2 Noc2 left a comment


Thanks for the application. In general, I recommend providing more technical details (functions, programming language, etc.) and focusing on the unique parts of your application. For example, we already have quite a few AMM-based exchanges in our ecosystem, so we probably won't fund just another AMM at this stage. But how do you, for example, prove that if you buy data, you actually receive the correct data? Could you potentially provide an example? How does your solution compare to Ocean Protocol?

@gravity-link
Author

Thanks for the application. In general, I recommend providing more technical details (functions, programming language, etc.) and focusing on the unique parts of your application. For example, we already have quite a few AMM-based exchanges in our ecosystem, so we probably won't fund just another AMM at this stage. But how do you, for example, prove that if you buy data, you actually receive the correct data? Could you potentially provide an example? How does your solution compare to Ocean Protocol?

Thanks for your comment. I added a data registration and checking design to the application to ensure the basic security and correctness of data reading. Please check it.

We currently develop in Java and are preparing to refactor essential modules in Rust.

There are many differences compared to Ocean Protocol:

  1. DataDEX is the first personal-privacy-oriented AMM DEX. User data can be registered and processed on the edge side, including mobile devices and PCs, so we need to cooperate with edge-side privacy computation networks such as Phala and Alita Network.
  2. DataDEX prices data based on computation costs, not only data volume or variety: the more CPU and memory a consumer uses to process the data, the more they pay.
  3. DataDEX uses a simple constant-function AMM for liquidity, which is fairer and friendlier to data consumers.
  4. The registration of the data catalog in DataDEX is designed for personal data; please refer to the sample definition in the application. We want to let everyone capitalize on their privacy and truly own it. As far as I know, Ocean Protocol follows schema.org, which is more suitable for web data.
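The constant-function mechanism named above can be sketched in a few lines. This is a minimal, hypothetical illustration of `x * y = k` pricing, not the actual DataDEX implementation; the names `Pool` and `swap_main_for_data` are assumptions, and fees and overflow handling are omitted for clarity.

```rust
// Minimal sketch of constant-function (x * y = k) AMM pricing.
// All names are illustrative assumptions, not the DataDEX API.

struct Pool {
    data_reserve: u128, // data tokens held by the pool
    main_reserve: u128, // main tokens held by the pool
}

impl Pool {
    /// Swap `main_in` main tokens for data tokens while keeping
    /// data_reserve * main_reserve constant (fees ignored).
    fn swap_main_for_data(&mut self, main_in: u128) -> u128 {
        let k = self.data_reserve * self.main_reserve;
        let new_main = self.main_reserve + main_in;
        let new_data = k / new_main; // integer division rounds in the pool's favor
        let data_out = self.data_reserve - new_data;
        self.main_reserve = new_main;
        self.data_reserve = new_data;
        data_out
    }
}

fn main() {
    let mut pool = Pool { data_reserve: 1_000, main_reserve: 1_000 };
    let out = pool.swap_main_for_data(100);
    println!("data tokens out: {}", out); // each purchase pushes the price up
}
```

Each purchase moves the reserves, so the marginal price of the data token rises with demand, which is how a constant-function pool "recalculates" the price without an order book.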

@gravity-link gravity-link requested a review from Noc2 March 10, 2021 09:52
Contributor

@Noc2 Noc2 left a comment


Thanks for the additional information. Another issue I see is how you prevent people from "stealing" data. For example, I buy someone else's data, I change a very small part of it so that the hash of the data changes, and I just sell the same data. Maybe you could explain the DataGraph a little bit more. Additionally, how do you ensure the availability of the data? For example, I buy data on the exchange, but it's actually no longer available and I cannot download it.

@gravity-link
Author

Thanks for the additional information. Another issue I see is how you prevent people from "stealing" data. For example, I buy someone else's data, I change a very small part of it so that the hash of the data changes, and I just sell the same data. Maybe you could explain the DataGraph a little bit more. Additionally, how do you ensure the availability of the data? For example, I buy data on the exchange, but it's actually no longer available and I cannot download it.

Thanks for the comment. DataGraph is designed with a multi-property schema. After registering the data, the Data Owner's ownership can be verified across multiple dimensions, not only the user ID. If a malicious data owner modifies just a small part, the other dimensions will conflict with the real data, so the registration cannot be successfully submitted or verified.
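The multi-dimension idea can be sketched as follows. This is an illustrative assumption of how such a conflict check could work, not the real DataGraph schema; the struct names, fields, and the majority-overlap threshold are all hypothetical.

```rust
use std::collections::HashMap;

// Hypothetical sketch of a multi-property conflict check: each
// registration carries several independent "dimensions" (an owner id
// plus per-property digests). Names and fields are assumptions.

struct Registration {
    owner_id: String,
    content_hash: String,                      // hash of the full dataset
    property_digests: HashMap<String, String>, // e.g. "time_range" -> digest
}

struct DataGraph {
    entries: Vec<Registration>,
}

impl DataGraph {
    /// Reject a registration whose property digests largely match an
    /// entry owned by someone else: tweaking one byte of a stolen copy
    /// changes `content_hash`, but the other dimensions still collide.
    fn register(&mut self, new: Registration) -> Result<(), String> {
        for existing in &self.entries {
            if existing.owner_id == new.owner_id {
                continue;
            }
            let overlap = new
                .property_digests
                .iter()
                .filter(|&(k, v)| existing.property_digests.get(k) == Some(v))
                .count();
            // conflict if most dimensions match another owner's entry
            if overlap * 2 > new.property_digests.len() {
                return Err(format!("conflicts with data owned by {}", existing.owner_id));
            }
        }
        self.entries.push(new);
        Ok(())
    }
}
```

Under this sketch, re-hashing a slightly edited copy of someone else's dataset produces a fresh `content_hash` but leaves most property digests colliding with the original entry, so the registration is rejected.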

We will also reward community members who voluntarily find malicious or erroneous data (we hope Dataset Makers will play a leading role here), based on data-characteristics analysis and related data technology; owners of faulty data will then be punished. We have rich experience with similar anti-counterfeiting technology for e-commerce platforms.

If the data owner cannot guarantee data availability (for example, the data does not exist or is unregistered), the tasks scheduled by the computing node (Phala, Alita Network, etc.) to the data owner's device will fail, DataDEX cannot verify the task, and the Data Owner will not receive rewards.

@gravity-link gravity-link requested a review from Noc2 March 13, 2021 06:44
Contributor

@Noc2 Noc2 left a comment


Okay, thanks for the reply. Could you integrate this into the application, ideally even into the deliverables? We have quite some interest in supporting the development and research of on-chain mechanisms to identify "malicious data", e.g. via a DAO or a scheme to deal with data availability issues. In general, it would be helpful if the deliverables were more specific. For example, in which programming languages are you planning to develop the smart contract, front end, app, etc.? What are the functionalities of these deliverables? Btw., feel free to remove the deployment on Rococo, since this isn't necessarily helpful for others.

add DAO and DataGraph example
add development specifics
@gravity-link
Author

on-chain mechanisms to identify “malicious data”,

Thanks for your advice. I have integrated the specifics into the application; please check it.

@gravity-link gravity-link requested a review from Noc2 March 17, 2021 08:44
@Noc2
Contributor

Noc2 commented Mar 17, 2021

Thanks for the update. I have a few follow-up questions: Is this your previous project: https://github.com/ALITANetwork? Why do you want to start using Substrate now? Are the UIs in the repo the ones you plan to develop? And are you familiar with burstcoin?

@gravity-link
Author

Are the UIs in the repo the ones you plan to develop?

Yes, AlitaNetwork is our previous project, a privacy-preserving edge computing network. One of its components, the accounting network, is developed based on Burst. But we later found ourselves with too much work developing the public chain, and realized that the openness and friendliness of Substrate could greatly reduce our workload and let us invest our time in data and computing, which we are more familiar with. In addition, Polkadot has rich ecosystem resources, and the staking economic model is a better fit for expanding our community, so we plan to migrate to Substrate.

Since the new project will be developed on Substrate, the previous UIs will need to be totally refactored.

Thanks.

@gravity-link
Author

Hi @Noc2, thanks for your time. It's been a nice conversation.
Would you please review the comment above? I'm glad to answer in detail if there are any other issues.

@Noc2
Contributor

Noc2 commented Mar 23, 2021

Sorry for the late reply. So the UI will be based on your previous work, correct? I would still be interested in learning more about your relationship with burstcoin.

@gravity-link
Author

gravity-link commented Mar 23, 2021

Sorry for the late reply. So the UI will be based on your previous work, correct? I would still be interested in learning more about your relationship with burstcoin.

Thanks for your reply.
No. The UI will be rebuilt, because it integrates the burstcoin wallet SDK, which is unusable on Substrate.
Regarding the relationship with burstcoin, let me introduce the stack of the previous project.
There are three main components:

  1. Computing network: a PC container service based on k3s and a Task Executor based on Android. We also developed a container orchestration service for business clients.
  2. Storage network: the data to be processed is stored in a privately forked instance of the IPFS network, extended to read and write row-based flat files.
  3. Accounting network: token mining, payment and smart contracts (in Java, based on CIYAM), privately forked from burstcoin. We plan to port it to a Substrate chain, refactor the contracts in ink!, and redesign the UIs.

Additionally, we have no cooperation with the burstcoin project or its community.

@gravity-link
Author

gravity-link commented Mar 26, 2021

Hi @Noc2, what are your main concerns about the previous project? I would be very glad to hear your suggestions. Thanks.

@Noc2 Noc2 added ready for review The project is ready to be reviewed by the committee members. and removed changes requested The team needs to clarify a few things first. labels Mar 26, 2021
@Noc2
Contributor

Noc2 commented Mar 26, 2021

Thanks for the additional information. In this case, I recommend making sure that you properly point out in your repos which work was done by you and which was copied/cloned from burstcoin. I have another question: Are you planning to deliver "Data Registration Entry" and "Data Graph DAO" as part of the grant? In the meantime, I will share the application with the rest of the team.

@semuelle
Contributor

Hi @gravity-link, a few questions from my side:

  • I don't understand this bit: If a malicious data owner modifies just a small part, the other dimensions will conflict with the real data, so the registration cannot be successfully submitted or verified. If I were to buy a dataset, then add a character somewhere and recalculate all hashes, the end result is a new dataset with a different hash. Also, I could combine two datasets in a random order and recalculate hashes, and it would be very hard -- even for a human observer -- to detect that the data was stolen. Can you expand on that?
  • Can you also explain this bit: dataset pools everyone's data supply and demand liquidity together and makes markets? How do you calculate supply and demand for a dataset? Is this demand for this particular dataset, or similar data on the platform, or all data? Can there be an unlimited number of data owners?
  • What does the mobile app do? Mobile seems an odd platform for data-centric applications.
  • Do you have an English version of your lightpaper?
  • Do you have a specific target audience in mind, e.g. medical companies? Do you already have potential users?

revise deliveries milestone.
@gravity-link
Author

In the meantime I will share the application with the rest of the team.

Thanks for the suggestions.
I have revised the milestones, including the DAO and the registration entry,
and I will update the docs in the repos of the previous project to point out the third-party code references.

@gravity-link
Author

gravity-link commented Mar 27, 2021

Hi @gravity-link, a few questions from my side:

  • I don't understand this bit: If a malicious data owner modifies just a small part, the other dimensions will conflict with the real data, so the registration cannot be successfully submitted or verified. If I were to buy a dataset, then add a character somewhere and recalculate all hashes, the end result is a new dataset with a different hash. Also, I could combine two datasets in a random order and recalculate hashes, and it would be very hard -- even for a human observer -- to detect that the data was stolen. Can you expand on that?
  • Can you also explain this bit: dataset pools everyone's data supply and demand liquidity together and makes markets? How do you calculate supply and demand for a dataset? Is this demand for this particular dataset, or similar data on the platform, or all data? Can there be an unlimited number of data owners?
  • What does the mobile app do? Mobile seems an odd platform for data-centric applications.
  • Do you have an English version of your lightpaper?
  • Do you have a specific target audience in mind, e.g. medical companies? Do you already have potential users?

Thanks for the comment.

  • A malicious data owner could not successfully register the data to the DataGraph, because the registration tool would detect conflicts with the real data, so a consumer could never buy unregistered data. Additionally, the conflict verification at registration would be multiply checked by other verifiers through the DAO mechanism.
  • A Dataset Maker first creates a new data token (deploys a token contract) and configures the quantity of the data token. The maker then makes an initial offering of the data token to collect registered data (readable privileges) from personal owners. Next, the maker creates a data-token/main-token pair as a dataset pool and provides the initial liquidity. Data consumers exchange the data token on demand using the main token on DataDEX, and the price is recalculated with the constant-function mechanism. So it is for a particular dataset, not all datasets, but owners can of course contribute to different pools to maximize their data's value.
  • Because DataDEX is specially designed for personal private data, mobile and PC are the main entry points. We believe the edge side is the original source of data, and first-time registration of data ownership through a mobile app is of great value to a real data economy.
  • Light-paper draft v0.3.
  • Yes. We already have business clients in the fields of mar-tech, online finance and bio-bigdata, and we are helping them find flexible and compliant data sources.
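The Dataset Maker workflow described above (deploy a fixed-supply data token, reward owners for contributing readable privileges, then seed the pool) can be sketched as follows. The names `DataToken` and `reward_owner`, and the idea of paying owners directly in data tokens, are my assumptions for illustration, not the actual contract design.

```rust
use std::collections::HashMap;

// Hypothetical sketch of the Dataset Maker lifecycle: deploy a
// fixed-supply data token, pay owners in tokens for registering
// readable privileges, keep the rest to seed the AMM pool.

struct DataToken {
    total_supply: u128,
    balances: HashMap<String, u128>,
}

impl DataToken {
    /// Deploy the token contract with a fixed quantity held by the maker.
    fn new(maker: &str, supply: u128) -> Self {
        let mut balances = HashMap::new();
        balances.insert(maker.to_string(), supply);
        DataToken { total_supply: supply, balances }
    }

    /// Initial offering: pay a data owner in data tokens in exchange
    /// for registered readable privileges on their data.
    fn reward_owner(&mut self, maker: &str, owner: &str, amount: u128) {
        *self.balances.get_mut(maker).expect("maker funded") -= amount;
        *self.balances.entry(owner.to_string()).or_insert(0) += amount;
    }
}

fn main() {
    // 1. Maker deploys a data token with a configured quantity.
    let mut token = DataToken::new("maker", 1_000_000);
    // 2. Collect registered data (readable privileges) from personal owners.
    token.reward_owner("maker", "alice", 10_000);
    // 3. The remaining maker balance seeds the data-token/main-token pool.
    println!("maker keeps {} of {}", token.balances["maker"], token.total_supply);
}
```

Because the supply is fixed at deployment, later contributors are rewarded at whatever the token is then worth, which matches the claim that incentives for later Data Owners differ with the token price.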

@Noc2
Contributor

Noc2 commented Mar 27, 2021

In the meantime I will share the application with the rest of the team.

Thanks for the suggestions.
I have revised the milestones, including the DAO and the registration entry,
and I will update the docs in the repos of the previous project to point out the third-party code references.

Thanks for the update. I’m happy to go ahead with it, once you have updated the repos. Feel free to ping me here. One additional suggestion from my side: It might make sense to split the deliveries into two milestones. We only pay you once you successfully delivered a milestone and currently you would need to deliver a lot before you get paid.

split milestone
@gravity-link
Author

In the meantime I will share the application with the rest of the team.

Thanks for the suggestions.
I have revised the milestones, including the DAO and the registration entry,
and I will update the docs in the repos of the previous project to point out the third-party code references.

Thanks for the update. I’m happy to go ahead with it, once you have updated the repos. Feel free to ping me here. One additional suggestion from my side: It might make sense to split the deliveries into two milestones. We only pay you once you successfully delivered a milestone and currently you would need to deliver a lot before you get paid.

Hi @Noc2, thanks for your advice. I have split the deliveries into two milestones.
The repos are also updated; please check commit1 and commit2.

@semuelle
Contributor

Hi @gravity-link, thanks for your reply.

I have decided not to approve your application because (a) I think creating a new token contract for every dataset being sold seems inefficient, (b) linking a dataset to a limited amount of tokens sounds like artificial shortage, and (c) if I were to sell my personal data to the highest bidder, I would at least like to know who I'm selling to. I'd rather not have them bound to a token that might get passed around.

I also wouldn't want to encourage people to sell their personal data, although I do appreciate the idea of at least making it a transparent marketplace!

@gravity-link
Author

gravity-link commented Mar 29, 2021

Hi @gravity-link, thanks for your reply.

I have decided not to approve your application because (a) I think creating a new token contract for every dataset being sold seems inefficient, (b) linking a dataset to a limited amount of tokens sounds like artificial shortage, and (c) if I were to sell my personal data to the highest bidder, I would at least like to know who I'm selling to. I'd rather not have them bound to a token that might get passed around.

I also wouldn't want to encourage people to sell their personal data, although I do appreciate the idea of at least making it a transparent marketplace!

Hi @semuelle, I really appreciate the in-depth and professional review of the application.
Regarding your concerns, I would like to explain some details:

a) Creating a token for each dataset is mainly due to the data-pricing issue. Because a dataset's value is related to the data-processing algorithm, the scenario, the sales scale, and even the region and audience, the Dataset Maker will have the capability to operate its own data supply-and-demand market. In our experience, if data pricing is too uniform, data market transaction volume is very limited.

b) I would like to further explain that limited tokens do not mean limited data volume. The Dataset Maker can continuously invite new Data Owners to contribute data (permissions), but depending on the token price, the incentives for later Data Owners differ. The limited token supply therefore gives the Dataset Maker control over token circulation.

c) We hope the Data Owner can not only gain data revenue but also trace data usage, including who uses the data and how. Additionally, what is tokenized is readable privileges for the data, not the raw data. I think the points above explain why personal data should be tokenized.

It is already a fact that Internet providers have used personal data on a large scale. We just hope to help Data Owners get back the benefits they lost.

I know there is a huge difference between the economic models for personal data and enterprise confidential data, and we are working hard on that, but I still believe this is a matter of great social significance and commercial value, and that we are one of the most experienced teams to push it forward. So if possible, please reconsider the feasibility of this application. Thank you very much!

BTW, we will upload the revised English version of the light paper tomorrow, which will add specifics on pricing etc., and we will post it here ASAP; please refer to it. Download here please.

@alxs
Contributor

alxs commented Mar 30, 2021

Hi @gravity-link. After some consideration, I'm sorry to inform you that I will not support this proposal either, for the following reasons:

  • The application lacks significant details and fails to answer important questions related to your approach, such as:
    • How you will handle encryption and computation within the TEE.

    • How you will calculate the computing costs of a task.

    • How you will integrate this into the pricing if the data can be bought through an AMM.

    • How you will guarantee the legality and correctness of the data (simply claiming this will be solved through DAO is not enough).

    • How the registration tool could prevent conflicts with existing data as per Sebastian's comment.

    • ...and a long list of further inconsistencies and inaccuracies.

      See the Ruby Protocol application for a very good example of a proposal for a project with similar aim.

  • Your answers to previous comments are generally vague and do not provide concrete answers to the mentioned challenges.
  • I also think the idea of creating a separate token for each dataset is too convoluted and even unnecessary. I don't see why it wouldn't be enough to charge per computation since this would also lead to larger returns for datasets in higher demand.
  • The Alita repos have been dead for a while and only contain significant commits by one person. I do not think this is a strong reference.
  • Your future plans are requirements rather than plans, and lack an appropriate strategy to achieve them.

@gravity-link
Author

gravity-link commented Mar 31, 2021

Thanks for your honest and professional opinions. I would like to explain the following:

Hi @gravity-link. After some consideration, I'm sorry to inform you that I will not support this proposal either, for the following reasons:

  • The application lacks significant details and fails to answer important questions related to your approach, such as:
  • How you will handle encryption and computation within the TEE.
DataDEX is a marketplace separate from the computing network; its main functions are data token exchange and data ownership registration. Therefore DataDEX itself does not need a TEE to encrypt data; that relies on other privacy computing networks such as Phala or Alita.
  • How you will calculate the computing costs of a task.
    Computing costs are calculated with concurrency scaling per minute. For example: Costs = (task1*t1 + task2*t2 + ... + taskn*tn) * pt
  • How you will integrate this into the pricing if the data can be bought through an AMM.
We will leverage a Task Oracle to submit tasks and use the DataToken to pay for the cost of accessing data. The Task Oracle has a pallet that communicates with other computing networks across chains through XCMP. I will add design details for the remaining components. @alxs @semuelle please refer to the update.

In addition, cost calculation and data-price measurement are already common pricing strategies. I added a reference about AWS Data Exchange pricing at the bottom of the application.
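The cost formula quoted above can be made concrete. This sketch reads task_i as the concurrency of task i, t_i as its duration in minutes, and pt as the price per concurrent task-minute; that reading and all names are my assumptions.

```rust
// Sketch of: Costs = (task1*t1 + task2*t2 + ... + taskn*tn) * pt
// (concurrency, minutes) pairs per task; pt = price per task-minute.
// The interpretation of the formula's symbols is an assumption.

fn computing_cost(tasks: &[(u64, u64)], price_per_task_minute: u64) -> u64 {
    let task_minutes: u64 = tasks
        .iter()
        .map(|&(concurrency, minutes)| concurrency * minutes)
        .sum();
    task_minutes * price_per_task_minute
}

fn main() {
    // Two tasks: 4 workers for 30 minutes and 2 workers for 10 minutes,
    // at 3 units per task-minute: (120 + 20) * 3 = 420.
    println!("cost: {}", computing_cost(&[(4, 30), (2, 10)], 3));
}
```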

  • How you will guarantee the legality and correctness of the data (simply claiming this will be solved through DAO is not enough).
There is a Checker Node in the Task Oracle to check task read permissions. For correctness, we currently use the local pin of IPFS to ensure the data is not collected locally, and we check whether the hash matches the one registered in the DataGraph.
  • How the registration tool could prevent conflicts with existing data as per Sebastian's comment.
The DataGraph will be a well-designed catalog for personal data. The registration tool can check for conflicts with other data using the data properties in the metadata and the data hash. @alxs @semuelle please refer to the update.
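The hash check against the DataGraph registration can be sketched as below. This is a hypothetical illustration; `DefaultHasher` stands in for a real cryptographic hash such as SHA-256, and all names are assumptions rather than the actual Checker Node design.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

// Hypothetical sketch: before a task may read a dataset, the hash of
// the locally pinned data is compared with the digest registered in
// the DataGraph. DefaultHasher is a stand-in for a cryptographic hash.

fn digest(data: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    data.hash(&mut h);
    h.finish()
}

struct DataGraph {
    registered: HashMap<String, u64>, // dataset id -> registered digest
}

impl DataGraph {
    /// The task proceeds only when the local data still matches its
    /// registration; tampered or unregistered data fails the check.
    fn verify_for_task(&self, dataset_id: &str, local_data: &[u8]) -> bool {
        self.registered
            .get(dataset_id)
            .map_or(false, |&h| h == digest(local_data))
    }
}
```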
  • ...and a long list of further inconsistencies and inaccuracies.
    Sorry for the reading experience. I will recheck, and I would be very grateful if you let us know about any inconsistency.
    See the Ruby Protocol application for a very good example of a proposal for a project with similar aim.

Awesome, it seems to be a privacy computing project, but I am not sure why it targets personal data; it seems suitable for non-personal data too. I hope it could become a compute node for DataDEX. I will follow its progress and would like to learn from its scheme.

  • Your answers to previous comments are generally vague and do not provide concrete answers to the mentioned challenges.

I will improve this.

  • I also think the idea of creating a separate token for each dataset is too convoluted and even unnecessary. I don't see why it wouldn't be enough to charge per computation since this would also lead to larger returns for datasets in higher demand.

DataDEX is designed for high-quality, high-volume datasets, not just a large number of datasets. I have added a reference: Google built one high-quality dataset distributed to millions of users' mobile phones for next-word model training. So I believe creating a token for a dataset of great value makes sense.

  • The Alita repos have been dead for a while and only contain significant commits by one person. I do not think this is a strong reference.

There are 3-5 engineers working on the Alita project. Some components are not open source yet, but will be soon.

  • Your future plans are requirements rather than plans and lack an appropriate strategy to achieve those.

Thanks for your advice. I will continuously add specifics to the application.

add Task Oracle specifics
@alxs
Contributor

alxs commented Apr 1, 2021

Thank you for your clarifications @gravity-link. Your first point in particular is valid, and I had misunderstood that aspect of your application. However, I don't see how any of your updates or other answers address my concerns, and I stand by my decision. Others might be convinced, though; let's give the application a bit more time.

add Scenario and external reference URLs
@gravity-link
Author

Hi @alxs, @semuelle,
Regarding the issue of creating a unique token for each dataset, I want to share some news. Google Chrome has created a new tracking method called FLoC, which means everyone's browsing history is added to the same dataset, which is very large and valuable. We would like to invite such great dataset makers to operate high-quality, high-value-added data, not dataset flea markets. From this perspective, a token per dataset makes sense.

@alxs
Contributor

alxs commented Apr 26, 2021

Closing, since there seems to be no interest from other members of the committee either. As previously stated, there are a number of issues you haven't addressed yet, and the proposal lacks technical details; neither has been improved in your last update.

Feel free to reopen an application when you're further along with your project and are able to address these issues.

@alxs alxs closed this Apr 26, 2021
alxs pushed a commit that referenced this pull request Jul 20, 2021
Open Square Network - BlockChain Based Crowdsourcing and Reputation Platform