DataDEX open grants application #304

Closed
gravity-link wants to merge 15 commits into w3f:master from datadex-trade:master
Conversation

@gravity-link

Grant Application Checklist

  • The application-template.md has been copied, renamed ("project_name.md") and updated.
  • A BTC or Ethereum (DAI) address for the payment of the milestones is provided inside the application.
  • The software of the project will be released under the Apache license version 2.0 as specified in the terms and conditions.
  • The total funding amount of the project is below USD $30k for initial grant applications and $100k for follow-up grants.
  • The initial PR contains only one commit (squash if needed before submitting your PR).
  • The grant will only be announced once we've successfully delivered the first milestone.

@CLAassistant

CLAassistant commented Mar 7, 2021

CLA assistant check
All committers have signed the CLA.

add license info
@gravity-link
Author

CLA assistant check
All committers have signed the CLA.

done.

@Noc2 Noc2 self-assigned this Mar 9, 2021
@Noc2 Noc2 added the changes requested The team needs to clarify a few things first. label Mar 9, 2021
Contributor

@Noc2 Noc2 left a comment


Thanks for the application. In general, I recommend providing more technical details (functions, programming language, etc.) and focusing on the unique parts of your application. For example, we already have quite a few AMM-based exchanges in our ecosystem, so we probably won't fund just another AMM at this stage. But how do you, for example, prove that if you buy data, you actually receive the correct data? Could you potentially provide an example? How does your solution compare to Ocean Protocol?

@gravity-link
Author

Thanks for the application. In general, I recommend providing more technical details (functions, programming language, etc.) and focusing on the unique parts of your application. For example, we already have quite a few AMM-based exchanges in our ecosystem, so we probably won't fund just another AMM at this stage. But how do you, for example, prove that if you buy data, you actually receive the correct data? Could you potentially provide an example? How does your solution compare to Ocean Protocol?

Thanks for your comment. I added a data registration and checking design to the application to ensure the basic security and correctness of data reading. Please check it.

We currently develop in Java and are preparing to refactor essential modules in Rust.

There are many differences compared to Ocean Protocol:

  1. DataDEX is the first personal-privacy-oriented AMM DEX. User data can be registered and processed on the edge side, including mobile devices and PCs, so we need to cooperate with edge-side privacy computation networks such as Phala and Alita Network.
  2. DataDEX prices data based on computation costs, not only data volume or variety: the more CPU and memory a consumer uses to process the data, the more they pay.
  3. DataDEX uses a simple constant-function AMM for liquidity, which is fairer and friendlier to data consumers.
  4. The registration of the data catalog in DataDEX is designed for personal data; please refer to the sample definition in the application. We want to let everyone capitalize on their privacy and truly own it. As far as I know, Ocean Protocol follows schema.org, which is more suitable for web data.
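The constant-function mechanism named above can be sketched in a few lines. This is a minimal, hypothetical illustration of `x * y = k` pricing, not the actual DataDEX implementation; the names `Pool` and `swap_main_for_data` are assumptions, and fees and overflow handling are omitted for clarity.

```rust
// Minimal sketch of constant-function (x * y = k) AMM pricing.
// All names are illustrative assumptions, not the DataDEX API.

struct Pool {
    data_reserve: u128, // data tokens held by the pool
    main_reserve: u128, // main tokens held by the pool
}

impl Pool {
    /// Swap `main_in` main tokens for data tokens while keeping
    /// data_reserve * main_reserve constant (fees ignored).
    fn swap_main_for_data(&mut self, main_in: u128) -> u128 {
        let k = self.data_reserve * self.main_reserve;
        let new_main = self.main_reserve + main_in;
        let new_data = k / new_main; // integer division rounds in the pool's favor
        let data_out = self.data_reserve - new_data;
        self.main_reserve = new_main;
        self.data_reserve = new_data;
        data_out
    }
}

fn main() {
    let mut pool = Pool { data_reserve: 1_000, main_reserve: 1_000 };
    let out = pool.swap_main_for_data(100);
    println!("data tokens out: {}", out); // each purchase pushes the price up
}
```

Each purchase moves the reserves, so the marginal price of the data token rises with demand, which is how a constant-function pool "recalculates" the price without an order book.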

@gravity-link gravity-link requested a review from Noc2 March 10, 2021 09:52
Contributor

@Noc2 Noc2 left a comment


Thanks for the additional information. Another issue I see is how you prevent people from "stealing" data. For example, I buy someone else's data, I change a very small part of it so that the hash of the data changes, and I just sell the same data. Maybe you could explain the DataGraph a little bit more. Additionally, how do you ensure the availability of the data? For example, I buy data on the exchange, but it's actually no longer available and I cannot download it.

@gravity-link
Author

Thanks for the additional information. Another issue I see is how you prevent people from "stealing" data. For example, I buy someone else's data, I change a very small part of it so that the hash of the data changes, and I just sell the same data. Maybe you could explain the DataGraph a little bit more. Additionally, how do you ensure the availability of the data? For example, I buy data on the exchange, but it's actually no longer available and I cannot download it.

Thanks for the comment. DataGraph is designed with a multi-property schema. After registering the data, the Data Owner's ownership can be verified across multiple dimensions, not only the user ID. If a malicious data owner modifies just a small part, the other dimensions will conflict with the real data, so the registration cannot be successfully submitted or verified.
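The multi-dimension idea can be sketched as follows. This is an illustrative assumption of how such a conflict check could work, not the real DataGraph schema; the struct names, fields, and the majority-overlap threshold are all hypothetical.

```rust
use std::collections::HashMap;

// Hypothetical sketch of a multi-property conflict check: each
// registration carries several independent "dimensions" (an owner id
// plus per-property digests). Names and fields are assumptions.

struct Registration {
    owner_id: String,
    content_hash: String,                      // hash of the full dataset
    property_digests: HashMap<String, String>, // e.g. "time_range" -> digest
}

struct DataGraph {
    entries: Vec<Registration>,
}

impl DataGraph {
    /// Reject a registration whose property digests largely match an
    /// entry owned by someone else: tweaking one byte of a stolen copy
    /// changes `content_hash`, but the other dimensions still collide.
    fn register(&mut self, new: Registration) -> Result<(), String> {
        for existing in &self.entries {
            if existing.owner_id == new.owner_id {
                continue;
            }
            let overlap = new
                .property_digests
                .iter()
                .filter(|&(k, v)| existing.property_digests.get(k) == Some(v))
                .count();
            // conflict if most dimensions match another owner's entry
            if overlap * 2 > new.property_digests.len() {
                return Err(format!("conflicts with data owned by {}", existing.owner_id));
            }
        }
        self.entries.push(new);
        Ok(())
    }
}
```

Under this sketch, re-hashing a slightly edited copy of someone else's dataset produces a fresh `content_hash` but leaves most property digests colliding with the original entry, so the registration is rejected.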

We will also reward community members who voluntarily find malicious or erroneous data (we hope Dataset Makers will play a leading role here), based on data-characteristics analysis and related data technology; owners of faulty data will then be punished. We have rich experience with similar anti-counterfeiting technology for e-commerce platforms.

If the data owner cannot guarantee data availability (for example, the data does not exist or is unregistered), the tasks scheduled by the computing node (Phala, Alita Network, etc.) to the data owner's device will fail, DataDEX cannot verify the task, and the Data Owner will not receive rewards.

@gravity-link gravity-link requested a review from Noc2 March 13, 2021 06:44
Contributor

@Noc2 Noc2 left a comment


Okay, thanks for the reply. Could you integrate this into the application, ideally even into the deliverables? We have quite some interest in supporting the development and research of on-chain mechanisms to identify "malicious data", e.g. via a DAO or a scheme to deal with data availability issues. In general, it would be helpful if the deliverables were more specific. For example, in which programming languages are you planning to develop the smart contract, front end, app, etc.? What are the functionalities of these deliverables? Btw., feel free to remove the deployment on Rococo, since this isn't necessarily helpful for others.

add DAO and DataGraph example
add development specifics
@gravity-link
Author

on-chain mechanisms to identify “malicious data”,

Thanks for your advice. I have integrated the specifics into the application; please check it.

@gravity-link gravity-link requested a review from Noc2 March 17, 2021 08:44
@Noc2
Contributor

Noc2 commented Mar 17, 2021

Thanks for the update. I have a few follow-up questions: Is this your previous project: https://github.com/ALITANetwork? Why do you want to start using Substrate now? Are the UIs in the repo the ones you plan to develop? And are you familiar with burstcoin?

@gravity-link
Author

Are the UIs in the repo the ones you plan to develop?

Yes, AlitaNetwork is our previous project, a privacy-preserving edge computing network. One of its components, the accounting network, is developed based on Burst. But we later found ourselves with too much work developing the public chain, and realized that the openness and friendliness of Substrate could greatly reduce our workload and let us invest our time in data and computing, which we are more familiar with. In addition, Polkadot has rich ecosystem resources, and the staking economic model is a better fit for expanding our community, so we plan to migrate to Substrate.

Since the new project will be developed on Substrate, the previous UIs will need to be totally refactored.

Thanks.

@gravity-link
Author

Hi @Noc2, thanks for your time. It's been a nice conversation.
Would you please review the comment above? I'm glad to answer in detail if there are any other issues.

@Noc2
Contributor

Noc2 commented Mar 23, 2021

Sorry for the late reply. So the UI will be based on your previous work, correct? I would still be interested in learning more about your relationship with burstcoin.

@gravity-link
Author

gravity-link commented Mar 23, 2021

Sorry for the late reply. So the UI will be based on your previous work, correct? I would still be interested in learning more about your relationship with burstcoin.

Thanks for your reply.
No. The UI will be rebuilt, because it integrates the burstcoin wallet SDK, which is unusable on Substrate.
Regarding the relationship with burstcoin, let me introduce the stack of the previous project.
There are three main components:

  1. Computing network: a PC container service based on k3s and a Task Executor based on Android. We also developed a container orchestration service for business clients.
  2. Storage network: the data to be processed is stored in a privately forked instance of the IPFS network, extended to read and write row-based flat files.
  3. Accounting network: token mining, payment and smart contracts (in Java, based on CIYAM), privately forked from burstcoin. We plan to port it to a Substrate chain, refactor the contracts in ink!, and redesign the UIs.

Additionally, we have no cooperation with the burstcoin project or its community.

@gravity-link
Author

gravity-link commented Mar 26, 2021

Hi @Noc2, what are your main concerns about the previous project? I would be very glad to hear your suggestions. Thanks.

@Noc2 Noc2 added ready for review The project is ready to be reviewed by the committee members. and removed changes requested The team needs to clarify a few things first. labels Mar 26, 2021
@Noc2
Contributor

Noc2 commented Mar 26, 2021

Thanks for the additional information. In this case, I recommend making sure that you properly point out in your repos which work was done by you and which was copied/cloned from burstcoin. I have another question: Are you planning to deliver "Data Registration Entry" and "Data Graph DAO" as part of the grant? In the meantime, I will share the application with the rest of the team.

@semuelle
Contributor

Hi @gravity-link, a few questions from my side:

  • I don't understand this bit: If a malicious data owner modifies just a small part, the other dimensions will conflict with the real data, so the registration cannot be successfully submitted or verified. If I were to buy a dataset, then add a character somewhere and recalculate all hashes, the end result is a new dataset with a different hash. Also, I could combine two datasets in a random order and recalculate hashes, and it would be very hard -- even for a human observer -- to detect that the data was stolen. Can you expand on that?
  • Can you also explain this bit: dataset pools everyone's data supply and demand liquidity together and makes markets? How do you calculate supply and demand for a dataset? Is this demand for this particular dataset, or similar data on the platform, or all data? Can there be an unlimited number of data owners?
  • What does the mobile app do? Mobile seems an odd platform for data-centric applications.
  • Do you have an English version of your lightpaper?
  • Do you have a specific target audience in mind, e.g. medical companies? Do you already have potential users?

revise deliveries milestone.
@gravity-link
Author

In the meantime I will share the application with the rest of the team.

Thanks for the suggestions.
I have revised the milestones, including the DAO and the registration entry,
and I will update the docs in the repos of the previous project to point out the third-party code references.

@gravity-link
Author

gravity-link commented Mar 27, 2021

Hi @gravity-link, a few questions from my side:

  • I don't understand this bit: If a malicious data owner modifies just a small part, the other dimensions will conflict with the real data, so the registration cannot be successfully submitted or verified. If I were to buy a dataset, then add a character somewhere and recalculate all hashes, the end result is a new dataset with a different hash. Also, I could combine two datasets in a random order and recalculate hashes, and it would be very hard -- even for a human observer -- to detect that the data was stolen. Can you expand on that?
  • Can you also explain this bit: dataset pools everyone's data supply and demand liquidity together and makes markets? How do you calculate supply and demand for a dataset? Is this demand for this particular dataset, or similar data on the platform, or all data? Can there be an unlimited number of data owners?
  • What does the mobile app do? Mobile seems an odd platform for data-centric applications.
  • Do you have an English version of your lightpaper?
  • Do you have a specific target audience in mind, e.g. medical companies? Do you already have potential users?

Thanks for the comment.

  • A malicious data owner could not successfully register the data to the DataGraph, because the registration tool would detect conflicts with the real data, so a consumer could never buy unregistered data. Additionally, the conflict verification at registration would be multiply checked by other verifiers through the DAO mechanism.
  • A Dataset Maker first creates a new data token (deploys a token contract) and configures the quantity of the data token. The maker then makes an initial offering of the data token to collect registered data (readable privileges) from personal owners. Next, the maker creates a data-token/main-token pair as a dataset pool and provides the initial liquidity. Data consumers exchange the data token on demand using the main token on DataDEX, and the price is recalculated with the constant-function mechanism. So it is for a particular dataset, not all datasets, but owners can of course contribute to different pools to maximize their data's value.
  • Because DataDEX is specially designed for personal private data, mobile and PC are the main entry points. We believe the edge side is the original source of data, and first-time registration of data ownership through a mobile app is of great value to a real data economy.
  • Light-paper draft v0.3.
  • Yes. We already have business clients in the fields of mar-tech, online finance and bio-bigdata, and we are helping them find flexible and compliant data sources.
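The Dataset Maker workflow described above (deploy a fixed-supply data token, reward owners for contributing readable privileges, then seed the pool) can be sketched as follows. The names `DataToken` and `reward_owner`, and the idea of paying owners directly in data tokens, are my assumptions for illustration, not the actual contract design.

```rust
use std::collections::HashMap;

// Hypothetical sketch of the Dataset Maker lifecycle: deploy a
// fixed-supply data token, pay owners in tokens for registering
// readable privileges, keep the rest to seed the AMM pool.

struct DataToken {
    total_supply: u128,
    balances: HashMap<String, u128>,
}

impl DataToken {
    /// Deploy the token contract with a fixed quantity held by the maker.
    fn new(maker: &str, supply: u128) -> Self {
        let mut balances = HashMap::new();
        balances.insert(maker.to_string(), supply);
        DataToken { total_supply: supply, balances }
    }

    /// Initial offering: pay a data owner in data tokens in exchange
    /// for registered readable privileges on their data.
    fn reward_owner(&mut self, maker: &str, owner: &str, amount: u128) {
        *self.balances.get_mut(maker).expect("maker funded") -= amount;
        *self.balances.entry(owner.to_string()).or_insert(0) += amount;
    }
}

fn main() {
    // 1. Maker deploys a data token with a configured quantity.
    let mut token = DataToken::new("maker", 1_000_000);
    // 2. Collect registered data (readable privileges) from personal owners.
    token.reward_owner("maker", "alice", 10_000);
    // 3. The remaining maker balance seeds the data-token/main-token pool.
    println!("maker keeps {} of {}", token.balances["maker"], token.total_supply);
}
```

Because the supply is fixed at deployment, later contributors are rewarded at whatever the token is then worth, which matches the claim that incentives for later Data Owners differ with the token price.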

@Noc2
Contributor

Noc2 commented Mar 27, 2021

In the meantime I will share the application with the rest of the team.

Thanks for the suggestions.
I have revised the milestones, including the DAO and the registration entry,
and I will update the docs in the repos of the previous project to point out the third-party code references.

Thanks for the update. I’m happy to go ahead with it, once you have updated the repos. Feel free to ping me here. One additional suggestion from my side: It might make sense to split the deliveries into two milestones. We only pay you once you successfully delivered a milestone and currently you would need to deliver a lot before you get paid.

split milestone
@gravity-link
Author

In the meantime I will share the application with the rest of the team.

Thanks for the suggestions.
I have revised the milestones, including the DAO and the registration entry,
and I will update the docs in the repos of the previous project to point out the third-party code references.

Thanks for the update. I’m happy to go ahead with it, once you have updated the repos. Feel free to ping me here. One additional suggestion from my side: It might make sense to split the deliveries into two milestones. We only pay you once you successfully delivered a milestone and currently you would need to deliver a lot before you get paid.

Hi @Noc2, thanks for your advice. I have split the deliveries into two milestones.
The repos are also updated; please check commit1 and commit2.

@semuelle
Contributor

Hi @gravity-link, thanks for your reply.

I have decided not to approve your application because (a) I think creating a new token contract for every dataset being sold seems inefficient, (b) linking a dataset to a limited amount of tokens sounds like artificial shortage, and (c) if I were to sell my personal data to the highest bidder, I would at least like to know who I'm selling to. I'd rather not have them bound to a token that might get passed around.

I also wouldn't want to encourage people to sell their personal data, although I do appreciate the idea of at least making it a transparent marketplace!

@gravity-link
Author

gravity-link commented Mar 29, 2021

Hi @gravity-link, thanks for your reply.

I have decided not to approve your application because (a) I think creating a new token contract for every dataset being sold seems inefficient, (b) linking a dataset to a limited amount of tokens sounds like artificial shortage, and (c) if I were to sell my personal data to the highest bidder, I would at least like to know who I'm selling to. I'd rather not have them bound to a token that might get passed around.

I also wouldn't want to encourage people to sell their personal data, although I do appreciate the idea of at least making it a transparent marketplace!

Hi @semuelle, I really appreciate the in-depth and professional review of the application.
Regarding your concerns, I would like to explain some details:

a) Creating a token for each dataset is mainly due to the data-pricing issue. Because a dataset's value is related to the data-processing algorithm, the scenario, the sales scale, and even the region and audience, the Dataset Maker will have the capability to operate its own data supply-and-demand market. In our experience, if data pricing is too uniform, data market transaction volume is very limited.

b) I would like to further explain that limited tokens do not mean limited data volume. The Dataset Maker can continuously invite new Data Owners to contribute data (permissions), but depending on the token price, the incentives for later Data Owners differ. The limited token supply therefore gives the Dataset Maker control over token circulation.

c) We hope the Data Owner can not only gain data revenue but also trace data usage, including who uses the data and how. Additionally, what is tokenized is readable privileges for the data, not the raw data. I think the points above explain why personal data should be tokenized.

It is already a fact that Internet providers have used personal data on a large scale. We just hope to help Data Owners get back the benefits they lost.

I know there is a huge difference between the economic models for personal data and enterprise confidential data, and we are working hard on that, but I still believe this is a matter of great social significance and commercial value, and that we are one of the most experienced teams to push it forward. So if possible, please reconsider the feasibility of this application. Thank you very much!

BTW, we will upload the revised English version of the light paper tomorrow, which will add specifics on pricing etc., and we will post it here ASAP; please refer to it. Download here please.

@alxs
Contributor

alxs commented Mar 30, 2021

Hi @gravity-link. After some consideration, I'm sorry to inform you that I will not support this proposal either, for the following reasons:

  • The application lacks significant details and fails to answer important questions related to your approach, such as:
    • How you will handle encryption and computation within the TEE.

    • How you will calculate the computing costs of a task.

    • How you will integrate this into the pricing if the data can be bought through an AMM.

    • How you will guarantee the legality and correctness of the data (simply claiming this will be solved through DAO is not enough).

    • How the registration tool could prevent conflicts with existing data as per Sebastian's comment.

    • ...and a long list of further inconsistencies and inaccuracies.

      See the Ruby Protocol application for a very good example of a proposal for a project with similar aim.

  • Your answers to previous comments are generally vague and do not provide concrete answers to the mentioned challenges.
  • I also think the idea of creating a separate token for each dataset is too convoluted and even unnecessary. I don't see why it wouldn't be enough to charge per computation since this would also lead to larger returns for datasets in higher demand.
  • The Alita repos have been dead for a while and only contain significant commits by one person. I do not think this is a strong reference.
  • Your future plans are requirements rather than plans, and lack an appropriate strategy to achieve them.

@gravity-link
Author

gravity-link commented Mar 31, 2021

Thanks for your honest and professional opinions. I would like to explain the following:

Hi @gravity-link. After some consideration, I'm sorry to inform you that I will not support this proposal either, for the following reasons:

  • The application lacks significant details and fails to answer important questions related to your approach, such as:
  • How you will handle encryption and computation within the TEE.
DataDEX is a marketplace separate from the computing network; its main functions are data token exchange and data ownership registration. Therefore DataDEX itself does not need a TEE to encrypt data; that relies on other privacy computing networks such as Phala or Alita.
  • How you will calculate the computing costs of a task.
    Computing costs are calculated with concurrency scaling per minute. For example: Costs = (task1*t1 + task2*t2 + ... + taskn*tn) * pt
  • How you will integrate this into the pricing if the data can be bought through an AMM.
We will leverage a Task Oracle to submit tasks and use the DataToken to pay for the cost of accessing data. The Task Oracle has a pallet that communicates with other computing networks across chains through XCMP. I will add design details for the remaining components. @alxs @semuelle please refer to the update.

In addition, cost calculation and data-price measurement are already common pricing strategies. I added a reference about AWS Data Exchange pricing at the bottom of the application.
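The cost formula quoted above can be made concrete. This sketch reads task_i as the concurrency of task i, t_i as its duration in minutes, and pt as the price per concurrent task-minute; that reading and all names are my assumptions.

```rust
// Sketch of: Costs = (task1*t1 + task2*t2 + ... + taskn*tn) * pt
// (concurrency, minutes) pairs per task; pt = price per task-minute.
// The interpretation of the formula's symbols is an assumption.

fn computing_cost(tasks: &[(u64, u64)], price_per_task_minute: u64) -> u64 {
    let task_minutes: u64 = tasks
        .iter()
        .map(|&(concurrency, minutes)| concurrency * minutes)
        .sum();
    task_minutes * price_per_task_minute
}

fn main() {
    // Two tasks: 4 workers for 30 minutes and 2 workers for 10 minutes,
    // at 3 units per task-minute: (120 + 20) * 3 = 420.
    println!("cost: {}", computing_cost(&[(4, 30), (2, 10)], 3));
}
```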

  • How you will guarantee the legality and correctness of the data (simply claiming this will be solved through DAO is not enough).
There is a Checker Node in the Task Oracle to check task read permissions. For correctness, we currently use the local pin of IPFS to ensure the data is not collected locally, and we check whether the hash matches the one registered in the DataGraph.
  • How the registration tool could prevent conflicts with existing data as per Sebastian's comment.
The DataGraph will be a well-designed catalog for personal data. The registration tool can check for conflicts with other data using the data properties in the metadata and the data hash. @alxs @semuelle please refer to the update.
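The hash check against the DataGraph registration can be sketched as below. This is a hypothetical illustration; `DefaultHasher` stands in for a real cryptographic hash such as SHA-256, and all names are assumptions rather than the actual Checker Node design.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

// Hypothetical sketch: before a task may read a dataset, the hash of
// the locally pinned data is compared with the digest registered in
// the DataGraph. DefaultHasher is a stand-in for a cryptographic hash.

fn digest(data: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    data.hash(&mut h);
    h.finish()
}

struct DataGraph {
    registered: HashMap<String, u64>, // dataset id -> registered digest
}

impl DataGraph {
    /// The task proceeds only when the local data still matches its
    /// registration; tampered or unregistered data fails the check.
    fn verify_for_task(&self, dataset_id: &str, local_data: &[u8]) -> bool {
        self.registered
            .get(dataset_id)
            .map_or(false, |&h| h == digest(local_data))
    }
}
```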
  • ...and a long list of further inconsistencies and inaccuracies.
    Sorry for the reading experience. I will recheck, and I would be very grateful if you let us know about any inconsistency.
    See the Ruby Protocol application for a very good example of a proposal for a project with similar aim.

Awesome, it seems to be a privacy computing project, but I am not sure why it targets personal data; it seems suitable for non-personal data too. I hope it could become a compute node for DataDEX. I will follow its progress and would like to learn from its scheme.

  • Your answers to previous comments are generally vague and do not provide concrete answers to the mentioned challenges.

I will improve this.

  • I also think the idea of creating a separate token for each dataset is too convoluted and even unnecessary. I don't see why it wouldn't be enough to charge per computation since this would also lead to larger returns for datasets in higher demand.

DataDEX is designed for high-quality, high-volume datasets, not just a large number of datasets. I have added a reference: Google built one high-quality dataset distributed to millions of users' mobile phones for next-word model training. So I believe creating a token for a dataset of great value makes sense.

  • The Alita repos have been dead for a while and only contain significant commits by one person. I do not think this is a strong reference.

There are 3-5 engineers working on the Alita project. Some components are not open source yet, but will be soon.

  • Your future plans are requirements rather than plans and lack an appropriate strategy to achieve those.

Thanks for your advice. I will continuously add specifics to the application.

add Task Oracle specifics
@alxs
Contributor

alxs commented Apr 1, 2021

Thank you for your clarifications @gravity-link. Your first point in particular is valid, and I had misunderstood that aspect of your application. However, I don't see how any of your updates or other answers address my concerns, and I stand by my decision. Others might be convinced, though; let's give the application a bit more time.

add Scenario and external reference URLs
@gravity-link
Author

Hi @alxs, @semuelle,
Regarding the issue of creating a unique token for each dataset, I want to share some news. Google Chrome has created a new tracking method called FLoC, which means everyone's browsing history is added to the same dataset, which is very large and valuable. We would like to invite such great dataset makers to operate high-quality, high-value-added data, not dataset flea markets. From this perspective, a token per dataset makes sense.

@alxs
Contributor

alxs commented Apr 26, 2021

Closing, since there seems to be no interest from other members of the committee either. As previously stated, there are a number of issues you haven't addressed yet, and the proposal lacks technical details; neither has been improved in your last update.

Feel free to reopen an application when you're further along with your project and are able to address these issues.

@alxs alxs closed this Apr 26, 2021
alxs pushed a commit that referenced this pull request Jul 20, 2021
Open Square Network - BlockChain Based Crowdsourcing and Reputation Platform