Introduce eth_getBlockReceiptsByHash and eth_getBlockReceiptsByNumber JSON RPC methods #1300

Closed
jakublipinski wants to merge 2 commits into
ethereum:master from
jakublipinski:eip-introduce-eth_getblockreceiptsbyhash-and-eth_getblockreceiptsbynumber-methods

@jakublipinski

This proposal gives clients an easy way to get the receipts for all transactions in a particular block.

Introduce eth_getBlockReceiptsByHash and eth_getBlockReceiptsByNumber JSON RPC methods - first draft
@MicahZoltu

You can fetch multiple receipts in a single call by batching requests (which I believe both Geth and Parity support). There still may be value in this due to the way clients process batch requests, though it feels like they could just optimize for batched requests of this nature?
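
For illustration, a minimal sketch of the batching workaround described above: one `eth_getTransactionReceipt` call per transaction hash, sent as a single JSON-RPC 2.0 batch array. The helper names `build_receipt_batch` and `match_responses` are hypothetical, and the HTTP transport is left out on purpose.

```python
def build_receipt_batch(tx_hashes):
    """Build a JSON-RPC 2.0 batch with one eth_getTransactionReceipt call per hash."""
    return [
        {
            "jsonrpc": "2.0",
            "id": i,  # the id is used to pair each response with its request
            "method": "eth_getTransactionReceipt",
            "params": [tx_hash],
        }
        for i, tx_hash in enumerate(tx_hashes)
    ]


def match_responses(batch, responses):
    """Return results in request order; batch responses may arrive in any order."""
    by_id = {resp["id"]: resp.get("result") for resp in responses}
    return [by_id.get(req["id"]) for req in batch]
```

The whole list returned by `build_receipt_batch` would be posted as one request body, so the node sees a single HTTP round trip even though it still resolves each receipt separately.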

@jakublipinski
Author

As per @MicahZoltu's answer and my analysis, it seems that batching is the way to go. Closing the PR.

@jakublipinski jakublipinski deleted the eip-introduce-eth_getblockreceiptsbyhash-and-eth_getblockreceiptsbynumber-methods branch August 11, 2018 14:55
@medvedev1088

medvedev1088 commented Aug 19, 2018

Linking these related pull requests and issues:
openethereum/parity-ethereum#9075
ethereum/go-ethereum#17044
openethereum/parity-ethereum#9126

Even with JSON RPC batching, retrieving receipts is very slow. I guess the issue is that the raw block needs to be read from disk (not sure caching helps here) and parsed for every receipt in the batch. This adds significant overhead.

In my tests, exporting 6k blocks with transactions takes 2 minutes. Exporting all the receipts from those blocks in batches of 100 takes 20 minutes. The amount of data is only about 2 times bigger: the receipts are 500MB, while blocks+transactions are 270MB.
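
To make the comparison concrete, here is the arithmetic behind those figures as a short sketch. It uses only the numbers quoted above; the variable names are illustrative.

```python
# Figures quoted in the comment above.
blocks_seconds = 2 * 60      # exporting 6k blocks with transactions
receipts_seconds = 20 * 60   # exporting receipts for the same blocks
blocks_mb = 270
receipts_mb = 500

time_ratio = receipts_seconds / blocks_seconds   # 10.0: receipts take 10x longer
data_ratio = receipts_mb / blocks_mb             # ~1.85: for under 2x the data

blocks_throughput = blocks_mb / blocks_seconds        # 2.25 MB/s
receipts_throughput = receipts_mb / receipts_seconds  # ~0.42 MB/s

# Per megabyte exported, receipts are roughly 5.4x slower than blocks.
slowdown_per_mb = blocks_throughput / receipts_throughput
```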

To be 100% sure I'd need to do some benchmarking using the code from this PR openethereum/parity-ethereum#9126

@MicahZoltu

My comment wasn't intended to suggest this isn't a good idea, just to provide a temporary workaround. :) I recently started doing some work that requires this functionality and fetching all of the receipts individually is painfully slow. 😢

@jakublipinski
Author

@MicahZoltu you may like my comparison. I'm happy to exchange experiences and ideas. I suffer from the same issue.

@tjayrush

@jakublipinski and @MicahZoltu For my (and others') education, why exactly did you want to pull those receipts? What piece of data were you looking for exactly? I'm curious about the comment above about caching. With QuickBlocks, I've found significant performance speed-ups with caching depending on what exactly I'm looking for and how I cache the data.

@jakublipinski
Author

@tjayrush For each Ethereum address I want to know how many particular ERC20 tokens it owns. In order to do that, I scan all the receipts for each block and look for the Transfer event there.
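
The scan described above could look roughly like the sketch below. It assumes receipts shaped like `eth_getTransactionReceipt` results (a `logs` list whose entries carry `address`, `topics`, and `data`); the function names are hypothetical. `TRANSFER_TOPIC` is the keccak256 hash of `Transfer(address,address,uint256)`.

```python
# keccak256("Transfer(address,address,uint256)")
TRANSFER_TOPIC = "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef"


def topic_to_address(topic):
    """An indexed address is left-padded to 32 bytes; keep the last 20 bytes."""
    return "0x" + topic[-40:]


def extract_transfers(receipts):
    """Collect ERC20 Transfer events from a list of receipt dicts."""
    transfers = []
    for receipt in receipts:
        for log in receipt.get("logs", []):
            topics = log.get("topics", [])
            # ERC20 Transfer indexes `from` and `to`, so it has exactly 3 topics.
            if len(topics) == 3 and topics[0].lower() == TRANSFER_TOPIC:
                transfers.append({
                    "token": log["address"],
                    "from": topic_to_address(topics[1]),
                    "to": topic_to_address(topics[2]),
                    "value": int(log["data"], 16),
                })
    return transfers
```

Requiring exactly three topics skips events that share the signature hash but index a third argument, as ERC721 transfers do.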

@medvedev1088

medvedev1088 commented Aug 22, 2018

@jakublipinski have you tried eth_getFilterLogs? It's really fast (compared to scraping receipts). You can try this simple script that relies on eth_getFilterLogs to export all ERC20 transfers: https://github.com/medvedev1088/ethereum-etl#export_token_transferspy
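
As a rough sketch of the filter-based approach (not the linked script itself), the two JSON-RPC payloads involved are shown below: `eth_newFilter` installs a filter for the Transfer topic over a block range, and `eth_getFilterLogs` then fetches every matching log for the returned filter id. The helper names and default request ids are assumptions for illustration.

```python
# keccak256("Transfer(address,address,uint256)")
TRANSFER_TOPIC = "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef"


def new_transfer_filter_request(from_block, to_block, request_id=1):
    """Payload for eth_newFilter matching Transfer events in a block range."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "eth_newFilter",
        "params": [{
            "fromBlock": hex(from_block),  # JSON-RPC quantities are hex strings
            "toBlock": hex(to_block),
            "topics": [TRANSFER_TOPIC],    # first topic is the event signature
        }],
    }


def get_filter_logs_request(filter_id, request_id=2):
    """Payload for eth_getFilterLogs, using the id returned by eth_newFilter."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "eth_getFilterLogs",
        "params": [filter_id],
    }
```

The node does the topic matching itself, so only the matching logs cross the wire rather than every receipt in the range.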

@medvedev1088

@tjayrush another piece of data I found useful in receipts is gasUsed (useful for gas analysis) and contractAddress (useful for extracting most of the contracts) fields.
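
A small sketch of pulling those two fields out of a list of receipts, assuming the usual JSON-RPC shape in which `gasUsed` is a hex quantity and `contractAddress` is null unless the transaction created a contract. The function name is hypothetical.

```python
def summarize_receipts(receipts):
    """Return (total gas used, addresses of contracts created) for the receipts."""
    total_gas = sum(int(r["gasUsed"], 16) for r in receipts)
    # contractAddress is None (JSON null) for ordinary, non-creating transactions.
    created = [r["contractAddress"] for r in receipts if r.get("contractAddress")]
    return total_gas, created
```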

@jakublipinski
Author

@medvedev1088 You made my day! I've just sped up my script 5x by using eth_getFilterLogs instead of async `get_TransactionReceipt`. Thank you for your suggestion!

Do you have a suggestion on how to speed up querying the ERC20 tokens for the balance of a particular address? I currently asynchronously execute eth_call with balanceOf(address) as a parameter.
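
For reference, a minimal sketch of what such an `eth_call` carries: the call data is the 4-byte selector of `balanceOf(address)` (`0x70a08231`) followed by the holder address left-padded to 32 bytes. The helper names are hypothetical, and querying against the `latest` block is an assumption.

```python
BALANCE_OF_SELECTOR = "0x70a08231"  # first 4 bytes of keccak256("balanceOf(address)")


def encode_balance_of(holder):
    """ABI-encode a balanceOf(holder) call: selector + 32-byte padded address."""
    addr = holder[2:] if holder.startswith("0x") else holder
    if len(addr) != 40:
        raise ValueError("expected a 20-byte hex address")
    return BALANCE_OF_SELECTOR + addr.lower().rjust(64, "0")


def balance_of_request(token, holder, request_id=1):
    """JSON-RPC payload for eth_call against the token contract."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "eth_call",
        "params": [{"to": token, "data": encode_balance_of(holder)}, "latest"],
    }


def decode_balance(result_hex):
    """The call returns a single uint256 as a hex string."""
    return int(result_hex, 16)
```

Because each payload is a plain dict, many of them can be placed in one JSON-RPC batch array instead of being sent one at a time.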

@medvedev1088

One way of doing it is to dump the transfers to an SQL database, then run a query to get all the balances. An example query is in this article: https://medium.com/@medvedev1088/ethereum-blockchain-on-google-bigquery-283fb300f579. This approach works only for tokens that emit the Transfer event on everything that changes balances, including token minting and burning. Can't think of anything else other than what you're doing.
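
The same aggregation can be sketched in plain Python rather than SQL (this is not the query from the article): add each transfer's value to the recipient and subtract it from the sender. The sketch assumes mints and burns appear as transfers from or to the zero address, which is a common convention but not guaranteed for every token; the function name and dict keys are hypothetical.

```python
from collections import defaultdict

ZERO_ADDRESS = "0x" + "0" * 40  # assumed mint source / burn destination


def balances_from_transfers(transfers):
    """Sum incoming minus outgoing value per (token, holder) pair."""
    balances = defaultdict(int)
    for t in transfers:
        balances[(t["token"], t["to"])] += t["value"]
        balances[(t["token"], t["from"])] -= t["value"]
    # Drop the zero address, which only exists here to model mints and burns.
    return {key: value for key, value in balances.items() if key[1] != ZERO_ADDRESS}
```

A token that changes balances without emitting Transfer would produce wrong totals here, which is the limitation noted above.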

@MicahZoltu

I'm pulling all receipts for all blocks and putting them in a data store that is easier to run ad hoc queries against.

I also need them when doing stream processing of the blockchain for things like gas usage, transaction failure inspection, etc.


4 participants