PR - Add Delta Lake Backend for Cryptofeed #1054

Open: tommy-ca wants to merge 90 commits into master

@tommy-ca tommy-ca commented Sep 9, 2024

Add Delta Lake Backend for Cryptofeed

Description

This PR introduces a comprehensive Delta Lake backend for Cryptofeed. It adds the DeltaLakeCallback class and several data-specific subclasses to handle various types of cryptocurrency data. The new backend allows for efficient storage and retrieval of high-frequency trading data using Delta Lake technology.

Key Features and Improvements

  1. Implemented DeltaLakeCallback base class with configurable options for Delta Lake integration.
  2. Added support for batch processing and flushing of data.
  3. Implemented data transformation and validation methods to ensure data integrity.
  4. Added retry mechanism for write operations to improve reliability.
  5. Implemented table optimization and time travel features.
  6. Created specific Delta Lake callback classes for different data types (e.g., trades, funding, ticker, open interest, etc.).
  7. Added logging throughout the code for better debugging and monitoring.
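As an illustration of items 2 and 4, here is a minimal sketch of batching with retried flushes. It is not the code from this PR: `BatchBuffer`, `write_batch`, `batch_size`, `max_retries`, and `retry_delay` are hypothetical names, and the writer is injected as a plain callable so the sketch does not depend on any Delta Lake API.

```python
import time
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


class BatchBuffer:
    """Hypothetical helper: buffers records and writes them in batches with retries."""

    def __init__(self, write_batch: Callable[[List[Record]], None],
                 batch_size: int = 1000, max_retries: int = 3,
                 retry_delay: float = 1.0) -> None:
        self.write_batch = write_batch    # assumed: appends one batch to the table
        self.batch_size = batch_size
        self.max_retries = max_retries
        self.retry_delay = retry_delay
        self._pending: List[Record] = []

    def add(self, record: Record) -> None:
        """Queue a record and flush once the batch is full."""
        self._pending.append(record)
        if len(self._pending) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        """Write all pending records, retrying failed writes with a growing delay."""
        if not self._pending:
            return
        batch, self._pending = self._pending, []
        for attempt in range(1, self.max_retries + 1):
            try:
                self.write_batch(batch)
                return
            except Exception:
                if attempt == self.max_retries:
                    # Give up: restore the records so the caller can retry later.
                    self._pending = batch + self._pending
                    raise
                time.sleep(self.retry_delay * attempt)
```

In a real backend the injected callable would wrap the actual table write; keeping it injectable also makes the retry path easy to exercise with a fake writer.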

Checklist

  • Tested
  • Changelog updated
  • Tests run and pass
  • Flake8 run and all errors/warnings resolved
  • Contributors file updated (optional)

Additional Notes

This PR significantly enhances Cryptofeed's capabilities by adding a robust Delta Lake backend. It allows users to store and analyze cryptocurrency market data using Delta Lake's features such as ACID transactions, schema evolution, and time travel.

The implementation includes proper error handling, logging, and configuration options to make it adaptable to various use cases. Each data type (trades, funding, ticker, etc.) has its own specialized class to handle specific schema requirements.
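To make the per-data-type idea concrete, the following is a rough sketch of how a base class could carry a shared schema that each subclass extends. The class names, field names, and types are assumptions for illustration only and may not match the schemas defined in `cryptofeed/backends/deltalake.py`.

```python
from typing import Any, Dict


class BaseTransform:
    """Hypothetical base: checks required fields and coerces them to declared types."""

    # Assumed common columns; the PR's real schemas may differ.
    schema: Dict[str, type] = {"exchange": str, "symbol": str, "timestamp": float}

    def transform(self, raw: Dict[str, Any]) -> Dict[str, Any]:
        missing = [name for name in self.schema if raw.get(name) is None]
        if missing:
            raise ValueError(f"missing fields: {missing}")
        # Keep only declared columns, cast to the declared type.
        return {name: cast(raw[name]) for name, cast in self.schema.items()}


class TradeTransform(BaseTransform):
    """Adds the columns assumed for trade records."""
    schema = {**BaseTransform.schema, "side": str, "amount": float, "price": float}


class FundingTransform(BaseTransform):
    """Adds the column assumed for funding records."""
    schema = {**BaseTransform.schema, "rate": float}
```

Declaring the schema as a class attribute keeps each subclass down to the columns it adds, while validation and casting stay in one place.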

Before merging, we should ensure comprehensive testing of all Delta Lake operations, update the changelog to reflect this major feature addition, and run Flake8 to catch any remaining style issues.

Affected Components

  • cryptofeed/backends/deltalake.py

tommy-ca and others added 30 commits August 31, 2024 23:37
- Add DeltaLakeCallback class with support for various data types
- Implement partitioning, Z-ordering, and time travel features
- Add schema documentation for each data type
- Include Delta Lake dependencies in setup.py
- Create demo file for Delta Lake usage with S3 configuration
- Update extras_require in setup.py to include deltalake option
@tommy-ca
Author

tommy-ca commented Sep 9, 2024

Hi @bmoscon,

Thank you very much for the project. I built a Delta Lake backend and would like to request a review of this PR.

The code follows the Postgres and Kafka backends, with support for queued updates.

I have tested this backend with the cryptostore project in a test environment.

Thank you.

Br,

@bmoscon
Owner

bmoscon commented Sep 22, 2024

@tommy-ca thanks for the PR, let me review and give it a try and I'll merge it

2 participants