[Enhancement]: Data Exchange Optimization #11

Open
lovelynewlife opened this issue Dec 15, 2023 · 0 comments

Labels: enhancement (New feature or request)


@lovelynewlife (Owner)

Background

Because the structure and memory layout of the intermediate data used by the vanilla OB query engine do not match those required by inference engines (e.g., OB datums vs. NumPy ndarrays), data must be transformed from the database representation into the data type the inference engine expects. For now, we support OB tuple datums -> NumPy transformation as the default option. However, the current implementation has several issues that may degrade query efficiency:

  • Repeated target data allocation and construction: "target data" means the argument data transferred to the Python world. Within the **Prediction Operator Context**, we can obtain a contiguous block of memory and have full control over it, yet the target data objects are still allocated and constructed on every query iteration.
  • Inefficient string data exchange: string transformation is tricky, possibly because of differences in memory layout or encoding. Depending on the inference engine being called, strings may be copied redundantly; e.g., calling ONNXRuntime with a NumPy string ndarray involves two string transformation steps.
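To make the two costs above concrete, here is a minimal Python sketch of the kind of per-iteration conversion described, under simplifying assumptions. `fetch_batch` and `naive_convert` are hypothetical names, the rows are plain Python tuples rather than real OB datums, and the second string conversion models an engine that consumes Python `str` objects. It illustrates the pattern only; it is not the project's actual code path.

```python
import numpy as np

# Hypothetical stand-in for one batch of OB tuple datums: a list of rows,
# each (float, float, UTF-8 bytes). This is NOT the real OB datum layout.
def fetch_batch(batch_id, n_rows=4):
    return [(float(i), float(i) * 2.0, f"item-{batch_id}-{i}".encode("utf-8"))
            for i in range(n_rows)]

def naive_convert(rows):
    """Per-iteration conversion: every call allocates brand-new arrays."""
    # Numeric columns: a fresh ndarray is allocated and filled each time.
    numeric = np.array([(r[0], r[1]) for r in rows], dtype=np.float64)
    # String conversion 1: decode UTF-8 bytes into a fixed-width NumPy
    # unicode array (NumPy stores these as UTF-32 internally).
    as_unicode = np.array([r[2].decode("utf-8") for r in rows])
    # String conversion 2 (assumed engine requirement): copy into an object
    # array of Python str, a second full pass over the string data.
    as_object = as_unicode.astype(object)
    return numeric, as_unicode, as_object

for batch_id in range(3):  # each loop models one query iteration
    # New target objects are allocated on every iteration.
    numeric, as_unicode, as_object = naive_convert(fetch_batch(batch_id))

print(numeric.shape, as_unicode.dtype, as_object.dtype)
```

Each pass through the loop rebuilds `numeric`, `as_unicode`, and `as_object` from scratch, and the string column is materialized twice, which is the overhead the flame graphs below point to.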

Flame Graph Reference

Using Sklearn

[Flame graph: perf_sklearn_ob]

Using ONNXRuntime

[Flame graph: perf_onnx_compiled_python]

Solution for Each Issue

  • Construct the target data directly on the buffer memory.
  • Provide a unified intermediate string data view for string exchange.
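The two proposed solutions can be sketched in Python under stated assumptions: a `bytearray` stands in for the contiguous buffer owned by the Prediction Operator Context, and the string view uses an Arrow-style layout (one contiguous UTF-8 data buffer plus an offsets array), which is one possible design rather than the one the project has settled on. `buffer`, `fill_batch`, and `StringView` are hypothetical names.

```python
import numpy as np

N_ROWS, N_COLS = 4, 2

# Stand-in for the contiguous memory the Prediction Operator Context controls
# (assumption: a Python bytearray here; in the engine it would be native memory).
buffer = bytearray(N_ROWS * N_COLS * np.dtype(np.float64).itemsize)

# Solution 1: construct the target ndarray ONCE as a zero-copy view over the buffer.
target = np.ndarray(shape=(N_ROWS, N_COLS), dtype=np.float64, buffer=buffer)

def fill_batch(batch_id):
    """Hypothetical per-iteration step: write values in place, no reallocation."""
    rows = np.arange(N_ROWS, dtype=np.float64)
    target[:, 0] = batch_id * 10 + rows
    target[:, 1] = rows * 2.0

for batch_id in range(3):  # each loop models one query iteration
    fill_batch(batch_id)
    # An inference call would consume `target` here.

# Solution 2 (one possible design, Arrow-style): a unified string view made of
# one contiguous UTF-8 data buffer plus an offsets array, where string i spans
# data[offsets[i]:offsets[i + 1]].
class StringView:
    def __init__(self, data: bytes, offsets: np.ndarray):
        self.data = memoryview(data)
        self.offsets = offsets

    def __len__(self):
        return len(self.offsets) - 1

    def raw(self, i):
        """Zero-copy access to the UTF-8 bytes of string i."""
        return self.data[int(self.offsets[i]):int(self.offsets[i + 1])]

    def to_str(self, i):
        """Decode to a Python str only when a consumer actually needs one."""
        return bytes(self.raw(i)).decode("utf-8")

words = ["cat", "", "données"]
encoded = [w.encode("utf-8") for w in words]
offsets = np.zeros(len(encoded) + 1, dtype=np.int64)
offsets[1:] = np.cumsum([len(e) for e in encoded])
view = StringView(b"".join(encoded), offsets)
print(target[:, 0], view.to_str(2))
```

Because `target` is a view over `buffer`, each iteration only writes values; the ndarray object itself is built once. Likewise, `StringView.raw` hands out slices of the shared buffer, so decoding into Python `str` objects happens only if a particular inference engine requires it.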
lovelynewlife added the enhancement label on Dec 15, 2023
lovelynewlife pushed a commit that referenced this issue May 29, 2024
update:support load python code from .py file