feat: High performance pandas integration. (#24)
amunra authored Jan 4, 2023
1 parent d9fa199 commit ec28b97
Showing 46 changed files with 11,360 additions and 399 deletions.
12 changes: 11 additions & 1 deletion .gitignore
@@ -1,12 +1,22 @@
src/questdb/ingress.html
src/questdb/ingress.c
src/questdb/*.html
rustup-init.exe

# Linux Perf profiles
perf.data*
perf/*.svg

# Atheris Crash/OOM and other files
fuzz-artifact/

# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# Parquet files generated as part of example runs
*.parquet

# C extensions
*.so

6 changes: 5 additions & 1 deletion .vscode/settings.json
@@ -1,3 +1,7 @@
{
"esbonio.sphinx.confDir": ""
"esbonio.sphinx.confDir": "",
"cmake.configureOnOpen": false,
"files.associations": {
"ingress_helper.h": "c"
}
}
60 changes: 57 additions & 3 deletions CHANGELOG.rst
@@ -2,9 +2,47 @@
Changelog
=========

1.1.0 (2023-01-04)
------------------

Features
~~~~~~~~

* High-performance ingestion of `Pandas <https://pandas.pydata.org/>`_
  dataframes into QuestDB via ILP.
  We now support most Pandas column types. The logic is implemented in native
  code and is orders of magnitude faster than iterating the dataframe
  in Python and calling the ``Buffer.row()`` or ``Sender.row()`` methods: the
  ``Buffer`` can be written from Pandas at hundreds of MiB/s per CPU core.
  The new ``dataframe()`` method continues to work with the ``auto_flush``
  feature.
  See the API documentation and examples for the new ``dataframe()`` method,
  available on both the ``Sender`` and ``Buffer`` classes.

* New ``TimestampNanos.now()`` and ``TimestampMicros.now()`` methods.
  *These are the new recommended way of getting the current timestamp.*

* The Python GIL is now released during calls to ``Sender.flush()`` and when
  ``auto_flush`` is triggered. This should improve throughput when using the
  ``Sender`` from multiple threads.

Errata
~~~~~~

* In previous releases the documentation for the ``from_datetime()`` methods of
  the ``TimestampNanos`` and ``TimestampMicros`` types recommended calling
  ``datetime.datetime.utcnow()`` to get the current timestamp. This is incorrect
  as it will (confusingly) return a naive object that is interpreted as local
  time instead of UTC.
  This documentation has been corrected and now recommends calling
  ``datetime.datetime.now(tz=datetime.timezone.utc)`` or (more efficiently) the
  new ``TimestampNanos.now()`` and ``TimestampMicros.now()`` methods.
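
The pitfall described in this erratum can be reproduced with the standard library alone. The sketch below uses a fixed wall-clock value in place of the real current time so that the result is repeatable. The variable names are illustrative and not part of the client library, and the source does not say how ``TimestampNanos.now()`` is implemented; ``time.time_ns()`` appears only as a standard-library way of reading the epoch in nanoseconds.

```python
import datetime
import time

# A naive datetime, like the ones produced by datetime.datetime.utcnow():
# it holds UTC wall-clock fields but carries no timezone information.
naive = datetime.datetime(2023, 1, 4, 12, 0, 0)

# An aware datetime, like the ones produced by
# datetime.datetime.now(tz=datetime.timezone.utc).
aware = naive.replace(tzinfo=datetime.timezone.utc)

# timestamp() on an aware object is unambiguous.
aware_epoch = aware.timestamp()

# timestamp() on a naive object assumes *local* time, so the result is
# shifted by the machine's UTC offset and only matches on a UTC machine.
naive_epoch = naive.timestamp()
skew_seconds = aware_epoch - naive_epoch  # 0.0 only when local time is UTC

# Standard-library way to read "now" as nanoseconds since the Unix epoch.
now_ns = time.time_ns()
```

On a machine whose local timezone is not UTC, ``skew_seconds`` is non-zero. That is the error that would silently end up in ingested timestamps.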

1.0.2 (2022-10-31)
------------------

Features
~~~~~~~~

* Support for Python 3.11.
* Updated to version 2.1.1 of the ``c-questdb-client`` library:

@@ -14,20 +52,30 @@ Changelog
1.0.1 (2022-08-16)
------------------

Features
~~~~~~~~

* As a matter of convenience, the ``Buffer.row`` method can now take ``None`` column
  values. This has the same semantics as skipping the column altogether.
  Closes `#3 <https://github.com/questdb/py-questdb-client/issues/3>`_.

Bugfixes
~~~~~~~~

* Fixed a major bug where Python ``int`` and ``float`` types were handled with
  32-bit instead of 64-bit precision. This caused certain ``int`` values to be
  rejected and other ``float`` values to be rounded incorrectly.
  Closes `#13 <https://github.com/questdb/py-questdb-client/issues/13>`_.
* As a matter of convenience, the ``Buffer.row`` method can now take ``None`` column
  values. This has the same semantics as skipping the column altogether.
  Closes `#3 <https://github.com/questdb/py-questdb-client/issues/3>`_.
* Fixed a minor bug where an error during auto-flush caused a second clean-up error.
  Closes `#4 <https://github.com/questdb/py-questdb-client/issues/4>`_.
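
To illustrate the class of bug fixed in #13, the sketch below uses Python's ``struct`` module to mimic 32-bit and 64-bit storage. It only illustrates the symptom (rejected integers, rounded floats) and is not the client's actual code path. The ``roundtrip`` helper is hypothetical.

```python
import struct


def roundtrip(fmt, value):
    """Pack a value into a fixed-width format and unpack it again."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


# A 64-bit float ('d') preserves a Python float exactly; 32-bit ('f') rounds it.
f64 = roundtrip('<d', 0.1)
f32 = roundtrip('<f', 0.1)

# A 64-bit signed int ('q') accepts values that a 32-bit signed int ('i') rejects.
big = 2**31
i64 = roundtrip('<q', big)
try:
    roundtrip('<i', big)
    i32_rejected = False
except struct.error:
    i32_rejected = True
```

Python's own ``int`` is arbitrary-precision and its ``float`` is a 64-bit double, so narrowing either to 32 bits loses range or precision.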


1.0.0 (2022-07-15)
------------------

Features
~~~~~~~~

* First stable release.
* Insert data into QuestDB via ILP.
* Sender and Buffer APIs.
@@ -38,6 +86,9 @@ Changelog
0.0.3 (2022-07-14)
------------------

Features
~~~~~~~~

* Initial set of features to connect to the database.
* ``Buffer`` and ``Sender`` classes.
* First release where ``pip install questdb`` should work.
@@ -46,4 +97,7 @@ Changelog
0.0.1 (2022-07-08)
------------------

Features
~~~~~~~~

* First release on PyPI.
16 changes: 16 additions & 0 deletions README.rst
@@ -34,6 +34,22 @@ The latest version of the library is 1.0.2.
            columns={'temperature': 20.0, 'humidity': 0.5})
        sender.flush()

You can also send Pandas dataframes:

.. code-block:: python

    import pandas as pd
    from questdb.ingress import Sender

    df = pd.DataFrame({
        'id': pd.Categorical(['toronto1', 'paris3']),
        'temperature': [20.0, 21.0],
        'humidity': [0.5, 0.6],
        'timestamp': pd.to_datetime(['2021-01-01', '2021-01-02'])})

    with Sender('localhost', 9009) as sender:
        sender.dataframe(df, table_name='sensors')

Docs
====
12 changes: 0 additions & 12 deletions TODO.rst
@@ -6,8 +6,6 @@ TODO
Build Tooling
=============

* **[HIGH]** Transition to Azure, move Linux arm to ARM pipeline without QEMU.

* **[MEDIUM]** Automate Apple Silicon as part of CI.

* **[LOW]** Release to PyPI from CI.
@@ -19,13 +17,3 @@ Docs
* **[MEDIUM]** Examples should be tested as part of the unit tests (as they
are in the C client). This is to ensure they don't "bit rot" as the code
changes.

* **[MEDIUM]** Document on a per-version basis.

Development
===========

* **[HIGH]** Implement ``tabular()`` API in the buffer.

* **[MEDIUM]** Implement ``pandas()`` API in the buffer.
*This can probably wait for a future release.*
16 changes: 8 additions & 8 deletions ci/cibuildwheel.yaml
@@ -68,7 +68,7 @@ stages:
- bash: |
set -o errexit
python3 -m pip install --upgrade pip
pip3 install cibuildwheel==2.11.1
python3 -m pip install cibuildwheel==2.11.2
displayName: Install dependencies
- bash: cibuildwheel --output-dir wheelhouse .
displayName: Build wheels
@@ -83,7 +83,7 @@ stages:
- bash: |
set -o errexit
python3 -m pip install --upgrade pip
pip3 install cibuildwheel==2.11.1
python3 -m pip install cibuildwheel==2.11.2
displayName: Install dependencies
- bash: cibuildwheel --output-dir wheelhouse .
displayName: Build wheels
@@ -100,7 +100,7 @@ stages:
- bash: |
set -o errexit
python3 -m pip install --upgrade pip
pip3 install cibuildwheel==2.11.1
python3 -m pip install cibuildwheel==2.11.2
displayName: Install dependencies
- bash: cibuildwheel --output-dir wheelhouse .
displayName: Build wheels
@@ -117,7 +117,7 @@ stages:
- bash: |
set -o errexit
python3 -m pip install --upgrade pip
pip3 install cibuildwheel==2.11.1
python3 -m pip install cibuildwheel==2.11.2
displayName: Install dependencies
- bash: cibuildwheel --output-dir wheelhouse .
displayName: Build wheels
@@ -134,7 +134,7 @@ stages:
- bash: |
set -o errexit
python3 -m pip install --upgrade pip
pip3 install cibuildwheel==2.11.1
python3 -m pip install cibuildwheel==2.11.2
displayName: Install dependencies
- bash: cibuildwheel --output-dir wheelhouse .
displayName: Build wheels
@@ -151,7 +151,7 @@ stages:
- bash: |
set -o errexit
python3 -m pip install --upgrade pip
python3 -m pip install cibuildwheel==2.11.1
python3 -m pip install cibuildwheel==2.11.2
displayName: Install dependencies
- bash: cibuildwheel --output-dir wheelhouse .
displayName: Build wheels
@@ -165,8 +165,8 @@ stages:
- task: UsePythonVersion@0
- bash: |
set -o errexit
python -m pip install --upgrade pip
pip install cibuildwheel==2.11.1
python3 -m pip install --upgrade pip
python3 -m pip install cibuildwheel==2.11.2
displayName: Install dependencies
- bash: cibuildwheel --output-dir wheelhouse .
displayName: Build wheels
74 changes: 74 additions & 0 deletions ci/pip_install_deps.py
@@ -0,0 +1,74 @@
import sys
import subprocess
import shlex
import textwrap
import platform


class UnsupportedDependency(Exception):
    pass


def pip_install(package):
    args = [
        sys.executable,
        '-m', 'pip', 'install',
        '--upgrade',
        '--only-binary', ':all:',
        package]
    args_s = ' '.join(shlex.quote(arg) for arg in args)
    sys.stderr.write(args_s + '\n')
    res = subprocess.run(
        args,
        stderr=subprocess.STDOUT,
        stdout=subprocess.PIPE)
    if res.returncode == 0:
        return
    output = res.stdout.decode('utf-8')
    if 'Could not find a version that satisfies the requirement' in output:
        raise UnsupportedDependency(output)
    else:
        sys.stderr.write(output + '\n')
        sys.exit(res.returncode)


def try_pip_install(package):
    try:
        pip_install(package)
    except UnsupportedDependency as e:
        msg = textwrap.indent(str(e), ' ' * 8)
        sys.stderr.write(f' Ignored unsatisfiable dependency:\n{msg}\n')


def ensure_timezone():
    try:
        import zoneinfo
        if platform.system() == 'Windows':
            pip_install('tzdata')  # for zoneinfo
    except ImportError:
        pip_install('pytz')


def main():
    ensure_timezone()
    try_pip_install('fastparquet>=2022.12.0')
    try_pip_install('pandas')
    try_pip_install('numpy')
    try_pip_install('pyarrow')

    on_linux_is_glibc = (
        (not platform.system() == 'Linux') or
        (platform.libc_ver()[0] == 'glibc'))
    is_64bits = sys.maxsize > 2**32
    is_cpython = platform.python_implementation() == 'CPython'
    if on_linux_is_glibc and is_64bits and is_cpython:
        # Ensure that we've managed to install the expected dependencies.
        import pandas
        import numpy
        import pyarrow
        if sys.version_info >= (3, 8):
            import fastparquet


if __name__ == "__main__":
    main()
4 changes: 3 additions & 1 deletion ci/run_tests_pipeline.yaml
@@ -28,7 +28,9 @@ stages:
submodules: true
- task: UsePythonVersion@0
- script: python3 --version
- script: python3 -m pip install cython
- script: |
python3 -m pip install cython
python3 ci/pip_install_deps.py
displayName: Installing Python dependencies
- script: python3 proj.py build
displayName: "Build"
6 changes: 5 additions & 1 deletion dev_requirements.txt
@@ -1,8 +1,12 @@
setuptools>=45.2.0
Cython>=0.29.32
wheel>=0.34.2
cibuildwheel>=2.11.1
cibuildwheel>=2.11.2
Sphinx>=5.0.2
sphinx-rtd-theme>=1.0.0
twine>=4.0.1
bump2version>=1.0.1
pandas>=1.3.5
numpy>=1.21.6
pyarrow>=10.0.1
fastparquet>=2022.12.0