kamu-data/kamu-client-python

About

Python client library for Kamu.

Start with the kamu-cli repo if you are not familiar with the project.

Installing

Install the library:

pip install kamu

Consider installing with extra features:

pip install "kamu[jupyter-autoviz,jupyter-sql,spark]"

Extras

  • jupyter-autoviz - Jupyter auto-viz for Pandas data frames
  • jupyter-sql - Jupyter %%sql cell magic
  • spark - extra libraries temporarily required to communicate with Spark engine

Using in plain Python scripts

import kamu

con = kamu.connect("grpc+tls://node.demo.kamu.dev:50050")

# Executes query on the node and returns result as Pandas DataFrame
df = con.query(
    """
    select
        event_time, open, close, volume
    from 'kamu/co.alphavantage.tickers.daily.spy'
    where from_symbol = 'spy' and to_symbol = 'usd'
    order by event_time
    """
)

print(df)

By default the connection uses the DataFusion engine with a Postgres-like SQL dialect.

The client library is based on the modern ADBC standard, and the underlying connection can be used directly with other libraries that support ADBC data sources:

import kamu
import pandas

con = kamu.connect("grpc+tls://node.demo.kamu.dev:50050")

df = pandas.read_sql_query(
    "select 1 as x",
    con.as_adbc(),
)

Authentication

You can supply an access token via the token parameter:

kamu.connect("grpc+tls://node.demo.kamu.dev:50050", token="<access-token>")

When a token is not provided, the library authenticates as an anonymous user. If the node allows anonymous access, the client is assigned a session token during the handshake procedure and uses it for all subsequent requests.
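For example, the token can be read from the environment rather than hard-coded. This is a sketch: the variable name KAMU_ACCESS_TOKEN is an arbitrary choice for illustration, not a library convention, and it assumes passing token=None behaves the same as omitting the parameter.

```python
import os

# Read the access token from the environment; returns None when unset.
# (KAMU_ACCESS_TOKEN is an arbitrary name chosen for this example.)
token = os.environ.get("KAMU_ACCESS_TOKEN")

# Assuming token=None falls back to anonymous authentication,
# the same call works for both authenticated and anonymous sessions:
# con = kamu.connect("grpc+tls://node.demo.kamu.dev:50050", token=token)
```

This keeps credentials out of source control and notebooks shared with others.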

Using in Jupyter

Load the extension in your notebook:

%load_ext kamu

Create connection:

con = kamu.connect("grpc+tls://node.demo.kamu.dev:50050")

The extension provides a convenient %%sql magic:

%%sql
select
    event_time, open, close, volume
from 'kamu/co.alphavantage.tickers.daily.spy'
where from_symbol = 'spy' and to_symbol = 'usd'
order by event_time

The above is equivalent to:

con.query("...")

To save the query result into a variable use:

%%sql -o df
select * from x

The above is equivalent to:

df = con.query("...")
df

To silence the output, add -q:

%%sql -o df -q
select * from x

The kamu extension automatically registers the autovizwidget, offering several options to visualize your data frames.

[Screenshot: Jupyter extension]

Other Notebook Environments

This library should work with most Python-based notebook environments.

Here's an example Google Colab Notebook.

Serving data from a local Kamu workspace

If you have kamu-cli installed, you can serve data directly from a local workspace like so:

con = kamu.connect("file:///path/to/workspace")

This will automatically start a kamu sql server sub-process and connect to it using an appropriate protocol.

Use file:// to start the server in the current directory.
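An absolute file:// URL can be built with pathlib instead of spelling it out by hand. This is a sketch; the workspace location below is hypothetical.

```python
from pathlib import Path

# Hypothetical workspace location; replace with your own.
workspace = Path.home() / "my-kamu-workspace"

# Path.as_uri() requires an absolute path and produces a file:// URL,
# e.g. "file:///home/user/my-kamu-workspace".
url = workspace.resolve().as_uri()

# con = kamu.connect(url)
```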

Using with Spark

You can specify a different engine when connecting:

con = kamu.connect("http://livy:8888", engine="spark")

Note that Spark connectivity currently relies on the Livy HTTP gateway, but in the future it will be unified under ADBC.

You can also provide extra configuration to the connection:

con = kamu.connect(
    "http://livy:8888",
    engine="spark",
    connection_params=dict(
        driver_memory="1000m",
        executor_memory="2000m",
    ),
)