Merged
28 commits
4332d25
feat: comprehensive PostgreSQL compatibility improvements
iPeluwa Jun 26, 2025
de12d47
Streamline integration tests for better maintainability
iPeluwa Jun 26, 2025
df689cb
Fix clippy uninlined_format_args warnings
iPeluwa Jun 26, 2025
362dde2
Fix remaining clippy uninlined_format_args warnings
iPeluwa Jun 27, 2025
07df7d9
Implement major PostgreSQL compatibility improvements
iPeluwa Jun 27, 2025
880ce04
Add comprehensive transaction support + fix clippy warnings
iPeluwa Jun 27, 2025
f09487a
Comprehensive integration test suite enhancement
iPeluwa Jun 27, 2025
c32bccf
feat: Add enterprise-grade security features
iPeluwa Jun 27, 2025
8e28ff3
fix: Update format strings to use inline variable syntax
iPeluwa Jun 27, 2025
f447346
fix: Complete format string modernization for clippy
iPeluwa Jun 27, 2025
1c798cf
fix: Fix remaining format string warnings in CLI crate
iPeluwa Jun 27, 2025
f6c6796
Fix compilation errors in arrow-pg list encoder
iPeluwa Jun 27, 2025
3bbe20b
Fix CI integration test issues
iPeluwa Jun 27, 2025
2c5b550
Remove unused pytest import from test_transactions.py
iPeluwa Jun 28, 2025
ad9eaa6
Merge branch 'master' into master
iPeluwa Jun 28, 2025
afa0b4f
Cargo fmt
iPeluwa Jun 28, 2025
547b86f
fix: Remove unused imports from lib.rs
iPeluwa Jun 28, 2025
4f17d88
Address all maintainer feedback from @sunng87
iPeluwa Jun 29, 2025
3ba6e1e
FULLY address @sunng87 feedback: Remove custom startup handler
iPeluwa Jun 30, 2025
98ae79d
Polish final code quality and README
iPeluwa Jun 30, 2025
5300dbf
fix: Update format strings to use inline variable syntax
iPeluwa Jun 30, 2025
d0ee868
Merge pull request #1 from iPeluwa/enterprise-security-features
iPeluwa Jun 30, 2025
04efdd9
fix: Disable default features for tokio-rustls and use ring provider
iPeluwa Jun 30, 2025
bdfb239
Merge pull request #2 from iPeluwa/enterprise-security-features
iPeluwa Jun 30, 2025
a58a4cd
fix: Replace all console output with proper logging
iPeluwa Jun 30, 2025
c3825cd
Merge branch 'enterprise-security-features'
iPeluwa Jun 30, 2025
4d8ae10
FMT
iPeluwa Jun 30, 2025
40c610c
revert cli
iPeluwa Jun 30, 2025
4 changes: 3 additions & 1 deletion .github/workflows/ci.yml
@@ -70,7 +70,9 @@ jobs:
           override: true
       - run: |
           pip install psycopg
-      - run: ./tests-integration/test.sh
+      - run: |
+          cd tests-integration
+          ./test.sh

   msrv:
     name: MSRV
1 change: 1 addition & 0 deletions .gitignore
@@ -3,3 +3,4 @@
 .envrc
 .vscode
 .aider*
+/test_env
13 changes: 13 additions & 0 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

201 changes: 150 additions & 51 deletions README.md
@@ -2,29 +2,58 @@

![Crates.io Version](https://img.shields.io/crates/v/datafusion-postgres?label=datafusion-postgres)

Serving any [datafusion](https://datafusion.apache.org) `SessionContext` in
Postgres protocol. Available as a library and a cli tool.

This project is to add a [postgresql compatible access
layer](https://github.com/sunng87/pgwire) to the [Apache
Datafusion](https://github.com/apache/arrow-datafusion) query engine.
A PostgreSQL-compatible server for [Apache DataFusion](https://datafusion.apache.org), supporting authentication, role-based access control, and SSL/TLS encryption. Available as both a library and CLI tool.

Built on [pgwire](https://github.com/sunng87/pgwire) to provide PostgreSQL wire protocol compatibility for analytical workloads.
It was originally an example of the [pgwire](https://github.com/sunng87/pgwire)
project.

## Roadmap
## ✨ Key Features

- 🔌 **Full PostgreSQL Wire Protocol** - Compatible with all PostgreSQL clients and drivers
- 🛡️ **Security Features** - Authentication, RBAC, and SSL/TLS encryption
- 🏗️ **Complete System Catalogs** - Real `pg_catalog` tables with accurate metadata
- 📊 **Advanced Data Types** - Comprehensive Arrow ↔ PostgreSQL type mapping
- 🔄 **Transaction Support** - ACID transaction lifecycle (BEGIN/COMMIT/ROLLBACK)
- ⚡ **High Performance** - Apache DataFusion's columnar query execution

## 🎯 Features

### Core Functionality
- ✅ Library and CLI tool
- ✅ PostgreSQL wire protocol compatibility
- ✅ Complete `pg_catalog` system tables
- ✅ Arrow ↔ PostgreSQL data type mapping
- ✅ PostgreSQL functions (version, current_schema, has_table_privilege, etc.)

### Security & Authentication
- ✅ User authentication and RBAC
- ✅ Granular permissions (SELECT, INSERT, UPDATE, DELETE, CREATE, DROP)
- ✅ Role inheritance and grant management
- ✅ SSL/TLS encryption
- ✅ Query-level permission checking

### Transaction Support
- ✅ ACID transaction lifecycle
- ✅ BEGIN/COMMIT/ROLLBACK with all variants
- ✅ Failed transaction handling and recovery

### Future Enhancements
- ⏳ Connection pooling optimizations
- ⏳ Advanced authentication (LDAP, certificates)
- ⏳ COPY protocol for bulk data loading

## πŸ” Authentication

This project is in its very early stage, feel free to join the development by
picking up unfinished items.
Supports standard pgwire authentication methods:

- [x] datafusion-postgres as a CLI tool
- [x] datafusion-postgres as a library
- [x] datafusion information schema: require user to enable from session_config
- [ ] datafusion pg catalog: a postgres compatible `pg_catalog`
- [ ] data type mapping between arrow and postgres: in progress
- [ ] additional postgres functions for datafusion
- **Cleartext**: `CleartextStartupHandler` for simple password authentication
- **MD5**: `MD5StartupHandler` for MD5-hashed passwords
- **SCRAM**: `SASLScramAuthStartupHandler` for secure authentication

## Usage
See `auth.rs` for complete implementation examples using `DfAuthSource`.

## 🚀 Quick Start

### The Library `datafusion-postgres`

@@ -33,6 +62,7 @@ function which takes a datafusion `SessionContext` and some server configuration
options.

```rust
use std::sync::Arc;
use datafusion::prelude::SessionContext;
use datafusion_postgres::{serve, ServerOptions};

@@ -41,20 +71,36 @@ let session_context = Arc::new(SessionContext::new());
// Configure your `session_context`
// ...

// Start the Postgres compatible server
serve(session_context, &ServerOptions::new()).await
// Start the Postgres compatible server with SSL/TLS
let server_options = ServerOptions::new()
    .with_host("127.0.0.1".to_string())
    .with_port(5432)
    .with_tls_cert_path(Some("server.crt".to_string()))
    .with_tls_key_path(Some("server.key".to_string()));

serve(session_context, &server_options).await
```

### Security Features

```rust
// The server automatically includes:
// - User authentication (default postgres superuser)
// - Role-based access control with predefined roles:
// - readonly: SELECT permissions
// - readwrite: SELECT, INSERT, UPDATE, DELETE permissions
// - dbadmin: Full administrative permissions
// - SSL/TLS encryption when certificates are provided
// - Query-level permission checking
```
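
The predefined roles listed above can be pictured as a mapping from a role name to a set of allowed statement kinds. The following is a minimal, standard-library-only sketch of that idea. `Permission`, `role_permissions`, and `is_allowed` are hypothetical names chosen for illustration and are not the crate's actual API, and the `dbadmin` set is an assumption built from the granular permissions this README names (SELECT, INSERT, UPDATE, DELETE, CREATE, DROP).

```rust
use std::collections::HashSet;

/// Statement kinds (hypothetical; mirrors the permissions named in the README).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum Permission {
    Select,
    Insert,
    Update,
    Delete,
    Create,
    Drop,
}

/// Returns the permission set of a predefined role, or `None` for an unknown role.
fn role_permissions(role: &str) -> Option<HashSet<Permission>> {
    let perms: &[Permission] = match role {
        "readonly" => &[Permission::Select],
        "readwrite" => &[
            Permission::Select,
            Permission::Insert,
            Permission::Update,
            Permission::Delete,
        ],
        // Assumption: "full administrative permissions" covers every kind above.
        "dbadmin" => &[
            Permission::Select,
            Permission::Insert,
            Permission::Update,
            Permission::Delete,
            Permission::Create,
            Permission::Drop,
        ],
        _ => return None,
    };
    Some(perms.iter().copied().collect())
}

/// Query-level check: a role may run a statement only if its set contains
/// the required permission. Unknown roles are denied.
fn is_allowed(role: &str, needed: Permission) -> bool {
    role_permissions(role).map_or(false, |set| set.contains(&needed))
}

fn main() {
    for role in ["readonly", "readwrite", "dbadmin"] {
        println!("{role} may INSERT: {}", is_allowed(role, Permission::Insert));
    }
}
```

Denying unknown roles by default keeps the check fail-closed, which is the conservative choice for a sketch like this.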

### The CLI `datafusion-postgres-cli`

As a command-line application, this tool serves any JSON/CSV/Arrow/Parquet/Avro
files as table, and expose them via Postgres compatible protocol, with which you
can connect using psql or language drivers to execute `SELECT` queries against
them.
Command-line tool to serve JSON/CSV/Arrow/Parquet/Avro files as PostgreSQL-compatible tables.

```
datafusion-postgres 0.4.0
A postgres interface for datafusion. Serve any CSV/JSON/Arrow files as tables.
datafusion-postgres-cli 0.6.1
A PostgreSQL interface for DataFusion. Serve CSV/JSON/Arrow/Parquet files as tables.

USAGE:
datafusion-postgres-cli [OPTIONS]
@@ -68,44 +114,97 @@ OPTIONS:
--avro <avro-tables>... Avro files to register as table, using syntax `table_name:file_path`
--csv <csv-tables>... CSV files to register as table, using syntax `table_name:file_path`
-d, --dir <directory> Directory to serve, all supported files will be registered as tables
--host <host> Host address the server listens to, default to 127.0.0.1 [default: 127.0.0.1]
--host <host> Host address the server listens to [default: 127.0.0.1]
--json <json-tables>... JSON files to register as table, using syntax `table_name:file_path`
--parquet <parquet-tables>... Parquet files to register as table, using syntax `table_name:file_path`
-p <port> Port the server listens to, default to 5432 [default: 5432]
-p <port> Port the server listens to [default: 5432]
--tls-cert <tls-cert> Path to TLS certificate file for SSL/TLS encryption
--tls-key <tls-key> Path to TLS private key file for SSL/TLS encryption
```
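
The file options above all share the `table_name:file_path` syntax. A minimal sketch of how such an argument could be split is shown here. `parse_table_spec` is a hypothetical helper, not the CLI's actual parser, and it assumes that the first colon separates the table name from the path.

```rust
/// Splits a `table_name:file_path` argument at the first colon.
/// Returns `None` when the colon is missing or either side is empty.
fn parse_table_spec(arg: &str) -> Option<(&str, &str)> {
    let (name, path) = arg.split_once(':')?;
    if name.is_empty() || path.is_empty() {
        return None;
    }
    Some((name, path))
}

fn main() {
    for arg in ["climate:delhiclimate.csv", "missing-colon.csv"] {
        match parse_table_spec(arg) {
            Some((name, path)) => println!("table `{name}` from file `{path}`"),
            None => eprintln!("invalid table spec `{arg}`, expected table_name:file_path"),
        }
    }
}
```

Splitting at the first colon rather than the last keeps paths that themselves contain a colon (for example a Windows drive letter) intact on the path side.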

For example, we use this command to host `ETTm1.csv` dataset as table `ettm1`.
#### 🔒 Security Options

```
datafusion-postgres -c ettm1:ETTm1.csv
Loaded ETTm1.csv as table ettm1
Listening to 127.0.0.1:5432
```bash
# Run with SSL/TLS encryption
datafusion-postgres-cli \
--csv data:sample.csv \
--tls-cert server.crt \
--tls-key server.key

# Run without encryption (development only)
datafusion-postgres-cli --csv data:sample.csv
```

Then connect to it via `psql`:
## 📋 Example Usage

### Basic Example

Host a CSV dataset as a PostgreSQL-compatible table:

```bash
datafusion-postgres-cli --csv climate:delhiclimate.csv
```

```
Loaded delhiclimate.csv as table climate
TLS not configured. Running without encryption.
Listening on 127.0.0.1:5432 (unencrypted)
```

### Connect with psql

> **πŸ” Authentication**: The default setup allows connections without authentication for development. For secure deployments, use `DfAuthSource` with standard pgwire authentication handlers (cleartext, MD5, or SCRAM). See `auth.rs` for implementation examples.

```bash
psql -h 127.0.0.1 -p 5432 -U postgres
psql (16.2, server 0.20.0)
WARNING: psql major version 16, server major version 0.20.
Some psql features might not work.
Type "help" for help.

postgres=> select * from ettm1 limit 10;
date | HUFL | HULL | MUFL | MULL | LUFL | LULL | OT
----------------------------+--------------------+--------------------+--------------------+---------------------+-------------------+--------------------+--------------------
2016-07-01 00:00:00.000000 | 5.827000141143799 | 2.009000062942505 | 1.5989999771118164 | 0.4620000123977661 | 4.203000068664552 | 1.3400000333786009 | 30.5310001373291
2016-07-01 00:15:00.000000 | 5.760000228881836 | 2.075999975204468 | 1.4919999837875366 | 0.4259999990463257 | 4.263999938964844 | 1.4010000228881836 | 30.459999084472656
2016-07-01 00:30:00.000000 | 5.760000228881836 | 1.9420000314712524 | 1.4919999837875366 | 0.3910000026226044 | 4.234000205993652 | 1.309999942779541 | 30.038000106811523
2016-07-01 00:45:00.000000 | 5.760000228881836 | 1.9420000314712524 | 1.4919999837875366 | 0.4259999990463257 | 4.234000205993652 | 1.309999942779541 | 27.01300048828125
2016-07-01 01:00:00.000000 | 5.692999839782715 | 2.075999975204468 | 1.4919999837875366 | 0.4259999990463257 | 4.142000198364259 | 1.371000051498413 | 27.78700065612793
2016-07-01 01:15:00.000000 | 5.492000102996826 | 1.9420000314712524 | 1.4570000171661377 | 0.3910000026226044 | 4.111999988555908 | 1.2790000438690186 | 27.716999053955078
2016-07-01 01:30:00.000000 | 5.357999801635742 | 1.875 | 1.350000023841858 | 0.35499998927116394 | 3.928999900817871 | 1.3400000333786009 | 27.645999908447266
2016-07-01 01:45:00.000000 | 5.1570000648498535 | 1.8079999685287482 | 1.350000023841858 | 0.3199999928474426 | 3.806999921798706 | 1.2790000438690186 | 27.083999633789066
2016-07-01 02:00:00.000000 | 5.1570000648498535 | 1.741000056266785 | 1.2790000438690186 | 0.35499998927116394 | 3.776999950408936 | 1.218000054359436 | 27.78700065612793
2016-07-01 02:15:00.000000 | 5.1570000648498535 | 1.8079999685287482 | 1.350000023841858 | 0.4259999990463257 | 3.776999950408936 | 1.187999963760376 | 27.506000518798828
(10 rows)
```

```sql
postgres=> SELECT COUNT(*) FROM climate;
count
-------
1462
(1 row)

postgres=> SELECT date, meantemp FROM climate WHERE meantemp > 35 LIMIT 5;
date | meantemp
------------+----------
2017-05-15 | 36.9
2017-05-16 | 37.9
2017-05-17 | 38.6
2017-05-18 | 37.4
2017-05-19 | 35.4
(5 rows)

postgres=> BEGIN;
BEGIN
postgres=> SELECT AVG(meantemp) FROM climate;
avg
------------------
25.4955206557617
(1 row)
postgres=> COMMIT;
COMMIT
```
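
The `BEGIN` ... `COMMIT` exchange above follows PostgreSQL's usual transaction lifecycle, including the rule that a failed transaction rejects further statements until it is ended. The sketch below models that lifecycle as a small state machine. `TxState`, `Event`, and `step` are hypothetical names, and the transitions mirror standard PostgreSQL behaviour rather than anything confirmed about this crate's internals.

```rust
/// Transaction status of one connection.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TxState {
    Idle,   // no transaction block open (autocommit)
    Active, // inside BEGIN ... COMMIT/ROLLBACK
    Failed, // a statement inside the block raised an error
}

/// Things a client can do that affect the transaction status.
#[derive(Debug, Clone, Copy)]
enum Event {
    Begin,
    Commit,
    Rollback,
    /// An ordinary statement; `ok` says whether it executed successfully.
    Statement { ok: bool },
}

/// Returns the next state, or an error if the command is rejected.
/// On `Err` the caller keeps the current state.
fn step(state: TxState, event: Event) -> Result<TxState, &'static str> {
    match (state, event) {
        // A failed block can only be ended; COMMIT then behaves like ROLLBACK.
        (TxState::Failed, Event::Commit) | (TxState::Failed, Event::Rollback) => Ok(TxState::Idle),
        (TxState::Failed, _) => {
            Err("current transaction is aborted, commands ignored until end of transaction block")
        }
        // BEGIN opens a block (PostgreSQL only warns if one is already open).
        (_, Event::Begin) => Ok(TxState::Active),
        // COMMIT and ROLLBACK always return to idle.
        (_, Event::Commit) | (_, Event::Rollback) => Ok(TxState::Idle),
        // An error inside a block poisons it; outside a block nothing changes.
        (TxState::Active, Event::Statement { ok: false }) => Ok(TxState::Failed),
        (s, Event::Statement { .. }) => Ok(s),
    }
}

fn main() {
    let script = [
        Event::Begin,
        Event::Statement { ok: false },
        Event::Statement { ok: true },
        Event::Rollback,
    ];
    let mut state = TxState::Idle;
    for event in script {
        match step(state, event) {
            Ok(next) => state = next,
            Err(msg) => eprintln!("rejected {event:?}: {msg}"),
        }
        println!("{event:?} -> {state:?}");
    }
}
```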

### πŸ” Production Setup with SSL/TLS

```bash
# Generate SSL certificates
openssl req -x509 -newkey rsa:4096 -keyout server.key -out server.crt \
-days 365 -nodes -subj "/C=US/ST=CA/L=SF/O=MyOrg/CN=localhost"

# Start secure server
datafusion-postgres-cli \
--csv climate:delhiclimate.csv \
--tls-cert server.crt \
--tls-key server.key
```

```
Loaded delhiclimate.csv as table climate
TLS enabled using cert: server.crt and key: server.key
Listening on 127.0.0.1:5432 with TLS encryption
```
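
PostgreSQL clients negotiate encryption by sending an 8-byte `SSLRequest` message before the startup packet; the server answers with a single byte, `S` (continue with the TLS handshake) or `N` (TLS not available). The following standard-library sketch builds that message and probes a server to see which answer it gives. `ssl_request` and `probe_tls` are hypothetical helper names, and the address in `main` assumes the default host and port shown above.

```rust
use std::io::{Read, Write};
use std::net::TcpStream;
use std::time::Duration;

/// Request code defined by the PostgreSQL protocol: (1234 << 16) | 5679.
const SSL_REQUEST_CODE: u32 = 80877103;

/// Builds the SSLRequest message: a big-endian length (8) followed by the code.
fn ssl_request() -> [u8; 8] {
    let mut msg = [0u8; 8];
    msg[..4].copy_from_slice(&8u32.to_be_bytes());
    msg[4..].copy_from_slice(&SSL_REQUEST_CODE.to_be_bytes());
    msg
}

/// Connects to `addr`, sends SSLRequest, and reports whether the server
/// replied `S`. Any other reply is treated as "TLS not offered".
fn probe_tls(addr: &str) -> std::io::Result<bool> {
    let mut stream = TcpStream::connect(addr)?;
    stream.set_read_timeout(Some(Duration::from_secs(5)))?;
    stream.write_all(&ssl_request())?;
    let mut reply = [0u8; 1];
    stream.read_exact(&mut reply)?;
    Ok(reply[0] == b'S')
}

fn main() {
    // Assumption: a server is listening on the default address from the README.
    match probe_tls("127.0.0.1:5432") {
        Ok(true) => println!("server accepts TLS"),
        Ok(false) => println!("server does not offer TLS"),
        Err(e) => eprintln!("probe failed: {e}"),
    }
}
```

The probe only checks the server's one-byte answer; it does not perform the TLS handshake or validate the certificate, so it is a connectivity check rather than a security test.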

## License