Skip to content

StreamXfer is a powerful tool for streaming data from SQL Server to object storage for seamless transfer using UNIX pipe, supporting various general data formats(CSV, TSV, JSON).

License

Notifications You must be signed in to change notification settings

zhiweio/StreamXfer

Repository files navigation

StreamXfer

StreamXfer is a powerful tool for streaming data from SQL Server to local or object storage(S3) for seamless transfer using UNIX pipe, supporting various general data formats(CSV, TSV, JSON).

Supported OS: Linux, macOS

I've migrated 10TB data from SQL Server into Amazon Redshift using this tool.

Demo

asciicast

Installation

Prerequisites

Before installing StreamXfer, you need to install the following dependencies:

Install from PyPI

$ python3 -m pip install streamxfer

Install from Source

$ git clone https://github.com/zhiweio/StreamXfer.git && cd StreamXfer/
$ python3 setup.py install

Install from Docker

$ docker pull zhiweio/streamxfer:latest

Usage

StreamXfer can be used as a command-line tool or as a library in Python.

Command-line Usage

$ stx [OPTIONS] PYMSSQL_URL TABLE OUTPUT_PATH

Here is an example command:

$ stx 'mssql+pymssql:://user:pass@host:port/db' '[dbo].[test]' /local/path/to/dir/

You can also use the following options:

  • -F, --format: The data format (CSV, TSV, or JSON).
  • -C, --compress-type: The compression type (LZOP or GZIP).

For more detailed options, run:

$ stx --help
Usage: stx [OPTIONS] PYMSSQL_URL TABLE OUTPUT_PATH

  StreamXfer is a powerful tool for streaming data from SQL Server to local or
  object storage(S3) for seamless transfer using UNIX pipe, supporting various
  general data formats(CSV, TSV, JSON).

  Examples:
      stx 'mssql+pymssql:://user:pass@host:port/db' '[dbo].[test]' /local/path/to/dir/
      stx 'mssql+pymssql:://user:pass@host:port/db' '[dbo].[test]' s3://bucket/path/to/dir/

Options:
  -F, --format [CSV|TSV|JSON]     [default: JSON]
  -C, --compress-type [LZOP|GZIP]
  -h, --help                      Show this message and exit.

Docker Usage

To use StreamXfer in Docker container:

$ docker run --rm -v $(pwd)/data:/tmp/data zhiweio/streamxfer bash -c "stx 'mssql+pymssql:://user:pass@host:port/db' '[dbo].[test]' /tmp/data/dbo_test"
$ docker run --rm zhiweio/streamxfer bash -c "stx 'mssql+pymssql:://user:pass@host:port/db' '[dbo].[test]' s3://bucket/path/to/dir"

Library Usage

To use StreamXfer as a library in Python, you can import the StreamXfer class, and use them to build and pump the data stream.

Here is an example code snippet:

from streamxfer import StreamXfer
from streamxfer.format import Format
from streamxfer.compress import CompressType

sx = StreamXfer(
    "mssql+pymssql:://user:pass@host:port/db",
    format=Format.CSV,
    compress_type=CompressType.LZOP,
    chunk_size=1000000,
)
sx.build("[dbo].[test]", path="s3://bucket/path/to/dir/")
sx.pump()

Related

Here are some related articles

Authors

License

GPL-3.0

About

StreamXfer is a powerful tool for streaming data from SQL Server to object storage for seamless transfer using UNIX pipe, supporting various general data formats(CSV, TSV, JSON).

Topics

Resources

License

Stars

Watchers

Forks

Packages

No packages published