StreamXfer is a powerful tool for streaming data from SQL Server to local or object storage(S3) for seamless transfer using UNIX pipe, supporting various general data formats(CSV, TSV, JSON).
Supported OS: Linux, macOS
I've migrated 10TB data from SQL Server into Amazon Redshift using this tool.
Prerequisites
Before installing StreamXfer, you need to install the following dependencies:
- mssql-tools: SQL Docs - bcp Utility
- lzop: Download
- awscli: AWS CLI install and update instructions
Install from PyPI
$ python3 -m pip install streamxfer
Install from Source
$ git clone https://github.com/zhiweio/StreamXfer.git && cd StreamXfer/
$ python3 setup.py install
Install from Docker
$ docker pull zhiweio/streamxfer:latest
StreamXfer can be used as a command-line tool or as a library in Python.
$ stx [OPTIONS] PYMSSQL_URL TABLE OUTPUT_PATH
Here is an example command:
$ stx 'mssql+pymssql:://user:pass@host:port/db' '[dbo].[test]' /local/path/to/dir/
You can also use the following options:
-F, --format
: The data format (CSV, TSV, or JSON).-C, --compress-type
: The compression type (LZOP or GZIP).
For more detailed options, run:
$ stx --help
Usage: stx [OPTIONS] PYMSSQL_URL TABLE OUTPUT_PATH
StreamXfer is a powerful tool for streaming data from SQL Server to local or
object storage(S3) for seamless transfer using UNIX pipe, supporting various
general data formats(CSV, TSV, JSON).
Examples:
stx 'mssql+pymssql:://user:pass@host:port/db' '[dbo].[test]' /local/path/to/dir/
stx 'mssql+pymssql:://user:pass@host:port/db' '[dbo].[test]' s3://bucket/path/to/dir/
Options:
-F, --format [CSV|TSV|JSON] [default: JSON]
-C, --compress-type [LZOP|GZIP]
-h, --help Show this message and exit.
To use StreamXfer in Docker container:
$ docker run --rm -v $(pwd)/data:/tmp/data zhiweio/streamxfer bash -c "stx 'mssql+pymssql:://user:pass@host:port/db' '[dbo].[test]' /tmp/data/dbo_test"
$ docker run --rm zhiweio/streamxfer bash -c "stx 'mssql+pymssql:://user:pass@host:port/db' '[dbo].[test]' s3://bucket/path/to/dir"
To use StreamXfer as a library in Python, you can import the StreamXfer class, and use them to build and pump the data stream.
Here is an example code snippet:
from streamxfer import StreamXfer
from streamxfer.format import Format
from streamxfer.compress import CompressType
sx = StreamXfer(
"mssql+pymssql:://user:pass@host:port/db",
format=Format.CSV,
compress_type=CompressType.LZOP,
chunk_size=1000000,
)
sx.build("[dbo].[test]", path="s3://bucket/path/to/dir/")
sx.pump()
Here are some related articles