Automate source code formatting with clang-format (#190)
* Format source (add .clang-format)

* Remove redundant quotes

* Fix 2json help test

* Fix test-cli tests

* Fix typos in expected output of ext tests

* Format missing header in include directory [skip ci]

* Format missing file in src directory [skip ci]

* Format missing file in examples directory [skip ci]

* Format missing files in app directory

* Fix expected output for ext test-3

* Add CI script and update CI accordingly

* Format enums [skip ci]

* Add checks for clang-format 13

* Format missing enum

* Update clang-format existence and version checks

* Add separate job to set tag on release

* Update tag job

* Add lint changes and clang-format note for contributors [skip ci]

* Minor updates  [skip ci]

* Quiet brew already installed warnings

* Use preinstalled GCC 12 on macOS

* Use clang on macos-14 runner instead of gcc-12

* Use gcc-13 on macOS runners
iamazeem authored Sep 19, 2024
1 parent 58fba28 commit 5fdaddb
Showing 117 changed files with 5,022 additions and 5,158 deletions.
7 changes: 7 additions & 0 deletions .clang-format
@@ -0,0 +1,7 @@
BasedOnStyle: Microsoft
BreakBeforeBraces: Attach
ContinuationIndentWidth: 2
IndentWidth: 2
SortIncludes: false
TabWidth: 2
UseTab: Never
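
For contributors who want to check their changes against this `.clang-format` before pushing, a check along the following lines may help. This is a minimal sketch, not the contents of `scripts/ci-run-clang-format.sh` (which this page does not show): the function names `clang_format_major` and `check_format` are hypothetical, and the minimum version is passed in by the caller rather than assumed.

```shell
#!/bin/sh
# Hypothetical helper: not the project's CI script.

# Print the major version found in a `clang-format --version` line,
# e.g. "Ubuntu clang-format version 14.0.0-1ubuntu1" -> 14.
clang_format_major() {
  v=${1##*version }        # drop everything up to and including "version "
  printf '%s\n' "${v%%.*}" # keep only the text before the first dot
}

# Usage: check_format MIN_MAJOR FILE...
# Returns non-zero if clang-format is missing, too old, or any file
# differs from the style in the nearest .clang-format.
check_format() {
  min_major=$1
  shift
  if ! command -v clang-format >/dev/null 2>&1; then
    echo "clang-format not found" >&2
    return 1
  fi
  major=$(clang_format_major "$(clang-format --version)")
  if [ "$major" -lt "$min_major" ]; then
    echo "clang-format $major is older than required $min_major" >&2
    return 1
  fi
  # --dry-run reports needed changes without editing files;
  # --Werror turns those reports into a failing exit status.
  clang-format --style=file --dry-run --Werror "$@"
}
```

For example, `check_format 14 src/*.c` run from the repository root would exit non-zero if any listed file needs reformatting (the path is illustrative). Replacing `--dry-run --Werror` with `-i` rewrites the files in place instead of only reporting.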
54 changes: 38 additions & 16 deletions .github/workflows/ci.yml
@@ -11,8 +11,42 @@ on:
types: [published]

jobs:
tag:
name: tag
runs-on: ubuntu-latest

outputs:
TAG: ${{ steps.tag.outputs.TAG || '0.3.6' }}

steps:
- name: Set TAG on release
if: startsWith(github.ref, 'refs/tags/v')
id: tag
shell: bash
run: |
TAG="$GITHUB_REF_NAME"
echo "TAG: $TAG"
if [[ $TAG == "v"* ]]; then
TAG="${TAG:1}"
fi
echo "TAG: $TAG"
echo "TAG=$TAG" >> "$GITHUB_OUTPUT"
format:
name: format
runs-on: ubuntu-latest

steps:
- name: Checkout
uses: actions/checkout@v4

- name: Run clang-format
run: ./scripts/ci-run-clang-format.sh

ci:
name: ci
needs: [format, tag]

strategy:
matrix:
os: [ubuntu-20.04, macos-13, macos-14]
@@ -23,7 +57,7 @@ jobs:
contents: write

env:
TAG: "0.3.6"
TAG: ${{ needs.tag.outputs.TAG }}
AMD64_LINUX_GCC: amd64-linux-gcc
AMD64_LINUX_CLANG: amd64-linux-clang
AMD64_WINDOWS_MINGW: amd64-windows-mingw
@@ -34,18 +68,6 @@ jobs:
ARTIFACT_RETENTION_DAYS: 5

steps:
- name: Get tag if tagged/released and set TAG env var
if: startsWith(github.ref, 'refs/tags/v')
shell: bash
run: |
TAG="$GITHUB_REF_NAME"
echo "TAG: $TAG"
if [[ $TAG == "v"* ]]; then
TAG="${TAG:1}"
fi
echo "TAG: $TAG"
echo "TAG=$TAG" >> "$GITHUB_ENV"
- name: Checkout
uses: actions/checkout@v4

@@ -59,7 +81,7 @@ jobs:
- name: Set up macOS (AMD64 and ARM64)
if: runner.os == 'macOS'
run: |
brew install coreutils tree autoconf automake libtool gcc@11
brew install --quiet coreutils tree autoconf automake libtool
brew uninstall jq
# --- Build ---
@@ -130,7 +152,7 @@ jobs:
if: matrix.os == 'macos-13'
env:
PREFIX: ${{ env.AMD64_MACOSX_GCC }}
CC: gcc-11
CC: gcc-13
MAKE: make
RUN_TESTS: false
shell: bash
@@ -142,7 +164,7 @@ jobs:
if: matrix.os == 'macos-14'
env:
PREFIX: ${{ env.ARM64_MACOSX_GCC }}
CC: gcc-11
CC: gcc-13
MAKE: make
RUN_TESTS: false
shell: bash
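
The `tag` job in the `ci.yml` changes above combines two pieces: a step that strips the leading `v` from the ref name on release tags, and a `|| '0.3.6'` fallback in the job output for every other ref. The combined behaviour can be sketched as a single shell function. `resolve_tag` is a hypothetical name used only for illustration, and it takes the full ref (as in `github.ref`) rather than `GITHUB_REF_NAME`.

```shell
#!/bin/sh
# Hypothetical illustration of the tag job's logic; not part of the workflow.
# Usage: resolve_tag REF [DEFAULT]
resolve_tag() {
  ref=$1
  default=${2:-0.3.6} # fallback used by the workflow's job output
  case $ref in
    refs/tags/v*)
      # Release tag: drop the "refs/tags/v" prefix, leaving e.g. "1.2.3".
      printf '%s\n' "${ref#refs/tags/v}"
      ;;
    *)
      # Branch push, pull request, or non-"v" tag: the step is skipped,
      # so the output is empty and the default applies.
      printf '%s\n' "$default"
      ;;
  esac
}

resolve_tag refs/tags/v1.2.3   # prints 1.2.3
resolve_tag refs/heads/main    # prints 0.3.6
```

Moving this logic into its own job lets the `ci` job read the value through `needs.tag.outputs.TAG` instead of each matrix entry recomputing it and writing to `GITHUB_ENV`.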
116 changes: 59 additions & 57 deletions README.md
@@ -73,32 +73,32 @@ that implements the expected

## Key highlights

* Available as BOTH a library and an application (coming soon: standalone
- Available as BOTH a library and an application (coming soon: standalone
zsvutil library for common helper functions such as csv writer)
* Open-source, permissively licensed
* Handles real-world CSV the same way that spreadsheet programs do (*including
- Open-source, permissively licensed
- Handles real-world CSV the same way that spreadsheet programs do (*including
edge cases*). Gracefully handles (and can "clean") real-world data that may be
"dirty".
* Runs on macOS (tested on clang/gcc), Linux (gcc), Windows (mingw), BSD
- Runs on macOS (tested on clang/gcc), Linux (gcc), Windows (mingw), BSD
(gcc-only) and in-browser (emscripten/wasm)
* Fastest (at least, vs all alternatives and on all platforms we've benchmarked
- Fastest (at least, vs all alternatives and on all platforms we've benchmarked
where 256-bit SIMD operations are available). See
[app/benchmark/README.md](app/benchmark/README.md)
* Low memory usage (regardless of how big your data is) and size footprint for
- Low memory usage (regardless of how big your data is) and size footprint for
both lib (~20k) and CLI executable (< 1MB)
* Handles general delimited data (e.g. pipe-delimited) and fixed-with input
- Handles general delimited data (e.g. pipe-delimited) and fixed-with input
(with specified widths or auto-detected widths)
* Handles multi-row headers
* Handles input from any stream, including caller-defined streams accessed via a
- Handles multi-row headers
- Handles input from any stream, including caller-defined streams accessed via a
single caller-defined `fread`-like function
* Easy to use as a library in a few lines of code, via either pull or push
- Easy to use as a library in a few lines of code, via either pull or push
parsing
* Includes the `zsv` CLI with the following built-in commands:
* `select`, `count`, `sql` query, `desc`ribe, `flatten`, `serialize`, `2json`,
- Includes the `zsv` CLI with the following built-in commands:
- `select`, `count`, `sql` query, `desc`ribe, `flatten`, `serialize`, `2json`,
`2db`, `stack`, `pretty`, `2tsv`, `paste`, `compare`, `jq`, `prop`, `rm`
* easily [convert between CSV/JSON/sqlite3](docs/csv_json_sqlite.md)
* [compare multiple files](docs/compare.md)
* CLI is easy to extend/customize with a few lines of code via modular plug-in
- easily [convert between CSV/JSON/sqlite3](docs/csv_json_sqlite.md)
- [compare multiple files](docs/compare.md)
- CLI is easy to extend/customize with a few lines of code via modular plug-in
framework. Just write a few custom functions and compile into a distributable
DLL that any existing zsv installation can use.

@@ -184,10 +184,10 @@ npm install zsv-lib

Please note:

* This package is still in alpha and currently only exposes a small subset of
- This package is still in alpha and currently only exposes a small subset of
the zsv library capabilities. More to come!
* The CLI is not yet available as a Node package
* If you'd like to use additional parser features, or use the CLI as a Node
- The CLI is not yet available as a Node package
- If you'd like to use additional parser features, or use the CLI as a Node
package, please feel free to post a request in an issue here.

### From source
@@ -198,19 +198,19 @@ See [BUILD.md](BUILD.md) for more details.

Our objectives, which we were unable to find in a pre-existing project, are:

* Reasonably high performance
* Runs on any platform, including web assembly
* Available as both a library and a standalone executable / command-line
- Reasonably high performance
- Runs on any platform, including web assembly
- Available as both a library and a standalone executable / command-line
interface utility (CLI)
* Memory-efficient, configurable resource limits
* Handles real-world CSV cases the same way that Excel does, including all edge
- Memory-efficient, configurable resource limits
- Handles real-world CSV cases the same way that Excel does, including all edge
cases (quote handling, newline handling (either `\n` or `\r`), embedded
newlines, abnormal quoting (e.g. aaa"aaa,bbb...)
* Handles other "dirty" data issues:
* Assumes valid UTF8, but does not misbehave if input contains bad UTF8
* Option to specify multi-row headers
* Does not assume or stop working in case of inconsistent numbers of columns
* Easy to use library or extend/customize CLI
- Handles other "dirty" data issues:
- Assumes valid UTF8, but does not misbehave if input contains bad UTF8
- Option to specify multi-row headers
- Does not assume or stop working in case of inconsistent numbers of columns
- Easy to use library or extend/customize CLI

There are several excellent tools that achieve high performance. Among those we
considered were xsv and tsv-utils. While they met our performance objective,
@@ -234,34 +234,34 @@ needs.

`zsv` comes with several built-in commands:

* `echo`: read CSV from stdin and write it back out to stdout. This is mostly
- `echo`: read CSV from stdin and write it back out to stdout. This is mostly
useful for demonstrating how to use the API and also how to create a plug-in,
and has several uses beyond that including adding/removing BOM, cleaning up
bad UTF8, whitespace or blank column trimming, limiting output to a contiguous
data block, skipping leading garbage, and even proving substitution values
without modifying the underlying source
* `select`: re-shape CSV by skipping leading garbage, combining header rows into
- `select`: re-shape CSV by skipping leading garbage, combining header rows into
a single header, selecting or excluding specified columns, removing duplicate
columns, sampling, converting from fixed-width input, searching and more
* `sql`: treat one or more CSV files like database tables and query with SQL
* `desc`: provide a quick description of your table data
* `pretty`: format for console (fixed-width) display, or convert to markdown
- `sql`: treat one or more CSV files like database tables and query with SQL
- `desc`: provide a quick description of your table data
- `pretty`: format for console (fixed-width) display, or convert to markdown
format
* `2json`: convert CSV to JSON. Optionally, output in
- `2json`: convert CSV to JSON. Optionally, output in
[database schema](docs/db.schema.json)
* `2tsv`: convert to TSV (tab-delimited) format
* `compare`: compare two or more tables of data and output the differences
* `paste` (alpha): horizontally paste two tables together (given inputs X and Y,
- `2tsv`: convert to TSV (tab-delimited) format
- `compare`: compare two or more tables of data and output the differences
- `paste` (alpha): horizontally paste two tables together (given inputs X and Y,
output 1...N rows where each row all columns of X in row N, followed by all
columns of Y in row N)
* `serialize` (inverse of flatten): convert an NxM table to a single 3x (Nx(M-1))
- `serialize` (inverse of flatten): convert an NxM table to a single 3x (Nx(M-1))
table with columns: Row, Column Name, Column Value
* `flatten` (inverse of serialize): flatten a table by combining rows that share
- `flatten` (inverse of serialize): flatten a table by combining rows that share
a common value in a specified identifier column
* `stack`: merge CSV files vertically
* `jq`: run a `jq` filter
* `2db`: [convert from JSON to sqlite3 db](docs/csv_json_sqlite.md)
* `prop`: view or save parsing options associated with a file, such as initial
- `stack`: merge CSV files vertically
- `jq`: run a `jq` filter
- `2db`: [convert from JSON to sqlite3 db](docs/csv_json_sqlite.md)
- `prop`: view or save parsing options associated with a file, such as initial
rows to ignore, or header row span. Saved options are be applied by default
when processing that file.

@@ -332,10 +332,10 @@ You can extend `zsv` by providing a pre-compiled shared or static library that
defines the functions specified in `extension_template.h` and which `zsv` loads
in one of three ways:
* as a static library that is statically linked at compile time
* as a dynamic library that is linked at compile time and located in any library
- as a static library that is statically linked at compile time
- as a dynamic library that is linked at compile time and located in any library
search path
* as a dynamic library that is located in the same folder as the `zsv`
- as a dynamic library that is located in the same folder as the `zsv`
executable and loaded at runtime if/as/when the custom mode is invoked
#### Example and template
@@ -354,24 +354,26 @@ helping, please post an issue.
### Possible enhancements and related developments
* online "playground" (soon to be released)
* optimize search; add search with hyperscan or re2 regex matching, possibly
- online "playground" (soon to be released)
- optimize search; add search with hyperscan or re2 regex matching, possibly
parallelize?
* optional OpenMP or other multi-threading for row processing
* auto-generated documentation, and better documentation in general
* Additional benchmarking. Would be great to use
- optional OpenMP or other multi-threading for row processing
- auto-generated documentation, and better documentation in general
- Additional benchmarking. Would be great to use
<https://bitbucket.org/ewanhiggs/csv-game/src/master/> as a springboard to
benchmarking a number of various tasks
* encoding conversion e.g. UTF16 to UTF8
- encoding conversion e.g. UTF16 to UTF8
## Contribute
* [Fork](https://github.com/liquidaty/zsv/fork) the project.
* Check out the latest [`main`](https://github.com/liquidaty/zsv/tree/main)
- [Fork](https://github.com/liquidaty/zsv/fork) the project.
- Check out the latest [`main`](https://github.com/liquidaty/zsv/tree/main)
branch.
* Create a feature or bugfix branch from `main`.
* Commit and push your changes.
* Submit the PR.
- Create a feature or bugfix branch from `main`.
- Update your required changes.
- Make sure to run `clang-format` (version 14 or later) for C source updates.
- Commit and push your changes.
- Submit the PR.
## License