Automate source code formatting with clang-format #190

Merged 23 commits on Sep 19, 2024
Commits
fdbe9d8
Format source (add .clang-format)
iamazeem Aug 23, 2024
d2e4d5a
Remove redundant quotes
iamazeem Aug 23, 2024
866d0ef
Fix 2json help test
iamazeem Aug 23, 2024
fce7386
Fix test-cli tests
iamazeem Aug 23, 2024
7d0649d
Fix typos in expected output of ext tests
iamazeem Aug 23, 2024
3ee16de
Format missing header in include directory [skip ci]
iamazeem Aug 23, 2024
982dd71
Format missing file in src directory [skip ci]
iamazeem Aug 23, 2024
68c7486
Format missing file in examples directory [skip ci]
iamazeem Aug 23, 2024
2bb3dcc
Format missing files in app directory
iamazeem Aug 23, 2024
d107feb
Fix expected output for ext test-3
iamazeem Aug 23, 2024
21e40a1
Add CI script and update CI accordingly
iamazeem Aug 24, 2024
bce82fa
Format enums [skip ci]
iamazeem Aug 24, 2024
ad24f36
Add checks for clang-format 13
iamazeem Aug 24, 2024
2edca69
Format missing enum
iamazeem Aug 24, 2024
aecfe0b
Update clang-format existence and version checks
iamazeem Aug 24, 2024
5e45e0f
Add separate job to set tag on release
iamazeem Aug 24, 2024
d442ed7
Update tag job
iamazeem Aug 24, 2024
2d66198
Add lint changes and clang-format note for contributors [skip ci]
iamazeem Aug 24, 2024
43f17d7
Minor updates [skip ci]
iamazeem Aug 29, 2024
b42f047
Quiet brew already installed warnings
iamazeem Sep 5, 2024
9510f0e
Use preinstalled GCC 12 on macOS
iamazeem Sep 5, 2024
5420e8c
Use clang on macos-14 runner instead of gcc-12
iamazeem Sep 5, 2024
207d16c
Use gcc-13 on macOS runners
iamazeem Sep 5, 2024
7 changes: 7 additions & 0 deletions .clang-format
@@ -0,0 +1,7 @@
+BasedOnStyle: Microsoft
+BreakBeforeBraces: Attach
+ContinuationIndentWidth: 2
+IndentWidth: 2
+SortIncludes: false
+TabWidth: 2
+UseTab: Never
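
The style file above only takes effect when `clang-format` is pointed at it. As a rough sketch of how a contributor might check sources against it locally, the helpers below walk a set of directories and run `clang-format` in check-only mode. This is not the project's `scripts/ci-run-clang-format.sh`, whose contents are not shown in this diff; the function names `list_c_sources` and `check_format` are hypothetical, and the directory names in the usage note are taken from the commit messages (`include`, `src`, `examples`, `app`).

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration; not part of the zsv repository.

# Print every .c/.h file under the given directories, one per line, sorted.
list_c_sources() {
  find "$@" -type f \( -name '*.c' -o -name '*.h' \) | LC_ALL=C sort
}

# Check, without modifying anything, that the files match the nearest
# .clang-format. --dry-run reports the changes that would be made and
# --Werror turns those reports into a non-zero exit status.
check_format() {
  find "$@" -type f \( -name '*.c' -o -name '*.h' \) \
    -exec clang-format --style=file --dry-run --Werror {} +
}
```

Run from the repository root, `check_format include src examples app` should exit non-zero if any file needs reformatting. Replacing `--dry-run --Werror` with `-i` would rewrite the files in place instead.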
54 changes: 38 additions & 16 deletions .github/workflows/ci.yml
@@ -11,8 +11,42 @@ on:
     types: [published]

 jobs:
+  tag:
+    name: tag
+    runs-on: ubuntu-latest
+
+    outputs:
+      TAG: ${{ steps.tag.outputs.TAG || '0.3.6' }}
+
+    steps:
+      - name: Set TAG on release
+        if: startsWith(github.ref, 'refs/tags/v')
+        id: tag
+        shell: bash
+        run: |
+          TAG="$GITHUB_REF_NAME"
+          echo "TAG: $TAG"
+          if [[ $TAG == "v"* ]]; then
+            TAG="${TAG:1}"
+          fi
+          echo "TAG: $TAG"
+          echo "TAG=$TAG" >> "$GITHUB_OUTPUT"
+
+  format:
+    name: format
+    runs-on: ubuntu-latest
+
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v4
+
+      - name: Run clang-format
+        run: ./scripts/ci-run-clang-format.sh
+
   ci:
     name: ci
+    needs: [format, tag]
+
     strategy:
       matrix:
         os: [ubuntu-20.04, macos-13, macos-14]
@@ -23,7 +57,7 @@ jobs:
       contents: write

     env:
-      TAG: "0.3.6"
+      TAG: ${{ needs.tag.outputs.TAG }}
       AMD64_LINUX_GCC: amd64-linux-gcc
       AMD64_LINUX_CLANG: amd64-linux-clang
       AMD64_WINDOWS_MINGW: amd64-windows-mingw
@@ -34,18 +68,6 @@ jobs:
       ARTIFACT_RETENTION_DAYS: 5

     steps:
-      - name: Get tag if tagged/released and set TAG env var
-        if: startsWith(github.ref, 'refs/tags/v')
-        shell: bash
-        run: |
-          TAG="$GITHUB_REF_NAME"
-          echo "TAG: $TAG"
-          if [[ $TAG == "v"* ]]; then
-            TAG="${TAG:1}"
-          fi
-          echo "TAG: $TAG"
-          echo "TAG=$TAG" >> "$GITHUB_ENV"
-
       - name: Checkout
         uses: actions/checkout@v4

@@ -59,7 +81,7 @@ jobs:
       - name: Set up macOS (AMD64 and ARM64)
         if: runner.os == 'macOS'
         run: |
-          brew install coreutils tree autoconf automake libtool gcc@11
+          brew install --quiet coreutils tree autoconf automake libtool
           brew uninstall jq

       # --- Build ---
@@ -130,7 +152,7 @@ jobs:
         if: matrix.os == 'macos-13'
         env:
           PREFIX: ${{ env.AMD64_MACOSX_GCC }}
-          CC: gcc-11
+          CC: gcc-13
           MAKE: make
           RUN_TESTS: false
         shell: bash
@@ -142,7 +164,7 @@ jobs:
         if: matrix.os == 'macos-14'
         env:
           PREFIX: ${{ env.ARM64_MACOSX_GCC }}
-          CC: gcc-11
+          CC: gcc-13
           MAKE: make
           RUN_TESTS: false
         shell: bash
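
The `tag` job's logic can be tried outside of GitHub Actions. The sketch below reproduces it as a plain Bash function. `resolve_tag` and its three positional parameters are hypothetical names standing in for `github.ref`, `GITHUB_REF_NAME`, and the `'0.3.6'` fallback in the job's `outputs` expression, and the version `0.4.0` in the sample calls is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the workflow's `tag` job; for local experiments only.
# $1: full ref (github.ref), $2: short ref name (GITHUB_REF_NAME), $3: fallback
resolve_tag() {
  local ref="$1" tag="$2" fallback="$3"
  # Mirrors `if: startsWith(github.ref, 'refs/tags/v')`. When the condition is
  # false the step is skipped, its output is empty, and the `||` fallback applies.
  if [[ $ref != refs/tags/v* ]]; then
    printf '%s\n' "$fallback"
    return
  fi
  # Strip the leading "v", as the step's run script does.
  if [[ $tag == "v"* ]]; then
    tag="${tag:1}"
  fi
  # An empty result also falls back, like `steps.tag.outputs.TAG || '0.3.6'`.
  printf '%s\n' "${tag:-$fallback}"
}

resolve_tag refs/tags/v0.4.0 v0.4.0 0.3.6   # prints 0.4.0
resolve_tag refs/heads/main main 0.3.6      # prints 0.3.6
```

In the workflow itself the value travels as a job output (`needs.tag.outputs.TAG`) rather than through `$GITHUB_ENV`, so the `ci` job's `env` block can reference it directly.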
116 changes: 59 additions & 57 deletions README.md
@@ -73,32 +73,32 @@ that implements the expected

 ## Key highlights

-* Available as BOTH a library and an application (coming soon: standalone
+- Available as BOTH a library and an application (coming soon: standalone
   zsvutil library for common helper functions such as csv writer)
-* Open-source, permissively licensed
-* Handles real-world CSV the same way that spreadsheet programs do (*including
+- Open-source, permissively licensed
+- Handles real-world CSV the same way that spreadsheet programs do (*including
   edge cases*). Gracefully handles (and can "clean") real-world data that may be
   "dirty".
-* Runs on macOS (tested on clang/gcc), Linux (gcc), Windows (mingw), BSD
+- Runs on macOS (tested on clang/gcc), Linux (gcc), Windows (mingw), BSD
   (gcc-only) and in-browser (emscripten/wasm)
-* Fastest (at least, vs all alternatives and on all platforms we've benchmarked
+- Fastest (at least, vs all alternatives and on all platforms we've benchmarked
   where 256-bit SIMD operations are available). See
   [app/benchmark/README.md](app/benchmark/README.md)
-* Low memory usage (regardless of how big your data is) and size footprint for
+- Low memory usage (regardless of how big your data is) and size footprint for
   both lib (~20k) and CLI executable (< 1MB)
-* Handles general delimited data (e.g. pipe-delimited) and fixed-with input
+- Handles general delimited data (e.g. pipe-delimited) and fixed-with input
   (with specified widths or auto-detected widths)
-* Handles multi-row headers
-* Handles input from any stream, including caller-defined streams accessed via a
+- Handles multi-row headers
+- Handles input from any stream, including caller-defined streams accessed via a
   single caller-defined `fread`-like function
-* Easy to use as a library in a few lines of code, via either pull or push
+- Easy to use as a library in a few lines of code, via either pull or push
   parsing
-* Includes the `zsv` CLI with the following built-in commands:
-  * `select`, `count`, `sql` query, `desc`ribe, `flatten`, `serialize`, `2json`,
+- Includes the `zsv` CLI with the following built-in commands:
+  - `select`, `count`, `sql` query, `desc`ribe, `flatten`, `serialize`, `2json`,
     `2db`, `stack`, `pretty`, `2tsv`, `paste`, `compare`, `jq`, `prop`, `rm`
-  * easily [convert between CSV/JSON/sqlite3](docs/csv_json_sqlite.md)
-  * [compare multiple files](docs/compare.md)
-* CLI is easy to extend/customize with a few lines of code via modular plug-in
+  - easily [convert between CSV/JSON/sqlite3](docs/csv_json_sqlite.md)
+  - [compare multiple files](docs/compare.md)
+- CLI is easy to extend/customize with a few lines of code via modular plug-in
   framework. Just write a few custom functions and compile into a distributable
   DLL that any existing zsv installation can use.

@@ -184,10 +184,10 @@ npm install zsv-lib

 Please note:

-* This package is still in alpha and currently only exposes a small subset of
+- This package is still in alpha and currently only exposes a small subset of
   the zsv library capabilities. More to come!
-* The CLI is not yet available as a Node package
-* If you'd like to use additional parser features, or use the CLI as a Node
+- The CLI is not yet available as a Node package
+- If you'd like to use additional parser features, or use the CLI as a Node
   package, please feel free to post a request in an issue here.

 ### From source
@@ -198,19 +198,19 @@ See [BUILD.md](BUILD.md) for more details.

 Our objectives, which we were unable to find in a pre-existing project, are:

-* Reasonably high performance
-* Runs on any platform, including web assembly
-* Available as both a library and a standalone executable / command-line
+- Reasonably high performance
+- Runs on any platform, including web assembly
+- Available as both a library and a standalone executable / command-line
   interface utility (CLI)
-* Memory-efficient, configurable resource limits
-* Handles real-world CSV cases the same way that Excel does, including all edge
+- Memory-efficient, configurable resource limits
+- Handles real-world CSV cases the same way that Excel does, including all edge
   cases (quote handling, newline handling (either `\n` or `\r`), embedded
   newlines, abnormal quoting (e.g. aaa"aaa,bbb...)
-* Handles other "dirty" data issues:
-  * Assumes valid UTF8, but does not misbehave if input contains bad UTF8
-  * Option to specify multi-row headers
-  * Does not assume or stop working in case of inconsistent numbers of columns
-* Easy to use library or extend/customize CLI
+- Handles other "dirty" data issues:
+  - Assumes valid UTF8, but does not misbehave if input contains bad UTF8
+  - Option to specify multi-row headers
+  - Does not assume or stop working in case of inconsistent numbers of columns
+- Easy to use library or extend/customize CLI

 There are several excellent tools that achieve high performance. Among those we
 considered were xsv and tsv-utils. While they met our performance objective,
@@ -234,34 +234,34 @@ needs.

 `zsv` comes with several built-in commands:

-* `echo`: read CSV from stdin and write it back out to stdout. This is mostly
+- `echo`: read CSV from stdin and write it back out to stdout. This is mostly
   useful for demonstrating how to use the API and also how to create a plug-in,
   and has several uses beyond that including adding/removing BOM, cleaning up
   bad UTF8, whitespace or blank column trimming, limiting output to a contiguous
   data block, skipping leading garbage, and even proving substitution values
   without modifying the underlying source
-* `select`: re-shape CSV by skipping leading garbage, combining header rows into
+- `select`: re-shape CSV by skipping leading garbage, combining header rows into
   a single header, selecting or excluding specified columns, removing duplicate
   columns, sampling, converting from fixed-width input, searching and more
-* `sql`: treat one or more CSV files like database tables and query with SQL
-* `desc`: provide a quick description of your table data
-* `pretty`: format for console (fixed-width) display, or convert to markdown
+- `sql`: treat one or more CSV files like database tables and query with SQL
+- `desc`: provide a quick description of your table data
+- `pretty`: format for console (fixed-width) display, or convert to markdown
   format
-* `2json`: convert CSV to JSON. Optionally, output in
+- `2json`: convert CSV to JSON. Optionally, output in
   [database schema](docs/db.schema.json)
-* `2tsv`: convert to TSV (tab-delimited) format
-* `compare`: compare two or more tables of data and output the differences
-* `paste` (alpha): horizontally paste two tables together (given inputs X and Y,
+- `2tsv`: convert to TSV (tab-delimited) format
+- `compare`: compare two or more tables of data and output the differences
+- `paste` (alpha): horizontally paste two tables together (given inputs X and Y,
   output 1...N rows where each row all columns of X in row N, followed by all
   columns of Y in row N)
-* `serialize` (inverse of flatten): convert an NxM table to a single 3x (Nx(M-1))
+- `serialize` (inverse of flatten): convert an NxM table to a single 3x (Nx(M-1))
   table with columns: Row, Column Name, Column Value
-* `flatten` (inverse of serialize): flatten a table by combining rows that share
+- `flatten` (inverse of serialize): flatten a table by combining rows that share
   a common value in a specified identifier column
-* `stack`: merge CSV files vertically
-* `jq`: run a `jq` filter
-* `2db`: [convert from JSON to sqlite3 db](docs/csv_json_sqlite.md)
-* `prop`: view or save parsing options associated with a file, such as initial
+- `stack`: merge CSV files vertically
+- `jq`: run a `jq` filter
+- `2db`: [convert from JSON to sqlite3 db](docs/csv_json_sqlite.md)
+- `prop`: view or save parsing options associated with a file, such as initial
   rows to ignore, or header row span. Saved options are be applied by default
   when processing that file.

@@ -332,10 +332,10 @@ You can extend `zsv` by providing a pre-compiled shared or static library that
 defines the functions specified in `extension_template.h` and which `zsv` loads
 in one of three ways:

-* as a static library that is statically linked at compile time
-* as a dynamic library that is linked at compile time and located in any library
+- as a static library that is statically linked at compile time
+- as a dynamic library that is linked at compile time and located in any library
   search path
-* as a dynamic library that is located in the same folder as the `zsv`
+- as a dynamic library that is located in the same folder as the `zsv`
   executable and loaded at runtime if/as/when the custom mode is invoked

 #### Example and template
@@ -354,24 +354,26 @@ helping, please post an issue.

 ### Possible enhancements and related developments

-* online "playground" (soon to be released)
-* optimize search; add search with hyperscan or re2 regex matching, possibly
+- online "playground" (soon to be released)
+- optimize search; add search with hyperscan or re2 regex matching, possibly
   parallelize?
-* optional OpenMP or other multi-threading for row processing
-* auto-generated documentation, and better documentation in general
-* Additional benchmarking. Would be great to use
+- optional OpenMP or other multi-threading for row processing
+- auto-generated documentation, and better documentation in general
+- Additional benchmarking. Would be great to use
   <https://bitbucket.org/ewanhiggs/csv-game/src/master/> as a springboard to
   benchmarking a number of various tasks
-* encoding conversion e.g. UTF16 to UTF8
+- encoding conversion e.g. UTF16 to UTF8

 ## Contribute

-* [Fork](https://github.com/liquidaty/zsv/fork) the project.
-* Check out the latest [`main`](https://github.com/liquidaty/zsv/tree/main)
+- [Fork](https://github.com/liquidaty/zsv/fork) the project.
+- Check out the latest [`main`](https://github.com/liquidaty/zsv/tree/main)
   branch.
-* Create a feature or bugfix branch from `main`.
-* Commit and push your changes.
-* Submit the PR.
+- Create a feature or bugfix branch from `main`.
+- Update your required changes.
+- Make sure to run `clang-format` (version 14 or later) for C source updates.
+- Commit and push your changes.
+- Submit the PR.

 ## License

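
The contributor note added to README.md in this diff asks for `clang-format` version 14 or later. One way a contributor might check that before formatting is sketched below. The function names are hypothetical, the minimum version is passed in rather than hard-coded, and the parsing assumes the `--version` output contains the text `version <major>` (as in the sample strings in the comments); the project's own version check is not shown in this diff and may work differently.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration; not taken from the zsv repository.

# Extract the major version from a `clang-format --version` line, e.g.
# "Ubuntu clang-format version 14.0.0-1ubuntu1.1" yields 14.
clang_format_major() {
  printf '%s\n' "$1" | sed -E 's/.*version ([0-9]+).*/\1/'
}

# Succeed only if clang-format is installed and its major version is >= $1.
require_clang_format() {
  local min="$1" major
  command -v clang-format >/dev/null 2>&1 || {
    echo "clang-format not found" >&2
    return 1
  }
  major="$(clang_format_major "$(clang-format --version)")"
  # Guard against unparseable output before comparing numerically.
  [[ $major =~ ^[0-9]+$ ]] && (( major >= min ))
}
```

A typical use would be `require_clang_format 14 && clang-format -i path/to/file.c`, where the file path is a placeholder.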