Releases: kenfar/DataGristle
v0.2.2
V0.2.1 - 2021-04
- Improvement: the field-names from headers can now be used instead of column offsets for gristle_sorter, gristle_freaker, gristle_profiler, and gristle_slicer.
- Improvement: The use of the header now follows four simple rules:
  - It can be referred to as row 0 when it makes sense - like with gristle_slicer & gristle_viewer.
  - It will be passed through when it makes sense - like with gristle_sorter.
  - It will be used to translate field names to offsets for configuration.
  - But will otherwise be ignored.
- Bug Fix: gristle_freaker was failing with 0-length files when using col-type=each
- Bug Fix: gristle_sorter was failing with some multi-directional sorts
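The header rules above can be illustrated with a small sketch. It is not DataGristle's own code: `resolve_columns` and the sample data are hypothetical, and it only shows the general idea of translating field names to offsets while passing the header through a sort untouched.

```python
import csv
import io


def resolve_columns(header, col_specs):
    """Translate a mix of field names and numeric offsets into offsets.

    Hypothetical helper for illustration; not part of DataGristle.
    """
    offsets = []
    for spec in col_specs:
        if spec.isdigit():
            offsets.append(int(spec))            # already an offset
        elif spec in header:
            offsets.append(header.index(spec))   # field name -> offset
        else:
            raise ValueError(f"unknown field name: {spec!r}")
    return offsets


sample = "id,name,state\n3,carol,CO\n1,alice,AK\n2,bob,AL\n"
rows = list(csv.reader(io.StringIO(sample)))
header, body = rows[0], rows[1:]

# Sort on the 'state' field, then on column offset 0.
sort_cols = resolve_columns(header, ["state", "0"])

# The header is passed through as-is; only the data rows are sorted.
body.sort(key=lambda row: [row[i] for i in sort_cols])
sorted_rows = [header] + body
```

Mixing names and offsets in one list keeps older offset-based configurations working alongside the newer name-based ones.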
Installation can be done through either pypi or building from source:
- pip from pypi: https://pypi.org/project/datagristle/0.2.2/
- pip from this release on github: pip install git+https://github.com/kenfar/[email protected]#egg=datagristle
- build from source
v0.2.1
V0.2.1 - 2021-04
Improvement: added gristle_sorter as a script to install in the system so that it is available to users.
Improvement: Now supports python versions 3.8 and 3.9.
Improvement: All csv programs now support envvars and config files for input and can generate config files.
Improvement: Programs always autodetect file csv dialect before applying user overrides - except for piped-in data. This results in a very consistent experience but also means that you may sometimes need to turn dialect options off rather than only on.
Improvement: A directory of example configurations is provided for reference - and is also used for testing: https://github.com/kenfar/DataGristle/tree/master/examples
BREAKING CHANGE: dropped support for python version 3.7
BREAKING CHANGES to all csv programs:
- Various changes to names of options for consistency between programs, with older names caught with an error msg that provides the new name.
- Various improvements to csv dialect handling for consistency and correct handling of escapechar, doublequoting, skipinitialspace.
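The "autodetect first, then apply user overrides" behavior described above can be sketched with the standard library's `csv.Sniffer`. This is only an approximation under stated assumptions: `detect_dialect`, the candidate delimiter set, and the override keys are hypothetical and do not reflect DataGristle's actual option names or internals.

```python
import csv
import io


def detect_dialect(sample_text, overrides=None):
    """Sniff dialect settings from sample text, then apply explicit overrides.

    Hypothetical helper for illustration; not part of DataGristle.
    """
    sniffed = csv.Sniffer().sniff(sample_text, delimiters=",|;\t")
    # Copy the sniffed settings into a dict so that each one can be overridden.
    settings = {
        "delimiter": sniffed.delimiter,
        "quotechar": sniffed.quotechar,
        "doublequote": sniffed.doublequote,
        "escapechar": sniffed.escapechar,
        "skipinitialspace": sniffed.skipinitialspace,
    }
    for key, value in (overrides or {}).items():
        if key not in settings:
            raise KeyError(f"unknown dialect option: {key!r}")
        settings[key] = value   # an override may turn an option on or off
    return settings


sample = "id|name|state\n1|alice|AK\n2|bob|AL\n3|carol|CO\n"
settings = detect_dialect(sample, overrides={"skipinitialspace": True})
rows = list(csv.reader(io.StringIO(sample), **settings))
```

Because detection always runs first, an override is the only way to change a value the detector got wrong, which is why an option sometimes has to be switched off explicitly.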
Installation can be done through either pypi or building from source:
* pip from pypi: https://pypi.org/project/datagristle/0.2.1/
* pip from this release on github: pip install -U -e git://github.com/kenfar/[email protected]#egg=datagristle
* build from source
Maintenance release with breaking changes
Improvement: now supports python versions 3.7 and 3.8
BREAKING CHANGE: dropped support for python version 3.6
Bumped versions on dependent modules to eliminate vulnerabilities
gristle_differ
- BREAKING CHANGE: col_names renamed to col-names for consistency
- Fixes --already-unix option bug with file parsing
- Fixes --stats bug with empty files
- Improvement: added ability to use column names from file headers
- breaking change: col_names renamed to col-names for consistency
- Improvement: if a key-col is in the ignore-cols - it will simply be ignored, and the program will continue processing.
- Improvement: if a key-col is in the compare-cols - it will simply be ignored, and the program will continue processing.
- Improvement: if neither compare-cols nor ignore-cols are provided it will use all cols as compare-cols and continue processing.
- Improvement: CLI help is updated to provide more details and accurate examples of these options.
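The key-col, compare-cols, and ignore-cols rules above can be summarized in a short sketch. `resolve_compare_cols` is a hypothetical function, not the program's code, and giving compare-cols precedence when both lists are supplied is an assumption the notes do not state.

```python
def resolve_compare_cols(all_cols, key_cols, compare_cols=None, ignore_cols=None):
    """Work out which columns to compare, following the rules listed above.

    Hypothetical helper for illustration; not part of DataGristle.
    """
    keys = set(key_cols)
    if compare_cols:
        # A key-col listed in compare-cols is simply dropped.
        # Assumption: compare-cols wins if both lists are given.
        return [col for col in compare_cols if col not in keys]
    if ignore_cols:
        # A key-col listed in ignore-cols is harmless; keys are never compared.
        ignored = set(ignore_cols)
        return [col for col in all_cols if col not in keys and col not in ignored]
    # Neither list given: treat every column as a compare-col (minus the keys).
    return [col for col in all_cols if col not in keys]


cols = ["id", "name", "state", "updated"]
default_cols = resolve_compare_cols(cols, ["id"])
explicit_cols = resolve_compare_cols(cols, ["id"], compare_cols=["id", "name"])
remaining_cols = resolve_compare_cols(cols, ["id"], ignore_cols=["id", "updated"])
```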
Added gristle_dir_merger
This release adds gristle_dir_merger - a tool for consolidating large directories of files. This tool is both fast and flexible. More info can be found in the readme, or by entering gristle_dir_merger --long-help.
Upgraded gristle_validator
The primary feature of this release is support for JSON Schema in gristle_validator. This allows users to define a schema with data quality requirements: identify the fields in a csv, then for each field describe its type, min & max value, min & max length, and whether or not blanks are allowed, and provide a regex validation pattern.
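As a rough illustration of the kinds of per-field checks listed above, the sketch below applies a JSON-Schema-style description to csv records. The schema layout, the `blank` key, and the `check_field` and `check_record` functions are all assumptions made for this example; the format gristle_validator actually expects is documented in the project itself.

```python
import re

# Hypothetical schema: one entry per csv field, in column order.
schema = {
    "items": [
        {"title": "id", "blank": False, "type": "integer",
         "minimum": 1, "maximum": 999},
        {"title": "state", "blank": False, "type": "string",
         "minLength": 2, "maxLength": 2, "pattern": "^[A-Z]+$"},
    ]
}


def check_field(value, rules):
    """Return the names of the rules that this field value violates."""
    if value == "":
        return [] if rules.get("blank", False) else ["blank"]
    errors = []
    if rules.get("type") == "integer":
        try:
            number = int(value)
        except ValueError:
            return ["type"]
        if "minimum" in rules and number < rules["minimum"]:
            errors.append("minimum")
        if "maximum" in rules and number > rules["maximum"]:
            errors.append("maximum")
    if "minLength" in rules and len(value) < rules["minLength"]:
        errors.append("minLength")
    if "maxLength" in rules and len(value) > rules["maxLength"]:
        errors.append("maxLength")
    if "pattern" in rules and not re.search(rules["pattern"], value):
        errors.append("pattern")
    return errors


def check_record(record, schema):
    """Map each failing field title to the list of rules it violates."""
    problems = {}
    for offset, rules in enumerate(schema["items"]):
        errors = check_field(record[offset], rules)
        if errors:
            problems[rules["title"]] = errors
    return problems


good = check_record(["42", "CO"], schema)
bad = check_record(["0", "Colorado"], schema)
empty_id = check_record(["", "CO"], schema)
```

A full JSON Schema validator covers far more than these keywords; the point here is only to show how type, range, length, blank, and regex rules map onto individual csv fields.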