
Releases: vincentlaucsb/csv-parser

CSV Parser 2.3.0: Race Condition Fix

15 Jun 20:23
105b44d

What's Changed

  • CSVField: new member function try_parse_decimal() to specify one or more decimal symbols by @wilfz in #226
  • Replace the includes of Windows.h with windows.h (#204) by @ludovicdelfau in #235
  • Use const CSVFormat& in calculate_score by @rajgoel in #236
  • Fix memory issues in CSVFieldList by @vincentlaucsb in #237
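Here is a small standalone sketch of what configurable decimal symbols mean. The function `parse_decimal_with_symbols()` is hypothetical and does not reproduce the signature of the library's `try_parse_decimal()`. It treats any character in `decimal_symbols` as the decimal separator, so both `"3,25"` and `"3.25"` can be read.

```cpp
#include <cctype>
#include <cstddef>
#include <string_view>

// Hypothetical helper (not the csv-parser API): parses text such as "3,25"
// or "3.25", where any character in `decimal_symbols` may act as the decimal
// separator. Returns false if the text is not a plain decimal number.
bool parse_decimal_with_symbols(std::string_view text,
                                std::string_view decimal_symbols,
                                long double& out) {
    if (text.empty()) return false;
    std::size_t i = 0;
    bool negative = false;
    if (text[i] == '+' || text[i] == '-') {
        negative = (text[i] == '-');
        ++i;
    }
    long double value = 0;
    long double scale = 1;  // place value of the next fractional digit
    bool seen_separator = false;
    bool seen_digit = false;
    for (; i < text.size(); ++i) {
        const char c = text[i];
        if (std::isdigit(static_cast<unsigned char>(c))) {
            seen_digit = true;
            if (seen_separator) {
                scale /= 10;
                value += (c - '0') * scale;
            } else {
                value = value * 10 + (c - '0');
            }
        } else if (!seen_separator &&
                   decimal_symbols.find(c) != std::string_view::npos) {
            seen_separator = true;  // first (and only) decimal symbol
        } else {
            return false;  // unexpected character or a second separator
        }
    }
    if (!seen_digit) return false;
    out = negative ? -value : value;
    return true;
}
```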

Race Condition Notes

Background

The CSV Parser tries to perform as few allocations as possible. Instead of naively storing each CSV field as its own std::string in a std::vector, the parser keeps references to the raw input and uses lightweight RawCSVField objects to mark where each field starts and ends within that input (as well as a flag indicating whether an escaped quote is present). This has the following benefits:

  1. Avoiding the cost of constructing many std::string instances
  2. Avoiding the cost of constant std::vector reallocations
  3. Preserving locality of reference
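Here is a minimal sketch of the idea, assuming a simplified layout. The hypothetical `RawField` struct below records only an offset, a length, and an escaped-quote flag. `field_view()` produces a `std::string_view` into the row on demand instead of copying characters into a `std::string`. The real `RawCSVField` and its storage differ in detail.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical, simplified stand-in for RawCSVField: it stores no
// characters itself, only where a field lives inside the raw row.
struct RawField {
    std::size_t start;      // offset of the field's first character
    std::size_t length;     // number of characters in the field
    bool has_double_quote;  // true if the field contains an escaped "" quote
};

// Records field boundaries for one unquoted row without copying any text.
std::vector<RawField> index_fields(std::string_view row, char delim = ',') {
    std::vector<RawField> fields;
    std::size_t start = 0;
    for (std::size_t i = 0; i <= row.size(); ++i) {
        if (i == row.size() || row[i] == delim) {
            fields.push_back({start, i - start, false});
            start = i + 1;
        }
    }
    return fields;
}

// Produces a view of one field on demand; still no allocation.
std::string_view field_view(std::string_view row, const RawField& f) {
    return row.substr(f.start, f.length);
}
```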

Furthermore, the CSV Parser uses separate threads for parsing CSV and for iterating over the data. As rows are parsed, they are made available to the user, who can work with them while parsing of new rows continues uninterrupted.
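The producer/consumer split can be sketched with a mutex-protected queue. `RowQueue` and `read_concurrently()` are hypothetical names, and plain strings stand in for parsed rows. The sketch illustrates the general pattern, not csv-parser's actual internals.

```cpp
#include <condition_variable>
#include <deque>
#include <mutex>
#include <optional>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical queue shared by a parser thread (producer) and the
// reading thread (consumer).
class RowQueue {
public:
    void push(std::string row) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            rows_.push_back(std::move(row));
        }
        cv_.notify_one();
    }
    // Called by the producer once no more rows will arrive.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            done_ = true;
        }
        cv_.notify_all();
    }
    // Blocks until a row is available; returns nullopt once the producer
    // has closed the queue and every row has been consumed.
    std::optional<std::string> pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return !rows_.empty() || done_; });
        if (rows_.empty()) return std::nullopt;
        std::string row = std::move(rows_.front());
        rows_.pop_front();
        return row;
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    std::deque<std::string> rows_;
    bool done_ = false;
};

// "Parses" `lines` on a worker thread while the calling thread reads rows.
std::vector<std::string> read_concurrently(const std::vector<std::string>& lines) {
    RowQueue queue;
    std::thread parser([&] {
        for (const auto& line : lines) queue.push(line);
        queue.close();
    });
    std::vector<std::string> seen;
    while (auto row = queue.pop()) seen.push_back(*row);
    parser.join();
    return seen;
}
```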

The Race Condition

The RawCSVField objects mentioned previously were stored in contiguous blocks, and an std::vector of pointers to these blocks was used to keep track of them.

However, as @ludovicdelfau accurately diagnosed, a problem arose if the reading thread accessed a RawCSVField (e.g. by reading a CSVField) while the parsing thread was adding a new RawCSVField block to an at-capacity std::vector. The push caused the std::vector to reallocate its contents, leaving the reading thread to access deallocated memory.
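The hazard can be demonstrated without actually reading freed memory. This sketch uses a hypothetical `FieldBlock` type as a stand-in for a block of RawCSVField objects. It fills a `std::vector` to capacity, records the buffer address, pushes one more element, and checks that the buffer moved.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a block of RawCSVField objects.
struct FieldBlock { int placeholder; };

// Once a std::vector is at capacity, push_back allocates a new buffer and
// moves the elements into it, so anything still pointing into the old
// buffer (as a reader thread might) is left dangling. The old address is
// captured as an integer while still valid and is never dereferenced.
bool push_back_reallocates() {
    std::vector<FieldBlock*> blocks;
    blocks.reserve(4);
    while (blocks.size() < blocks.capacity()) blocks.push_back(nullptr);
    const auto old_buffer = reinterpret_cast<std::uintptr_t>(blocks.data());
    blocks.push_back(nullptr);  // exceeds capacity, so the buffer must move
    const auto new_buffer = reinterpret_cast<std::uintptr_t>(blocks.data());
    return old_buffer != new_buffer;
}
```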

This issue was first reported in #217.

The Fix

The fix was simple: an std::deque replaced the std::vector used to store the RawCSVField pointers. Unlike std::vector, appending to an std::deque never moves its existing elements, so references to them remain valid. This change even appears to improve the CSV Parser's performance, since the cost of constant reallocations is avoided. The loss of memory locality typical of std::deque does not apply here because, again, the CSV Parser stores pointers to RawCSVField[] blocks rather than the RawCSVField objects themselves.
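The property the fix relies on can be checked directly. In this single-threaded sketch, which uses a hypothetical `RawField` type, the code takes a reference to the first element of a `std::deque` of block pointers and appends thousands of blocks. The reference still refers to the same element afterward. `push_back` on a `std::deque` does invalidate iterators. Only references and pointers to existing elements are guaranteed to survive.

```cpp
#include <cstddef>
#include <deque>
#include <memory>

// Hypothetical stand-in for RawCSVField.
struct RawField {
    std::size_t start;
    std::size_t length;
    bool has_double_quote;
};

// Mirrors the fix: a std::deque of pointers to fixed-size blocks of fields.
// push_back never relocates existing deque elements, so a reference taken
// earlier (as a reader might hold) still refers to the same element.
bool deque_references_stay_valid() {
    std::deque<std::unique_ptr<RawField[]>> blocks;
    blocks.push_back(std::make_unique<RawField[]>(64));
    const auto& first = blocks.front();
    RawField* first_block = first.get();
    for (int i = 0; i < 10000; ++i)
        blocks.push_back(std::make_unique<RawField[]>(64));
    return &first == &blocks.front() && first.get() == first_block;
}
```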

New Contributors

Full Changelog: 2.2.3...2.3.0

CSV Parser 2.2.3

26 May 21:26
  • Fix n_rows() being off-by-one when the CSVReader iterator was used (reported in #173)
    • Note: This was due to a simple counting error where the iterator did not increment the row counter for the first row. All rows were still correctly read.
  • Implement ability to handle arbitrary combinations of \r and \n in line endings (#223)
  • Fix CSV writers incorrectly converting decimal values between 0 and -1 to positive numbers
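One way to accept arbitrary combinations of `\r` and `\n` is to treat any run of either character as a single row terminator. The hypothetical `split_rows()` below does this. It ignores quoting, so it does not handle newlines inside quoted fields. As a side effect, it also skips blank lines. It is a sketch of the technique, not the parser's implementation.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical sketch: a row ends at any '\r' or '\n', and any further
// '\r'/'\n' characters right after it count as part of the same terminator.
// Accepts "\n", "\r\n", "\r" and mixes such as "\n\r".
std::vector<std::string> split_rows(std::string_view text) {
    std::vector<std::string> rows;
    std::size_t i = text.find_first_not_of("\r\n");  // skip leading breaks
    while (i != std::string_view::npos) {
        std::size_t end = text.find_first_of("\r\n", i);
        if (end == std::string_view::npos) end = text.size();
        rows.emplace_back(text.substr(i, end - i));
        i = text.find_first_not_of("\r\n", end);  // consume the whole run
    }
    return rows;
}
```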

CSV Parser 2.2.2

20 May 03:52
84c1db1

What's Changed

  • Allow parsing of numbers that begin with +, fixing #213
  • Fix compiler warnings in g++ from using abs and in try_parse_hex() #227
  • Fix invalid memory access issue in g++ builds #228
    • Issue was caused when using CSVField methods in conjunction with CSVRow reverse iterators
  • CMake options to disable programs building by @BaptisteLemarcis in #148
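Accepting a leading `+` mostly means treating it as a no-op sign. Here is a minimal sketch using a hypothetical, integer-only helper `parse_signed_int()`, which omits overflow checking:

```cpp
#include <cctype>
#include <cstddef>
#include <string_view>

// Hypothetical sketch: parses an integer with an optional leading '+' or '-'
// (e.g. "+42"). Overflow checking is omitted for brevity.
bool parse_signed_int(std::string_view text, long long& out) {
    std::size_t i = 0;
    bool negative = false;
    if (!text.empty() && (text[0] == '+' || text[0] == '-')) {
        negative = (text[0] == '-');
        i = 1;
    }
    if (i == text.size()) return false;  // empty input or a lone sign
    long long value = 0;
    for (; i < text.size(); ++i) {
        if (!std::isdigit(static_cast<unsigned char>(text[i]))) return false;
        value = value * 10 + (text[i] - '0');
    }
    out = negative ? -value : value;
    return true;
}
```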

New Contributors

Full Changelog: 2.2.1...2.2.2

CSV Parser 2.2.1

18 May 02:28

This is a simple CMake change that makes it easier to #include "csv.hpp" in a CMake project that grabs csv-parser using FetchContent_Declare().

What's Changed

  • Provide directory of library's header as the include directory by @grosscol in #220

New Contributors

Full Changelog: 2.2.0...2.2.1

CSV Parser 2.2.0

03 Apr 07:13

CSV Parser 2.1.3

29 Jul 05:39
ea547fd
  • Fix various compatibility issues with g++ and clang
  • Added hex value parsing
  • Fixed a rare out-of-bounds condition

CSV Parser 2.1.2

27 Jul 03:28
39a6af6
  • Fixed compilation issues with C++11 and 14.
    • CSV Parser should now be C++11 compatible once again with g++ 7.5 or later
  • Allowed users to customize decimal place precision when writing CSVs
  • Fixed floating point output
    • Arbitrarily large integers stored in doubles can now be output w/o limits
  • Fixed newlines not being escaped by CSVWriter
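Writers usually escape newlines by quoting the field. The sketch below follows the common RFC 4180 convention. `escape_field()` is a hypothetical helper, and the library's CSVWriter may differ in detail.

```cpp
#include <string>
#include <string_view>

// Hypothetical sketch: a field containing the delimiter, a double quote or
// a line break is wrapped in quotes, and each embedded quote is doubled.
// Newlines are kept verbatim inside the quotes.
std::string escape_field(std::string_view field, char delim = ',') {
    const bool needs_quotes =
        field.find_first_of(std::string{delim, '"', '\r', '\n'}) !=
        std::string_view::npos;
    if (!needs_quotes) return std::string(field);
    std::string out = "\"";
    for (char c : field) {
        if (c == '"') out += '"';  // escape " as ""
        out += c;
    }
    out += '"';
    return out;
}
```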

CSV Parser 2.1.1

15 Apr 08:18
  • Fixed CSVStats only processing first 5000 rows thanks to @TobyEalden
  • Fixed parsing """fields like this""" thanks to @rpadrela
  • Fixed CSVReader move semantics thanks to @artpaul
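A field such as `"""fields like this"""` is a quoted field whose content begins and ends with an escaped quote. Here is a minimal sketch of unquoting it with a hypothetical `unquote_field()`, which returns `std::nullopt` for malformed input:

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical sketch: strips the outer quotes of a quoted field and turns
// each doubled "" inside into a single ". The raw field
// """fields like this""" becomes "fields like this" (quotes included).
std::optional<std::string> unquote_field(std::string_view raw) {
    if (raw.size() < 2 || raw.front() != '"' || raw.back() != '"')
        return std::nullopt;  // not a quoted field
    std::string out;
    std::string_view inner = raw.substr(1, raw.size() - 2);
    for (std::size_t i = 0; i < inner.size(); ++i) {
        if (inner[i] == '"') {
            // A lone quote inside a quoted field is malformed.
            if (i + 1 >= inner.size() || inner[i + 1] != '"')
                return std::nullopt;
            ++i;  // skip the second quote of the pair
        }
        out += inner[i];
    }
    return out;
}
```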

Minor Patch

20 Dec 11:11

Fixed #142, where decimal numbers were not being printed properly by CSVWriter, and incorporated #137 and #134

Better, faster, stronger

18 Oct 09:30
621a9d9

New Features

  • CSVReader can now parse from memory mapped files, std::stringstream, and std::ifstream
  • DelimWriter now supports writing rows encoded as std::tuple
  • DelimWriter automatically converts numbers and other data types stored in vectors, arrays, and tuples
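You can convert tuple elements to text with `std::apply` and a fold expression. The hypothetical `tuple_to_row()` below is a C++17 sketch of the idea, not DelimWriter's implementation. It formats each element with `operator<<` and omits field escaping.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <tuple>

// Hypothetical sketch: joins the elements of a std::tuple into one delimited
// row, converting numbers and other streamable types via operator<<.
template <typename... Ts>
std::string tuple_to_row(const std::tuple<Ts...>& row, char delim = ',') {
    std::ostringstream out;
    std::size_t index = 0;
    auto write_field = [&](const auto& field) {
        if (index++ > 0) out << delim;  // delimiter before every field but the first
        out << field;
    };
    // The comma fold visits the elements left to right.
    std::apply([&](const auto&... fields) { (write_field(fields), ...); }, row);
    return out.str();
}
```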

Improvements

  • CSVReader is now a no-copy parser when memory-mapped IO is used
    • CSVRow and CSVField now refer to the original memory map
  • Significant performance improvements for some files

Bug Fixes

  • Fixed potential thread safety issues with internals::CSVFieldList

API Changes

  • CSVReader::feed() and CSVReader::end_feed() have been removed. In-memory parsing should be performed via the std::stringstream interface.