 There are plenty of other CSV parsers in the wild, but I had a hard time finding what I wanted. Inspired by Python's `csv` module, I wanted a library with **simple, intuitive syntax**. Furthermore, I wanted support for special use cases such as calculating statistics on very large files. Thus, this library was created with the following goals in mind.

 ### Performance and Memory Requirements
-With the deluge of large datasets available, a performant CSV parser is a necessity. By using overlapped threads, memory mapped IO, and
-efficient data structures, this parser can quickly tackle large CSV files. Furthermore, this parser has a minimal memory footprint and
-can handle larger-than-RAM files.
+A high-performance CSV parser allows you to take advantage of the deluge of large datasets available. By using overlapped threads, memory-mapped I/O, and
+minimal memory allocation, this parser can quickly tackle large CSV files--even if they are larger than RAM.
+
+In fact, [according to Visual Studio's profiler](https://github.com/vincentlaucsb/csv-parser/wiki/Microsoft-Visual-Studio-CPU-Profiling-Results), this
+CSV parser **spends almost 90% of its CPU cycles actually reading your data** as opposed to getting hung up in hard disk I/O or pushing around memory.

 #### Show me the numbers
-On my computer (Intel Core i7-8550U @ 1.80GHz/Toshiba XG5 SSD), this parser can read
- * the [69.9 MB 2015_StateDepartment.csv](https://github.com/vincentlaucsb/csv-data/tree/master/real_data) in 0.26 seconds (269 MBps)
- * a [1.4 GB Craigslist Used Vehicles Dataset](https://www.kaggle.com/austinreese/craigslist-carstrucks-data/version/7) in 2.1 seconds (667 MBps)
- * a [1.24GB Car Accidents Dataset](https://www.kaggle.com/sobhanmoosavi/us-accidents) in 5 seconds (248 MBps)
+On my computer (12th Gen Intel(R) Core(TM) i5-12400 @ 2.50 GHz/Western Digital Blue 5400RPM HDD), this parser can read
+ * the [69.9 MB 2015_StateDepartment.csv](https://github.com/vincentlaucsb/csv-data/tree/master/real_data) in 0.19 seconds (360 MBps)
+ * a [1.4 GB Craigslist Used Vehicles Dataset](https://www.kaggle.com/austinreese/craigslist-carstrucks-data/version/7) in 1.18 seconds (1.2 GBps)
+ * a [2.9 GB Car Accidents Dataset](https://www.kaggle.com/sobhanmoosavi/us-accidents) in 8.49 seconds (352 MBps)

 ### Robust Yet Flexible
 #### RFC 4180 and Beyond