Parser for *.csv.gz traffic files. Requires Node.js 12 or later.
Example usage:
$ git clone https://github.com/bbar/traffic-parser.git traffic-parser
$ cd traffic-parser
$ yarn install # (or npm install)
$ node parse.js \
--sources="/some/path/a.csv.gz /some/path/b.csv.gz /some/path/c.csv.gz /some/path/d.csv.gz" \
--destination="/some/path/data/parsed/intervals" \
--weekdays="0,1,2,3,4,5,6" \
--batch=625 \
--sourceInterval=5 \
--targetInterval=60
| Argument | Required | Type | Default | Description |
|---|---|---|---|---|
| sources | yes | String | Files to parse | |
| destination | yes | String | Directory where parsed files are placed | |
| weekdays | no | String | 0,1,2,3,4,5,6 | Weekdays to parse (0=Sunday) |
| batch | no | Int | 625 | Max lines written to any file at once |
| sourceInterval | no | Int | 5 | (Mins) Interval between traffic samples in the *.csv.gz file |
| targetInterval | no | Int | 5 | (Mins) Desired interval between traffic samples for parsed files |
A quick note about the batch argument... A batch size of 625 means the code will parse 625 lines from a *.csv.gz file, write those to disk, then parse another 625 lines, write to disk, …, until it’s done. If you set a batch limit of 30000 or so and haven't increased node's limit with --max-old-space-size, V8 will likely explode with a memory error. Surprisingly (to me, anyway) setting a larger batch size doesn't mean better performance. I started at 20000 and kept cutting it in half until I saw the best performance, which was around 625. So that's the default.