Tips For Large Files

Sal edited this page Jun 13, 2024 · 11 revisions

Here are some tips on how to traverse larger Parquet files.

Adjust how many records will be loaded

By default, the application loads and displays only the first 1000 records in the Parquet file. Records must be loaded into memory before they can be displayed, which might not be possible for larger Parquet files with many records.

  • This setting can be found in the top right corner of the UI.
  • Adjust it according to how much your machine can handle. This number directly determines how much RAM the program will use to load the records.
  • Selecting too many records will cause the application to crash.
  • Reducing the number of fields to display will help reduce the memory footprint.
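
To get a feel for how the record count drives memory use, here is a rough back-of-the-envelope sketch in Python. It is not how the application measures memory: the function name and the assumed average of 50 bytes per value are purely illustrative, and real usage depends on your data types and string lengths.

```python
# Back-of-the-envelope estimate only. AVG_BYTES_PER_VALUE is an assumed
# figure for illustration, not something measured from the application.
AVG_BYTES_PER_VALUE = 50


def estimate_memory_mb(record_count: int, field_count: int,
                       avg_bytes_per_value: int = AVG_BYTES_PER_VALUE) -> float:
    """Approximate memory (in MB) needed to hold record_count x field_count values."""
    total_bytes = record_count * field_count * avg_bytes_per_value
    return total_bytes / (1024 * 1024)


if __name__ == "__main__":
    # The estimate grows linearly with the number of records loaded.
    for records in (1_000, 100_000, 1_000_000):
        print(f"{records:>9} records x 20 fields: "
              f"~{estimate_memory_mb(records, 20):.1f} MB")
```

Under these assumptions, the estimate scales linearly with both the record count and the field count, so lowering either one is the quickest way to stay within what your machine can handle.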

Load only fields you care about

By default, the application will try to load all the fields in the Parquet file into the UI. This might be fine for smaller files, but for larger files, selecting all the fields can be inefficient.

Reducing the number of fields to load will decrease load times and reduce memory usage, allowing more records to be loaded into memory for display.
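
The trade-off between fields and records can be illustrated with a small Python sketch. The memory budget, the 50-byte average value size, and the function name are hypothetical; the point is only that, for a fixed budget, fewer fields leave room for more records.

```python
def max_records_for_budget(budget_mb: int, field_count: int,
                           avg_bytes_per_value: int = 50) -> int:
    """Rough number of records that fit in budget_mb for a given field count.

    avg_bytes_per_value is an assumed average, not a measured value.
    """
    if field_count <= 0:
        raise ValueError("field_count must be positive")
    budget_bytes = budget_mb * 1024 * 1024
    bytes_per_record = field_count * avg_bytes_per_value
    return budget_bytes // bytes_per_record


if __name__ == "__main__":
    # Same hypothetical 512 MB budget, different numbers of selected fields.
    for fields in (100, 20, 5):
        print(f"{fields:>3} fields: ~{max_records_for_budget(512, fields):,} records")
```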

Use default column sizing

The Edit → Column Sizing → Columns and Contents option comes at a performance cost because the utility needs to go through every cell to determine the minimum width necessary to show all contents. Using the Column Names option instead will provide a performance boost.
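
The cost difference between the two sizing modes can be sketched in Python as follows. This is not the application's actual implementation: the function names are made up for illustration, and character counts stand in for the pixel widths a real UI would measure.

```python
def widths_from_names(headers):
    """Size each column from its header text only: one pass over the headers."""
    return [len(header) for header in headers]


def widths_from_contents(headers, rows):
    """Size each column to fit its header and its widest cell.

    Every cell is visited, so the work grows with rows x columns.
    """
    widths = [len(header) for header in headers]
    for row in rows:
        for i, value in enumerate(row):
            widths[i] = max(widths[i], len(str(value)))
    return widths


if __name__ == "__main__":
    headers = ["id", "name"]
    rows = [(1, "Alice"), (2, "Bartholomew")]
    print(widths_from_names(headers))           # [2, 4]
    print(widths_from_contents(headers, rows))  # [2, 11]
```

In this sketch, sizing by name touches only the headers, while sizing by contents has to visit every loaded cell, which is why the difference becomes noticeable as more records are loaded.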