kunaljubce/1brc

1brc

This is a Python implementation of the popular One Billion Row Challenge, which was originally meant to be solved only in Java: https://github.com/gunnarmorling/1brc

Experimental Setup

Runtime improvements over iterations

| Iteration Number | Runtime (in seconds) |
|------------------|----------------------|
| 1                | 1138.61              |
| 2                |                      |

Implementation Details and improvements done over iterations

Iteration 1
  • Read the file in batches. As each batch is read, call batch_calculation() to compute the averages for that batch. Inside this function:
    • First, each element of the tuple that was read is split and converted to a list: [Place, Temperature].
    • This list is then converted to a dict {Place, Temperature} and appended to a List[Dict] variable, input_batch_list.
    • Finally, input_batch_list is passed to the calc_average_over_entire_data() function.
  • The calc_average_over_entire_data() function achieves two objectives:
    • When called from within batch_calculation(), it iterates over input_batch_list and calculates the average per batch.
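
The steps above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: only the names batch_calculation(), calc_average_over_entire_data() and input_batch_list come from the description, while the function bodies, the read_in_batches() helper, the batch_size parameter and the {place: temperature} dict layout are assumptions. The `;` separator follows the challenge's input format (for example `Hamburg;12.0`).

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List, Tuple


def read_in_batches(lines: Iterable[str], batch_size: int) -> Iterator[Tuple[str, ...]]:
    """Yield tuples of at most batch_size lines (hypothetical helper)."""
    iterator = iter(lines)
    while True:
        batch = tuple(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def calc_average_over_entire_data(
    input_batch_list: List[Dict[str, float]],
) -> Dict[str, float]:
    """Compute the average temperature per place over a list of dicts."""
    totals: Dict[str, float] = {}
    counts: Dict[str, int] = {}
    for record in input_batch_list:
        for place, temperature in record.items():
            totals[place] = totals.get(place, 0.0) + temperature
            counts[place] = counts.get(place, 0) + 1
    return {place: totals[place] / counts[place] for place in totals}


def batch_calculation(batch: Tuple[str, ...]) -> Dict[str, float]:
    """Parse one batch of raw lines and return its per-place averages."""
    input_batch_list: List[Dict[str, float]] = []
    for line in batch:
        # Split "Place;Temperature" into a two-element list.
        place, temperature = line.strip().split(";")
        # Assumed dict layout: one {place: temperature} entry per row.
        input_batch_list.append({place: float(temperature)})
    return calc_average_over_entire_data(input_batch_list)


if __name__ == "__main__":
    sample = ["Hamburg;12.0", "Bulawayo;8.9", "Hamburg;14.0", "Palembang;38.8"]
    for batch in read_in_batches(sample, batch_size=3):
        print(batch_calculation(batch))
    # {'Hamburg': 13.0, 'Bulawayo': 8.9}
    # {'Palembang': 38.8}
```

With a real input file, the open file object can be passed in place of `sample`, since iterating over it yields one line at a time. The sketch covers only the per-batch step described above.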
