Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce GC'able cache for Roo::Excelx::Extractor#doc #456

Merged

Conversation

chopraanmol1
Copy link
Member

Memory Profiling script:

MemoryProfiler.report{Roo::Excelx.new('test/files/Bibelbund.xlsx').tap{|x|(2..x.last_row).each{|i| x.row(i)}}}.pretty_print

Benchmark Script:

puts Benchmark.measure{ Roo::Excelx.new('test/files/Bibelbund.xlsx').tap{|x|(2..x.last_row).each{|i| x.row(i)}} }

Result on Master:

Total allocated: 37131810 bytes (517026 objects)
Total retained:  5562913 bytes (103010 objects)

allocated memory by gem
-----------------------------------
  19288066  roo/lib
  11049821  nokogiri-1.8.4
   6792403  rubyzip-1.2.2
      1304  tmpdir
       216  other

retained memory by gem
-----------------------------------
   5560934  roo/lib
       782  rubyzip-1.2.2
       725  nokogiri-1.8.4
       296  tmpdir
       176  other

  0.720000   0.020000   0.740000 (  0.733750)

Result after patch:

Total allocated: 34561842 bytes (504998 objects)
Total retained:  5563553 bytes (103026 objects)

allocated memory by gem
-----------------------------------
  19254338  roo/lib
   8513101  nokogiri-1.8.4
   6792403  rubyzip-1.2.2
      1304  tmpdir
       320  weakref
       216  other
       160  ref-2.0.0

retained memory by gem
-----------------------------------
   5561094  roo/lib
       782  rubyzip-1.2.2
       725  nokogiri-1.8.4
       320  weakref
       296  tmpdir
       176  other
       160  ref-2.0.0

  0.610000   0.010000   0.620000 (  0.618642)

Note:
Ruby does have native implementation of WeakRef. But I've choosed to go with 'ref' gem for this following reason:
https://github.com/ruby-concurrency/ref#problems-with-weakref

@chopraanmol1
Copy link
Member Author

Additionally, we can look if this optimization can be used elsewhere in the repo. If so, we can extract instance_cache code and optimize other places. But, I think it will be better to do that in separate PR.

@coveralls
Copy link

coveralls commented Sep 18, 2018

Coverage Status

Coverage decreased (-0.05%) to 93.888% when pulling feee367 on chopraanmol1:excelx_extractor_doc_with_gc-able_cache into 0a2e800 on roo-rb:master.

@tgturner
Copy link
Contributor

@chopraanmol1 The bottom of the ref README states: "MRI Ruby 2.0+ has a good implementation of WeakRef."

Since we only REALLY support MRI 2.0+, shouldn't we just use that rather than pull in an external dependency? That and I don't see much activity on the WeakRef repo in the past year.

Memory Profiling script:
```
MemoryProfiler.report{Roo::Excelx.new('test/files/Bibelbund.xlsx').tap{|x|(2..x.last_row).each{|i| x.row(i)}}}.pretty_print

```

Benchmark Script:
```
puts Benchmark.measure{ Roo::Excelx.new('test/files/Bibelbund.xlsx').tap{|x|(2..x.last_row).each{|i| x.row(i)}} }
```


Result on Master:
```
Total allocated: 37131810 bytes (517026 objects)
Total retained:  5562913 bytes (103010 objects)

allocated memory by gem
-----------------------------------
  19288066  roo/lib
  11049821  nokogiri-1.8.4
   6792403  rubyzip-1.2.2
      1304  tmpdir
       216  other

retained memory by gem
-----------------------------------
   5560934  roo/lib
       782  rubyzip-1.2.2
       725  nokogiri-1.8.4
       296  tmpdir
       176  other

  0.720000   0.020000   0.740000 (  0.733750)
```

Result after patch:
```
Total allocated: 34561842 bytes (504998 objects)
Total retained:  5563553 bytes (103026 objects)

allocated memory by gem
-----------------------------------
  19254338  roo/lib
   8513101  nokogiri-1.8.4
   6792403  rubyzip-1.2.2
      1304  tmpdir
       320  weakref
       216  other
       160  ref-2.0.0

retained memory by gem
-----------------------------------
   5561094  roo/lib
       782  rubyzip-1.2.2
       725  nokogiri-1.8.4
       320  weakref
       296  tmpdir
       176  other
       160  ref-2.0.0

  0.610000   0.010000   0.620000 (  0.618642)
```
@chopraanmol1 chopraanmol1 force-pushed the excelx_extractor_doc_with_gc-able_cache branch from c939c01 to feee367 Compare September 19, 2018 05:25
Copy link
Contributor

@tgturner tgturner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great

@tgturner tgturner merged commit bd0b0f8 into roo-rb:master Sep 19, 2018
chopraanmol1 added a commit to chopraanmol1/roo that referenced this pull request Sep 25, 2018
Discussed in roo-rb#457 (comment)

No noticeable performance or memory benefits.
I think roo-rb#456 must have solved most performance and memory issue regarding extract_hyperlinks.
tgturner pushed a commit that referenced this pull request Sep 27, 2018
Discussed in #457 (comment)

No noticeable performance or memory benefits.
I think #456 must have solved most performance and memory issue regarding extract_hyperlinks.
netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this pull request Jan 20, 2019
pkgsrc change: add "USE_LANGUAGES= # none".

##  [2.8.0] 2019-01-18
### Fixed
- Fixed inconsistent column length for CSV [375](roo-rb/roo#375)
- Fixed formatted_value with `%` for Excelx [416](roo-rb/roo#416)
- Improved Memory consumption and performance [434](roo-rb/roo#434) [449](roo-rb/roo#449) [454](roo-rb/roo#454) [456](roo-rb/roo#456) [458](roo-rb/roo#458) [462](roo-rb/roo#462) [466](roo-rb/roo#466)
- Accept both Transitional and Strict Type for Excelx's worksheets [441](roo-rb/roo#441)
- Fixed ruby warnings [442](roo-rb/roo#442) [476](roo-rb/roo#476)
- Restore support for URL as file identifier for CSV [462](roo-rb/roo#462)
- Fixed missing location for Excelx's links [482](roo-rb/roo#482)

### Changed / Added
- Drop support for ruby 2.2.x and lower
- Updated rubyzip version for fixing security issue. Now minimal version is 1.2.1
- Roo::Excelx::Coordinate now inherits Array [458](roo-rb/roo#458)
- Improved Roo::HeaderRowNotFoundError exception's message [461](roo-rb/roo#461)
- Added `empty_cell` option which by default disable allocation for Roo::Excelx::Cell::Empty [464](roo-rb/roo#464)
- Added support for variable number of decimals for Excelx's formatted_value [387](roo-rb/roo#387)
- Added `disable_html_injection` option to disable html injection for shared string in `Roo::Excelx` [392](roo-rb/roo#392)
- Added image extraction for Excelx [414](roo-rb/roo#414) [397](roo-rb/roo#397)
- Added support for `1e6` as scientific notation for Excelx [433](roo-rb/roo#433)
- Added support for Integer as 0 based index for Excelx's `sheet_for` [455](roo-rb/roo#455)
- Extended `no_hyperlinks` option for non streaming Excelx methods [459](roo-rb/roo#459)
- Added `empty_cell` option to disable Roo::Excelx::Cell::Empty allocation for Excelx [464](roo-rb/roo#464)
- Added support for Integer with leading zero for Roo:Excelx [479](roo-rb/roo#479)
- Refactored Excelx code [453](roo-rb/roo#453) [477](roo-rb/roo#477) [483](roo-rb/roo#483) [484](roo-rb/roo#484)

### Deprecations
- Roo::Excelx::Sheet#present_cells is deprecated [454](roo-rb/roo#454)
- Roo::Utils.split_coordinate is deprecated [458](roo-rb/roo#458)
- Roo::Excelx::Cell::Base#link is deprecated [457](roo-rb/roo#457)
aravindm pushed a commit to chobiwa/roo that referenced this pull request Jun 18, 2019
Memory Profiling script:
```
MemoryProfiler.report{Roo::Excelx.new('test/files/Bibelbund.xlsx').tap{|x|(2..x.last_row).each{|i| x.row(i)}}}.pretty_print

```

Benchmark Script:
```
puts Benchmark.measure{ Roo::Excelx.new('test/files/Bibelbund.xlsx').tap{|x|(2..x.last_row).each{|i| x.row(i)}} }
```


Result on Master:
```
Total allocated: 37131810 bytes (517026 objects)
Total retained:  5562913 bytes (103010 objects)

allocated memory by gem
-----------------------------------
  19288066  roo/lib
  11049821  nokogiri-1.8.4
   6792403  rubyzip-1.2.2
      1304  tmpdir
       216  other

retained memory by gem
-----------------------------------
   5560934  roo/lib
       782  rubyzip-1.2.2
       725  nokogiri-1.8.4
       296  tmpdir
       176  other

  0.720000   0.020000   0.740000 (  0.733750)
```

Result after patch:
```
Total allocated: 34561842 bytes (504998 objects)
Total retained:  5563553 bytes (103026 objects)

allocated memory by gem
-----------------------------------
  19254338  roo/lib
   8513101  nokogiri-1.8.4
   6792403  rubyzip-1.2.2
      1304  tmpdir
       320  weakref
       216  other
       160  ref-2.0.0

retained memory by gem
-----------------------------------
   5561094  roo/lib
       782  rubyzip-1.2.2
       725  nokogiri-1.8.4
       320  weakref
       296  tmpdir
       176  other
       160  ref-2.0.0

  0.610000   0.010000   0.620000 (  0.618642)
```
aravindm pushed a commit to chobiwa/roo that referenced this pull request Jun 18, 2019
…b#459)

Discussed in roo-rb#457 (comment)

No noticeable performance or memory benefits.
I think roo-rb#456 must have solved most performance and memory issue regarding extract_hyperlinks.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants