Remove unnecessary cache hash for Roo::Excelx #454

Merged

chopraanmol1
Member

1. After roo-rb#434 (Excelx Memory and Performance Optimization) we no longer need to cache `Roo::Utils.extract_coordinate`.

2. `Roo::Excelx::Sheet#present_cells` is only used to calculate the first/last row/column. Refactored the code to avoid creating the `present_cells` hash.
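To make the trade-off in point 1 concrete, here is a minimal, hypothetical sketch contrasting a memoizing cache keyed on the cell reference with calling the extraction directly. `parse_ref` is an illustrative stand-in for `Roo::Utils.extract_coordinate`, not roo's implementation. The point is that a cache hash keeps every distinct key and result alive for the lifetime of the sheet, which only pays off if extraction is expensive; the PR's claim is that after roo-rb#434 the cache is no longer needed.

```ruby
# Hypothetical stand-in for Roo::Utils.extract_coordinate (NOT roo's code):
# turns an A1-style reference such as "AB12" into [row, column], 1-based.
def parse_ref(ref)
  letters = ref[/\A[A-Z]+/]
  digits = ref[letters.length..-1]
  column = letters.each_char.reduce(0) { |acc, ch| acc * 26 + (ch.ord - 'A'.ord + 1) }
  [Integer(digits, 10), column]
end

# Before: every distinct reference is memoized, so the hash, its string
# keys and the coordinate arrays stay retained as long as the parser lives.
class CachedParser
  def initialize
    @cache = {}
  end

  def coordinate(ref)
    @cache[ref] ||= parse_ref(ref)
  end

  def cache_size
    @cache.size
  end
end

# After: parse on demand; nothing is retained between calls.
def uncached_coordinate(ref)
  parse_ref(ref)
end

refs = %w[A1 B1 AB12 A1]
cached = CachedParser.new
refs.each { |ref| cached.coordinate(ref) }
puts cached.cache_size                    # 3 distinct keys retained
puts uncached_coordinate('AB12').inspect  # [12, 28]
```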

Memory Profile Script
```ruby
require 'roo'
require 'memory_profiler'

MemoryProfiler.report do
  excel = Roo::Excelx.new('test/files/Bibelbund.xlsx')
  # Read every row of the default sheet once.
  (1..excel.last_row).each { |row_number| excel.row(row_number) }
end.pretty_print
```

Master
```
Total allocated: 42820054 bytes (545014 objects)
Total retained:  12833905 bytes (185969 objects)

allocated memory by gem
-----------------------------------
  24082522  roo/lib
  12173047  nokogiri-1.8.4
   6563085  rubyzip-1.2.2
      1184  tmpdir
       216  other

retained memory by gem
-----------------------------------
   9059622  roo/lib
   3773019  nokogiri-1.8.4
       792  rubyzip-1.2.2
       296  tmpdir
       176  other
```

Modified
```
Total allocated: 38800821 bytes (517025 objects)
Total retained:  9879455 bytes (157986 objects)

allocated memory by gem
-----------------------------------
  20951802  roo/lib
  11053807  nokogiri-1.8.4
   6793812  rubyzip-1.2.2
      1184  tmpdir
       216  other

retained memory by gem
-----------------------------------
   7224422  roo/lib
   2653779  nokogiri-1.8.4
       782  rubyzip-1.2.2
       296  tmpdir
       176  other
```
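The relative savings can be read off the two reports with a little arithmetic. The snippet below is only a convenience calculation over the numbers copied from the Master and Modified reports; the metric labels are descriptive names chosen here, not MemoryProfiler output keys.

```ruby
# Numbers copied from the Master and Modified MemoryProfiler reports.
REPORTS = {
  'total allocated bytes'   => [42_820_054, 38_800_821],
  'total allocated objects' => [545_014, 517_025],
  'total retained bytes'    => [12_833_905, 9_879_455],
  'total retained objects'  => [185_969, 157_986],
  'roo/lib allocated bytes' => [24_082_522, 20_951_802],
  'roo/lib retained bytes'  => [9_059_622, 7_224_422]
}.freeze

# Percentage saved relative to master, rounded to two decimals.
def reduction_percent(before, after)
  ((before - after) * 100.0 / before).round(2)
end

REPORTS.each do |metric, (before, after)|
  puts format('%-24s %10d -> %10d  (-%.2f%%)', metric, before, after, reduction_percent(before, after))
end
```

With these numbers the totals work out to about 9.4% fewer allocated bytes and 23% fewer retained bytes; the roo/lib share alone drops by about 13% allocated and 20% retained.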

This PR also improves performance by 5-12%
@coveralls
coveralls commented Sep 16, 2018

Coverage decreased (-0.1%) to 93.924% when pulling a2853fe on chopraanmol1:remove_unnecessary_cache_for_excelx into 782420b on roo-rb:master.

@chopraanmol1
Member Author

Downside:
For Excelx files in which every cell is a hyperlink, performance is 3-5% slower.

Contributor

@tgturner tgturner left a comment

  • Would love to see an answer about removing the public `present_cells` method from the sheet.

```diff
@@ -81,8 +82,8 @@ def cell_value_type(type, format)
 # # => <Excelx::Cell::String>
 #
 # Returns a type of <Excelx::Cell>.
-def cell_from_xml(cell_xml, hyperlink)
-coordinate = ::Roo::Utils.extract_coordinate(cell_xml[COMMON_STRINGS[:r]])
+def cell_from_xml(cell_xml, hyperlink, coordinate=nil)
```
Contributor

Missing spaces around `=` (`coordinate = nil`).
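The signature in this hunk lets a caller pass a coordinate it has already extracted. A minimal sketch of that optional-argument pattern, written with the spacing requested here, might look like the following. `cell_from_ref` and its inline parsing are hypothetical, and the `||=` fallback is an assumption about how a missing coordinate gets filled in, not a copy of roo's code.

```ruby
# Hypothetical illustration of an optional, precomputed argument.
# The caller may pass a coordinate it already knows; otherwise it is
# derived from the A1-style reference (parsing kept deliberately simple).
def cell_from_ref(ref, value, coordinate = nil)
  coordinate ||= begin
    letters = ref[/\A[A-Z]+/]
    column = letters.each_char.reduce(0) { |acc, ch| acc * 26 + (ch.ord - 64) }
    [ref[letters.length..-1].to_i, column]
  end
  { coordinate: coordinate, value: value }
end

p cell_from_ref('C3', 'x')          # coordinate derived from "C3"
p cell_from_ref('C3', 'x', [3, 3])  # coordinate supplied, parsing skipped
```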

```ruby
first_row = last_row = first_col = last_col = nil

cells.each do |(row, col), cell|
  if cell && !cell.empty?
```
Contributor

Could be `next unless cell && !cell.empty?` instead of wrapping the whole block in an `if` statement.
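Applied to the loop above, the suggestion turns the wrapping `if` into a guard clause. The following self-contained sketch computes the first/last row and column in a single pass, which is the goal of the refactor described in the PR, without building an intermediate hash of present cells. It assumes `cells` is a hash keyed by `[row, col]`, as the block parameters suggest; the `Cell` struct is a hypothetical stand-in for roo's cell classes.

```ruby
# Hypothetical stand-in for a cell object: anything responding to #empty?.
Cell = Struct.new(:value) do
  def empty?
    value.nil? || value == ''
  end
end

# Returns [first_row, last_row, first_col, last_col] from one pass over a
# hash of [row, col] => cell. All four are nil when no cell has content.
def bounds(cells)
  first_row = last_row = first_col = last_col = nil

  cells.each do |(row, col), cell|
    next unless cell && !cell.empty?

    first_row = row if first_row.nil? || row < first_row
    last_row  = row if last_row.nil?  || row > last_row
    first_col = col if first_col.nil? || col < first_col
    last_col  = col if last_col.nil?  || col > last_col
  end

  [first_row, last_row, first_col, last_col]
end

cells = {
  [1, 1] => Cell.new(''),  # empty, skipped
  [2, 3] => Cell.new('a'),
  [5, 2] => Cell.new('b'),
  [4, 6] => nil            # missing cell, skipped
}
p bounds(cells) # => [2, 5, 2, 3]
```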

```diff
@@ -22,10 +22,6 @@ def cells
 @cells ||= @sheet.cells(@rels)
 end
 
-def present_cells
```
Contributor

This is a public method, so maybe leave it in if people are using it? Or deprecate it? Otherwise the next release will need to be a breaking release, which is probably worth calling out in the release notes.
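If the method is kept for compatibility, one common Ruby approach is to leave it in place and emit a warning through `Kernel#warn`. This is a minimal sketch under that assumption: the `Sheet` class, its constructor, and the message text are hypothetical, and the 2.8.0 changelog only records that `Roo::Excelx::Sheet#present_cells` was deprecated, not how.

```ruby
# Hypothetical sheet class (NOT roo's Roo::Excelx::Sheet) showing a public
# method that is deprecated rather than removed.
class Sheet
  def initialize(cells)
    @cells = cells
  end

  attr_reader :cells

  # Deprecated: still returns the non-empty cells, but writes a warning
  # to $stderr on every call.
  def present_cells
    warn '[DEPRECATION] `present_cells` is deprecated and will be removed in a future release.'
    cells.select { |_coordinate, cell| cell && !cell.empty? }
  end
end

sheet = Sheet.new({ [1, 1] => 'a', [1, 2] => '', [2, 1] => nil })
p sheet.present_cells # warns, then returns only the [1, 1] entry
```

Keeping the old behavior while warning gives users one release to migrate before the method can be dropped.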

@tgturner tgturner self-assigned this Sep 16, 2018
Add presence method in Roo::Excelx::Cell::*

Some cosmetic fixup
@chopraanmol1 chopraanmol1 force-pushed the remove_unnecessary_cache_for_excelx branch from c9413e4 to a2853fe on September 17, 2018 07:04
Contributor

@tgturner tgturner left a comment

Looks great, except the deprecation warning isn't indented correctly.

lib/roo/excelx/sheet.rb (resolved)
@tgturner tgturner merged commit ced9c5c into roo-rb:master Sep 17, 2018
netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this pull request Jan 20, 2019
pkgsrc change: add "USE_LANGUAGES= # none".

##  [2.8.0] 2019-01-18
### Fixed
- Fixed inconsistent column length for CSV [375](roo-rb/roo#375)
- Fixed formatted_value with `%` for Excelx [416](roo-rb/roo#416)
- Improved memory consumption and performance [434](roo-rb/roo#434) [449](roo-rb/roo#449) [454](roo-rb/roo#454) [456](roo-rb/roo#456) [458](roo-rb/roo#458) [462](roo-rb/roo#462) [466](roo-rb/roo#466)
- Accept both Transitional and Strict Type for Excelx's worksheets [441](roo-rb/roo#441)
- Fixed ruby warnings [442](roo-rb/roo#442) [476](roo-rb/roo#476)
- Restore support for URL as file identifier for CSV [462](roo-rb/roo#462)
- Fixed missing location for Excelx's links [482](roo-rb/roo#482)

### Changed / Added
- Drop support for ruby 2.2.x and lower
- Updated rubyzip version to fix a security issue; the minimum version is now 1.2.1
- Roo::Excelx::Coordinate now inherits Array [458](roo-rb/roo#458)
- Improved Roo::HeaderRowNotFoundError exception's message [461](roo-rb/roo#461)
- Added `empty_cell` option, which by default disables allocation of Roo::Excelx::Cell::Empty [464](roo-rb/roo#464)
- Added support for variable number of decimals for Excelx's formatted_value [387](roo-rb/roo#387)
- Added `disable_html_injection` option to disable html injection for shared string in `Roo::Excelx` [392](roo-rb/roo#392)
- Added image extraction for Excelx [414](roo-rb/roo#414) [397](roo-rb/roo#397)
- Added support for `1e6` as scientific notation for Excelx [433](roo-rb/roo#433)
- Added support for Integer as 0-based index for Excelx's `sheet_for` [455](roo-rb/roo#455)
- Extended `no_hyperlinks` option for non streaming Excelx methods [459](roo-rb/roo#459)
- Added `empty_cell` option to disable Roo::Excelx::Cell::Empty allocation for Excelx [464](roo-rb/roo#464)
- Added support for Integer with leading zero for Roo::Excelx [479](roo-rb/roo#479)
- Refactored Excelx code [453](roo-rb/roo#453) [477](roo-rb/roo#477) [483](roo-rb/roo#483) [484](roo-rb/roo#484)

### Deprecations
- Roo::Excelx::Sheet#present_cells is deprecated [454](roo-rb/roo#454)
- Roo::Utils.split_coordinate is deprecated [458](roo-rb/roo#458)
- Roo::Excelx::Cell::Base#link is deprecated [457](roo-rb/roo#457)
aravindm pushed a commit to chobiwa/roo that referenced this pull request Jun 18, 2019
* Remove unnecessary cache hash

1. After roo-rb#434 we no longer need to cache Roo::Utils.extract_coordinate

2. Roo::Excelx::Sheet#present_cells is only used to calculate first/last row/column. Refactored code to avoid creation of present_cells hash.