Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dynamic repeater can break colorspace #473

Closed
dmarkow opened this issue Apr 10, 2013 · 30 comments
Closed

Dynamic repeater can break colorspace #473

dmarkow opened this issue Apr 10, 2013 · 30 comments

Comments

@dmarkow
Copy link
Contributor

dmarkow commented Apr 10, 2013

require 'bundler'
Bundler.require

Prawn::Document.generate("test.pdf") do
  repeat :all, dynamic: true do
    text "Page #{page_number}", size: 24, style: :bold
  end

  text "Test"

  # start_new_page
  table([['Test', 'Test']])

  start_new_page
  text "Test"
end

Using Adobe Acrobat or Reader as my viewer (both on OS X and Windows), with the code above, the "Page #" repeater only shows up on the first page. The second page still outputs the "Test" text, but there is no "Page 2" at the top.

If I uncomment the start_new_page call before the table call, then all 3 pages get "Page 1", "Page 2", "Page 3".

OS X's Preview.app works fine, it's only Acrobat that has the problem (it seems common that Preview is a little more forgiving with PDF files?).

For now my workaround is to just not use dynamic repeaters (meaning I can't refer to the page count in them).

@nicdal
Copy link

nicdal commented Apr 30, 2013

+1

@jlduran
Copy link

jlduran commented May 25, 2013

Prawn::Document.generate("test.pdf") do
  text "Test"

  # start_new_page
  table([['Test', 'Test']])

  start_new_page
  text "Test"

  number_pages "Page <page>", size: 24, style: :bold                                             
end

@dmarkow
Copy link
Contributor Author

dmarkow commented May 25, 2013

@jlduran My example is simplified; in my use case, I can't use number_pages because I'm doing additional formatting (I'm actually using a table in my repeater) and don't want the number in the exact place number_pages will put it.

@dmarkow
Copy link
Contributor Author

dmarkow commented Jan 13, 2014

Is there a reason this was closed? It hasn't had any updates, but it is still an issue. (I just tested it on the master branch and it's still exhibiting the same problem).

@practicingruby
Copy link
Member

Hi @dmarkow, sorry for the confusion on this.

In November we did a bulk close on any ticket that did not see recent activity, because we had something like 100+ open tickets that had never been properly addressed by our core team or the submitter had gone silent (some as old as 2-3 years!). There's a mailing list post about that here: https://groups.google.com/forum/#!topic/prawn-ruby/czGB6L4HZhQ

The process for re-opening a ticket was simply leaving a comment and making sure there is a reproducing example, and this ticket meets both of those criteria. I've confirmed the bug still exists in Adobe Reader on master, so we'll look into it eventually. If you want to investigate and contribute a fix or some research, we're actively monitoring the tracker now, and a patch is absolutely welcome!

@dmarkow
Copy link
Contributor Author

dmarkow commented Jan 14, 2014

Thanks! I did a comparison with and without the table and looked at the generated PDF files. I then manually removed the chunk relating to the table from the file with a table. After doing so, there was one difference between the files.

With a table on Page 1, the repeater code looks like this:

q
/DeviceRGB cs
0.000 0.000 0.000 scn
0.000 0.000 0.000 SCN
1 w
0 J
0 j
[ ] 0 d

BT
36 127.384 Td
/F1.0 12 Tf
[<50> 40 <6167652032>] TJ
ET

Q

And without a table, it looks like this:

q
/DeviceRGB cs
0.000 0.000 0.000 scn
/DeviceRGB CS
0.000 0.000 0.000 SCN
1 w
0 J
0 j
[ ] 0 d

BT
36 127.384 Td
/F1.0 12 Tf
[<50> 40 <6167652032>] TJ
ET

Q

The file with a table appears to be missing a /DeviceRGB CS near the beginning. When I manually add it, the file shows the table and the repeater perfectly. This is the first time I've ever looked at the source of a PDF file, so I'm not sure exactly what these codes are doing, but I'll try and dig into the prawn source code to see if I can figure it out.

@practicingruby
Copy link
Member

Oh, that makes sense!

It looks like the colorspace isn't being set for the stroke color when table is used, even though the fill color is being specified. Here's what the code is doing.

Set non-stroke colorspace to RGB

/DeviceRGB cs

Set nonstroke color to black

0.000 0.000 0.000 scn

Set stroke colorspace to RGB

/DeviceRGB CS

Set stroke color to black

0.000 0.000 0.000 SCN

It makes sense to me that missing the colorspace definition would lead to corruption.

@practicingruby
Copy link
Member

And if you look at the lib/prawn/repeaters.rb file, it calls save_graphics_state. My guess is that some interaction between dynamic repeater manipulation of the graphics state stack and whatever table is doing under the hood with graphics state is to blame. Please do take a closer look!

@dmarkow
Copy link
Contributor Author

dmarkow commented Jan 14, 2014

Dug in a little more. Part of the table code includes a draw_borders(pt) method. If I clear out this method, everything works fine. I started removing a bit of code at a time and it looks like even this super simple code causes the issue:

class Table
  class Cell
    def draw_borders(pt)
      @pdf.mask(:line_width, :stroke_color) do
      end
    end
  end
end

If I remove stroke_color from the list of values to mask, it works fine. Digging into the mask method, I threw some debug statements in before/after the yield to see if stroke_color is getting properly reset, which it is:

class Document
  def mask(*fields) # :nodoc:
    # Stores the current state of the named attributes, executes the block, and
    # then restores the original values after the block has executed.
    # -- I will remove the nodoc if/when this feature is a little less hacky

    puts "stroke_color before: #{stroke_color}" # Outputs 000000
    stored = {}
    fields.each { |f| stored[f] = send(f) }
    yield
    fields.each { |f| send("#{f}=", stored[f]) }
    puts "stroke_color after: #{stroke_color}" # Outputs 000000
  end
end

The only thing I can think of is that maybe there's something about the way sending stroke_color= works that's causing this? I'll keep digging...

@practicingruby
Copy link
Member

stroke_color= is just an alias for stroke_color(color) so I don't think that would cause problems. But keep looking! You may want to look at the calls to set_color_space (defined in prawn/graphics/color.rb` to see the difference between how it's being called in the different cases.. that's where the colorspace is actually set.

@dmarkow
Copy link
Contributor Author

dmarkow commented Jan 14, 2014

Ok, so here's the code within set_color that is being called.

set_color_space type, color_space(color)
color = color_to_s(color)
write_color(color, operator)

The problem is within set_color_space:

return if current_color_space(type) == color_space && !state.page.in_stamp_stream?

The stroke colorspace for DeviceRGB is already set, so it's not setting it again. However, the write_color line still executes anyway. So this is why we have the missing DeviceRGB line, but still have the 0.000 0.000 0.000 SCN line being written anyway.

Coming back to my earlier comparison, the same way adding the /DeviceRGB CS line fixed the problem, just removing the 0.000 0.000 0.000 SCN line also fixes the problem.

So at this point I'm not sure which is best: Adding a conditional to write_color too to make sure it doesn't write the same stroke color that's already set, or removing the conditional from set_color_space?

@practicingruby
Copy link
Member

Sounds like you nailed it! Next step is to write a test to reproduce the bad behavior.

You should be able to use the tests in https://github.com/prawnpdf/prawn/blob/master/spec/graphics_spec.rb as a guide.

@practicingruby
Copy link
Member

@dmarkow: Here are some specific lines that should be useful for testing examples:

https://github.com/prawnpdf/prawn/blob/master/spec/graphics_spec.rb#L233-L281

@dmarkow
Copy link
Contributor Author

dmarkow commented Jan 14, 2014

Yeah, I can get a spec written up, my only question is: what's the "correct" output?

/DeviceRGB cs
0.000 0.000 0.000 scn
/DeviceRGB CS
0.000 0.000 0.000 SCN

or

/DeviceRGB cs
0.000 0.000 0.000 scn

I would assume it would be best to err on the side of caution and explicitly set the colorspace again even if it's already set, but I'm not super familiar with the inner workings of the PDF format.

@practicingruby
Copy link
Member

Oh, those are not repeated statements, the capitalization matters. the lowercase ones (cs/scn) are setting the fill (non-stroke) colorspace + color, the uppercase (CS/SCN) are setting the stroke colorspace + color.

If both the fill color and stroke color have been set, I'd expect to see the first example, if only the fill color has been set, I'd expect to see the second example.

For details on this, see page 240 of the PDF standard:
http://wwwimages.adobe.com/www.adobe.com/content/dam/Adobe/en/devnet/pdf/pdfs/pdf_reference_1-7.pdf

@dmarkow
Copy link
Contributor Author

dmarkow commented Jan 14, 2014

I understand the case difference. I'm just confused why this (the "bad" output from the original issue) doesn't work:

/DeviceRGB cs
0.000 0.000 0.000 scn
0.000 0.000 0.000 SCN

Currently, prawn skips the /DeviceRGB CS line because it thinks the current stroke colorspace is already RGB. So shouldn't the 0.000 0.000 0.000 SCN line, setting the stroke color to black, work fine?

I'm inclined to remove the conditional and allow setting /DeviceRGB CS even if it's already set, but I don't know what the side-effects of that would be (I tried removing it and all the specs still pass...)

@practicingruby
Copy link
Member

It looks like the problem is on page 2, not page 1. Yes, the stroke colorspace is showing up on the first page (it's object 5 0, which uses the stream from object 4 0). But the stream in 4 0 resets the graphics stack to the original state as its last action (see the Q Q calls at the end)

So by the time we get to rendering the second page (8 0), the colorspace is NOT set, but there are calls to SCN (see stream 7 0). I'm not sure what's causing Prawn to get out of sync with the PDF, but that may be the source of the problem.

@practicingruby
Copy link
Member

Note, all of the above are from your original example at the top of this issue.

@practicingruby
Copy link
Member

We also have a test that says the colorspace should be set on every page to please fussy readers whenever the colors have been set, but it's clear from the output that's not happening in this case.

Here's the test:
https://github.com/prawnpdf/prawn/blob/master/spec/graphics_spec.rb#L272-L281

@dmarkow
Copy link
Contributor Author

dmarkow commented Jan 14, 2014

I think this has to do with the way the repeater runs. After all the pages are generated, it goes back and runs the repeater on each page.

I threw a puts line in the run method for the repeater so we can see what the graphic state is for each page's repeater. When I have a table on the first page, the graphics state already has the stroke color space set to DeviceRGB

#<PDF::Core::GraphicState:0x007f9eef963170 @color_space={:stroke=>:DeviceRGB}, @fill_color="000000", @stroke_color="000000", @dash={:dash=>nil, :space=>nil, :phase=>0}, @cap_style=:butt, @join_style=:miter, @line_width=1>
#<PDF::Core::GraphicState:0x007f9eeee0ba70 @color_space={:stroke=>:DeviceRGB}, @fill_color="000000", @stroke_color="000000", @dash={:dash=>nil, :space=>nil, :phase=>0}, @cap_style=:butt, @join_style=:miter, @line_width=1>

When I add a start_new_page to push the table to the second page, though, it clears out the stroke color space, causing it to be re-defined on each page, which is why it displays properly:

#<PDF::Core::GraphicState:0x007fdc1b7202a0 @color_space={}, @fill_color="000000", @stroke_color="000000", @dash={:dash=>nil, :space=>nil, :phase=>0}, @cap_style=:butt, @join_style=:miter, @line_width=1>
#<PDF::Core::GraphicState:0x007fdc1bf9e008 @color_space={}, @fill_color="000000", @stroke_color="000000", @dash={:dash=>nil, :space=>nil, :phase=>0}, @cap_style=:butt, @join_style=:miter, @line_width=1>
#<PDF::Core::GraphicState:0x007fdc1bf7a2c0 @color_space={}, @fill_color="000000", @stroke_color="000000", @dash={:dash=>nil, :space=>nil, :phase=>0}, @cap_style=:butt, @join_style=:miter, @line_width=1>

There must be something telling the repeater to always use the first page's graphic state, even if we're running on a different page?

@practicingruby
Copy link
Member

To clarify a bit, you don't necessarily want to save the graphics state of the first page, what you want to do is create a complete snapshot of the graphics state wherever the repeater is defined, fully restore it before running the repeater code, and then fully revert to whatever the graphics state was on a given page after the repeater code runs.

In other words, repeaters should not assume anything about the page they're being run on, they should push a new state onto the graphics stack, replay the settings from wherever they were defined, then pop the stack to ensure that none of their settings affect the rest of the page. Right now I think Prawn is trying to do that, but corrupting things in the process due to a mismatch between Prawn's internal state and what's being written to the PDF.

I'd look at the following methods for further investigate:

freeze_stamp_graphics: https://github.com/prawnpdf/prawn/blob/master/lib/prawn/stamp.rb#L125-L131
save_graphics_state: https://github.com/prawnpdf/prawn/blob/master/lib/prawn/document/graphics_state.rb#L42-L49
Repeater#run: https://github.com/prawnpdf/prawn/blob/master/lib/prawn/repeater.rb#L111-L116

Sorry for the spaghetti code. Once we narrow down the source of the bug and get a test written, I'll probably refactor this somewhat, because it clearly needs it.

@dmarkow
Copy link
Contributor Author

dmarkow commented Jan 14, 2014

Thanks for pointing me in the right direction. I think (hope) I'm making some progress.

In my original example, moving the repeater code to the bottom of the generate block makes it work perfectly fine for both Page 1/2. Looking at Repeater#initialize, it looks like @graphic_state is set to whatever the current page's graphic state is when the repeater is declared. Since my second page doesn't have a table on it, the repeater state for that page doesn't have the stroke colorspace set. Since that's now the page where I define the repeater, it starts with a "clean" graphic state.

It looks like when the repeater runs, it executes @document.save_graphics_state(@graphic_state) for each page. save_graphics_state(graphic_state) calls save_graphic_state(previous_state), which then initializes a new GraphicState object based on previous_state which is set to that @graphic_state variable. So now the new "clean" state has inherited the color_space parameter from @graphic_state.

So whatever state was used to initialize @graphic_state is used for every page, rather than the actual current page's state (which is why having a table-free Page 1 allowed for a relatively "clean" graphic state for my repeater).

It looks like, in the run method, @document.graphic_state does correctly refers to the current page's state for each iteration. So could we possibly avoid using the parameter altogether, causing it to push a new graphic state onto the stack based on the current page's state rather than Page 1's state?

@document.save_graphics_state do
  @document.send(:freeze_stamp_graphics)
  @block.call
end

@practicingruby
Copy link
Member

@dmarkow For the most part your understanding of what's going on reflects my basic understanding of the code as well. The problem with using whatever the current page's graphic state is would be that you wouldn't have consistent behavior for your stamp across pages.

For example, if you have a page that sets the fill color to red but does not set it back to black before crossing a page boundary, do you really want your repeated text to be in red on that page, and only on that page? I can't imagine that's what people would want.

If we wanted to simplify the semantics of the repeaters, the decent way may indeed be to call save_graphics_state with no parameter, but before running block.call, execute a sort of "reset" which would put all the graphics states back to their default, as if you were starting with a fresh document. This would mean that the repeaters would not inherit the graphic state of the page they were defined on, and would essentially have their own graphics state that would be consistent across all the pages.

Once we are able to implement a clear state, we'd also be able to create a "restore the graphics state from the page the repeater was defined on", so it's just a matter of what behavior feels more natural.

But I definitely think we should avoid having a variable graphics state from page to page... content generated by a repeater is meant to look the same on every page, or at least start from the same base state on every page. That's what omitting the argument to save_graphic_state would do unless we also wrote some "reset" code to be run before the block was called on each page.

@dmarkow
Copy link
Contributor Author

dmarkow commented Jan 14, 2014

We do have this spec, which says that we should be using the graphic state in place when we created the repeater.

    context "dynamic repeaters" do
      it "should preserve the graphic state at creation time" do

Looking at some other specs, I came across Document with a stamp should save the color space even when same as current page color space. So it seems that stamps had the same issue with using the stamp on a page with a different color_space, and a solution was added to force the color space to be set if it was being set from a stamp:

def set_color_space(type, color_space)
  # don't set the same color space again
  return if current_color_space(type) == color_space && !state.page.in_stamp_stream?

I think this is why non-dynamic repeaters don't exhibit the same issue: non-dynamic repeaters are just stamps, whereas dynamic repeaters are a little more complex.

@dmarkow
Copy link
Contributor Author

dmarkow commented Jan 14, 2014

Now, the expected behavior may be "the repeater should use the exact state in place when it was defined", but it actually doesn't appear to be doing that. I'm defining my table after defining my repeater, but it's being affected by the stroke color_space set by the table anyway.

I looked at the Repeater class's initializer and noticed that it dup's the current graphic_state, but it does not dup the color_space. Ruby just points the new @graphic_state variable's color_space to the same hash as the source, so if we change the color_space after defining the repeater, it will actually affect the "frozen" repeater. This is why my defining a table added the stroke color_space to my "frozen" repeater.

By utilizing GraphicState.new, which properly dups the hashes, my problem is fixed:

class Repeater #:nodoc:
  ...
  def initialize(document, page_filter, dynamic = false, &block)
    ...
    @graphic_state = PDF::Core::GraphicState.new(document.state.page.graphic_state)

I don't know how I feel about that method call, but it works (a better way might be to add a copy method to GraphicState, or override initialize_copy to properly dup the hashes, though that would only be compatible with ruby 1.9 and newer).

That being said, if made a table and then defined my repeater, I'd still have the same issue all over again, even with the added line above.


I agree that repeaters should not use the current page's state. However, I also don't think they should inherit anything when they are defined. Rather, they should be completely separate entities, starting with a perfectly clear state (e.g. black fill/stroke, empty color space, etc.).

I think inheriting graphic state attributes that were defined before the repeater is misleading. And by inheriting the current graphic state, moving the repeater code to different parts of the generate block can have a huge effect on the appearance of the repeater, whereas I would expect it to look the same regardless of where I defined it.

Right now, the only way to get the repeater to behave completely independent of the rest of the document's graphic state is to define the repeater before you do anything else (at least once the dup bug I mentioned above is fixed).

@practicingruby
Copy link
Member

Thanks a lot for the further research and pull request, I'll take a closer look at this in the next couple days.

@dmarkow
Copy link
Contributor Author

dmarkow commented Jan 14, 2014

Thanks a ton for all your help. No rush, my branch is fixing the problem for me for the time being.

@practicingruby
Copy link
Member

Note to self for future investigation: It looks like if we push newly initialized graphics state object rather than the graphics state of whatever page the repeater is defined on, that would give us a blank slate to work with. Unsure whether we cache those values elsewhere in document though.

@practicingruby practicingruby changed the title Dynamic repeater broken when first page has a table Dynamic repeater can break colorspace Jul 18, 2014
@dmarkow
Copy link
Contributor Author

dmarkow commented Oct 6, 2014

It's been a while since I looked at this, but I'm pretty sure it was resolved by #635, so I'm gonna close it. Thanks!

@dmarkow dmarkow closed this as completed Oct 6, 2014
@practicingruby
Copy link
Member

Thanks, please re-open if you run into the issue again.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

No branches or pull requests

4 participants