.NET 5 - Printing - XPS requires STA Threads #4000

Open
wstaelens opened this issue Dec 17, 2020 · 21 comments

Comments

@wstaelens
Contributor

Issue Title

Printing in .NET 5 requires XPS but this requires STA Threads.
Why is this needed?

Is it possible to render or process XPS in separate threads instead of STA?

This is a serious limitation, XPS printing is slow as heck.

@wstaelens
Contributor Author

any updates on multithreading and printing XPS?

@scalablecory

In Framework you were able to create your own STA threads and do multi-threaded XPS rendering. I don't know if this is still the case in .NET 5.

@scalablecory scalablecory transferred this issue from dotnet/core Jan 13, 2021
@SamBent
Contributor

SamBent commented Jan 15, 2021

Printing is STA because it uses resources that are STA (packages, print queue, etc). If you're printing visual content, you obviously need to be on the (STA) UI thread that allows access to the content.

.NET 5 supports multiple STA threads, exactly the same as .NET Fx. Use this to get "multi-threaded printing", in the sense of several independent print tasks each running on its own thread. If you're asking about several threads cooperating on a single print task, that's probably never going to happen - printing is inherently sequential.
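
A minimal sketch of the "several independent print tasks, each on its own STA thread" arrangement, for illustration only. `StaPrintTasks` and `RunOnStaThreads` are hypothetical names, not part of any library. The sketch assumes every job is fully independent: it opens its own document and print queue on its own thread and shares no WPF objects with other threads. `SetApartmentState` is Windows-only.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;

public static class StaPrintTasks
{
    // Runs every job on its own dedicated STA thread and blocks until all finish.
    // Each job must create and use its own WPF/XPS objects; nothing is shared.
    public static void RunOnStaThreads(IEnumerable<Action> jobs)
    {
        var errors = new ConcurrentQueue<Exception>();
        var threads = new List<Thread>();

        foreach (Action job in jobs)
        {
            var thread = new Thread(() =>
            {
                try { job(); }
                catch (Exception ex) { errors.Enqueue(ex); }
            });
            thread.SetApartmentState(ApartmentState.STA); // must be set before Start()
            thread.IsBackground = true;
            thread.Start();
            threads.Add(thread);
        }

        foreach (Thread thread in threads)
        {
            thread.Join();
        }

        // Surface failures from the worker threads to the caller.
        if (!errors.IsEmpty)
        {
            throw new AggregateException(errors);
        }
    }
}
```

A job here would be something like `() => PrintOneFile(path)`, where `PrintOneFile` is a hypothetical method that does all of its work on the calling thread.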

@wstaelens
Contributor Author

accepting a print job from a V4 driver plugin really has some issues.

First of all, see "XPS documents in .NET Core can't be opened" ( #3546 )

Next one of the biggest issues is the performance of XPS.

Take, for example, a 3000-page document being printed to a V4 driver. Because of the very annoying STA requirement, it takes ages to render the pages sequentially. We can't render the pages in parallel (if that is possible in C#, feel free to explain how). In other words, other code and logic that works on individual pages can't run in parallel either and is slow because everything has to happen sequentially. Eventually we run out of memory, as we can't hold all the rendered pages for some of the actions we are performing.

The performance issues can easily be reproduced with Microsoft's own XPS Viewer and the Microsoft XPS Document Writer printer. When we open the original PDF (3 MB) and print it to the Microsoft XPS Document Writer printer as an .xps, it takes ages to print. Once it has been printed, the .xps file has grown to 50 MB. Opening the .xps in Microsoft XPS Viewer and searching for a word (which exists e.g. on page 2668) literally takes ages, as it processes the document sequentially. Sumatra finds the word in about 50 seconds; XPS Viewer takes ±6 minutes. (For comparison: Foxit Reader finds it in the original PDF in 25 seconds.)

I can't share this big file (confidential) but just take some pdf files, ebooks in pdf, with a lot of pages and print them. (or print and capture the XPS print jobs with a render filter to catch the xps on the microsoft generic V4 driver.)

Can these XPS printing issues please be tackled or prioritized?

.NET SDK 5.0.202
.NET runtime 5.0.5
Windows 10 20H2 (19042.928)
Windows Server 2019 1809 (17763.1879)

@miloush
Contributor

miloush commented May 5, 2021

As it has been mentioned, you can create multiple STA threads, could you not use it to render pages or sections in parallel and then combine them into one document?
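
To make the suggestion concrete, here is a rough sketch of splitting a page range across several STA threads. `PageRangeSplitter`, `Split` and `RenderInParallel` are hypothetical names, and `renderRange` is a caller-supplied delegate that is assumed to open its own copy of the document and render only the pages it is given. Recombining the partial results into one document is not shown.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public static class PageRangeSplitter
{
    // Splits [0, pageCount) into at most workerCount contiguous ranges of near-equal size.
    public static List<(int Start, int Count)> Split(int pageCount, int workerCount)
    {
        if (pageCount < 0) throw new ArgumentOutOfRangeException(nameof(pageCount));
        if (workerCount <= 0) throw new ArgumentOutOfRangeException(nameof(workerCount));

        var ranges = new List<(int Start, int Count)>();
        int baseSize = pageCount / workerCount;
        int remainder = pageCount % workerCount;
        int start = 0;
        for (int i = 0; i < workerCount; i++)
        {
            // The first 'remainder' ranges take one extra page each.
            int count = baseSize + (i < remainder ? 1 : 0);
            if (count == 0) break;
            ranges.Add((start, count));
            start += count;
        }
        return ranges;
    }

    // Calls renderRange(start, count) once per range, each call on its own STA thread.
    public static void RenderInParallel(int pageCount, int workerCount, Action<int, int> renderRange)
    {
        var threads = new List<Thread>();
        foreach ((int start, int count) in Split(pageCount, workerCount))
        {
            var thread = new Thread(() => renderRange(start, count));
            thread.SetApartmentState(ApartmentState.STA); // Windows only; set before Start()
            thread.Start();
            threads.Add(thread);
        }
        threads.ForEach(t => t.Join());
    }
}
```

Whether this helps depends on where the time goes: it only pays off if rendering dominates and the merge step is cheap.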

@wstaelens
Contributor Author

@miloush I don't see how this can be faster unless you have 1000+ pages, and even then, opening the document several times and combining the results again will eat into your performance gain...

@wstaelens
Contributor Author

@miloush also, XPS is STA. An XPS document can NEVER be attached to multiple threads. Because of STA, processing and manipulating large documents is really 🐌

@miloush
Contributor

miloush commented May 23, 2021

@wstaelens your example had 3000 pages. It might help if rendering the elements is what takes the most time.

You asked for multithreading, we said you can have multiple STA threads. If printing itself is the bottleneck, then that's a different issue - and if you can reproduce it by printing a PDF to an XPS printer, I am not entirely sure this is within the scope of WPF.

What is the nature of the document? Is it UI elements, plain text, or a photo per page? A repro would be helpful only if it was random content.

@wstaelens
Contributor Author

@miloush see: #6301

The situation:

PDF files dumped in hotfolder 
--> opened by our code and converted to XPS 
--> modifications to XPS file (e.g. adding barcode) 
--> converting back to PDF the single (or merged) modified XPS documents.
--> dumping the resulting PDF files to a folder

Thanks to STA it is slow, consumes a lot of memory and we are not even printing in this scenario...

(I keep on wondering if the XPS format itself is actually being used internally at Microsoft.. No wonder this couldn't replace PDF... Even XPS documents generated by the MS print driver can't be opened. Guess we'll switch back to GDI and metafiles 😞 )

@wstaelens
Contributor Author

any updates?

@wstaelens
Contributor Author

Update:

Jan 04, 2023:
"Your suggestion has been queued up for prioritization. Feature suggestions are prioritized based on the value to our broader developer community and the product roadmap. We may not be able to pursue this one immediately, but we will continue to monitor it for community input"

Please add votes on: https://developercommunity.visualstudio.com/t/Improve-XPS-printing-STA/690912

@edwardneal

We've been looking at opening XPS documents in .NET Core over in dotnet/runtime#51929 and this point came up. For the sake of full disclosure: I've got next to no experience of actually using WPF. It seems like a pretty concrete problem which fits what I know about Windows Forms, but I'd appreciate any correction or extra insight from someone with a deeper knowledge of the area.

I think the requirement for STA threads is intrinsic to part of the design of XpsDocument, unfortunately. XPS effectively wraps a subset of XAML, and GetFixedDocumentSequence calls XamlReader.Load to generate a FixedDocumentSequence object. The moment it does that, it's creating a WPF UI element, which directly generates the issue:

  1. WPF UI elements require STA threads
  2. Each UI element can only be modified by one thread at a time

This seems to correlate with Windows Forms' behaviour, but there might well be WPF-specific wrinkles to this.

The FixedDocumentSequenceReader property and AddFixedDocumentSequence method might be safe to call from an MTA thread - at a quick glance, these seem to just be manipulating XML. These aren't terribly helpful on their own, but if you know the structure of the documents you receive then you might be able to get the underlying XpsResource, call GetStream and start modifying XML. It's very ugly though. Another alternative might be to use the Package APIs (which I don't think have the STA requirement) directly.
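
As a rough illustration of the Package route, the sketch below opens an XPS file with `System.IO.Packaging`, finds the page parts and reads their markup as plain XML, without constructing any WPF objects. `XpsPackageInspector` and `CountGlyphRuns` are hypothetical names. Identifying pages by the `.fpage` extension is an assumption based on how XPS files are commonly laid out; a robust version would follow the package relationships instead. I have not verified that this is safe off an STA thread.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Packaging;
using System.Linq;
using System.Xml.Linq;

public static class XpsPackageInspector
{
    // Returns (part URI, number of <Glyphs> elements) for every part whose
    // name ends in ".fpage". The markup is parsed as XML only.
    public static List<(string PartUri, int GlyphRuns)> CountGlyphRuns(string xpsPath)
    {
        var result = new List<(string PartUri, int GlyphRuns)>();

        using Package package = Package.Open(xpsPath, FileMode.Open, FileAccess.Read);
        foreach (PackagePart part in package.GetParts())
        {
            string uri = part.Uri.ToString();
            if (!uri.EndsWith(".fpage", StringComparison.OrdinalIgnoreCase))
            {
                continue; // fonts, images, relationships, document structure...
            }

            using Stream stream = part.GetStream(FileMode.Open, FileAccess.Read);
            XDocument page = XDocument.Load(stream);

            // Match on local name so the XPS and OpenXPS namespaces both work.
            int glyphs = page.Descendants().Count(e => e.Name.LocalName == "Glyphs");
            result.Add((uri, glyphs));
        }

        return result;
    }
}
```

Modifying a page would follow the same pattern with the package opened for read/write, but at that point you are hand-editing markup, which is the ugly part.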

What's missing is something similar to the dotnet/Open-XML-SDK repository: a type-safe way to modify parts of the package. I'm afraid I've not got an answer for that - anyone who picked that up would be reinventing parts of XAML parsing, and this might extend to parts of its layout engine (I'm not sure whether or not text in an XPS document would re-flow around an image, for example. If that happens, there are knock-on effects on inter-page text overflowing.)

I think there's probably no easy answer to the issue, but hopefully this helps to at least identify the underlying problem. Avoiding STA threads usually needs code execution to avoid using large parts of WPF's UI elements, and it looks like large parts of the XpsDocument API are based upon them.

@wstaelens
Contributor Author

@edwardneal yes we know that UI elements require STA threads and this is the showstopper and has been for years.
Working directly on the stream and manipulating the text itself is almost unworkable and not maintainable.

Just wishing somebody would tackle this once and for all. Yes, this would have a big impact. But it would make it possible to manipulate/render/... XPS files in much the same way as you work with PDF files, without the "ugly" and "dirty" PDF file format (I've seen so much garbage around). We could speed things up so much if we could have MTA for XPS.

As suggested by @miloush when running multiple STA threads, you always have a bottleneck when you need to merge/recombine the files into a single xps document. So this is also a no-go, it just moves the problem.

Everybody just thinks about "physical printers" and ignores the problem but think about virtual printers, xps viewers, internal reporting, print job manipulation/conversion (without physical output) etc...

@edwardneal

@wstaelens I agree - the idea behind XPS is good. An open, easily-parsable, declarative way to lay out a document is a great idea, and I've seen enough "interesting" PDF documents to know that they're not a good solution. I'm not a fan of the way that idea was implemented though - when the design of a system-level function like printing is tightly coupled to a subset of a UI framework, this sort of implementation issue becomes almost inevitable. More specifically to XpsDocument, I'm surprised that there's any way to directly obtain a FixedDocument instance: it feels quite risky to instantiate a XAML control hierarchy directly from a user-provided print job.

Given time, my personal preference would be to see something along these lines:

Initial cleanup, infrastructure and preparatory work

  • Lift the core Packaging functionality (signatures, thumbnails) out of XpsDocument
  • Lift the functionality of XpsDocument out of the core WPF repository to "somewhere else". Implement something capable of constructing a FixedDocumentSequence from an IXpsFixedDocumentSequenceReader and converting a FixedDocumentSequence via another interface (e.g. IFixedDocumentSequenceWriter), but make a clear point that XPS is a markup language which encompasses a subset of XAML - not "WPF on the printed page"
  • Ideally, turn System.Xaml into an independent package (so that it can be referenced from non-Windows environments, and to eliminate a possible dependency. This was originally requested in issue #46, "Publish System.Xaml as a separate nuget package".)

Some of this is optional, and a lot could be done piecemeal. It'd strip the Packaging functionality out of XpsDocument to force it to a single responsibility, and make it clear that XPS is a subset of XAML, not "WPF on the printed page". It'd also create a much smaller interface between WPF and XPS: IFixedDocumentSequenceReader/Writer.

Shift object model and processing methodology from WPF to XAML

  • Take the lifted XpsDocument and remove all direct references to WPF controls/visuals
  • Take advantage of the constraints in ECMA-388 to whitelist XAML elements as they're loaded
  • Shift away from XamlReader.Load, so we're not directly constructing an object graph of these visuals. Instead, consider using XamlXmlReader to process each FixedDocument in the FDS
  • The movement away from instantiating WPF components means that reading becomes thread-safe
  • Take advantage of the fact that content in XPS documents doesn't need to re-flow between pages, permitting multiple writers to the document (maybe restricting to one writer per page)

Some of this is just good practice (whitelisting specific elements in order to protect against malicious XPS payloads.) I don't know whether it'd be better to expose a simple set of nested datatype/KVP structures within each pair, or a strongly-typed object model. At some stage, this'd need to be rendered on a Print Preview page!

Key points are that this effectively means that reading and writing an individual page permits random access (they're separate parts in the underlying ZipPackage.) I'm not certain what reading and writing elements on a page should look like - XamlXmlReader is forward-only reading, and if text can reflow within a page then either an element within a page should be written with its dependency tree or a page should be written as an atomic unit.
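
To give a feel for the XamlXmlReader idea, here is a minimal sketch that walks a FixedPage's XAML node stream and reports any element that isn't on a whitelist. There is no XamlObjectWriter involved, so nothing gets instantiated. `FixedPageScanner`, `FindDisallowedElements` and the contents of `AllowedElements` are all hypothetical; the list is a deliberately incomplete subset for illustration and would really need to be derived from ECMA-388. I'm assuming, but haven't verified, that reading this way has no STA requirement.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xaml;
using System.Xml;

public static class FixedPageScanner
{
    // Illustrative subset only; a real list would be derived from ECMA-388.
    private static readonly HashSet<string> AllowedElements =
        new HashSet<string>(StringComparer.Ordinal)
        {
            "FixedPage", "Canvas", "Path", "Glyphs", "SolidColorBrush",
            "ImageBrush", "PathGeometry", "PathFigure", "PolyLineSegment",
            "MatrixTransform"
        };

    // Forward-only pass over the XAML node stream. Returns the names of
    // object elements that are not whitelisted.
    public static List<string> FindDisallowedElements(Stream fixedPageMarkup)
    {
        var disallowed = new List<string>();

        using XmlReader xml = XmlReader.Create(fixedPageMarkup);
        using var xaml = new XamlXmlReader(xml);

        while (xaml.Read())
        {
            // StartObject marks the start of an element; attribute values and
            // property elements arrive as separate Value/StartMember nodes.
            if (xaml.NodeType == XamlNodeType.StartObject &&
                !AllowedElements.Contains(xaml.Type.Name))
            {
                disallowed.Add(xaml.Type.Name);
            }
        }

        return disallowed;
    }
}
```

Because the reader is forward-only, this suits validation or a streaming transform; anything needing random access within a page would have to buffer the nodes first.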

Complete WPF decoupling

  • Either deprecate or redirect bindings to XpsDocument and all supporting classes aside from IXpsFixedDocumentSequenceReader (which might be renamed to eliminate the reference to XPS - I quite like the idea that a FixedDocumentSequence could come from any number of sources.)

Result

We'd definitely see architectural (and performance, and possibly security) improvements to XPS processing. The largest breaking change would be that we no longer provide clients with a fully-constructed object graph; the other pieces of work are mostly knock-on effects from that (e.g. no longer using XamlReader.Load means that the core read/write interface has been changed, which means that we need a new way to construct FixedDocumentSequences from it, and to make changes to it.)

The changes which I think would need API review and design approval are:

  • Lift-and-shift of signatures and thumbnails to Packaging (to deprecate on this end, and to add in dotnet/runtime.)
  • Moving XpsDocument away from "core WPF"
  • Rewriting read/write interface to XpsDocument
  • Interface rework needed to generate FixedDocumentSequences from XpsDocument instances
  • Eventual deprecation of the core WPF XpsDocument

This'd take time and sustained attention though. I've not got an objection to contributing here, but it really needs some level of confirmation with the WPF team that this is the right approach. XPS doesn't get a great deal of attention, and it's hard to tell whether it's because XpsDocument covers 80% of the common use case of a mature technology, or because this part of WPF is de-facto deprecated and is no longer accepting breaking changes.

@wstaelens
Contributor Author

@edwardneal is this the next one to tackle? ;-)

@edwardneal

There are some performance improvements in progress in the runtime repo, but those improvements are on ZipArchive, rather than on XpsDocument directly. Once these are in place, I expect there'll be performance improvements when adding new pages to documents, and potentially when loading the underlying ZipArchive from disk.

I'm currently working upwards from DeflateStream through to ZipArchive, Packaging, then XpsDocument, so it might be a while before XPS-specific improvements appear. The underlying library changes should move the needle a bit though.

@wstaelens
Contributor Author

@edwardneal the biggest problem of XPS in general is the ApartmentState.STA requirement.

The biggest pain:

  • multiple print jobs cannot be rendered in different threads when they come from or are being processed by the same user.
  • different pages cannot be rendered separately in multiple threads.
  • if a background thread is processing (rendering) an XPS job and you request another preview in the meantime, the preview cannot be rendered because of STA. It has to wait until the earlier processing is done. (It can only be done by reopening the job in a different thread of the client, but then memory usage explodes too fast.)

Improvements in performance are great, and everything you gain in speed/fewer allocations/... is a big win, but it may be marginal compared to the STA blocking, as everything has to wait for sequential processing in the end.

Other formats like PDF, multi-page tiff, multi-page svg,... don't have this issue making them fast to process and manipulate.

@edwardneal

I agree; this'll need a bit of work though. For approaches where the output isn't directly shown on a GUI, there's no need for the XpsDocument to construct and retain a fully-fledged WPF control as it builds the XAML. I can read ECMA-388 and write an API request, but I'd like to know if this part of WPF is accepting large API changes first. Could someone from the WPF team confirm this either way?

Alternatively XPS support could be lifted into a dedicated repo, potentially ending up similar to the dotnet/Open-XML-SDK repo. I'd personally be a little happier with the latter, but can support the former.

@wstaelens
Contributor Author

anyone from the WPF team who can confirm @edwardneal ?

@wstaelens
Contributor Author

@dotnet/wpf-team ? @edwardneal ? @dipeshmsft ?

@edwardneal

Sorry wstaelens, I don't have an update - this needs guidance/direction from the WPF team (and if XPS support is to be lifted into its own package, probably another group within MS) first. My offer to help with the implementation still stands though.

6 participants