SSRF When Generating PDFs from User-Controlled HTML #10682

Closed
RealestName opened this issue Mar 12, 2025 · 10 comments

@RealestName

Hello,

During a recent engagement, I came across an "Export to PDF" function that takes user-controlled HTML and passes it to pandoc to generate a PDF. While testing, I found that this PDF generation is vulnerable to Server-Side Request Forgery (SSRF) when the HTML contains <iframe> elements. An attacker can exploit this by embedding an iframe whose URL points to an internal resource, potentially exposing sensitive data or interacting with internal services.

@tarleb
Collaborator

tarleb commented Mar 12, 2025

I don't see that as a security issue with pandoc, but we could consider adding an item for that to the "Security" section in the manual. Would you want to submit a PR?

RealestName changed the title from "SSRF When" to "SSRF When Generating PDFs from User-Controlled HTML" Mar 12, 2025
@tarleb
Collaborator

tarleb commented Mar 12, 2025

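Also note that you can prevent pandoc from resolving iframe elements by enabling the raw_html extension (for example, -f html+raw_html), which makes pandoc parse the iframe as raw HTML instead of fetching its src.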
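
For illustration, here is a minimal sketch of how a conversion service might pass that setting to pandoc from Python. The function names and file paths are hypothetical; only the pandoc flags (--from with the raw_html extension, --output) come from pandoc itself, and the sketch assumes pandoc and a PDF engine are installed on the host.

```python
import subprocess


def pandoc_raw_html_cmd(input_path, output_path):
    """Build a pandoc command line that reads HTML with raw_html enabled.

    With the raw_html extension, an iframe is kept as a raw HTML block
    rather than being resolved by fetching its src.
    """
    return [
        "pandoc",
        "--from=html+raw_html",
        f"--output={output_path}",
        input_path,
    ]


def convert(input_path, output_path):
    """Run the conversion (hypothetical helper; raises on a non-zero exit)."""
    subprocess.run(pandoc_raw_html_cmd(input_path, output_path), check=True)
```

Calling convert("upload.html", "export.pdf") would then run pandoc with the extension enabled; the command builder is kept separate so it can be inspected or logged without running pandoc.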

@RealestName
Author

RealestName commented Mar 12, 2025

While testing, I noticed that JavaScript execution was blocked, which suggests that some security measures are already in place. This scenario is similar to CVE-2022-35583 (rated 9.8 in severity; its PoC even uses the same payload). I believe the best solution would be to enable raw_html by default and leave it to the user to turn it off.

@RealestName
Author

> I don't see that as a security issue with pandoc, but we could consider adding an item for that to the "Security" section in the manual. Would you want to submit a PR?

I think it would be best to add an item to the security section, if possible.

@jgm
Owner

jgm commented Mar 12, 2025

Could you describe the problem in more detail? You have an HTML file, with an iframe whose src attribute points to something internal. If you view this in a browser, the internal thing will show up in the iframe, no? And so if you ask pandoc to produce a PDF, the PDF will also include the internal thing?

So, what is the risk you envision? If you use pandoc to convert an HTML file containing private information, the private information will end up in the resulting file. But that is true whether or not the information is included via an iframe.

@tarleb
Collaborator

tarleb commented Mar 12, 2025

I think the issue opens up when offering HTML-to-PDF-conversions as a service: A server in an internal company network might leak information that's only available from within the network.

@RealestName
Author

> Could you describe the problem in more detail? You have an HTML file, with an iframe whose src attribute points to something internal. If you view this in a browser, the internal thing will show up in the iframe, now? And so if you ask pandoc to produce a PDF, the PDF will also include the internal thing?
>
> So, what is the risk you envision? If you use pandoc to convert an HTML file containing private information, the private information will end up in the resulting file. But that is true whether or not the information is included via an iframe.

The key risk is that an attacker can exploit this behavior to force the server to make requests to internal resources. This is known as Server-Side Request Forgery (SSRF).

Let's consider the following attack scenario:

Suppose an attacker submits or uploads an HTML file with an <iframe> like this:

<iframe src="http://localhost/admin" />

When the server processes this HTML to generate a PDF, the server itself will attempt to fetch the content at http://localhost/admin (or any internal endpoint).

Impact:

- This request could reach internal services that are not intended to be publicly accessible.
- The attacker may gain access to sensitive data, internal admin panels, cloud metadata endpoints, or other restricted resources.
- Even if the content isn’t directly rendered in the PDF, timing-based attacks, error messages, or other indirect data leakage vectors could still reveal information.

Why This Is Dangerous:

Unlike a browser, where same-origin policies apply, pandoc runs on the server without such restrictions, which makes this a powerful SSRF vector. The risk is not just that private information leaks through an iframe; it is that an attacker can trigger unintended requests to internal services.
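
To make the scenario concrete, here is a rough sketch of a pre-check that a service could run on submitted HTML before handing it to a converter. It only uses the Python standard library; the class and function names are hypothetical, and the "internal" test is a simplification that looks at literal hostnames and IP addresses without resolving DNS.

```python
import ipaddress
from html.parser import HTMLParser
from urllib.parse import urlparse


class IframeSrcCollector(HTMLParser):
    """Collect the src attribute of every iframe start tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <iframe ... /> also end up here.
        if tag == "iframe":
            for name, value in attrs:
                if name == "src" and value:
                    self.sources.append(value)


def iframe_sources(html):
    """Return all iframe src values found in the given HTML string."""
    collector = IframeSrcCollector()
    collector.feed(html)
    collector.close()
    return collector.sources


def looks_internal(url):
    """Heuristic: does the URL target a local path or a non-public address?"""
    host = urlparse(url).hostname
    if host is None:
        return True  # no host at all: a relative or local file path
    if host == "localhost":
        return True
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return False  # a DNS name; this sketch does not resolve it
    return addr.is_private or addr.is_loopback or addr.is_link_local
```

For the payload above, iframe_sources returns the http://localhost/admin URL and looks_internal flags it. A check like this only illustrates what the converter would otherwise fetch.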

@jgm
Owner

jgm commented Mar 12, 2025

As noted in the Security section, our recommendation is to use --sandbox when running on a server, or even better to use pandoc-server. Both will block this kind of operation.

Note that item 2 of security already mentions this threat for HTML.
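
As a minimal sketch of the --sandbox recommendation, a server-side wrapper in Python might look like the following. The function names and paths are hypothetical; the --sandbox, --from, and --output flags are pandoc's own, and the sketch assumes a pandoc version that supports --sandbox plus an installed PDF engine.

```python
import subprocess


def sandboxed_pandoc_cmd(input_path, output_path):
    """Build a pandoc command line that runs the HTML reader in sandbox mode.

    --sandbox limits reader and writer IO to the files named on the
    command line, so an iframe src is not fetched.
    """
    return [
        "pandoc",
        "--sandbox",
        "--from=html",
        f"--output={output_path}",
        input_path,
    ]


def convert_sandboxed(input_path, output_path):
    """Run the sandboxed conversion (raises on a non-zero exit status)."""
    subprocess.run(sandboxed_pandoc_cmd(input_path, output_path), check=True)
```

Passing the arguments as a list rather than a shell string also keeps user-supplied file names from being interpreted by a shell.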

@jgm
Owner

jgm commented Mar 12, 2025

OK, I see that the threat is somewhat different from normal include files, because it could be an HTTP request. Perhaps splitting item 2 into two parts, one for standard include and one for HTML iframe, would make sense.

jgm closed this as completed in 67edf7c Mar 12, 2025
@jgm
Owner

jgm commented Mar 12, 2025

Let me know if anything in my commit should be changed.


3 participants