Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

attempt to fix broken PDF links #1909

Merged
merged 2 commits into from
Aug 26, 2022
Merged

Conversation

matthias314
Copy link
Contributor

@matthias314 matthias314 commented Aug 20, 2022

This is supposed to address several issues regarding the PDF version of the Julia documentation, namely JuliaLang/julia#38054 and probably also the second comment in JuliaLang/julia#43652. This PR certainly needs improvements, but hopefully it gets the ball rolling. I find it annoying that the Julia documentation has so many broken links, and with this PR they all seem to disappear.

As far as I can tell, the main problem is that the field anchor.nth is used when creating a hypertarget in

function latex(io::IO, anchor::Anchors.Anchor, page, doc)
id = string(hash(string(anchor.id, "-", anchor.nth)))
_println(io, "\n\\hypertarget{", id, "}{}\n")
latex(io, anchor.object, page, doc)
end
but it is not used by hyperlinks. I don't know if this problem only happens for anchor.nth == 1 or in general. In the case of the Julia documentation, the value seems to be always 1, so I have omitted this field completely. I have noticed that in HTMLWriter.jl the case anchor.nth == 1 receives special treatment.

A secondary problem is that the hypertarget is inserted before the text it refers to. Many targets are chapter or section headings, so there is a good chance that there will be a page break between the target and the text. On the other hand, if the target is moved behind the text, then the text is not visible when users follow the link. I've addressed this by using the LaTeX macro \label to create the target. Such labels then need to be referenced by \hyperref instead of \hyperlink. Since I don't know how to tell from within Julia which kind of target a link refers to, I have added a macro \hyperlinkref to documenter.sty which inserts either \hyperref or \hyperlink, depending on whether a corresponding label exists or not.

Other changes: In

function latex(io::IO, node::Documents.DocsNode, page, doc)
id = string(hash(string(node.anchor.id)))
# Docstring header based on the name of the binding and it's category.
_println(io, "\\hypertarget{", id, "}{} ")
_print(io, "\\hyperlink{", id, "}{\\texttt{")
a link is added to a target at the same spot. I have removed the link because it's unclear to me what the purpose of a link is that links to itself. (Theoretically, there could again be a page break between the target and the link.)

For the Julia documentation, the methods latex(io::IO, index::Documents.IndexNode, page, doc) and latex(io::IO, contents::Documents.ContentsNode, page, doc) are apparently never called, so changes in this code haven't been tested. In the latter function I have kept the anchor.nth when generating the link id, and I've moved the assignment id = to where it belongs.

I've also noticed that you use hashes of the anchor ids to get the target labels. Would it be a problem to simply use the ids themselves? For example. can they contain any characters that cause problems?

@mortenpi mortenpi added Format: LaTeX Related to the LaTeX / PDF output Type: Bugfix labels Aug 21, 2022
@mortenpi mortenpi added this to the 0.27.23 milestone Aug 21, 2022
@mortenpi
Copy link
Member

Thank you for the PR! I'm planning a backport release with some TeX fixes, and this fits in there perfectly.

I don't know if this problem only happens for anchor.nth == 1 or in general. In the case of the Julia documentation, the value seems to be always 1, so I have omitted this field completely. I have noticed that in HTMLWriter.jl the case anchor.nth == 1 receives special treatment.

anchor.nth is important when two links happen to have the same .id (e.g. the same section heading in multiple places). The special casing in HTML is just because usually you don't need to disambiguate, and so we don't want to have a trailing -1 in all URLs (which are semi-visible to the readers).

So I do think we need to consistently use the .nth field when generating the labels. However, since we hash them into random strings anyway, there is no need to special case the 1.

Would it be a problem to simply use the ids themselves? For example. can they contain any characters that cause problems?

It's a design decision that predates me, but yes, I'm quite confident it's there to make sure we don't accidentally break the TeX by due to some weird characters. The docstrings .ids in particular can be problematic I think.

I have removed the link because it's unclear to me what the purpose of a link is that links to itself.

I presume it was there to be consistent with the original Markdown writer. But I agree that it doesn't serve any purpose in the PDF.

I've addressed this by using the LaTeX macro \label to create the target.

I am fine with the macro as well, but I wonder if we could use \label for both? Although I guess you can only attach \label to certain special LaTeX commands?

@mortenpi
Copy link
Member

Also, while we're doing this, could you factor out the id generation into a separate function? The logic is currently duplicated in multiple places. I think it would require two methods (or even functions): one for Anchors.Anchor and the other one for a generic object (i.e. for docstring links).

@matthias314
Copy link
Contributor Author

anchor.nth is important when two links happen to have the same .id

As I said, my understanding is limited. But are two identical anchor ids allowed? How can you distinguish between the various targets if you refer to an id that occurs more than once?

I wonder if we could use \label for both

\label in LaTeX can only refer to special locations in the document, for example \chapter, \section and so on as well as \item. The question is whether all Markdown anchors can be covered this way.

could you factor out the id generation into a separate function?

For anchors there is already

function fragment(a::Anchor)
frag = string("#", a.id)
if a.nth > 1
frag = string(frag, "-", a.nth)
end
# TODO: Sanitize the fragment
return frag
end
One could make the # a second argument that would be the empty string for LaTeX. (Or one keeps it as it is invisible the user anyway.)

@mortenpi
Copy link
Member

But are two identical anchor ids allowed? How can you distinguish between the various targets if you refer to an id that occurs more than once?

You mean in at-ref links? You can't, and if you try, it leads to an error. If that is necessary, the users can use the at-id feature to give a heading a unique label.

However, this is not a problem as most anchors do not explicitly get referred to. But we still want to generate the anchors (e.g. in HTML) so that users could copy-paste the URLs.

\label in LaTeX can only refer to special locations in the document, for example \chapter, \section and so on as well as \item. The question is whether all Markdown anchors can be covered this way.

Yep, that's fair. Let's stick to the macro.

For anchors there is already

Yeah, but LaTeXWriter does additional processing that is specific to it, so I think there should be a function in LaTeXWriter that does the hashing etc. I don't think it's a good idea to go down the rabbit hole of trying to generalize the Anchors module in this PR.

@matthias314
Copy link
Contributor Author

OK, I think that now the treatment of anchor.nth is correct. My impression is that the problem arose because the labels were generated are several places in the source code (and have to match!), and at one point (when the nth was omitted if it equals 1) the change was not done everywhere in the code. I therefore thought that the cleanest approach would be to introduce a new function Anchors.label to generate the labels for all anchor and links; also Anchors.fragments uses it now. For uniformity, I've also created a function _hash in LaTeXWriters.jl that is called whenever a hash is computed in that file. This new PR still fixes the broken links in the Julia documentation.

Let me know what you think. If you think it's ready for merging, then I can first make a rebase to get a single clean commit.

Copy link
Member

@mortenpi mortenpi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is excellent! I like the splitting of the logic into these separate functions.

Could you also add a note into CHANGELOG.md?

src/Anchors.jl Outdated Show resolved Hide resolved
assets/latex/documenter.sty Outdated Show resolved Hide resolved
@matthias314 matthias314 force-pushed the m3/pdf-links branch 2 times, most recently from cc25856 to f62a004 Compare August 24, 2022 13:34
Co-authored-by: Morten Piibeleht <[email protected]>
@matthias314
Copy link
Contributor Author

I've updated the changelog and rebased the branch. (It took me a few attempts to figure out how links work in the changelog file.) Should be ready now for merging.

@mortenpi mortenpi merged commit f72552a into JuliaDocs:master Aug 26, 2022
@mortenpi
Copy link
Member

Thanks @matthias314!

mortenpi pushed a commit that referenced this pull request Aug 26, 2022
(cherry picked from commit f72552a)
@matthias314 matthias314 deleted the m3/pdf-links branch August 26, 2022 12:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Format: LaTeX Related to the LaTeX / PDF output Type: Bugfix
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants