Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor bad code, build a testable specification and write a compiler guide #478

Open
ringabout opened this issue Aug 24, 2022 · 18 comments
Assignees

Comments

@ringabout
Copy link
Member

ringabout commented Aug 24, 2022

The Neon Genesis Evolution Plan

北海虽赊,扶摇可接;东隅已逝,桑榆非晚。-- 滕王阁序
It is better late than never.

Introduction

Lacking a compiler documentation and a language specification with elaborate testable examples are the long-standing obstacles which hinder the Nim language from gaining the popularity it deserves. A compiler documentation gives developers the overview of the compiler internals and helps them step into the development of the Nim compiler. A language specification clarifies how the language works. Armed with detailed testable examples, it can prevent regressions from happening and act as a delicate guide to developers proving that the feature works and the language works.

The RFC is intended to be progressive and grows gradually. The task should start from commenting the compiler code. A testable specification can roll out at the same time. Finally the comments and the specification should help form a guide summarizing how the Nim compiler works.

The tasks are divided into three steps: comment the compiler code, build a testable specification and write a compiler guide.

Comment the compiler code

Compiler code without comments tends to be hard to understand. Comments help developers gain the insights of why the code is written like this and how it probably works. Comments could be obsolete and meaningless as time goes by. However, It is worthwhile updating the comments periodically since the code and comments will be read thousands of times.

The Nim CI should have a measure to calculate the comment ratio of the code. Ideally it should measure the comment ratio per pull request. The comment ratio is defined as comments lines / total lines (excluding empty lines).

  • add the comment ratio metric to the Nim CI

The contribution of the compiler documentation should be in the form of a pull request. The pull request should comment a certain module or functionality in the Nim compiler.

Build a testable specification

A specification should describe what should work or fail, with abundant testable examples. It should cover as many cases as possible.

As a convention, a describe block can be used. The implementation starts from

template describe*(x: string, body: typed) =
  block:
    body

It may be extended for better documentation generation in the future. The testable specification for the Nim manual should be put under the tests/spec/manual directory. A simple example is given below:

# manual_bool_type.nim
import stdtest/specutils

describe "The internal integer value of a Boolean type":
  describe "0 for false":
    doAssert ord(false) == 0
  describe "1 for true":
    doAssert ord(true) == 1

describe "A Boolean type is an Ordinal type":
  doAssert bool is Ordinal
  doAssert true is Ordinal
  doAssert false is Ordinal

describe "The size of a Boolean type is one byte":
  doAssert sizeof(bool) == 1
  doAssert sizeof(true) == 1
  doAssert sizeof(false) == 1

These examples should generate a pretty HTML documentation. They can be linked to the Nim manual or included into the Nim manual in the foldable form (defaults to folded).

Write a compiler guide

A compiler guide helps developers learn the internals of the Nim compiler without wasting time in reading the compiler code. What is a magic function? How the Nim method is implemented? Is it efficient to use & to concatenate multiple strings? These are questions that a compile guide should attempt to answer. The guide should focus on the explanation of the specific terms and the gist of the implementation.

For instance:

`$` function is a magic proc used to concatenate strings and chars. The Nim compiler does some optimizations to make it perform as good as the in-place version.

There are four overloads for `$` function in the system module.

```nim
proc `&`*(x: string, y: char): string {.magic: "ConStrStr", noSideEffect.}
proc `&`*(x, y: char): string {.magic: "ConStrStr", noSideEffect.}
proc `&`*(x, y: string): string {.magic: "ConStrStr", noSideEffect.}
proc `&`*(x: char, y: string): string {.magic: "ConStrStr", noSideEffect.}
```

All of them have the same magic `mConStrStr`, which is needed in the optimization phases. 

```nim
s = "Hello " & name & ", how do you feel" & 'z'
```

Here is the ast of the right-side expression:

```nim
StmtList
  Infix
    Ident "&"
    Infix
      Ident "&"
      Infix
        Ident "&"
        StrLit "Hello "
        Ident "name"
      StrLit ", how do you feel"
    CharLit 122
```
@Varriount
Copy link

Varriount commented Aug 24, 2022

I wholeheartedly agree with this. And in fact (if someone tells me how), I'm willing to put up a bounty of up to $1,000 USD, depending on the amount of work done.

@ringabout ringabout self-assigned this Aug 25, 2022
@PhilippMDoerner
Copy link

PhilippMDoerner commented Aug 25, 2022

Compiler code without comments tends to be hard to understand. Comments help developers gain the insights of why the code is written like this and how it probably works. Comments could be obsolete and meaningless as time goes by. However, It is worthwhile updating the comments periodically since the code and comments will be read thousands of times.

Just as a forewarning since this has been a massive sticking point for folks at my workplace:
Is the intent to go beyond just doc comments into comments inside of macros/procs/templates?
The experience I've made is that comments within functions get outdated first as somebody updates something that doesn't change the outcome of a function but how it achieves it. doc comments have a somewhat better rate of staying current and correct. That's why I tend to steer away from comments that aren't just doc comments.

I haven't looked at the compiler, but is it an option to have the comments bit only be about doc comments?

@ringabout
Copy link
Member Author

Doc comments still need to explain every quirk in the procs because the Nim compiler is too complex. There are some procs has hundreds or even thousands of lines. It is painful to go through all the code without comments.

Nim does generate documentation for the compiler procs => https://nim-lang.org/docs/compiler/astalgo.html

@Vindaar
Copy link

Vindaar commented Aug 25, 2022

I wholeheartedly agree with this. And in fact (if someone tells me how), I'm willing to put up a bounty of up to $1,000 USD, depending on the amount of work done.

I'd pitch in as well (with a smaller amount I can afford).

@PhilippMDoerner
Copy link

PhilippMDoerner commented Aug 25, 2022

I wholeheartedly agree with this. And in fact (if someone tells me how), I'm willing to put up a bounty of up to $1,000 USD, depending on the amount of work done.

I'd pitch in as well (with a smaller amount I can afford).

I actually missed this one.
Given that I'm already throwing money into the ecosystem, might as well pitch into this bit as well.
Which...err... leads me to the question if we have a mechanism to deal with such an idea of a "community sponsored bounty" or if we best handle this all individually.

@ringabout
Copy link
Member Author

if we have a mechanism to deal with such an idea of a "community sponsored bounty" or if we best handle this all individually.

There is. See the bounty for Interfaces proposal => #39 (comment)

@mratsim
Copy link
Collaborator

mratsim commented Aug 25, 2022

In terms of documentation one of the most scary parts of the compiler is anything that does semantic check sem*.nim and also sigmatch. I dread going there.

I also missed the part about the closure iterator transformation. It's an important area for async, laden with bugs due to the complex interactions with defer/try+except/blocks ...

@haxscramper
Copy link

haxscramper commented Aug 25, 2022

The Nim CI should have a measure to calculate the comment ratio of the code.

Current breakdown of the compiler documentation (single-hash comments are not considered documentation) comment ratio

Nim

@ringabout
Copy link
Member Author

ringabout commented Aug 25, 2022

I'm going to push the setup to set an example for later updates when this plan is accepted.

@haxscramper
Copy link

haxscramper commented Aug 26, 2022

Another obvious statistics while we are at it - this heatmap shows correlation of edits between different parts of the project code. The left side shows total number of edits of the file (at the time of analysis). Each cell in the heatmap represents the percentage of times the bottom file was edited together with one on the left.

Left ordered by total number of edits

image

Or by total number of co-edits with different files

image

Compiler description in general should also include guidelines about relationship between different parts of the compiler - no just who calls who in what order, but changes that are introduced in the semantic layer, how types defined in one part are used in another, how to improve certain parts. People come to the project with questions, and it is important to anticipate common ones, such as "how do I improve an error message X", "there is an RFC Z, I really like it and want to help implement", where do I start", "I changed something in place A and now things break for test B - what might be wrong" and so on and so forth.

@Varriount
Copy link

@haxscramper Just curious - how did you produce those heat maps and charts?

@haxscramper
Copy link

haxscramper commented Aug 29, 2022

I'm writing a tool for this kind of analysis https://github.com/haxscramper/code_forensics - for now it is a proof of concept, but I'm working on more stats such as an average commit message length image

Files edited together 100 times or more

image

150 times or more

image

75 times or more

image

Someone might say it is obvious, but for someone who just came to the project, this kind of structural description can be very valuable.

@Araq
Copy link
Member

Araq commented Sep 2, 2022

I prefer massive refactorings over massive documentation efforts. I know that we can do both, but if I have to choose, I take "refactorings".

  • Rewrite the codegen to be based on a stream of tokens, not based on Ropes. (Memory consumption, performance.)
  • Change how the TLineInfo is stored so that we can save both space and support big files at the same time.
  • Investigate which features are unreasonably costly to maintain for the benefits they bring and phase them out for whatever Nim version that is allowed to break things.

But the biggest elephant in the room is probably "phase ordering issues". We need a set of refactorings so that the transformation pipeline is simple and correct. Then "who calls whom" might also be clean enough so that it doesn't need that much documentation.

@ringabout
Copy link
Member Author

ringabout commented Sep 2, 2022

I agree that commenting is not appropriate. It is better to have a clear code structure. The point that "Rewrite bad code; don't write comments" from https://www.youtube.com/watch?v=NLebZ3XT92E&t=308s seems fair.

The refactorings should have been in the part one. I would like to investigate into refactorings which makes the compiler code more readable and more accessible to newcomers.

@ringabout ringabout changed the title Comment the compiler code, build a testable specification and write a compiler guide Refactor bad code, build a testable specification and write a compiler guide Sep 2, 2022
@haxscramper
Copy link

sigmatch.paramTypesMatch has a total of >>NINE<< one-character variables - m, f, a, x, y, z, c, r, i, all involved in a complex algorithm of the best candidate argument search - that's about a third of the English alphabet. Good to know the algorithm complexity has such a robust limiting mechanism - at most you can write code that is three times as complex, not more, so I'm not sure if we even need to talk about refactoring to be honest

@Araq
Copy link
Member

Araq commented Sep 4, 2022

@haxscramper I don't understand. A refactoring can be as simple as "rename a variable".

@haxscramper
Copy link

haxscramper commented Sep 4, 2022

For the reference - it was a joke, although maybe someone can interpret it as an example starting point or example of how code should not be written. I was just blown away by the sheer number of one-char variables in what appears to be one of the very important procedures in one of the most complex parts of the compiler (semantic analysis). This appears to be a common theme in the sem, with proliferation of undescriptive names that make it harder to intuit what the code was meant to do.

a-mr added a commit to a-mr/Nim that referenced this issue Dec 1, 2022
Fully implements nim-lang/RFCs#125
Follow-up of: nim-lang#18642 (for internal links)
and nim-lang#20127.

Overview
--------

Explicit import-like directive is required, called `.. importdoc::`.
(the syntax is % RST, Markdown will use it for a while).

Then one can reference any symbols/headings/anchors, as if they
were in the local file (but they will be prefixed with a module name
or markup document in link text).
It's possible to reference anything from anywhere (any direction
in `.nim`/`.md`/`.rst` files).

See `doc/docgen.md` for full description.

Working is based on `.idx` files, hence one needs to generate
all `.idx` beforehand. A dedicated option `--index:only` is introduced
(and a separate stage for `--index:only` is added to `kochdocs.nim`).

Performance note
----------------

Full run for `./koch docs` now takes 185% of the time before this PR.
(After: 315 s, before: 170 s on my PC).
All the time seems to be spent on `--index:only` run, which takes
almost as much (85%) of normal doc run -- it seems that most time
is spent on file parsing, turning off HTML generation phase has not
helped much.
(One could avoid it by specifying list of files that can be referenced
and pre-processing only them. But it can become error-prone and I assume
that these linke will be **everywhere** in the repository anyway,
especially considering nim-lang/RFCs#478.
So every `.nim`/`.md` file is processed for `.idx` first).

But that's all without significant part of repository converted to
cross-module auto links. To estimate impact I checked the time for
`doc`ing a few files (after all indexes have been generated), and
everywhere difference was **negligible**.
E.g. for `lib/std/private/osfiles.nim` that `importdoc`s large
`os.idx` and hence should have been a case with relatively large
performance impact, but:

* After: 0.59 s.
* Before: 0.59 s.

So Nim compiler works so slow that doc part basically does not matter :-)

Testing
-------

1) added `extlinks` test to `nimdoc/`
2) checked that `theindex.html` is still correct
2) fixed broken auto-links for modules that were derived from `os.nim`
   by adding appropriate ``importdoc``

Implementation note
-------------------

Parsing and formating of `.idx` entries is moved into a dedicated
`rstidx.nim` module from `rstgen.nim`.

`.idx` file format changed:

* fields are not escaped in most cases because we need original
  strings for referencing, not HTML ones
  (the exception is linkTitle for titles and headings).
  Escaping happens later -- on the stage of `rstgen` buildIndex, etc.
* all lines have fixed number of columns 6
* added discriminator tag as a first column,
  it always allows distinguish Nim/markup entries, titles/headings, etc.
  `rstgen` does not rely any more (in most cases) on ad-hoc logic
  to determine what type each entry is.
* there is now always a title entry added at the first line.
* add a line number as 6th column
* linkTitle (4th) column has a different format: before it was like
  `module: funcName()`, now it's `proc funcName()`.
  (This format is also propagated to `theindex.html` and search results,
  I kept it that way since I like it more though it's discussible.)
  This column is what used for Nim symbols resolution.
* also changed details on column format for headings and titles:
  "keyword" is original, "linkTitle" is HTML one
a-mr added a commit to a-mr/Nim that referenced this issue Dec 16, 2022
Fully implements nim-lang/RFCs#125
Follow-up of: nim-lang#18642 (for internal links)
and nim-lang#20127.

Overview
--------

Explicit import-like directive is required, called `.. importdoc::`.
(the syntax is % RST, Markdown will use it for a while).

Then one can reference any symbols/headings/anchors, as if they
were in the local file (but they will be prefixed with a module name
or markup document in link text).
It's possible to reference anything from anywhere (any direction
in `.nim`/`.md`/`.rst` files).

See `doc/docgen.md` for full description.

Working is based on `.idx` files, hence one needs to generate
all `.idx` beforehand. A dedicated option `--index:only` is introduced
(and a separate stage for `--index:only` is added to `kochdocs.nim`).

Performance note
----------------

Full run for `./koch docs` now takes 185% of the time before this PR.
(After: 315 s, before: 170 s on my PC).
All the time seems to be spent on `--index:only` run, which takes
almost as much (85%) of normal doc run -- it seems that most time
is spent on file parsing, turning off HTML generation phase has not
helped much.
(One could avoid it by specifying list of files that can be referenced
and pre-processing only them. But it can become error-prone and I assume
that these linke will be **everywhere** in the repository anyway,
especially considering nim-lang/RFCs#478.
So every `.nim`/`.md` file is processed for `.idx` first).

But that's all without significant part of repository converted to
cross-module auto links. To estimate impact I checked the time for
`doc`ing a few files (after all indexes have been generated), and
everywhere difference was **negligible**.
E.g. for `lib/std/private/osfiles.nim` that `importdoc`s large
`os.idx` and hence should have been a case with relatively large
performance impact, but:

* After: 0.59 s.
* Before: 0.59 s.

So Nim compiler works so slow that doc part basically does not matter :-)

Testing
-------

1) added `extlinks` test to `nimdoc/`
2) checked that `theindex.html` is still correct
2) fixed broken auto-links for modules that were derived from `os.nim`
   by adding appropriate ``importdoc``

Implementation note
-------------------

Parsing and formating of `.idx` entries is moved into a dedicated
`rstidx.nim` module from `rstgen.nim`.

`.idx` file format changed:

* fields are not escaped in most cases because we need original
  strings for referencing, not HTML ones
  (the exception is linkTitle for titles and headings).
  Escaping happens later -- on the stage of `rstgen` buildIndex, etc.
* all lines have fixed number of columns 6
* added discriminator tag as a first column,
  it always allows distinguish Nim/markup entries, titles/headings, etc.
  `rstgen` does not rely any more (in most cases) on ad-hoc logic
  to determine what type each entry is.
* there is now always a title entry added at the first line.
* add a line number as 6th column
* linkTitle (4th) column has a different format: before it was like
  `module: funcName()`, now it's `proc funcName()`.
  (This format is also propagated to `theindex.html` and search results,
  I kept it that way since I like it more though it's discussible.)
  This column is what used for Nim symbols resolution.
* also changed details on column format for headings and titles:
  "keyword" is original, "linkTitle" is HTML one
Varriount pushed a commit to nim-lang/Nim that referenced this issue Jan 4, 2023
* docgen: implement cross-document links

Fully implements nim-lang/RFCs#125
Follow-up of: #18642 (for internal links)
and #20127.

Overview
--------

Explicit import-like directive is required, called `.. importdoc::`.
(the syntax is % RST, Markdown will use it for a while).

Then one can reference any symbols/headings/anchors, as if they
were in the local file (but they will be prefixed with a module name
or markup document in link text).
It's possible to reference anything from anywhere (any direction
in `.nim`/`.md`/`.rst` files).

See `doc/docgen.md` for full description.

Working is based on `.idx` files, hence one needs to generate
all `.idx` beforehand. A dedicated option `--index:only` is introduced
(and a separate stage for `--index:only` is added to `kochdocs.nim`).

Performance note
----------------

Full run for `./koch docs` now takes 185% of the time before this PR.
(After: 315 s, before: 170 s on my PC).
All the time seems to be spent on `--index:only` run, which takes
almost as much (85%) of normal doc run -- it seems that most time
is spent on file parsing, turning off HTML generation phase has not
helped much.
(One could avoid it by specifying list of files that can be referenced
and pre-processing only them. But it can become error-prone and I assume
that these linke will be **everywhere** in the repository anyway,
especially considering nim-lang/RFCs#478.
So every `.nim`/`.md` file is processed for `.idx` first).

But that's all without significant part of repository converted to
cross-module auto links. To estimate impact I checked the time for
`doc`ing a few files (after all indexes have been generated), and
everywhere difference was **negligible**.
E.g. for `lib/std/private/osfiles.nim` that `importdoc`s large
`os.idx` and hence should have been a case with relatively large
performance impact, but:

* After: 0.59 s.
* Before: 0.59 s.

So Nim compiler works so slow that doc part basically does not matter :-)

Testing
-------

1) added `extlinks` test to `nimdoc/`
2) checked that `theindex.html` is still correct
2) fixed broken auto-links for modules that were derived from `os.nim`
   by adding appropriate ``importdoc``

Implementation note
-------------------

Parsing and formating of `.idx` entries is moved into a dedicated
`rstidx.nim` module from `rstgen.nim`.

`.idx` file format changed:

* fields are not escaped in most cases because we need original
  strings for referencing, not HTML ones
  (the exception is linkTitle for titles and headings).
  Escaping happens later -- on the stage of `rstgen` buildIndex, etc.
* all lines have fixed number of columns 6
* added discriminator tag as a first column,
  it always allows distinguish Nim/markup entries, titles/headings, etc.
  `rstgen` does not rely any more (in most cases) on ad-hoc logic
  to determine what type each entry is.
* there is now always a title entry added at the first line.
* add a line number as 6th column
* linkTitle (4th) column has a different format: before it was like
  `module: funcName()`, now it's `proc funcName()`.
  (This format is also propagated to `theindex.html` and search results,
  I kept it that way since I like it more though it's discussible.)
  This column is what used for Nim symbols resolution.
* also changed details on column format for headings and titles:
  "keyword" is original, "linkTitle" is HTML one

* fix paths on Windows + more clear code

* Update compiler/docgen.nim

Co-authored-by: Andreas Rumpf <[email protected]>

* Handle .md and .nim paths uniformly in findRefFile

* handle titles better + more comments

* don't allow markup overwrite index title for .nim files

Co-authored-by: Andreas Rumpf <[email protected]>
@ringabout
Copy link
Member Author

ringabout commented Feb 15, 2023

Rewrite the codegen to be based on a stream of tokens, not based on Ropes. (Memory consumption, performance.)

I might work on it in a few days, since it doesn't sound too difficult for me. I suppose it does the reversed work compared to the lexer.

survivorm pushed a commit to survivorm/Nim that referenced this issue Feb 28, 2023
* docgen: implement cross-document links

Fully implements nim-lang/RFCs#125
Follow-up of: nim-lang#18642 (for internal links)
and nim-lang#20127.

Overview
--------

Explicit import-like directive is required, called `.. importdoc::`.
(the syntax is % RST, Markdown will use it for a while).

Then one can reference any symbols/headings/anchors, as if they
were in the local file (but they will be prefixed with a module name
or markup document in link text).
It's possible to reference anything from anywhere (any direction
in `.nim`/`.md`/`.rst` files).

See `doc/docgen.md` for full description.

Working is based on `.idx` files, hence one needs to generate
all `.idx` beforehand. A dedicated option `--index:only` is introduced
(and a separate stage for `--index:only` is added to `kochdocs.nim`).

Performance note
----------------

Full run for `./koch docs` now takes 185% of the time before this PR.
(After: 315 s, before: 170 s on my PC).
All the time seems to be spent on `--index:only` run, which takes
almost as much (85%) of normal doc run -- it seems that most time
is spent on file parsing, turning off HTML generation phase has not
helped much.
(One could avoid it by specifying list of files that can be referenced
and pre-processing only them. But it can become error-prone and I assume
that these linke will be **everywhere** in the repository anyway,
especially considering nim-lang/RFCs#478.
So every `.nim`/`.md` file is processed for `.idx` first).

But that's all without significant part of repository converted to
cross-module auto links. To estimate impact I checked the time for
`doc`ing a few files (after all indexes have been generated), and
everywhere difference was **negligible**.
E.g. for `lib/std/private/osfiles.nim` that `importdoc`s large
`os.idx` and hence should have been a case with relatively large
performance impact, but:

* After: 0.59 s.
* Before: 0.59 s.

So Nim compiler works so slow that doc part basically does not matter :-)

Testing
-------

1) added `extlinks` test to `nimdoc/`
2) checked that `theindex.html` is still correct
2) fixed broken auto-links for modules that were derived from `os.nim`
   by adding appropriate ``importdoc``

Implementation note
-------------------

Parsing and formating of `.idx` entries is moved into a dedicated
`rstidx.nim` module from `rstgen.nim`.

`.idx` file format changed:

* fields are not escaped in most cases because we need original
  strings for referencing, not HTML ones
  (the exception is linkTitle for titles and headings).
  Escaping happens later -- on the stage of `rstgen` buildIndex, etc.
* all lines have fixed number of columns 6
* added discriminator tag as a first column,
  it always allows distinguish Nim/markup entries, titles/headings, etc.
  `rstgen` does not rely any more (in most cases) on ad-hoc logic
  to determine what type each entry is.
* there is now always a title entry added at the first line.
* add a line number as 6th column
* linkTitle (4th) column has a different format: before it was like
  `module: funcName()`, now it's `proc funcName()`.
  (This format is also propagated to `theindex.html` and search results,
  I kept it that way since I like it more though it's discussible.)
  This column is what used for Nim symbols resolution.
* also changed details on column format for headings and titles:
  "keyword" is original, "linkTitle" is HTML one

* fix paths on Windows + more clear code

* Update compiler/docgen.nim

Co-authored-by: Andreas Rumpf <[email protected]>

* Handle .md and .nim paths uniformly in findRefFile

* handle titles better + more comments

* don't allow markup overwrite index title for .nim files

Co-authored-by: Andreas Rumpf <[email protected]>
capocasa pushed a commit to capocasa/Nim that referenced this issue Mar 31, 2023
* docgen: implement cross-document links

Fully implements nim-lang/RFCs#125
Follow-up of: nim-lang#18642 (for internal links)
and nim-lang#20127.

Overview
--------

Explicit import-like directive is required, called `.. importdoc::`.
(the syntax is % RST, Markdown will use it for a while).

Then one can reference any symbols/headings/anchors, as if they
were in the local file (but they will be prefixed with a module name
or markup document in link text).
It's possible to reference anything from anywhere (any direction
in `.nim`/`.md`/`.rst` files).

See `doc/docgen.md` for full description.

Working is based on `.idx` files, hence one needs to generate
all `.idx` beforehand. A dedicated option `--index:only` is introduced
(and a separate stage for `--index:only` is added to `kochdocs.nim`).

Performance note
----------------

Full run for `./koch docs` now takes 185% of the time before this PR.
(After: 315 s, before: 170 s on my PC).
All the time seems to be spent on `--index:only` run, which takes
almost as much (85%) of normal doc run -- it seems that most time
is spent on file parsing, turning off HTML generation phase has not
helped much.
(One could avoid it by specifying list of files that can be referenced
and pre-processing only them. But it can become error-prone and I assume
that these linke will be **everywhere** in the repository anyway,
especially considering nim-lang/RFCs#478.
So every `.nim`/`.md` file is processed for `.idx` first).

But that's all without significant part of repository converted to
cross-module auto links. To estimate impact I checked the time for
`doc`ing a few files (after all indexes have been generated), and
everywhere difference was **negligible**.
E.g. for `lib/std/private/osfiles.nim` that `importdoc`s large
`os.idx` and hence should have been a case with relatively large
performance impact, but:

* After: 0.59 s.
* Before: 0.59 s.

So Nim compiler works so slow that doc part basically does not matter :-)

Testing
-------

1) added `extlinks` test to `nimdoc/`
2) checked that `theindex.html` is still correct
2) fixed broken auto-links for modules that were derived from `os.nim`
   by adding appropriate ``importdoc``

Implementation note
-------------------

Parsing and formating of `.idx` entries is moved into a dedicated
`rstidx.nim` module from `rstgen.nim`.

`.idx` file format changed:

* fields are not escaped in most cases because we need original
  strings for referencing, not HTML ones
  (the exception is linkTitle for titles and headings).
  Escaping happens later -- on the stage of `rstgen` buildIndex, etc.
* all lines have fixed number of columns 6
* added discriminator tag as a first column,
  it always allows distinguish Nim/markup entries, titles/headings, etc.
  `rstgen` does not rely any more (in most cases) on ad-hoc logic
  to determine what type each entry is.
* there is now always a title entry added at the first line.
* add a line number as 6th column
* linkTitle (4th) column has a different format: before it was like
  `module: funcName()`, now it's `proc funcName()`.
  (This format is also propagated to `theindex.html` and search results,
  I kept it that way since I like it more though it's discussible.)
  This column is what used for Nim symbols resolution.
* also changed details on column format for headings and titles:
  "keyword" is original, "linkTitle" is HTML one

* fix paths on Windows + more clear code

* Update compiler/docgen.nim

Co-authored-by: Andreas Rumpf <[email protected]>

* Handle .md and .nim paths uniformly in findRefFile

* handle titles better + more comments

* don't allow markup overwrite index title for .nim files

Co-authored-by: Andreas Rumpf <[email protected]>
bung87 pushed a commit to bung87/Nim that referenced this issue Jul 29, 2023
* docgen: implement cross-document links

Fully implements nim-lang/RFCs#125
Follow-up of: nim-lang#18642 (for internal links)
and nim-lang#20127.

Overview
--------

Explicit import-like directive is required, called `.. importdoc::`.
(the syntax is % RST, Markdown will use it for a while).

Then one can reference any symbols/headings/anchors, as if they
were in the local file (but they will be prefixed with a module name
or markup document in link text).
It's possible to reference anything from anywhere (any direction
in `.nim`/`.md`/`.rst` files).

See `doc/docgen.md` for full description.

Working is based on `.idx` files, hence one needs to generate
all `.idx` beforehand. A dedicated option `--index:only` is introduced
(and a separate stage for `--index:only` is added to `kochdocs.nim`).

Performance note
----------------

Full run for `./koch docs` now takes 185% of the time before this PR.
(After: 315 s, before: 170 s on my PC).
All the time seems to be spent on `--index:only` run, which takes
almost as much (85%) of normal doc run -- it seems that most time
is spent on file parsing, turning off HTML generation phase has not
helped much.
(One could avoid it by specifying list of files that can be referenced
and pre-processing only them. But it can become error-prone and I assume
that these linke will be **everywhere** in the repository anyway,
especially considering nim-lang/RFCs#478.
So every `.nim`/`.md` file is processed for `.idx` first).

But that's all without significant part of repository converted to
cross-module auto links. To estimate impact I checked the time for
`doc`ing a few files (after all indexes have been generated), and
everywhere difference was **negligible**.
E.g. for `lib/std/private/osfiles.nim` that `importdoc`s large
`os.idx` and hence should have been a case with relatively large
performance impact, but:

* After: 0.59 s.
* Before: 0.59 s.

So Nim compiler works so slow that doc part basically does not matter :-)

Testing
-------

1) added `extlinks` test to `nimdoc/`
2) checked that `theindex.html` is still correct
2) fixed broken auto-links for modules that were derived from `os.nim`
   by adding appropriate ``importdoc``

Implementation note
-------------------

Parsing and formating of `.idx` entries is moved into a dedicated
`rstidx.nim` module from `rstgen.nim`.

`.idx` file format changed:

* fields are not escaped in most cases because we need original
  strings for referencing, not HTML ones
  (the exception is linkTitle for titles and headings).
  Escaping happens later -- on the stage of `rstgen` buildIndex, etc.
* all lines have fixed number of columns 6
* added discriminator tag as a first column,
  it always allows distinguish Nim/markup entries, titles/headings, etc.
  `rstgen` does not rely any more (in most cases) on ad-hoc logic
  to determine what type each entry is.
* there is now always a title entry added at the first line.
* add a line number as 6th column
* linkTitle (4th) column has a different format: before it was like
  `module: funcName()`, now it's `proc funcName()`.
  (This format is also propagated to `theindex.html` and search results,
  I kept it that way since I like it more though it's discussible.)
  This column is what used for Nim symbols resolution.
* also changed details on column format for headings and titles:
  "keyword" is original, "linkTitle" is HTML one

* fix paths on Windows + more clear code

* Update compiler/docgen.nim

Co-authored-by: Andreas Rumpf <[email protected]>

* Handle .md and .nim paths uniformly in findRefFile

* handle titles better + more comments

* don't allow markup overwrite index title for .nim files

Co-authored-by: Andreas Rumpf <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants