KDocs fixes for `distinct` and `distinctBy` #1628

Allex-Nik · 2025-12-08T19:47:34Z

The documentation for the distinct overloads that take parameters actually described the behavior of distinctBy: it said that distinct removes duplicated rows, but omitted that it also selects the specified columns, so the result contains only those columns.

In this PR I:

Fixed this issue by changing descriptions of distinct
Added distinctBy to DocumentationUrls
Specified the parameter explicitly for every function to avoid incorrect resolution of the columns parameter mentioned in DistinctDocs
Made some other minor fixes
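
To make the difference described above concrete, here is a minimal sketch that models both behaviours with plain Kotlin collections. It is not the DataFrame implementation: `Row`, `distinctModel` and `distinctByModel` are hypothetical names, and keeping the first row per key in `distinctByModel` is an assumption of this model.

```kotlin
// Plain-collections model of the two behaviours; NOT the DataFrame implementation.
// All names here are hypothetical and exist only for illustration.
typealias Row = Map<String, Any?>

// Models distinct(columns): project every row onto `columns`, then drop duplicate rows.
fun distinctModel(rows: List<Row>, columns: List<String>): List<Row> =
    rows.map { row -> columns.associateWith { row[it] } }.distinct()

// Models distinctBy(columns): use `columns` only as the key and keep all original columns.
// Keeping the first row for each key is an assumption of this model.
fun distinctByModel(rows: List<Row>, columns: List<String>): List<Row> =
    rows.distinctBy { row -> columns.map { row[it] } }

fun main() {
    val rows: List<Row> = listOf(
        mapOf("name" to "Alice", "city" to "London"),
        mapOf("name" to "Bob", "city" to "London"),
        mapOf("name" to "Carol", "city" to "Paris"),
    )
    println(distinctModel(rows, listOf("city")))   // rows contain only the "city" column
    println(distinctByModel(rows, listOf("city"))) // rows keep both "name" and "city"
}
```

Both calls return two rows here; the difference is only in which columns survive.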

Allex-Nik · 2025-12-08T19:53:08Z

core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/api/distinct.kt

 * {@include [DistinctDocs]}
- * {@set PHRASE_ENDING the specified columns}.
+ * {@set DESCRIPTION It selects the specified columns and keeps only distinct rows based on these selected columns}
+ * {@set [DistinctDocs.DISTINCT_PARAM] @param [columns] The names of the columns to select


I changed the description of the parameter for some functions to mention that it is also used to select columns.

Allex-Nik · 2025-12-08T19:55:39Z

core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/api/distinct.kt

 * {@include [DistinctDocs]}
+ * {@set NAME DistinctBy}
 * {@set PHRASE_ENDING the specified columns}.
+ * {@set [DistinctDocs.DISTINCT_PARAM] @param [columns]


I specified this explicitly for this and the following functions (instead of keeping the default option in the common section in DistinctDocs) to avoid the issue with [columns] I mentioned in the description of the PR.

AndreiKingsley · 2025-12-09T09:49:46Z

@Allex-Nik To avoid this problem with [columns] you can write [\columns] instead. We use this trick in many places.

I'd even say it's better to write [\columns] in such places always.

AndreiKingsley

Nice! Please, remove headers and add cross-references for distinct/distinctBy operations

AndreiKingsley · 2025-12-09T09:40:55Z

core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/api/distinct.kt

+ * ## The {@get NAME Distinct} Operation
 *
- * It removes duplicated rows based on {@get PHRASE_ENDING}.
+ * {@get DESCRIPTION It removes duplicated rows based on {@get PHRASE_ENDING}}.


Please, change in all kdocs "It removes .." -> "Removes .."

Please make sure all @get/@set keys are [references]. This will work too, but references are refactor- and typo-safe :)

See https://github.com/Kotlin/dataframe/blob/master/KDOC_PREPROCESSING.md#arg-interfaces

Then I need to include DESCRIPTION and PHRASE_ENDING into DistinctDocs like this, right?

private interface DistinctDocs {
    interface DISTINCT_PARAM
    interface DISTINCT_RETURN
    interface DESCRIPTION
    interface PHRASE_ENDING
}
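
As a rough sketch of what reference-style keys could look like once those interfaces exist, the snippet below reuses the `{@get [KEY] default}` and `{@set [Owner.KEY] value}` forms that already appear in this PR's diff. The function `distinctExample` is hypothetical, and whether the nested `{@get [PHRASE_ENDING]}` default behaves exactly like the plain-key version is an assumption.

```kotlin
/**
 * {@get [DESCRIPTION] Removes duplicated rows based on {@get [PHRASE_ENDING]}}.
 */
private interface DistinctDocs {
    // Marker interfaces used only as refactor- and typo-safe keys for @get/@set.
    interface DESCRIPTION
    interface PHRASE_ENDING
}

/**
 * {@include [DistinctDocs]}
 * {@set [DistinctDocs.PHRASE_ENDING] the specified columns}
 */
fun distinctExample() {
    // Hypothetical placeholder; only the KDoc above matters for this sketch.
}
```

Because the keys are real declarations, renaming `PHRASE_ENDING` in the IDE would update every `@get`/`@set` that refers to it.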

AndreiKingsley · 2025-12-09T09:46:59Z

core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/api/distinct.kt

- * @return A new DataFrame containing only distinct rows.
+ * {@get [DISTINCT_RETURN] @return A new [DataFrame] containing only distinct rows.}
 *
 * @see [Selecting Columns][SelectSelectingOptions].


Let's rewrite this section with our template:

dataframe/core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/api/convert.kt

Line 109 in 5d6b7af

* @include [SelectingColumns.ColumnGroupsAndNestedColumnsMention]

Something like this:

See also [distinctBy], that (write difference here)
@include [SelectingColumns.ColumnGroupsAndNestedColumnsMention]
See [Selecting Columns][ConvertSelectingOptions].
For more information: {@include [DocumentationUrls.Distinct]}

AndreiKingsley · 2025-12-09T09:50:55Z

core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/api/distinct.kt

 /**
 * {@include [DistinctDocs]}
+ * {@set NAME DistinctBy}
 * {@set PHRASE_ENDING the specified columns}.


change to [\columns] here and in other places

Jolanrensen · 2025-12-11T11:22:43Z

Specified the parameter explicitly for every function to avoid incorrect resolution of the columns parameter mentioned in DistinctDocs

TL;DR: write it like [columns\]

Long explanation:

Whenever a KDoc is @included, all references mentioned in that doc are expanded to their fully qualified path, if possible. This solves the issue that a reference to [a] in one doc is not necessarily resolvable as [a] in another doc, though it may be resolvable as [a][path.to.a].

If a reference cannot be found, it's left unchanged as [a].

Unfortunately, this system is not perfect (resolving symbols by path in Kotlin myself is hard XD), so when you write

/** @param [columns] The names of the columns to consider for evaluating distinct rows. */
interface DistinctDocs

/** @include [DistinctDocs] */
public fun <T> DataFrame<T>.distinctBy...

KoDEx tries to find [columns] in the scope of DistinctDocs, finds it, and expands it to [columns][org.jetbrains.kotlinx.dataframe.columns] before including it at distinctBy().

Luckily, we have the \ escape character :) which allows us to stop KoDEx from doing "clever" things. The backslashes are removed from the KDocs in the last phase. They allow you to 'break' tags like \@include X, or \$something, or references like \[columns] so they aren't processed anymore :). (You can put the \ anywhere in the reference I believe, actually)
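
The two phases described above can be sketched with a toy model. This is not KoDEx's real implementation: `knownSymbols`, `expandReferences` and `stripEscapes` are hypothetical names, symbol lookup is reduced to a map, and the last phase simply drops every backslash.

```kotlin
// Toy model of the described behaviour; NOT how KoDEx is actually implemented.
// Symbols that the (hypothetical) resolver can find in the scope of the included doc.
val knownSymbols = mapOf("columns" to "org.jetbrains.kotlinx.dataframe.columns")

// Phase 1: expand plain references such as [columns] to [columns][fully.qualified.path].
// A backslash before, inside, or at the end of the brackets prevents the match,
// and unknown references are left unchanged.
fun expandReferences(doc: String): String =
    Regex("""(?<!\\)\[(\w+)](?!\[)""").replace(doc) { match ->
        val name = match.groupValues[1]
        val path = knownSymbols[name]
        if (path != null) "[$name][$path]" else match.value
    }

// Last phase: remove the escape characters (simplified to removing all backslashes).
fun stripEscapes(doc: String): String = doc.replace("\\", "")

fun main() {
    val plain = "@param [columns] The columns to consider."
    val escaped = "@param [columns\\] The columns to consider."
    println(stripEscapes(expandReferences(plain)))   // reference gets a fully qualified target
    println(stripEscapes(expandReferences(escaped))) // reference stays as [columns]
}
```

In this model the escaped variants [\columns], [columns\] and \[columns] all survive phase 1 untouched, which matches the remark that the backslash can go anywhere in the reference.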

Jolanrensen · 2025-12-11T10:55:39Z

core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/api/distinct.kt

+ * {@get [DISTINCT_RETURN] @return A new [DataFrame] containing only distinct rows.}
 *
 * @see [Selecting Columns][SelectSelectingOptions].
 * @see {@include [DocumentationUrls.Distinct]}


never used @see like that :o seems to work well!

Jolanrensen · 2025-12-11T10:58:48Z

core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/api/distinct.kt

 * {@include [DistinctDocs]}
- * {@set PHRASE_ENDING the specified columns}.
+ * {@set DESCRIPTION It selects the specified columns and keeps only distinct rows based on these selected columns}
+ * {@set [DistinctDocs.DISTINCT_PARAM] @param [columns] The names of the columns to select


it's not "The names of the columns" though, it's a columns selector. Check how we write the @param of columnsSelectors in other operations :)

It's probably best to just write the @param line at the bottom of this KDoc, as each overload needs a different line of text for it. Yes, it will be below the @return then, but the IDE still renders it well.

Jolanrensen · 2025-12-11T10:59:15Z

core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/api/distinct.kt

 * {@include [DistinctDocs]}
- * {@set PHRASE_ENDING the specified columns}.
+ * {@set DESCRIPTION It selects the specified columns and keeps only distinct rows based on these selected columns}
+ * {@set [DistinctDocs.DISTINCT_PARAM] @param [columns] The names of the columns to select


it's also not "the names" here

Jolanrensen · 2025-12-11T10:59:32Z

core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/api/distinct.kt

 * {@include [DistinctDocs]}
- * {@set PHRASE_ENDING the specified columns}.
+ * {@set DESCRIPTION It selects the specified columns and keeps only distinct rows based on these selected columns}
+ * {@set [DistinctDocs.DISTINCT_PARAM] @param [columns] The names of the columns to select


but as this function is deprecated, you can probably remove the kdocs altogether

Jolanrensen · 2025-12-11T11:02:50Z

core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/api/distinct.kt

+ * ## The {@get NAME Distinct} Operation
 *
- * It removes duplicated rows based on {@get PHRASE_ENDING}.
+ * {@get DESCRIPTION It removes duplicated rows based on {@get PHRASE_ENDING}}.


Please make sure all @get/@set keys are [references]. This will work too, but references are refactor- and typo-safe :)

Jolanrensen · 2025-12-11T11:03:24Z

core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/api/distinct.kt

+ * ## The {@get NAME Distinct} Operation
 *
- * It removes duplicated rows based on {@get PHRASE_ENDING}.
+ * {@get DESCRIPTION It removes duplicated rows based on {@get PHRASE_ENDING}}.


See https://github.com/Kotlin/dataframe/blob/master/KDOC_PREPROCESSING.md#arg-interfaces

Allex-Nik requested review from AndreiKingsley and Jolanrensen December 8, 2025 19:47

Commit 37ca771: Fixes to the documentation for the distinct and distinctBy functions.

Allex-Nik force-pushed the distinct-docs branch from b67f0d0 to 37ca771 Compare December 8, 2025 20:02

AndreiKingsley requested changes Dec 9, 2025

Jolanrensen requested changes Dec 11, 2025

4 participants
