@Allex-Nik
Collaborator

Fixes #1434

The documentation for the distinct function with parameters actually described the behavior of distinctBy: it said that distinct removes duplicate rows, omitting the fact that it also selects the specified columns, so the result contains only those columns.
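
To make the difference concrete, here is a minimal sketch in plain Kotlin that models rows as maps instead of using the DataFrame API. The names Row, distinctSketch and distinctBySketch are hypothetical and exist only for this illustration; the sketch also assumes that distinctBy keeps the first row found for each key.

```kotlin
// Hypothetical stand-in for a data frame row: column name -> value.
typealias Row = Map<String, Any?>

// Sketch of distinct(columns): select the given columns first,
// then drop duplicate rows. The result has ONLY the selected columns.
fun distinctSketch(rows: List<Row>, vararg columns: String): List<Row> =
    rows.map { row -> columns.associateWith { row[it] } }.distinct()

// Sketch of distinctBy(columns): use the given columns as the key,
// keep the first row per key (assumption), and keep ALL columns.
fun distinctBySketch(rows: List<Row>, vararg columns: String): List<Row> =
    rows.distinctBy { row -> columns.map { row[it] } }

fun main() {
    val rows: List<Row> = listOf(
        mapOf("name" to "Alice", "city" to "London", "age" to 30),
        mapOf("name" to "Alice", "city" to "London", "age" to 41),
        mapOf("name" to "Bob", "city" to "Paris", "age" to 25),
    )
    // Two rows, each with the columns name and city only.
    println(distinctSketch(rows, "name", "city"))
    // Two rows, each still carrying name, city and age.
    println(distinctBySketch(rows, "name", "city"))
}
```

Both calls return two rows here; what differs is which columns survive, and that is the point the old KDocs missed.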

In this PR I:

  • Fixed this issue by changing the descriptions of distinct
  • Added distinctBy to DocumentationUrls
  • Specified the parameter explicitly for every function to avoid incorrect resolution of the columns parameter mentioned in DistinctDocs
  • Made some other minor fixes

* {@include [DistinctDocs]}
* {@set PHRASE_ENDING the specified columns}.
* {@set DESCRIPTION It selects the specified columns and keeps only distinct rows based on these selected columns}
* {@set [DistinctDocs.DISTINCT_PARAM] @param [columns] The names of the columns to select
Collaborator Author

I changed the description of the parameter for some functions to mention that it is also used to select columns.

Collaborator

it's also not "the names" here

Collaborator

but as this function is deprecated, you can probably remove the kdocs altogether

* {@include [DistinctDocs]}
* {@set NAME DistinctBy}
* {@set PHRASE_ENDING the specified columns}.
* {@set [DistinctDocs.DISTINCT_PARAM] @param [columns]
Collaborator Author

I specified this explicitly for this and the following functions (instead of keeping the default option in the common section in DistinctDocs) to avoid the issue with [columns] I mentioned in the description of the PR.

@AndreiKingsley
Collaborator

AndreiKingsley commented Dec 9, 2025

@Allex-Nik To avoid this problem with [columns] you can write [\columns] instead. We use this trick in many places.

I'd even say it's better to write [\columns] in such places always.

Collaborator

@AndreiKingsley AndreiKingsley left a comment

Nice! Please, remove headers and add cross-references for distinct/distinctBy operations

* ## The {@get NAME Distinct} Operation
*
* It removes duplicated rows based on {@get PHRASE_ENDING}.
* {@get DESCRIPTION It removes duplicated rows based on {@get PHRASE_ENDING}}.
Collaborator

Please, change in all kdocs "It removes .." -> "Removes .."

Collaborator

Please make sure all @get/@set keys are [references]. This will work too, but references are refactor- and typo-safe :)

Collaborator Author

Then I need to include DESCRIPTION and PHRASE_ENDING into DistinctDocs like this, right?

private interface DistinctDocs {
    interface DISTINCT_PARAM

    interface DISTINCT_RETURN

    interface DESCRIPTION

    interface PHRASE_ENDING
}

* @return A new DataFrame containing only distinct rows.
* {@get [DISTINCT_RETURN] @return A new [DataFrame] containing only distinct rows.}
*
* @see [Selecting Columns][SelectSelectingOptions].
Collaborator

Let's rewrite this section with our template:

* @include [SelectingColumns.ColumnGroupsAndNestedColumnsMention]

Something like that

See also [distinctBy], that (write difference here)

 @include [SelectingColumns.ColumnGroupsAndNestedColumnsMention]
 
 See [Selecting Columns][ConvertSelectingOptions].
 
 For more information: {@include [DocumentationUrls.Distinct]}

/**
* {@include [DistinctDocs]}
* {@set NAME DistinctBy}
* {@set PHRASE_ENDING the specified columns}.
Collaborator

change to [\columns] here and in other places

@Jolanrensen
Collaborator

Jolanrensen commented Dec 11, 2025

Specified the parameter explicitly for every function to avoid incorrect resolution of the columns parameter mentioned in DistinctDocs

TL;DR: write it like [columns\]

Long explanation:

Whenever a KDoc is @included, all references mentioned in that doc are expanded to their fully qualified path, if possible. This solves the problem that a reference to [a] in one doc is not necessarily resolvable as [a] in another doc, although it may be resolvable as [a][path.to.a].

If a reference cannot be found, it's left unchanged as [a].

Unfortunately, this system is not perfect (resolving symbols by path in kotlin myself is hard XD), so when you write

/** @param [columns] The names of the columns to consider for evaluating distinct rows. */
interface DistinctDocs

/** @include [DistinctDocs] */
public fun <T> DataFrame<T>.distinctBy...

KoDEx tries to find [columns] in the scope of DistinctDocs, finds it, and expands it to [columns][org.jetbrains.kotlinx.dataframe.columns] before including it at distinctBy().

Luckily, we have the \ escape character :) which allows us to stop KoDEx from doing "clever" things. The escape characters are removed from the KDocs in the last phase. They allow you to 'break' tags like \@include X, or \$something, or references like \[columns] so they aren't processed anymore :). (You can put the \ anywhere in the reference I believe, actually)
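
The two phases described above (expand resolvable references, then strip the escapes) can be sketched with a toy model. This is not KoDEx's implementation: expandReferences, stripEscapes and the scope map are hypothetical names made up for the illustration, and the regex only handles simple single-word references.

```kotlin
// Toy model only: this is NOT how KoDEx is implemented.

// Expand every unescaped [name] that resolves in `scope` to
// [name][qualified.path]. Unresolvable references stay unchanged.
fun expandReferences(doc: String, scope: Map<String, String>): String =
    Regex("""\[(\w+)](?!\[)""").replace(doc) { match ->
        val name = match.groupValues[1]
        val qualified = scope[name]
        if (qualified != null) "[$name][$qualified]" else match.value
    }

// Last phase: drop the escape characters so the final KDoc is clean.
fun stripEscapes(doc: String): String = doc.replace("\\", "")

fun main() {
    // Assumed scope: `columns` resolves to a package, as in the example above.
    val scope = mapOf("columns" to "org.jetbrains.kotlinx.dataframe.columns")

    val plain = "@param [columns] The columns to consider."
    val escaped = "@param [\\columns] The columns to consider."

    // prints: @param [columns][org.jetbrains.kotlinx.dataframe.columns] The columns to consider.
    println(stripEscapes(expandReferences(plain, scope)))
    // prints: @param [columns] The columns to consider.
    println(stripEscapes(expandReferences(escaped, scope)))
}
```

In this model the backslash simply keeps the reference from matching during expansion, so it reaches the final KDoc as a plain [columns] that the IDE can resolve against the function's own parameter.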

* {@get [DISTINCT_RETURN] @return A new [DataFrame] containing only distinct rows.}
*
* @see [Selecting Columns][SelectSelectingOptions].
* @see {@include [DocumentationUrls.Distinct]}
Collaborator

never used @see like that :o seems to work well!

* {@include [DistinctDocs]}
* {@set PHRASE_ENDING the specified columns}.
* {@set DESCRIPTION It selects the specified columns and keeps only distinct rows based on these selected columns}
* {@set [DistinctDocs.DISTINCT_PARAM] @param [columns] The names of the columns to select
Collaborator

it's not "The names of the columns" though, it's a columns selector. Check how we write the @param of columnsSelectors in other operations :)

Collaborator

It's probably best to just write the @param line at the bottom of this KDoc, as each overload needs a different line of text for it. Yes, it will be below the @return then, but the IDE still renders it well.

Development

Successfully merging this pull request may close these issues.

Distinct KDocs are wrong
