

File total tokens added to report output #1036

Merged (5 commits) on May 10, 2023

@jazerix (Contributor) commented on Apr 20, 2023

This PR adds the total number of tokens in each file to the report output, so you get an idea of how much of each file is covered by a match. This is especially useful if you are not using the report viewer but are ingesting the report files in other ways.
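As an illustration of the "ingesting the files in other ways" case, here is a minimal sketch of how a consumer might turn per-file token totals into a coverage fraction. The types `FileCoverage` and `FileMatch` are hypothetical stand-ins, not JPlag's report schema, and the sketch assumes that matched regions within one file do not overlap, so their token counts can simply be summed. It needs Java 16+ for records.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FileCoverage {

    // Hypothetical: one matched region inside a single file, with its length in tokens.
    record FileMatch(String fileName, int tokens) {}

    /**
     * Returns, per file, the fraction of its tokens covered by matches.
     * Assumes matched regions in the same file do not overlap.
     */
    static Map<String, Double> coverage(List<FileMatch> matches, Map<String, Integer> totalTokens) {
        // Sum matched tokens per file.
        Map<String, Integer> matched = new HashMap<>();
        for (FileMatch m : matches) {
            matched.merge(m.fileName(), m.tokens(), Integer::sum);
        }
        Map<String, Double> result = new HashMap<>();
        for (Map.Entry<String, Integer> e : totalTokens.entrySet()) {
            int total = e.getValue();
            int covered = matched.getOrDefault(e.getKey(), 0);
            // Guard against empty files to avoid division by zero.
            result.put(e.getKey(), total == 0 ? 0.0 : (double) covered / total);
        }
        return result;
    }

    public static void main(String[] args) {
        List<FileMatch> matches = List.of(
                new FileMatch("Main.java", 40),
                new FileMatch("Main.java", 20),
                new FileMatch("Util.java", 15));
        Map<String, Integer> totals = Map.of("Main.java", 120, "Util.java", 60);
        Map<String, Double> cov = coverage(matches, totals);
        System.out.println(cov.get("Main.java")); // 0.5
        System.out.println(cov.get("Util.java")); // 0.25
    }
}
```

Without the per-file total, only the matched token count is known, which is why the denominator added by this PR is what makes such a fraction possible.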

This metric can also be converted into a similarity using the same formula found in JPlagComparison.java.
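A hedged sketch of that conversion follows. It assumes the formula in JPlagComparison.java is an average similarity of 2·m / (t1 + t2) and a one-sided similarity of m / t1, where m is the number of matched tokens and t1, t2 are token counts; this is our reading, not a quotation of the source, and the real implementation may adjust the counts further (for example for base code), so check JPlagComparison.java before relying on it. The class and method names are hypothetical. The same functions can be fed whole-submission totals or, with this PR, per-file totals.

```java
public class SimilaritySketch {

    /** Symmetric (average) similarity: matched tokens relative to the mean size of both sides. */
    static double averageSimilarity(int matchedTokens, int tokensFirst, int tokensSecond) {
        int divisor = tokensFirst + tokensSecond;
        // Avoid division by zero when both sides are empty.
        return divisor == 0 ? 0.0 : 2.0 * matchedTokens / divisor;
    }

    /** One-sided similarity: fraction of the first side's tokens that are matched. */
    static double similarityOfFirst(int matchedTokens, int tokensFirst) {
        return tokensFirst == 0 ? 0.0 : (double) matchedTokens / tokensFirst;
    }

    public static void main(String[] args) {
        // 60 matched tokens between sides of 120 and 180 tokens.
        System.out.println(averageSimilarity(60, 120, 180)); // 0.4
        System.out.println(similarityOfFirst(60, 120));      // 0.5
    }
}
```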

I'm aware that it doesn't give you the complete picture, but having a sense of perspective on each file is nice.

Let me know what you think 😃

@tsaglam (Member) commented on Apr 21, 2023

That is a good idea; I will have a look at it.

@jazerix could you run mvn spotless:apply to fix your formatting?

@tsaglam tsaglam added enhancement Issue/PR that involves features, improvements and other changes minor Minor issue/feature/contribution/change labels Apr 21, 2023
@jazerix (Contributor, Author) commented on Apr 21, 2023

@tsaglam Certainly! I've also renamed the two new variables to align with the current naming scheme.

@dfuchss (Member) left a comment


One minor suggestion

@tsaglam tsaglam requested a review from TwoOfTwelve May 10, 2023 11:52
@tsaglam (Member) commented on May 10, 2023

@TwoOfTwelve could you take a look at this PR? Please resolve the conflict, and maybe also test whether it affects performance when we run it on a large dataset.

@TwoOfTwelve (Contributor) left a comment


The code looks good to me.

I measured the runtime of the entire application on a batch of data. Without the changes, it took 0.91012 seconds on average; with the changes, 0.91279 seconds.
I don't think this will have a noticeable impact on performance.
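To put the two averages in proportion, the relative overhead computed from the figures above is roughly three tenths of a percent (this says nothing about run-to-run variance, which was not reported):

```latex
\frac{0.91279\,\mathrm{s} - 0.91012\,\mathrm{s}}{0.91012\,\mathrm{s}}
  = \frac{0.00267\,\mathrm{s}}{0.91012\,\mathrm{s}}
  \approx 0.0029 \approx 0.29\,\%
```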

@sonarcloud (bot) commented on May 10, 2023

Kudos, SonarCloud Quality Gate passed!

Bugs: A (0)
Vulnerabilities: A (0)
Security Hotspots: A (0)
Code Smells: A (0)

No coverage information
No duplication information

@tsaglam tsaglam merged commit 53253e5 into jplag:develop May 10, 2023
@jazerix jazerix deleted the develop branch May 14, 2023 19:40

4 participants