File-based Similarity in the Comparison View #1516

Kr0nox · 2024-01-30T21:02:24Z

Displays the percentage of tokens of a file that are part of a match.
Also displays the total number of tokens in each submission.

The submission file index now also stores the token count of each file.
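Below is a minimal sketch of the two numbers described above. It assumes a simplified model in which every token records the file it came from and a match is a contiguous range of token indices. The `Token` and `Match` records and both method names are hypothetical and do not mirror JPlag's actual classes.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: per-file token counts and the share of each file's tokens covered by matches.
 * Token, Match and the method names are illustrative, not JPlag's actual API.
 */
final class FileSimilaritySketch {

    /** A token that only knows which file it came from (hypothetical). */
    record Token(String file) {}

    /** A match covering tokens [start, start + length) of one submission's token list (hypothetical). */
    record Match(int start, int length) {}

    /** Counts the tokens of each file, i.e. the data a submission file index could hold. */
    static Map<String, Integer> tokenCountPerFile(List<Token> tokens) {
        Map<String, Integer> counts = new HashMap<>();
        for (Token token : tokens) {
            counts.merge(token.file(), 1, Integer::sum);
        }
        return counts;
    }

    /** Percentage of each file's tokens that lie inside at least one match. */
    static Map<String, Double> matchedPercentagePerFile(List<Token> tokens, List<Match> matches) {
        // Mark every matched token once, so overlapping matches are not counted twice.
        boolean[] matched = new boolean[tokens.size()];
        for (Match match : matches) {
            for (int i = match.start(); i < match.start() + match.length(); i++) {
                matched[i] = true;
            }
        }
        Map<String, Integer> matchedCounts = new HashMap<>();
        for (int i = 0; i < tokens.size(); i++) {
            if (matched[i]) {
                matchedCounts.merge(tokens.get(i).file(), 1, Integer::sum);
            }
        }
        Map<String, Double> percentages = new HashMap<>();
        tokenCountPerFile(tokens).forEach((file, total) ->
                percentages.put(file, 100.0 * matchedCounts.getOrDefault(file, 0) / total));
        return percentages;
    }
}
```

Marking matched tokens in a boolean array, rather than summing match lengths, keeps the per-file count correct even if two matches were to overlap.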

relates to #1109

Kr0nox · 2024-02-01T13:13:03Z

We should discuss how the EOF token should be handled. Currently, it is excluded from the percentage calculation but not from the tooltip, which may confuse the user.
If we included it in the percentage calculation, a file that is entirely copied could never achieve a 100% match. So one option would be to exclude it from both numbers.
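The option of excluding the EOF token from both numbers could look roughly like the sketch below. It assumes a hypothetical `Token` record that flags the end-of-file marker and whether the token is part of a match, and that the EOF token itself is never matched; it is not JPlag's token model.

```java
import java.util.List;

/**
 * Hypothetical sketch of the "exclude EOF from both numbers" option. The token model is illustrative;
 * the end-of-file marker is modelled as a simple flag instead of a real token type.
 */
final class EofExclusionSketch {

    /** A token that knows whether it is the end-of-file marker and whether it is part of a match. */
    record Token(boolean isEndOfFile, boolean matched) {}

    /** Tokens shown in the tooltip: every token except the end-of-file marker. */
    static long countableTokens(List<Token> fileTokens) {
        return fileTokens.stream().filter(t -> !t.isEndOfFile()).count();
    }

    /** Match percentage over the same countable tokens, so a fully copied file can reach 100%. */
    static double matchPercentage(List<Token> fileTokens) {
        long total = countableTokens(fileTokens);
        if (total == 0) {
            return 0.0; // a file with only an EOF token has nothing that could match
        }
        long matched = fileTokens.stream().filter(t -> !t.isEndOfFile() && t.matched()).count();
        return 100.0 * matched / total;
    }
}
```

With the EOF token counted, a fully copied file with three regular tokens would show 3 of 4 tokens, i.e. 75%, which is exactly the inconsistency described above.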

TwoOfTwelve

Looks great, except for the one thing I commented on.

core/src/main/java/de/jplag/reporting/reportobject/ReportObjectFactory.java

sebinside

Regarding the UI, I would argue for not displaying decimal numbers to minimize visual clutter. Also, I would argue for a dark gray instead of black text color.

Kr0nox · 2024-02-06T19:10:25Z

The performance impact of this is pretty heavy.
Zipping time increased from 13 seconds to 61 seconds for 64 copies of JPlag (comparison time was about 3m 40s).

Because of this, the token calculation should probably be moved into the core, where it can be done more efficiently. We should discuss where (if there is any place where it can be sped up at all) and how this should be done.

After merging develop with the immediate zip writing, the time increased from ~13 seconds to ~16 seconds for 64 copies of JPlag (comparison time about 3m 40s). This also aligned with a time measurement for the counting loop, which accounted for 3 seconds.
Using a parallel stream over the submissions, we could reduce this to under a second. In my opinion, though, this would reduce readability. What is your opinion on that?
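A rough sketch of the parallel variant, assuming a hypothetical `Submission` record that simply lists the file of each token (not JPlag's actual class). Since each submission's count is independent of all others, the structural change is mostly swapping `stream()` for `parallelStream()` on the outer loop, while the per-file counting inside one submission stays sequential.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

/**
 * Hypothetical sketch: counting tokens per file for all submissions with a parallel stream.
 * Submission and its fields are illustrative stand-ins, not JPlag's actual classes.
 */
final class ParallelCountSketch {

    /** A submission as a name plus the file name of each of its tokens (hypothetical model). */
    record Submission(String name, List<String> tokenFiles) {}

    /** Token count per file for one submission; touches no shared state. */
    static Map<String, Long> countTokensPerFile(Submission submission) {
        return submission.tokenFiles().stream()
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    /** Submissions are independent, so the outer loop can safely run in parallel. */
    static Map<String, Map<String, Long>> countAll(List<Submission> submissions) {
        return submissions.parallelStream()
                .collect(Collectors.toConcurrentMap(Submission::name, ParallelCountSketch::countTokensPerFile));
    }
}
```

`Collectors.toConcurrentMap` throws an `IllegalStateException` on duplicate keys, so this sketch assumes submission names are unique.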

We should discuss whether this is fine or whether it should be moved, taking into account how difficult it would be to optimize. Since submissions are not parsed in parallel, moving this into the submission class and iterating over the list of tokens returned from the Language:parse method would bring no performance improvement. So the only option I see is to do this directly in the parser of each language. In my opinion, though, this would lead to some big changes to the Language and Parser interfaces.
@tsaglam What is your opinion?

tsaglam · 2024-02-08T09:38:47Z

@Kr0nox

"Using a parallel stream over the submissions, we could reduce this to under a second. In my opinion, though, this would reduce readability. What is your opinion on that?"

I would say performance > readability here.

"We should discuss whether this is fine or whether it should be moved, taking into account how difficult it would be to optimize. Since submissions are not parsed in parallel, moving this into the submission class and iterating over the list of tokens returned from the Language:parse method would bring no performance improvement. So the only option I see is to do this directly in the parser of each language. In my opinion, though, this would lead to some big changes to the Language and Parser interfaces. What is your opinion?"

Doing it in each language is not something I am in favor of.
Another option would be to do it when creating the result object, e.g., either add a method to JPlagResult that returns the number of tokens per file, pre-calculated in the constructor and stored in a map, or add a method for that information to the submission class that calculates the info lazily when it is first called. Then, when writing the report, you can iterate over the submissions in parallel and call that method. This last solution is the one I would actually prefer.
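A minimal sketch of the preferred lazy option, assuming a hypothetical `LazySubmission` class (not JPlag's `Submission`) whose token list is reduced to the file name of each token. The counts are computed on the first call and cached; the getter is `synchronized` so that concurrent calls from a parallel report-writing loop cannot compute or publish the map twice.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: a submission that computes its per-file token counts lazily on first request
 * and caches them. Class and method names are illustrative, not JPlag's actual API.
 */
final class LazySubmission {

    private final String name;
    private final List<String> tokenFiles; // file name of each token, in token order
    private Map<String, Integer> tokenCountPerFile; // null until first requested
    private int calculations; // only used to show that the work happens once

    LazySubmission(String name, List<String> tokenFiles) {
        this.name = name;
        this.tokenFiles = List.copyOf(tokenFiles);
    }

    String getName() {
        return name;
    }

    /** Computes the counts on the first call and returns the cached, read-only map afterwards. */
    synchronized Map<String, Integer> getTokenCountPerFile() {
        if (tokenCountPerFile == null) {
            Map<String, Integer> counts = new HashMap<>();
            for (String file : tokenFiles) {
                counts.merge(file, 1, Integer::sum);
            }
            tokenCountPerFile = Collections.unmodifiableMap(counts);
            calculations++;
        }
        return tokenCountPerFile;
    }

    synchronized int getCalculations() {
        return calculations;
    }
}
```

When writing the report, something like `submissions.parallelStream().forEach(s -> write(s.getTokenCountPerFile()))` would then do the counting in parallel (`write` is a placeholder). Since each submission is usually requested by a single thread, the per-object lock should rarely be contended.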

tsaglam

Just one small thing.

core/src/main/java/de/jplag/Submission.java

sonarqubecloud · 2024-02-13T12:29:53Z

Quality Gate passed for 'JPlag Report Viewer'

Issues
0 New issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarCloud

sonarqubecloud · 2024-02-13T12:35:04Z

Quality Gate passed for 'JPlag Plagiarism Detector'

Issues
0 New issues

Measures
0 Security Hotspots
90.3% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarCloud

Kr0nox added 6 commits January 30, 2024 20:04

new submission index structure

20d269f

extract token information

7bef2f3

exclude non matchable tokens

e29ea26

display file percentage

953b0a2

fix spotless

ffd2f25

tooltip

7068e97

Kr0nox added the report-viewer label Jan 30, 2024

Kr0nox requested review from tsaglam, sebinside and TwoOfTwelve January 30, 2024 21:02

Kr0nox added 3 commits January 30, 2024 22:07

fix unit tests

bec418f

fix sonar cloud

75fa0a5

handle eof token

e3381f3

TwoOfTwelve reviewed Feb 2, 2024

core/src/main/java/de/jplag/reporting/reportobject/ReportObjectFactory.java (outdated)

use undefined for not given value

3bbc3e4

sebinside reviewed Feb 6, 2024

remove visual complexity

8fc0aeb

Kr0nox requested a review from sebinside February 6, 2024 12:07

Kr0nox added 2 commits February 6, 2024 16:59

exclude eof token

70a5ff1

total token count

2e7de5f

Merge remote-tracking branch 'origin/develop' into report-viewer/single-file-similarity

5946812

Kr0nox added 2 commits February 8, 2024 11:38

add calculation to submission and use parralel stream

c7f47d6

fix path of copied submissions

06e4bbe

tsaglam added the enhancement and minor labels Feb 13, 2024

Merge remote-tracking branch 'origin/develop' into report-viewer/single-file-similarity

133b6cf

tsaglam reviewed Feb 13, 2024

core/src/main/java/de/jplag/Submission.java (outdated)

save calculation

8c252fb

tsaglam approved these changes Feb 13, 2024

sebinside approved these changes Feb 13, 2024

sebinside merged commit b91395a into develop Feb 13, 2024

sebinside deleted the report-viewer/single-file-similarity branch February 13, 2024 15:24

Kr0nox mentioned this pull request Feb 13, 2024

remove duplicate info about token count #1555

Merged
