Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bugfix/performance issues #2624

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

Cespito
Copy link

@Cespito Cespito commented Feb 26, 2024

Pull Request Etiquette

  • I have checked the PRs for upcoming features/bug fixes.
  • I have read the [contributing guidelines][contributing].

Changes

  • Internal code
  • Library interface (affecting end-user code)
  • Documentation
  • Other: _____

Closes Issue: NaN

Description

This pull request makes the following improvements:

  • Optimize time complexity

Details
The following method:

public List<Member> getElementsWithRoles(@Nonnull Collection<Role> roles)
{
    Checks.noneNull(roles, "Roles");
    if (isEmpty())
        return Collections.emptyList();

    List<Role> rolesWithoutPublicRole = roles.stream().filter(role -> !role.isPublicRole()).collect(Collectors.toList());
    if (rolesWithoutPublicRole.isEmpty())
        return asList();

    List<Member> members = new ArrayList<>();
    forEach(member ->
    {
        if (member.getRoles().containsAll(rolesWithoutPublicRole))
            members.add(member);
    });
    return members;
}

has been replaced with:

public List<Member> getElementsWithRoles(@Nonnull Collection<Role> roles)
{
    Checks.noneNull(roles, "Roles");

    Set<Role> rolesWithoutPublicRoleSet = roles.stream()
            .filter(role -> !role.isPublicRole())
            .collect(Collectors.toSet());

    if (isEmpty() || rolesWithoutPublicRoleSet.isEmpty())
        return Collections.emptyList();

    List<Member> members = new ArrayList<>();
    forEach(member -> {
        if (new HashSet<>(member.getRoles()).containsAll(rolesWithoutPublicRoleSet))
            members.add(member);
    });
    return members;
}

The reason is:
The time complexity of this method call is O(n·m), where n is the number of elements in the list on which the method is called, and m is the number of elements in the collection passed to the method as a parameter.
I replaced List with Set for rolesWithoutPublicRoleSet. This makes membership control operations more efficient and filters out any duplicates.
I changed the early exit condition if the member list is empty or if there are no roles without public roles.
I changed the containsAll() method to using a HashSet for member.getRoles(). This makes the membership checking operation more time efficient, since the time required to create java.util.HashSet from java.util.List is O(n) and execute containsAll() on java.util.HashSet is O(m).
In essence we have a transition from a complexity O(n·m) to O(n+m).


The following instruction:

try (InputStream is = new FileInputStream(file))

has been replaced with:

try (InputStream is = Files.newInputStream(file.toPath()))

The reason is:
This method is part of the java.nio.file.Files package which offers more modern support for file-level I/O operations, and provides a more modern and flexible interface for working with files and paths, as well as offer better exception handling and broader support for advanced I/O operations.


The following instructions and similar (I provide one as an example):

for (int i = 0; i < ret.length; i ++)
    ret[i] = c[i+boxzerobytesLength];

has been replaced with:

System.arraycopy(c, boxzerobytesLength, ret, 0, ret.length);

The reason is:
System.arraycopy() is more compact and uses a built-in array copy operation provided by the System class. It is optimized for performance and is generally faster than implementing it manually with the for loop.

Copy link
Contributor

@freya022 freya022 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that changing C code ported to Java after years is not a great idea, all of this makes changes that need to be verified, for no real benefit, and time would be better spent looking at other pull requests.

@@ -92,17 +92,17 @@ public List<Member> getElementsWithRoles(@Nonnull Role... roles)
public List<Member> getElementsWithRoles(@Nonnull Collection<Role> roles)
{
Checks.noneNull(roles, "Roles");
if (isEmpty())
return Collections.emptyList();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you explain why this has been moved after rolesWithoutPublicRoleSet has been computed?

{
if (member.getRoles().containsAll(rolesWithoutPublicRole))
forEach(member -> {
if (new HashSet<>(member.getRoles()).containsAll(rolesWithoutPublicRoleSet))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you provide benchmarks to prove that this method is faster after these changes?

Usually members do not have a significant enough amount of roles for hash tables to provide performance benefits.

@@ -257,8 +256,7 @@ private byte[] generateNonce() {
///return new byte_buf_t(c, boxzerobytesLength, c.length-boxzerobytesLength);
byte [] ret = new byte[c.length-boxzerobytesLength];

for (int i = 0; i < ret.length; i ++)
ret[i] = c[i+boxzerobytesLength];
System.arraycopy(c, boxzerobytesLength, ret, 0, ret.length);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please provide evidence that the JVM does not already compile the previous code as if it was using this new code? i.e., prove that the machine code is different

Copy link

@JellyBrick JellyBrick Feb 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As highlighted above there is several pieces of evidence that demonstrate that the generated code is different. In any case, I have carried out several benchmarks that confirm this thesis with different versions of the JDK, the test is easily replicable, for example:
JMH version: 1.37
VM version: JDK 21.0.1, OpenJDK 64-Bit Server VM, 21.0.1+12-29
Warmup: 10 iterations
Measurement: 100 iterations
Threads: 1 thread, will synchronize iterations
Benchmark mode: Average time, time/op

Benchmark Mode Cnt Score Error Units
Benchmark.testArrayCopy16Bytes avgt 100 0,329 ± 0,002 ns/op
Benchmark.testHardCopy16Bytes avgt 100 0,446 ± 0,004 ns/op
Benchmark Mode Cnt Score Error Units
Benchmark.testArrayCopy32Bytes avgt 100 0,331 ± 0,002 ns/op
Benchmark.testHardCopy32Bytes avgt 100 8,367 ± 0,091 ns/op
Benchmark Mode Cnt Score Error Units
Benchmark.testArrayCopy64Bytes avgt 100 0,333 ± 0,002 ns/op
Benchmark.testHardCopy64Bytes avgt 100 12,835 ± 0,096 ns/op

The difference is all the more noticeable as the buffer size increases (KB, MB ecc..).
Also, this method is already used (#3342 System.arraycopy(b, 0, x, 0, len);), I don't see a valid reason to have different implementations of the same thing, among other things in the same class

@MinnDevelopment
Copy link
Member

Performance changes should always be accompanied by benchmarks to prove effectiveness. What actually motivates these changes? Is there a provable performance impact?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants