@lukasz-stec lukasz-stec commented Jun 17, 2022

Description

For nested types, createRandomBlockForNestedType chooses
the lengths of the nested values randomly, so there is a
chance that the chosen length is too small to produce
any nulls at the given nullRate.
To fix this, we cap the nullRate of the nested values, possibly to 0,
so that it matches the nested position count.

Is this change a fix, improvement, new feature, refactoring, or other?

fix flaky test

Is this a change to the core query engine, a connector, client library, or the SPI interfaces? (be specific)

core query engine tests

How would you describe this change to a non-technical end user or system administrator?

fix flaky test

Related issues, pull requests, and links

Documentation

( ) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.

Release notes

( ) No release notes entries required.
( ) Release notes entries required with the following suggested text:

# Section
* Fix some things. ({issue}`issuenumber`)

Fixes #12888

@cla-bot cla-bot bot added the cla-signed label Jun 17, 2022
@lukasz-stec lukasz-stec requested a review from ebyhr June 17, 2022 07:15
@findepi findepi requested review from raunaqmorarka and sopel39 June 20, 2022 08:59
@ebyhr ebyhr removed their request for review June 21, 2022 02:44
Member

Why is the variable named positionCount? For map and array, isn't offsets[positionCount] the total length of the data in the Block?

Member Author

> For map and array, isn't offsets[positionCount] the total length of the data in the Block?

well yes, if by total length you mean position count of the data/value block
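
To make the terminology in this exchange concrete, here is a minimal Java sketch of how an offsets array relates per-position lengths to the size of the nested values block. The names OffsetsSketch and toOffsets are hypothetical and are not taken from Trino; only the relationship that offsets[positionCount] equals the position count of the values block comes from the comments above.

```java
final class OffsetsSketch
{
    private OffsetsSketch() {}

    // Hypothetical helper: converts per-position entry lengths into cumulative offsets.
    // offsets[i] is where position i starts in the values block, and
    // offsets[lengths.length] is the total number of entries in the values block.
    static int[] toOffsets(int[] lengths)
    {
        int[] offsets = new int[lengths.length + 1];
        for (int i = 0; i < lengths.length; i++) {
            offsets[i + 1] = offsets[i] + lengths[i];
        }
        return offsets;
    }
}
```

For three positions with lengths 2, 0 and 3, positionCount is 3 while the values block holds offsets[3] = 5 entries, which is the distinction the entryCount / valuesLength naming is meant to capture.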

Member

Could you call it entryCount or valuesLength? At least that's the terminology used in AbstractArrayBlock and AbstractMapBlock to distinguish it from positionCount.
Also consider passing offsets and positionCount separately to this method if it's always meant to be used in the context of Map and Row.

Member Author

This method is not specific to AbstractMapBlock or AbstractArrayBlock. It returns the nullRate that will match exactly the number of null positions for the given position count.
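
One possible reading of that description is sketched below. This is an assumption about the behavior, not the code from this PR: the class name, the parameter order, and the choice to round the null count down are all hypothetical.

```java
final class NullRateCapSketch
{
    private NullRateCapSketch() {}

    // Hypothetical sketch: returns a null rate that corresponds to a whole number
    // of null positions for positionCount, never exceeding the requested rate.
    // The result may be 0 when positionCount is too small for even one null.
    static float capNullRate(int positionCount, float nullRate)
    {
        if (positionCount == 0 || nullRate == 0) {
            return 0;
        }
        // Assumption: the null count is rounded down
        int nullCount = (int) (positionCount * nullRate);
        return (float) nullCount / positionCount;
    }
}
```

Under these assumptions, a values block with a single position and a requested rate of 0.5 gets a capped rate of 0, so no nulls are requested from a block that cannot supply any.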

sopel39 commented Jun 22, 2022

Why can't we just fix chooseNullPositions to accept 0 nulls?

@lukasz-stec
Member Author

> Why can't we just fix chooseNullPositions to accept 0 nulls?

It accepts 0 nulls already. it fails if the null rate is not 0 but the result would be empty. This is to guard against unintentional setup where non-zero nullRate generates 0 nulls.
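
The guard described here could look roughly like the following. This is a hedged sketch only: the real chooseNullPositions in BlockAssertions is not shown in this thread, so the signature, the return type, and the way positions are picked are assumptions.

```java
import java.util.Random;

final class NullPositionsSketch
{
    private NullPositionsSketch() {}

    // Hypothetical sketch of the guard: a null rate of exactly 0 is accepted,
    // but a non-zero rate that would produce no nulls is rejected as a likely
    // mistake in the test setup.
    static boolean[] chooseNullPositions(int positionCount, float nullRate, Random random)
    {
        int nullCount = (int) (positionCount * nullRate);
        if (nullRate > 0 && nullCount == 0) {
            throw new IllegalArgumentException(
                    "nullRate " + nullRate + " yields no nulls for " + positionCount + " positions");
        }
        boolean[] isNull = new boolean[positionCount];
        int chosen = 0;
        // Pick distinct positions until the requested number of nulls is reached
        while (chosen < nullCount) {
            int position = random.nextInt(positionCount);
            if (!isNull[position]) {
                isNull[position] = true;
                chosen++;
            }
        }
        return isNull;
    }
}
```

With this shape, a randomly chosen nested length of 1 combined with a null rate of 0.5 would trip the check, which matches the flakiness described in this PR.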

sopel39 commented Jun 23, 2022

> It accepts 0 nulls already. it fails if the null rate is not 0 but the result would be empty. This is to guard against unintentional setup where non-zero nullRate generates 0 nulls.

With capNullRate it seems we are purposefully working around that check, so why keep it in the first place?

@lukasz-stec
Member Author

> > It accepts 0 nulls already. it fails if the null rate is not 0 but the result would be empty. This is to guard against unintentional setup where non-zero nullRate generates 0 nulls.
>
> With capNullRate it seems we are purposefully mitigating that check, hence why to keep it in the first place?

Yes, but only in the nested type case for the value block.
I'm also fine with dropping this check.

sopel39 commented Jun 23, 2022

> Yes, but only in the nested type case for the value block.
> I'm also fine with dropping this check.

Maybe a separate method without that check?

@lukasz-stec
Member Author

> > Yes, but only in the nested type case for the value block.
> > I'm also fine with dropping this check.
>
> Maybe a separate method without that check?

Do we need two? How would we use this new method, given that there is a recursive call in createRandomBlockForNestedType?

sopel39 commented Jun 24, 2022

> do we need two? how would use this new method given there is a recursive call in createRandomBlockForNestedType?

I think it's cleaner than adding a method with check and then creating another method that mitigates that check. You already have two methods

@lukasz-stec
Member Author

> > do we need two? how would use this new method given there is a recursive call in createRandomBlockForNestedType?
>
> I think it's cleaner than adding a method with check and then creating another method that mitigates that check. You already have two methods

there is a recursive call in createRandomBlockForNestedType

sopel39 commented Jun 27, 2022

> there is a recursive call in createRandomBlockForNestedType

Would it be possible to simply exclude test cases that generate 0 nulls?

@lukasz-stec lukasz-stec force-pushed the ls/024-flaky-test-pos-appender branch from 0c0df8b to de5f9d9 Compare June 27, 2022 12:01
Use fresh Random instance per method to make the test
deterministic
@lukasz-stec lukasz-stec force-pushed the ls/024-flaky-test-pos-appender branch from de5f9d9 to 36f4446 Compare June 28, 2022 06:44
@lukasz-stec
Member Author

> > there is a recursive call in createRandomBlockForNestedType
>
> Would it be possible to simply exclude test cases that generate 0 nulls?

after offline discussion I changed this fix to make the BlockAssertions deterministic

@sopel39 sopel39 merged commit 6b95dc7 into trinodb:master Jun 28, 2022
@github-actions github-actions bot added this to the 388 milestone Jun 28, 2022

  int[] ids = IntStream.range(0, positionCount)
-         .map(i -> RANDOM.nextInt(dictionary.getPositionCount()))
+         .map(i -> random().nextInt(dictionary.getPositionCount()))
Contributor

Um, this actually uses a fresh Random instance per iteration. Each instance has the same seed, so each iteration will produce the same result. So this is kind of more deterministic than you wanted, I think :)
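
The effect described in this comment can be reproduced with plain java.util.Random. The sketch below is illustrative only: the SEED value and the class and method names are hypothetical, and it does not reproduce the actual BlockAssertions code. It contrasts creating a seeded instance inside the lambda with creating one instance per method call.

```java
import java.util.Random;
import java.util.stream.IntStream;

final class SeededRandomSketch
{
    // Arbitrary seed chosen for illustration (assumption, not from the PR)
    private static final long SEED = 42L;

    private SeededRandomSketch() {}

    // Problematic pattern: a new Random with the same seed is created for every
    // element, so every element is the first value of the same sequence.
    static int[] freshInstancePerIteration(int count, int bound)
    {
        return IntStream.range(0, count)
                .map(i -> new Random(SEED).nextInt(bound))
                .toArray();
    }

    // Likely intended pattern: one seeded instance per method call, advanced once
    // per element. Repeated calls return the same array, but the elements within
    // one array follow the generator's sequence instead of repeating one value.
    static int[] oneInstancePerCall(int count, int bound)
    {
        Random random = new Random(SEED);
        return IntStream.range(0, count)
                .map(i -> random.nextInt(bound))
                .toArray();
    }
}
```

Both variants are deterministic across runs; the difference is that the first one fills the whole array with a single repeated value, which is the "more deterministic than you wanted" outcome.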

Member Author

🤦 #13029 cc @sopel39

Development

Successfully merging this pull request may close these issues.

Flaky TestPositionsAppender.testConsecutiveBuilds