
Conversation

@davehagman
Contributor

@davehagman davehagman commented Oct 19, 2021

What is the purpose of the pull request

In order to support multiple writers against a single table, we need to eliminate (or at least greatly reduce) the chance that multiple writers running in parallel will create instants using the same instant ID, which is currently based on the current time at second granularity.

The current strategy has a shockingly high rate of instant time collisions even with just two writers. In order to make collisions far less likely this PR adds millisecond granularity to the instant timestamp. It is worth noting that this does not guarantee a collision-free solution so we also include steps to reconcile state in case it does happen.

Brief change log


  • Modified HoodieActiveTimeline.COMMIT_FORMATTER to add milliseconds
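For illustration, here is a minimal sketch of the collision this change addresses, assuming the old pattern is `yyyyMMddHHmmss` and the new one is `yyyyMMddHHmmssSSS` (the real constant is HoodieActiveTimeline.COMMIT_FORMATTER; the formatter objects below are stand-ins):

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class InstantCollisionDemo {
    // Stand-ins for Hudi's commit-time formatters (assumed patterns).
    static final DateTimeFormatter SECOND_FORMAT = DateTimeFormatter.ofPattern("yyyyMMddHHmmss");
    static final DateTimeFormatter MILLIS_FORMAT = DateTimeFormatter.ofPattern("yyyyMMddHHmmssSSS");

    public static void main(String[] args) {
        // Two writers minting instants 250 ms apart, inside the same wall-clock second.
        LocalDateTime writerA = LocalDateTime.of(2021, 10, 19, 12, 0, 1, 100_000_000);
        LocalDateTime writerB = writerA.plusNanos(250_000_000L);

        // Second granularity: both writers produce the same instant ID -> collision.
        System.out.println(SECOND_FORMAT.format(writerA)); // 20211019120001
        System.out.println(SECOND_FORMAT.format(writerB)); // 20211019120001

        // Millisecond granularity: the IDs now differ.
        System.out.println(MILLIS_FORMAT.format(writerA)); // 20211019120001100
        System.out.println(MILLIS_FORMAT.format(writerB)); // 20211019120001350
    }
}
```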

Verify this pull request

  • Ran and verified existing unit tests around timeline operations
  • Added unit tests to verify that previous expectations around instant date math still hold true
  • Manually tested multiple writers which created hundreds of collision-free timeline entries

Committer checklist

  • Has a corresponding JIRA in PR title & commit

  • Commit message is descriptive of the change

  • CI is green

  • Necessary doc changes done or have another open PR

@davehagman
Contributor Author

Still working to test this. We are going to have issues parsing previous commits that do not have the milliseconds on them, so I'm coming up with a strategy for that.

@davehagman
Contributor Author

@vinothchandar Thanks for the heads up. My last commit abstracts the serde operations into methods on the timeline so that we can properly handle the varying instant formats. I will poke around the compaction code to see if I can spot the special case.
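As a rough sketch of what format-aware parsing could look like (a hypothetical helper for illustration only; the actual serde methods live on HoodieActiveTimeline), dispatching on the instant string's length:

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

public class InstantSerde {
    // Hypothetical helper: pick the pattern by instant length (14 = old seconds
    // format, 17 = new milliseconds format), mirroring the serde abstraction.
    public static Date parseInstant(String instantTime) {
        String pattern;
        switch (instantTime.length()) {
            case 14: pattern = "yyyyMMddHHmmss"; break;    // pre-patch instants
            case 17: pattern = "yyyyMMddHHmmssSSS"; break; // post-patch instants
            default: throw new IllegalArgumentException("Unexpected instant length: " + instantTime);
        }
        try {
            return new SimpleDateFormat(pattern).parse(instantTime);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unparseable instant: " + instantTime, e);
        }
    }

    public static void main(String[] args) {
        // The same wall-clock second parses to the same epoch whether or not
        // the instant carries a trailing "000".
        System.out.println(parseInstant("20210101120101").equals(parseInstant("20210101120101000")));
    }
}
```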

@nsivabalan
Contributor

Here are the places that need fixes:

  1. Compaction trigger based on time elapsed in secs.
  • No changes required in the metadata table; all we do is string comparison. The bootstrap commit time is "0000000000000" (13 chars).
  • These are the instants used in the bootstrap code: "00000000000000", "00000000000001" and "00000000000002". All of them are 14 chars. Again, I don't think any changes are required here; string comparison should work out even when some strings have more characters.

@davehagman
Contributor Author

@nsivabalan @vinothchandar That should be fine, as we now use the methods on the active timeline to parse the instant, which handle both second and ms granularity here.

@davehagman
Contributor Author

Added a test to validate that the date math being done by compaction still works.

@nsivabalan
Contributor

Also, checked HoodieHeartBeatClient to see if we have any interplay there.

Looks like this is what we are doing, and it seems we may not need any changes; just wanted to bring it to others' notice.

Every commit will create a file under the heartbeat folder; the file name is the same as the instant time. At regular intervals we keep overwriting it, and we rely on the lastModified time of the file to check for expiration at a regular cadence.
@davehagman: can you confirm my understanding — old commit times will still be interpreted as-is, and we don't ever upgrade them to the new format, right? Just that the newer ones we generate are in the new format.
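If that reading is right, the expiry check is pure arithmetic on the file's lastModified timestamp and never parses the instant-time file name, which is why the format change is neutral here. A sketch under that assumption (names and values are illustrative, not Hudi's actual API):

```java
public class HeartbeatExpiryCheck {
    // Illustrative sketch: the heartbeat file is named after the instant time,
    // but expiry only compares lastModified against a max interval; the file
    // name's format (seconds or milliseconds) never enters the check.
    static boolean isExpired(long lastModifiedEpochMs, long nowEpochMs, long maxHeartbeatIntervalMs) {
        return nowEpochMs - lastModifiedEpochMs > maxHeartbeatIntervalMs;
    }

    public static void main(String[] args) {
        long lastBeat = 1_000L;
        System.out.println(isExpired(lastBeat, 1_500L, 1_000L)); // false: still fresh
        System.out.println(isExpired(lastBeat, 3_000L, 1_000L)); // true: heartbeat expired
    }
}
```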

@davehagman
Contributor Author

we don't ever upgrade them to the new format, right? Just that the newer ones we generate are in the new format.

One potential issue I see: if you have an old instant without millis, such as 20210101120101, and you parse that to a date (by calling parseDateFromInstantTime) and back to an instant ID, you will now have 20210101120101000 (millis attached). If for some reason we try to look up a specific instant based purely on a Date instance, it will not be found (since the actual persisted instant does not have the 000 appended).

I am trying to find examples of that happening in the codebase but if you know of something then I can focus on that.
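The round-trip mismatch described above can be reproduced standalone (patterns assumed; parseDateFromInstantTime is the real method, and roundTrip below is a stand-in):

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

public class RoundTripMismatch {
    // Stand-in for parse-then-reformat: read an old second-granularity instant
    // into a Date, then render it with the new millisecond pattern.
    static String roundTrip(String oldInstant) {
        try {
            Date parsed = new SimpleDateFormat("yyyyMMddHHmmss").parse(oldInstant);
            return new SimpleDateFormat("yyyyMMddHHmmssSSS").format(parsed);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unparseable instant: " + oldInstant, e);
        }
    }

    public static void main(String[] args) {
        String persisted = "20210101120101";       // what is actually on disk
        String regenerated = roundTrip(persisted); // gains a trailing "000"
        System.out.println(regenerated);                   // 20210101120101000
        System.out.println(regenerated.equals(persisted)); // false: a lookup keyed on this misses
    }
}
```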

@davehagman
Contributor Author

The HoodieTimeline.compareTimestamps method (example usage here), which is used to compare instants, should continue to work fine regardless of second or ms granularity, because we are just doing simple string comparisons.

All that to say that old commit times will continue to evaluate against new instant timestamps just fine @nsivabalan
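That lexicographic behaviour can be checked in isolation: plain String comparison orders mixed-granularity instants correctly, because an old 14-char instant is a prefix of (and sorts before) any 17-char instant in the same second.

```java
public class TimestampCompare {
    // Mirrors the essence of HoodieTimeline.compareTimestamps: instants are
    // ordered by plain lexicographic string comparison.
    static boolean lessThan(String a, String b) {
        return a.compareTo(b) < 0;
    }

    public static void main(String[] args) {
        // Old seconds-granularity instant vs. a later millisecond one.
        System.out.println(lessThan("20210101120101", "20210101120102123")); // true
        // Same second: the ms-suffixed instant sorts after the plain one.
        System.out.println(lessThan("20210101120101", "20210101120101500")); // true
        // The bootstrap instant still sorts before any real commit time.
        System.out.println(lessThan("00000000000000", "20210101120101500")); // true
    }
}
```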

@davehagman
Contributor Author

davehagman commented Oct 25, 2021

@nsivabalan These tests are failing even on the master branch for me. Are you able to reproduce this locally? I'm unsure how my changes could have affected these tests, but I'll continue to look into it.

@nsivabalan
Contributor

Yeah, master is broken ATM. I have put up a patch to fix that; once CI succeeds, I will merge the fix in.

@nsivabalan
Contributor

@hudi-bot azure run

@nsivabalan
Contributor

@hudi-bot azure run

@nsivabalan nsivabalan changed the title [HUDI-1292] Millisecond granularity for instant timestamps [HUDI-2559] Millisecond granularity for instant timestamps Nov 3, 2021
@hudi-bot
Collaborator

hudi-bot commented Nov 5, 2021

CI report:

Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure — re-run the last Azure build

@vinothchandar
Member

we try to lookup a specific instant based purely on a Date instance then it will not be found (since the actual persisted instant does not have the 000 appended).

This is the main part. Also, all these — solo time, bootstrap instant time, and the special instants we have in code already — have higher granularity.


@vinothchandar vinothchandar left a comment


Would love having a functional test here with both second and millisecond granularity mixed into the timeline. I'll check a few other things and this should be good to go.

val instantLength = queryInstant.length
if (instantLength == 19 || instantLength == 23) { // for yyyy-MM-dd HH:mm:ss[:SSS]
HoodieActiveTimeline.getInstantForDateString(queryInstant)
} else if (instantLength == 14 || instantLength == 17) { // for yyyyMMddHHmmss[SSS]

nit: should we use the 17 instant length from the constant we have up above?

@nsivabalan
Contributor

@vinothchandar: would writing a UT in TestHoodieActiveTimeline suffice, just to compare old and new timestamp formats? Or are you looking for tests at the write-client layer?

@vinothchandar
Member

I am looking for an FT, to ensure an existing table can transition correctly to the new format. Yeah, maybe writing this at the timeline level and ensuring all API guarantees are met is a good addition.

Regardless, we should perform some upgrade testing out of band.

@vinothchandar
Member

I was trying to understand how this plays with the bootstrap instant times here.

// Instant corresponding to pristine state of the table after its creation

00000000000000 is still YYYYMMDDHHMMSS, so we're good.


@vinothchandar vinothchandar left a comment


LGTM, thanks @davehagman for this! @nsivabalan, can we get more tests in, test it out using yamls as well, and ensure things like time-based compaction work (maybe another UT)? Then we can land (rebase and CI).

@nsivabalan
Contributor

Summary of tests I did with this patch. Looks good.

  1. Ran the test suite yaml for 10 iterations (~50 commits) for both table types with inserts, updates and deletes.
  2. Upgrade testing:
    Created a table with 0.9.0. After clustering was scheduled, killed the deltastreamer, so all commits and the replace commit so far are at seconds granularity.
    Used this patch and restarted the deltastreamer; new commits were at ms granularity. As expected, the inflight clustering was rolled back and re-attempted at the same seconds granularity, and succeeded.
    After a few commits, new clustering was seen with ms granularity.

Attaching screen shots from my local testing.

(Three screenshots attached: Screen Shot 2021-11-12 at 11.05.04 PM, 11.13.50 PM, and 11.15.02 PM.)

@xushiyan
Member

closing in favor of #4024

@xushiyan xushiyan closed this Nov 22, 2021
@haggy haggy deleted the ms_granularity_instants branch June 24, 2022 12:39

Labels

priority:blocker Production down; release blocker


5 participants