
[Data Buckets] Distributed Databucket Caching #3500

Merged
merged 24 commits into from
Jul 24, 2023

Conversation

Kinglykrab
Contributor

@Kinglykrab Kinglykrab commented Jul 16, 2023

Goal(s)

As a server operator, I want keys cached in memory for ready access instead of hitting the database. This would let me use data buckets more aggressively in cases where I want to reference values frequently without incurring a database I/O penalty.

Problem(s)

Round Trip Database Hits

If you use a single data bucket entry intensively across a series of quests and poll it frequently, the result is repeated database hits that add up over time.

Frequent Quest Utilization

If using data buckets while iterating through an entity list, you can easily rack up 30-50 queries just to check the existence of a single key that is not already cached. Each check is a round trip to the database, and on a game server we want to avoid that where we can.

Solution(s)

Distributed Memory Cache

If we turn the server back-end into a distributed memory cache, we can keep a reference to the flags that are most important and most relevant to an entity's progress and have them available at minimal cost.

This would consist of all zones being part of the distributed cache, with world acting as the intermediary for updates.

Challenge(s)

Race Conditions

Keeping data consistent across all zone processes can become a race-condition problem.

When updates are made in one zone, they need to be propagated to the other zones. Cross-zone packet storms can occur, especially when the same flag is being manipulated in multiple zones at the same time (albeit rarer).

De-Duplication

You need a way to keep updates unique and to discard (de-dupe) any that are not the latest update.

The update message needs to include not only what is being updated (key, value, expires, scopes), but also a timestamp down to the nanosecond, so that the receiving processes can use the same reference of time to decide whether to discard the message or update their internal reference to the cached entry.

When a zone process receives a request to update its cache, it compares the message's timestamp against what it has locally. If the new message is older than the locally cached timestamp, it discards the message.

Updates

Update messages are generated from within the existing data-bucket Set methods; these still perform a database write while preparing a message/packet to be sent to the other zone processes.

The update would include a struct that would load the necessary update and communicate it to the other zone processes.

struct DataBucketCacheEntry {
	DataBucketsRepository::DataBuckets e;
	int64_t                            updated_time{};
	DataBucketCacheUpdateAction        update_action{};

	template<class Archive>
	void serialize(Archive &ar)
	{
		ar(
			CEREAL_NVP(e),
			CEREAL_NVP(updated_time),
			CEREAL_NVP(update_action)
		);
	}
};

updated_time will be a Unix timestamp at nanosecond resolution, which should be enough to disambiguate duplicate update messages relative to the system clock.

    std::cout << "milliseconds " << std::chrono::duration_cast<std::chrono::milliseconds>(std::chrono::system_clock::now().time_since_epoch()).count() << std::endl;
    std::cout << "microseconds " << std::chrono::duration_cast<std::chrono::microseconds>(std::chrono::system_clock::now().time_since_epoch()).count() << std::endl;
    std::cout << "nanoseconds " << std::chrono::duration_cast<std::chrono::nanoseconds>(std::chrono::system_clock::now().time_since_epoch()).count() << std::endl;
milliseconds 1689483912387
microseconds 1689483912387701
nanoseconds 1689483912387702810

When Updates are Sent

After an entry is committed to the database and a database record id is set, the id can be used to pack a DataBucketCacheEntry struct and propagate the message.

When Updates are Processed

When a message is received from zone <-> world <-> zone, the decision logic is as follows:

  • Look through the internal cache of std::vector<DataBucketCacheEntry> for an entry whose id matches the one sent in the update
  • If no id matches, simply discard the message
  • If there is a match, determine whether the new message's updated_time > the local updated_time; if it is, update the local record using vector .at(index) = (new) DataBucketCacheEntry

Deletions

Deletions would need to be handled in the same way as updates, by deleting entries located locally within a zone.

Loading Scoped Entries

Scoped entries would be loaded into the zone cache when an entity enters that zone. Any bucket entries or flags checked on zone-in can then simply reference what is in memory instead of making repeated hits to the database.

Other zone processes don't need to be aware of that entity's flags, because it is unlikely they will need to be referenced cross-zone; if they do, a database call may be necessary, but that should be an infrequent event.

Unloading Scoped Entries

Scoped entries would be unloaded when an entity is destructed, i.e. when a client leaves a zone.

Testing

Putting all of my tests here

Bulk NPC Loading

image

Bucket Miss Cache

image

Benchmark

Before we implemented caching

image

After

image

35x at 100,000 iterations
340x at 1,000,000 iterations

General Lifecycle Testing

image

sub EVENT_SAY {
    if ($text=~/set/i) {
        quest::set_data("test", 100);   
    }
    if ($text=~/get/i) {
        $client->Message(15, quest::get_data("test"));
    }
    if ($text=~/delete/i) {
        quest::delete_data("test");
    }
}

Distributed Cache Testing

The distributed cache stays in sync with what is maintained in the database. The database is always the source of truth and database writes are always done immediately.

Test Cases

  • Used 5 zones: 3 with no players and 2 with players. The 3 zones without players also receive updates and stay fully in sync with their caches. New zones that boot with no cache will receive updates if there are any and maintain their own pool.
  • Tested accessing a global bucket using get test commands simultaneously, both showing expected results when a key is deleted, updated, or set
  • Tested accessing a global bucket using get test commands simultaneously while using set on alternating characters with no expiration
  • Tested accessing a global bucket using get test commands while using set on alternating characters with a 3-second expiration, ensuring the expiration is renewed in destination zones. Also alternated this between zones; the deletion/creation/update life cycle works as expected in all respects, with both clients showing expected results when a key is deleted, updated, or set
  • Tested new value updates, ensuring the value is propagated properly

Creation (Created in zone, and then propagated)

image

Update (Zone to Zone)

image

Bucket Deletion (with propagation)

image

Reading of an Expired Key (Deletion propagated)

image

# Notes
- Adds a data bucket cache so we're not needlessly hitting the database every time we need to read a data bucket value.
@Kinglykrab Kinglykrab marked this pull request as draft July 16, 2023 19:53

@Akkadius Akkadius force-pushed the data_buckets/data_buckets_zone_cache branch from 2d3ea0b to 1709398 Compare July 19, 2023 02:03
@Akkadius Akkadius changed the title [Data Buckets] Zone-Based Data Bucket Caching [Data Buckets] Distributed Databucket Caching Jul 20, 2023
@Akkadius
Member

Been pairing with Kingly on this PR


@Akkadius
Member

Tested using the same key across scoped and global and found an interesting condition that was fixed in 87c6a77

Test script

sub EVENT_SAY {
    if ($text=~/set/i) {
        $client->SetBucket("Test", 107, "10s");
    } elsif ($text=~/get/i) {
        my $data = $client->GetBucket("Test");
        quest::message(315, $data ne "" ? "player: " . $data : 0);
    } elsif ($text=~/delete/i) {
        $client->DeleteBucket("Test");
    }  
    if ($text=~/set/i) {
        quest::set_data("Test", 3, "20s");
    } elsif ($text=~/get/i) {
        my $data = quest::get_data("Test");
        quest::message(315, $data ne "" ? "global: " . $data : 0);
    } elsif ($text=~/delete/i) {
        quest::delete_data("Test");
    }
}

@Akkadius
Member

Tested spamming deletes and creates back to back

@Akkadius Akkadius marked this pull request as ready for review July 22, 2023 04:08
@Akkadius
Member

Been running for several days on Wayfarers

@Akkadius Akkadius merged commit a75648f into master Jul 24, 2023
@Akkadius Akkadius deleted the data_buckets/data_buckets_zone_cache branch July 24, 2023 17:22
@Akkadius Akkadius mentioned this pull request Jul 28, 2023
joligario added a commit to ProjectEQ/peqphpeditor that referenced this pull request Aug 13, 2023
fryguy503 pushed a commit to wayfarershaven/phpeditor that referenced this pull request Oct 26, 2023