
[Data Buckets] Distributed Databucket Caching #3500

Merged
merged 24 commits into from
Jul 24, 2023

Conversation

Kinglykrab
Contributor

@Kinglykrab Kinglykrab commented Jul 16, 2023

Goal(s)

As a server operator, I want keys cached in memory for ready access instead of hitting the database. This would let me use data buckets more aggressively in cases where I want to reference values frequently without incurring a database I/O penalty.

Problem(s)

Round Trip Database Hits

If you use a single data bucket entry intensively across a series of quests and poll it frequently, the result is repeated database hits that add up over time.

Frequent Quest Utilization

If using data buckets while iterating through an entity list, you can easily rack up 30-50 queries just to check the existence of a single key that is not already cached. Each check is a round trip to the database, and on a game server we want to avoid that where we can.

Solution(s)

Distributed Memory Cache

If we turn the server back-end into a distributed memory cache, we can keep a reference to the flags that are most important and most relevant to an entity's progress and have them available at minimal cost.

This would consist of all zones being part of the distributed cache, with world acting as the intermediary for updates.

Challenge(s)

Race Conditions

Keeping data consistent across all zone processes can become a race-condition problem.

When updates are made in one zone, they need to be propagated to the other zones. Cross-zone packet storms can occur, especially when the same flag is being manipulated in multiple zones at the same time (albeit rarer).

De-Duplication

You need a way to keep updates unique and to discard (de-dupe) any that are not the latest update.

The update message needs to include not only what is being updated (key, value, expires, scopes), but also a timestamp down to the nanosecond, so that the receiving processes can use the same reference of time to decide whether to discard the message or update their internal reference to the cached entry.

When a zone process receives a request to update its cache, it compares the message's timestamp against what it has locally. If the new message is older than the locally cached timestamp, it discards the message.

Updates

Update messages are generated from within the existing data-bucket Set methods; these still perform a database write while preparing a message/packet to be sent to the other zone processes.

The update would include a struct that would load the necessary update and communicate it to the other zone processes.

struct DataBucketCacheEntry {
	DataBucketsRepository::DataBuckets e;
	int64_t                            updated_time{};
	DataBucketCacheUpdateAction        update_action{};

	template<class Archive>
	void serialize(Archive &ar)
	{
		ar(
			CEREAL_NVP(e),
			CEREAL_NVP(updated_time),
			CEREAL_NVP(update_action)
		);
	}
};

updated_time will be a Unix timestamp at nanosecond resolution, which should be enough to disambiguate duplicate update messages relative to the system clock.

    std::cout << "milliseconds " << std::chrono::duration_cast<std::chrono::milliseconds>(std::chrono::system_clock::now().time_since_epoch()).count() << std::endl;
    std::cout << "microseconds " << std::chrono::duration_cast<std::chrono::microseconds>(std::chrono::system_clock::now().time_since_epoch()).count() << std::endl;
    std::cout << "nanoseconds " << std::chrono::duration_cast<std::chrono::nanoseconds>(std::chrono::system_clock::now().time_since_epoch()).count() << std::endl;
milliseconds 1689483912387
microseconds 1689483912387701
nanoseconds 1689483912387702810

When Updates are Sent

After an entry is committed to the database and a database record id is set, the id can be used to pack a DataBucketCacheEntry struct and propagate the message.

When Updates are Processed

When a message is received from zone <-> world <-> zone, the decision logic is as follows:

  • Look through the internal cache of std::vector<DataBucketCacheEntry> for an entry whose id matches the one sent in the update
  • If no id matches, simply discard the message
  • If there is a match, determine whether the new message's updated_time > the local updated_time; if it is, update the local record using vector .at(index) = (new) DataBucketCacheEntry

Deletions

Deletions would need to be handled in the same way as updates, by deleting entries located locally within a zone.

Loading Scoped Entries

Scoped entries would be loaded into the zone cache when an entity enters that zone. Any bucket entries or flags checked on zone-in can then simply reference what is in memory instead of making repeated hits to the database.

Other zone processes don't need to be aware of that entity's flags, because it is unlikely they will need to be referenced cross-zone; if they do, a database call may be necessary, but that should be an infrequent event.

Unloading Scoped Entries

Scoped entries would be unloaded when an entity is destructed, i.e. when a client leaves a zone.

Testing

Putting all of my tests here

Bulk NPC Loading

image

Bucket Miss Cache

image

Benchmark

Before we implemented caching

image

After

image

35x at 100,000 iterations
340x at 1,000,000 iterations

General Lifecycle Testing

image

sub EVENT_SAY {
    if ($text=~/set/i) {
        quest::set_data("test", 100);   
    }
    if ($text=~/get/i) {
        $client->Message(15, quest::get_data("test"));
    }
    if ($text=~/delete/i) {
        quest::delete_data("test");
    }
}

Distributed Cache Testing

The distributed cache stays in sync with what is maintained in the database. The database is always the source of truth and database writes are always done immediately.

Test Cases

  • Used 5 zones: 3 with no players and 2 with players. The 3 zones without players also receive updates and stay fully in sync with their caches. New zones that boot with no cache will receive updates if there are any and maintain their own pool.
  • Tested accessing a global bucket using get test commands simultaneously, both showing expected results when a key is deleted, updated, or set
  • Tested accessing a global bucket using get test commands simultaneously while using set on alternating characters with no expiration
  • Tested accessing a global bucket using get test commands while using set on alternating characters with a 3-second expiration, ensuring the expiration is renewed in destination zones. Also alternated this between zones; the deletion/creation/update life cycle works as expected in all respects, with both clients showing expected results when a key is deleted, updated, or set
  • Tested new value updates, ensuring the value is propagated properly

Creation (Created in zone, and then propagated)

image

Update (Zone to Zone)

image

Bucket Deletion (with propagation)

image

Reading of an Expired Key (Deletion propagated)

image

# Notes
- Adds a data bucket cache so we're not needlessly hitting the database every time we need to read a data bucket value.
@Kinglykrab Kinglykrab marked this pull request as draft July 16, 2023 19:53

@Akkadius Akkadius force-pushed the data_buckets/data_buckets_zone_cache branch from 2d3ea0b to 1709398 Compare July 19, 2023 02:03
@Akkadius Akkadius changed the title [Data Buckets] Zone-Based Data Bucket Caching [Data Buckets] Distributed Databucket Caching Jul 20, 2023
@Akkadius
Member

Been pairing with Kingly on this PR


@Akkadius
Member

Tested using the same key across scoped and global and found an interesting condition that was fixed in 87c6a77

Test script

sub EVENT_SAY {
    if ($text=~/set/i) {
        $client->SetBucket("Test", 107, "10s");
    } elsif ($text=~/get/i) {
        my $data = $client->GetBucket("Test");
        quest::message(315, $data ne "" ? "player: " . $data : 0);
    } elsif ($text=~/delete/i) {
        $client->DeleteBucket("Test");
    }  
    if ($text=~/set/i) {
        quest::set_data("Test", 3, "20s");
    } elsif ($text=~/get/i) {
        my $data = quest::get_data("Test");
        quest::message(315, $data ne "" ? "global: " . $data : 0);
    } elsif ($text=~/delete/i) {
        quest::delete_data("Test");
    }
}

@Akkadius
Member

Tested spamming deletes and creates back to back

@Akkadius Akkadius marked this pull request as ready for review July 22, 2023 04:08
@Akkadius
Member

Been running for several days on Wayfarers

@Akkadius Akkadius merged commit a75648f into master Jul 24, 2023
@Akkadius Akkadius deleted the data_buckets/data_buckets_zone_cache branch July 24, 2023 17:22
@Akkadius Akkadius mentioned this pull request Jul 28, 2023
joligario added a commit to ProjectEQ/peqphpeditor that referenced this pull request Aug 13, 2023
fryguy503 pushed a commit to wayfarershaven/phpeditor that referenced this pull request Oct 26, 2023