Add a Chronon service module to add support for serving features #873
base: main
Conversation
Hey @piyush-zlai - could you comment on why Vert.x vs alternatives?
@caiocamatta-stripe - called out in the PR description but I'll expand more:
@piyush-zlai thanks for the additional commentary on why Vert.x.
Overall this looks good to me! This being a fairly large (albeit safe) change that adds new dependencies, you may want to get reviews from at least a couple of other people.
One final thing that still wasn't clear to me - how does the service get the schemas for GroupBys it's serving? Or is that purely the Fetcher's responsibility?
"ch.qos.logback" % "logback-classic" % "1.2.11", | ||
"org.slf4j" % "slf4j-api" % "1.7.36", |
For my own learning/curiosity, any comment on why logback instead of log4j2?
I think I might have hit some dependency issues on log4j2. I could give that a shot again if you'd prefer log4j2 - it seems to be the newer framework (though a few security issues have been found in it from time to time). The way we've currently configured logging, though, is to write asynchronously, so perf-wise we should see pretty low logging overhead.
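For reference, a minimal sketch of what an async logback setup like the one described might look like (a hypothetical logback.xml, not necessarily the exact config in this PR):

```xml
<!-- Hypothetical logback.xml sketch: the AsyncAppender buffers log events
     in a queue and writes them on a background thread, off the request path.
     neverBlock drops events rather than stalling callers when the queue fills. -->
<configuration>
  <appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
    <encoder>
      <pattern>%d{HH:mm:ss.SSS} [%thread] %-5level %logger{36} - %msg%n</pattern>
    </encoder>
  </appender>
  <appender name="ASYNC" class="ch.qos.logback.classic.AsyncAppender">
    <appender-ref ref="STDOUT"/>
    <queueSize>8192</queueSize>
    <neverBlock>true</neverBlock>
  </appender>
  <root level="INFO">
    <appender-ref ref="ASYNC"/>
  </root>
</configuration>
```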
@@ -0,0 +1,76 @@
# Chronon Feature Service
Should we explicitly document the endpoints available here?
I thought about this, but I expect the docs would get outdated soon. A better approach might be something like wiring up OpenAPI / Swagger annotations. Could tackle this in a follow-up.
/**
 * Responsible for loading the relevant concrete Chronon Api implementation and providing that
 * for use in the Web service code. We follow similar semantics as the Driver to configure this:
 * online.jar - Jar that contains the implementation of the Api
 * online.class - Name of the Api class
 * online.api.props - Structure that contains fields that are loaded and passed to the Api implementation
 * during instantiation to configure it (e.g. connection params)
 */
I really appreciate the comments before the classes!
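As an aside, a hedged sketch of the reflective-loading pattern the class comment above describes; the class name, constructor shape, and helper here are illustrative assumptions, not the PR's actual code:

```java
import java.io.File;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.Map;

// Hypothetical sketch: load online.jar into a classloader, reflectively
// instantiate online.class, and hand it online.api.props for configuration.
public class ApiLoaderSketch {
  public static Object loadApi(String onlineJar, String onlineClass,
                               Map<String, String> props) throws Exception {
    URL jarUrl = new File(onlineJar).toURI().toURL();
    URLClassLoader loader =
        new URLClassLoader(new URL[] { jarUrl }, ApiLoaderSketch.class.getClassLoader());
    Class<?> apiClass = Class.forName(onlineClass, true, loader);
    // assume here that the Api implementation exposes a Map-taking constructor
    return apiClass.getConstructor(Map.class).newInstance(props);
  }
}
```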
@@ -0,0 +1,76 @@
# Chronon Feature Service
Do you think it's worth documenting the existence of this service on the Chronon website?
Yeah def worth it. I was thinking of letting it bake / clearing out some nits etc and putting up a doc update PR in a follow up.
Plus one to this.
}

@Override
public void handle(RoutingContext ctx) {
Also just for my own knowledge, will Vert.x use one thread per request it receives? How does it handle concurrent requests?
Vert.x's model is event driven. Triggers like incoming HTTP requests are queued as events. A pool of event-loop threads dequeues these events and dispatches them to handlers. Blocking code in a handler will jam up one of your n event threads (n is configurable and defaults to the number of CPU cores), so typically you kick off work like DB lookups on a different thread in your handler to avoid blocking the precious event threads.
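A minimal sketch of that offloading pattern, assuming Vert.x 4.x; the class and the fetchFeaturesBlocking helper are hypothetical names for illustration:

```java
import io.vertx.core.Vertx;
import io.vertx.ext.web.RoutingContext;

// Sketch: the handler runs on an event-loop thread, so the (potentially
// blocking) feature lookup is pushed onto a worker thread via executeBlocking,
// keeping the event loop free to dispatch other requests.
public class BlockingOffloadSketch {
  private final Vertx vertx;

  public BlockingOffloadSketch(Vertx vertx) {
    this.vertx = vertx;
  }

  public void handle(RoutingContext ctx) {
    String name = ctx.pathParam("name");
    vertx.<String>executeBlocking(promise -> {
      // worker thread: safe to block here (e.g. a KV store read)
      promise.complete(fetchFeaturesBlocking(name));
    }).onSuccess(body ->
        ctx.response().putHeader("content-type", "application/json").end(body))
      .onFailure(err ->
        ctx.response().setStatusCode(500).end(err.getMessage()));
  }

  private String fetchFeaturesBlocking(String name) {
    return "{}"; // placeholder for a real feature lookup
  }
}
```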
maybeFeatureResponses.onSuccess(resultList -> {
    // as this is a bulkGet request, we might have some successful and some failed responses
    // we return the responses in the same order as they come in and mark them as successful / failed based
Is order the only way of knowing what response maps to what request in the bulkGet call? (Still figuring out what the requests and responses look like.)
Yeah the order matches exactly. Similar pattern to other bulkGet APIs out there (e.g. the FB Graph API bulkGet).
This looks like:
Request - /v1/features/join/quickstart/training_set.v2
[{"user_id": "5"}, {"user_id": "6"}, {"user_id": "7"}]
Response:
{
  "results": [
    {"status": "Success", "features": { ... }},
    {"status": "Success", "features": { ... }},
    {"status": "Failure", "error": "Some exception"}
  ]
}
(I'd imagine calls would most often just be for a single element. I've wired it up as a bulkGet primarily because the Fetcher API is a bulk API, and instead of sticking to a list of one it was easier to set things up this way.)
@caiocamatta-stripe thanks for taking a look. I'll tag some folks in the coming days (feel free to loop in others at Stripe as well if you'd like more eyes on this).
Yeah this is done as part of the GroupByUpload jobs and the MetadataUpload (join schemas) - https://chronon.ai/getting_started/Tutorial.html#uploading-data (you need to have these uploads go to the same KV store as the one your fetcher uses). If you have those two sets of jobs, then your KV store is primed with the GroupByUploads (which contain one row that is your GroupByServingInfo data) and the join schema.
Summary
This PR adds a module to the project to spin up a service that wraps the Fetcher code and allows us to serve features. The module leverages the Vert.x project. Vert.x is a high-performance service toolkit that lets us build HTTP / gRPC services - as the serving layer can be on the hot path for some use cases, the high-perf nature of Vert.x will hopefully be a good starting point. Vert.x core is Java based, and given we have three Scala versions to support in the Chronon project, I've chosen to write the service in Java rather than deal with the interop and multi-version compat story for now.
The service adds a couple of HTTP endpoints that allow us to serve groupBys and joins (each returning a JSON payload). We can extend things in the future with an equivalent set of gRPC endpoints if needed. Endpoints look like the sketch below:
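A reconstructed sketch of the endpoint shapes, based on the join example from the review thread above; the groupBy path is an assumption made to mirror the join endpoint:

```
# Join endpoint (shown in the review thread)
/v1/features/join/<joinName>
e.g. /v1/features/join/quickstart/training_set.v2
Body: [{"user_id": "5"}, {"user_id": "6"}, {"user_id": "7"}]

# GroupBy endpoint (assumed to mirror the join endpoint)
/v1/features/groupby/<groupByName>
```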
Why / Goal
Currently, users getting going on the project have to take on the additional work of writing a feature-serving layer of their own (or rely on a set of disparate clients calling and using the Fetcher libs directly). This makes setup painful, and the burden is exacerbated for users who don't natively work in JVM languages. Hooking up the fetcher code in other languages is non-trivial, as it requires rewriting a lot of the core fetcher logic and maintaining parity as things change over time. Including a service module within the project aims to lower that barrier to entry: users can package the service and have their non-JVM services hit it and get feature data over the wire.
Test Plan
(The README in the service module directory has some details / instructions on how you can spin up the project and test against the quickstart mongo API.)
Checklist
Reviewers
@caiocamatta-stripe @mickjermsurawong-openai