
Add a Chronon service module to add support for serving features #873

Open · wants to merge 19 commits into main
Conversation

@piyush-zlai (Contributor) commented Nov 7, 2024

Summary

This PR adds a module to the project that spins up a service wrapping the Fetcher code so we can serve features. The module leverages the Vert.x project. Vert.x is a high-performance service toolkit for building HTTP / gRPC services; as the serving layer can be on the hot path for some use cases, the high-perf nature of Vert.x should hopefully be a good starting point. Vert.x core is Java based, and given that we have 3 Scala versions to support in the Chronon project, I've chosen to write the service in Java rather than deal with the interop and multi-version compatibility story for now.

The service adds a couple of HTTP endpoints that allow us to serve GroupBys and Joins (each returning a JSON payload). We can extend this in the future with an equivalent set of gRPC endpoints if needed. The endpoints look like:

/ping -> Health check
/config -> Returns the svc config
/v1/features/join/<joinName> -> Bulk get request for a given join
/v1/features/groupby/<groupByName> -> Bulk get request for a given group by
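For illustration, the endpoint wiring in Vert.x looks roughly like the sketch below (assuming Vert.x 4.x). The class name, handler bodies, port, and the choice of POST for the bulk-get route are assumptions for the sketch, not necessarily what this PR does:

```java
import io.vertx.core.Vertx;
import io.vertx.core.json.JsonArray;
import io.vertx.core.json.JsonObject;
import io.vertx.ext.web.Router;
import io.vertx.ext.web.handler.BodyHandler;

// Hypothetical sketch of the route setup, not the PR's actual code.
public class FeatureServiceSketch {
  public static void main(String[] args) {
    Vertx vertx = Vertx.vertx();
    Router router = Router.router(vertx);
    router.route().handler(BodyHandler.create()); // make request bodies readable

    // Health check
    router.get("/ping").handler(ctx -> ctx.response().end("pong"));

    // Bulk get for a join; the body would carry the list of key maps
    router.post("/v1/features/join/:joinName").handler(ctx -> {
      String joinName = ctx.pathParam("joinName");
      // ... hand the parsed body off to the Fetcher, then respond with JSON:
      ctx.json(new JsonObject().put("results", new JsonArray()));
    });

    vertx.createHttpServer().requestHandler(router).listen(8080);
  }
}
```

The `:joinName` path parameter maps to the `<joinName>` placeholder in the endpoint list above.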

Why / Goal

Currently, users getting started on the project have to take on the additional work of writing a feature-serving layer of their own (or rely on a set of disparate clients calling the Fetcher libs directly). This makes setup painful, and the burden is exacerbated for users who don't natively work in JVM languages. Hooking up the fetcher code in other languages is non-trivial, as it requires rewriting a lot of the core fetcher logic and maintaining parity as things change over time. Including a service module within the project aims to lower that barrier to entry: users can package the service and have their non-JVM services hit it to get feature data over the wire.

Test Plan

  • Added Unit Tests
  • Covered by existing CI
  • Integration tested

(Readme in the service module directory has some details / instructions on how you can spin up the project and test against the quickstart mongo api)

Checklist

  • Documentation update

Reviewers

@caiocamatta-stripe @mickjermsurawong-openai

@piyush-zlai piyush-zlai changed the title Add a Chronon service module to add support for serving features [wip] Add a Chronon service module to add support for serving features Nov 7, 2024
@piyush-zlai piyush-zlai changed the title [wip] Add a Chronon service module to add support for serving features Add a Chronon service module to add support for serving features Nov 12, 2024
@caiocamatta-stripe (Collaborator)

Hey @piyush-zlai - could you comment on why Vert.x vs alternatives?

@piyush-zlai (Contributor, Author)

@caiocamatta-stripe - called out in the PR description but I'll expand more:

  • Vert.x is a fairly high-performance web service framework. A few benchmarks have found it to be better than alternatives such as Play - Link, Link.
  • Play is Scala native, but the challenge is that Chronon uses a bunch of older Scala versions - 2.10, 2.11 and 2.12 - so we'd need to deal with this licensing issue - Licensing. Support for older Scala versions requires us to use the Akka fork of Play (Play 2.9.x). If Chronon were on Scala 2.13 we could go with Play 3.x (which is on Pekko).
  • Vert.x being a Java-based framework also makes life easier, as we don't need to deal with building / cross-compiling across 3 Scala versions.
  • Vert.x has a fairly rich set of docs and pretty good support for tracing, testing, gRPC, etc., so the rails from a web service framework are fairly extensible.

@caiocamatta-stripe (Collaborator) left a comment
@piyush-zlai thanks for the additional commentary on why Vertx.

Overall this looks good to me! This being a fairly large (albeit safe) change that adds new dependencies, you may want to get reviews from at least a couple of other people.

One final thing that still wasn't clear to me - how does the service get the schemas for GroupBys it's serving? Or is that purely the Fetcher's responsibility?

Comment on lines +408 to +409
"ch.qos.logback" % "logback-classic" % "1.2.11",
"org.slf4j" % "slf4j-api" % "1.7.36",
Collaborator:

For my own learning/curiosity, any comment on why logback instead of log4j2?

Contributor (Author):

I think I might have hit some dependency issues on log4j2. Could give that a shot again if you'd prefer log4j2. It seems to be the newer framework (though there seem to have been a few sec issues found from time to time). The way we've configured logging currently though is to write async so perf wise we should see pretty low logging overhead.
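For reference, asynchronous logging in logback is typically wired up with an `AsyncAppender` wrapping a concrete appender. A minimal sketch of that pattern (not necessarily the exact config in this PR) looks like:

```xml
<configuration>
  <appender name="FILE" class="ch.qos.logback.core.FileAppender">
    <file>service.log</file>
    <encoder>
      <pattern>%d %level %logger - %msg%n</pattern>
    </encoder>
  </appender>

  <!-- Wrap the file appender so log events are queued and written off the caller's thread -->
  <appender name="ASYNC" class="ch.qos.logback.classic.AsyncAppender">
    <appender-ref ref="FILE"/>
  </appender>

  <root level="INFO">
    <appender-ref ref="ASYNC"/>
  </root>
</configuration>
```

With this setup the calling thread only enqueues the event, which keeps logging overhead low on the request path.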

@@ -0,0 +1,76 @@
# Chronon Feature Service
Collaborator:

Should we explicitly document the endpoints available here?

Contributor (Author):

I thought about this but I expect the docs to get outdated soon. A better approach might be something like wiring up OpenApi / Swagger annotations. Could tackle this in a follow up.

Comment on lines +14 to +21
/**
* Responsible for loading the relevant concrete Chronon Api implementation and providing that
* for use in the Web service code. We follow similar semantics as the Driver to configure this:
* online.jar - Jar that contains the implementation of the Api
* online.class - Name of the Api class
* online.api.props - Structure that contains fields that are loaded and passed to the Api implementation
* during instantiation to configure it (e.g. connection params)
*/
Collaborator:

I really appreciate the comments before the classes!

@@ -0,0 +1,76 @@
# Chronon Feature Service
Collaborator:

Do you think it's worth documenting the existence of this service in the chronon website?

Contributor (Author):

Yeah def worth it. I was thinking of letting it bake / clearing out some nits etc and putting up a doc update PR in a follow up.

Collaborator:

Plus one to this.

}

@Override
public void handle(RoutingContext ctx) {
Collaborator:

Also just for my own knowledge, will vertx use one thread per request it receives? how does it handle concurrent requests?

Contributor (Author):

Vert.x's model is event driven. Triggers like incoming HTTP requests result in events that are queued. A pool of event-loop threads dequeues these events and dispatches them to handlers. Blocking code in a handler will jam up one of your n event threads (n is configurable and defaults to the number of CPU cores), so you typically kick off work like DB lookups on a different thread in your handler to avoid blocking the precious event-loop threads.
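As a sketch of that pattern (assuming Vert.x 4.x; the route, the `lookupFeatures` helper, and the response shape are hypothetical), blocking work is typically offloaded via `executeBlocking` so the event loop stays free:

```java
// Hypothetical handler sketch; `vertx`, `router`, and `lookupFeatures` are assumed context.
router.get("/v1/features/groupby/:name").handler(ctx -> {
  vertx.executeBlocking(promise -> {
    // Runs on a worker thread, so it's safe to block here (e.g. a synchronous KV read).
    promise.complete(lookupFeatures(ctx.pathParam("name")));
  }, res -> {
    // Back on the event loop once the blocking work completes.
    if (res.succeeded()) {
      ctx.json(res.result());
    } else {
      ctx.fail(500, res.cause());
    }
  });
});
```

Purely non-blocking work (like a callback-based KV client) can stay on the event loop directly; `executeBlocking` is only needed when the call would otherwise stall the thread.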


maybeFeatureResponses.onSuccess(resultList -> {
// as this is a bulkGet request, we might have some successful and some failed responses
// we return the responses in the same order as they come in and mark them as successful / failed based
Collaborator:

Is order the only way of knowing what response maps to what request in the bulkget call? (still figuring out what the requests and responses look like)

Contributor (Author):

Yeah, the order matches exactly. It's a similar pattern to other bulk-get APIs out there (e.g. the FB Graph bulk get).
This looks like:
Request - /v1/features/join/quickstart/training_set.v2

[{"user_id": "5"}, {"user_id": "6"}, {"user_id": "7"}]

Response:

{
  "results": [
      {"status": "Success", "features": { ... }},
      {"status": "Success", "features": { ... }},
      {"status": "Failure", "error": "Some exception"}
  ]
}

Contributor (Author):

(I'd imagine calls would most often be for a single element. I've wired it up as a bulk get primarily because the fetcher API is a bulk API, and rather than sticking to a list of one it was easier to set things up this way.)

@piyush-zlai (Contributor, Author)

@caiocamatta-stripe thanks for taking a look. I'll tag some folks in the coming days (feel free to loop in others at Stripe as well if you'd like more eyes on this).

One final thing that still wasn't clear to me - how does the service get the schemas for GroupBys it's serving? Or is that purely the Fetcher's responsibility?

Yeah, this is done as part of the GroupByUpload jobs and the MetadataUpload (join schemas) - https://chronon.ai/getting_started/Tutorial.html#uploading-data (these need to upload to the same KV store that you're using in your fetcher).

If you have those two sets of jobs, then your KV store is primed with the GroupByUploads (which contain one row holding your GroupByServingInfo data) and the join schema.
