Merge branch 'feature/response-passthrough'
jueri committed Nov 11, 2024
2 parents 441cf43 + f8bdf62 commit 44cfb37
Showing 23 changed files with 908 additions and 496 deletions.
2 changes: 2 additions & 0 deletions .github/workflows/manual-dispatch.yml
@@ -22,5 +22,7 @@ jobs:
python -m pip install pytest
if [ -f web/requirements.txt ]; then pip install -r web/requirements.txt; fi
- name: Test with pytest
env:
CI: true
run: |
pytest
2 changes: 2 additions & 0 deletions .github/workflows/push.yml
@@ -22,5 +22,7 @@ jobs:
python -m pip install pytest
if [ -f web/requirements.txt ]; then pip install -r web/requirements.txt; fi
- name: Test with pytest
env:
CI: true
run: |
pytest
44 changes: 44 additions & 0 deletions CHANGES.md
@@ -1,6 +1,50 @@
# Release notes
All notable changes to this project will be documented in this file.


## Add response passthrough to ranking endpoints
Previously, the STELLA infrastructure required a fixed response schema for rankings. The ranking systems were expected to return the documents or items in a fixed format, and the STELLA App passed the results on after interleaving in a fixed format as well. This was inflexible: all content had to be loaded afterwards from external sources based on the returned IDs.

To improve on that, the STELLA App now supports a passthrough mode for the ranking endpoint. Ranking systems can return the documents in any format they like, and the STELLA App returns the same format after interleaving. This makes it possible to return the full content of the documents.

To use this feature, the experimental systems need additional configuration that tells the STELLA App the JSON Path to the document ranking in the response and the key of the document ID. Both can be set through the `SYSTEMS_CONFIG` environment variable in the docker compose file.

Example:
```
SYSTEMS_CONFIG: |
{
"ranker_base": {"type": "ranker", "base": true, "docid": "id", "hits_path": "$.hits.hits"},
"ranker_exp": {"type": "ranker", "docid": "id", "hits_path": "$.hits.hits"}
}
```
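To illustrate how `hits_path` and `docid` could be used, here is a minimal sketch, not the STELLA App's actual implementation. It assumes that the path consists only of dotted keys (as in `$.hits.hits`) and uses a simple alternating merge as a stand-in for the real interleaving. The helper names `path_keys`, `get_hits`, `replace_hits`, and `interleave` are hypothetical.

```python
import copy
from itertools import zip_longest


def path_keys(hits_path):
    """Split a simple path such as '$.hits.hits' into its keys (dotted keys only)."""
    return [key for key in hits_path.lstrip("$").split(".") if key]


def get_hits(response, hits_path):
    """Return the list of hits found at hits_path."""
    node = response
    for key in path_keys(hits_path):
        node = node[key]
    return node


def replace_hits(response, hits_path, new_hits):
    """Return a copy of the response with the hits at hits_path replaced.

    Assumes the path contains at least one key.
    """
    keys = path_keys(hits_path)
    result = copy.deepcopy(response)
    parent = result
    for key in keys[:-1]:
        parent = parent[key]
    parent[keys[-1]] = new_hits
    return result


def interleave(exp_hits, base_hits, docid):
    """Alternate between both lists and skip documents that were already added."""
    merged, seen = [], set()
    for pair in zip_longest(exp_hits, base_hits):
        for hit in pair:
            if hit is not None and hit[docid] not in seen:
                seen.add(hit[docid])
                merged.append(hit)
    return merged


# Made-up responses in the shape implied by "hits_path": "$.hits.hits".
exp_response = {"took": 3, "hits": {"hits": [{"id": "a", "title": "A"}, {"id": "b", "title": "B"}]}}
base_response = {"took": 5, "hits": {"hits": [{"id": "b", "title": "B"}, {"id": "c", "title": "C"}]}}

merged = interleave(
    get_hits(exp_response, "$.hits.hits"),
    get_hits(base_response, "$.hits.hits"),
    docid="id",
)
passthrough = replace_hits(exp_response, "$.hits.hits", merged)
```

Everything outside the hits list (here the `took` field) is carried over unchanged, which is the point of the passthrough mode.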

The results are still saved to the database in the base schema of the STELLA App; the original response is not saved to the database. This keeps responses fast and minimizes latency. Since the original response is not stored in the database, a new caching mechanism was needed, for which `Flask-Caching` is used. By default, `FileSystemCache` is used, but this can be changed in the `config.py` file.
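As a rough sketch of what the cache settings might look like, the fragment below uses configuration keys documented by `Flask-Caching`. The class name `CacheSettings` and all values are assumptions for illustration; the actual contents of `config.py` are not shown here.

```python
class CacheSettings:
    """Hypothetical cache settings; names of the keys follow Flask-Caching."""

    # Cache backend; "FileSystemCache" stores entries as files on disk.
    CACHE_TYPE = "FileSystemCache"
    # Directory for the cache files (assumed path, required by this backend).
    CACHE_DIR = "data/cache"
    # Seconds until an entry expires (assumed value).
    CACHE_DEFAULT_TIMEOUT = 600
```

Switching to another backend would then mostly be a matter of changing `CACHE_TYPE` and supplying the settings that backend needs.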



## Allow system config as JSON
Allow passing the systems config in the docker compose environment variables as a JSON string. This is cleaner and clearer and will allow the configuration of additional system parameters necessary for future updates.

Before:
```
RECSYS_LIST: gesis_rec_pyterrier gesis_rec_pyserini
RECSYS_BASE: gesis_rec_pyterrier
RANKSYS_LIST: gesis_rank_pyserini_base gesis_rank_pyserini
RANKSYS_BASE: gesis_rank_pyserini_base
```

After:
```
SYSTEMS_CONFIG: |
[
{"name": "gesis_rec_pyterrier", "type": "recommender", "base": true},
{"name": "gesis_rec_pyserini", "type": "recommender"},
{"name": "gesis_rank_pyserini_base", "type": "ranker", "base": true},
{"name": "gesis_rank_pyserini", "type": "ranker"}
]
```
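The following sketch shows one way such a JSON string could be turned back into the information the four old variables carried. It is an illustration under assumptions, not the STELLA App's parsing code: the function name `group_systems` is hypothetical, and the last entry is typed as a ranker here because it was listed in the old `RANKSYS_LIST`.

```python
import json

# Example value of SYSTEMS_CONFIG in the list format.
RAW_CONFIG = """
[
  {"name": "gesis_rec_pyterrier", "type": "recommender", "base": true},
  {"name": "gesis_rec_pyserini", "type": "recommender"},
  {"name": "gesis_rank_pyserini_base", "type": "ranker", "base": true},
  {"name": "gesis_rank_pyserini", "type": "ranker"}
]
"""


def group_systems(raw_config):
    """Group system names by type and record the baseline system of each type."""
    groups = {}
    for system in json.loads(raw_config):
        group = groups.setdefault(system["type"], {"names": [], "base": None})
        group["names"].append(system["name"])
        # "base" is optional and defaults to False when missing.
        if system.get("base", False):
            group["base"] = system["name"]
    return groups


groups = group_systems(RAW_CONFIG)
```

In the app, the raw string would come from the environment, for example `os.environ["SYSTEMS_CONFIG"]`, instead of a constant.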


## Update to Python 3.9 and Flask 3.0
- Update minimal Python version to 3.9
- Update the `python` version in the `Dockerfile` to `3.9`
45 changes: 24 additions & 21 deletions docker-compose.yml
@@ -12,31 +12,34 @@ services:
- "8080:8000"
environment:
# Config
- FLASK_APP=app/app
- FLASK_CONFIG=postgres
- INTERLEAVE=True
- BULK_INDEX=False
- DELETE_SENT_SESSION=True
- INTERVAL_DB_CHECK=3
- SESSION_EXPIRATION=6
FLASK_APP: app/app
FLASK_CONFIG: postgres
INTERLEAVE: True
BULK_INDEX: False
DELETE_SENT_SESSION: True
INTERVAL_DB_CHECK: 3
SESSION_EXPIRATION: 6

# Systems
- RECSYS_LIST=gesis_rec_pyterrier gesis_rec_pyserini
- RECSYS_BASE=gesis_rec_pyterrier
- RANKSYS_LIST=gesis_rank_pyserini_base gesis_rank_pyserini
- RANKSYS_BASE=gesis_rank_pyserini_base

SYSTEMS_CONFIG: |
{
"gesis_rec_pyterrier": {"type": "recommender", "base": true},
"gesis_rec_pyserini": {"type": "recommender"},
"gesis_rank_pyserini_base": {"type": "ranker", "base": true},
"gesis_rank_pyserini": {"type": "ranker"}
}
# Stella Server
- STELLA_SERVER_ADDRESS=http://host.docker.internal:8000
- STELLA_SERVER_USER=[email protected]
- STELLA_SERVER_PASS=pass
- STELLA_SERVER_USERNAME=LIVIVO
STELLA_SERVER_ADDRESS: http://host.docker.internal:8000
STELLA_SERVER_USER: [email protected]
STELLA_SERVER_PASS: pass
STELLA_SERVER_USERNAME: LIVIVO

# Database
- POSTGRES_USER=postgres
- POSTGRES_PW=change-me
- POSTGRES_DB=postgres
- POSTGRES_URL=db:5430
POSTGRES_USER: postgres
POSTGRES_PW: change-me
POSTGRES_DB: postgres
POSTGRES_URL: db:5430

command: gunicorn -w 2 --timeout 60 -b :8000 'app.app:create_app()'
links:
- db:db
166 changes: 18 additions & 148 deletions web/app/api/rankings.py
@@ -1,14 +1,12 @@
import docker
import time
from flask import jsonify, request, current_app
from app import api
from app.models import db, Session, System, Result, Feedback
from app.services.interleave_service import interleave_rankings
from app.utils import create_dict_response
from app.models import Feedback, Result, Session, db
from app.services.ranking_service import make_ranking
from app.services.session_service import create_new_session
from app.services.system_service import get_least_served_system
from flask import jsonify, request
from pytz import timezone

from . import api
from app.services.session_service import create_new_session
from app.services.ranking_service import query_system

client = docker.DockerClient(base_url="unix://var/run/docker.sock")
tz = timezone("Europe/Berlin")
@@ -82,149 +80,21 @@ def ranking():
header contains meta-data
body contains ranked document list
"""
page = request.args.get("page", default=0, type=int)
rpp = request.args.get("rpp", default=10, type=int)

# look for "mandatory" GET-parameters (query, container_name, session_id)
query = request.args.get("query", None)
container_name = request.args.get("container", None)
session_id = request.args.get("sid", None)

# Look for optional GET-parameters and set default values
page = request.args.get("page", default=0, type=int)
rpp = request.args.get("rpp", default=20, type=int)

# Return cached results for known (session ID, query) combinations
# tested: true
if session_id and query:
ranking = (
db.session.query(Result)
.filter_by(session_id=session_id, q=query, page=page, rpp=rpp)
.first()
)
if ranking:
if ranking.tdi:
ranking = db.session.query(Result).filter_by(id=ranking.tdi).first()

system_id = (
db.session.query(Session)
.filter_by(id=session_id)
.first()
.system_ranking
)
container_name = (
db.session.query(System).filter_by(id=system_id).first().name
)

response = {
"header": {
"sid": ranking.session_id,
"rid": ranking.id,
"q": query,
"page": ranking.page,
"rpp": ranking.rpp,
"hits": ranking.hits,
"container": {"exp": container_name},
},
"body": ranking.items,
}
return jsonify(response)

# If no query is given, return an empty response
# TODO: Is this safe?
if query is None:
return create_dict_response(status=1, ts=round(time.time() * 1000))
return "Missing query string", 400

# Select least served container if no container_name is given. This is the default case for an interleaved experiment.
container_name = request.args.get("container", None)
if container_name is None:
# Deprecated precomputed runs code. Will be removed in the future.
if query in current_app.config["HEAD_QUERIES"]:
container_name = (
db.session.query(System)
.filter(System.name != current_app.config["RANKING_BASELINE_CONTAINER"])
.filter(
System.name.notin_(
current_app.config["RECOMMENDER_CONTAINER_NAMES"]
+ current_app.config["RECOMMENDER_PRECOMPUTED_CONTAINER_NAMES"]
)
)
.order_by(System.num_requests)
.first()
.name
)
else:
# Select least served container
container_name = (
db.session.query(System)
.filter(System.name != current_app.config["RANKING_BASELINE_CONTAINER"])
.filter(
System.name.notin_(
current_app.config["RECOMMENDER_CONTAINER_NAMES"]
+ current_app.config["RANKING_PRECOMPUTED_CONTAINER_NAMES"]
+ current_app.config["RECOMMENDER_PRECOMPUTED_CONTAINER_NAMES"]
)
)
.order_by(System.num_requests_no_head)
.first()
.name
)

# Create new session if no session_id is given
if session_id is None:
# make new session and get session_id as sid
container_name = get_least_served_system(query)

session_id = request.args.get("sid", None)
if session_id is None or db.session.query(Session).filter_by(id=session_id).first() is None:
session_id = create_new_session(container_name, type="ranker")
else:
# SessionID is given, but does not exist in the database
if db.session.query(Session).filter_by(id=session_id) is None:
session_id = create_new_session(
container_name=container_name, sid=session_id, type="ranker"
)

# TODO: At this point, the containername is given! The container name is overwritten here. Therefore I commented out the following lines.
# ranking_system_id = (
# db.session.query(Session).filter_by(id=session_id).first().system_ranking
# )
# container_name = db.session.query(System).filter_by(id=ranking_system_id).first().name

# Query the experimental and baseline system
ranking_exp = query_system(container_name, query, rpp, page, session_id)

ranking_base = query_system(
current_app.config["RANKING_BASELINE_CONTAINER"],
query,
rpp,
page,
session_id,
type="BASE",
)

if current_app.config["INTERLEAVE"]:
response = interleave_rankings(ranking_exp, ranking_base)
response_complete = {
"header": {
"sid": ranking_exp.session_id,
"rid": ranking_exp.tdi,
"q": query,
"page": page,
"rpp": rpp,
"hits": ranking_base.num_found,
"container": {
"base": current_app.config["RANKING_BASELINE_CONTAINER"],
"exp": container_name,
},
},
"body": response,
}
else:
response_complete = {
"header": {
"sid": ranking_exp.session_id,
"rid": ranking_exp.id,
"q": query,
"page": page,
"rpp": rpp,
"hits": ranking_exp.num_found,
"container": {"exp": container_name},
},
"body": ranking_exp.items,
}

return jsonify(response_complete)

response = make_ranking(container_name, query, rpp, page, session_id)

return jsonify(response)
18 changes: 9 additions & 9 deletions web/app/app.py
@@ -3,12 +3,11 @@
import sys

from app.api import api as api_blueprint
from app.commands import index_systems, init_db_command, seed_db_command
from app.extensions import bootstrap, cache, db, migrate, scheduler
from app.main import main as main_blueprint
from app.commands import init_db_command, seed_db_command, index_systems
from app.extensions import bootstrap, db, migrate, scheduler
from config import config
from flask import Flask
import sys, socket


def create_app(config_name=None):
@@ -50,12 +49,12 @@ def configure_logger(app):
app.logger.addHandler(stream_handler)

# FileHandler for logging to a file
file_handler = logging.FileHandler(
"data/log/stella-app.log"
) # Define your log file name
file_handler.setFormatter(logging.Formatter(log_format))
if not any(isinstance(h, logging.FileHandler) for h in app.logger.handlers):
app.logger.addHandler(file_handler)
# file_handler = logging.FileHandler(
# "data/log/stella-app.log"
# ) # Define your log file name
# file_handler.setFormatter(logging.Formatter(log_format))
# if not any(isinstance(h, logging.FileHandler) for h in app.logger.handlers):
# app.logger.addHandler(file_handler)

app.logger.info("Logging setup complete.")

@@ -65,6 +64,7 @@ def register_extensions(app):
db.init_app(app)
migrate.init_app(app, db)
bootstrap.init_app(app)
cache.init_app(app)

'''
if scheduler.state == 0:
3 changes: 1 addition & 2 deletions web/app/commands.py
@@ -1,12 +1,11 @@
import threading
import time
from typing import List

import click
import config
from app.extensions import db
from app.models import System
from app.services.system_indexer_service import cmd_index, rest_index
from app.services.system_service import cmd_index, rest_index
from flask import current_app
from flask.cli import with_appcontext

10 changes: 6 additions & 4 deletions web/app/extensions.py
@@ -1,11 +1,13 @@
from flask_sqlalchemy import SQLAlchemy
from flask_migrate import Migrate
from flask_login import LoginManager
from flask_bootstrap import Bootstrap
from flask_apscheduler import APScheduler
from flask_bootstrap import Bootstrap
from flask_caching import Cache
from flask_login import LoginManager
from flask_migrate import Migrate
from flask_sqlalchemy import SQLAlchemy

db = SQLAlchemy()
migrate = Migrate()
login_manager = LoginManager()
bootstrap = Bootstrap()
scheduler = APScheduler()
cache = Cache()
6 changes: 4 additions & 2 deletions web/app/main/index.py
@@ -1,7 +1,9 @@
import threading
from . import main

from app.services.system_service import cmd_index, rest_index
from flask import current_app
from app.services.system_indexer_service import rest_index, cmd_index

from . import main


@main.route("/index/<string:container_name>")