-
Notifications
You must be signed in to change notification settings - Fork 27
Route to register aiModel #8127
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@valentin-pinkau I created the route id: String, // must be unique and a valid MongoDB ObjectId, compare https://pymongo.readthedocs.io/en/stable/api/bson/objectid.html
dataStoreName: String, // "localhost" for standard dev setup, "webknossos.org" for webknossos.org (data-humerus)
name: String, // Must be unique.
comment: Option[String],
category: Option[String] // if present, must be one of em_neurons, em_nuclei, em_synapses, em_neuron_types, em_cell_organelles It expects that the model is already present in the .aiModels directory, compare worker. The route does not currently assert that the model is there on disk and usable. The Do you think that will work for the PublishModelTask? |
@@ -47,7 +47,7 @@ abstract class SQLDAO[C, R, X <: AbstractTable[R]] @Inject()(sqlClient: SqlClien | |||
case Some(r) => | |||
parse(r) ?~> ("sql: could not parse database row for object" + id) | |||
case _ => | |||
Fox.failure("sql: could not find object " + id) | |||
Fox.empty |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We used Fox.empty already in many overwriting findOnes, but not this one in the superclass. I don’t think it will hurt, though, and this way I can distinguish between error and empty downstream.
WalkthroughThe changes in this pull request introduce several enhancements and modifications across the WEBKNOSSOS application. A new SQL migration file is added to the migration guide, and a new case class for registering AI models is introduced in the Changes
Poem
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 5
🧹 Outside diff range and nitpick comments (6)
conf/evolutions/reversions/123-more-model-categories.sql (1)
5-5
: Enhance the comment about enum limitations.While the comment correctly references PostgreSQL's enum limitations, it would be helpful to add more context about:
- What happens to existing records using the new enum values
- Why removing enum values could be dangerous
-- removing enum values is not supported in postgres, see https://www.postgresql.org/docs/current/datatype-enum.html#DATATYPE-ENUM-IMPLEMENTATION-DETAILS +-- NOTE: Removing enum values is not supported in postgres and could break existing data. +-- Any records using these enum values must be updated before attempting removal. +-- See: https://www.postgresql.org/docs/current/datatype-enum.html#DATATYPE-ENUM-IMPLEMENTATION-DETAILSconf/evolutions/123-more-model-categories.sql (1)
2-3
: Enhance the comment about transaction limitations.While the current comment is helpful, it would be more informative to explain why ALTER TYPE ADD VALUE cannot run in a transaction (it acquires an ACCESS EXCLUSIVE lock that conflicts with transaction semantics).
--- no transaction here, since ALTER TYPE ... ADD cannot run inside a transaction block +-- No transaction block here: ALTER TYPE ADD VALUE requires an ACCESS EXCLUSIVE lock +-- which cannot be acquired within a transaction block in PostgreSQLMIGRATIONS.unreleased.md (1)
15-15
: Consider documenting the new model categories.For better clarity and documentation, consider adding a brief description of the new EM-specific categories (
em_synapses
,em_neuron_types
,em_cell_organelles
) that are being introduced in this migration.Example addition:
- [123-more-model-categories.sql](conf/evolutions/123-more-model-categories.sql) + [123-more-model-categories.sql](conf/evolutions/123-more-model-categories.sql) + - Adds new electron microscopy specific categories: synapses, neuron types, and cell organellesapp/models/aimodels/AiModel.scala (2)
147-152
: LGTM! Consider adding an index for performance.The implementation is well-structured, secure, and properly integrated with the existing access control system. However, since this method will be used by the new registration route to check for duplicate names, consider adding an index on the
name
column to optimize query performance.Add the following index creation to your SQL migration script:
CREATE INDEX IF NOT EXISTS idx_aimodels_name ON webknossos.aimodels(name);Additionally, consider specifying case sensitivity requirements for name matching if needed:
- r <- run(q"SELECT $columns FROM $existingCollectionName WHERE name = $name AND $accessQuery".as[AimodelsRow]) + r <- run(q"SELECT $columns FROM $existingCollectionName WHERE LOWER(name) = LOWER($name) AND $accessQuery".as[AimodelsRow])
Line range hint
1-153
: Consider adding validation methods for registration requirements.To support the new registration route's requirements, consider adding validation methods to ensure:
- DataStore name is either "localhost" or "webknossos.org"
- Category matches allowed EM-related options
Add these validation methods to the companion object:
object AiModel { private val AllowedDataStores = Set("localhost", "webknossos.org") def validateDataStore(dataStore: String): Fox[String] = Fox.bool2Fox(AllowedDataStores.contains(dataStore.trim))( s"DataStore must be one of: ${AllowedDataStores.mkString(", ")}") def validateCategory(category: Option[AiModelCategory]): Fox[Unit] = category match { case None => Fox.successful(()) case Some(cat) if cat.toString.startsWith("em_") => Fox.successful(()) case _ => Fox.failure("Category must be EM-related") } }tools/postgres/schema.sql (1)
Line range hint
549-589
: Well-structured schema for AI model management.The schema design effectively supports the requirements outlined in the PR objectives:
- Proper organization-level model management
- Support for model registration with required fields
- Training annotation tracking
- Inference result management
Some architectural considerations:
- The
aiModels
table's unique constraint on(organization, name)
ensures proper model naming within organizations- The
aiInferences
table captures both dataset and annotation outputs, allowing flexible inference targets- Soft deletion is consistently implemented across tables
Consider adding:
- An index on
(organization, category)
to optimize queries filtering models by category- A check constraint on
newSegmentationLayerName
to ensure it follows the same pattern as dataset layer names
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (9)
- MIGRATIONS.unreleased.md (1 hunks)
- app/controllers/AiModelController.scala (2 hunks)
- app/models/aimodels/AiModel.scala (1 hunks)
- app/models/aimodels/AiModelCategory.scala (1 hunks)
- app/utils/sql/SQLDAO.scala (1 hunks)
- conf/evolutions/123-more-model-categories.sql (1 hunks)
- conf/evolutions/reversions/123-more-model-categories.sql (1 hunks)
- conf/webknossos.latest.routes (1 hunks)
- tools/postgres/schema.sql (2 hunks)
🔇 Additional comments (12)
app/models/aimodels/AiModelCategory.scala (2)
7-7
: LGTM! New category values follow consistent naming.The added enumeration values (
em_synapses
,em_neuron_types
,em_cell_organelles
) maintain consistency with existing electron microscopy categories and provide clear, domain-specific classifications.
7-7
: Verify enum value handling across the codebase.Let's ensure that code handling these enum values (serialization, validation, UI components) properly supports the new categories.
conf/evolutions/reversions/123-more-model-categories.sql (2)
1-3
: LGTM! Transaction and version assertion look good.The script correctly uses a transaction and includes a schema version assertion to prevent out-of-order execution.
7-9
: Verify impact on existing data.The schema version update is correct, but we should verify there are no records using the new enum values before applying this reversion.
Consider adding a validation query before the version update to prevent data inconsistencies:
DO $$ BEGIN IF EXISTS ( SELECT 1 FROM webknossos.ai_models WHERE category IN ('em_synapses', 'em_neuron_types', 'em_cell_organelles') ) THEN RAISE EXCEPTION 'Cannot revert: Found models using the new categories'; END IF; END $$;MIGRATIONS.unreleased.md (1)
15-15
: LGTM! Migration file reference follows the established pattern.The new migration file
123-more-model-categories.sql
is correctly referenced and follows the sequential numbering convention.app/utils/sql/SQLDAO.scala (2)
50-50
: LGTM! Consistent error handling approach.The change from
Fox.failure
toFox.empty
aligns with the existing pattern in otherfindOne
implementations and provides better semantics for "not found" scenarios.
50-50
: Verify consistent handling of Fox.empty across the codebase.Since this changes how "not found" scenarios are communicated, we should verify that all consumers handle both
Fox.empty
andFox.failure
appropriately.✅ Verification successful
Let me analyze the findings from the shell scripts to determine if Fox.empty is handled consistently across the codebase.
The search results show that Fox.empty is used in several key patterns:
- In SQLDAO.scala, Fox.empty is used to indicate "not found" scenarios
- Throughout the codebase, Fox.empty is consistently used to represent:
- Missing/not found elements (e.g., in database queries)
- Empty results in recursive searches
- Optional data that may legitimately be absent
- Failed validations that should be handled gracefully
The usage in SQLDAO.scala aligns with these patterns, particularly for database queries returning no results. The codebase consistently handles Fox.empty through:
- Pattern matching (case Empty => ...)
- Chaining with ?~> for error messages
- Transformations via map/flatMap
- Fallback handling with getOrElse
Fox.empty is being used appropriately in SQLDAO.scala to indicate "not found" scenarios, consistent with usage patterns across the codebase.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Find all usages of findOne to verify consistent error handling # Look for patterns that might expect the old failure behavior # Search for findOne calls followed by error handling rg -A 5 'findOne.*\?~>' # Search for map/flatMap on findOne results to see error handling rg -A 5 'findOne.*\.(?:map|flatMap)' # Search for direct matches on Fox.empty to understand existing patterns rg 'Fox\.empty'Length of output: 126393
app/controllers/AiModelController.scala (2)
227-241
:⚠️ Potential issueUse atomic operation to prevent race conditions.
The current implementation has a potential race condition between checking existence and insertion. Consider using an atomic upsert operation with a unique index.
Let's verify if unique indexes exist:
#!/bin/bash # Search for unique index definitions rg -A 3 'createIndex|ensureIndex.*unique'
60-68
: 🛠️ Refactor suggestionAdd input validation constraints to prevent invalid data.
While the case class structure matches the requirements, consider adding input validation:
- Add size limits to prevent DoS:
name: String Refined MaxSize[64], comment: Option[String Refined MaxSize[1024]]
- Add enum validation for dataStoreName:
dataStoreName: DataStoreName // sealed trait with two cases: Localhost and Production
- Consider using smart constructors or refinement types to validate at compile-time.
Let's verify the current validation in the codebase:
tools/postgres/schema.sql (2)
23-23
: LGTM: Schema version increment is appropriate.The schema version update from 122 to 123 is properly wrapped in a transaction.
Line range hint
893-903
: LGTM: Foreign key constraints are properly configured.The foreign key constraints:
- Maintain referential integrity
- Use appropriate CASCADE behaviors
- Are consistent with the schema's overall design
conf/webknossos.latest.routes (1)
286-286
: LGTM! The new route follows RESTful conventions.The route is well-placed in the AI Models section and follows the established pattern of other AI model endpoints.
✅ Verification successful
Superuser authentication is properly implemented
The implementation in
AiModelController.scala
shows that theregisterAiModel
endpoint is secured with:
sil.SecuredAction
- requiring authenticationuserService.assertIsSuperUser(request.identity)
- explicitly checking for superuser status🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify that the registerAiModel action implements superuser authentication. # Expected: Find authentication/authorization checks in the controller. echo "Searching for the controller implementation..." ast-grep --pattern 'def registerAiModel($$$) = { $$$ }' echo "Searching for authentication-related code..." rg -A 5 'registerAiModel.*=.*\{'Length of output: 296
Script:
#!/bin/bash # Let's try a different approach to find the controller implementation # First, find the AiModelController file echo "Finding AiModelController file..." fd "AiModelController" # Search for registerAiModel with broader context and case-insensitive echo "Searching for registerAiModel method..." rg -i "registerAiModel" -A 10 -B 5 # Look for authentication-related annotations or imports echo "Searching for authentication-related code..." rg "@Authenticated|@SuperUser|Authentication" -A 2Length of output: 26629
ALTER TYPE webknossos.AI_MODEL_CATEGORY ADD VALUE 'em_synapses'; | ||
ALTER TYPE webknossos.AI_MODEL_CATEGORY ADD VALUE 'em_neuron_types'; | ||
ALTER TYPE webknossos.AI_MODEL_CATEGORY ADD VALUE 'em_cell_organelles'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codebase verification
These enum values already exist in schema.sql - migration will fail
The values 'em_synapses', 'em_neuron_types', and 'em_cell_organelles' are already defined in tools/postgres/schema.sql
as part of the initial enum definition. Attempting to add these values again through the migration will cause an error since PostgreSQL doesn't allow duplicate enum values.
- Remove these ALTER TYPE statements as these values are already part of the enum type definition in
tools/postgres/schema.sql
🔗 Analysis chain
Verify enum values don't already exist and consider rollback strategy.
The ALTER TYPE commands look correct, but we should verify these values don't already exist to prevent errors. Also, consider documenting the rollback strategy since enum values cannot be easily removed.
Consider adding a comment block documenting the rollback strategy, as removing enum values requires careful handling:
-- Rollback strategy:
-- 1. Create a new enum type without these values
-- 2. Update all columns to use the new type
-- 3. Drop the old type
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check if any of the new enum values already exist in the database
# Test: Search for existing enum definitions or usage
rg -i "em_synapses|em_neuron_types|em_cell_organelles" --type sql
# Test: Look for any previous migrations that might have added these values
fd -e sql | xargs grep -l "ALTER TYPE.*AI_MODEL_CATEGORY.*ADD VALUE"
Length of output: 688
|
||
-- no transaction here, since ALTER TYPE ... ADD cannot run inside a transaction block | ||
|
||
do $$ begin ASSERT (select schemaVersion from webknossos.releaseInformation) = 122, 'Previous schema version mismatch'; end; $$ LANGUAGE plpgsql; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Strengthen the schema version assertion.
While the version check is good, it could be more robust by handling NULL values and providing a more descriptive error message.
-do $$ begin ASSERT (select schemaVersion from webknossos.releaseInformation) = 122, 'Previous schema version mismatch'; end; $$ LANGUAGE plpgsql;
+do $$
+declare
+ current_version int;
+begin
+ SELECT schemaVersion INTO current_version FROM webknossos.releaseInformation;
+ ASSERT current_version IS NOT NULL AND current_version = 122,
+ 'Expected schema version 122, but found ' || COALESCE(current_version::text, 'NULL');
+end; $$ LANGUAGE plpgsql;
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
do $$ begin ASSERT (select schemaVersion from webknossos.releaseInformation) = 122, 'Previous schema version mismatch'; end; $$ LANGUAGE plpgsql; | |
do $$ | |
declare | |
current_version int; | |
begin | |
SELECT schemaVersion INTO current_version FROM webknossos.releaseInformation; | |
ASSERT current_version IS NOT NULL AND current_version = 122, | |
'Expected schema version 122, but found ' || COALESCE(current_version::text, 'NULL'); | |
end; $$ LANGUAGE plpgsql; |
ALTER TYPE webknossos.AI_MODEL_CATEGORY ADD VALUE 'em_neuron_types'; | ||
ALTER TYPE webknossos.AI_MODEL_CATEGORY ADD VALUE 'em_cell_organelles'; | ||
|
||
UPDATE webknossos.releaseInformation SET schemaVersion = 123; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Add verification for schema version update.
The schema version update should verify that exactly one row was updated to ensure the migration completed successfully.
-UPDATE webknossos.releaseInformation SET schemaVersion = 123;
+DO $$
+DECLARE
+ rows_affected int;
+BEGIN
+ WITH updated AS (
+ UPDATE webknossos.releaseInformation
+ SET schemaVersion = 123
+ WHERE schemaVersion = 122
+ RETURNING *
+ )
+ SELECT COUNT(*) INTO rows_affected FROM updated;
+
+ ASSERT rows_affected = 1,
+ 'Expected to update exactly one row, but updated ' || rows_affected || ' rows';
+END $$;
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
UPDATE webknossos.releaseInformation SET schemaVersion = 123; | |
DO $$ | |
DECLARE | |
rows_affected int; | |
BEGIN | |
WITH updated AS ( | |
UPDATE webknossos.releaseInformation | |
SET schemaVersion = 123 | |
WHERE schemaVersion = 122 | |
RETURNING * | |
) | |
SELECT COUNT(*) INTO rows_affected FROM updated; | |
ASSERT rows_affected = 1, | |
'Expected to update exactly one row, but updated ' || rows_affected || ' rows'; | |
END $$; |
} yield Ok | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Return the created model in the response.
For consistency with other controller methods, return the created model instead of an empty Ok.
- } yield Ok
+ newModel <- aiModelDAO.findOne(request.body.id) ?~> "aiModel.notFound"
+ jsResult <- aiModelService.publicWrites(newModel)
+ } yield Ok(jsResult)
Committable suggestion was skipped due to low confidence.
def registerAiModel: Action[RegisterAiModelParameters] = | ||
sil.SecuredAction.async(validateJson[RegisterAiModelParameters]) { implicit request => | ||
for { | ||
_ <- userService.assertIsSuperUser(request.identity) | ||
_ <- dataStoreDAO.findOneByName(request.body.dataStoreName) ?~> "dataStore.notFound" | ||
_ <- aiModelDAO.findOne(request.body.id).reverse ?~> "aiModel.id.taken" | ||
_ <- aiModelDAO.findOneByName(request.body.name).reverse ?~> "aiModel.name.taken" | ||
_ <- aiModelDAO.insertOne( | ||
AiModel( | ||
request.body.id, | ||
_organization = request.identity._organization, | ||
request.body.dataStoreName, | ||
request.identity._id, | ||
None, | ||
List.empty, | ||
request.body.name, | ||
request.body.comment, | ||
request.body.category | ||
)) | ||
} yield Ok | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add missing validations and improve error handling.
The method should validate:
- That dataStoreName is either "localhost" or "webknossos.org"
- That category, when provided, is a valid EM type
- Input field sizes to prevent DoS
def registerAiModel: Action[RegisterAiModelParameters] =
sil.SecuredAction.async(validateJson[RegisterAiModelParameters]) { implicit request =>
for {
_ <- userService.assertIsSuperUser(request.identity)
+ _ <- bool2Fox(Set("localhost", "webknossos.org").contains(request.body.dataStoreName)) ?~> "dataStore.invalid"
+ _ <- request.body.category.traverse(cat => bool2Fox(AiModelCategory.isValidEMType(cat)) ?~> "category.invalid")
_ <- dataStoreDAO.findOneByName(request.body.dataStoreName) ?~> "dataStore.notFound"
Committable suggestion was skipped due to low confidence.
To be called by a voxelytics PublishModelTask. Needs the token of a superuser.
We tested that this works during the review of https://github.com/scalableminds/voxelytics/pull/3763
Issues:
Summary by CodeRabbit
Release Notes
New Features
POST /aiModels/register
.em_synapses
,em_neuron_types
, andem_cell_organelles
.Bug Fixes
Schema Updates