
Implement e2e backend instance and fuzz testing for both GraphQL and REST endpoints #3386

Merged
arkid15r merged 10 commits into main from feature/e2e-backend
Jan 19, 2026

Conversation

@ahmedxgouda
Collaborator

Proposed change

Resolves #(put the issue number here)

Add the PR description here.

Checklist

  • Required: I followed the contributing workflow
  • Required: I verified that my code works as intended and resolves the issue as described
  • Required: I ran make check-test locally: all warnings addressed, tests passed
  • I used AI for code, documentation, tests, or communication related to this PR

@github-actions github-actions bot added the docs, backend, frontend, docker, backend-tests, makefile, and ci labels on Jan 17, 2026
@coderabbitai
Contributor

coderabbitai bot commented Jan 17, 2026

Summary by CodeRabbit

  • New Features

    • CI workflows to run frontend E2E with real backend/cache and separate GraphQL/REST fuzz jobs; local/CI test entrypoints and Docker compose setups.
    • Example environment files for E2E and fuzzing.
  • Improvements

    • Unified validation error responses across REST endpoints.
    • GraphQL resolvers optimized for more efficient data loading.
    • Middleware to reject requests containing null characters.
  • Documentation

    • CONTRIBUTING updated with E2E and fuzz test instructions.


Walkthrough

Adds E2E and fuzz test CI and compose infrastructure, new backend E2E/fuzz settings and middleware, Schemathesis-based fuzz tests and Dockerfiles, large Strawberry/Django GraphQL refactor (root-based fields, limits, query changes), DB indexes/migrations, dump_data command, and widespread test updates and Makefile targets.

Changes

Cohort / File(s) / Summary

• CI / Workflows
  Files: .github/workflows/run-ci-cd.yaml, .github/workflows/run-fuzz-tests.yaml, .github/workflows/setup-backend-environment/action.yaml, .github/workflows/update-nest-test-images.yaml
  Summary: Added E2E and fuzz workflows, a reusable backend setup action, services (Postgres pgvector, Redis), healthchecks, backend readiness and data-load steps, and fuzz job inputs.

• Docker & Compose
  Files: docker/backend/... (docker/backend/Dockerfile.fuzz, docker/backend/Dockerfile.local, docker/backend/Dockerfile.test, docker/backend/Dockerfile), docker-compose/e2e/compose.yaml, docker-compose/fuzz/compose.yaml, docker-compose/.../compose.yaml
  Summary: New fuzz Dockerfile, minor Dockerfile tweaks (postgres client, removed CMD), new e2e/fuzz compose manifests, and added entrypoints in prod/staging compose files.

• Backend settings & middleware
  Files: backend/settings/base.py, backend/settings/e2e.py, backend/settings/fuzz.py, backend/apps/common/middlewares/*
  Summary: Introduced IS_E2E_ENVIRONMENT and IS_FUZZ_ENVIRONMENT, a Redis auth toggle, new E2E/fuzz settings modules, and BlockNullCharactersMiddleware registered in MIDDLEWARE.

• Makefiles & entrypoints
  Files: backend/Makefile, frontend/Makefile, backend/entrypoint.fuzz.sh
  Summary: Added e2e/fuzz-specific Make targets (run/load/purge/etc.), switched frontend E2E orchestration to docker-compose, and added a fuzz entrypoint script for Schemathesis/pytest.

• GraphQL resolvers & queries
  Files: backend/apps/.../nodes/..., backend/apps/.../queries/... (many under apps/github, apps/owasp, apps/mentorship, etc.)
  Summary: Mass refactor from @strawberry.field to @strawberry_django.field, resolvers converted to accept root, select_related/prefetch hints added, MAX_LIMIT caps introduced, Subquery/OuterRef distinct logic replaced with Window/Rank, and enums/pagination guards added.

• REST API validation & schemas
  Files: backend/apps/api/rest/v0/__init__.py, backend/apps/api/rest/v0/common.py, and many endpoints under apps/api/rest/v0
  Summary: Added ValidationErrorSchema, a global ValidationError handler, and HTTP 400 response mappings across many REST endpoints; event date fields serialized to ISO strings.

• Models & migrations
  Files: backend/apps/github/models/*, backend/apps/owasp/models/*, backend/apps/.../migrations/*
  Summary: Added DB indexes (milestone, snapshot, project GIN/trigram), a trigram extension migration, and model refactors (ORM counts, cached properties, eager loading, project health metrics changes).

• Management commands & data handling
  Files: backend/apps/common/management/commands/dump_data.py, (removed) backend/apps/common/management/commands/load_data.py, backend/pyproject.toml
  Summary: Added dump_data (temp DB, email redaction, pg_dump), removed the legacy load_data command, and added per-file linter ignores for the new command.

• Fuzz tests & test infra
  Files: backend/tests/fuzz/*, backend/tests/apps/common/management/commands/dump_data_test.py
  Summary: Added Schemathesis fuzz tests for GraphQL and REST, dump_data unit tests, and CI fuzz orchestration artifacts.

• Unit tests / test refactor
  Files: many backend/tests/apps/* files
  Summary: Introduced GraphQLNodeBaseTest, migrated tests to resolver-wrapped invocation, updated mocks to match ORM/query changes, removed load_data tests, and adjusted expected data shapes.

• Frontend env & queries
  Files: frontend/.env.e2e.example, frontend/.env.example, frontend/src/server/queries/homeQueries.ts, frontend/Makefile
  Summary: Added a frontend E2E env example, moved NEXTAUTH vars, removed the milestone state arg from the recentMilestones query, and updated the frontend E2E target to use docker-compose.

• Repo metadata & docs
  Files: .gitignore, CONTRIBUTING.md, cspell/custom-dict.txt
  Summary: Updated ignores for fuzz/e2e artifacts, added CONTRIBUTING guidance for E2E/fuzz workflows, and extended the custom dictionary with new tooling terms.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes


Suggested reviewers

  • kasya
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 inconclusive

❌ Failed checks (1 inconclusive)
• Description check (❓ Inconclusive): The description is a template with placeholder text and an incomplete checklist, but it references the PR objective of adding e2e backend and fuzz testing, which relates to the changeset. Resolution: Replace template text with a concrete description of the e2e backend setup and fuzz testing implementation details.

✅ Passed checks (2 passed)
• Title check (✅ Passed): The title clearly describes the main changes: implementing e2e backend instance and fuzz testing for GraphQL and REST endpoints.
• Docstring Coverage (✅ Passed): Docstring coverage is 93.85%, which is sufficient. The required threshold is 80.00%.



@github-actions

PR validation failed: No linked issue and no valid closing issue reference in PR description

@github-actions github-actions bot closed this Jan 17, 2026
@ahmedxgouda ahmedxgouda reopened this Jan 17, 2026
@ahmedxgouda ahmedxgouda marked this pull request as ready for review January 17, 2026 04:32
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 17

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
backend/apps/github/api/internal/queries/pull_request.py (1)

52-52: Docstring doesn't match implementation.

The docstring states "unique pull requests per author and repository," but the implementation at lines 94-96 only partitions by author_id. Either update the docstring to reflect "unique pull requests per author" or update the partition_by to include repository_id.

📝 Proposed docstring fix
-            distinct (bool): Whether to return unique pull requests per author and repository.
+            distinct (bool): Whether to return unique pull requests per author.

Or if the original intent was to partition by both:

                     partition_by=[F("author_id")],
+                    partition_by=[F("author_id"), F("repository_id")],
backend/apps/owasp/api/internal/queries/member_snapshot.py (1)

48-74: Remove decorator hints from member_snapshots method.

The @strawberry_django.field decorator should not use select_related or prefetch_related parameters for methods returning relationship objects (list[MemberSnapshotNode]). Per strawberry-graphql-django conventions, apply these optimizations inside the resolver instead. The current decorator hints also reference nested relationships (github_user__owasp_profile, user_badges__badge) that are not exposed by MemberSnapshotNode's schema, making them unnecessary. Follow the pattern used by the singular member_snapshot method: remove the decorator parameters and rely on the in-resolver optimizations.

♻️ Suggested fix
-    @strawberry_django.field(
-        select_related=["github_user__owasp_profile", "github_user__user_badges__badge"],
-        prefetch_related=["issues", "pull_requests", "messages"],
-    )
+    @strawberry_django.field
     def member_snapshots(
         self, user_login: str | None = None, limit: int = 10
     ) -> list[MemberSnapshotNode]:
backend/apps/owasp/api/internal/queries/project.py (1)

93-104: Avoid icontains("") which matches everything.

If github_user.name is None (or empty), leaders_raw__icontains="" will match all projects, making is_project_leader return True for any existing project. That’s a correctness bug and could grant unintended access.

🐛 Proposed fix
-        return Project.objects.filter(
-            Q(leaders_raw__icontains=github_user.login)
-            | Q(leaders_raw__icontains=(github_user.name or ""))
-        ).exists()
+        filters = Q(leaders_raw__icontains=github_user.login)
+        if github_user.name:
+            filters |= Q(leaders_raw__icontains=github_user.name)
+        return Project.objects.filter(filters).exists()
🤖 Fix all issues with AI agents
In `@backend/apps/common/management/commands/dump_data.py`:
- Around line 87-93: The finally block is catching CalledProcessError but
_execute_sql raises psycopg2 exceptions; change the exception handler to catch
psycopg2.Error (or psycopg2.DatabaseError) instead of CalledProcessError, and
log the exception details (e.g., include the exception message variable) in the
warning so failures in _execute_sql within dump_data.py are actually reported;
reference the _execute_sql call and replace the except CalledProcessError branch
with except psycopg2.Error as e and include e in the self.stderr.write warning.
- Around line 54-56: The CREATE DATABASE call in dump_data.py currently
interpolates temp_db and DB_NAME via f-strings in self._execute_sql which is
unsafe; update the code that builds the SQL for CREATE DATABASE (the call using
self._execute_sql("postgres", [...])) to construct the statement with
psycopg2.sql.SQL and psycopg2.sql.Identifier (e.g., SQL("CREATE DATABASE {}
TEMPLATE {}").format(Identifier(temp_db), Identifier(DB_NAME))) so identifiers
are properly quoted/escaped before passing to _execute_sql; ensure any other
places that interpolate database names (temp_db, DB_NAME) use the same
sql.Identifier approach.
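
A minimal sketch combining both suggestions, assuming it runs inside the Command class and that temp_db, DB_NAME, and the _execute_sql helper are the ones defined in dump_data.py:

import psycopg2
from psycopg2 import sql

# Quote identifiers with sql.Identifier instead of f-string interpolation.
create_db = sql.SQL("CREATE DATABASE {} TEMPLATE {}").format(
    sql.Identifier(temp_db),
    sql.Identifier(DB_NAME),
)
try:
    self._execute_sql("postgres", [create_db])
except psycopg2.Error as e:  # psycopg2 raises these, not CalledProcessError
    self.stderr.write(f"Warning: SQL execution failed: {e}")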

In `@backend/apps/github/api/internal/nodes/pull_request.py`:
- Around line 38-41: In the url resolver method (def url(self, root:
PullRequest) -> str) remove the redundant fallback and return root.url directly
(i.e., replace 'return root.url or ""' with 'return root.url') so the
PullRequest url field matches other node resolvers like
RepositoryNode/UserNode/ReleaseNode/OrganizationNode that return root.url
without an empty-string fallback.

In `@backend/apps/github/api/internal/queries/issue.py`:
- Around line 72-81: The distinct logic in the
queryset.annotate(rank=Window(...)) currently partitions only by F("author_id")
so it returns one issue per author globally; update the partition_by argument in
the Window expression used in this block (the Rank() window annotation on
queryset in the distinct branch) to include F("repository_id") as well (i.e.,
partition_by=[F("author_id"), F("repository_id")]) so uniqueness is enforced per
author-and-repository as the docstring promises.
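
A short sketch of the adjusted window annotation, using the same Django constructs the prompt references:

from django.db.models import F, Window
from django.db.models.functions import Rank

# Partition by author AND repository so "distinct" matches the docstring.
queryset = queryset.annotate(
    rank=Window(
        expression=Rank(),
        partition_by=[F("author_id"), F("repository_id")],
        order_by=F("created_at").desc(),
    )
).filter(rank=1)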

In `@backend/apps/github/api/internal/queries/pull_request.py`:
- Around line 91-102: The unit tests for the distinct=True branch in
pull_request_test.py use MagicMock and only assert
annotate()/filter()/order_by() call counts, so they don't validate actual SQL or
DB behavior; add an integration test that runs the distinct branch against a
real test database (using Django TestCase or TransactionTestCase) by invoking
the query construction in
backend/apps/github/api/internal/queries/pull_request.py (the function that
builds the queryset for pull requests) and asserting the returned rows and/or
generated SQL respect the windowed rank filtering (e.g., only the latest per
author), ensuring the Window(Rank()) annotate +
filter(rank=1).order_by("-created_at") path is executed against the DB rather
than mocked.
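
A hedged sketch of such an integration test; the fixture setup is elided since the model factories are project-specific, and the uniqueness assertion is the point:

from django.db.models import F, Window
from django.db.models.functions import Rank
from django.test import TestCase

from apps.github.models.pull_request import PullRequest


class DistinctPullRequestsTest(TestCase):
    """Exercise the Window(Rank()) branch against a real test database."""

    def test_one_pull_request_per_author(self):
        # Load representative rows here (fixtures or factories).
        queryset = PullRequest.objects.annotate(
            rank=Window(
                expression=Rank(),
                partition_by=[F("author_id")],
                order_by=F("created_at").desc(),
            )
        ).filter(rank=1)
        author_ids = list(queryset.values_list("author_id", flat=True))
        # Each author should appear at most once after the rank filter.
        assert len(author_ids) == len(set(author_ids))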

In `@backend/apps/github/api/internal/queries/release.py`:
- Around line 61-70: The window partition currently groups only by
F("author_id"), which contradicts the docstring that promises unique releases
per author and repository; update the distinct logic where queryset is annotated
with Window/Rank (the block referencing rank=Window(expression=Rank(),
partition_by=[F("author_id")], ...)) to include F("repository_id") in the
partition_by list so the partitioning becomes [F("author_id"),
F("repository_id")] before filtering rank=1.

In `@backend/apps/github/api/internal/queries/repository.py`:
- Around line 40-42: Remove the select_related hints from the
`@strawberry_django.field` decorator in repository.py and instead apply ORM
optimization inside the field's resolver (or rely on DjangoOptimizerExtension).
Concretely, delete the select_related argument from the decorator and in the
resolver for this field use queryset.select_related("organization",
"owner__owasp_profile").prefetch_related("owner__user_badges__badge") (or let
DjangoOptimizerExtension handle it) so reverse/many relationships are prefetched
correctly and Strawberry schema hints are not exposed.

In `@backend/apps/github/api/internal/queries/user.py`:
- Line 40: The decorator currently passes the reverse FK path to select_related
which cannot traverse reverse relations; update the `@strawberry_django.field`
call so that select_related only includes "owasp_profile" and add
prefetch_related with "user_badges__badge" (i.e., use
select_related=["owasp_profile"] and prefetch_related=["user_badges__badge"]) to
correctly prefetch the reverse user_badges relation while keeping owasp_profile
as a select_related.

In `@backend/apps/owasp/api/internal/nodes/committee.py`:
- Around line 18-22: The created_at resolver (def created_at(self, root:
Committee) -> float) can return None via root.idx_created_at which contradicts
its non-nullable float signature; add a guard that returns a default float
(e.g., 0.0) when root.idx_created_at is falsy/None, mirroring the pattern used
by contributors_count/forks_count—ensure the resolver always returns a float
value.

In `@backend/apps/owasp/api/internal/permissions/project_health_metrics.py`:
- Around line 12-22: In has_permission, guard against user.github_user being
None to avoid AttributeError; update the final condition in has_permission to
verify github_user exists before accessing is_owasp_staff (for example check
user.github_user is not None or use getattr(user.github_user, "is_owasp_staff",
False)) so the method returns False when github_user is missing while preserving
behavior in settings.IS_E2E_ENVIRONMENT/IS_FUZZ_ENVIRONMENT.
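
A minimal sketch of the guarded check, assuming a Strawberry permission class that reads the user from info.context:

from django.conf import settings

def has_permission(self, source, info, **kwargs) -> bool:
    """Allow staff only; tolerate users without a linked github_user."""
    if settings.IS_E2E_ENVIRONMENT or settings.IS_FUZZ_ENVIRONMENT:
        return True
    user = info.context.request.user
    # getattr returns False when github_user is None instead of raising.
    return user.is_authenticated and getattr(user.github_user, "is_owasp_staff", False)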

In `@backend/apps/owasp/api/internal/queries/post.py`:
- Around line 16-19: Update the recent_posts resolver docstring to reflect the
behavior: mention that it returns up to `limit` most recent posts (default 5)
and that `limit` is capped by `MAX_LIMIT`; locate the docstring on the
`recent_posts` method (decorated with `@strawberry_django.field`) which calls
`Post.recent_posts()` and refers to `MAX_LIMIT` and adjust the text so it no
longer claims a fixed "5" posts.

In `@backend/apps/owasp/api/internal/views/permissions.py`:
- Around line 11-15: Add a startup assertion inside the Production configuration
class to prevent accidentally enabling E2E/fuzz auth bypass in production: in
the Production config (the settings class used for production) override __init__
(call super().__init__(*args, **kwargs)) and then assert not
self.IS_E2E_ENVIRONMENT and not self.IS_FUZZ_ENVIRONMENT (or raise RuntimeError
with a clear message) so startup fails if those flags are true; reference the
Production config class and the IS_E2E_ENVIRONMENT / IS_FUZZ_ENVIRONMENT
attributes when making this change.
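
A sketch of the fail-fast guard, assuming a class-based settings layout where Production can hook initialization as described above (the Base parent name is an assumption):

class Production(Base):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Refuse to start if E2E/fuzz auth bypass flags leak into production.
        if self.IS_E2E_ENVIRONMENT or self.IS_FUZZ_ENVIRONMENT:
            msg = "E2E/fuzz environment flags must be disabled in production"
            raise RuntimeError(msg)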

In `@backend/apps/owasp/models/project_health_metrics.py`:
- Around line 149-152: The property project_requirements currently calls
ProjectHealthRequirements.objects.get(level=self.project.level) which will raise
if no record exists; change it to a safe lookup that returns None when missing
(e.g., use
ProjectHealthRequirements.objects.filter(level=self.project.level).first() or
catch ProjectHealthRequirements.DoesNotExist and return None) so the annotated
return type ProjectHealthRequirements | None is honored and the other accessors
can continue to check if self.project_requirements before use.

In `@backend/entrypoint.fuzz.sh`:
- Around line 10-49: The curl call that sets CSRF_TOKEN should include connect
and overall timeouts to avoid hanging in CI and fail fast on network issues
(e.g., add appropriate --connect-timeout and --max-time options to the curl
invocation that produces CSRF_TOKEN); also protect the pytest invocations from
word-splitting/globbing by quoting the TEST_FILE expansion in both branches
(wrap the ./tests/${TEST_FILE} operand in quotes) so filenames with spaces or
special chars are handled safely.

In `@backend/pyproject.toml`:
- Around line 105-108: Update the per-file ruff ignores for
lint.per-file-ignores."**/management/commands/dump_data.py": remove the
unnecessary "S603" entry (since dump_data.py does not use shell=True) and keep
only "S607", or if you prefer to keep S603 for future caution, add an inline
comment next to the ignore explaining why it's retained; reference the ignore
list key lint.per-file-ignores."**/management/commands/dump_data.py" and the
rule IDs "S603" and "S607" when making the change so reviewers can see you
either narrowed the suppression to S607 or documented the justification for
S603.

In `@docker-compose/e2e/compose.yaml`:
- Around line 22-28: The healthcheck for the backend service uses wget but the
backend image (python:3.13.7-alpine) doesn't include it, causing the check to
always fail; either install wget in the backend Dockerfile by adding a package
install (e.g., apk add --no-cache wget) so the healthcheck's test: wget --spider
http://backend:9000/a/ succeeds, or replace the healthcheck block
(healthcheck/test) with a Python-based check that performs an HTTP GET to
http://backend:9000/a/ (using python -c 'import sys,urllib.request;
urllib.request.urlopen("http://backend:9000/a/")' and exiting nonzero on
failure) so the backend service can become healthy.

In `@frontend/.env.e2e.example`:
- Around line 1-16: Reorder the env keys so NEXTAUTH_* entries appear before
NEXT_PUBLIC_* to satisfy dotenv-linter ordering: move NEXTAUTH_SECRET and
NEXTAUTH_URL to above the NEXT_PUBLIC_API_URL (or above the first NEXT_PUBLIC_*
block), keeping all other keys unchanged; ensure the file still contains
NEXT_SERVER_* entries and secrets unchanged, or alternatively add a
dotenv-linter exemption comment if you prefer to keep the current order.
🧹 Nitpick comments (34)
backend/apps/mentorship/api/internal/queries/mentorship.py (2)

53-88: Approve changes with a minor consistency suggestion.

The refactor to return None instead of raising exceptions is appropriate. However, consider adding exc_info=True to the exception handler at lines 85-88 for consistency with program.py and module.py, which include the traceback for debugging.

Optional: Add exc_info for consistency
         except (Module.DoesNotExist, GithubUser.DoesNotExist, Mentee.DoesNotExist) as e:
             message = f"Mentee details not found: {e}"
-            logger.warning(message)
+            logger.warning(message, exc_info=True)
             return None

129-132: Same consistency suggestion for this exception handler.

Consider adding exc_info=True here as well.

Optional: Add exc_info for consistency
         except (Module.DoesNotExist, GithubUser.DoesNotExist, Mentee.DoesNotExist) as e:
             message = f"Mentee issues not found: {e}"
-            logger.warning(message)
+            logger.warning(message, exc_info=True)
             return []
backend/apps/api/rest/v0/organization.py (1)

82-82: Minor documentation inconsistency (pre-existing).

The endpoint description (line 82) says "Retrieve project details" and the docstring (line 96) says "Get project", but this endpoint handles organizations. Consider fixing for clarity.

📝 Suggested fix
 @router.get(
     "/{str:organization_id}",
-    description="Retrieve project details.",
+    description="Retrieve organization details.",
     operation_id="get_organization",
 def get_organization(
     request: HttpRequest,
     organization_id: str = Path(example="OWASP"),
 ) -> OrganizationDetail | OrganizationError:
-    """Get project."""
+    """Get organization."""

Also applies to: 96-96

backend/apps/api/rest/v0/__init__.py (2)

61-82: E2E and Fuzz settings are nearly identical.

These two blocks differ only in the "description" string. Consider consolidating them:

elif settings.IS_E2E_ENVIRONMENT or settings.IS_FUZZ_ENVIRONMENT:
    env_name = "E2E" if settings.IS_E2E_ENVIRONMENT else "Fuzz"
    api_settings_customization = {
        "auth": None,
        "servers": [{"description": env_name, "url": settings.SITE_URL}],
        "throttle": [],
    }

That said, keeping them separate follows the existing pattern (e.g., IS_LOCAL_ENVIRONMENT) and allows for independent future changes—so this is purely optional.


107-114: Add type hints to the exception handler.

The handler is missing type annotations for better maintainability and IDE support.

♻️ Proposed fix
+from django.http import HttpRequest, HttpResponse
+
 @api.exception_handler(ValidationError)
-def validation_exception_handler(request, exc):
+def validation_exception_handler(request: HttpRequest, exc: ValidationError) -> HttpResponse:
     """Handle validation exceptions."""
     return api.create_response(
         request,
         {"message": "Invalid request", "errors": exc.errors},
         status=400,
     )
backend/tests/apps/api/rest/v0/event_test.py (1)

4-9: Simplify date construction; timezone lookup is redundant.

Line 9 captures the current timezone, but the value is immediately discarded by .date() in Lines 17, 22, and 32. Using date(...) directly is simpler and avoids the global timezone dependency.

♻️ Proposed simplification
-from datetime import datetime
+from datetime import date

-import pytest
-from django.utils import timezone
+import pytest

-from apps.api.rest.v0.event import EventDetail
-from apps.owasp.models.event import Event as EventModel
-
-current_timezone = timezone.get_current_timezone()
+from apps.api.rest.v0.event import EventDetail
+from apps.owasp.models.event import Event as EventModel
@@
-        EventModel(
+        EventModel(
             description="this is a sample event",
-            end_date=datetime(2023, 6, 15, tzinfo=current_timezone).date(),
+            end_date=date(2023, 6, 15),
             key="sample-event",
             latitude=59.9139,
             longitude=10.7522,
             name="sample event",
-            start_date=datetime(2023, 6, 14, tzinfo=current_timezone).date(),
+            start_date=date(2023, 6, 14),
             url="https://github.com/owasp/Nest",
         ),
@@
-        EventModel(
+        EventModel(
             description=None,
             end_date=None,
             key="event-without-end-date",
             latitude=None,
             longitude=None,
             name="event without end date",
-            start_date=datetime(2023, 7, 1, tzinfo=current_timezone).date(),
+            start_date=date(2023, 7, 1),
             url=None,
         ),

Also applies to: 15-33

backend/apps/owasp/migrations/0071_trigram_extension.py (1)

7-13: Verify pg_trgm availability in target DBs.
CREATE EXTENSION requires appropriate privileges; please confirm CI/staging/prod roles allow this migration.

.github/workflows/update-nest-test-images.yaml (1)

77-89: Add GHA cache-from for fuzz image builds.

Other image build steps use both GHA and registry caches; this step only pulls registry cache, so the GHA cache you push won’t be reused. Adding type=gha keeps caching consistent and speeds CI.

♻️ Proposed tweak
-          cache-from: type=registry,ref=owasp/nest:test-fuzz-backend-cache
+          cache-from: |
+            type=gha
+            type=registry,ref=owasp/nest:test-fuzz-backend-cache
frontend/Makefile (1)

82-88: Clear Redis volume to avoid cross-run cache bleed.

Only the DB volume is removed; the cache volume persists and can leak state between runs. Consider clearing nest-e2e_e2e-cache-data as well.

♻️ Suggested cleanup
 test-frontend-e2e:
 	@docker container rm -f e2e-nest-db >/dev/null 2>&1 || true
 	@docker volume rm -f nest-e2e_e2e-db-data >/dev/null 2>&1 || true
+	@docker volume rm -f nest-e2e_e2e-cache-data >/dev/null 2>&1 || true
 	@DOCKER_BUILDKIT=1 \
 	docker compose --project-name nest-e2e -f docker-compose/e2e/compose.yaml up --build --remove-orphans --abort-on-container-exit db cache backend data-loader
docker/backend/Dockerfile.fuzz (1)

26-29: Redundant chmod +x after COPY with --chmod=755.

Line 26 already sets executable permissions via --chmod=755, making line 29 unnecessary.

🔧 Suggested fix
 COPY --chown=root:root --chmod=755 ./entrypoint.fuzz.sh ./entrypoint.sh
 COPY --chown=root:root --chmod=755 tests/fuzz tests
 
-RUN chmod +x ./entrypoint.sh
-
 USER owasp
backend/tests/fuzz/rest_test.py (1)

26-30: Add type hint for consistency with GraphQL test.

The case parameter lacks a type hint, unlike test_graphql_api which uses case: schemathesis.Case.

🔧 Suggested fix
 @schema.parametrize()
-def test_rest_api(case):
+def test_rest_api(case: schemathesis.Case) -> None:
     """Test REST API endpoints."""
     logger.info(case.as_curl_command())
     case.call_and_validate(headers=HEADERS)
backend/tests/fuzz/graphql_test.py (1)

32-33: Consider narrowing the scope of suppressed errors or logging them.

Blanket suppression of GraphQLClientError may hide unexpected issues during fuzz testing. Consider logging suppressed errors at DEBUG level for traceability.

🔧 Suggested enhancement
+from schemathesis.graphql.checks import GraphQLClientError
+
 @schema.parametrize()
 def test_graphql_api(case: schemathesis.Case) -> None:
     """Test GraphQL API endpoints."""
     logger.info(case.as_curl_command())
-    with suppress(GraphQLClientError):
-        case.call_and_validate(headers=HEADERS)
+    try:
+        case.call_and_validate(headers=HEADERS)
+    except GraphQLClientError as e:
+        logger.debug("GraphQL client error (expected): %s", e)
docker-compose/fuzz/compose.yaml (1)

85-85: Redis password exposed in process list via -a flag.

The redis-cli -a $$REDIS_PASSWORD ping command exposes the password in the process list. For a fuzz testing environment this is likely acceptable, but consider using --pass from a file or REDISCLI_AUTH environment variable for production-like setups.

.github/workflows/run-ci-cd.yaml (1)

288-301: Redundant port mapping with --network host.

The -p 9000:9000 flag is ignored when using --network host since host networking doesn't use Docker's port mapping. While harmless, removing it would reduce confusion.

♻️ Suggested fix
       - name: Start Backend in the background
         run: |
           docker run -d --rm --name e2e-nest-backend \
             --env-file backend/.env.e2e.example \
             --network host \
             -e DJANGO_DB_HOST=localhost \
             -e DJANGO_REDIS_AUTH_ENABLED=False \
             -e DJANGO_REDIS_HOST=localhost \
-            -p 9000:9000 \
             owasp/nest:test-backend-latest \
             sh -c '
               python manage.py migrate &&
               gunicorn wsgi:application --bind 0.0.0.0:9000
           '
.github/workflows/run-fuzz-tests.yaml (1)

57-70: Redundant port mapping with --network host.

Same as the E2E workflow, the -p 9500:9500 flag is ignored when using --network host.

♻️ Suggested fix
       - name: Run backend with fuzz environment variables
         run: |
           docker run -d --rm --name fuzz-nest-backend \
             --env-file backend/.env.fuzz.example \
             --network host \
             -e DJANGO_DB_HOST=localhost \
             -e DJANGO_REDIS_AUTH_ENABLED=False \
             -e DJANGO_REDIS_HOST=localhost \
-            -p 9500:9500 \
             owasp/nest:test-backend-latest \
             sh -c '
               python manage.py migrate &&
               gunicorn wsgi:application --bind 0.0.0.0:9500
             '
backend/apps/common/management/commands/dump_data.py (2)

102-106: Type hint mismatch: method returns sql.Composed objects, not str.

sql.SQL(...).format(...) returns sql.Composed objects. The return type hint list[str] is incorrect and may confuse static analyzers.

✏️ Suggested fix
+from psycopg2.sql import Composed
+
-    def _remove_emails(self, tables: list[str]) -> list[str]:
+    def _remove_emails(self, tables: list[str]) -> list[Composed]:
         return [
             sql.SQL("UPDATE {table} SET email = '';").format(table=sql.Identifier(table))
             for table in tables
         ]

108-130: Connection may leak on exception; use context manager.

If an exception occurs during query execution, connection.close() is never called. Use the connection as a context manager or wrap in try/finally.

Additionally, the sql_queries parameter accepts both str and sql.Composed objects but the type hint only specifies str.

♻️ Proposed fix
+from typing import Union
+from psycopg2.sql import Composable
+
     def _execute_sql(
         self,
         dbname: str,
-        sql_queries: list[str],
+        sql_queries: list[Union[str, Composable]],
     ):
-        connection = connect(
+        with connect(
             dbname=dbname,
             user=DB_USER,
             password=DB_PASSWORD,
             host=DB_HOST,
             port=DB_PORT,
-        )
-        connection.autocommit = True
-
-        rows = []
-        with connection.cursor() as cursor:
-            for sql_query in sql_queries:
-                cursor.execute(sql_query)
-                with contextlib.suppress(ProgrammingError):
-                    rows.extend(cursor.fetchall())
-        connection.close()
-
-        return rows
+        ) as connection:
+            connection.autocommit = True
+            rows = []
+            with connection.cursor() as cursor:
+                for sql_query in sql_queries:
+                    cursor.execute(sql_query)
+                    with contextlib.suppress(ProgrammingError):
+                        rows.extend(cursor.fetchall())
+            return rows
backend/Makefile (1)

124-125: Consider validating APP_NAME is set.

If APP_NAME is not provided, the command will run with an empty argument, potentially causing a confusing Django error. Consider adding validation or a usage hint.

✏️ Optional enhancement
 migrations-empty:
+ifndef APP_NAME
+	$(error APP_NAME is required. Usage: make migrations-empty APP_NAME=<app_name>)
+endif
 	@CMD="python manage.py makemigrations --empty $(APP_NAME)" $(MAKE) exec-backend-command
backend/apps/github/models/repository.py (1)

175-177: Keep recent_milestones return type consistent.
Returning [] changes the type from QuerySet to list, which can break callers that chain queryset methods. Consider returning an empty QuerySet instead.

♻️ Proposed adjustment
-        return self.milestones.order_by("-created_at") if self.pk else []
+        if not self.pk:
+            return self.milestones.model.objects.none()
+        return self.milestones.order_by("-created_at")
backend/tests/apps/owasp/models/committee_test.py (1)

28-33: Consider clearing the lru_cache to ensure test isolation.

The active_committees_count method uses @lru_cache, which persists across test runs in the same process. If multiple parameterized values are tested or other tests call this method first, the cached result may be returned instead of the mocked value.

♻️ Suggested fix to clear the cache
     @pytest.mark.parametrize(
         ("value"),
         [
             23,
         ],
     )
     def test_active_committees_count(self, value):
+        Committee.active_committees_count.cache_clear()
         with patch.object(Committee.objects, "filter") as mock_filter:
             mock_filter.return_value.count.return_value = value
             count = Committee.active_committees_count()
             assert count == value
             mock_filter.assert_called_once_with(has_active_repositories=True)
backend/tests/apps/owasp/models/chapter_test.py (1)

109-124: Consider clearing lru_cache between tests.

The active_chapters_count method uses @lru_cache, which persists across test runs within the same process. This could cause test pollution if other tests call this method.

Consider adding cache clearing in test setup or using Chapter.active_chapters_count.cache_clear() to ensure test isolation.

♻️ Suggested fix
     @pytest.mark.parametrize(
         ("value"),
         [
             42,
         ],
     )
     def test_active_chapters_count(self, value):
+        Chapter.active_chapters_count.cache_clear()
         with patch.object(Chapter.objects, "filter") as mock_filter:
             mock_filter.return_value.count.return_value = value
             assert Chapter.active_chapters_count() == value
             mock_filter.assert_called_once_with(
                 is_active=True,
                 latitude__isnull=False,
                 longitude__isnull=False,
                 owasp_repository__is_empty=False,
             )
backend/tests/apps/owasp/api/internal/queries/member_snapshot_test.py (1)

35-50: Tighten eager‑loading assertions to lock in the intended relations.

Right now the test only checks call counts. Asserting the exact select_related/prefetch_related args (and filter args) makes regressions easier to catch.

♻️ Suggested test hardening
-            # select_related and prefetch_related are called, then filter twice
-            mock_queryset.select_related.assert_called_once()
-            mock_queryset.prefetch_related.assert_called_once()
-            assert mock_queryset.filter.call_count == 2
+            # select_related and prefetch_related are called, then filter twice
+            mock_queryset.select_related.assert_called_once_with("github_user")
+            mock_queryset.prefetch_related.assert_called_once_with(
+                "issues", "pull_requests", "messages"
+            )
+            mock_queryset.filter.assert_any_call(github_user=mock_user)
+            mock_queryset.filter.assert_any_call(start_at__year=2025)
backend/tests/apps/owasp/api/internal/nodes/chapter_test.py (1)

77-78: Add an explicit null-guard before invoking the resolver.

This makes failures clearer if the field lookup ever changes.

💡 Suggested tweak
 field = self._get_field_by_name("contribution_stats", ChapterNode)
+assert field is not None
 result = field.base_resolver.wrapped_func(None, instance)
backend/apps/owasp/models/project.py (1)

222-225: Consider prefetch-friendly access for last_health_metrics.

Ordering on the related manager issues a query per project and bypasses any prefetch cache; if this property is used in lists, consider a Prefetch(..., to_attr=...) or an annotation-based approach for the latest metric.
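
A hedged sketch of a prefetch-friendly alternative; the health_metrics related name and the nest_created_at ordering are assumptions based on this PR:

from django.db.models import Prefetch

from apps.owasp.models.project import Project
from apps.owasp.models.project_health_metrics import ProjectHealthMetrics

projects = Project.objects.prefetch_related(
    Prefetch(
        "health_metrics",
        queryset=ProjectHealthMetrics.objects.order_by("-nest_created_at"),
        to_attr="ordered_health_metrics",
    )
)
for project in projects:
    # No per-project query: read the prefetched, pre-ordered list instead.
    latest = project.ordered_health_metrics[0] if project.ordered_health_metrics else None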

backend/tests/apps/owasp/api/internal/nodes/member_snapshot_test.py (1)

35-36: Guard against missing field lookups to avoid AttributeError.

_get_field_by_name can return None, which would make field.base_resolver crash with a less clear failure. Consider asserting the field exists (or updating the helper to raise a clearer error) before invocation.

🔧 Suggested tweak (apply similarly to each resolver test)
-        field = self._get_field_by_name("commits_count", MemberSnapshotNode)
-        result = field.base_resolver.wrapped_func(None, mock_snapshot)
+        field = self._get_field_by_name("commits_count", MemberSnapshotNode)
+        assert field is not None
+        result = field.base_resolver.wrapped_func(None, mock_snapshot)

Also applies to: 46-47, 56-57, 66-67, 76-77, 86-87

backend/tests/apps/github/api/internal/nodes/issue_test.py (1)

45-47: Add a guard for missing fields to improve test failure clarity.

If _get_field_by_name returns None, the failure will be an attribute error instead of a clear “field not found.” Consider asserting the field exists before invoking base_resolver.

💡 Suggested tweak (apply to each lookup)
-        field = self._get_field_by_name("organization_name", IssueNode)
-        result = field.base_resolver.wrapped_func(None, mock_issue)
+        field = self._get_field_by_name("organization_name", IssueNode)
+        assert field is not None
+        result = field.base_resolver.wrapped_func(None, mock_issue)

Also applies to: 56-58, 65-67, 76-78, 85-87

backend/tests/apps/github/api/internal/nodes/pull_request_test.py (1)

39-41: Consider asserting field existence before calling the resolver.

This makes schema drift failures more readable than an attribute error.

💡 Suggested tweak (apply to each lookup)
-        field = self._get_field_by_name("organization_name", PullRequestNode)
-        result = field.base_resolver.wrapped_func(None, mock_pr)
+        field = self._get_field_by_name("organization_name", PullRequestNode)
+        assert field is not None
+        result = field.base_resolver.wrapped_func(None, mock_pr)

Also applies to: 50-52, 59-61, 70-72, 79-81, 88-90, 97-99

backend/apps/github/api/internal/nodes/organization.py (1)

42-71: Consider adding select_related or query optimization hints.

The stats resolver makes multiple database queries (repositories count, aggregate, and contributors count). While the aggregation logic is correct, the resolver doesn't leverage strawberry_django.field's query optimization hints like other resolvers in the PR do.

If this resolver is called frequently, consider whether any of the related data could benefit from caching or whether the queries could be combined. However, since the queries are already using aggregations efficiently, this is a minor optimization opportunity.
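
One way to fold the repository-side queries into a single aggregate call; the field names (stars_count, forks_count) are assumptions about the Repository model:

from django.db.models import Count, Sum

def stats(self, root):  # sketch of a combined resolver body
    totals = root.repositories.aggregate(
        repositories_count=Count("id"),
        stars_count=Sum("stars_count"),
        forks_count=Sum("forks_count"),
    )
    # One query replaces the separate count() and aggregate() calls;
    # the contributors count would still need its own query.
    return totals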

backend/apps/github/api/internal/queries/repository.py (1)

59-67: Consider simplifying the limit validation for readability.

The walrus operator with the ternary is functionally correct but somewhat dense. A more explicit approach could improve readability:

♻️ Optional refactor for clarity
-        return (
-            (
-                Repository.objects.filter(
-                    organization__login__iexact=organization,
-                ).order_by("-stars_count")[:limit]
-            )
-            if (limit := min(limit, MAX_LIMIT)) > 0
-            else []
-        )
+        limit = min(limit, MAX_LIMIT)
+        if limit <= 0:
+            return []
+        return list(
+            Repository.objects.filter(
+                organization__login__iexact=organization,
+            ).order_by("-stars_count")[:limit]
+        )
backend/apps/github/api/internal/nodes/issue.py (2)

48-51: Avoid per-issue queries in is_merged.

filter(...).exists() bypasses the prefetched collection and triggers a query per issue. Consider an in-memory check (or annotate at the queryset) to leverage prefetch. Please confirm query counts in tests.

♻️ Suggested update
-        return root.pull_requests.filter(state="closed", merged_at__isnull=False).exists()
+        return any(
+            pr.state == "closed" and pr.merged_at is not None
+            for pr in root.pull_requests.all()
+        )

53-61: Use prefetched participant_interests to avoid extra queries.

select_related().order_by() forces a new query, so the prefetch hint isn’t used. Sorting the prefetched list keeps it in-memory. Please confirm query counts in tests.

♻️ Suggested update
-        return [
-            interest.user
-            for interest in root.participant_interests.select_related("user").order_by(
-                "user__login"
-            )
-        ]
+        interests = sorted(
+            root.participant_interests.all(),
+            key=lambda interest: interest.user.login,
+        )
+        return [interest.user for interest in interests]
backend/apps/github/api/internal/nodes/repository.py (3)

46-51: Drop unused prefetch on issues.

order_by(...).[:limit] forces a fresh query per repository, so the prefetch is unused and can add redundant work. Consider removing it (or refactor to a prefetched to_attr list). Please confirm query counts in tests.

♻️ Suggested update
-    @strawberry_django.field(prefetch_related=["issues"])
+    @strawberry_django.field

69-72: Drop unused prefetch on recent_milestones.

The ordered slice triggers a new query per repository, so the prefetch isn’t used. Consider removing it (or refactor to use a prefetched to_attr list). Please confirm query counts in tests.

♻️ Suggested update
-    @strawberry_django.field(prefetch_related=["milestones"])
+    @strawberry_django.field

74-78: Drop unused prefetch on releases.

The ordered slice triggers a new query per repository, so the prefetch isn’t used. Consider removing it (or refactor to use a prefetched to_attr list). Please confirm query counts in tests.

♻️ Suggested update
-    @strawberry_django.field(prefetch_related=["releases"])
+    @strawberry_django.field

coderabbitai[bot]
coderabbitai bot previously approved these changes Jan 17, 2026
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
backend/apps/owasp/models/project_health_metrics.py (1)

212-228: Monthly aggregation ignores year boundaries, causing data duplication when spanning multiple years.

ExtractMonth returns only the month number (1-12), so records from January 2025 and January 2026 are aggregated together. With a 365-day lookback, this will occur in any query executed after the first year of operation, producing misleading monthly trends.

Use TruncMonth instead to preserve year-month granularity. However, this requires downstream adjustments: TruncMonth returns a date object (e.g., 2025-01-01) rather than an integer. You'll need to either extract .month when building the months list, format the date as a string, or update the ProjectHealthStatsNode API contract if month numbers alone are insufficient for the UI.
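
A sketch of the TruncMonth variant described above; the queryset and metric field names are assumptions:

from django.db.models import Avg
from django.db.models.functions import TruncMonth

monthly = (
    ProjectHealthMetrics.objects.annotate(month=TruncMonth("nest_created_at"))
    .values("month")
    .annotate(avg_score=Avg("score"))
    .order_by("month")
)
# Each "month" is a date such as 2025-01-01, so January 2025 and January 2026
# stay separate; extract .month or format as "%Y-%m" for the API contract.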

🤖 Fix all issues with AI agents
In `@backend/apps/github/api/internal/queries/issue.py`:
- Around line 18-39: The select_related list in the strawberry_django.field
decorator contains reverse FK traversal paths that should be prefetched instead:
remove "author__user_badges__badge" and "milestone__author__user_badges__badge"
from the select_related argument and add those exact paths to the
prefetch_related list; update the decorator on the field (the
strawberry_django.field call) so select_related only contains forward relations
and prefetch_related includes the two moved reverse FK paths.

In `@backend/apps/github/api/internal/queries/pull_request.py`:
- Around line 19-38: The select_related list in the strawberry_django.field
decorator includes reverse FK traversal strings that must be prefetched instead;
remove "author__user_badges__badge" and "milestone__author__user_badges__badge"
from the select_related argument and add them to the prefetch_related argument
so they are loaded via prefetching (keep the existing strings exactly and
preserve the other select_related/prefetch_related entries and ordering).

In `@backend/apps/github/api/internal/queries/release.py`:
- Around line 18-24: Remove the reverse FK path "author__user_badges__badge"
from the strawberry_django.field select_related list and instead add it to
prefetch_related so the reverse relation is prefetched; specifically update the
decorator on the field (the strawberry_django.field call) to keep the existing
select_related entries ("author__owasp_profile", "repository__organization") but
move "author__user_badges__badge" into a prefetch_related parameter (or use a
Prefetch for "author__user_badges" with select_related("badge") if you want to
prefetch badges efficiently).

In `@backend/apps/owasp/api/internal/nodes/committee.py`:
- Around line 33-36: The repositories_count resolver currently always returns 1
which overcounts when a Committee has no attached repository; update the
repositories_count method on Committee to check root.owasp_repository and return
0 if it's missing, otherwise return the correct count (e.g., 1 when a repository
exists). Ensure you modify the repositories_count function to use the guard "if
not root.owasp_repository: return 0" before returning the repository count.
🧹 Nitpick comments (3)
backend/tests/apps/common/management/commands/dump_data_test.py (1)

19-100: Good test coverage for the happy path.

The test properly verifies the core workflow: temp DB creation, email redaction query, pg_dump arguments, and cleanup. A couple of observations:

  1. Lines 44-45: The # ruff: noqa comment placement inside the function call arguments is unusual but functional.

  2. Consider adding tests for error scenarios:

    • Empty table_list (no tables with email column)
    • CalledProcessError from pg_dump to verify CommandError is raised
    • Database connection failure during cleanup to verify warning is logged
backend/apps/common/management/commands/dump_data.py (1)

122-144: Consider using a context manager for the connection.

If an exception occurs during query execution (after line 134 but before line 142), the connection won't be explicitly closed. While Python's garbage collector will eventually clean it up, using a context manager ensures deterministic cleanup.

♻️ Suggested improvement
     def _execute_sql(
         self,
         dbname: str,
         sql_queries: list[sql.Composable],
     ):
-        connection = connect(
+        with connect(
             dbname=dbname,
             user=DB_USER,
             password=DB_PASSWORD,
             host=DB_HOST,
             port=DB_PORT,
-        )
-        connection.autocommit = True
-
-        rows = []
-        with connection.cursor() as cursor:
-            for sql_query in sql_queries:
-                cursor.execute(sql_query)
-                with contextlib.suppress(ProgrammingError):
-                    rows.extend(cursor.fetchall())
-        connection.close()
-
-        return rows
+        ) as connection:
+            connection.autocommit = True
+            rows = []
+            with connection.cursor() as cursor:
+                for sql_query in sql_queries:
+                    cursor.execute(sql_query)
+                    with contextlib.suppress(ProgrammingError):
+                        rows.extend(cursor.fetchall())
+            return rows
backend/apps/owasp/models/project_health_metrics.py (1)

173-181: Remove unnecessary select_related in subquery.

The select_related("project") on line 177 adds a JOIN that serves no purpose since the subquery only extracts id values via .values_list("id", flat=True). This adds overhead without benefit.

♻️ Suggested fix
         return ProjectHealthMetrics.objects.filter(
             id__in=ProjectHealthMetrics.objects.filter(project__is_active=True)
-            .select_related("project")
             .order_by("project_id", "-nest_created_at")
             .distinct("project_id")
             .values_list("id", flat=True)
         )

Comment on lines +33 to 36
@strawberry_django.field
def repositories_count(self, root: Committee) -> int:
    """Resolve repositories count."""
    return 1
Contributor

@coderabbitai coderabbitai bot Jan 17, 2026

⚠️ Potential issue | 🟡 Minor

Guard repositories_count when no repository is attached.

Other resolvers return 0 when root.owasp_repository is missing; returning 1 here can overcount in that case.

🛠️ Suggested fix
-        return 1
+        return 1 if root.owasp_repository else 0
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-@strawberry_django.field
-def repositories_count(self, root: Committee) -> int:
-    """Resolve repositories count."""
-    return 1
+@strawberry_django.field
+def repositories_count(self, root: Committee) -> int:
+    """Resolve repositories count."""
+    return 1 if root.owasp_repository else 0
🤖 Prompt for AI Agents
In `@backend/apps/owasp/api/internal/nodes/committee.py` around lines 33-36: The
repositories_count resolver currently always returns 1 which overcounts when a
Committee has no attached repository; update the repositories_count method on
Committee to check root.owasp_repository and return 0 if it's missing, otherwise
return the correct count (e.g., 1 when a repository exists). Ensure you modify
the repositories_count function to use the guard "if not root.owasp_repository:
return 0" before returning the repository count.

Collaborator Author

@arkid15r can you check this?

Contributor

Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!

Collaborator

@arkid15r arkid15r left a comment

@ahmedxgouda please check new CR's comments.

@ahmedxgouda
Collaborator Author

They are all false alerts @arkid15r

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
backend/apps/owasp/api/internal/queries/project.py (1)

77-80: Bug: Empty name causes false positive matches for all projects.

When github_user.name is None or an empty string, the expression (github_user.name or "") evaluates to "", and icontains="" matches all non-null leaders_raw values. This incorrectly identifies such users as leaders of every project.

🐛 Proposed fix: Skip the name check when name is empty
-        return Project.objects.filter(
-            Q(leaders_raw__icontains=github_user.login)
-            | Q(leaders_raw__icontains=(github_user.name or ""))
-        ).exists()
+        query = Q(leaders_raw__icontains=github_user.login)
+        if github_user.name:
+            query |= Q(leaders_raw__icontains=github_user.name)
+        return Project.objects.filter(query).exists()
🤖 Fix all issues with AI agents
In `@backend/apps/github/api/internal/queries/issue.py`:
- Around line 52-58: The current queryset.annotate uses
Window(expression=Rank(), partition_by=[F("author_id")],
order_by=F("created_at").desc()) which can give ties and allow multiple issues
per author; replace Rank() with RowNumber() and make the Window order_by
deterministic by adding a tie-breaker (e.g., F("created_at").desc(),
F("id").desc()) so each author partition gets a unique sequential number, then
the existing filter that selects rank == 1 will reliably return a single issue
per author.
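
A sketch of the deterministic variant, assuming the same queryset context as the prompt above:

from django.db.models import F, Window
from django.db.models.functions import RowNumber

queryset = queryset.annotate(
    rank=Window(
        expression=RowNumber(),  # sequential numbering, so no ties, unlike Rank()
        partition_by=[F("author_id")],
        order_by=[F("created_at").desc(), F("id").desc()],  # id breaks ties
    )
).filter(rank=1)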

In `@backend/apps/github/api/internal/queries/pull_request.py`:
- Around line 72-83: The current Window/Rank used in the queryset annotate
(partition_by=[F("author_id")], order_by=F("created_at").desc()) can return
multiple rows per author when created_at ties occur; update the order_by in the
Window expression to include id as a deterministic secondary key (e.g., order_by
created_at desc then id desc) so Rank() will break ties consistently and the
subsequent .filter(rank=1) returns exactly one PR per author; modify the
annotate call where Rank(), Window, partition_by and order_by are used to
include F("id") (or the PK field) as the second sort criterion.

In `@backend/apps/owasp/api/internal/queries/member_snapshot.py`:
- Around line 34-38: The MemberSnapshot query currently uses
select_related("github_user") and prefetch_related("issues", "pull_requests",
"messages") but the
commits_count/issues_count/pull_requests_count/messages_count properties trigger
M2M .count() calls and cause N+1s; fix by either adding "commits" to the
prefetch_related call on MemberSnapshot.objects (so
prefetch_related("issues","pull_requests","messages","commits")) or, preferably,
switch to annotated counts using Django's Count (e.g.,
annotate(commits_count=Count("commits"), issues_count=Count("issues"), ...)) on
the same query that builds MemberSnapshot.objects to avoid per-row .count()
queries.
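
A sketch of the annotated-counts option; the relation names follow the prompt above, the model import path is assumed, and if MemberSnapshot already defines same-named properties the annotations would need different names:

from django.db.models import Count

from apps.owasp.models.member_snapshot import MemberSnapshot  # path assumed

snapshots = MemberSnapshot.objects.select_related("github_user").annotate(
    commits_count=Count("commits", distinct=True),
    issues_count=Count("issues", distinct=True),
    pull_requests_count=Count("pull_requests", distinct=True),
    messages_count=Count("messages", distinct=True),  # distinct avoids join fan-out
)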

In `@backend/apps/owasp/models/common.py`:
- Around line 102-105: The return type for the cached_property entity_leaders is
incorrect: it returns a QuerySet, not a list. Update the annotation on
entity_leaders to return QuerySet[EntityMember] (importing the appropriate
typing/ORM QuerySet if needed) so it matches the actual return value from
self.entity_members.filter(...).order_by(...); alternatively, if a concrete list
is intended change the implementation to evaluate the queryset (e.g.,
list(self.entity_members.filter(...).order_by(...))) and keep the
list[EntityMember] annotation—also check child classes in project.py that slice
this property and ensure they still expect a QuerySet or adjust them if you
convert to a list.
🧹 Nitpick comments (4)
backend/apps/owasp/api/internal/queries/snapshot.py (1)

27-38: LGTM - Capped limit logic is correct.

The walrus operator correctly caps limit at MAX_LIMIT before slicing, and gracefully returns an empty list for non-positive values. The conditional expression guarantees the assignment happens before the slice is evaluated.

One optional readability consideration: the walrus operator inside the conditional can be subtle. A more explicit approach could separate the capping and validation:

♻️ Optional: More explicit limit handling
     @strawberry_django.field
     def snapshots(self, limit: int = 12) -> list[SnapshotNode]:
         """Resolve snapshots."""
+        limit = min(limit, MAX_LIMIT)
+        if limit <= 0:
+            return []
         return (
             Snapshot.objects.filter(
                 status=Snapshot.Status.COMPLETED,
             ).order_by(
                 "-created_at",
-            )[:limit]
-            if (limit := min(limit, MAX_LIMIT)) > 0
-            else []
+            )[:limit]
         )
backend/apps/github/api/internal/nodes/user.py (1)

35-48: Prefetch hint conflicts with subsequent .filter() call.

The prefetch_related=["user_badges__badge"] hint is negated by the .filter() call, which creates a new database query. The .select_related("badge") on line 42 becomes necessary because the prefetch is bypassed.

Consider either:

  1. Remove the prefetch_related hint (since it's not used) and rely on select_related
  2. Filter in Python to use the prefetch:
Option 2: Filter in Python to leverage prefetch
 @strawberry_django.field(prefetch_related=["user_badges__badge"])
 def badges(self, root: User) -> list[BadgeNode]:
     """Return user badges."""
-    user_badges = (
-        root.user_badges.filter(
-            is_active=True,
-        )
-        .select_related("badge")
-        .order_by(
-            "badge__weight",
-            "badge__name",
-        )
-    )
-    return [user_badge.badge for user_badge in user_badges]
+    return [
+        ub.badge
+        for ub in sorted(
+            (ub for ub in root.user_badges.all() if ub.is_active),
+            key=lambda ub: (ub.badge.weight, ub.badge.name),
+        )
+    ]
backend/apps/owasp/api/internal/queries/member_snapshot.py (1)

64-68: Simplify user filter to avoid extra query/exception.

You can filter by github_user__login directly and drop the User.objects.get call.

♻️ Proposed simplification
-        if user_login:
-            try:
-                snapshots = snapshots.filter(github_user=User.objects.get(login=user_login))
-            except User.DoesNotExist:
-                return []
+        if user_login:
+            snapshots = snapshots.filter(github_user__login=user_login)
backend/apps/github/api/internal/queries/repository.py (1)

9-10: Prefer an explicit capped-limit variable for readability.

The walrus-in-conditional works, but computing the cap once makes the flow easier to read and safer to extend.

♻️ Suggested refactor
-        return (
-            (
-                Repository.objects.filter(
-                    organization__login__iexact=organization,
-                ).order_by("-stars_count")[:limit]
-            )
-            if (limit := min(limit, MAX_LIMIT)) > 0
-            else []
-        )
+        capped_limit = min(limit, MAX_LIMIT)
+        if capped_limit <= 0:
+            return []
+        return (
+            Repository.objects.filter(
+                organization__login__iexact=organization,
+            ).order_by("-stars_count")[:capped_limit]
+        )

Also applies to: 58-65

Comment on lines +52 to +58
queryset = (
    queryset.annotate(
        rank=Window(
            expression=Rank(),
            partition_by=[F("author_id")],
            order_by=F("created_at").desc(),
        )

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cat -n backend/apps/github/api/internal/queries/issue.py | head -100

Repository: OWASP/Nest

Length of output: 2375


🏁 Script executed:

# Check if RowNumber is available in Django
rg "RowNumber" --type py

Repository: OWASP/Nest

Length of output: 36


🌐 Web query:

Django db.models.functions RowNumber window function

💡 Result:

Short answer: Django provides RowNumber as a window function in django.db.models.functions.window which emits ROW_NUMBER() (integer) and is used with Window(...). Example:

  • import and basic usage:
    from django.db.models import F, Window
    from django.db.models.functions.window import RowNumber
    qs = MyModel.objects.annotate(
        rn=Window(
            expression=RowNumber(),
            partition_by=[F('group_field')],  # optional
            order_by=F('created_at').desc(),
        )
    )

  • Notes: RowNumber computes sequential row numbers per partition (or over the whole result if no partition). It requires a database that supports SQL window functions (some older SQLite versions do not). See Django docs and source for details. [1][2][3]

Sources:
[1] Django docs — Database functions (RowNumber).
[2] Django source for django.db.models.functions.window (RowNumber).
[3] Django ticket about older SQLite lacking window support (OperationalError).


🏁 Script executed:

# Check for tests related to recent_issues distinct behavior
fd "test.*issue" -t f | head -20

Repository: OWASP/Nest

Length of output: 36


🏁 Script executed:

# Search for test files that reference distinct or recent_issues
rg "recent_issues|distinct.*author" --type py -l | grep -i test

Repository: OWASP/Nest

Length of output: 174


🏁 Script executed:

cat -n backend/tests/apps/github/api/internal/queries/issue_test.py

Repository: OWASP/Nest

Length of output: 9137


Use RowNumber() instead of Rank() to ensure one issue per author.

Rank() assigns identical ranks to tied rows, so multiple issues with the same created_at per author will both have rank=1 and both pass the filter (line 60). This violates the documented contract at line 30: "Whether to return unique issues per author". Use RowNumber() instead, which assigns unique sequential numbers per partition, and add a deterministic tie-breaker like F("id") to guarantee consistent results.

Proposed fix
-from django.db.models.functions import Rank
+from django.db.models.functions import RowNumber
@@
-                queryset.annotate(
+                queryset.annotate(
                     rank=Window(
-                        expression=Rank(),
+                        expression=RowNumber(),
                         partition_by=[F("author_id")],
-                        order_by=F("created_at").desc(),
+                        order_by=[F("created_at").desc(), F("id").desc()],
                     )
                 )
🤖 Prompt for AI Agents
In `@backend/apps/github/api/internal/queries/issue.py` around lines 52-58: The
current queryset.annotate uses Window(expression=Rank(),
partition_by=[F("author_id")], order_by=F("created_at").desc()) which can give
ties and allow multiple issues per author; replace Rank() with RowNumber() and
make the Window order_by deterministic by adding a tie-breaker (e.g.,
F("created_at").desc(), F("id").desc()) so each author partition gets a unique
sequential number, then the existing filter that selects rank == 1 will reliably
return a single issue per author.
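
For reference, a minimal self-contained sketch of the resulting pattern (a sketch assuming Django 4.2+, which allows filtering on window annotations; the Issue import path is an assumption):

from django.db.models import F, Window
from django.db.models.functions import RowNumber

from apps.github.models.issue import Issue  # assumed import path

# RowNumber() never ties, and the id tie-breaker makes the ordering
# deterministic, so exactly one row per author survives the rank=1 filter.
distinct_recent_issues = (
    Issue.objects.annotate(
        rank=Window(
            expression=RowNumber(),
            partition_by=[F("author_id")],
            order_by=[F("created_at").desc(), F("id").desc()],
        )
    )
    .filter(rank=1)
    .order_by("-created_at")
)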

Comment on lines 72 to 83
if distinct:
    queryset = (
        queryset.annotate(
            rank=Window(
                expression=Rank(),
                partition_by=[F("author_id")],
                order_by=F("created_at").desc(),
            )
        )
        .filter(rank=1)
        .order_by("-created_at")
    )

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

cd backend/apps/github/api/internal/queries && cat -n pull_request.py | head -100

Repository: OWASP/Nest

Length of output: 3445


🏁 Script executed:

cd backend && find . -name "pull_request.py" -type f

Repository: OWASP/Nest

Length of output: 272


🏁 Script executed:

rg "distinct" backend/apps/github/api/internal/queries/pull_request.py -A 3 -B 3

Repository: OWASP/Nest

Length of output: 847


🏁 Script executed:

rg "Rank\(\)" . --type py -B 5 -A 10

Repository: OWASP/Nest

Length of output: 3972


🏁 Script executed:

cd backend && rg -i "test.*recent_pull_requests" --type py -l

Repository: OWASP/Nest

Length of output: 165


🏁 Script executed:

cd backend && rg "recent_pull_requests" --type py -B 2 -A 2

Repository: OWASP/Nest

Length of output: 13248


🏁 Script executed:

cd backend && find . -path "*/test*" -name "*.py" -exec grep -l "Rank\|Window" {} \;

Repository: OWASP/Nest

Length of output: 227


🌐 Web query:

Django Rank() window function behavior ties multiple rows SQL

💡 Result:

Short answer: Django's Rank() maps to SQL RANK(), so rows that tie on the ORDER BY get the same rank and the next rank value is skipped (gaps). Use DenseRank() to avoid gaps, or RowNumber() / add extra ORDER BY tie-breakers if you need exactly one row per rank. [1][2]


🏁 Script executed:

cat -n tests/apps/github/api/internal/queries/pull_request_test.py | grep -A 20 "test_recent_pull_requests_distinct"

Repository: OWASP/Nest

Length of output: 145


🏁 Script executed:

find . -name "pull_request_test.py" -type f

Repository: OWASP/Nest

Length of output: 367


🏁 Script executed:

cd backend && cat -n tests/apps/github/api/internal/queries/pull_request_test.py | grep -A 20 "test_recent_pull_requests_distinct"

Repository: OWASP/Nest

Length of output: 3194


Add deterministic tie-breaker to ensure one PR per author when created_at timestamps match.

Rank() assigns the same rank to rows with identical created_at values, so multiple PRs from one author with identical timestamps will all pass rank=1, violating the "unique per author" contract. Add id as a secondary sort key to deterministically select only one PR per author.

Suggested fix
                 queryset.annotate(
                     rank=Window(
                         expression=Rank(),
                         partition_by=[F("author_id")],
-                        order_by=F("created_at").desc(),
+                        order_by=[F("created_at").desc(), F("id").desc()],
                     )
                 )
🤖 Prompt for AI Agents
In `@backend/apps/github/api/internal/queries/pull_request.py` around lines 72-83:
The current Window/Rank used in the queryset annotate
(partition_by=[F("author_id")], order_by=F("created_at").desc()) can return
multiple rows per author when created_at ties occur; update the order_by in the
Window expression to include id as a deterministic secondary key (e.g., order_by
created_at desc then id desc) so Rank() will break ties consistently and the
subsequent .filter(rank=1) returns exactly one PR per author; modify the
annotate call where Rank(), Window, partition_by and order_by are used to
include F("id") (or the PK field) as the second sort criterion.

Comment on lines +34 to +38
query = (
    MemberSnapshot.objects.select_related("github_user")
    .prefetch_related("issues", "pull_requests", "messages")
    .filter(github_user=user)
)

⚠️ Potential issue | 🟠 Major

Avoid N+1 queries for _count fields.

commits_count/issues_count/pull_requests_count/messages_count trigger M2M .count() calls; without prefetching (and with limit up to 1000), this can explode into thousands of queries. Add prefetches (including commits) or switch to annotated counts.

🔧 Proposed fix (prefetch all M2Ms used by count fields)
-            query = (
-                MemberSnapshot.objects.select_related("github_user")
-                .prefetch_related("issues", "pull_requests", "messages")
-                .filter(github_user=user)
-            )
+            query = (
+                MemberSnapshot.objects.select_related("github_user")
+                .prefetch_related("commits", "issues", "pull_requests", "messages")
+                .filter(github_user=user)
+            )
@@
-        snapshots = MemberSnapshot.objects.all().select_related("github_user")
+        snapshots = (
+            MemberSnapshot.objects.all()
+            .select_related("github_user")
+            .prefetch_related("commits", "issues", "pull_requests", "messages")
+        )

Also applies to: 62-71

🤖 Prompt for AI Agents
In `@backend/apps/owasp/api/internal/queries/member_snapshot.py` around lines 34-38:
The MemberSnapshot query currently uses select_related("github_user") and
prefetch_related("issues", "pull_requests", "messages") but the
commits_count/issues_count/pull_requests_count/messages_count properties trigger
M2M .count() calls and cause N+1s; fix by either adding "commits" to the
prefetch_related call on MemberSnapshot.objects (so
prefetch_related("issues","pull_requests","messages","commits")) or, preferably,
switch to annotated counts using Django's Count (e.g.,
annotate(commits_count=Count("commits"), issues_count=Count("issues"), ...)) on
the same query that builds MemberSnapshot.objects to avoid per-row .count()
queries.
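
For the annotation route, a minimal sketch (the num_* aliases deliberately differ from the model's existing *_count properties to avoid attribute clashes; distinct=True guards against row multiplication when several M2M relations are aggregated in one query):

from django.db.models import Count

from apps.owasp.models.member_snapshot import MemberSnapshot  # assumed import path

# Compute all four counts in the initial query instead of one .count() per row.
snapshots = MemberSnapshot.objects.select_related("github_user").annotate(
    num_commits=Count("commits", distinct=True),
    num_issues=Count("issues", distinct=True),
    num_pull_requests=Count("pull_requests", distinct=True),
    num_messages=Count("messages", distinct=True),
)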

Comment on lines +102 to +105
@cached_property
def entity_leaders(self) -> list[EntityMember]:
    """Return entity's leaders."""
    return self.entity_members.filter(role=EntityMember.Role.LEADER).order_by("order")

⚠️ Potential issue | 🟡 Minor

Return type annotation should be QuerySet instead of list.

The method returns self.entity_members.filter(...).order_by(...), which is a QuerySet[EntityMember], not a list[EntityMember]. This type mismatch can cause incorrect type hints for downstream consumers and static analysis tools.

Note: The child class in project.py inherits this issue since slicing a QuerySet also returns a QuerySet.

Suggested fix
+from django.db.models import QuerySet
+
 ...
 
     @cached_property
-    def entity_leaders(self) -> list[EntityMember]:
+    def entity_leaders(self) -> QuerySet[EntityMember]:
         """Return entity's leaders."""
         return self.entity_members.filter(role=EntityMember.Role.LEADER).order_by("order")

Alternatively, if a list is genuinely intended for the API contract, evaluate the QuerySet:

     @cached_property
     def entity_leaders(self) -> list[EntityMember]:
         """Return entity's leaders."""
-        return self.entity_members.filter(role=EntityMember.Role.LEADER).order_by("order")
+        return list(self.entity_members.filter(role=EntityMember.Role.LEADER).order_by("order"))
🤖 Prompt for AI Agents
In `@backend/apps/owasp/models/common.py` around lines 102-105: The return type
for the cached_property entity_leaders is incorrect: it returns a QuerySet, not
a list. Update the annotation on entity_leaders to return QuerySet[EntityMember]
(importing the appropriate typing/ORM QuerySet if needed) so it matches the
actual return value from self.entity_members.filter(...).order_by(...);
alternatively, if a concrete list is intended change the implementation to
evaluate the queryset (e.g.,
list(self.entity_members.filter(...).order_by(...))) and keep the
list[EntityMember] annotation—also check child classes in project.py that slice
this property and ensure they still expect a QuerySet or adjust them if you
convert to a list.

ahmedxgouda and others added 8 commits January 18, 2026 18:26
* Add dockerfile and compose and update makefile

* Dump db-data

* Reorder makefile

* Dump db-data

* Remove e2e dockerfile

* Add CI/CD

* Apply rabbit's suggestions

* Add postgres instead of pgvector

* Remove needs

* Update envs

* Fix migrations step

* Add envs to docker

* Remove migrations step

* Remove --without test from dockerfile

* Copy tests in dockerfile and add needs to gh workflow

* Update dockerfile

* Apply suggestion

* Use the e2e instance in the frontend e2e tests

* Update CI/CD

* Update CI/CD

* Update CI/CD

* Update CI/CD

* Update CI/CD and dockerfile

* Update CI/CD

* Update CI/CD

* Update CI/CD

* Update CI/CD

* Update CI/CD

* Update CI/CD

* Update CI/CD

* Update CI/CD

* Update CI/CD

* Update CI/CD

* Update CI/CD

* Update CI/CD

* Update CI/CD

* Restore needs for the job

* Update Makefiles

* Update docs

* Apply suggestions

* Update load-data

* Skip sonar error

* update port numbers

* Update docs

* Update code

* Add required shell property back

* Bump config.webServer timeout

* Update code

---------

Co-authored-by: Arkadii Yakovets <arkadii.yakovets@owasp.org>
* Add DB environment variables to e2e.yaml and add csrf_decorate function

* Skip sonar suggestion

* Fix rest api internal error

* Add timeout

* Update docs

* Update code

* Revert csrf update

* Add command to dump local data

* Update dump and load data

* Update rest api config and docs

* Apply check-spelling

* Use .env.e2e.example for frontend e2e tests in gh actions

* Apply rabbit's suggestions

* Migrate dump_data to django command and dump owasp, github, and slack tables only

* Apply rabbit's suggestions

* Update code

* Refactor dump_data

* Use local cache for e2e

* Remove old load_data command

* Add tests

* Skip sonar

* Apply rabbit suggestions

* Update code

---------

Co-authored-by: Arkadii Yakovets <arkadii.yakovets@owasp.org>

Add backend/data/nest.sql.gz
* Add hypothesis to poetry and fuzz test the index endpoint.

* Refactor import statements in algolia_test.py for consistency

* Add the client ip address to the request META.

* Add fuzz testing for Slack event handlers and refactor algolia fuzz testing.

* Refactor fuzz tests for Slack event handlers to improve readability and structure

* Fix the poetry lock file.

* Remove fuzz testing from algolia_search unit tests

* Create a Dockerfile for fuzz-testing, add the run commands to the Makefile and install model_bakery for creating randomized data for all models.

* Refactor to improve quality

* Update fuzz testing setup: modify Makefile and Dockerfile, add entrypoint script for environment configuration

* Update poetry.lock to reflect dependency changes and version updates

* Create a fuzz configuration, update docker file, makefile, and the tests.

* Refactor fuzz configuration by reorganizing imports and cleaning up whitespace

* Update Dockerfile and entrypoint script to use Alpine base image and shell

* Run the server on port 8000 after the tests.

* Create a docker compose file for fuzz testing.

* Add 'graphqler' to custom dictionary

* Load data from nest.json and add graphqler to cspell dict.

* Remove model-bakery dependency from pyproject.toml and update poetry.lock

* Update graphqler command in docker compose and the healthcheck

* Update graphql command to use backend service URL in docker-compose

* Refactor docker-compose to build graphqler service from Dockerfile and add entrypoint script for fuzzing tests

* Enhance fuzz testing setup: update Dockerfile and entrypoint scripts, improve CSRF handling, and refine healthcheck command in docker-compose

* Update fuzz-test-backend command to abort on container exit

* Add fuzz testing workflow and update image build steps

* Add .env file creation step for fuzz tests in CI/CD workflow

* Add Docker Hub login step for fuzz tests in CI/CD workflow

* Refactor for the checks

* Refactor fuzz testing workflow: replace Docker Hub login with buildx setup and update docker-compose handling

* Fix fuzz tests workflow: rename docker-compose file

* Refactor fuzz-tests job.

* Add environment variables for fuzz tests configuration

* Update fuzz tests environment variables

* Fix poetry lock file

* Sort the custom-dict.

* Update content hash in poetry.lock

* Add docker cache mounts to the backend image

* Add Redis configuration

* refactor yaml

* Add docker cache mounts to graphql file

* Remove unnecessary chmod command for cache directories in Dockerfile

* Fix poetry lock file

* Add cache mounts to backend tests

* Update cache mounts in graphql image

* Update mkdir in graphql image

* Remove duplicates

* Update tests

* Rename docker compose

* Update poetry lock

* Apply sonar

* Migrate to OWASP repo

* Update docker

* Use graphqler maintainer docker image

* Add disable permissions, update docker compose, and update entrypoint

* Establish an e2e backend instance locally and in CI/CD (#2429)

* Fix running e2e backend (#2710)

* Update code for e2e

* Add runs-on

* Skip sonar and fix ci/cd

* Apply rabbit suggestion and override entrypoint in ci/cd

* Use env with csrf

* Add timeout

* Remove hypothesis and old test files

* Apply rabbit's suggestions

* Update ci/cd and makefile

* Use digest pinning with graphqler image

* Update dockerfile and fix the typeerror issue

* Apply sonar suggestion

* Apply sonar and rabbit suggestions

* Remove cache from ci/cd

* Use curl instead of wget

* Separate e2e from fuzz

* Update fuzz ci/cd

* Update CI/CD

* Run precommit

* Update code

* Update code

* Update docs, ci/cd, and apply suggestions

* Use digest pinning and parameters in workflow

* Apply sonar suggestions

* Apply rabbit suggestions

* Run migrations for fuzz testing in ci/cd

* Apply rabbit suggestions

* Fix exceptions

* Add backend/data/nest.sql.gz

* Update code

* Automate data loading

* Update dump_data to avoid exceptions

* Update dump_data and automated data_loading

* Update CI/CD

* Update tests

* Add timeout for fuzz tests

* Update timeout for fuzz

* Update CI/CD

* Update CI/CD

* Update CI/CD

* Apply rabbit's suggestions

* Update backend/Makefile

* Update make targets and docker compose

* Add volume for graphql fuzzing results and add upload artifacts in ci/cd

* Update ci/cd

* Update ci/cd

* Update ci/cd

* Update ci/cd

* Update docker compose and makefile

* Apply rabbit's suggestions

* Update dump to match the last nest.json.gz

---------

Co-authored-by: Arkadii Yakovets <arkadii.yakovets@owasp.org>
* Add redis cache locally and in ci/cd

* Update env and makefile

* Update make target

* Add cache to fuzz and apply rabbit suggestions

* Update makefile target

* Add cache as dependency to the backend in docker compose

* Update file naming to match the remaining compose projects
* Update volumes

* Update compose folders

Update docker compose, dump file and ci/cd

Update ci/cd
* Migrate to schemathesis and add rest fuzz tests

* Apply cspell

* Update Fuzz Dockerfile context

* Update rest auth for fuzz

* Optimize Project Health Stats query and split rest and graphql tests

* Split rest and graphql tests workflows

* Update ci/cd

* Apply rabbit suggestions

* Update ci/cd

* Apply rabbit's suggestions

* Increase number of examples

* Apply rabbit's suggestions

* Apply pre-commit checks

* Update CI/CD

* Update makefile

* Update CI/CD

* Update CI/CD

* Update ci/cd

* Update ci/cd

* Update CI/CD

* Update settings.base.py and ci/cd

* Update configuration and ci/cd

* Update alphabetical order in env files

* Fix negative indexing

* Add EscapeNullCharactersMiddleware

* Update middleware, schemathesis config and add HTTP BadRequest status code to the rest api docs

* Update rest api schema

* Update tests

* Optimize recentIssues

* Add optimizations and fixes

* Update tests, ci/cd and apply rabbit suggestions

* Optimize N+1 queries

* Update tests

* Update rest schema and add a name for Fuzz Tests job in CI/CD

* Fix negative indexing

* Update project health metrics filters and pagination

* Update mentorship app, ci/cd, and entrypoint

* Add trigram index to project

* Update nest.dump

* Update entrypoint

* Apply checks

* Add QueryDepthLimiter

* Add optimizations

* Update tests

* Update CI/CD

* Add fixes

* Apply rabbit's suggestion

* Refactor docker files

* Apply cspell

* Refactor limits

* Update milestone enum and rest api endpoints

* Apply middleware suggestions

* Migrate to strawberry_django.field

* Update REDIS_AUTH_ENABLED default value

* Update queries to use strawberry_django.field

* Apply rabbit suggestions

* Update tests and apply rabbit suggestion

* Fix pagination.limit

* Add optimizations and fixes

* Update code

* Add optimizations

* Add optimizations

* Add optimizations

* Add fixes

* Add milestone index migration

* Update nest.dump

* Add optimizations and fixes

* Update snapshot query

* Update backend tests

* Update project model

* Apply rabbit suggestion

* Apply rabbit suggestions

* Apply rabbit suggestion and update tests

* Update code

---------

Co-authored-by: Arkadii Yakovets <arkadii.yakovets@owasp.org>
ahmedxgouda and others added 2 commits January 18, 2026 18:26
* Update dump_data

* Apply rabbit suggestions

* Update tests and dump_data command

* Update update-nest-test-images.yaml
* Update strawberry_django decorators

* Update entity_leaders

* Update tests

* Update code

---------

Co-authored-by: Arkadii Yakovets <arkadii.yakovets@owasp.org>
@arkid15r arkid15r force-pushed the feature/e2e-backend branch from 711a5f1 to eb1354d on January 19, 2026 02:28
@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
backend/apps/api/rest/v0/committee.py (1)

81-85: The function name and docstring don't match the endpoint's purpose.

The function is named get_chapter with docstring "Get chapter", but it handles the committee endpoint /{str:committee_id}. This appears to be a copy-paste error.

Suggested fix
-def get_chapter(
+def get_committee(
     request: HttpRequest,
     committee_id: str = Path(example="project"),
 ) -> CommitteeDetail | CommitteeError:
-    """Get chapter."""
+    """Get committee."""
backend/tests/apps/owasp/api/internal/nodes/project_health_metrics_test.py (2)

19-44: Missing fields in expected_field_names.

The expected_field_names set appears to be missing age_days_requirement and last_pull_request_days_requirement, which are defined in ProjectHealthMetricsNode (lines 40-42 and 65-67 of the node file).

🐛 Suggested fix
         expected_field_names = {
             "age_days",
+            "age_days_requirement",
             "created_at",
             "contributors_count",
             "forks_count",
             "is_funding_requirements_compliant",
             "is_leader_requirements_compliant",
             "last_commit_days",
             "last_commit_days_requirement",
             "last_pull_request_days",
+            "last_pull_request_days_requirement",
             "last_release_days",
             "last_release_days_requirement",
             "open_issues_count",

46-76: Missing test cases for requirement fields.

The parameterized test is also missing test cases for age_days_requirement and last_pull_request_days_requirement fields.

💚 Add missing test cases
     @pytest.mark.parametrize(
         ("field_name", "expected_type"),
         [
             ("age_days", int),
+            ("age_days_requirement", int),
             ("created_at", datetime),
             ...
             ("last_pull_request_days", int),
+            ("last_pull_request_days_requirement", int),
             ("last_release_days", int),
🤖 Fix all issues with AI agents
In @.github/workflows/run-fuzz-tests.yaml:
- Around lines 93-105: The GitHub Actions step "Build Fuzz-testing image" sets
context: backend but uses file: docker/backend/Dockerfile.fuzz which is resolved
relative to the context and therefore incorrect; update the step so the
Dockerfile path is correct relative to the context—either change context to .
(repo root) or change file to a path relative to backend (e.g.,
../../docker/backend/Dockerfile.fuzz) so that the Docker build action can find
the Dockerfile.

In `@backend/apps/github/api/internal/queries/release.py`:
- Around lines 55-66: The Window ordering used when annotating rank can produce
non-deterministic ties when multiple releases by the same author share the same
published_at; update the Window in the distinct branch (the annotate(... Rank()
... order_by=F("published_at").desc())) to include F("id") as a secondary sort
key (e.g. order_by=[F("published_at").desc(), F("id").desc()]) so Rank() yields
a single rank=1 row per author.

In `@backend/apps/owasp/api/internal/queries/post.py`:
- Line 1: Update the module docstring in post.py to correctly describe the file
contents: replace the incorrect "OWASP event GraphQL queries." text with
something like "OWASP post GraphQL queries." to reflect that this module
contains GraphQL queries for the Post model (module: post.py).

In `@backend/apps/owasp/api/internal/queries/project_health_metrics.py`:
- Around lines 46-55: The current logic validates and mutates
pagination.offset/limit but never slices the queryset returned by
ProjectHealthMetrics.get_latest_health_metrics(), so apply the computed
offset/limit to the queryset before returning; fetch queryset =
ProjectHealthMetrics.get_latest_health_metrics(), validate pagination.offset and
pagination.limit as you already do (using MAX_OFFSET, MAX_LIMIT, UNSET), compute
local offset = min(pagination.offset, MAX_OFFSET) and, if a valid limit is
provided, compute limit = min(pagination.limit, MAX_LIMIT) and return
list(queryset[offset:offset+limit]) (or list(queryset[offset:]) when no limit),
otherwise return list(queryset) when pagination is absent.
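
A minimal sketch of that slicing step (names follow the prompt: pagination is assumed to expose optional offset and limit attributes, and MAX_OFFSET, MAX_LIMIT, and UNSET are assumed module-level constants):

def resolve_project_health_metrics(pagination=None):
    """Return the latest metrics with a capped offset/limit applied."""
    queryset = ProjectHealthMetrics.get_latest_health_metrics()
    if not pagination:
        return list(queryset)
    offset = min(pagination.offset or 0, MAX_OFFSET)
    if pagination.limit in (None, UNSET):
        return list(queryset[offset:])
    limit = min(pagination.limit, MAX_LIMIT)
    return list(queryset[offset : offset + limit])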

In `@backend/Makefile`:
- Around lines 103-113: The Makefile targets load-data, load-data-e2e and
load-data-fuzz run pg_restore against ./backend/data/nest.dump inside DB
containers that don't have that host path mounted; either mount the host
backend/data into the containers or change the targets to use the containers'
mounted path or remove them and invoke the existing data-loader services.
Concretely: for local (nest-db) add a volume mapping so ./backend/data is
available or remove/disable load-data and use the data-loader service; for
e2e/fuzz update load-data-e2e and load-data-fuzz to reference the
container-mounted path (/data/nest.dump) or call the e2e/fuzz data-loader
service instead of exec-db-command-e2e/exec-db-command-fuzz; ensure
exec-db-command, exec-db-command-e2e and exec-db-command-fuzz targets are
consistent with the chosen approach.

In `@backend/tests/apps/owasp/models/committee_test.py`:
- Around lines 28-33: The test may see stale results because
Committee.active_committees_count is lru_cache-decorated; before invoking
Committee.active_committees_count() in test_active_committees_count, clear its
cache by calling Committee.active_committees_count.cache_clear() (or
functools.cache_clear equivalent) so the mocked Committee.objects.filter.count()
is actually used; add that cache_clear call just prior to the call to
Committee.active_committees_count() and keep the existing mock/asserts.
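
A minimal sketch of that reset (assuming active_committees_count is wrapped with functools.lru_cache and therefore exposes cache_clear()):

from unittest.mock import patch

from apps.owasp.models.committee import Committee  # assumed import path

def test_active_committees_count():
    # Clear any value cached by earlier tests so the mocked count is used.
    Committee.active_committees_count.cache_clear()
    with patch.object(Committee, "objects") as mock_objects:
        mock_objects.filter.return_value.count.return_value = 3
        assert Committee.active_committees_count() == 3
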
♻️ Duplicate comments (9)
backend/pyproject.toml (1)

105-108: Narrow the suppression to S607 or justify S603.

If dump_data.py doesn’t use shell=True, S603 won’t trigger. Either remove S603 or add a brief justification comment to avoid over-broad security lint suppression.

🔧 Suggested minimal change
 lint.per-file-ignores."**/management/commands/dump_data.py" = [
-  "S603", # https://docs.astral.sh/ruff/rules/subprocess-without-shell-equals-true/,
   "S607", # https://docs.astral.sh/ruff/rules/start-process-with-partial-path/,
 ]

Please confirm the current Ruff rule guidance before deciding:

Ruff rule S603 subprocess-without-shell-equals-true and S607 start-process-with-partial-path guidance; when should they be suppressed?
backend/apps/github/api/internal/queries/issue.py (1)

51-62: Use RowNumber() instead of Rank() to guarantee one issue per author.

Rank() assigns identical ranks to tied rows. If an author has multiple issues with the same created_at timestamp, all will have rank=1 and pass the filter at line 60, violating the documented "unique issues per author" contract at line 30. Use RowNumber() with a deterministic tie-breaker (e.g., id) to ensure exactly one issue per author.

🐛 Proposed fix
-from django.db.models.functions import Rank
+from django.db.models.functions import RowNumber
 
 ...
 
         if distinct:
             queryset = (
                 queryset.annotate(
                     rank=Window(
-                        expression=Rank(),
+                        expression=RowNumber(),
                         partition_by=[F("author_id")],
-                        order_by=F("created_at").desc(),
+                        order_by=[F("created_at").desc(), F("id").desc()],
                     )
                 )
                 .filter(rank=1)
                 .order_by("-created_at")
             )
backend/apps/github/api/internal/queries/pull_request.py (1)

72-83: Add deterministic tie-breaker to ensure one PR per author when created_at timestamps match.

Rank() assigns the same rank to rows with identical created_at values, so multiple PRs from one author with identical timestamps will all pass rank=1, violating the "unique per author" contract. Add id as a secondary sort key.

Suggested fix
             queryset = (
                 queryset.annotate(
                     rank=Window(
                         expression=Rank(),
                         partition_by=[F("author_id")],
-                        order_by=F("created_at").desc(),
+                        order_by=[F("created_at").desc(), F("id").desc()],
                     )
                 )
                 .filter(rank=1)
                 .order_by("-created_at")
             )
backend/apps/owasp/models/common.py (1)

101-104: Return type annotation mismatch.

The method returns a QuerySet from .filter().order_by(), but the annotation declares list[EntityMember]. This was already flagged in a previous review.

backend/apps/owasp/api/internal/nodes/committee.py (1)

33-36: Guard repositories_count when no repository is attached.

Returning 1 unconditionally can overcount when root.owasp_repository is missing.

🛠️ Suggested fix
-        return 1
+        return 1 if root.owasp_repository else 0
backend/entrypoint.fuzz.sh (2)

11-11: Add curl timeouts to prevent hanging.

The curl request lacks timeouts, which can cause the entrypoint to hang indefinitely in CI if the network is slow or unresponsive.

Suggested fix
-CSRF_TOKEN=$(curl -fsSL "$BASE_URL/csrf" | jq -r '.csrftoken')
+CSRF_TOKEN=$(curl -fsSL --connect-timeout 5 --max-time 15 "$BASE_URL/csrf" | jq -r '.csrftoken')

46-48: Quote TEST_FILE variable to prevent word splitting.

The unquoted ${TEST_FILE} variable can cause issues if the path contains spaces or special characters.

Suggested fix
-    pytest ./tests/${TEST_FILE}
+    pytest "./tests/${TEST_FILE}"
 else
-    pytest --log-cli-level=INFO -s ./tests/${TEST_FILE}
+    pytest --log-cli-level=INFO -s "./tests/${TEST_FILE}"
docker-compose/e2e/compose.yaml (1)

22-28: Install wget in the backend Dockerfile or use a Python-based healthcheck.

This issue was previously flagged. The backend image (python:3.13.7-alpine) does not include wget, causing the healthcheck to fail.

backend/apps/owasp/api/internal/queries/member_snapshot.py (1)

34-38: Avoid N+1 count queries by prefetching all M2Ms used in count fields.

commits_count/issues_count/pull_requests_count/messages_count call .count() on M2M relations, which can lead to per-row queries without prefetching. Consider prefetching commits alongside the existing relations for both single and list queries.

🔧 Suggested fix (add commits to prefetch)
-            query = (
-                MemberSnapshot.objects.select_related("github_user")
-                .prefetch_related("issues", "pull_requests", "messages")
-                .filter(github_user=user)
-            )
+            query = (
+                MemberSnapshot.objects.select_related("github_user")
+                .prefetch_related("commits", "issues", "pull_requests", "messages")
+                .filter(github_user=user)
+            )
@@
-        snapshots = MemberSnapshot.objects.all().select_related("github_user")
+        snapshots = (
+            MemberSnapshot.objects.all()
+            .select_related("github_user")
+            .prefetch_related("commits", "issues", "pull_requests", "messages")
+        )

Also applies to: 62-71

🧹 Nitpick comments (34)
.gitignore (1)

51-52: Remove duplicate Snyk ignore entry to reduce noise.

.cursor/rules/snyk_rules.mdc is already listed at Line 6, so the auto-generated block repeats it unnecessarily. Consider removing the earlier entry or the duplicate to keep the ignore list clean.

backend/apps/owasp/api/internal/nodes/snapshot.py (2)

25-28: Add prefetch_related hint for new_chapters to avoid potential N+1 queries.

The new_chapters field is declared as a simple strawberry_django.field() without a prefetch_related hint, unlike the other relationship fields (new_issues, new_projects, new_releases, new_users) which all use prefetch_related in their decorators. This inconsistency could cause N+1 queries when querying chapters with nested fields.

Consider adding a custom resolver with prefetch_related for consistency:

♻️ Suggested change
-    new_chapters: list[ChapterNode] = strawberry_django.field()
+    @strawberry_django.field(prefetch_related=["new_chapters"])
+    def new_chapters(self, root: Snapshot) -> list[ChapterNode]:
+        """Resolve new chapters."""
+        return root.new_chapters.order_by("-created_at")

40-53: Consider adding limits to new_projects, new_releases, and new_users for consistency.

Unlike new_issues which has RECENT_ISSUES_LIMIT, these resolvers return unbounded results. If snapshots can have many related entities, this could lead to performance issues or large response payloads.

♻️ Example with limits (if applicable)
+RECENT_PROJECTS_LIMIT = 100
+RECENT_RELEASES_LIMIT = 100
+RECENT_USERS_LIMIT = 100
+
 @strawberry_django.field(prefetch_related=["new_projects"])
 def new_projects(self, root: Snapshot) -> list[ProjectNode]:
     """Resolve new projects."""
-    return root.new_projects.order_by("-created_at")
+    return root.new_projects.order_by("-created_at")[:RECENT_PROJECTS_LIMIT]
backend/apps/github/api/internal/queries/milestone.py (1)

79-82: Consider extracting the capped limit for clarity.

The walrus operator inside the ternary condition works but slightly obscures the flow. A minor readability improvement:

♻️ Suggested refactor
-        return (
-            milestones.order_by("-created_at")[:limit]
-            if (limit := min(limit, MAX_LIMIT)) > 0
-            else []
-        )
+        capped_limit = min(limit, MAX_LIMIT)
+        if capped_limit <= 0:
+            return []
+        return list(milestones.order_by("-created_at")[:capped_limit])
backend/tests/apps/github/api/internal/queries/milestone_test.py (1)

91-107: Consider adding tests for limit edge cases.

The implementation introduces MAX_LIMIT capping and handles limit <= 0 by returning an empty list. These behaviors should be tested.

💡 Suggested additional test cases
def test_recent_milestones_limit_exceeds_max(self, get_queryset):
    """Test that limit is capped at MAX_LIMIT."""
    from apps.github.api.internal.queries.milestone import MAX_LIMIT

    with patch.object(Milestone, "objects", new_callable=Mock) as mock_manager:
        mock_manager.all.return_value = get_queryset

        MilestoneQuery().recent_milestones(
            distinct=False,
            limit=MAX_LIMIT + 500,
            login=None,
            organization=None,
            state=None,
        )

        # Verify slice uses MAX_LIMIT, not the provided limit
        get_queryset.__getitem__.assert_called_with(slice(None, MAX_LIMIT))

def test_recent_milestones_zero_limit_returns_empty(self):
    """Test that limit <= 0 returns an empty list."""
    result = MilestoneQuery().recent_milestones(
        distinct=False,
        limit=0,
        login=None,
        organization=None,
        state=None,
    )
    assert result == []
backend/apps/api/rest/v0/event.py (1)

97-117: Unused BAD_REQUEST response mapping and type annotation mismatch.

The HTTPStatus.BAD_REQUEST: ValidationErrorSchema is declared in the response mapping (line 102), but the handler never returns this status—it only returns the event or NOT_FOUND. Additionally, the return type annotation EventDetail | EventError doesn't include ValidationErrorSchema.

If this is for framework-level path validation (though str:event_id accepts any string), consider documenting that intent. Otherwise, if explicit validation will be added later, the return type should be updated to include Response or the validation schema.

💡 Option 1: Remove unused BAD_REQUEST if not needed
 @router.get(
     "/{str:event_id}",
     description="Retrieve an event details.",
     operation_id="get_event",
     response={
-        HTTPStatus.BAD_REQUEST: ValidationErrorSchema,
         HTTPStatus.NOT_FOUND: EventError,
         HTTPStatus.OK: EventDetail,
     },
     summary="Get event",
 )
💡 Option 2: Update return type if BAD_REQUEST is intentional
 def get_event(
     request: HttpRequest,
     event_id: str = Path(..., example="owasp-global-appsec-usa-2025-washington-dc"),
-) -> EventDetail | EventError:
+) -> EventDetail | EventError | Response:
backend/tests/apps/github/api/internal/nodes/milestone_test.py (1)

43-45: Guard against missing field metadata before dereference.

_get_field_by_name can return None; dereferencing base_resolver then raises a generic AttributeError. Add a local assertion (or make the helper raise) to fail with a clear message. Apply this pattern to all resolver tests.

♻️ Suggested tweak (apply similarly to each resolver test)
-        field = self._get_field_by_name("organization_name", MilestoneNode)
-        result = field.base_resolver.wrapped_func(None, mock_milestone)
+        field = self._get_field_by_name("organization_name", MilestoneNode)
+        assert field is not None, "organization_name field not found on MilestoneNode"
+        result = field.base_resolver.wrapped_func(None, mock_milestone)

Also applies to: 54-56, 63-65, 73-75, 83-85, 94-96, 103-105

backend/tests/apps/github/api/internal/nodes/organization_test.py (1)

74-75: Add explicit asserts when fetching fields for clearer failures.
Right now a missing field would throw an attribute error; an assert provides a clearer test failure message.

♻️ Suggested tweak
-        field = self._get_field_by_name("stats", OrganizationNode)
-        result = field.base_resolver.wrapped_func(None, mock_organization)
+        field = self._get_field_by_name("stats", OrganizationNode)
+        assert field is not None
+        result = field.base_resolver.wrapped_func(None, mock_organization)
@@
-        field = self._get_field_by_name("stats", OrganizationNode)
-        result = field.base_resolver.wrapped_func(None, mock_organization)
+        field = self._get_field_by_name("stats", OrganizationNode)
+        assert field is not None
+        result = field.base_resolver.wrapped_func(None, mock_organization)
@@
-        field = self._get_field_by_name("url", OrganizationNode)
-        result = field.base_resolver.wrapped_func(None, mock_organization)
+        field = self._get_field_by_name("url", OrganizationNode)
+        assert field is not None
+        result = field.base_resolver.wrapped_func(None, mock_organization)

Also applies to: 110-111, 126-127

backend/apps/mentorship/api/internal/queries/mentorship.py (1)

108-132: Verify empty-list semantics for missing/unenrolled mentees.

Returning [] for missing entities or unenrolled mentees removes the distinction from “no issues”; please confirm this is the intended API behavior. Also consider adding program/module/mentee identifiers to the warning messages for easier debugging.
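
On the logging point, a minimal sketch (the identifier names are hypothetical; use whatever the resolver already has in scope):

import logging

logger = logging.getLogger(__name__)

def warn_not_enrolled(mentee_login: str, module_key: str, program_key: str) -> None:
    """Log enough context to explain an empty result from the resolver."""
    logger.warning(
        "Mentee %s is not enrolled in module %s of program %s; returning no issues.",
        mentee_login,
        module_key,
        program_key,
    )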

backend/tests/apps/github/models/repository_contributor_test.py (1)

268-276: Consider removing unused id field from mock data.

The mock queryset return at line 274 includes "id": "test-id", but get_top_contributors derives the id from tc["user__login"], not from a separate id field in the aggregation result. This extraneous field in the mock is ignored and could be misleading to future maintainers.

♻️ Proposed fix
                     mock_order_by.__getitem__.return_value = [
                         {
                             "total_contributions": 10,
                             "user__avatar_url": MOCK_AVATAR_URL,
                             "user__login": "testuser",
                             "user__name": name,
-                            "id": "test-id",
                         }
                     ]
backend/apps/github/api/internal/nodes/user.py (2)

30-33: Prefetched queryset is ignored due to .filter() on related manager.

Calling root.user_badges.filter(is_active=True) on a prefetched relation creates a new query instead of filtering the prefetched cache. To leverage the prefetch, filter the prefetched objects in Python:

Proposed fix
     @strawberry_django.field(prefetch_related=["user_badges"])
     def badge_count(self, root: User) -> int:
         """Resolve badge count."""
-        return root.user_badges.filter(is_active=True).count()
+        return sum(1 for ub in root.user_badges.all() if ub.is_active)

Alternatively, use a Prefetch object with a filtered queryset to push the filter to the prefetch:

from django.db.models import Prefetch

@strawberry_django.field(
    prefetch_related=[
        Prefetch("user_badges", queryset=UserBadge.objects.filter(is_active=True))
    ]
)
def badge_count(self, root: User) -> int:
    """Resolve badge count."""
    return root.user_badges.all().count()

35-48: Prefetch is bypassed by .filter().select_related() chain.

The prefetch_related=["user_badges__badge"] hint is ineffective because .filter() and .select_related() on the related manager create a fresh query. Use a Prefetch object to apply the filter and select at prefetch time:

Proposed fix using Prefetch
from django.db.models import Prefetch
from apps.nest.models.user_badge import UserBadge  # adjust import as needed

@strawberry_django.field(
    prefetch_related=[
        Prefetch(
            "user_badges",
            queryset=UserBadge.objects.filter(is_active=True)
            .select_related("badge")
            .order_by("badge__weight", "badge__name"),
        )
    ]
)
def badges(self, root: User) -> list[BadgeNode]:
    """Return user badges."""
    return [user_badge.badge for user_badge in root.user_badges.all()]
backend/apps/github/api/internal/nodes/repository.py (2)

69-72: The limit parameter default duplicates the constant pattern used elsewhere.

Other resolvers use module-level constants (e.g., RECENT_ISSUES_LIMIT = 5). Consider extracting RECENT_MILESTONES_LIMIT = 5 for consistency, or remove the parameter if the limit should always be 5 to match other "recent" patterns.


46-50: Consider consistency with similar operations that don't use prefetch_related for slicing.

The prefetch_related=["issues"] decorator fetches all related issues, though the resolver returns only the top 5 by date. While this pattern is used elsewhere in the codebase (snapshot.py), project.py implements an identical operation (recent_issues with slicing) without prefetch_related. With DjangoOptimizerExtension enabled, the optimizer may handle query optimization automatically. Consider whether the decorator hint is necessary or if the pattern can be simplified to match the approach in project.py.

backend/apps/github/api/internal/nodes/issue.py (3)

29-36: Remove redundant "repository" from select_related.

The select_related=["repository__organization", "repository"] hint is redundant—"repository__organization" already includes "repository" in the join path. Simplify to match the pattern in PullRequestNode.

♻️ Suggested fix
-    @strawberry_django.field(select_related=["repository__organization", "repository"])
+    @strawberry_django.field(select_related=["repository__organization"])

48-51: Potential inefficiency: filter() on prefetched relation triggers a new query.

The prefetch_related=["pull_requests"] hint caches all related pull requests, but .filter(state="closed", merged_at__isnull=False).exists() executes a new database query rather than filtering the prefetched cache. Consider using Python-side filtering on the prefetched data or removing the prefetch hint if a database query is acceptable.

♻️ Option A: Filter prefetched data in Python
     @strawberry_django.field(prefetch_related=["pull_requests"])
     def is_merged(self, root: Issue) -> bool:
         """Return True if this issue has at least one merged pull request."""
-        return root.pull_requests.filter(state="closed", merged_at__isnull=False).exists()
+        return any(
+            pr.state == "closed" and pr.merged_at is not None
+            for pr in root.pull_requests.all()
+        )
♻️ Option B: Remove prefetch hint and use database query
-    @strawberry_django.field(prefetch_related=["pull_requests"])
+    @strawberry_django.field
     def is_merged(self, root: Issue) -> bool:
         """Return True if this issue has at least one merged pull request."""
         return root.pull_requests.filter(state="closed", merged_at__isnull=False).exists()

53-61: Remove redundant .select_related("user") inside the resolver.

The prefetch_related=["participant_interests__user"] hint already eagerly loads the user relation via the prefetch path. Calling .select_related("user") inside the resolver is redundant and may cause unexpected query behavior.

♻️ Suggested fix
     @strawberry_django.field(prefetch_related=["participant_interests__user"])
     def interested_users(self, root: Issue) -> list[UserNode]:
         """Return all users who have expressed interest in this issue."""
         return [
             interest.user
-            for interest in root.participant_interests.select_related("user").order_by(
-                "user__login"
-            )
+            for interest in root.participant_interests.order_by("user__login")
         ]

Note: If ordering is needed, consider using a Prefetch object with a custom queryset in the decorator to apply the ordering at prefetch time.

backend/apps/owasp/api/internal/nodes/common.py (1)

14-42: Add type annotations to root parameters for clarity and static analysis.

All resolver methods have untyped root parameters. Adding type annotations improves IDE support, documentation, and static analysis. Based on the relevant code snippets from user.py and repository.py, the pattern uses explicit types.

Suggested improvement

Since GenericEntityNode is a base class for multiple entity types, you could either:

  1. Use a Union type or Protocol for the root parameter
  2. Leave it untyped if multiple model types are expected (current approach)

If a common base model exists:

+from apps.owasp.models.common import RepositoryBasedEntityModel
+
 @strawberry_django.field(prefetch_related=["entity_members__member"])
-def entity_leaders(self, root) -> list[EntityMemberNode]:
+def entity_leaders(self, root: RepositoryBasedEntityModel) -> list[EntityMemberNode]:
     """Resolve entity leaders."""
     return root.entity_leaders
backend/tests/apps/owasp/api/internal/nodes/common_test.py (2)

18-18: Inconsistent test invocation patterns.

The entity_leaders test (line 18) creates an instance and calls the method, while other tests (lines 27, 36, 45, 54) call the method on the class with None as self. This inconsistency could cause confusion.

Suggested fix: Use consistent instance-based invocation
 def test_leaders_resolver(self):
     """Test leaders returns indexed leaders list."""
     mock_entity = Mock()
     mock_entity.idx_leaders = ["leader1", "leader2"]

-    result = GenericEntityNode.leaders(None, mock_entity)
+    result = GenericEntityNode().leaders(mock_entity)

     assert result == ["leader1", "leader2"]

 def test_related_urls_resolver(self):
     """Test related_urls returns URLs list."""
     mock_entity = Mock()
     mock_entity.related_urls = ["https://example.com", "https://test.com"]

-    result = GenericEntityNode.related_urls(None, mock_entity)
+    result = GenericEntityNode().related_urls(mock_entity)

     assert result == ["https://example.com", "https://test.com"]

 def test_updated_at_resolver(self):
     """Test updated_at returns indexed timestamp."""
     mock_entity = Mock()
     mock_entity.idx_updated_at = 1234567890.0

-    result = GenericEntityNode.updated_at(None, mock_entity)
+    result = GenericEntityNode().updated_at(mock_entity)

     assert math.isclose(result, 1234567890.0)

 def test_url_resolver(self):
     """Test url returns indexed URL."""
     mock_entity = Mock()
     mock_entity.idx_url = "https://owasp.org/www-project-example"

-    result = GenericEntityNode.url(None, mock_entity)
+    result = GenericEntityNode().url(mock_entity)

     assert result == "https://owasp.org/www-project-example"

Also applies to: 27-27, 36-36, 45-45, 54-54


31-38: Test coverage for top_contributors resolver is missing.

The GenericEntityNode class has six resolvers (entity_leaders, leaders, related_urls, top_contributors, updated_at, url), but there's no test for top_contributors. Consider adding coverage for completeness.

Suggested test
def test_top_contributors_resolver(self):
    """Test top_contributors returns list of RepositoryContributorNode."""
    mock_entity = Mock()
    mock_entity.idx_top_contributors = [
        {
            "avatar_url": "https://example.com/avatar.png",
            "contributions_count": 10,
            "id": "1",
            "login": "contributor1",
            "name": "Contributor One",
        }
    ]

    result = GenericEntityNode().top_contributors(mock_entity)

    assert len(result) == 1
    assert result[0].login == "contributor1"
    assert result[0].contributions_count == 10
backend/apps/owasp/models/project_health_metrics.py (1)

173-181: Consider removing unnecessary select_related in subquery.

The select_related("project") on line 177 is not needed in the inner subquery since only id values are extracted via values_list("id", flat=True). Removing it would reduce query overhead.

♻️ Suggested optimization
         return ProjectHealthMetrics.objects.filter(
             id__in=ProjectHealthMetrics.objects.filter(project__is_active=True)
-            .select_related("project")
             .order_by("project_id", "-nest_created_at")
             .distinct("project_id")
             .values_list("id", flat=True)
         )
backend/apps/owasp/api/internal/nodes/project.py (1)

87-104: Consider prefetching repositories for the milestones query.

The recent_milestones resolver accesses root.repositories.all() without a prefetch_related=["repositories"] hint on the decorator. If this field is queried alongside other fields that don't prefetch repositories, it may result in an additional query.

♻️ Suggested improvement
-    @strawberry_django.field
+    @strawberry_django.field(prefetch_related=["repositories"])
     def recent_milestones(self, root: Project, limit: int = 5) -> list[MilestoneNode]:
backend/.env.fuzz.example (2)

15-15: Remove quotes from IP address value.

The quotes around 127.0.0.1 may be interpreted literally by some environment variable parsers, resulting in the value "127.0.0.1" instead of 127.0.0.1. This aligns with the static analysis hint flagging the quote character.

Suggested fix
-DJANGO_PUBLIC_IP_ADDRESS="127.0.0.1"
+DJANGO_PUBLIC_IP_ADDRESS=127.0.0.1

20-20: Consider adding a note about generating a proper secret key.

DJANGO_SECRET_KEY=None is fine for an example file, but users copying this file might forget to set a real secret key. Consider adding a comment or using a more obviously placeholder value like DJANGO_SECRET_KEY=change-me-to-a-random-string.

backend/tests/fuzz/rest_test.py (1)

8-10: Consider using pytest fixtures for environment validation.

Raising ValueError at module import time will cause a less informative error in pytest output. Using a pytest fixture or pytest.skip() would provide cleaner test collection behavior and clearer error messages.

Alternative using pytest fixture
import pytest

@pytest.fixture(scope="module")
def rest_config():
    rest_url = os.getenv("REST_URL")
    csrf_token = os.getenv("CSRF_TOKEN")
    if not rest_url or not csrf_token:
        pytest.skip("REST_URL and CSRF_TOKEN must be set in the environment.")
    return rest_url, csrf_token
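
A test consuming the fixture could then start from the tuple it returns (a minimal smoke check for illustration, not the actual fuzz run):

def test_fuzz_environment_is_configured(rest_config):
    """Hypothetical smoke test: the fuzz environment variables are usable."""
    rest_url, csrf_token = rest_config
    assert rest_url.startswith(("http://", "https://"))
    assert csrf_token

With scope="module", the environment check runs once per module, and missing variables surface as skipped tests rather than collection errors.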
docker/backend/Dockerfile.fuzz (1)

26-29: Remove redundant chmod command.

Line 29 (chmod +x ./entrypoint.sh) is redundant since line 26 already sets --chmod=755 during the COPY operation.

Suggested fix
 COPY --chown=root:root --chmod=755 ./entrypoint.fuzz.sh ./entrypoint.sh
 COPY --chown=root:root --chmod=755 tests/fuzz tests
 
-RUN chmod +x ./entrypoint.sh
-
 USER owasp
docker-compose/fuzz/compose.yaml (1)

48-53: Consider adding --no-owner and --no-privileges to pg_restore.

The pg_restore command may fail or produce warnings if the dump file contains ownership/privileges from a different user. Adding these flags ensures compatibility when restoring across different environments.

♻️ Suggested fix
     command: >
       sh -c '
         echo "Loading data from dump..." &&
-        pg_restore -h db -U $$POSTGRES_USER -d $$POSTGRES_DB /data/nest.dump &&
+        pg_restore -h db -U $$POSTGRES_USER -d $$POSTGRES_DB --no-owner --no-privileges /data/nest.dump &&
         echo "Data loading completed."
       '
backend/apps/common/middlewares/block_null_characters.py (1)

22-31: Improve readability with tuple unpacking.

The current iteration pattern, `for values in request.GET.lists() for value in values[1]`, works but is confusing, since `values` is actually a `(key, [values])` tuple. Using tuple unpacking improves clarity.

♻️ Suggested refactor
         if (
             NULL_CHARACTER in request.path
             or NULL_CHARACTER in request.path_info
             or any(
-                NULL_CHARACTER in value for values in request.GET.lists() for value in values[1]
+                NULL_CHARACTER in value for _, vals in request.GET.lists() for value in vals
             )
             or any(
-                NULL_CHARACTER in value for values in request.POST.lists() for value in values[1]
+                NULL_CHARACTER in value for _, vals in request.POST.lists() for value in vals
             )
         ):
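
For context, a minimal standalone version of such a middleware might look like this (a sketch; the 400 response body is an assumption, not necessarily what the project returns):

from django.http import HttpResponseBadRequest

NULL_CHARACTER = "\x00"

class BlockNullCharactersMiddleware:
    """Reject requests whose path or parameters contain NUL bytes."""

    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        has_null = (
            NULL_CHARACTER in request.path
            or NULL_CHARACTER in request.path_info
            or any(NULL_CHARACTER in v for _, vals in request.GET.lists() for v in vals)
            or any(NULL_CHARACTER in v for _, vals in request.POST.lists() for v in vals)
        )
        if has_null:
            return HttpResponseBadRequest("Null characters are not allowed.")
        return self.get_response(request)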
backend/.env.e2e.example (1)

15-15: Remove unnecessary quotes around IP address.

The quotes around "127.0.0.1" are inconsistent with other values in this file and may cause issues depending on how the value is parsed.

♻️ Suggested fix
-DJANGO_PUBLIC_IP_ADDRESS="127.0.0.1"
+DJANGO_PUBLIC_IP_ADDRESS=127.0.0.1
.github/workflows/update-nest-test-images.yaml (1)

78-89: Inconsistent cache-from configuration.

This step only uses type=registry for cache-from, while other steps in this workflow use both type=gha and type=registry. This inconsistency may result in slower builds when the GHA cache is available.

♻️ Suggested fix for consistency
       - name: Build and push fuzz-test-backend image
         uses: docker/build-push-action@263435318d21b8e681c14492fe198d362a7d2c83
         with:
-          cache-from: type=registry,ref=owasp/nest:test-fuzz-backend-cache
+          cache-from: |
+            type=gha
+            type=registry,ref=owasp/nest:test-fuzz-backend-cache
           cache-to: |
             type=gha,compression=zstd
             type=registry,ref=owasp/nest:test-fuzz-backend-cache
docker-compose/e2e/compose.yaml (1)

82-86: Consider using REDISCLI_AUTH to avoid password warning in logs.

Using redis-cli -a prints a security warning to stderr on every healthcheck (every 5s), polluting logs. Setting REDISCLI_AUTH as an environment variable avoids this.

🔧 Suggested fix
     healthcheck:
       interval: 5s
       retries: 5
-      test: [CMD, redis-cli, -a, $$REDIS_PASSWORD, ping]
+      test: [CMD-SHELL, "REDISCLI_AUTH=$$REDIS_PASSWORD redis-cli ping"]
       timeout: 5s
.github/workflows/run-fuzz-tests.yaml (2)

57-70: Redundant port mapping with host networking.

When using --network host, the -p 9500:9500 flag is ignored since the container shares the host's network namespace directly. Remove the redundant port mapping.

Proposed fix
          docker run -d --rm --name fuzz-nest-backend \
            --env-file backend/.env.fuzz.example \
            --network host \
            -e DJANGO_DB_HOST=localhost \
            -e DJANGO_REDIS_AUTH_ENABLED=False \
            -e DJANGO_REDIS_HOST=localhost \
-           -p 9500:9500 \
            owasp/nest:test-backend-latest \

72-80: The /a endpoint is the Django admin interface; this is intentional.

The endpoint http://localhost:9500/a maps to Django's admin site (defined at ./backend/settings/urls.py:27). Placing the admin interface at a non-obvious short path like this is a common Django hardening convention. The readiness check verifies backend availability by testing whether the admin interface responds.

While this approach works, consider implementing a dedicated /health or /healthz endpoint as a more conventional and explicit health check mechanism that doesn't rely on admin functionality.
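
A sketch of such an endpoint (the view name and URL path are illustrative, not part of this PR):

from django.http import JsonResponse
from django.urls import path

def healthz(request):
    """Readiness probe: cheap, unauthenticated, and database-free."""
    return JsonResponse({"status": "ok"})

urlpatterns = [
    path("healthz", healthz, name="healthz"),
]

The CI readiness loop could then poll http://localhost:9500/healthz instead of the admin route.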

backend/tests/apps/owasp/api/internal/queries/member_snapshot_test.py (1)

35-50: Assert eager-loading arguments to catch regressions.

These assertions only check call counts; consider validating the exact arguments to ensure the resolver keeps the intended eager-loading paths.

♻️ Suggested tightening of assertions
-            mock_queryset.select_related.assert_called_once()
-            mock_queryset.prefetch_related.assert_called_once()
+            mock_queryset.select_related.assert_called_once_with("github_user")
+            mock_queryset.prefetch_related.assert_called_once_with(
+                "issues",
+                "pull_requests",
+                "messages",
+            )

Comment on lines 55 to 66
         if distinct:
-            latest_release_per_author = (
-                queryset.filter(author_id=OuterRef("author_id"))
+            queryset = (
+                queryset.annotate(
+                    rank=Window(
+                        expression=Rank(),
+                        partition_by=[F("author_id")],
+                        order_by=F("published_at").desc(),
+                    )
+                )
+                .filter(rank=1)
                 .order_by("-published_at")
-                .values("id")[:1]
-            )
-            queryset = queryset.filter(
-                id__in=Subquery(latest_release_per_author),
-            ).order_by(
-                "-published_at",
             )
@coderabbitai coderabbitai bot (Contributor) Jan 19, 2026

⚠️ Potential issue | 🟡 Minor

Add a deterministic tie-breaker to the Window ordering.

When multiple releases from the same author have identical published_at timestamps, Rank() assigns them the same rank, causing multiple releases to pass rank=1. Add F("id") as a secondary sort key to guarantee exactly one release per author.

Proposed fix
         if distinct:
             queryset = (
                 queryset.annotate(
                     rank=Window(
                         expression=Rank(),
                         partition_by=[F("author_id")],
-                        order_by=F("published_at").desc(),
+                        order_by=[F("published_at").desc(), F("id").desc()],
                     )
                 )
                 .filter(rank=1)
                 .order_by("-published_at")
             )
🤖 Prompt for AI Agents
In `@backend/apps/github/api/internal/queries/release.py` around lines 55 - 66,
the Window ordering used when annotating rank can produce non-deterministic ties
when multiple releases by the same author share the same published_at; update
the Window in the distinct branch (the annotate(... Rank() ...
order_by=F("published_at").desc())) to include F("id") as a secondary sort key
(e.g. order_by=[F("published_at").desc(), F("id").desc()]) so Rank() yields a
single rank=1 row per author.

@ahmedxgouda (Collaborator, Author) replied:

How will the same author have two releases that have identical published_at timestamps?

@coderabbitai coderabbitai bot (Contributor) replied:

@ahmedxgouda, you're right that this is quite unlikely in practice. The published_at timestamp typically has microsecond precision in PostgreSQL, and GitHub's API generates distinct timestamps for separate release events.

That said, there are a few theoretical scenarios where duplicates could occur:

  • Bulk data migrations where timestamps might be normalized
  • Database precision mismatches if the field loses precision during storage or conversion
  • Manual data manipulation during testing or maintenance

The tie-breaker is defensive programming to guarantee deterministic results in all cases, but given the low probability, it's not strictly necessary. If you're confident the data won't have duplicate timestamps, the current implementation should work fine.

@@ -1,16 +1,19 @@
"""OWASP event GraphQL queries."""
@coderabbitai coderabbitai bot (Contributor) commented:

⚠️ Potential issue | 🟡 Minor

Module docstring refers to "event" instead of "post".

The docstring says "OWASP event GraphQL queries" but this file contains GraphQL queries for the Post model. This appears to be a copy-paste artifact.

✏️ Suggested fix
-"""OWASP event GraphQL queries."""
+"""OWASP post GraphQL queries."""
🤖 Prompt for AI Agents
In `@backend/apps/owasp/api/internal/queries/post.py` at line 1, update the module
docstring in post.py to correctly describe the file contents: replace the
incorrect "OWASP event GraphQL queries." text with something like "OWASP post
GraphQL queries." to reflect that this module contains GraphQL queries for the
Post model (module: post.py).

@arkid15r arkid15r merged commit f975598 into main Jan 19, 2026
76 of 80 checks passed
@arkid15r arkid15r deleted the feature/e2e-backend branch January 19, 2026 02:59

Labels

backend, backend-tests, ci, docker (Pull requests that update Docker code), docs (Improvements or additions to documentation), frontend, makefile
