Implement e2e backend instance and fuzz testing for both GraphQL and REST endpoints#3386
Summary by CodeRabbit
Walkthrough

Adds E2E and fuzz test CI and compose infrastructure, new backend E2E/fuzz settings and middleware, Schemathesis-based fuzz tests and Dockerfiles, a large Strawberry/Django GraphQL refactor (root-based fields, limits, query changes), DB indexes/migrations, a dump_data command, and widespread test updates and Makefile targets.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed

❌ Failed checks (1 inconclusive)
✅ Passed checks (2 passed)
PR validation failed: No linked issue and no valid closing issue reference in PR description
Actionable comments posted: 17
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
backend/apps/github/api/internal/queries/pull_request.py (1)
52-52: Docstring doesn't match implementation. The docstring states "unique pull requests per author and repository," but the implementation at lines 94-96 only partitions by `author_id`. Either update the docstring to reflect "unique pull requests per author" or update the `partition_by` to include `repository_id`.

📝 Proposed docstring fix

```diff
-    distinct (bool): Whether to return unique pull requests per author and repository.
+    distinct (bool): Whether to return unique pull requests per author.
```

Or if the original intent was to partition by both:

```diff
-    partition_by=[F("author_id")],
+    partition_by=[F("author_id"), F("repository_id")],
```

backend/apps/owasp/api/internal/queries/member_snapshot.py (1)
48-74: Remove decorator hints from `member_snapshots` method. The `@strawberry_django.field` decorator should not use `select_related` or `prefetch_related` parameters for methods returning relationship objects (`list[MemberSnapshotNode]`). Per strawberry-graphql-django conventions, apply these optimizations inside the resolver instead. The current decorator hints also reference nested relationships (`github_user__owasp_profile`, `user_badges__badge`) that are not exposed by MemberSnapshotNode's schema, making them unnecessary. Follow the pattern used by the singular `member_snapshot` method: remove the decorator parameters and rely on the in-resolver optimizations.

♻️ Suggested fix

```diff
-    @strawberry_django.field(
-        select_related=["github_user__owasp_profile", "github_user__user_badges__badge"],
-        prefetch_related=["issues", "pull_requests", "messages"],
-    )
+    @strawberry_django.field
     def member_snapshots(
         self, user_login: str | None = None, limit: int = 10
     ) -> list[MemberSnapshotNode]:
```

backend/apps/owasp/api/internal/queries/project.py (1)
93-104: Avoid `icontains("")`, which matches everything. If `github_user.name` is `None` (or empty), `leaders_raw__icontains=""` will match all projects, making `is_project_leader` return `True` for any existing project. That's a correctness bug and could grant unintended access.

🐛 Proposed fix

```diff
-        return Project.objects.filter(
-            Q(leaders_raw__icontains=github_user.login)
-            | Q(leaders_raw__icontains=(github_user.name or ""))
-        ).exists()
+        filters = Q(leaders_raw__icontains=github_user.login)
+        if github_user.name:
+            filters |= Q(leaders_raw__icontains=github_user.name)
+        return Project.objects.filter(filters).exists()
```
🤖 Fix all issues with AI agents
In `@backend/apps/common/management/commands/dump_data.py`:
- Around line 87-93: The finally block is catching CalledProcessError but
_execute_sql raises psycopg2 exceptions; change the exception handler to catch
psycopg2.Error (or psycopg2.DatabaseError) instead of CalledProcessError, and
log the exception details (e.g., include the exception message variable) in the
warning so failures in _execute_sql within dump_data.py are actually reported;
reference the _execute_sql call and replace the `except CalledProcessError` branch
with `except psycopg2.Error as e` and include e in the self.stderr.write warning.
- Around line 54-56: The CREATE DATABASE call in dump_data.py currently
interpolates temp_db and DB_NAME via f-strings in self._execute_sql which is
unsafe; update the code that builds the SQL for CREATE DATABASE (the call using
self._execute_sql("postgres", [...])) to construct the statement with
psycopg2.sql.SQL and psycopg2.sql.Identifier (e.g., SQL("CREATE DATABASE {}
TEMPLATE {}").format(Identifier(temp_db), Identifier(DB_NAME))) so identifiers
are properly quoted/escaped before passing to _execute_sql; ensure any other
places that interpolate database names (temp_db, DB_NAME) use the same
sql.Identifier approach.
In `@backend/apps/github/api/internal/nodes/pull_request.py`:
- Around line 38-41: In the url resolver method (def url(self, root:
PullRequest) -> str) remove the redundant fallback and return root.url directly
(i.e., replace "return root.url or \"\"" with "return root.url") so the
PullRequest url field matches other node resolvers like
RepositoryNode/UserNode/ReleaseNode/OrganizationNode that return root.url
without an empty-string fallback.
In `@backend/apps/github/api/internal/queries/issue.py`:
- Around line 72-81: The distinct logic in the
queryset.annotate(rank=Window(...)) currently partitions only by F("author_id")
so it returns one issue per author globally; update the partition_by argument in
the Window expression used in this block (the Rank() window annotation on
queryset in the distinct branch) to include F("repository_id") as well (i.e.,
partition_by=[F("author_id"), F("repository_id")]) so uniqueness is enforced per
author-and-repository as the docstring promises.
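A sketch of the widened partition the prompt above describes; the field names (`author_id`, `repository_id`, `created_at`) come from the PR, while the helper function wrapper is illustrative:

```python
from django.db.models import F, QuerySet, Window
from django.db.models.functions import Rank


def distinct_per_author_and_repository(queryset: QuerySet) -> QuerySet:
    """Keep the newest row per (author, repository) pair, newest first."""
    return (
        queryset.annotate(
            rank=Window(
                expression=Rank(),
                partition_by=[F("author_id"), F("repository_id")],
                order_by=F("created_at").desc(),
            )
        )
        .filter(rank=1)
        .order_by("-created_at")
    )
```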
In `@backend/apps/github/api/internal/queries/pull_request.py`:
- Around line 91-102: The unit tests for the distinct=True branch in
pull_request_test.py use MagicMock and only assert
annotate()/filter()/order_by() call counts, so they don't validate actual SQL or
DB behavior; add an integration test that runs the distinct branch against a
real test database (using Django TestCase or TransactionTestCase) by invoking
the query construction in
backend/apps/github/api/internal/queries/pull_request.py (the function that
builds the queryset for pull requests) and asserting the returned rows and/or
generated SQL respect the windowed rank filtering (e.g., only the latest per
author), ensuring the Window(Rank()) annotate +
filter(rank=1).order_by("-created_at") path is executed against the DB rather
than mocked.
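A hedged sketch of such an integration test; the factory setup is elided, and the call shape of `recent_pull_requests` is an assumption about the module under review rather than its confirmed API:

```python
from django.test import TestCase

from apps.github.api.internal.queries import pull_request as pr_queries


class RecentPullRequestsDistinctTest(TestCase):
    def test_distinct_keeps_one_pull_request_per_author(self):
        # Arrange: create two pull requests per author with different
        # created_at values (factory calls omitted in this sketch).
        results = pr_queries.recent_pull_requests(distinct=True)  # hypothetical call shape
        author_ids = [pr.author_id for pr in results]
        # The windowed rank filter should leave exactly one PR per author.
        assert len(author_ids) == len(set(author_ids))
```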
In `@backend/apps/github/api/internal/queries/release.py`:
- Around line 61-70: The window partition currently groups only by
F("author_id"), which contradicts the docstring that promises unique releases
per author and repository; update the distinct logic where queryset is annotated
with Window/Rank (the block referencing rank=Window(expression=Rank(),
partition_by=[F("author_id")], ...)) to include F("repository_id") in the
partition_by list so the partitioning becomes [F("author_id"),
F("repository_id")] before filtering rank=1.
In `@backend/apps/github/api/internal/queries/repository.py`:
- Around line 40-42: Remove the select_related hints from the
`@strawberry_django.field` decorator in repository.py and instead apply ORM
optimization inside the field's resolver (or rely on DjangoOptimizerExtension).
Concretely, delete the select_related argument from the decorator and in the
resolver for this field use queryset.select_related("organization",
"owner__owasp_profile").prefetch_related("owner__user_badges__badge") (or let
DjangoOptimizerExtension handle it) so reverse/many relationships are prefetched
correctly and Strawberry schema hints are not exposed.
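A sketch of moving the optimization into the resolver, as the prompt suggests; the parameter names and filter fields are assumptions about the Nest models, not verified signatures:

```python
import strawberry_django

from apps.github.models.repository import Repository


@strawberry_django.field
def repository(self, organization_key: str, repository_key: str) -> "RepositoryNode | None":
    """Resolve one repository, eager-loading relations inside the resolver."""
    return (
        Repository.objects.select_related("organization", "owner__owasp_profile")
        .prefetch_related("owner__user_badges__badge")
        .filter(
            organization__login__iexact=organization_key,  # hypothetical filter
            key__iexact=repository_key,  # hypothetical field name
        )
        .first()
    )
```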
In `@backend/apps/github/api/internal/queries/user.py`:
- Line 40: The decorator currently passes the reverse FK path to select_related
which cannot traverse reverse relations; update the `@strawberry_django.field`
call so that select_related only includes "owasp_profile" and add
prefetch_related with "user_badges__badge" (i.e., use
select_related=["owasp_profile"] and prefetch_related=["user_badges__badge"]) to
correctly prefetch the reverse user_badges relation while keeping owasp_profile
as a select_related.
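The corrected decorator split, shown as an excerpt from the Query class (the resolver body is elided; `UserNode` is the node type defined elsewhere in the PR):

```python
import strawberry_django


@strawberry_django.field(
    select_related=["owasp_profile"],         # forward one-to-one: resolved via JOIN
    prefetch_related=["user_badges__badge"],  # reverse FK: needs a separate query
)
def user(self, login: str) -> "UserNode | None":
    """Resolver body unchanged (omitted in this sketch)."""
```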
In `@backend/apps/owasp/api/internal/nodes/committee.py`:
- Around line 18-22: The created_at resolver (def created_at(self, root:
Committee) -> float) can return None via root.idx_created_at which contradicts
its non-nullable float signature; add a guard that returns a default float
(e.g., 0.0) when root.idx_created_at is falsy/None, mirroring the pattern used
by contributors_count/forks_count—ensure the resolver always returns a float
value.
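An excerpt-style sketch of the guarded resolver on CommitteeNode, mirroring the fallback pattern the prompt cites:

```python
import strawberry_django

from apps.owasp.models.committee import Committee


@strawberry_django.field
def created_at(self, root: Committee) -> float:
    """Always return a float, defaulting to 0.0 when idx_created_at is unset."""
    return root.idx_created_at or 0.0
```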
In `@backend/apps/owasp/api/internal/permissions/project_health_metrics.py`:
- Around line 12-22: In has_permission, guard against user.github_user being
None to avoid AttributeError; update the final condition in has_permission to
verify github_user exists before accessing is_owasp_staff (for example check
user.github_user is not None or use getattr(user.github_user, "is_owasp_staff",
False)) so the method returns False when github_user is missing while preserving
behavior in settings.IS_E2E_ENVIRONMENT/IS_FUZZ_ENVIRONMENT.
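A sketch of the guard; the context shape (`info.context.request.user`) follows the strawberry-django convention and is an assumption about this codebase:

```python
from django.conf import settings


def has_permission(self, source, info, **kwargs) -> bool:
    """Allow the E2E/fuzz bypass, otherwise require an authenticated OWASP staff user."""
    if settings.IS_E2E_ENVIRONMENT or settings.IS_FUZZ_ENVIRONMENT:
        return True
    user = info.context.request.user
    # getattr tolerates github_user being None and returns False instead of raising.
    return user.is_authenticated and getattr(user.github_user, "is_owasp_staff", False)
```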
In `@backend/apps/owasp/api/internal/queries/post.py`:
- Around line 16-19: Update the recent_posts resolver docstring to reflect the
behavior: mention that it returns up to `limit` most recent posts (default 5)
and that `limit` is capped by `MAX_LIMIT`; locate the docstring on the
`recent_posts` method (decorated with `@strawberry_django.field`) which calls
`Post.recent_posts()` and refers to `MAX_LIMIT` and adjust the text so it no
longer claims a fixed "5" posts.
In `@backend/apps/owasp/api/internal/views/permissions.py`:
- Around line 11-15: Add a startup assertion inside the Production configuration
class to prevent accidentally enabling E2E/fuzz auth bypass in production: in
the Production config (the settings class used for production) override __init__
(call super().__init__(*args, **kwargs)) and then assert not
self.IS_E2E_ENVIRONMENT and not self.IS_FUZZ_ENVIRONMENT (or raise RuntimeError
with a clear message) so startup fails if those flags are true; reference the
Production config class and the IS_E2E_ENVIRONMENT / IS_FUZZ_ENVIRONMENT
attributes when making this change.
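A self-contained sketch of the startup guard; `Base` stands in for the project's shared settings class, which is an assumption here:

```python
class Base:
    IS_E2E_ENVIRONMENT = False
    IS_FUZZ_ENVIRONMENT = False


class Production(Base):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Fail fast at startup if the auth bypass flags leak into production.
        if self.IS_E2E_ENVIRONMENT or self.IS_FUZZ_ENVIRONMENT:
            msg = "E2E/fuzz auth bypass must never be enabled in production"
            raise RuntimeError(msg)
```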
In `@backend/apps/owasp/models/project_health_metrics.py`:
- Around line 149-152: The property project_requirements currently calls
ProjectHealthRequirements.objects.get(level=self.project.level) which will raise
if no record exists; change it to a safe lookup that returns None when missing
(e.g., use
ProjectHealthRequirements.objects.filter(level=self.project.level).first() or
catch ProjectHealthRequirements.DoesNotExist and return None) so the annotated
return type ProjectHealthRequirements | None is honored and the other accessors
can continue to check if self.project_requirements before use.
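A null-safe version of the property, as the prompt suggests; the import path for ProjectHealthRequirements is a guess based on the app layout:

```python
from apps.owasp.models.project_health_requirements import ProjectHealthRequirements  # hypothetical path


@property
def project_requirements(self) -> ProjectHealthRequirements | None:
    """Return the requirements for this project's level, or None when absent."""
    return ProjectHealthRequirements.objects.filter(level=self.project.level).first()
```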
In `@backend/entrypoint.fuzz.sh`:
- Around line 10-49: The curl call that sets CSRF_TOKEN should include connect
and overall timeouts to avoid hanging in CI and fail fast on network issues
(e.g., add appropriate --connect-timeout and --max-time options to the curl
invocation that produces CSRF_TOKEN); also protect the pytest invocations from
word-splitting/globbing by quoting the TEST_FILE expansion in both branches
(wrap the ./tests/${TEST_FILE} operand in quotes) so filenames with spaces or
special chars are handled safely.
In `@backend/pyproject.toml`:
- Around line 105-108: Update the per-file ruff ignores for
lint.per-file-ignores."**/management/commands/dump_data.py": remove the
unnecessary "S603" entry (since dump_data.py does not use shell=True) and keep
only "S607", or if you prefer to keep S603 for future caution, add an inline
comment next to the ignore explaining why it's retained; reference the ignore
list key "lint.per-file-ignores.\"**/management/commands/dump_data.py\"" and the
rule IDs "S603" and "S607" when making the change so reviewers can see you
either narrowed the suppression to S607 or documented the justification for
S603.
In `@docker-compose/e2e/compose.yaml`:
- Around line 22-28: The healthcheck for the backend service uses wget but the
backend image (python:3.13.7-alpine) doesn't include it, causing the check to
always fail; either install wget in the backend Dockerfile by adding a package
install (e.g., apk add --no-cache wget) so the healthcheck's test: wget --spider
http://backend:9000/a/ succeeds, or replace the healthcheck block
(healthcheck/test) with a Python-based check that performs an HTTP GET to
http://backend:9000/a/ (using python -c 'import sys,urllib.request;
urllib.request.urlopen("http://backend:9000/a/")' and exiting nonzero on
failure) so the backend service can become healthy.
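The Python-based check written out, runnable via `python -c` (or a small script) inside the container using only the standard library:

```python
import sys
import urllib.request

try:
    # Exit nonzero on any connection or HTTP error so the healthcheck fails.
    urllib.request.urlopen("http://backend:9000/a/", timeout=5)
except OSError:
    sys.exit(1)
```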
In `@frontend/.env.e2e.example`:
- Around line 1-16: Reorder the env keys so NEXTAUTH_* entries appear before
NEXT_PUBLIC_* to satisfy dotenv-linter ordering: move NEXTAUTH_SECRET and
NEXTAUTH_URL to above the NEXT_PUBLIC_API_URL (or above the first NEXT_PUBLIC_*
block), keeping all other keys unchanged; ensure the file still contains
NEXT_SERVER_* entries and secrets unchanged, or alternatively add a
dotenv-linter exemption comment if you prefer to keep the current order.
🧹 Nitpick comments (34)
backend/apps/mentorship/api/internal/queries/mentorship.py (2)
53-88: Approve changes with a minor consistency suggestion. The refactor to return `None` instead of raising exceptions is appropriate. However, consider adding `exc_info=True` to the exception handler at lines 85-88 for consistency with program.py and module.py, which include the traceback for debugging.

Optional: Add exc_info for consistency

```diff
 except (Module.DoesNotExist, GithubUser.DoesNotExist, Mentee.DoesNotExist) as e:
     message = f"Mentee details not found: {e}"
-    logger.warning(message)
+    logger.warning(message, exc_info=True)
     return None
```
129-132: Same consistency suggestion for this exception handler. Consider adding `exc_info=True` here as well.

Optional: Add exc_info for consistency

```diff
 except (Module.DoesNotExist, GithubUser.DoesNotExist, Mentee.DoesNotExist) as e:
     message = f"Mentee issues not found: {e}"
-    logger.warning(message)
+    logger.warning(message, exc_info=True)
     return []
```

backend/apps/api/rest/v0/organization.py (1)
82-82: Minor documentation inconsistency (pre-existing). The endpoint description (line 82) says "Retrieve project details" and the docstring (line 96) says "Get project", but this endpoint handles organizations. Consider fixing for clarity.

📝 Suggested fix

```diff
 @router.get(
     "/{str:organization_id}",
-    description="Retrieve project details.",
+    description="Retrieve organization details.",
     operation_id="get_organization",
 ...
 def get_organization(
     request: HttpRequest,
     organization_id: str = Path(example="OWASP"),
 ) -> OrganizationDetail | OrganizationError:
-    """Get project."""
+    """Get organization."""
```

Also applies to: 96-96
backend/apps/api/rest/v0/__init__.py (2)
61-82: E2E and Fuzz settings are nearly identical. These two blocks differ only in the "description" string. Consider consolidating them:

```python
elif settings.IS_E2E_ENVIRONMENT or settings.IS_FUZZ_ENVIRONMENT:
    env_name = "E2E" if settings.IS_E2E_ENVIRONMENT else "Fuzz"
    api_settings_customization = {
        "auth": None,
        "servers": [{"description": env_name, "url": settings.SITE_URL}],
        "throttle": [],
    }
```

That said, keeping them separate follows the existing pattern (e.g., `IS_LOCAL_ENVIRONMENT`) and allows for independent future changes, so this is purely optional.
107-114: Add type hints to the exception handler. The handler is missing type annotations for better maintainability and IDE support.

♻️ Proposed fix

```diff
+from django.http import HttpRequest, HttpResponse
+
 @api.exception_handler(ValidationError)
-def validation_exception_handler(request, exc):
+def validation_exception_handler(request: HttpRequest, exc: ValidationError) -> HttpResponse:
     """Handle validation exceptions."""
     return api.create_response(
         request,
         {"message": "Invalid request", "errors": exc.errors},
         status=400,
     )
```
4-9: Simplify date construction; timezone lookup is redundant. Line 9 captures the current timezone, but the value is immediately discarded by `.date()` in lines 17, 22, and 32. Using `date(...)` directly is simpler and avoids the global timezone dependency.

♻️ Proposed simplification

```diff
-from datetime import datetime
+from datetime import date

-import pytest
-from django.utils import timezone
+import pytest

-from apps.api.rest.v0.event import EventDetail
-from apps.owasp.models.event import Event as EventModel
-
-current_timezone = timezone.get_current_timezone()
+from apps.api.rest.v0.event import EventDetail
+from apps.owasp.models.event import Event as EventModel
@@
     EventModel(
         description="this is a sample event",
-        end_date=datetime(2023, 6, 15, tzinfo=current_timezone).date(),
+        end_date=date(2023, 6, 15),
         key="sample-event",
         latitude=59.9139,
         longitude=10.7522,
         name="sample event",
-        start_date=datetime(2023, 6, 14, tzinfo=current_timezone).date(),
+        start_date=date(2023, 6, 14),
         url="https://github.com/owasp/Nest",
     ),
@@
     EventModel(
         description=None,
         end_date=None,
         key="event-without-end-date",
         latitude=None,
         longitude=None,
         name="event without end date",
-        start_date=datetime(2023, 7, 1, tzinfo=current_timezone).date(),
+        start_date=date(2023, 7, 1),
         url=None,
     ),
```

Also applies to: 15-33
backend/apps/owasp/migrations/0071_trigram_extension.py (1)
7-13: Verify pg_trgm availability in target DBs.
`CREATE EXTENSION` requires appropriate privileges; please confirm CI/staging/prod roles allow this migration.

.github/workflows/update-nest-test-images.yaml (1)
77-89: Add GHA cache-from for fuzz image builds. Other image build steps use both GHA and registry caches; this step only pulls registry cache, so the GHA cache you push won't be reused. Adding `type=gha` keeps caching consistent and speeds CI.

♻️ Proposed tweak

```diff
-          cache-from: type=registry,ref=owasp/nest:test-fuzz-backend-cache
+          cache-from: |
+            type=gha
+            type=registry,ref=owasp/nest:test-fuzz-backend-cache
```
82-88: Clear Redis volume to avoid cross-run cache bleed. Only the DB volume is removed; the cache volume persists and can leak state between runs. Consider clearing `nest-e2e_e2e-cache-data` as well.

♻️ Suggested cleanup

```diff
 test-frontend-e2e:
 	@docker container rm -f e2e-nest-db >/dev/null 2>&1 || true
 	@docker volume rm -f nest-e2e_e2e-db-data >/dev/null 2>&1 || true
+	@docker volume rm -f nest-e2e_e2e-cache-data >/dev/null 2>&1 || true
 	@DOCKER_BUILDKIT=1 \
 		docker compose --project-name nest-e2e -f docker-compose/e2e/compose.yaml up --build --remove-orphans --abort-on-container-exit db cache backend data-loader
```
26-29: Redundant `chmod +x` after COPY with `--chmod=755`. Line 26 already sets executable permissions via `--chmod=755`, making line 29 unnecessary.

🔧 Suggested fix

```diff
 COPY --chown=root:root --chmod=755 ./entrypoint.fuzz.sh ./entrypoint.sh
 COPY --chown=root:root --chmod=755 tests/fuzz tests

-RUN chmod +x ./entrypoint.sh
-
 USER owasp
```

backend/tests/fuzz/rest_test.py (1)
26-30: Add type hint for consistency with GraphQL test. The `case` parameter lacks a type hint, unlike `test_graphql_api`, which uses `case: schemathesis.Case`.

🔧 Suggested fix

```diff
 @schema.parametrize()
-def test_rest_api(case):
+def test_rest_api(case: schemathesis.Case) -> None:
     """Test REST API endpoints."""
     logger.info(case.as_curl_command())
     case.call_and_validate(headers=HEADERS)
```
32-33: Consider narrowing the scope of suppressed errors or logging them. Blanket suppression of `GraphQLClientError` may hide unexpected issues during fuzz testing. Consider logging suppressed errors at DEBUG level for traceability.

🔧 Suggested enhancement

```diff
+from schemathesis.graphql.checks import GraphQLClientError
+
 @schema.parametrize()
 def test_graphql_api(case: schemathesis.Case) -> None:
     """Test GraphQL API endpoints."""
     logger.info(case.as_curl_command())
-    with suppress(GraphQLClientError):
-        case.call_and_validate(headers=HEADERS)
+    try:
+        case.call_and_validate(headers=HEADERS)
+    except GraphQLClientError as e:
+        logger.debug("GraphQL client error (expected): %s", e)
```
85-85: Redis password exposed in process list via `-a` flag. The `redis-cli -a $$REDIS_PASSWORD ping` command exposes the password in the process list. For a fuzz testing environment this is likely acceptable, but consider using `--pass` from a file or the `REDISCLI_AUTH` environment variable for production-like setups.

.github/workflows/run-ci-cd.yaml (1)
288-301: Redundant port mapping with `--network host`. The `-p 9000:9000` flag is ignored when using `--network host` since host networking doesn't use Docker's port mapping. While harmless, removing it would reduce confusion.

♻️ Suggested fix

```diff
 - name: Start Backend in the background
   run: |
     docker run -d --rm --name e2e-nest-backend \
       --env-file backend/.env.e2e.example \
       --network host \
       -e DJANGO_DB_HOST=localhost \
       -e DJANGO_REDIS_AUTH_ENABLED=False \
       -e DJANGO_REDIS_HOST=localhost \
-      -p 9000:9000 \
       owasp/nest:test-backend-latest \
       sh -c '
         python manage.py migrate &&
         gunicorn wsgi:application --bind 0.0.0.0:9000
       '
```
57-70: Redundant port mapping with `--network host`. Same as the E2E workflow, the `-p 9500:9500` flag is ignored when using `--network host`.

♻️ Suggested fix

```diff
 - name: Run backend with fuzz environment variables
   run: |
     docker run -d --rm --name fuzz-nest-backend \
       --env-file backend/.env.fuzz.example \
       --network host \
       -e DJANGO_DB_HOST=localhost \
       -e DJANGO_REDIS_AUTH_ENABLED=False \
       -e DJANGO_REDIS_HOST=localhost \
-      -p 9500:9500 \
       owasp/nest:test-backend-latest \
       sh -c '
         python manage.py migrate &&
         gunicorn wsgi:application --bind 0.0.0.0:9500
       '
```
102-106: Type hint mismatch: method returns `sql.Composed` objects, not `str`. `sql.SQL(...).format(...)` returns `sql.Composed` objects. The return type hint `list[str]` is incorrect and may confuse static analyzers.

✏️ Suggested fix

```diff
+from psycopg2.sql import Composed
+
-    def _remove_emails(self, tables: list[str]) -> list[str]:
+    def _remove_emails(self, tables: list[str]) -> list[Composed]:
         return [
             sql.SQL("UPDATE {table} SET email = '';").format(table=sql.Identifier(table))
             for table in tables
         ]
```
108-130: Connection may leak on exception; use context manager. If an exception occurs during query execution, `connection.close()` is never called. Use the connection as a context manager or wrap in try/finally. Additionally, the `sql_queries` parameter accepts both `str` and `sql.Composed` objects but the type hint only specifies `str`.

♻️ Proposed fix

```diff
+from typing import Union
+from psycopg2.sql import Composable
+
 def _execute_sql(
     self,
     dbname: str,
-    sql_queries: list[str],
+    sql_queries: list[Union[str, Composable]],
 ):
-    connection = connect(
+    with connect(
         dbname=dbname,
         user=DB_USER,
         password=DB_PASSWORD,
         host=DB_HOST,
         port=DB_PORT,
-    )
-    connection.autocommit = True
-
-    rows = []
-    with connection.cursor() as cursor:
-        for sql_query in sql_queries:
-            cursor.execute(sql_query)
-            with contextlib.suppress(ProgrammingError):
-                rows.extend(cursor.fetchall())
-    connection.close()
-
-    return rows
+    ) as connection:
+        connection.autocommit = True
+        rows = []
+        with connection.cursor() as cursor:
+            for sql_query in sql_queries:
+                cursor.execute(sql_query)
+                with contextlib.suppress(ProgrammingError):
+                    rows.extend(cursor.fetchall())
+        return rows
```
124-125: Consider validating `APP_NAME` is set. If `APP_NAME` is not provided, the command will run with an empty argument, potentially causing a confusing Django error. Consider adding validation or a usage hint.

✏️ Optional enhancement

```diff
 migrations-empty:
+ifndef APP_NAME
+	$(error APP_NAME is required. Usage: make migrations-empty APP_NAME=<app_name>)
+endif
 	@CMD="python manage.py makemigrations --empty $(APP_NAME)" $(MAKE) exec-backend-command
```
175-177: Keep `recent_milestones` return type consistent. Returning `[]` changes the type from QuerySet to list, which can break callers that chain queryset methods. Consider returning an empty QuerySet instead.

♻️ Proposed adjustment

```diff
-        return self.milestones.order_by("-created_at") if self.pk else []
+        if not self.pk:
+            return self.milestones.model.objects.none()
+        return self.milestones.order_by("-created_at")
```
28-33: Consider clearing the `lru_cache` to ensure test isolation. The `active_committees_count` method uses `@lru_cache`, which persists across test runs in the same process. If multiple parameterized values are tested or other tests call this method first, the cached result may be returned instead of the mocked value.

♻️ Suggested fix to clear the cache

```diff
 @pytest.mark.parametrize(
     ("value"),
     [
         23,
     ],
 )
 def test_active_committees_count(self, value):
+    Committee.active_committees_count.cache_clear()
     with patch.object(Committee.objects, "filter") as mock_filter:
         mock_filter.return_value.count.return_value = value
         count = Committee.active_committees_count()
         assert count == value
         mock_filter.assert_called_once_with(has_active_repositories=True)
```
109-124: Consider clearing `lru_cache` between tests. The `active_chapters_count` method uses `@lru_cache`, which persists across test runs within the same process. This could cause test pollution if other tests call this method. Consider adding cache clearing in test setup or using `Chapter.active_chapters_count.cache_clear()` to ensure test isolation.

♻️ Suggested fix

```diff
 @pytest.mark.parametrize(
     ("value"),
     [
         42,
     ],
 )
 def test_active_chapters_count(self, value):
+    Chapter.active_chapters_count.cache_clear()
     with patch.object(Chapter.objects, "filter") as mock_filter:
         mock_filter.return_value.count.return_value = value
         assert Chapter.active_chapters_count() == value
         mock_filter.assert_called_once_with(
             is_active=True,
             latitude__isnull=False,
             longitude__isnull=False,
             owasp_repository__is_empty=False,
         )
```

backend/tests/apps/owasp/api/internal/queries/member_snapshot_test.py (1)
35-50: Tighten eager-loading assertions to lock in the intended relations. Right now the test only checks call counts. Asserting the exact `select_related`/`prefetch_related` args (and filter args) makes regressions easier to catch.

♻️ Suggested test hardening

```diff
-    # select_related and prefetch_related are called, then filter twice
-    mock_queryset.select_related.assert_called_once()
-    mock_queryset.prefetch_related.assert_called_once()
-    assert mock_queryset.filter.call_count == 2
+    # select_related and prefetch_related are called, then filter twice
+    mock_queryset.select_related.assert_called_once_with("github_user")
+    mock_queryset.prefetch_related.assert_called_once_with(
+        "issues", "pull_requests", "messages"
+    )
+    mock_queryset.filter.assert_any_call(github_user=mock_user)
+    mock_queryset.filter.assert_any_call(start_at__year=2025)
```

backend/tests/apps/owasp/api/internal/nodes/chapter_test.py (1)
77-78: Add an explicit null-guard before invoking the resolver. This makes failures clearer if the field lookup ever changes.

💡 Suggested tweak

```diff
 field = self._get_field_by_name("contribution_stats", ChapterNode)
+assert field is not None
 result = field.base_resolver.wrapped_func(None, instance)
```
222-225: Consider prefetch-friendly access for last_health_metrics. Ordering on the related manager issues a query per project and bypasses any prefetch cache; if this property is used in lists, consider a `Prefetch(..., to_attr=...)` or an annotation-based approach for the latest metric.
35-36: Guard against missing field lookups to avoid AttributeError. `_get_field_by_name` can return `None`, which would make `field.base_resolver` crash with a less clear failure. Consider asserting the field exists (or updating the helper to raise a clearer error) before invocation.

🔧 Suggested tweak (apply similarly to each resolver test)

```diff
 field = self._get_field_by_name("commits_count", MemberSnapshotNode)
+assert field is not None
 result = field.base_resolver.wrapped_func(None, mock_snapshot)
```

Also applies to: 46-47, 56-57, 66-67, 76-77, 86-87
backend/tests/apps/github/api/internal/nodes/issue_test.py (1)
45-47: Add a guard for missing fields to improve test failure clarity. If `_get_field_by_name` returns `None`, the failure will be an attribute error instead of a clear "field not found." Consider asserting the field exists before invoking `base_resolver`.

💡 Suggested tweak (apply to each lookup)

```diff
 field = self._get_field_by_name("organization_name", IssueNode)
+assert field is not None
 result = field.base_resolver.wrapped_func(None, mock_issue)
```

Also applies to: 56-58, 65-67, 76-78, 85-87
backend/tests/apps/github/api/internal/nodes/pull_request_test.py (1)
39-41: Consider asserting field existence before calling the resolver. This makes schema drift failures more readable than an attribute error.

💡 Suggested tweak (apply to each lookup)

```diff
 field = self._get_field_by_name("organization_name", PullRequestNode)
+assert field is not None
 result = field.base_resolver.wrapped_func(None, mock_pr)
```

Also applies to: 50-52, 59-61, 70-72, 79-81, 88-90, 97-99
backend/apps/github/api/internal/nodes/organization.py (1)
42-71: Consider adding `select_related` or query optimization hints. The `stats` resolver makes multiple database queries (repositories count, aggregate, and contributors count). While the aggregation logic is correct, the resolver doesn't leverage `strawberry_django.field`'s query optimization hints like other resolvers in the PR do. If this resolver is called frequently, consider whether any of the related data could benefit from caching or whether the queries could be combined. However, since the queries are already using aggregations efficiently, this is a minor optimization opportunity.
backend/apps/github/api/internal/queries/repository.py (1)
59-67: Consider simplifying the limit validation for readability. The walrus operator with the ternary is functionally correct but somewhat dense. A more explicit approach could improve readability:

♻️ Optional refactor for clarity

```diff
-    return (
-        (
-            Repository.objects.filter(
-                organization__login__iexact=organization,
-            ).order_by("-stars_count")[:limit]
-        )
-        if (limit := min(limit, MAX_LIMIT)) > 0
-        else []
-    )
+    limit = min(limit, MAX_LIMIT)
+    if limit <= 0:
+        return []
+    return list(
+        Repository.objects.filter(
+            organization__login__iexact=organization,
+        ).order_by("-stars_count")[:limit]
+    )
```
48-51: Avoid per-issue queries in `is_merged`. `filter(...).exists()` bypasses the prefetched collection and triggers a query per issue. Consider an in-memory check (or annotate at the queryset) to leverage prefetch. Please confirm query counts in tests.

♻️ Suggested update

```diff
-        return root.pull_requests.filter(state="closed", merged_at__isnull=False).exists()
+        return any(
+            pr.state == "closed" and pr.merged_at is not None
+            for pr in root.pull_requests.all()
+        )
```
53-61: Use prefetched participant_interests to avoid extra queries. `select_related().order_by()` forces a new query, so the prefetch hint isn't used. Sorting the prefetched list keeps it in-memory. Please confirm query counts in tests.

♻️ Suggested update

```diff
-        return [
-            interest.user
-            for interest in root.participant_interests.select_related("user").order_by(
-                "user__login"
-            )
-        ]
+        interests = sorted(
+            root.participant_interests.all(),
+            key=lambda interest: interest.user.login,
+        )
+        return [interest.user for interest in interests]
```
46-51: Drop unused prefetch onissues.
order_by(...).[:limit]forces a fresh query per repository, so the prefetch is unused and can add redundant work. Consider removing it (or refactor to a prefetchedto_attrlist). Please confirm query counts in tests.♻️ Suggested update
- `@strawberry_django.field`(prefetch_related=["issues"]) + `@strawberry_django.field`
69-72: Drop unused prefetch onrecent_milestones.The ordered slice triggers a new query per repository, so the prefetch isn’t used. Consider removing it (or refactor to use a prefetched
to_attrlist). Please confirm query counts in tests.♻️ Suggested update
- `@strawberry_django.field`(prefetch_related=["milestones"]) + `@strawberry_django.field`
74-78: Drop unused prefetch onreleases.The ordered slice triggers a new query per repository, so the prefetch isn’t used. Consider removing it (or refactor to use a prefetched
to_attrlist). Please confirm query counts in tests.♻️ Suggested update
- `@strawberry_django.field`(prefetch_related=["releases"]) + `@strawberry_django.field`
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
backend/apps/owasp/models/project_health_metrics.py (1)
212-228: Monthly aggregation ignores year boundaries, causing data duplication when spanning multiple years.
`ExtractMonth` returns only the month number (1-12), so records from January 2025 and January 2026 are aggregated together. With a 365-day lookback, this will occur in any query executed after the first year of operation, producing misleading monthly trends.

Use `TruncMonth` instead to preserve year-month granularity. However, this requires downstream adjustments: `TruncMonth` returns a date object (e.g., `2025-01-01`) rather than an integer. You'll need to either extract `.month` when building the months list, format the date as a string, or update the `ProjectHealthStatsNode` API contract if month numbers alone are insufficient for the UI.
🤖 Fix all issues with AI agents
In `@backend/apps/github/api/internal/queries/issue.py`:
- Around line 18-39: The select_related list in the strawberry_django.field
decorator contains reverse FK traversal paths that should be prefetched instead:
remove "author__user_badges__badge" and "milestone__author__user_badges__badge"
from the select_related argument and add those exact paths to the
prefetch_related list; update the decorator on the field (the
strawberry_django.field call) so select_related only contains forward relations
and prefetch_related includes the two moved reverse FK paths.
In `@backend/apps/github/api/internal/queries/pull_request.py`:
- Around line 19-38: The select_related list in the strawberry_django.field
decorator includes reverse FK traversal strings that must be prefetched instead;
remove "author__user_badges__badge" and "milestone__author__user_badges__badge"
from the select_related argument and add them to the prefetch_related argument
so they are loaded via prefetching (keep the existing strings exactly and
preserve the other select_related/prefetch_related entries and ordering).
In `@backend/apps/github/api/internal/queries/release.py`:
- Around line 18-24: Remove the reverse FK path "author__user_badges__badge"
from the strawberry_django.field select_related list and instead add it to
prefetch_related so the reverse relation is prefetched; specifically update the
decorator on the field (the strawberry_django.field call) to keep the existing
select_related entries ("author__owasp_profile", "repository__organization") but
move "author__user_badges__badge" into a prefetch_related parameter (or use a
Prefetch for "author__user_badges" with select_related("badge") if you want to
prefetch badges efficiently).
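A sketch of the Prefetch variant mentioned above; the UserBadge import path is hypothetical:

```python
from django.db.models import Prefetch

from apps.github.models.user import UserBadge  # hypothetical import path

# Prefetch the reverse user_badges relation and JOIN each badge in the
# prefetch query, instead of listing a reverse path under select_related.
badge_prefetch = Prefetch(
    "author__user_badges",
    queryset=UserBadge.objects.select_related("badge"),
)
```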
In `@backend/apps/owasp/api/internal/nodes/committee.py`:
- Around line 33-36: The repositories_count resolver currently always returns 1
which overcounts when a Committee has no attached repository; update the
repositories_count method on Committee to check root.owasp_repository and return
0 if it's missing, otherwise return the correct count (e.g., 1 when a repository
exists). Ensure you modify the repositories_count function to use the guard "if
not root.owasp_repository: return 0" before returning the repository count.
🧹 Nitpick comments (3)
backend/tests/apps/common/management/commands/dump_data_test.py (1)
19-100: Good test coverage for the happy path. The test properly verifies the core workflow: temp DB creation, email redaction query, pg_dump arguments, and cleanup. A couple of observations:

Lines 44-45: The `# ruff: noqa` comment placement inside the function call arguments is unusual but functional.

Consider adding tests for error scenarios:

- Empty `table_list` (no tables with email column)
- `CalledProcessError` from pg_dump to verify `CommandError` is raised
- Database connection failure during cleanup to verify warning is logged
backend/apps/common/management/commands/dump_data.py (1)
122-144: Consider using a context manager for the connection. If an exception occurs during query execution (after line 134 but before line 142), the connection won't be explicitly closed. While Python's garbage collector will eventually clean it up, using a context manager ensures deterministic cleanup.

♻️ Suggested improvement

```diff
 def _execute_sql(
     self,
     dbname: str,
     sql_queries: list[sql.Composable],
 ):
-    connection = connect(
+    with connect(
         dbname=dbname,
         user=DB_USER,
         password=DB_PASSWORD,
         host=DB_HOST,
         port=DB_PORT,
-    )
-    connection.autocommit = True
-
-    rows = []
-    with connection.cursor() as cursor:
-        for sql_query in sql_queries:
-            cursor.execute(sql_query)
-            with contextlib.suppress(ProgrammingError):
-                rows.extend(cursor.fetchall())
-    connection.close()
-
-    return rows
+    ) as connection:
+        connection.autocommit = True
+        rows = []
+        with connection.cursor() as cursor:
+            for sql_query in sql_queries:
+                cursor.execute(sql_query)
+                with contextlib.suppress(ProgrammingError):
+                    rows.extend(cursor.fetchall())
+        return rows
```
173-181: Remove unnecessary `select_related` in subquery. The `select_related("project")` on line 177 adds a JOIN that serves no purpose since the subquery only extracts `id` values via `.values_list("id", flat=True)`. This adds overhead without benefit.

♻️ Suggested fix

```diff
 return ProjectHealthMetrics.objects.filter(
     id__in=ProjectHealthMetrics.objects.filter(project__is_active=True)
-    .select_related("project")
     .order_by("project_id", "-nest_created_at")
     .distinct("project_id")
     .values_list("id", flat=True)
 )
```
```python
@strawberry_django.field
def repositories_count(self, root: Committee) -> int:
    """Resolve repositories count."""
    return 1
```
Guard repositories_count when no repository is attached.
Other resolvers return 0 when root.owasp_repository is missing; returning 1 here can overcount in that case.
🛠️ Suggested fix
```diff
-        return 1
+        return 1 if root.owasp_repository else 0
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```diff
 @strawberry_django.field
 def repositories_count(self, root: Committee) -> int:
     """Resolve repositories count."""
-    return 1
+    return 1 if root.owasp_repository else 0
```
🤖 Prompt for AI Agents
In `@backend/apps/owasp/api/internal/nodes/committee.py` around lines 33 - 36, The
repositories_count resolver currently always returns 1 which overcounts when a
Committee has no attached repository; update the repositories_count method on
Committee to check root.owasp_repository and return 0 if it's missing, otherwise
return the correct count (e.g., 1 when a repository exists). Ensure you modify
the repositories_count function to use the guard "if not root.owasp_repository:
return 0" before returning the repository count.
Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!
arkid15r left a comment:
@ahmedxgouda please check new CR's comments.
They are all false alerts @arkid15r
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
backend/apps/owasp/api/internal/queries/project.py (1)
77-80: Bug: Empty name causes false positive matches for all projects. When `github_user.name` is `None` or an empty string, the expression `(github_user.name or "")` evaluates to `""`, and `icontains=""` matches all non-null `leaders_raw` values. This incorrectly identifies such users as leaders of every project.

🐛 Proposed fix: Skip the name check when name is empty

```diff
-        return Project.objects.filter(
-            Q(leaders_raw__icontains=github_user.login)
-            | Q(leaders_raw__icontains=(github_user.name or ""))
-        ).exists()
+        query = Q(leaders_raw__icontains=github_user.login)
+        if github_user.name:
+            query |= Q(leaders_raw__icontains=github_user.name)
+        return Project.objects.filter(query).exists()
```
🤖 Fix all issues with AI agents
In `@backend/apps/github/api/internal/queries/issue.py`:
- Around line 52-58: The current queryset.annotate uses
Window(expression=Rank(), partition_by=[F("author_id")],
order_by=F("created_at").desc()) which can give ties and allow multiple issues
per author; replace Rank() with RowNumber() and make the Window order_by
deterministic by adding a tie-breaker (e.g., F("created_at").desc(),
F("id").desc()) so each author partition gets a unique sequential number, then
the existing filter that selects rank == 1 will reliably return a single issue
per author.
In `@backend/apps/github/api/internal/queries/pull_request.py`:
- Around line 72-83: The current Window/Rank used in the queryset annotate
(partition_by=[F("author_id")], order_by=F("created_at").desc()) can return
multiple rows per author when created_at ties occur; update the order_by in the
Window expression to include id as a deterministic secondary key (e.g., order_by
created_at desc then id desc) so Rank() will break ties consistently and the
subsequent .filter(rank=1) returns exactly one PR per author; modify the
annotate call where Rank(), Window, partition_by and order_by are used to
include F("id") (or the PK field) as the second sort criterion.
In `@backend/apps/owasp/api/internal/queries/member_snapshot.py`:
- Around line 34-38: The MemberSnapshot query currently uses
select_related("github_user") and prefetch_related("issues", "pull_requests",
"messages") but the
commits_count/issues_count/pull_requests_count/messages_count properties trigger
M2M .count() calls and cause N+1s; fix by either adding "commits" to the
prefetch_related call on MemberSnapshot.objects (so
prefetch_related("issues","pull_requests","messages","commits")) or, preferably,
switch to annotated counts using Django's Count (e.g.,
annotate(commits_count=Count("commits"), issues_count=Count("issues"), ...)) on
the same query that builds MemberSnapshot.objects to avoid per-row .count()
queries.
In `@backend/apps/owasp/models/common.py`:
- Around line 102-105: The return type for the cached_property entity_leaders is
incorrect: it returns a QuerySet, not a list. Update the annotation on
entity_leaders to return QuerySet[EntityMember] (importing the appropriate
typing/ORM QuerySet if needed) so it matches the actual return value from
self.entity_members.filter(...).order_by(...); alternatively, if a concrete list
is intended change the implementation to evaluate the queryset (e.g.,
list(self.entity_members.filter(...).order_by(...))) and keep the
list[EntityMember] annotation—also check child classes in project.py that slice
this property and ensure they still expect a QuerySet or adjust them if you
convert to a list.
🧹 Nitpick comments (4)
backend/apps/owasp/api/internal/queries/snapshot.py (1)
27-38: LGTM - Capped limit logic is correct. The walrus operator correctly caps `limit` at `MAX_LIMIT` before slicing, and gracefully returns an empty list for non-positive values. The conditional expression guarantees the assignment happens before the slice is evaluated.

One optional readability consideration: the walrus operator inside the conditional can be subtle. A more explicit approach could separate the capping and validation:

♻️ Optional: More explicit limit handling

```diff
 @strawberry_django.field
 def snapshots(self, limit: int = 12) -> list[SnapshotNode]:
     """Resolve snapshots."""
+    limit = min(limit, MAX_LIMIT)
+    if limit <= 0:
+        return []
     return (
         Snapshot.objects.filter(
             status=Snapshot.Status.COMPLETED,
         ).order_by(
             "-created_at",
-        )[:limit]
-        if (limit := min(limit, MAX_LIMIT)) > 0
-        else []
+        )[:limit]
     )
```
35-48: Prefetch hint conflicts with subsequent `.filter()` call. The `prefetch_related=["user_badges__badge"]` hint is negated by the `.filter()` call, which creates a new database query. The `.select_related("badge")` on line 42 becomes necessary because the prefetch is bypassed.

Consider either:

- Remove the `prefetch_related` hint (since it's not used) and rely on `select_related`
- Filter in Python to use the prefetch:

Option 2: Filter in Python to leverage prefetch

```diff
 @strawberry_django.field(prefetch_related=["user_badges__badge"])
 def badges(self, root: User) -> list[BadgeNode]:
     """Return user badges."""
-    user_badges = (
-        root.user_badges.filter(
-            is_active=True,
-        )
-        .select_related("badge")
-        .order_by(
-            "badge__weight",
-            "badge__name",
-        )
-    )
-    return [user_badge.badge for user_badge in user_badges]
+    return [
+        ub.badge
+        for ub in sorted(
+            (ub for ub in root.user_badges.all() if ub.is_active),
+            key=lambda ub: (ub.badge.weight, ub.badge.name),
+        )
+    ]
```
64-68: Simplify user filter to avoid extra query/exception. You can filter by `github_user__login` directly and drop the `User.objects.get` call.

♻️ Proposed simplification

```diff
-        if user_login:
-            try:
-                snapshots = snapshots.filter(github_user=User.objects.get(login=user_login))
-            except User.DoesNotExist:
-                return []
+        if user_login:
+            snapshots = snapshots.filter(github_user__login=user_login)
```
9-10: Prefer an explicit capped-limit variable for readability. The walrus-in-conditional works, but computing the cap once makes the flow easier to read and safer to extend.

♻️ Suggested refactor

```diff
-    return (
-        (
-            Repository.objects.filter(
-                organization__login__iexact=organization,
-            ).order_by("-stars_count")[:limit]
-        )
-        if (limit := min(limit, MAX_LIMIT)) > 0
-        else []
-    )
+    capped_limit = min(limit, MAX_LIMIT)
+    if capped_limit <= 0:
+        return []
+    return (
+        Repository.objects.filter(
+            organization__login__iexact=organization,
+        ).order_by("-stars_count")[:capped_limit]
+    )
```

Also applies to: 58-65
```python
queryset = (
    queryset.annotate(
        rank=Window(
            expression=Rank(),
            partition_by=[F("author_id")],
            order_by=F("created_at").desc(),
        )
    )
```
🧩 Analysis chain
🏁 Script executed:
cat -n backend/apps/github/api/internal/queries/issue.py | head -100

Repository: OWASP/Nest
Length of output: 2375
🏁 Script executed:
# Check if RowNumber is available in Django
rg "RowNumber" --type pyRepository: OWASP/Nest
Length of output: 36
🌐 Web query:
Django db.models.functions RowNumber window function
💡 Result:
Short answer: Django provides RowNumber as a window function in django.db.models.functions.window which emits ROW_NUMBER() (integer) and is used with Window(...). Example:
- Import and basic usage:

```python
from django.db.models import F, Window
from django.db.models.functions.window import RowNumber

qs = MyModel.objects.annotate(
    rn=Window(
        expression=RowNumber(),
        partition_by=[F('group_field')],  # optional
        order_by=F('created_at').desc()
    )
)
```

- Notes: RowNumber computes sequential row numbers per partition (or over the whole result if no partition). It requires a database that supports SQL window functions (some older SQLite versions do not). See Django docs and source for details. [1][2][3]
Sources:
[1] Django docs — Database functions (RowNumber).
[2] Django source for django.db.models.functions.window (RowNumber).
[3] Django ticket about older SQLite lacking window support (OperationalError).
🏁 Script executed:
# Check for tests related to recent_issues distinct behavior
fd "test.*issue" -t f | head -20Repository: OWASP/Nest
Length of output: 36
🏁 Script executed:
# Search for test files that reference distinct or recent_issues
rg "recent_issues|distinct.*author" --type py -l | grep -i testRepository: OWASP/Nest
Length of output: 174
🏁 Script executed:
cat -n backend/tests/apps/github/api/internal/queries/issue_test.py

Repository: OWASP/Nest
Length of output: 9137
Use RowNumber() instead of Rank() to ensure one issue per author.
Rank() assigns identical ranks to tied rows, so multiple issues with the same created_at per author will both have rank=1 and both pass the filter (line 60). This violates the documented contract at line 30: "Whether to return unique issues per author". Use RowNumber() instead, which assigns unique sequential numbers per partition, and add a deterministic tie-breaker like F("id") to guarantee consistent results.
Proposed fix
```diff
-from django.db.models.functions import Rank
+from django.db.models.functions import RowNumber
@@
     queryset.annotate(
         rank=Window(
-            expression=Rank(),
+            expression=RowNumber(),
             partition_by=[F("author_id")],
-            order_by=F("created_at").desc(),
+            order_by=[F("created_at").desc(), F("id").desc()],
         )
     )
```
)🤖 Prompt for AI Agents
In `@backend/apps/github/api/internal/queries/issue.py` around lines 52 - 58, The
current queryset.annotate uses Window(expression=Rank(),
partition_by=[F("author_id")], order_by=F("created_at").desc()) which can give
ties and allow multiple issues per author; replace Rank() with RowNumber() and
make the Window order_by deterministic by adding a tie-breaker (e.g.,
F("created_at").desc(), F("id").desc()) so each author partition gets a unique
sequential number, then the existing filter that selects rank == 1 will reliably
return a single issue per author.
```diff
 if distinct:
-    latest_pull_request_per_author = (
-        queryset.filter(
-            author_id=OuterRef("author_id"),
-        )
-        .order_by(
-            "-created_at",
-        )
-        .values("id")[:1]
-    )
-    queryset = queryset.filter(
-        id__in=Subquery(latest_pull_request_per_author),
-    ).order_by(
-        "-created_at",
-    )
+    queryset = (
+        queryset.annotate(
+            rank=Window(
+                expression=Rank(),
+                partition_by=[F("author_id")],
+                order_by=F("created_at").desc(),
+            )
+        )
+        .filter(rank=1)
+        .order_by("-created_at")
+    )
```
🧩 Analysis chain
🏁 Script executed:
cd backend/apps/github/api/internal/queries && cat -n pull_request.py | head -100

Repository: OWASP/Nest
Length of output: 3445
🏁 Script executed:
cd backend && find . -name "pull_request.py" -type f

Repository: OWASP/Nest
Length of output: 272
🏁 Script executed:
rg "distinct" backend/apps/github/api/internal/queries/pull_request.py -A 3 -B 3Repository: OWASP/Nest
Length of output: 847
🏁 Script executed:
rg "Rank\(\)" . --type py -B 5 -A 10Repository: OWASP/Nest
Length of output: 3972
🏁 Script executed:
cd backend && rg -i "test.*recent_pull_requests" --type py -l

Repository: OWASP/Nest
Length of output: 165
🏁 Script executed:
cd backend && rg "recent_pull_requests" --type py -B 2 -A 2Repository: OWASP/Nest
Length of output: 13248
🏁 Script executed:
cd backend && find . -path "*/test*" -name "*.py" -exec grep -l "Rank\|Window" {} \;

Repository: OWASP/Nest
Length of output: 227
🌐 Web query:
Django Rank() window function behavior ties multiple rows SQL
💡 Result:
Short answer: Django's Rank() maps to SQL RANK(), so rows that tie on the ORDER BY get the same rank and the next rank value is skipped (gaps). Use DenseRank() to avoid gaps, or RowNumber() / add extra ORDER BY tie-breakers if you need exactly one row per rank. [1][2]
🏁 Script executed:
cat -n tests/apps/github/api/internal/queries/pull_request_test.py | grep -A 20 "test_recent_pull_requests_distinct"

Repository: OWASP/Nest
Length of output: 145
🏁 Script executed:
find . -name "pull_request_test.py" -type f

Repository: OWASP/Nest
Length of output: 367
🏁 Script executed:
cd backend && cat -n tests/apps/github/api/internal/queries/pull_request_test.py | grep -A 20 "test_recent_pull_requests_distinct"

Repository: OWASP/Nest
Length of output: 3194
Add deterministic tie-breaker to ensure one PR per author when created_at timestamps match.
Rank() assigns the same rank to rows with identical created_at values, so multiple PRs from one author with identical timestamps will all pass rank=1, violating the "unique per author" contract. Add id as a secondary sort key to deterministically select only one PR per author.
Suggested fix
```diff
 queryset.annotate(
     rank=Window(
         expression=Rank(),
         partition_by=[F("author_id")],
-        order_by=F("created_at").desc(),
+        order_by=[F("created_at").desc(), F("id").desc()],
     )
 )
```
In `@backend/apps/github/api/internal/queries/pull_request.py` around lines 72 -
83, The current Window/Rank used in the queryset annotate
(partition_by=[F("author_id")], order_by=F("created_at").desc()) can return
multiple rows per author when created_at ties occur; update the order_by in the
Window expression to include id as a deterministic secondary key (e.g., order_by
created_at desc then id desc) so Rank() will break ties consistently and the
subsequent .filter(rank=1) returns exactly one PR per author; modify the
annotate call where Rank(), Window, partition_by and order_by are used to
include F("id") (or the PK field) as the second sort criterion.
```python
query = (
    MemberSnapshot.objects.select_related("github_user")
    .prefetch_related("issues", "pull_requests", "messages")
    .filter(github_user=user)
)
```
Avoid N+1 queries for `*_count` fields.
commits_count/issues_count/pull_requests_count/messages_count trigger M2M .count() calls; without prefetching (and with limit up to 1000), this can explode into thousands of queries. Add prefetches (including commits) or switch to annotated counts.
🔧 Proposed fix (prefetch all M2Ms used by count fields)
```diff
-    query = (
-        MemberSnapshot.objects.select_related("github_user")
-        .prefetch_related("issues", "pull_requests", "messages")
-        .filter(github_user=user)
-    )
+    query = (
+        MemberSnapshot.objects.select_related("github_user")
+        .prefetch_related("commits", "issues", "pull_requests", "messages")
+        .filter(github_user=user)
+    )
@@
-    snapshots = MemberSnapshot.objects.all().select_related("github_user")
+    snapshots = (
+        MemberSnapshot.objects.all()
+        .select_related("github_user")
+        .prefetch_related("commits", "issues", "pull_requests", "messages")
+    )
```

Also applies to: 62-71
🤖 Prompt for AI Agents
In `@backend/apps/owasp/api/internal/queries/member_snapshot.py` around lines 34 -
38, The MemberSnapshot query currently uses select_related("github_user") and
prefetch_related("issues", "pull_requests", "messages") but the
commits_count/issues_count/pull_requests_count/messages_count properties trigger
M2M .count() calls and cause N+1s; fix by either adding "commits" to the
prefetch_related call on MemberSnapshot.objects (so
prefetch_related("issues","pull_requests","messages","commits")) or, preferably,
switch to annotated counts using Django's Count (e.g.,
annotate(commits_count=Count("commits"), issues_count=Count("issues"), ...)) on
the same query that builds MemberSnapshot.objects to avoid per-row .count()
queries.
```diff
 @cached_property
 def entity_leaders(self) -> list[EntityMember]:
     """Return entity's leaders."""
-    return EntityMember.objects.filter(
-        entity_id=self.id,
-        entity_type=ContentType.objects.get_for_model(self.__class__),
-        is_active=True,
-        is_reviewed=True,
-        role=EntityMember.Role.LEADER,
-    ).order_by("order")
+    return self.entity_members.filter(role=EntityMember.Role.LEADER).order_by("order")
```
There was a problem hiding this comment.
Return type annotation should be QuerySet instead of list.
The method returns self.entity_members.filter(...).order_by(...), which is a QuerySet[EntityMember], not a list[EntityMember]. This type mismatch can cause incorrect type hints for downstream consumers and static analysis tools.
Note: The child class in project.py inherits this issue since slicing a QuerySet also returns a QuerySet.
Suggested fix
+from django.db.models import QuerySet
+
 ...
 @cached_property
-def entity_leaders(self) -> list[EntityMember]:
+def entity_leaders(self) -> QuerySet[EntityMember]:
     """Return entity's leaders."""
     return self.entity_members.filter(role=EntityMember.Role.LEADER).order_by("order")
Alternatively, if a list is genuinely intended for the API contract, evaluate the QuerySet:
 @cached_property
 def entity_leaders(self) -> list[EntityMember]:
     """Return entity's leaders."""
-    return self.entity_members.filter(role=EntityMember.Role.LEADER).order_by("order")
+    return list(self.entity_members.filter(role=EntityMember.Role.LEADER).order_by("order"))
🤖 Prompt for AI Agents
In `@backend/apps/owasp/models/common.py` around lines 102 - 105, The return type
for the cached_property entity_leaders is incorrect: it returns a QuerySet, not
a list. Update the annotation on entity_leaders to return QuerySet[EntityMember]
(importing the appropriate typing/ORM QuerySet if needed) so it matches the
actual return value from self.entity_members.filter(...).order_by(...);
alternatively, if a concrete list is intended change the implementation to
evaluate the queryset (e.g.,
list(self.entity_members.filter(...).order_by(...))) and keep the
list[EntityMember] annotation—also check child classes in project.py that slice
this property and ensure they still expect a QuerySet or adjust them if you
convert to a list.
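As a small illustration of the trade-off the prompt describes (a sketch, not the repository's code; EntityMember is the app's model, and class-level QuerySet subscription works at runtime on recent Django versions):

from functools import cached_property

from django.db.models import QuerySet

class EntityMixin:  # hypothetical stand-in for the real base model
    @cached_property
    def entity_leaders(self) -> "QuerySet[EntityMember]":
        # Lazy: project.py-style slicing compiles to SQL LIMIT/OFFSET.
        return self.entity_members.filter(role=EntityMember.Role.LEADER).order_by("order")

    @cached_property
    def entity_leaders_list(self) -> "list[EntityMember]":
        # Eager: one query now, plain list semantics afterwards.
        return list(self.entity_members.filter(role=EntityMember.Role.LEADER).order_by("order"))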
* Add dockerfile and compose and update makefile * Dump db-data * Reorder makefile * Dump db-data * Remove e2e dockerfile * Add CI/CD * Apply rabbit's suggestions * Add postgres instead of pgvector * Remove needs * Update envs * Fix migrations step * Add envs to docker * Remove migrations step * Remove --without test from dockerfile * Copy tests in dockerfile and add needs to gh workflow * Update dockerfile * Apply suggestion * Use the e2e instance in the frontend e2e tests * Update CI/CD * Update CI/CD * Update CI/CD * Update CI/CD * Update CI/CD and dockerfile * Update CI/CD * Update CI/CD * Update CI/CD * Update CI/CD * Update CI/CD * Update CI/CD * Update CI/CD * Update CI/CD * Update CI/CD * Update CI/CD * Update CI/CD * Update CI/CD * Update CI/CD * Restore needs for the job * Update Makefiles * Update docs * Apply suggestions * Update load-data * Skip sonar error * update port numbers * Update docs * Update code * Add required shell property back * Bump config.webServer timeout * Update code --------- Co-authored-by: Arkadii Yakovets <arkadii.yakovets@owasp.org>
* Add DB environment variables to e2e.yaml and add csrf_decorate function * Skip sonar suggestion * Fix rest api internal error * Add timeout * Update docs * Update code * Revert csrf update * Add command to dump local data * Update dump and load data * Update rest api config and docs * Apply check-spelling * Use .env.e2e.example for frontend e2e tests in gh actions * Apply rabbit's suggestions * Migrate dump_data to django command and dump owasp, github, and slack tables only * Apply rabbit's suggestions * Update code * Refactor dump_data * Use local cache for e2e * Remove old load_data command * Add tests * Skip sonar * Apply rabbit suggestions * Update code --------- Co-authored-by: Arkadii Yakovets <arkadii.yakovets@owasp.org> Add backend/data/nest.sql.gz
* Add hypothesis to poetry and fuzz test the index endpoint. * Refactor import statements in algolia_test.py for consistency * Add the client ip address to the request META. * Add fuzz testing for Slack event handlers and refactor algolia fuzz testing. * Refactor fuzz tests for Slack event handlers to improve readability and structure * Fix the poetry lock file. * Remove fuzz testing from algolia_search unit tests * Create a docker file for fuzz-testing, add the run commands to the MakeFile and install model_bakery for creating randomized data for all models. * Refactor to improve quality * Update fuzz testing setup: modify Makefile and Dockerfile, add entrypoint script for environment configuration * Update poetry.lock to reflect dependency changes and version updates * Create a fuzz configuration, update docker file, makefile, and the tests. * Refactor fuzz configuration by reorganizing imports and cleaning up whitespace * Update Dockerfile and entrypoint script to use Alpine base image and shell * Run the server on port 8000 after the tests. * Create a docker compose file for fuzz testing. * Add 'graphqler' to custom dictionary * Load data from nest.json and add graphqler to cspell dict. * Remove model-bakery dependency from pyproject.toml and update poetry.lock * Update graphqler command in docker compose and the healthcheck * Update graphql command to use backend service URL in docker-compose * Refactor docker-compose to build graphqler service from Dockerfile and add entrypoint script for fuzzing tests * Enhance fuzz testing setup: update Dockerfile and entrypoint scripts, improve CSRF handling, and refine healthcheck command in docker-compose * Update fuzz-test-backend command to abort on container exit * Add fuzz testing workflow and update image build steps * Add .env file creation step for fuzz tests in CI/CD workflow * Add Docker Hub login step for fuzz tests in CI/CD workflow * Refactor for the checks * Refactor fuzz testing workflow: replace Docker Hub login with buildx setup and update docker-compose handling * Fix fuzz tests workflow: rename docker-compose file * Refactor fuzz-tests job. * Add environment variables for fuzz tests configuration * Update fuzz tests environment variables * Fix poetry lock file * Sort the custom-dict. 
* Update content hash in poetry.lock * Add docker cache mounts to the backend image * Add Redis configuration * refactor yaml * Add docker cache mounts to graphql file * Remove unnecessary chmod command for cache directories in Dockerfile * Fix poetry lock file * Add cache mounts to backend tests * Update cache mounts in graphql image * Update mkdir in graphql image * Remove duplicates * Update tests * Rename docker compose * Update poetry lock * Apply sonar * Migrate to OWASP repo * Update docker * Use graphqler maintainer docker image * Add disable permissions, update docker compose, and update entrypoint * Establish an e2e backend instance locally and in CI/CD (#2429) * Fix running e2e backend (#2710) * Update code for e2e * Add runs-on * Skip sonar and fix ci/cd * Apply rabbit suggestion and override entrypoint in ci/cd * Use env with csrf * Add timeout * Remove hypothesis and old test files * Apply rabbit's suggestions * Update ci/cd and makefile * Use digest pinning with graphqler image * Update dockerfile and fix the typeerror issue * Apply sonar suggestion * Apply sonar and rabbit suggestions * Remove cache from ci/cd * Use curl instead of wget * Separate e2e from fuzz * Update fuzz ci/cd * Update CI/CD * Run precommit * Update code * Update code * Update docs, ci/cd, and apply suggestions * Use digest pinning and parameters in workflow * Apply sonar suggestions * Apply rabbit suggestions * Run migrations for fuzz testing in ci/cd * Apply rabbit suggestions * Fix exceptions * Add backend/data/nest.sql.gz * Update code * Automate data loading * Update dump_data to avoid exceptions * Update dump_data and automated data_loading * Update CI/CD * Update tests * Add timeout for fuzz tests * Update timeout for fuzz * Update CI/CD * Update CI/CD * Update CI/CD * Apply rabbit's suggestions * Update backend/Makefile * Update make targets and docker compose * Add volume for graphql fuzzing results and add upload artifacts in ci/cd * Update ci/cd * Update ci/cd * Update ci/cd * Update ci/cd * Update docker compose and makefile * Apply rabbit's suggestions * Update dump to match the last nest.json.gz --------- Co-authored-by: Arkadii Yakovets <arkadii.yakovets@owasp.org>
* Add redis cache locally and in ci/cd * Update env and makefile * Update make target * Add cache to fuzz and apply rabbit suggestions * Update makefile target * Add cache as dependency to the backend in docker compose * Update file naming to match the remaining compose projects
* Update volumes * Update compose folders Update docker compose, dump file and ci/cd Update ci/cd
* Migrate to schemathesis and add rest fuzz tests * Apply cspell * Update Fuzz Dockerfile context * Update rest auth for fuzz * Optimize Project Health Stats query and split rest and graphql tests * Split rest and graphql tests workflows * Update ci/cd * Apply rabbit suggestions * Update ci/cd * Apply rabbit's suggestions * Increase number of examples * Apply rabbit's suggestions * Apply pre-commit checks * Update CI/CD * Update makefile * Update CI/CD * Update CI/CD * Update ci/cd * Update ci/cd * Update CI/CD * Update settings.base.py and ci/cd * Update configuration and ci/cd * Update alphabitical order in env files * Fix negative indexing * Add EscapeNullCharactersMiddleware * Update middleware, schemathesis config and add HTTP BadRequest status code to the rest api docs * Update rest api schema * Update tests * Optimize recentIssues * Add optimiazations and fixes * Update tests, ci/cd and apply rabbit suggestions * Optimize N+1 queries * Update tests * Update rest schema and add a name for Fuzz Tests job in CI/CD * Fix negative indexing * Update project health metrics filters and pagination * Update mentorship app, ci/cd, and entrypoint * Add trigram index to project * Update nest.dump * Update entrypoint * Apply checks * Add QueryDepthLimiter * Add optimizations * Update tests * Update CI/CD * Add fixes * Apply rabbit's suggestion * Refactor docker files * Apply cspell * Refactor limits * Update milestone enum and rest api endpoints * Apply middleware suggestions * Migrate to strawberry_django.field * Update REDIS_AUTH_ENABLED default value * Update queries to use strawberry_django.field * Apply rabbit suggestions * Update tests and appply rabbit suggestion * Fix pagination.limit * Add optimizations and fixes * Update code * Add optimizations * Add optimizations * Add optimizations * Add fixes * Add milestone index migration * Update nest.dump * Add optimizations and fixes * Update snapshot query * Update backend tests * Update project model * Apply rabbit suggestion * Apply rabbit suggestions * Apply rabbit suggestion and update tests * Update code --------- Co-authored-by: Arkadii Yakovets <arkadii.yakovets@owasp.org>
* Update dump_data * Apply rabbit suggestions * Update tests and dump_data command * Update update-nest-test-images.yaml
* Update strawberry_django decorators * Update entity_leaders * Update tests * Update code --------- Co-authored-by: Arkadii Yakovets <arkadii.yakovets@owasp.org>
711a5f1 to
eb1354d
Compare
There was a problem hiding this comment.
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
backend/apps/api/rest/v0/committee.py (1)
81-85: Function name and docstring mismatch the endpoint purpose.
The function is named get_chapter with docstring "Get chapter", but it handles the committee endpoint /{str:committee_id}. This appears to be a copy-paste error.
Suggested fix
-def get_chapter(
+def get_committee(
     request: HttpRequest,
     committee_id: str = Path(example="project"),
 ) -> CommitteeDetail | CommitteeError:
-    """Get chapter."""
+    """Get committee."""
backend/tests/apps/owasp/api/internal/nodes/project_health_metrics_test.py (2)
19-44: Missing fields in expected_field_names.
The expected_field_names set appears to be missing age_days_requirement and last_pull_request_days_requirement, which are defined in ProjectHealthMetricsNode (lines 40-42 and 65-67 of the node file).
🐛 Suggested fix
 expected_field_names = {
     "age_days",
+    "age_days_requirement",
     "created_at",
     "contributors_count",
     "forks_count",
     "is_funding_requirements_compliant",
     "is_leader_requirements_compliant",
     "last_commit_days",
     "last_commit_days_requirement",
     "last_pull_request_days",
+    "last_pull_request_days_requirement",
     "last_release_days",
     "last_release_days_requirement",
     "open_issues_count",
46-76: Missing test cases for requirement fields.
The parameterized test is also missing test cases for age_days_requirement and last_pull_request_days_requirement fields.
💚 Add missing test cases
 @pytest.mark.parametrize(
     ("field_name", "expected_type"),
     [
         ("age_days", int),
+        ("age_days_requirement", int),
         ("created_at", datetime),
         ...
         ("last_pull_request_days", int),
+        ("last_pull_request_days_requirement", int),
         ("last_release_days", int),
🤖 Fix all issues with AI agents
In @.github/workflows/run-fuzz-tests.yaml:
- Around line 93-105: The GitHub Actions step "Build Fuzz-testing image" sets
context: backend but uses file: docker/backend/Dockerfile.fuzz which is resolved
relative to the context and therefore incorrect; update the step so the
Dockerfile path is correct relative to the context—either change context to .
(repo root) or change file to a path relative to backend (e.g.,
../../docker/backend/Dockerfile.fuzz) so that the Docker build action can find
the Dockerfile.
In `@backend/apps/github/api/internal/queries/release.py`:
- Around line 55-66: The Window ordering used when annotating rank can produce
non-deterministic ties when multiple releases by the same author share the same
published_at; update the Window in the distinct branch (the annotate(... Rank()
... order_by=F("published_at").desc())) to include F("id") as a secondary sort
key (e.g. order_by=[F("published_at").desc(), F("id").desc()]) so Rank() yields
a single rank=1 row per author.
In `@backend/apps/owasp/api/internal/queries/post.py`:
- Line 1: Update the module docstring in post.py to correctly describe the file
contents: replace the incorrect "OWASP event GraphQL queries." text with
something like "OWASP post GraphQL queries." to reflect that this module
contains GraphQL queries for the Post model (module: post.py).
In `@backend/apps/owasp/api/internal/queries/project_health_metrics.py`:
- Around line 46-55: The current logic validates and mutates
pagination.offset/limit but never slices the queryset returned by
ProjectHealthMetrics.get_latest_health_metrics(), so apply the computed
offset/limit to the queryset before returning; fetch queryset =
ProjectHealthMetrics.get_latest_health_metrics(), validate pagination.offset and
pagination.limit as you already do (using MAX_OFFSET, MAX_LIMIT, UNSET), compute
local offset = min(pagination.offset, MAX_OFFSET) and, if a valid limit is
provided, compute limit = min(pagination.limit, MAX_LIMIT) and return
list(queryset[offset:offset+limit]) (or list(queryset[offset:]) when no limit),
otherwise return list(queryset) when pagination is absent.
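A compact sketch of the clamp-then-slice logic this prompt describes; Pagination, MAX_OFFSET, and MAX_LIMIT stand in for the module's own names (assumed, not copied from the repo). Slicing a Django queryset this way compiles to SQL OFFSET/LIMIT:

MAX_OFFSET = 1000  # assumed module constants
MAX_LIMIT = 100

def apply_pagination(queryset, pagination):
    """Clamp offset/limit, then slice; works for lists and querysets alike."""
    if pagination is None:
        return list(queryset)
    offset = min(max(getattr(pagination, "offset", 0) or 0, 0), MAX_OFFSET)
    limit = getattr(pagination, "limit", None)
    if limit is None:
        return list(queryset[offset:])
    limit = min(max(limit, 0), MAX_LIMIT)
    return list(queryset[offset : offset + limit])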
In `@backend/Makefile`:
- Around line 103-113: The Makefile targets load-data, load-data-e2e and
load-data-fuzz run pg_restore against ./backend/data/nest.dump inside DB
containers that don't have that host path mounted; either mount the host
backend/data into the containers or change the targets to use the containers'
mounted path or remove them and invoke the existing data-loader services.
Concretely: for local (nest-db) add a volume mapping so ./backend/data is
available or remove/disable load-data and use the data-loader service; for
e2e/fuzz update load-data-e2e and load-data-fuzz to reference the
container-mounted path (/data/nest.dump) or call the e2e/fuzz data-loader
service instead of exec-db-command-e2e/exec-db-command-fuzz; ensure
exec-db-command, exec-db-command-e2e and exec-db-command-fuzz targets are
consistent with the chosen approach.
In `@backend/tests/apps/owasp/models/committee_test.py`:
- Around line 28-33: The test may see stale results because
Committee.active_committees_count is lru_cache-decorated; before invoking
Committee.active_committees_count() in test_active_committees_count, clear its
cache by calling Committee.active_committees_count.cache_clear() (or
functools.cache_clear equivalent) so the mocked Committee.objects.filter.count()
is actually used; add that cache_clear call just prior to the call to
Committee.active_committees_count() and keep the existing mock/asserts.
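A self-contained illustration of the stale-cache pitfall behind this fix (the names here are illustrative, not the app's):

from functools import lru_cache

calls = {"n": 0}

@lru_cache(maxsize=1)
def active_count() -> int:
    calls["n"] += 1
    return calls["n"]

assert active_count() == 1
assert active_count() == 1  # cached: the body did not run again
active_count.cache_clear()
assert active_count() == 2  # after clearing, the fresh value is computed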
♻️ Duplicate comments (9)
backend/pyproject.toml (1)
105-108: Narrow the suppression to S607 or justify S603.
If dump_data.py doesn't use shell=True, S603 won't trigger. Either remove S603 or add a brief justification comment to avoid over-broad security lint suppression.
🔧 Suggested minimal change
 lint.per-file-ignores."**/management/commands/dump_data.py" = [
-  "S603", # https://docs.astral.sh/ruff/rules/subprocess-without-shell-equals-true/,
   "S607", # https://docs.astral.sh/ruff/rules/start-process-with-partial-path/,
 ]
Please confirm the current Ruff rule guidance before deciding:
Ruff rule S603 subprocess-without-shell-equals-true and S607 start-process-with-partial-path guidance; when should they be suppressed?
backend/apps/github/api/internal/queries/issue.py (1)
51-62: Use RowNumber() instead of Rank() to guarantee one issue per author.
Rank() assigns identical ranks to tied rows. If an author has multiple issues with the same created_at timestamp, all will have rank=1 and pass the filter at line 60, violating the documented "unique issues per author" contract at line 30. Use RowNumber() with a deterministic tie-breaker (e.g., id) to ensure exactly one issue per author.
🐛 Proposed fix
-from django.db.models.functions import Rank
+from django.db.models.functions import RowNumber
 ...
     if distinct:
         queryset = (
             queryset.annotate(
                 rank=Window(
-                    expression=Rank(),
+                    expression=RowNumber(),
                     partition_by=[F("author_id")],
-                    order_by=F("created_at").desc(),
+                    order_by=[F("created_at").desc(), F("id").desc()],
                 )
             )
             .filter(rank=1)
             .order_by("-created_at")
         )
backend/apps/github/api/internal/queries/pull_request.py (1)
72-83: Add deterministic tie-breaker to ensure one PR per author when created_at timestamps match.
Rank() assigns the same rank to rows with identical created_at values, so multiple PRs from one author with identical timestamps will all pass rank=1, violating the "unique per author" contract. Add id as a secondary sort key.
Suggested fix
 queryset = (
     queryset.annotate(
         rank=Window(
             expression=Rank(),
             partition_by=[F("author_id")],
-            order_by=F("created_at").desc(),
+            order_by=[F("created_at").desc(), F("id").desc()],
         )
     )
     .filter(rank=1)
     .order_by("-created_at")
 )
backend/apps/owasp/models/common.py (1)
101-104: Return type annotation mismatch.
The method returns a QuerySet from .filter().order_by(), but the annotation declares list[EntityMember]. This was already flagged in a previous review.
backend/apps/owasp/api/internal/nodes/committee.py (1)
33-36: Guard repositories_count when no repository is attached.
Returning 1 unconditionally can overcount when root.owasp_repository is missing.
🛠️ Suggested fix
-    return 1
+    return 1 if root.owasp_repository else 0
backend/entrypoint.fuzz.sh (2)
11-11: Add curl timeouts to prevent hanging.
The curl request lacks timeouts, which can cause the entrypoint to hang indefinitely in CI if the network is slow or unresponsive.
Suggested fix
-CSRF_TOKEN=$(curl -fsSL "$BASE_URL/csrf" | jq -r '.csrftoken')
+CSRF_TOKEN=$(curl -fsSL --connect-timeout 5 --max-time 15 "$BASE_URL/csrf" | jq -r '.csrftoken')
46-48: Quote TEST_FILE variable to prevent word splitting.
The unquoted ${TEST_FILE} variable can cause issues if the path contains spaces or special characters.
Suggested fix
-    pytest ./tests/${TEST_FILE}
+    pytest "./tests/${TEST_FILE}"
 else
-    pytest --log-cli-level=INFO -s ./tests/${TEST_FILE}
+    pytest --log-cli-level=INFO -s "./tests/${TEST_FILE}"
docker-compose/e2e/compose.yaml (1)
22-28: Install wget in the backend Dockerfile or use a Python-based healthcheck.
This issue was previously flagged. The backend image (python:3.13.7-alpine) does not include wget, causing the healthcheck to fail.
backend/apps/owasp/api/internal/queries/member_snapshot.py (1)
34-38: Avoid N+1 count queries by prefetching all M2Ms used in count fields.
commits_count/issues_count/pull_requests_count/messages_count call .count() on M2M relations, which can lead to per-row queries without prefetching. Consider prefetching commits alongside the existing relations for both single and list queries.
🔧 Suggested fix (add commits to prefetch)
-    query = (
-        MemberSnapshot.objects.select_related("github_user")
-        .prefetch_related("issues", "pull_requests", "messages")
-        .filter(github_user=user)
-    )
+    query = (
+        MemberSnapshot.objects.select_related("github_user")
+        .prefetch_related("commits", "issues", "pull_requests", "messages")
+        .filter(github_user=user)
+    )
@@
-    snapshots = MemberSnapshot.objects.all().select_related("github_user")
+    snapshots = (
+        MemberSnapshot.objects.all()
+        .select_related("github_user")
+        .prefetch_related("commits", "issues", "pull_requests", "messages")
+    )
Also applies to: 62-71
🧹 Nitpick comments (34)
.gitignore (1)
51-52: Remove duplicate Snyk ignore entry to reduce noise.
.cursor/rules/snyk_rules.mdc is already listed at Line 6, so the auto-generated block repeats it unnecessarily. Consider removing the earlier entry or the duplicate to keep the ignore list clean.
backend/apps/owasp/api/internal/nodes/snapshot.py (2)
25-28: Add prefetch_related hint for new_chapters to avoid potential N+1 queries.
The new_chapters field is declared as a simple strawberry_django.field() without a prefetch_related hint, unlike the other relationship fields (new_issues, new_projects, new_releases, new_users) which all use prefetch_related in their decorators. This inconsistency could cause N+1 queries when querying chapters with nested fields.
Consider adding a custom resolver with prefetch_related for consistency:
♻️ Suggested change
-    new_chapters: list[ChapterNode] = strawberry_django.field()
+    @strawberry_django.field(prefetch_related=["new_chapters"])
+    def new_chapters(self, root: Snapshot) -> list[ChapterNode]:
+        """Resolve new chapters."""
+        return root.new_chapters.order_by("-created_at")
40-53: Consider adding limits to new_projects, new_releases, and new_users for consistency.
Unlike new_issues which has RECENT_ISSUES_LIMIT, these resolvers return unbounded results. If snapshots can have many related entities, this could lead to performance issues or large response payloads.
♻️ Example with limits (if applicable)
+RECENT_PROJECTS_LIMIT = 100
+RECENT_RELEASES_LIMIT = 100
+RECENT_USERS_LIMIT = 100
+
 @strawberry_django.field(prefetch_related=["new_projects"])
 def new_projects(self, root: Snapshot) -> list[ProjectNode]:
     """Resolve new projects."""
-    return root.new_projects.order_by("-created_at")
+    return root.new_projects.order_by("-created_at")[:RECENT_PROJECTS_LIMIT]
backend/apps/github/api/internal/queries/milestone.py (1)
79-82: Consider extracting the capped limit for clarity.
The walrus operator inside the ternary condition works but slightly obscures the flow. A minor readability improvement:
♻️ Suggested refactor
-    return (
-        milestones.order_by("-created_at")[:limit]
-        if (limit := min(limit, MAX_LIMIT)) > 0
-        else []
-    )
+    capped_limit = min(limit, MAX_LIMIT)
+    if capped_limit <= 0:
+        return []
+    return list(milestones.order_by("-created_at")[:capped_limit])
backend/tests/apps/github/api/internal/queries/milestone_test.py (1)
91-107: Consider adding tests for limit edge cases.
The implementation introduces MAX_LIMIT capping and handles limit <= 0 by returning an empty list. These behaviors should be tested.
💡 Suggested additional test cases
def test_recent_milestones_limit_exceeds_max(self, get_queryset):
    """Test that limit is capped at MAX_LIMIT."""
    from apps.github.api.internal.queries.milestone import MAX_LIMIT

    with patch.object(Milestone, "objects", new_callable=Mock) as mock_manager:
        mock_manager.all.return_value = get_queryset
        MilestoneQuery().recent_milestones(
            distinct=False,
            limit=MAX_LIMIT + 500,
            login=None,
            organization=None,
            state=None,
        )
        # Verify slice uses MAX_LIMIT, not the provided limit
        get_queryset.__getitem__.assert_called_with(slice(None, MAX_LIMIT))

def test_recent_milestones_zero_limit_returns_empty(self):
    """Test that limit <= 0 returns an empty list."""
    result = MilestoneQuery().recent_milestones(
        distinct=False,
        limit=0,
        login=None,
        organization=None,
        state=None,
    )
    assert result == []
backend/apps/api/rest/v0/event.py (1)
97-117: UnusedBAD_REQUESTresponse mapping and type annotation mismatch.The
HTTPStatus.BAD_REQUEST: ValidationErrorSchemais declared in the response mapping (line 102), but the handler never returns this status—it only returns the event orNOT_FOUND. Additionally, the return type annotationEventDetail | EventErrordoesn't includeValidationErrorSchema.If this is for framework-level path validation (though
str:event_idaccepts any string), consider documenting that intent. Otherwise, if explicit validation will be added later, the return type should be updated to includeResponseor the validation schema.💡 Option 1: Remove unused BAD_REQUEST if not needed
`@router.get`( "/{str:event_id}", description="Retrieve an event details.", operation_id="get_event", response={ - HTTPStatus.BAD_REQUEST: ValidationErrorSchema, HTTPStatus.NOT_FOUND: EventError, HTTPStatus.OK: EventDetail, }, summary="Get event", )💡 Option 2: Update return type if BAD_REQUEST is intentional
def get_event( request: HttpRequest, event_id: str = Path(..., example="owasp-global-appsec-usa-2025-washington-dc"), -) -> EventDetail | EventError: +) -> EventDetail | EventError | Response:backend/tests/apps/github/api/internal/nodes/milestone_test.py (1)
43-45: Guard against missing field metadata before dereference.
_get_field_by_namecan returnNone; dereferencingbase_resolverthen raises a genericAttributeError. Add a local assertion (or make the helper raise) to fail with a clear message. Apply this pattern to all resolver tests.♻️ Suggested tweak (apply similarly to each resolver test)
- field = self._get_field_by_name("organization_name", MilestoneNode) - result = field.base_resolver.wrapped_func(None, mock_milestone) + field = self._get_field_by_name("organization_name", MilestoneNode) + assert field is not None, "organization_name field not found on MilestoneNode" + result = field.base_resolver.wrapped_func(None, mock_milestone)Also applies to: 54-56, 63-65, 73-75, 83-85, 94-96, 103-105
backend/tests/apps/github/api/internal/nodes/organization_test.py (1)
74-75: Add explicit asserts when fetching fields for clearer failures.
Right now a missing field would throw an attribute error; an assert provides a clearer test failure message.♻️ Suggested tweak
- field = self._get_field_by_name("stats", OrganizationNode) - result = field.base_resolver.wrapped_func(None, mock_organization) + field = self._get_field_by_name("stats", OrganizationNode) + assert field is not None + result = field.base_resolver.wrapped_func(None, mock_organization) @@ - field = self._get_field_by_name("stats", OrganizationNode) - result = field.base_resolver.wrapped_func(None, mock_organization) + field = self._get_field_by_name("stats", OrganizationNode) + assert field is not None + result = field.base_resolver.wrapped_func(None, mock_organization) @@ - field = self._get_field_by_name("url", OrganizationNode) - result = field.base_resolver.wrapped_func(None, mock_organization) + field = self._get_field_by_name("url", OrganizationNode) + assert field is not None + result = field.base_resolver.wrapped_func(None, mock_organization)Also applies to: 110-111, 126-127
backend/apps/mentorship/api/internal/queries/mentorship.py (1)
108-132: Verify empty-list semantics for missing/unenrolled mentees.
Returning [] for missing entities or unenrolled mentees removes the distinction from "no issues"; please confirm this is the intended API behavior. Also consider adding program/module/mentee identifiers to the warning messages for easier debugging.
backend/tests/apps/github/models/repository_contributor_test.py (1)
268-276: Consider removing unused id field from mock data.
The mock queryset return at line 274 includes "id": "test-id", but get_top_contributors derives the id from tc["user__login"], not from a separate id field in the aggregation result. This extraneous field in the mock is ignored and could be misleading to future maintainers.
♻️ Proposed fix
 mock_order_by.__getitem__.return_value = [
     {
         "total_contributions": 10,
         "user__avatar_url": MOCK_AVATAR_URL,
         "user__login": "testuser",
         "user__name": name,
-        "id": "test-id",
     }
 ]
backend/apps/github/api/internal/nodes/user.py (2)
30-33: Prefetched queryset is ignored due to.filter()on related manager.Calling
root.user_badges.filter(is_active=True)on a prefetched relation creates a new query instead of filtering the prefetched cache. To leverage the prefetch, filter the prefetched objects in Python:Proposed fix
`@strawberry_django.field`(prefetch_related=["user_badges"]) def badge_count(self, root: User) -> int: """Resolve badge count.""" - return root.user_badges.filter(is_active=True).count() + return sum(1 for ub in root.user_badges.all() if ub.is_active)Alternatively, use a
Prefetchobject with a filtered queryset to push the filter to the prefetch:from django.db.models import Prefetch `@strawberry_django.field`( prefetch_related=[ Prefetch("user_badges", queryset=UserBadge.objects.filter(is_active=True)) ] ) def badge_count(self, root: User) -> int: """Resolve badge count.""" return root.user_badges.all().count()
35-48: Prefetch is bypassed by.filter().select_related()chain.The
prefetch_related=["user_badges__badge"]hint is ineffective because.filter()and.select_related()on the related manager create a fresh query. Use aPrefetchobject to apply the filter and select at prefetch time:Proposed fix using Prefetch
from django.db.models import Prefetch from apps.nest.models.user_badge import UserBadge # adjust import as needed `@strawberry_django.field`( prefetch_related=[ Prefetch( "user_badges", queryset=UserBadge.objects.filter(is_active=True) .select_related("badge") .order_by("badge__weight", "badge__name"), ) ] ) def badges(self, root: User) -> list[BadgeNode]: """Return user badges.""" return [user_badge.badge for user_badge in root.user_badges.all()]backend/apps/github/api/internal/nodes/repository.py (2)
69-72: Thelimitparameter default duplicates the constant pattern used elsewhere.Other resolvers use module-level constants (e.g.,
RECENT_ISSUES_LIMIT = 5). Consider extractingRECENT_MILESTONES_LIMIT = 5for consistency, or alternatively, remove the parameter if the limit should always be 5 to match other "recent" patterns.
46-50: Consider consistency with similar operations that don't useprefetch_relatedfor slicing.The
prefetch_related=["issues"]decorator fetches all related issues, though the resolver returns only the top 5 by date. While this pattern is used elsewhere in the codebase (snapshot.py),project.pyimplements an identical operation (recent_issueswith slicing) withoutprefetch_related. WithDjangoOptimizerExtensionenabled, the optimizer may handle query optimization automatically. Consider whether the decorator hint is necessary or if the pattern can be simplified to match the approach in project.py.backend/apps/github/api/internal/nodes/issue.py (3)
29-36: Remove redundant"repository"fromselect_related.The
select_related=["repository__organization", "repository"]hint is redundant—"repository__organization"already includes"repository"in the join path. Simplify to match the pattern inPullRequestNode.♻️ Suggested fix
- `@strawberry_django.field`(select_related=["repository__organization", "repository"]) + `@strawberry_django.field`(select_related=["repository__organization"])
48-51: Potential inefficiency:filter()on prefetched relation triggers a new query.The
prefetch_related=["pull_requests"]hint caches all related pull requests, but.filter(state="closed", merged_at__isnull=False).exists()executes a new database query rather than filtering the prefetched cache. Consider using Python-side filtering on the prefetched data or removing the prefetch hint if a database query is acceptable.♻️ Option A: Filter prefetched data in Python
`@strawberry_django.field`(prefetch_related=["pull_requests"]) def is_merged(self, root: Issue) -> bool: """Return True if this issue has at least one merged pull request.""" - return root.pull_requests.filter(state="closed", merged_at__isnull=False).exists() + return any( + pr.state == "closed" and pr.merged_at is not None + for pr in root.pull_requests.all() + )♻️ Option B: Remove prefetch hint and use database query
- `@strawberry_django.field`(prefetch_related=["pull_requests"]) + `@strawberry_django.field` def is_merged(self, root: Issue) -> bool: """Return True if this issue has at least one merged pull request.""" return root.pull_requests.filter(state="closed", merged_at__isnull=False).exists()
53-61: Remove redundant.select_related("user")inside the resolver.The
prefetch_related=["participant_interests__user"]hint already eagerly loads theuserrelation via the prefetch path. Calling.select_related("user")inside the resolver is redundant and may cause unexpected query behavior.♻️ Suggested fix
`@strawberry_django.field`(prefetch_related=["participant_interests__user"]) def interested_users(self, root: Issue) -> list[UserNode]: """Return all users who have expressed interest in this issue.""" return [ interest.user - for interest in root.participant_interests.select_related("user").order_by( - "user__login" - ) + for interest in root.participant_interests.order_by("user__login") ]Note: If ordering is needed, consider using a
Prefetchobject with a custom queryset in the decorator to apply the ordering at prefetch time.backend/apps/owasp/api/internal/nodes/common.py (1)
14-42: Add type annotations torootparameters for clarity and static analysis.All resolver methods have untyped
rootparameters. Adding type annotations improves IDE support, documentation, and static analysis. Based on the relevant code snippets fromuser.pyandrepository.py, the pattern uses explicit types.Suggested improvement
Since
GenericEntityNodeis a base class for multiple entity types, you could either:
- Use a Union type or Protocol for the root parameter
- Leave it untyped if multiple model types are expected (current approach)
If a common base model exists:
+from apps.owasp.models.common import RepositoryBasedEntityModel + `@strawberry_django.field`(prefetch_related=["entity_members__member"]) -def entity_leaders(self, root) -> list[EntityMemberNode]: +def entity_leaders(self, root: RepositoryBasedEntityModel) -> list[EntityMemberNode]: """Resolve entity leaders.""" return root.entity_leadersbackend/tests/apps/owasp/api/internal/nodes/common_test.py (2)
18-18: Inconsistent test invocation patterns.The
entity_leaderstest (line 18) creates an instance and calls the method, while other tests (lines 27, 36, 45, 54) call the method on the class withNoneasself. This inconsistency could cause confusion.Suggested fix: Use consistent instance-based invocation
def test_leaders_resolver(self): """Test leaders returns indexed leaders list.""" mock_entity = Mock() mock_entity.idx_leaders = ["leader1", "leader2"] - result = GenericEntityNode.leaders(None, mock_entity) + result = GenericEntityNode().leaders(mock_entity) assert result == ["leader1", "leader2"] def test_related_urls_resolver(self): """Test related_urls returns URLs list.""" mock_entity = Mock() mock_entity.related_urls = ["https://example.com", "https://test.com"] - result = GenericEntityNode.related_urls(None, mock_entity) + result = GenericEntityNode().related_urls(mock_entity) assert result == ["https://example.com", "https://test.com"] def test_updated_at_resolver(self): """Test updated_at returns indexed timestamp.""" mock_entity = Mock() mock_entity.idx_updated_at = 1234567890.0 - result = GenericEntityNode.updated_at(None, mock_entity) + result = GenericEntityNode().updated_at(mock_entity) assert math.isclose(result, 1234567890.0) def test_url_resolver(self): """Test url returns indexed URL.""" mock_entity = Mock() mock_entity.idx_url = "https://owasp.org/www-project-example" - result = GenericEntityNode.url(None, mock_entity) + result = GenericEntityNode().url(mock_entity) assert result == "https://owasp.org/www-project-example"Also applies to: 27-27, 36-36, 45-45, 54-54
31-38: Test coverage fortop_contributorsresolver is missing.The
GenericEntityNodeclass has six resolvers (entity_leaders,leaders,related_urls,top_contributors,updated_at,url), but there's no test fortop_contributors. Consider adding coverage for completeness.Suggested test
def test_top_contributors_resolver(self): """Test top_contributors returns list of RepositoryContributorNode.""" mock_entity = Mock() mock_entity.idx_top_contributors = [ { "avatar_url": "https://example.com/avatar.png", "contributions_count": 10, "id": "1", "login": "contributor1", "name": "Contributor One", } ] result = GenericEntityNode().top_contributors(mock_entity) assert len(result) == 1 assert result[0].login == "contributor1" assert result[0].contributions_count == 10backend/apps/owasp/models/project_health_metrics.py (1)
173-181: Consider removing unnecessary select_related in subquery.
The select_related("project") on line 177 is not needed in the inner subquery since only id values are extracted via values_list("id", flat=True). Removing it would reduce query overhead.
♻️ Suggested optimization
 return ProjectHealthMetrics.objects.filter(
     id__in=ProjectHealthMetrics.objects.filter(project__is_active=True)
-    .select_related("project")
     .order_by("project_id", "-nest_created_at")
     .distinct("project_id")
     .values_list("id", flat=True)
 )
backend/apps/owasp/api/internal/nodes/project.py (1)
87-104: Consider prefetching repositories for the milestones query.
The recent_milestones resolver accesses root.repositories.all() without a prefetch_related=["repositories"] hint on the decorator. If this field is queried alongside other fields that don't prefetch repositories, it may result in an additional query.
♻️ Suggested improvement
-@strawberry_django.field
+@strawberry_django.field(prefetch_related=["repositories"])
 def recent_milestones(self, root: Project, limit: int = 5) -> list[MilestoneNode]:
backend/.env.fuzz.example (2)
15-15: Remove quotes from IP address value.
The quotes around 127.0.0.1 may be interpreted literally by some environment variable parsers, resulting in the value "127.0.0.1" instead of 127.0.0.1. This aligns with the static analysis hint flagging the quote character.
Suggested fix
-DJANGO_PUBLIC_IP_ADDRESS="127.0.0.1"
+DJANGO_PUBLIC_IP_ADDRESS=127.0.0.1
20-20: Consider adding a note about generating a proper secret key.
DJANGO_SECRET_KEY=None is fine for an example file, but users copying this file might forget to set a real secret key. Consider adding a comment or using a more obviously placeholder value like DJANGO_SECRET_KEY=change-me-to-a-random-string.
backend/tests/fuzz/rest_test.py (1)
8-10: Consider using pytest fixtures for environment validation.
Raising ValueError at module import time will cause a less informative error in pytest output. Using a pytest fixture or pytest.skip() would provide cleaner test collection behavior and clearer error messages.
Alternative using pytest fixture
import pytest

@pytest.fixture(scope="module")
def rest_config():
    rest_url = os.getenv("REST_URL")
    csrf_token = os.getenv("CSRF_TOKEN")
    if not rest_url or not csrf_token:
        pytest.skip("REST_URL and CSRF_TOKEN must be set in the environment.")
    return rest_url, csrf_token
docker/backend/Dockerfile.fuzz (1)
26-29: Remove redundant chmod command.
Line 29 (chmod +x ./entrypoint.sh) is redundant since line 26 already sets --chmod=755 during the COPY operation.
Suggested fix
 COPY --chown=root:root --chmod=755 ./entrypoint.fuzz.sh ./entrypoint.sh
 COPY --chown=root:root --chmod=755 tests/fuzz tests

-RUN chmod +x ./entrypoint.sh
-
 USER owasp
docker-compose/fuzz/compose.yaml (1)
48-53: Consider adding --no-owner and --no-privileges to pg_restore.
The pg_restore command may fail or produce warnings if the dump file contains ownership/privileges from a different user. Adding these flags ensures compatibility when restoring across different environments.
♻️ Suggested fix
 command: >
   sh -c '
   echo "Loading data from dump..." &&
-  pg_restore -h db -U $$POSTGRES_USER -d $$POSTGRES_DB /data/nest.dump &&
+  pg_restore -h db -U $$POSTGRES_USER -d $$POSTGRES_DB --no-owner --no-privileges /data/nest.dump &&
   echo "Data loading completed."
   '
backend/apps/common/middlewares/block_null_characters.py (1)
22-31: Improve readability with tuple unpacking.
The current iteration pattern for values in request.GET.lists() for value in values[1] works but is confusing since values is actually a (key, [values]) tuple. Using tuple unpacking improves clarity.
♻️ Suggested refactor
 if (
     NULL_CHARACTER in request.path
     or NULL_CHARACTER in request.path_info
     or any(
-        NULL_CHARACTER in value for values in request.GET.lists() for value in values[1]
+        NULL_CHARACTER in value for _, vals in request.GET.lists() for value in vals
     )
     or any(
-        NULL_CHARACTER in value for values in request.POST.lists() for value in values[1]
+        NULL_CHARACTER in value for _, vals in request.POST.lists() for value in vals
     )
 ):
backend/.env.e2e.example (1)
15-15: Remove unnecessary quotes around IP address.
The quotes around "127.0.0.1" are inconsistent with other values in this file and may cause issues depending on how the value is parsed.
♻️ Suggested fix
-DJANGO_PUBLIC_IP_ADDRESS="127.0.0.1"
+DJANGO_PUBLIC_IP_ADDRESS=127.0.0.1
.github/workflows/update-nest-test-images.yaml (1)
78-89: Inconsistent cache-from configuration.
This step only uses type=registry for cache-from, while other steps in this workflow use both type=gha and type=registry. This inconsistency may result in slower builds when the GHA cache is available.
♻️ Suggested fix for consistency
 - name: Build and push fuzz-test-backend image
   uses: docker/build-push-action@263435318d21b8e681c14492fe198d362a7d2c83
   with:
-    cache-from: type=registry,ref=owasp/nest:test-fuzz-backend-cache
+    cache-from: |
+      type=gha
+      type=registry,ref=owasp/nest:test-fuzz-backend-cache
     cache-to: |
       type=gha,compression=zstd
       type=registry,ref=owasp/nest:test-fuzz-backend-cache
docker-compose/e2e/compose.yaml (1)
82-86: Consider using REDISCLI_AUTH to avoid password warning in logs.
Using redis-cli -a prints a security warning to stderr on every healthcheck (every 5s), polluting logs. Setting REDISCLI_AUTH as an environment variable avoids this.
🔧 Suggested fix
 healthcheck:
   interval: 5s
   retries: 5
-  test: [CMD, redis-cli, -a, $$REDIS_PASSWORD, ping]
+  test: [CMD-SHELL, "REDISCLI_AUTH=$$REDIS_PASSWORD redis-cli ping"]
   timeout: 5s
.github/workflows/run-fuzz-tests.yaml (2)
57-70: Redundant port mapping with host networking.
When using --network host, the -p 9500:9500 flag is ignored since the container shares the host's network namespace directly. Remove the redundant port mapping.
Proposed fix
 docker run -d --rm --name fuzz-nest-backend \
   --env-file backend/.env.fuzz.example \
   --network host \
   -e DJANGO_DB_HOST=localhost \
   -e DJANGO_REDIS_AUTH_ENABLED=False \
   -e DJANGO_REDIS_HOST=localhost \
-  -p 9500:9500 \
   owasp/nest:test-backend-latest \
72-80: The /a endpoint is the Django admin interface; this is intentional.
The endpoint http://localhost:9500/a maps to Django's admin site (defined at ./backend/settings/urls.py:27). This is a common Django convention to place the admin interface at a non-obvious short path for security. The readiness check verifies backend availability by testing if the admin interface responds.
While this approach works, consider implementing a dedicated /health or /healthz endpoint as a more conventional and explicit health check mechanism that doesn't rely on admin functionality; a minimal sketch follows.
backend/tests/apps/owasp/api/internal/queries/member_snapshot_test.py (1)
35-50: Assert eager-loading arguments to catch regressions.
These assertions only check call counts; consider validating the exact arguments to ensure the resolver keeps the intended eager-loading paths.
♻️ Suggested tightening of assertions
-    mock_queryset.select_related.assert_called_once()
-    mock_queryset.prefetch_related.assert_called_once()
+    mock_queryset.select_related.assert_called_once_with("github_user")
+    mock_queryset.prefetch_related.assert_called_once_with(
+        "issues",
+        "pull_requests",
+        "messages",
+    )
 if distinct:
-    latest_release_per_author = (
-        queryset.filter(author_id=OuterRef("author_id"))
-        .order_by("-published_at")
-        .values("id")[:1]
-    )
-    queryset = queryset.filter(
-        id__in=Subquery(latest_release_per_author),
-    ).order_by(
-        "-published_at",
-    )
+    queryset = (
+        queryset.annotate(
+            rank=Window(
+                expression=Rank(),
+                partition_by=[F("author_id")],
+                order_by=F("published_at").desc(),
+            )
+        )
+        .filter(rank=1)
+        .order_by("-published_at")
+    )
There was a problem hiding this comment.
Add a deterministic tie-breaker to the Window ordering.
When multiple releases from the same author have identical published_at timestamps, Rank() assigns them the same rank, causing multiple releases to pass rank=1. Add F("id") as a secondary sort key to guarantee exactly one release per author.
Proposed fix
if distinct:
queryset = (
queryset.annotate(
rank=Window(
expression=Rank(),
partition_by=[F("author_id")],
- order_by=F("published_at").desc(),
+ order_by=[F("published_at").desc(), F("id").desc()],
)
)
.filter(rank=1)
.order_by("-published_at")
    )
🤖 Prompt for AI Agents
In `@backend/apps/github/api/internal/queries/release.py` around lines 55 - 66,
The Window ordering used when annotating rank can produce non-deterministic ties
when multiple releases by the same author share the same published_at; update
the Window in the distinct branch (the annotate(... Rank() ...
order_by=F("published_at").desc())) to include F("id") as a secondary sort key
(e.g. order_by=[F("published_at").desc(), F("id").desc()]) so Rank() yields a
single rank=1 row per author.
There was a problem hiding this comment.
How will the same author have two releases that have identical published_at timestamps?
There was a problem hiding this comment.
@ahmedxgouda, you're right that this is quite unlikely in practice. The published_at timestamp typically has microsecond precision in PostgreSQL, and GitHub's API generates distinct timestamps for separate release events.
That said, there are a few theoretical scenarios where duplicates could occur:
- Bulk data migrations where timestamps might be normalized
- Database precision mismatches if the field loses precision during storage or conversion
- Manual data manipulation during testing or maintenance
The tie-breaker is defensive programming to guarantee deterministic results in all cases, but given the low probability, it's not strictly necessary. If you're confident the data won't have duplicate timestamps, the current implementation should work fine.
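A tiny pure-Python illustration of the point under discussion: with published_at alone, equal timestamps make the selection order-dependent, while an id tie-breaker pins it down (illustrative data, not from the app):

rows = [
    {"id": 7, "author_id": 1, "published_at": "2025-01-01T00:00:00"},
    {"id": 9, "author_id": 1, "published_at": "2025-01-01T00:00:00"},
]

def latest(candidates):
    # The secondary key "id" decides ties deterministically.
    return max(candidates, key=lambda r: (r["published_at"], r["id"]))

assert latest(rows)["id"] == 9
assert latest(list(reversed(rows)))["id"] == 9  # same winner either way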
@@ -1,16 +1,19 @@
"""OWASP event GraphQL queries."""
There was a problem hiding this comment.
Module docstring refers to "event" instead of "post".
The docstring says "OWASP event GraphQL queries" but this file contains GraphQL queries for the Post model. This appears to be a copy-paste artifact.
✏️ Suggested fix
-"""OWASP event GraphQL queries."""
+"""OWASP post GraphQL queries."""📝 Committable suggestion
🤖 Prompt for AI Agents
In `@backend/apps/owasp/api/internal/queries/post.py` at line 1, Update the module
docstring in post.py to correctly describe the file contents: replace the
incorrect "OWASP event GraphQL queries." text with something like "OWASP post
GraphQL queries." to reflect that this module contains GraphQL queries for the
Post model (module: post.py).



Proposed change
Resolves #(put the issue number here)
Add the PR description here.
Checklist
make check-test locally: all warnings addressed, tests passed