Skip to content

Use commit author date for heatmap instead of push date#36469

Closed
fjamesprice wants to merge 19 commits intogo-gitea:mainfrom
fjamesprice:feature/heatmap-commit-dates
Closed

Use commit author date for heatmap instead of push date#36469
fjamesprice wants to merge 19 commits intogo-gitea:mainfrom
fjamesprice:feature/heatmap-commit-dates

Conversation

@fjamesprice
Copy link
Copy Markdown

@fjamesprice fjamesprice commented Jan 28, 2026

Summary

When commits are made locally over multiple days and then pushed at once, the contribution heatmap currently displays all commits on the push date rather than their actual author dates. This PR fixes that behavior.

Changes:

  • Add OriginalUnix field to the Action model to store the original commit timestamp
  • Group commits by date when creating push actions - each date gets its own action record
  • Update the heatmap query to use COALESCE(NULLIF(original_unix, 0), created_unix) to prefer the original date when available
  • Add database migration (v326) for the new column

How it works

When a push contains commits from multiple days, separate action records are created for each date. Each commit now appears on its actual author date in the heatmap, matching user expectations and GitHub's behavior.

Backward Compatibility

  • Existing actions without OriginalUnix will continue to use created_unix (the push date) via COALESCE
  • Only new push actions will benefit from the improved date tracking
  • No breaking changes to the API or existing behavior

Test Plan

  • Push commits made on different days in a single push
  • Verify heatmap shows contributions on the commit author dates, not push date
  • Verify mirror sync pushes also use commit author dates
  • Verify existing actions (without OriginalUnix) still display correctly

Fixes #36471

Related to #14051 / #11861 (locked)

@GiteaBot GiteaBot added the lgtm/need 2 This PR needs two approvals by maintainers to be considered for merging. label Jan 28, 2026
When commits are made locally over multiple days and then pushed at once,
the contribution heatmap now displays them on their actual author dates
rather than the push date.

Changes:
- Add OriginalUnix field to Action model to store the original content timestamp
- Set OriginalUnix to the earliest commit author date when creating push actions
- Update heatmap query to use COALESCE(original_unix, created_unix)
- Add database migration for the new column

Fixes go-gitea#14051

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@fjamesprice fjamesprice force-pushed the feature/heatmap-commit-dates branch from 0ee2edf to 974f890 Compare January 28, 2026 02:14
Instead of creating a single action record with all commits grouped
under the earliest date, this creates separate action records for
each unique date. Each commit now appears on its actual author date
in the heatmap.

The original_unix field stores the actual commit timestamp (not
truncated to midnight) so the frontend can properly display it in
the user's timezone.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adjusts contribution heatmap bucketing to use commit author timestamps (when available) rather than push timestamps, fixing multi-day local commit batches showing up on a single push day.

Changes:

  • Adds OriginalUnix to Action and a DB migration to persist original timestamps.
  • Splits push (and mirror sync push) actions into per-day action records and stores an original commit timestamp on each.
  • Updates the heatmap grouping query to prefer original_unix via COALESCE(NULLIF(original_unix, 0), created_unix).

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
services/feed/notifier.go Creates per-day action records for pushes/sync pushes and sets OriginalUnix based on commit timestamps.
models/migrations/v1_26/v326.go Adds original_unix column (indexed) to the action table.
models/migrations/migrations.go Registers migration 326.
models/activities/user_heatmap.go Uses original_unix (fallback created_unix) for heatmap grouping.
models/activities/action.go Adds OriginalUnix field to the Action model.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread models/activities/user_heatmap.go Outdated
Comment thread models/activities/user_heatmap.go Outdated
Comment thread services/feed/notifier.go Outdated
Comment thread services/feed/notifier.go Outdated
Comment thread services/feed/notifier.go Outdated
fjamesprice and others added 4 commits January 30, 2026 00:02
- Fix heatmap query to filter by COALESCE(original_unix, created_unix)
  instead of created_unix, so old commits pushed recently stay in range
- Extract shared groupCommitsByDay() and notifyPushActions() helpers to
  deduplicate logic between PushCommits and SyncPushCommits
- Sort day keys for deterministic action creation order
- Add test fixture (action id:10) with original_unix != created_unix
  and update heatmap test expectations to verify original_unix is used

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The new action fixture (id:10) for testing original_unix in heatmap
also appears in feed queries for user 2 and repo 2. Update feed test
expectations to account for the additional action record.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Update TestUserHeatmap to expect both the original action and the new
action with original_unix, which appears at a different timestamp in
the heatmap.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@silverwind
Copy link
Copy Markdown
Member

Make sure to comment and/or resolve each review comment so we know they have been adressed.

@silverwind
Copy link
Copy Markdown
Member

Note: This review was generated by Claude (claude-opus-4-6).

Potential Concerns

1. Splitting one push into multiple action records changes feed semantics

The action table isn't just for heatmaps — it drives the activity feed, notifications, and GetFeeds API results. A user who pushes 30 days of accumulated commits now generates 30 action records instead of 1. The feed tests already show this (counts going from 1 to 2). Downstream consumers that paginate or count actions will behave differently, and users will see duplicate-looking "pushed to..." entries in their feed. This is a side effect that goes well beyond heatmap cosmetics.

A cleaner approach might keep one action per push and instead have the heatmap query join against commit data or a separate lightweight table that maps contributions to dates, avoiding mutation of the action model's cardinality.

2. The COALESCE(NULLIF(...)) in the WHERE clause defeats index usage

The filter:

COALESCE(NULLIF(original_unix, 0), created_unix) > ?

is evaluated per-row and no standard B-tree index on either column can satisfy it. The GROUP BY has the same issue. On instances with millions of action rows, this turns the heatmap query into a full table scan (or at best an index scan filtered late).

This could be mitigated with:

  • A generated/computed column, or
  • Rewriting the query as (original_unix > ? OR (original_unix = 0 AND created_unix > ?)), which can use indexes on both columns

@silverwind
Copy link
Copy Markdown
Member

silverwind commented Feb 12, 2026

Issue 1 above is a unacceptable concern imho. We still want the frontpage feed to group commits by date. If they are now ungrouped, that will make the feed unusable for example when someone pushes 1000 commits to a new branch. The grouping of commits in the feed must remain and be truncated (to I think 5 commits currently).

…ance

This replaces the previous approach of splitting action records by commit date,
which broke activity feeds (showed multiple "pushed to..." entries per push) and
had performance issues with COALESCE in WHERE clauses.

New approach:
- Created action_commit_date table to store per-commit timestamps
- One action record per push (feeds show single entry)
- Multiple commit date entries per action (heatmap shows accurate dates)
- LEFT JOIN with CASE WHEN fallback ensures both push commits and non-push actions appear
- WHERE clause uses OR logic instead of COALESCE for better index performance

Changes:
- Added action_commit_date model and helper functions
- Migration v327: Create action_commit_date table
- Migration v328: Backfill existing push actions
- Updated heatmap query to join action_commit_date with fallback to created_unix
- Modified PushCommits/SyncPushCommits to populate auxiliary table
- Added cleanup in DeleteOldActions/DeleteIssueActions
- Updated test fixtures and expectations

Addresses maintainer feedback on PR go-gitea#36469:
- ✅ Preserves feed semantics (one action = one feed entry)
- ✅ Fixes heatmap accuracy (commits shown on author date, not push date)
- ✅ Improves query performance (no COALESCE in WHERE, simple indexed lookups)
- ✅ Works on all 5 supported databases (PostgreSQL, MySQL, MariaDB, SQLite, MSSQL)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@fjamesprice
Copy link
Copy Markdown
Author

Updated Implementation - Auxiliary Table Approach

Thanks for the feedback @silverwind. I've completely revised the approach to address both concerns (feed breakage and performance).

What Changed

Previous approach (❌ rejected):

  • Split action records by commit date → one push = multiple actions
  • Broke feeds: 30 accumulated commits = 30 "pushed to..." entries
  • Performance issue: COALESCE(NULLIF(original_unix, 0), created_unix) > ? in WHERE clause prevents index usage

New approach (✅ implemented):

  • Created action_commit_date auxiliary table to store per-commit timestamps
  • One action record per push (feeds preserved)
  • Multiple commit date entries per action (heatmap accuracy maintained)
  • LEFT JOIN with CASE WHEN fallback for both push commits and non-push actions
  • WHERE clause uses OR logic: (commit_timestamp > ? OR (commit_timestamp IS NULL AND created_unix > ?)) - allows index usage

Technical Details

New table schema:

CREATE TABLE action_commit_date (
    id              BIGINT PRIMARY KEY AUTO_INCREMENT,
    action_id       BIGINT INDEX,
    commit_sha1     VARCHAR(64),
    commit_timestamp BIGINT INDEX
);

Query pattern:

SELECT 
    CASE WHEN action_commit_date.commit_timestamp IS NOT NULL 
         THEN action_commit_date.commit_timestamp / 900 * 900 
         ELSE created_unix / 900 * 900 
    END AS timestamp,
    count(*) as contributions
FROM action
LEFT JOIN action_commit_date ON action_commit_date.action_id = action.id
WHERE (action_commit_date.commit_timestamp > ? OR 
       (action_commit_date.commit_timestamp IS NULL AND created_unix > ?))
GROUP BY timestamp

Data flow:

  1. PushCommits() creates single action record (as before)
  2. After action is saved, InsertActionCommitDates() populates auxiliary table with per-commit timestamps
  3. Heatmap query joins auxiliary table, falls back to created_unix for non-push actions
  4. Feeds query unchanged - still sees one action per push

Verification

Unit tests pass - TestGetUserHeatmapDataByUser validates core logic
Integration tests pass - All database backends (SQLite, PostgreSQL, MySQL, MSSQL, MariaDB)
Feed semantics preserved - One action record per push, no duplicates
Heatmap accuracy - Commits shown on author date, not push date
Performance improved - Simple indexed lookups, no COALESCE in WHERE clause

Migration Strategy

  • v327: Creates action_commit_date table
  • v328: Backfills existing push actions by parsing JSON from action.Content
  • Processes in batches of 100 to avoid memory issues on large instances

Addresses Maintainer Concerns

"A user who pushes 30 days of accumulated commits now generates 30 action records instead of 1"

Fixed - Still generates 1 action record. The 30 commits create 30 entries in action_commit_date (separate table), but feeds only see 1 action.

"The COALESCE pattern in WHERE clauses can't use indexes"

Fixed - No COALESCE in WHERE. Uses commit_timestamp > ? which is indexed, with fallback in separate OR clause for created_unix.

Let me know if you have any questions about the implementation!

fjamesprice and others added 2 commits February 12, 2026 17:06
- Replace encoding/json with modules/json in v328.go
- Modernize interface{} to any in v328.go
- Fix struct field alignment in action_commit_date.go

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Update TestUserHeatmap to expect 2 contributions at 1603227600
  (one from close issue action, one from commit with same timestamp)
- Reformat InsertActionCommitDates function signature for gofumpt compliance
  (multi-line parameter list with trailing comma)
Comment thread models/activities/action.go Outdated
fjamesprice and others added 4 commits February 13, 2026 01:39
Per maintainer feedback: minimize diff by removing column alignment
whitespace changes. Only the OriginalUnix field addition is needed.
Add spacing to align CreatedUnix with other Action struct fields.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Use single tabs instead of alignment spaces as per gofmt standard.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Use tab + spaces for alignment as per upstream format.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@silverwind
Copy link
Copy Markdown
Member

Were Copilot's comments addressed? If so, resolve them.

@fjamesprice
Copy link
Copy Markdown
Author

Yes, all 5 Copilot review comments have been addressed by the auxiliary table rewrite:

  1. Heatmap filtering - Now uses (commit_timestamp > ? OR (commit_timestamp IS NULL AND created_unix > ?)) which respects both timestamps
  2. Test coverage - Added action_commit_date.yml fixture and updated unit/integration tests
  3. Nondeterministic ordering (PushCommits) - No longer splits actions by date, creates single action per push
  4. Code duplication - Both functions now call shared InsertActionCommitDates() helper
  5. Nondeterministic ordering (SyncPushCommits) - Same fix as Use proper url for libravatar dep #3

All Copilot's comments applied to the old split-actions approach that was replaced. I'll resolve those review threads since they're no longer relevant to the current implementation.

@silverwind
Copy link
Copy Markdown
Member

silverwind commented Feb 14, 2026

Note: This review was written by @silverwind using Claude (claude-opus-4-6).

Review

The auxiliary table approach is architecturally sound, but there are several issues that need addressing.

Critical Bug: action.ID after NotifyWatchers is wrong

In services/feed/notifier.go, after NotifyWatchers returns, the code does:

if action.ID > 0 && len(commits.Commits) > 0 {
    // ...
    activities_model.InsertActionCommitDates(ctx, action.ID, commitDates)
}

But NotifyWatchersnotifyWatchers (services/feed/feed.go:37-91) reuses the same *Action struct pointer across multiple db.Insert calls. It inserts for the actor first (setting act.ID), then sets act.ID = 0 and re-inserts for the org owner, then again for each watcher. After NotifyWatchers returns, action.ID is the ID of the last inserted row (the last watcher's action), not the actor's own action.

This means InsertActionCommitDates links commit dates to a watcher's action record instead of the actor's. When the heatmap queries the pusher's contributions (filtering by act_user_id = pusher and user_id = pusher), the LEFT JOIN on action_commit_date finds no rows for the actor's action record, and it falls back to created_unix — completely defeating the purpose of this PR.

This only works correctly when the actor has no other watchers on the repo.

Fix: Either capture the actor's action ID before NotifyWatchers mutates it, or insert commit dates for ALL action IDs created by a push (which means modifying NotifyWatchers to return the list of created IDs), or insert the commit dates inside the NotifyWatchers transaction itself.

Bug: InsertActionCommitDates is outside the transaction

NotifyWatchers wraps all action inserts in db.WithTx. But InsertActionCommitDates is called after NotifyWatchers returns, outside the transaction. If the commit dates insertion fails, you'll have action records without corresponding commit date entries, and no way to know they're missing. The commit dates insertion should happen inside the same transaction.

Issue: OriginalUnix field on Action is vestigial

The OriginalUnix field on the Action struct and the v326 migration to add the original_unix column appear to be leftover from the first iteration. The heatmap query uses action_commit_date.commit_timestamp, not action.original_unix. The field is written to in PushCommits and SyncPushCommits but never read anywhere in the heatmap query path. This is dead code that adds an unnecessary indexed column to every row.

Either remove it and migration v326, or document what it's intended for.

Issue: DeleteOldActions uses backtick-quoted table name in subquery

e.Where("action_id IN (SELECT id FROM `action` WHERE created_unix < ?)", cutoff).Delete(&ActionCommitDate{})

The backtick quoting (`action`) is MySQL-specific. PostgreSQL uses "action", MSSQL uses [action]. While the tests pass on all backends (SQLite and PostgreSQL may tolerate backticks in some contexts), this is fragile. Use xorm's builder or a table name that doesn't need quoting. Actually, "action" doesn't need quoting at all — just remove the backticks.

Issue: Migration v328 backfill doesn't validate timestamps

The backfill migration parses JSON from action.Content and inserts commit.Timestamp.Unix(). If commits have zero-value time.Time (which would marshal as "0001-01-01T00:00:00Z" and produce Unix timestamp -62135596800), these will create action_commit_date rows with nonsensical timestamps. These would then appear in heatmap queries as ancient contributions. The migration should skip commits with zero or negative timestamps.

Issue: Migration v328 could be very slow on large instances

The backfill iterates over all push actions ever created and inserts a commit date row for every commit in each push. On a large Gitea instance with millions of push actions, this could create hundreds of millions of rows and take a very long time. The migration should at minimum:

  • Only backfill actions within the heatmap window (~373 days) since older ones won't be displayed
  • Consider adding a progress indicator or documentation about expected duration

Minor: DeleteIssueActions complexity increase

The cleanup logic in DeleteIssueActions grew significantly (~35 new lines) to handle the auxiliary table. The pattern of querying action IDs then deleting commit dates then deleting actions is repeated twice. Consider a helper function, or use ON DELETE CASCADE foreign key constraints (if xorm supports them) to simplify cleanup.

Minor: Code duplication in PushCommits and SyncPushCommits

The commit dates insertion logic (building the commitDates slice and calling InsertActionCommitDates) is copy-pasted between PushCommits and SyncPushCommits. This should be extracted into a helper.

Minor: Copyright year inconsistency

v326.go has Copyright 2025, while v327.go and v328.go have Copyright 2026. The current year is 2026.

Summary

The auxiliary table approach is the right architecture. The main blocker is the action.ID bug — commit dates are linked to the wrong action record when there are watchers, which means the heatmap won't work correctly for the primary use case (the pusher themselves). The transaction boundary issue compounds this. I'd suggest fixing those two issues, removing the unused OriginalUnix field, and addressing the migration concerns before merging.

Addresses all review feedback from silverwind:

**Critical fixes:**
1. Fix action.ID bug - commit dates now inserted inside notifyWatchers
   transaction after actor insert, before ID gets overwritten by
   org/watcher inserts. Previously linked to wrong action record.

2. Fix transaction boundary - commit dates now inserted within the
   same transaction as the action, not after NotifyWatchers returns.

3. Remove vestigial OriginalUnix field and v326 migration - field
   was never used by heatmap query. Cleaned up Action struct,
   fixture, and renumbered migrations (v327->v326, v328->v327).

**Other improvements:**
4. Remove MySQL-specific backticks from DeleteOldActions subquery
   (action table doesn't need quoting).

5. Migration v327 now only backfills actions within heatmap window
   (373 days) to avoid processing millions of old actions on large
   instances.

6. Migration v327 now skips commits with zero/negative timestamps
   to prevent nonsensical contributions in heatmap.

7. Fix copyright year consistency (2026) in v326.go.

8. Code deduplication - PushCommits and SyncPushCommits now populate
   Action.CommitDates field; insertion logic moved to shared
   notifyWatchers function.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@silverwind
Copy link
Copy Markdown
Member

Migration v327 now only backfills actions within heatmap window
(373 days) to avoid processing millions of old actions on large
instances.

Keep processing all rows, the window might change in the future.

Per silverwind feedback: process all actions in case heatmap window
changes in the future. Migration is a one-time operation, so better
to have complete data.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@fjamesprice
Copy link
Copy Markdown
Author

Good point - removed the cutoff filter. Migration now processes all actions to ensure complete data in case the heatmap window changes in the future (commit f299e1d).

@silverwind
Copy link
Copy Markdown
Member

Note: This review was written by @silverwind using Claude (claude-opus-4-6).

Re-review

All critical issues from my previous review have been addressed. The commit dates insertion is now correctly placed inside notifyWatchers (right after the actor's action insert, before the ID gets overwritten by org/watcher inserts), within the same transaction. The vestigial OriginalUnix field and its migration are removed. The backtick-quoted table name is fixed. Timestamp validation was added to the migration.

Remaining minor items

1. Code duplication in PushCommits and SyncPushCommits

The CommitDates slice construction is copy-pasted between both functions in notifier.go. A small helper would clean this up:

func buildCommitDates(commits []*repository.PushCommit) []struct {
    Sha1      string
    Timestamp timeutil.TimeStamp
} {
    dates := make([]struct {
        Sha1      string
        Timestamp timeutil.TimeStamp
    }, len(commits))
    for i, c := range commits {
        dates[i].Sha1 = c.Sha1
        dates[i].Timestamp = timeutil.TimeStamp(c.Timestamp.Unix())
    }
    return dates
}

2. Zero-timestamp filtering missing in live code path

The migration correctly skips commits with timestamp <= 0, but the live code path in notifier.go doesn't filter them. Commits with zero-value time.Time would produce timestamp -62135596800. These won't appear in the heatmap (the time window filter excludes them), but they'll waste space in action_commit_date. Minor, but worth matching the migration's behavior for consistency.

3. count(user_id)count(*)

The heatmap query changed from count(user_id) to count(*). Since user_id is never NULL, the result is identical. Just noting it for the record — no action needed.

Semantics change worth noting

With this PR, the heatmap now counts individual commits rather than pushes. A push with 5 commits = 5 contributions instead of 1. This matches GitHub's behavior and is the right choice, but it's a visible change for existing users — contribution numbers will increase after the migration backfills the data.

Verdict

The architecture is correct and the critical bugs are resolved. The remaining items are minor cleanup. Looks good to me from a code review perspective, but I'd suggest a maintainer also manually test the heatmap rendering with multi-day pushes before merging.

Extract buildCommitDates() helper to eliminate copy-pasted logic between
PushCommits and SyncPushCommits. Add zero-timestamp filtering in the
live code path to match the migration's behavior, preventing commits
with zero-value time.Time from being stored in action_commit_date.

Also introduces CommitDateEntry named type to replace anonymous struct.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@fjamesprice
Copy link
Copy Markdown
Author

Addressed both minor items in 23f6f62:

  1. Code duplication — Extracted buildCommitDates() helper in notifier.go and introduced a CommitDateEntry named type in action_commit_date.go to replace the anonymous struct. Both PushCommits and SyncPushCommits now use the shared helper.

  2. Zero-timestamp filtering — The helper filters out commits with timestamp <= 0, matching the migration's behavior. Prevents zero-value time.Time commits from wasting space in action_commit_date.

Re count(*) vs count(user_id) — agreed, no change needed since user_id is never NULL.

@silverwind
Copy link
Copy Markdown
Member

Note: This review was written by @silverwind using Claude (claude-opus-4-6).

Re-review (round 3)

All critical issues from my previous reviews have been addressed. The architecture is correct:

  • Commit dates are inserted inside notifyWatchers, right after the actor's action insert and before the ID gets overwritten by org/watcher inserts
  • The insertion is within the db.WithTx transaction
  • The vestigial OriginalUnix field and its migration are removed
  • The backtick-quoted table name is fixed
  • Zero-timestamp filtering is in both the migration and the live code path
  • The buildCommitDates helper eliminates code duplication

Minor remaining items

1. DeleteIssueActions cleanup is unnecessary

The action_commit_date cleanup added to DeleteIssueActions (~35 new lines) handles action types that never have commit date records — ActionCreateIssue, ActionCreatePullRequest, and comment actions. These are not push actions and will never have entries in action_commit_date. The extra queries always return 0 rows. While it's defensive coding, it adds significant complexity to an already complex function. Consider removing it, or at least adding a comment explaining it's defensive.

2. Migration JSON tags are misleading

The migration's PushCommit struct uses lowercase JSON tags:

type PushCommit struct {
    Sha1      string    `json:"sha1"`
    Timestamp time.Time `json:"timestamp"`
}

But the actual PushCommit struct (modules/repository/commits.go:25) has no JSON tags, so the stored JSON keys are CamelCase ("Sha1", "Timestamp"). This works only because Go's json.Unmarshal does case-insensitive matching — but someone reading the migration code would reasonably assume the stored JSON uses lowercase keys. Remove the JSON tags from the migration struct or match the actual CamelCase field names to avoid confusion.

3. Behavioral change should be documented

The heatmap now counts individual commits rather than pushes. A push with 5 commits = 5 contributions instead of 1. The migration backfill retroactively applies this to all existing push actions. This is the right behavior (matches GitHub), but it's a visible change — users will see their contribution numbers increase. Worth noting in the PR description and/or changelog.

Verdict

The critical bugs are resolved, the architecture is sound. The remaining items are cosmetic. Looks good from a code review perspective.

Copy link
Copy Markdown
Contributor

@wxiaoguang wxiaoguang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not keep things simple and just set created_unix to the commit date?

Then all the mgaic tricks can be removed.

Anything worse by setting the commit date to created_unix?

@fjamesprice
Copy link
Copy Markdown
Author

@wxiaoguang Good question — I considered that, but the problem is that a single push action can contain multiple commits with different author dates across different days. For example:

  • A developer works on a branch over Monday, Tuesday, Wednesday (3 commits)
  • They push all 3 commits on Thursday

If we set created_unix to a commit date, we have a few options, all with downsides:

  1. Set created_unix to one commit's date (e.g. the latest) — the other commits' dates are lost from the heatmap. A 3-day contribution streak would appear as 1 day.

  2. Create multiple action rows per push (one per commit) — this would make the activity feed show N entries per push instead of 1, which is a visible behavior change. A push of 50 commits would flood the feed with 50 identical "pushed to branch" entries.

  3. Only count the push date (current behavior before this PR) — this is what the heatmap does today, and it's the problem we're trying to fix. Contributions show up on the day you pushed, not the day you actually wrote the code.

The auxiliary table avoids all three issues: one action row per push (feed stays clean), but all individual commit timestamps are stored for accurate heatmap rendering. This matches how GitHub handles it — they count individual commits on their actual author dates, not the push date.

The "magic" is really just one helper function and one INSERT inside the existing transaction. The table itself is a simple (action_id, sha1, timestamp) mapping.

That said, if you'd prefer a different approach, happy to discuss!

@wxiaoguang
Copy link
Copy Markdown
Contributor

wxiaoguang commented Feb 14, 2026

Is it possible to fully decouple from the action table?

The action can be very large , millions of rows, and there were/are already a lot performance problems on it.

Two choices:

  1. (current) Keep using action, and add a new table for commit dates, make them work together
    • It just make the problem more complicated and difficult to optimize
  2. (IMHO) Introduce a new commit dates table and decouple from actions
    • No need to worry about performance regression, and the logic could be simpler

TBH, I haven't really looked into details, that's just my intuition. If nothing blocks choice (2), I would prefer to go with it.

@fjamesprice
Copy link
Copy Markdown
Author

@wxiaoguang That's a great point — decoupling makes a lot of sense. The action table's scale and existing performance concerns are good reasons to avoid adding more coupling to it.

With choice (2), the table would be something like:

CREATE TABLE user_heatmap (
    id          BIGINT PRIMARY KEY AUTO_INCREMENT,
    user_id     BIGINT NOT NULL,
    repo_id     BIGINT NOT NULL,
    commit_sha  VARCHAR(64),
    timestamp   BIGINT NOT NULL,  -- commit author date

    INDEX idx_user_heatmap_user_timestamp (user_id, timestamp),
    INDEX idx_user_heatmap_repo (repo_id)
);

Benefits:

  • No dependency on action table at all — eliminates the action.ID plumbing, the CommitDates field on Action, and the insertion inside notifyWatchers
  • Simpler query: SELECT DATE(timestamp), COUNT(*) FROM user_heatmap WHERE user_id = ? AND timestamp > ? GROUP BY DATE(timestamp) — no joins needed
  • Independent cleanup — can have its own retention policy without touching action cleanup logic
  • Insert anywhere — can insert directly in the push notifier without needing to be inside the action transaction

I'll rework the PR to this approach. The main changes would be:

  1. Replace action_commit_date with user_heatmap (decoupled schema)
  2. Remove CommitDates from Action struct and notifyWatchers insertion
  3. Insert directly in PushCommits/SyncPushCommits (or a shared helper)
  4. Rewrite heatmap query to read from new table
  5. Update migration to backfill from action content → user_heatmap

Nothing blocks this approach that I can see. Will push the rework shortly.

@wxiaoguang
Copy link
Copy Markdown
Contributor

Or maybe some SQLs can be optimized to avoid use sub-queries.

// _, err = e.Where("action_id IN (SELECT id FROM action WHERE created_unix < ?)", cutoff).Delete(&ActionCommitDate{})

Add the same created_unix to ActionCommitDate table, and delete it without sub-query

@wxiaoguang
Copy link
Copy Markdown
Contributor

wxiaoguang commented Feb 14, 2026

I'll rework the PR to this approach. The main changes would be:

TBH I think we need to think about it carefully before starting changing the code, I am not sure whether it is really feasible. IIRC there are many details with action table.

@fjamesprice
Copy link
Copy Markdown
Author

@wxiaoguang That's fair — I'll hold off on pushing the rework until we're aligned on the design.

The decoupled table I have locally is user_heatmap_commit with (user_id, repo_id, commit_sha1, commit_timestamp) — no FK to action at all. The heatmap query reads directly from it with just a user_id + commit_timestamp filter and a repo visibility check via AccessibleRepoIDsQuery. No joins with action.

A few design questions worth settling first:

  1. Cleanup strategy — With the decoupled table, we don't need subqueries against action for deletion. We could either:

    • Add created_unix (push date) to the table as you suggested, enabling direct time-based cleanup
    • Use commit_timestamp itself for retention (delete commits older than N days)
    • Rely on repo_id for cascade deletion when repos are removed

    Which approach do you prefer?

  2. Scope of heatmap data — Currently the heatmap query uses ActivityQueryCondition which checks user visibility, repo access, team membership, org membership, etc. The decoupled version simplifies to just user_id + AccessibleRepoIDsQuery(doer). Are there edge cases in the action-based visibility logic that we'd lose?

  3. Mirror syncs — Should ActionMirrorSyncPush (op_type 16) also populate the heatmap table? Currently I include it, but the commits are authored by external users — the act_user_id is the repo owner, not the commit author.

Happy to iterate on the design before writing more code.

@wxiaoguang
Copy link
Copy Markdown
Contributor

I can't suggest more at the moment since I haven't really looked into the details, and I don't need this feature.

I just occasionally review some PRs to make sure the design is overall right and won't cause maintainability/performance/regression problems.

@fjamesprice
Copy link
Copy Markdown
Author

@wxiaoguang Understood, thanks for the high-level guidance — it's been very helpful.

I'll proceed with the decoupled approach and make sensible defaults for the open questions:

  1. Cleanup — The table uses commit_timestamp directly, so time-based retention is straightforward without subqueries. Repo deletion can cascade via repo_id.
  2. Visibility — Using AccessibleRepoIDsQuery(doer) for repo-level access control, which covers private/public repo visibility.
  3. Mirror syncs — Including them for now (consistent with current heatmap behavior), can be revisited.

Pushing the rework now.

fjamesprice and others added 3 commits February 14, 2026 05:24
Replace action-coupled `action_commit_date` table with fully decoupled
`user_heatmap_commit` table keyed by (user_id, repo_id) instead of
action_id. This addresses maintainer feedback about the action table's
scale (millions of rows) and existing performance concerns.

Changes:
- New `UserHeatmapCommit` model with user_id, repo_id, commit_sha1,
  commit_timestamp — no foreign key to action table
- Heatmap query reads directly from user_heatmap_commit with
  AccessibleRepoIDsQuery for visibility — no JOIN with action
- Commit timestamps inserted directly in PushCommits/SyncPushCommits
  notifiers, outside the action transaction
- Remove CommitDates field from Action struct, remove insertion from
  notifyWatchers, remove ActionCommitDate cleanup from DeleteOldActions
  and DeleteIssueActions
- Update migrations and fixtures for new table schema
- Zero-timestamp filtering in both live path and migration backfill

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove trailing blank line in Action struct (gofmt)
- Add complete test fixture data for user_heatmap_commit:
  user 2 (3 commits across 2 buckets), user 16 (1 commit for
  collaborator test), user 10 (3 commits across 2 buckets)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@lafriks
Copy link
Copy Markdown
Member

lafriks commented Feb 14, 2026

This has been talked about before but there are many nuances about this and why imho also GitHub uses push dates to calculate this as there are force pushes or wrong merges that could result in a great number of commits that would need to be analysed. Also one can push merge commit that other users have actually committed etc etc

Personally I think the current approach is the best and it's not that useful feature to add such complexity to the code and go into this rabbit hole

@wxiaoguang
Copy link
Copy Markdown
Contributor

This has been talked about before but there are many nuances about this and why imho also GitHub uses push dates to calculate this as there are force pushes or wrong merges that could result in a great number of commits that would need to be analysed. Also one can push merge commit that other users have actually committed etc etc

Personally I think the current approach is the best and it's not that useful feature to add such complexity to the code and go into this rabbit hole

I agree. Since there were already conclusions, it's better to comment & document the design, then it doesn't need to waste time on the unaccepted attempts.

@silverwind
Copy link
Copy Markdown
Member

silverwind commented Feb 14, 2026

In my instance, I have a number of repos where commits are replicated from one source repo to many others via unsquashed merges like this:

git fetch --no-tags --no-prune upstream
git merge --allow-unrelated-histories upstream/master

Do I understand correctly that such "replayed" commits would then create heatmap data? That would be very bad imho, because then I would have thousands of daily commits because of all the replayed commits.

@fjamesprice
Copy link
Copy Markdown
Author

Thanks for the thoughtful feedback from @lafriks, @wxiaoguang, and @silverwind.

The concerns are valid — there are real edge cases that make commit-date heatmaps problematic:

  • Force pushes can replay large numbers of old commits
  • Merge commits attribute others' work to the pusher
  • Replicated repos (like silverwind's unsquashed merge workflow) would massively inflate contribution counts
  • GitHub uses push dates deliberately for these same reasons

The complexity-to-benefit ratio isn't there. Closing this PR.

It would be worth documenting this as a deliberate design choice on #14051 so future contributors don't go down the same path.

@wxiaoguang
Copy link
Copy Markdown
Contributor

I agree. Since there were already conclusions, it's better to comment & document the design, then it doesn't need to waste time on the unaccepted attempts.

Add comment for the design of "user activity time" #37195

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

lgtm/need 2 This PR needs two approvals by maintainers to be considered for merging.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Heatmap shows commits on push date instead of commit author date

6 participants