Skip to content

feat: visual polish, Colab demo, CI workflow, community templates, and PCACalibrator implementation#5

Merged
OnlyTerp merged 3 commits intomasterfrom
devin/1776399930-visual-polish-colab-ci
Apr 17, 2026
Merged

feat: visual polish, Colab demo, CI workflow, community templates, and PCACalibrator implementation#5
OnlyTerp merged 3 commits intomasterfrom
devin/1776399930-visual-polish-colab-ci

Conversation

@OnlyTerp
Copy link
Copy Markdown
Owner

@OnlyTerp OnlyTerp commented Apr 17, 2026

What does this PR do?

Adds visual assets, community infrastructure, and CI to make the repo more professional and approachable. Also implements the missing PCACalibrator class and fixes a validation gap in apply_rope, both of which were needed to get CI passing.

Related Issues

Follows up on #4 (README overhaul) — this PR adds the visual/interactive layer on top of that content.

Changes

  • Hero banner (assets/banner.svg) — dark-themed SVG with pipeline visualization and key stats (6-9x, 0.996 cosine, 2M+ context)
  • 4 benchmark charts (assets/*.png) — compression vs quality scatter, context window bar chart, prefill throughput, pipeline overview
  • Colab demo notebook (notebooks/kvtc_demo.ipynb) — interactive walkthrough of all 3 pipeline stages with matplotlib visualizations
  • GitHub Actions CI (.github/workflows/test.yml) — runs pytest src/test_kvtc.py on Python 3.10/3.11/3.12 matrix with CPU-only torch
  • Issue/PR templates (.github/ISSUE_TEMPLATE/, .github/pull_request_template.md)
  • README updates:
    • Replaced ASCII pipeline diagrams with Mermaid (graph LR, flowchart TB, quadrantChart)
    • Embedded benchmark chart images in new "Visual Benchmarks" section
    • Added CI badge and "Open in Colab" badge to header
    • Added "Interactive Demo" section linking to the Colab notebook
  • PCACalibrator class (src/pca.py) — implements the missing calibrator that calibrate.py, calibrate_vllm.py, and test_kvtc.py all import but was never defined. Collects KV samples, computes PCA bases via SVD, and runs DP bit allocation.
  • apply_rope validation (src/pca.py) — added ValueError for odd head_dim (RoPE requires even dimensions)

Updates since last revision

CI initially failed for two reasons:

  1. ImportError: cannot import name 'PCACalibrator' from 'src.pca' — The class was imported across multiple files but never defined. Implemented it with collect() / compute() methods matching all existing call sites.
  2. test_single_token_edge_case crash — When SVD returns fewer components than dimensions (rank-deficient case, e.g. single token), vh shape is [k, dim] with k < dim. Fixed by zero-padding eigenvectors to [dim, dim].
  3. test_rope_requires_even_head_dim crash — Test expected ValueError for odd head_dim=7 but got RuntimeError from shape mismatch. Added explicit validation at the top of apply_rope.

All 38 tests now pass on Python 3.10, 3.11, and 3.12.

Important review items

  1. ⚠️ PCACalibrator eigenvector convention — SVD returns vh with rows as eigenvectors. pca_transform(data, eigvecs) computes data @ eigvecs.T. The stored eigenvectors field uses rows-as-eigenvectors (i.e., vh directly). Verify this matches what pipeline.py expects at lines 67 and 146.
  2. ⚠️ Rank-deficient SVD padding — When k < dim, zero rows are appended to the eigenvector matrix. These zero-eigenvalue components get 0 bits from DP allocation, so they should be harmless — but worth verifying that compute_quant_params and the decompression path handle the zero rows correctly.
  3. Colab notebook import paths — The notebook does pip install git+... then imports from src.pca import .... Verify these imports resolve correctly after pip install (depends on how setup.py exposes packages).
  4. CI workflow — Uses CPU-only torch (--index-url .../whl/cpu). Confirm that the test suite runs without GPU.
  5. Mermaid quadrantChart — This is a newer Mermaid diagram type. Check that GitHub renders it correctly in the Landscape section.
  6. Banner SVG — Uses system-ui and monospace fonts. Visual check recommended on GitHub's dark and light themes.

Testing

  • Unit tests pass (pytest src/test_kvtc.py -v) — 38 pass, 0 fail
  • CI workflow triggers and passes on push (Python 3.10, 3.11, 3.12)
  • Colab notebook runs end-to-end (install → import → visualizations)
  • Mermaid diagrams render correctly on GitHub
  • Banner SVG displays properly in both dark/light mode
  • All 4 chart PNGs display in README

Checklist

  • Code follows existing style conventions
  • Documentation updated
  • No new dependencies added without justification

Link to Devin session: https://app.devin.ai/sessions/e367c15ff93343faa5e821eb3babf465
Requested by: @OnlyTerp


Open with Devin

…ates

- Hero banner SVG with pipeline visualization
- 4 benchmark charts (compression vs quality, context window, throughput, pipeline)
- Interactive Colab demo notebook (3-stage pipeline walkthrough)
- GitHub Actions CI (Python 3.10/3.11/3.12 matrix)
- Issue templates (bug report, feature request) and PR template
- README: Mermaid diagrams, embedded charts, CI + Colab badges
- README: Interactive Demo section with Colab link

Co-Authored-By: Rob <onerobby@gmail.com>
@devin-ai-integration
Copy link
Copy Markdown
Contributor

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring

Copy link
Copy Markdown
Contributor

@devin-ai-integration devin-ai-integration bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 3 potential issues.

View 3 additional findings in Devin Review.

Open in Devin Review

Comment thread notebooks/kvtc_demo.ipynb
Comment on lines +92 to +93
"from src.pca import compute_pca_basis, pca_transform, pca_inverse\n",
"from src.quantize import dp_allocate_bits, quantize_uniform, dequantize_uniform\n",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Notebook imports non-existent function names, causing ImportError

The notebook imports dp_allocate_bits, quantize_uniform, dequantize_uniform from src.quantize and compute_pca_basis from src.pca, but none of these names exist in the actual modules. The real names in src/quantize.py are dp_bit_allocation, uniform_quantize, uniform_dequantize, and compute_pca_basis does not exist at all in src/pca.py. This causes an ImportError that prevents all subsequent notebook cells (Stage 1, Stage 2, Stage 3) from running.

Actual function names in src/quantize.py
  • dp_bit_allocation (not dp_allocate_bits)
  • uniform_quantize (not quantize_uniform)
  • uniform_dequantize (not dequantize_uniform)

compute_pca_basis doesn't exist anywhere in the codebase.

Suggested change
"from src.pca import compute_pca_basis, pca_transform, pca_inverse\n",
"from src.quantize import dp_allocate_bits, quantize_uniform, dequantize_uniform\n",
"from src.pca import pca_transform, pca_inverse\n",
"from src.quantize import dp_bit_allocation as dp_allocate_bits, uniform_quantize, uniform_dequantize\n",
Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

Comment thread notebooks/kvtc_demo.ipynb
Comment on lines +194 to +195
" ix, s, z = quantize_uniform(col, int(bits[j].item()))\n",
" recon_pca[:, j] = dequantize_uniform(ix, s, z, int(bits[j].item()))\n",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Notebook calls quantize/dequantize with wrong signatures and expects wrong return types

Even if the import names were fixed, the notebook calls quantize_uniform(col, int(bits[j].item())) expecting it to return a tuple (ix, s, z), and then calls dequantize_uniform(ix, s, z, int(bits[j].item())). However, the actual uniform_quantize at src/quantize.py:59 takes 4 arguments (values, n_bits, scale, zero_point) and returns a single tensor of indices. The notebook's Stage 3 compression roundtrip would crash with a TypeError even after fixing the import names. The code needs to use compute_quant_params (src/quantize.py:77) to get scale/zero_point, then pass them to uniform_quantize and uniform_dequantize.

Prompt for agents
The notebook's Stage 3 roundtrip loop calls the quantize/dequantize functions with the wrong API. In src/quantize.py, uniform_quantize(values, n_bits, scale, zero_point) takes 4 args and returns a tensor. uniform_dequantize(indices, n_bits, scale, zero_point) also takes 4 args. The notebook needs to compute scale and zero_point first (either manually from min/max of each column, or by using compute_quant_params from src/quantize.py). The loop at lines 191-195 should be rewritten to: (1) compute min/max for each active column, (2) derive scale = (max-min) / (2^bits - 1) and zero_point = -min/scale, (3) call uniform_quantize(col, bits, scale, zero_point), (4) call uniform_dequantize(indices, bits, scale, zero_point).
Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

Comment thread notebooks/kvtc_demo.ipynb
"device = 'cuda' if torch.cuda.is_available() else 'cpu'\n",
"if device == 'cuda':\n",
" gpu_name = torch.cuda.get_device_name(0)\n",
" gpu_mem = torch.cuda.get_device_properties(0).total_mem / 1e9\n",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Notebook uses non-existent total_mem attribute instead of total_memory

The notebook accesses torch.cuda.get_device_properties(0).total_mem, but the correct PyTorch attribute is total_memory. This raises an AttributeError on GPU environments (including Colab T4 where this notebook is designed to run). Every other file in the repo correctly uses getattr(props, 'total_memory', None) or getattr(props, 'total_mem', 0) as a safe fallback (e.g., benchmarks/benchmark_v3.py:46), but the notebook doesn't follow this pattern.

Suggested change
" gpu_mem = torch.cuda.get_device_properties(0).total_mem / 1e9\n",
" gpu_mem = torch.cuda.get_device_properties(0).total_memory / 1e9\n",
Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

devin-ai-integration bot and others added 2 commits April 17, 2026 04:39
PCACalibrator was referenced by calibrate.py, calibrate_vllm.py, and
test_kvtc.py but was never defined. This caused an ImportError during
test collection in CI.

The class collects KV cache samples, computes PCA bases via SVD, and
runs DP bit allocation — matching all existing call sites.

Co-Authored-By: Rob <onerobby@gmail.com>
…_dim

- When num_samples < dim, SVD returns fewer components than dimensions.
  Pad eigenvectors to [dim, dim] so pca_transform works correctly.
- Add ValueError for odd head_dim in apply_rope (RoPE requires even dims).

Co-Authored-By: Rob <onerobby@gmail.com>
Copy link
Copy Markdown
Contributor

@devin-ai-integration devin-ai-integration bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 1 new potential issue.

View 6 additional findings in Devin Review.

Open in Devin Review

Comment thread src/pca.py
Comment on lines +277 to +278
group_start = group_idx * self.head_group_size
head_indices = list(range(group_start, group_start + self.head_group_size))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 head_indices includes out-of-bounds head indices for last group when heads aren't divisible by group size

In PCACalibrator.compute(), head_indices is always built with exactly self.head_group_size entries (src/pca.py:278), but collect() correctly clips the last group to the actual number of heads (src/pca.py:212: group_end = min(group_start + self.head_group_size, heads)). When the number of heads isn't evenly divisible by head_group_size (e.g., 5 heads with group_size=2), the last group's head_indices will be [4, 5] when only head indices [0..4] exist. This produces incorrect metadata in PCAEntry.head_indices. Currently no pipeline code reads this field, but any future consumer (or the vLLM integration) relying on it would get wrong results.

Prompt for agents
In PCACalibrator.compute() at src/pca.py:277-278, head_indices is computed as range(group_start, group_start + self.head_group_size), but this doesn't clip to the actual head count for the last group. The collect() method at line 212 correctly uses min(group_start + self.head_group_size, heads) but the compute() method has no access to the original head count.

To fix this, either:
1. Store the actual group head count per key during collect() (e.g., in a separate dict _group_heads mapping CalibrationKey to int), then use it in compute() to build head_indices correctly.
2. Alternatively, infer the actual group size from the collected data dimensions — since each sample has shape [seq_len * actual_group_heads, dim], and we know the seq_len from positions, we can compute actual_group_heads.

The simplest approach is option 1: add a _group_heads dict, set it in collect(), and use it in compute() to build the correct head_indices range.
Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

@devin-ai-integration devin-ai-integration bot changed the title feat: visual polish, Colab demo, CI workflow, and community templates feat: visual polish, Colab demo, CI workflow, community templates, and PCACalibrator implementation Apr 17, 2026
@OnlyTerp OnlyTerp merged commit 79d2906 into master Apr 17, 2026
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant