aggregator-refactor — Phase 5a

Task: t-projax-5a-aggregator
Status: in progress
Date: 2026-05-21

Why

Five collectXxx functions in web/ all open-code the same fan-out shape: take []*store.Item → look up each item's item_links of a given ref_type → fan out across the matching links with a 4-worker pool → reduce to typed rows.

Concretely today:

  • web/dashboard.go:447 collectTasks (CalDAV VTODOs)
  • web/dashboard.go:585 collectIssues (Gitea issues)
  • web/dashboard.go:832 collectEvents (CalDAV VEVENTs)
  • web/timeline.go:539 collectTimelineTodos (CalDAV VTODOs — overlaps collectTasks)
  • web/timeline.go:624 collectTimelineEvents (CalDAV VEVENTs — overlaps collectEvents)

Plus mcp/tools.go:19 declares TimelineBuilder so the MCP layer can call *web.Server.BuildTimelinePayloadFromArgs. That points the dependency arrow mcp → web, which is the wrong way round; both should depend on a shared aggregator.

What ships

A new package internal/aggregate/ that concentrates fan-out across linked items, plus the lifted day-grouping helpers from web/timeline.go. After the four slices land:

  • Dashboard's three collect functions are 5 to 10 line shims over the aggregator.
  • Timeline's two collect functions are 5 to 10 line shims plus a call to aggregate.BuildTimelineDays.
  • mcp.TimelineBuilder is gone; RegisterProjaxTools takes a *aggregate.Aggregator directly.
  • Worker-pool body, link-fanout, per-source error logging, day grouping, sort-within-day, sticky-pill logic, far-future fade all live in one package instead of being duplicated across three.

No behaviour change. All existing tests stay green at every slice boundary.

Design (settled in the task brief)

Package layout

internal/aggregate/
  aggregator.go      — Aggregator struct, constructor, the five methods + All
  rows.go            — Row types + TimelineRow sum + Result envelope
  timeline_days.go   — BuildTimelineDays + sort/label/duration helpers
  aggregator_test.go — fan-out + per-source error tests (stub clients)
  timeline_days_test.go — grouping/sort/sticky/fade tests

Dependencies (interfaces, not concrete clients)

type CalDAVClient interface {
    ListTodos(ctx, calendarURL) ([]caldav.Todo, error)
    ListEvents(ctx, calendarURL, opts) ([]caldav.Event, error)
}

type GiteaClient interface {
    ListIssues(ctx, owner, repo, opts) ([]gitea.Issue, error)
}

type LinkLister interface {
    LinksByType(ctx, itemID, refType) ([]*store.ItemLink, error)
    DatedLinksRange(ctx, from, to) ([]*store.ItemLinkWithItem, error)
    ItemsCreatedInRange(ctx, from, to) ([]*store.Item, error)
}

type IssueCache interface {
    Get(key) ([]gitea.Issue, bool)
    Set(key, []gitea.Issue)
}

*caldav.Client, *gitea.Client, and *store.Store already satisfy these interfaces by method set. The existing web.issueCache gains exported Get/Set aliases for its existing lower-case methods so that it satisfies IssueCache; nothing else about it changes.
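A minimal sketch of the alias approach, assuming a string cache key and a mutex-guarded map; the issueCache and Issue types here are hypothetical stand-ins, not the real web.issueCache or gitea.Issue. The compile-time assertion is the part worth copying: it fails the build if the method set ever drifts from the interface.

```go
package main

import (
	"fmt"
	"sync"
)

// Issue stands in for gitea.Issue (hypothetical, trimmed to one field).
type Issue struct{ Title string }

// IssueCache mirrors the interface above; the string key type is an assumption.
type IssueCache interface {
	Get(key string) ([]Issue, bool)
	Set(key string, issues []Issue)
}

// issueCache is a hypothetical stand-in for web.issueCache.
type issueCache struct {
	mu sync.Mutex
	m  map[string][]Issue
}

// get and set play the role of the existing lower-case methods.
func (c *issueCache) get(key string) ([]Issue, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	v, ok := c.m[key]
	return v, ok
}

func (c *issueCache) set(key string, v []Issue) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.m == nil {
		c.m = make(map[string][]Issue)
	}
	c.m[key] = v
}

// Get and Set are the exported aliases; they only delegate.
func (c *issueCache) Get(key string) ([]Issue, bool) { return c.get(key) }
func (c *issueCache) Set(key string, v []Issue)      { c.set(key, v) }

// Compile-time check that *issueCache satisfies IssueCache.
var _ IssueCache = (*issueCache)(nil)

func main() {
	var c IssueCache = &issueCache{}
	c.Set("owner/repo", []Issue{{Title: "fix fan-out"}})
	got, ok := c.Get("owner/repo")
	fmt.Println(len(got), ok)
}
```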

Methods

  • Todos(ctx, items, Window) []TodoRow — empty Window = no narrowing (dashboard pattern); non-zero Window narrows by Due for open todos and LastModified (Due fallback) for done/cancelled (timeline pattern).
  • Events(ctx, items, Window) []EventRow — Window required (CalDAV REPORT needs a time-range filter).
  • Issues(ctx, items) []IssueRow — no window; upstream updated_at ordering carries the recency signal.
  • Docs(ctx, items, Window) []DocRow — wraps DatedLinksRange, filters to items in the caller's allow-set.
  • Creations(ctx, items, Window) []CreationRow — wraps ItemsCreatedInRange, filters to items in the allow-set.
  • All(ctx, items, AllOpts) Result — convenience for MCP timeline.

Row types

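The three upstream-backed row types (TodoRow, EventRow, IssueRow) embed their primitive (caldav.Todo, caldav.Event, gitea.Issue) so html/template's existing .Todo.UID / .Event.Summary field access keeps working via Go field promotion. DocRow and CreationRow wrap store types only and embed nothing. Template diffs in Slice B/C stay minimal.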

type TodoRow struct {
    Item        *store.Item
    CalendarURL string
    caldav.Todo
}

type EventRow struct {
    Item *store.Item
    caldav.Event
}

type IssueRow struct {
    Item *store.Item
    Repo string
    gitea.Issue
}

type DocRow      struct { Item *store.Item; Link *store.ItemLink }
type CreationRow struct { Item *store.Item }

TimelineRow

Pointer-tagged sum type lifted into the package. Templates and the sort/group helpers consume it.

type TimelineRow struct {
    Date     time.Time
    Kind     string // "todo" | "event" | "doc" | "creation"
    Item     *store.Item
    ItemPath string

    Todo     *TodoRow
    Event    *EventRow
    Doc      *DocRow
    Creation *CreationRow

    // Display-side fields the template references directly. Kept flat so
    // the existing template syntax doesn't change.
    CalendarURL  string
    StartLabel   string
    DurationHint string
    Link         *store.ItemLink
    PER          string

    FarFuture bool
}

Day grouping

aggregate.BuildTimelineDays(rows []TimelineRow, opts BuildOpts) []TimelineDay takes pre-built rows, groups by YYYY-MM-DD, sorts each day's rows (timed events → all-day → todos → docs → creations), and applies sticky-pill markers for today/tomorrow. BuildOpts carries Now, Order ("asc"|"desc"), optional TodayKey/TomorrowKey overrides for test determinism.

Per-source error handling

Preserved from today: each per-calendar / per-repo failure is logged at WARN and the affected job is dropped. The remaining rows are returned. Banner-surfacing for unreachable upstreams is out of scope for this refactor (parked under §"Future work" below).

Worker pool

Per-call pool with 4 workers — same as today across all five functions. Created and torn down per aggregation call. No shared instance.

Slicing

| Slice | What lands | Verification |
|---|---|---|
| A | docs/plans/aggregator-refactor.md (this file) + internal/aggregate/ package + unit tests. No web/mcp wiring yet. | go build ./... + go test ./internal/aggregate/... + strings <binary> \| grep -c internal/aggregate ≥ 1 |
| B | web/timeline.go consumes the aggregator. Server.aggregator wired in web.New. Templates updated where the row type changes. | go test ./web/... -run Timeline green unmodified, /timeline renders, SHA on /healthz matches push. |
| C | web/dashboard.go consumes the aggregator. Dashboard-specific bucketing/cap stays put. | go test ./web/... -run Dashboard green unmodified, /dashboard renders. |
| D | mcp.TimelineBuilder deleted. RegisterProjaxTools takes *aggregate.Aggregator. cmd/projax/main.go updated. BuildTimelinePayloadFromArgs removed or inlined. | go test ./mcp/... -run Timeline green, live /mcp/rpc timeline returns valid payload. |

Each slice ships as its own commit, merge, deploy, and verification cycle (per CLAUDE.md: SHA on /healthz matches git rev-parse HEAD).

Test plan

Unit tests in internal/aggregate/ use in-memory stub implementations of CalDAVClient, GiteaClient, LinkLister, IssueCache. The tests cover:

  • Empty items slice — every method returns an empty slice without touching the network stubs.
  • Items with no links of the relevant ref_type — same.
  • Items with one matching link — single fetch.
  • Items with multiple matching links across multiple items — fan-out hits each (verified by stub call counter).
  • Per-calendar error from the CalDAV stub — logged, surviving rows returned.
  • Per-repo error from the Gitea stub — same.
  • Issue cache hit path — second call doesn't hit the stub when the cache returns a value.
  • BuildTimelineDays ordering — desc default; asc when requested; day group counts; sticky pill for today/tomorrow; far-future fade.
  • BuildTimelineDays within-day sort — timed events before all-day, todos after events, docs after todos, creations last; ties broken by Summary / PER / Item.Slug.

Integration coverage stays in web/... and mcp/... and continues to exercise the real wiring through the Slice B/C/D ports.

MCP filter-parity note (post-slice-D, 2026-05-22)

Slice D moved MCP item resolution from web.TreeFilter to store.ListByFilters. The dimensions that round-trip identically:

  • tags — AND-match, unchanged.
  • q — substring match, unchanged.
  • kinds — unchanged (drives aggregate.AllOpts.Kinds).
  • from/to/order — unchanged.
  • has — explicit in-memory narrow against store.LinksByRefType (caldav-list / gitea-repo only).
  • include_excluded — explicit in-memory filter against each item's timeline_exclude array.

Narrowed dimensions in the MCP path (vs. web TreeFilter):

  • status — first value wins (single-value at the store layer). TreeFilter accepts multiple. Use case is rare — most calls default to ["active"].
  • mgmt — AND-match (item must carry every named mode). TreeFilter used OR semantics including a synthetic "unmanaged" matcher. Reachable workaround: omit mgmt and filter the returned items client-side.
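The mgmt difference, and the client-side workaround, can be made concrete with two small matchers. This is a sketch only: the Item shape and the mode names are hypothetical, and treating "unmanaged" as "item carries no modes" is an assumption about what TreeFilter's synthetic matcher means.

```go
package main

import "fmt"

// Item is a reduced stand-in for store.Item; Mgmt holds its management modes.
type Item struct {
	Slug string
	Mgmt []string
}

func contains(xs []string, x string) bool {
	for _, v := range xs {
		if v == x {
			return true
		}
	}
	return false
}

// matchAll is the AND semantics of the MCP path: every named mode must be present.
func matchAll(it Item, want []string) bool {
	for _, w := range want {
		if !contains(it.Mgmt, w) {
			return false
		}
	}
	return true
}

// matchAny is the OR semantics described for TreeFilter. "unmanaged" is assumed
// to match items that carry no modes at all.
func matchAny(it Item, want []string) bool {
	for _, w := range want {
		if w == "unmanaged" && len(it.Mgmt) == 0 {
			return true
		}
		if contains(it.Mgmt, w) {
			return true
		}
	}
	return false
}

// filterAny is the client-side workaround: call MCP without mgmt, then narrow.
func filterAny(items []Item, want []string) []Item {
	var out []Item
	for _, it := range items {
		if matchAny(it, want) {
			out = append(out, it)
		}
	}
	return out
}

func main() {
	items := []Item{
		{Slug: "one", Mgmt: []string{"modeA"}},
		{Slug: "two", Mgmt: []string{"modeA", "modeB"}},
		{Slug: "three"},
	}
	want := []string{"modeA", "modeB"}
	for _, it := range items {
		fmt.Printf("%-5s and=%v or=%v\n", it.Slug, matchAll(it, want), matchAny(it, want))
	}
	fmt.Println(len(filterAny(items, []string{"modeB", "unmanaged"})))
}
```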

Not a regression worth fixing in 5a — every documented MCP call from m and from otto-PWA uses tag + default status. If the gap bites, the fix is to either teach store.ListByFilters to accept multi-value status / OR-mgmt, or to lift TreeFilter into a neutral package and call it from both web/ and mcp/.

Future work (out of scope for 5a)

  • Banner-surfacing for upstream failures (calendar unreachable, repo renamed) — today's silent log+continue stays. Filing as a §8 design follow-up.
  • Shared worker-pool instance across calls — not warranted at m's scale; per-call pool is fine.
  • Dashboard cache shape refactor — that's Phase 5b (candidate 3).
  • Item-write validation module — Phase 5c (candidate 2).

References

  • Task t-projax-5a-aggregator
  • Existing collect functions: web/dashboard.go:447,585,832, web/timeline.go:539,624
  • Wrong-way layering: mcp/tools.go:19 (TimelineBuilder)
  • CLAUDE.md § "Post-deploy verification (mandatory)"