GitHub Connector

Two-way sync between a GitHub installation and the Fabric graph — PRs, issues, comments, reviews, and push events. Posts an auto-managed PR-summary comment and answers /fabric slash commands using the fabric-indexer code knowledge graph.

What it does

  • Ingests pull requests, issues, issue/PR-review comments, review state, and pushes via GitHub App webhooks.
  • Auto-comments on every opened/reopened/synchronized PR with a code-intel summary anchored by the sentinel <!-- cognisos-fabric:pr-summary -->. Subsequent commits edit that comment in place rather than posting duplicates.
  • Answers slash commands posted as PR/issue comments — /fabric explain <symbol>, /fabric impact <symbol>, /fabric slice <path>, /fabric query <natural language>, /fabric help.
  • Backfills historical PR diffs page-by-page on first install. The cursor is bounded under PG's 100 KB row ceiling and resumable across worker restarts.
  • Emits back issue/PR comments, review comments, labels, and commit statuses via Connector::emit — every outbound write is recorded in github_emitted_events so GitHub's redelivery can be deduped.
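
The edit-in-place behaviour of the PR-summary comment can be pictured as a small decision step: look for a comment carrying the sentinel, then edit it if found or post a new one otherwise. The sketch below is illustrative only; `find_summary_comment_id` and `plan_summary_write` are hypothetical names, and the comment dicts are assumed to carry `id` and `body` keys as in GitHub's issue-comment payloads.

```python
from typing import Optional

PR_SUMMARY_SENTINEL = "<!-- cognisos-fabric:pr-summary -->"


def find_summary_comment_id(comments: list) -> Optional[int]:
    """Return the id of the first comment whose body carries the sentinel."""
    for comment in comments:
        if PR_SUMMARY_SENTINEL in (comment.get("body") or ""):
            return comment["id"]
    return None


def plan_summary_write(comments: list, summary_text: str) -> dict:
    """Decide between editing the existing summary and posting a new one."""
    body = f"{PR_SUMMARY_SENTINEL}\n{summary_text}"
    existing_id = find_summary_comment_id(comments)
    if existing_id is not None:
        # A summary already exists: edit it in place instead of duplicating.
        return {"method": "PATCH", "comment_id": existing_id, "body": body}
    return {"method": "POST", "comment_id": None, "body": body}
```

Because the sentinel is an HTML comment, it stays invisible in the rendered comment while remaining searchable in the raw body.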

Unlike Slack, one install is bound to a single (installation_id, tenant_id) pair: a single GitHub org can be installed into multiple Fabric tenants without conflict because GitHub issues a distinct installation per (App, account) pair.

Prerequisites

  • GitHub org/user admin who can install Apps on the target org.
  • Public-internet-reachable deployment URL for GitHub's webhook POSTs. v1 deliberately does not ship a public install callback — operators run the admin endpoint below after the GitHub App install lands.
  • FABRIC_MASTER_KEY set on the daemon — the installation token is encrypted at rest with this key.
  • FABRIC_ADMIN_ENDPOINTS=1 and FABRIC_ADMIN_API_TOKEN=<random secret> set on the daemon — required by the admin install endpoint that provisions each tenant's install row (see Provisioning the install row below). Without these the route returns 404 / 500 respectively.
  • For the indexer-driven PR-summary / slash-command paths: one fabric-mcp-server Railway service per (install_id, repo_id), reached over HTTP JSON-RPC. The connector resolves daemon URLs by substituting {install} / {repo} into FABRIC_INDEXER_DAEMON_URL_TEMPLATE.

Create the GitHub App

  1. Settings → Developer settings → GitHub Apps → New GitHub App (or use the manifest-flow at https://docs.github.com/en/apps/sharing-github-apps/registering-a-github-app-from-a-manifest).
  2. Paste the contents of github-app-manifest.json — replace your-host.example.com with your deployment hostname.
  3. After creation, GitHub gives you three values to wire into the daemon:
    • App ID → GITHUB_APP_ID
    • Private key (download the .pem) → GITHUB_APP_PRIVATE_KEY_PEM (the full PEM body, including -----BEGIN/-----END lines)
    • Webhook secret (you set this) → GITHUB_WEBHOOK_SECRET
  4. Install the App: visit https://github.com/apps/<your-app-slug>/installations/new and pick the org + repos.

Provisioning the install row (v1)

v1 of the connector does not ship a public install-callback HTTP route, and the manifest above deliberately does not advertise redirect_url/callback_urls. Instead, an authenticated admin endpoint provisions the install. The endpoint is the only correct way to seed an install in v1 — raw SQL cannot work because connector_credentials.credentials_enc is encrypted per-tenant off FABRIC_MASTER_KEY, so the resolver fails to decrypt anything written outside the runtime crypto path. (Without a credentials row, every webhook + worker call later fails to mint an installation token.)

For UI integrators: the GitHub manifest exposes its install endpoint via install_path in the /api/v1/connectors listing. Clients walking that listing should prefer install_path (when present) over the auth_mode-implied default (/oauth/start for OAuth2). For github the value is /api/v1/connectors/github/admin/install; the framework also refuses /oauth/start for connectors that declare a custom install path, naming the correct endpoint in the error body.

After installing the App on GitHub (step 4 above), record the installation id, account login, account type, and repository selection from the GitHub UI, then call:

curl -X POST "$FABRIC_API/api/v1/connectors/github/admin/install" \
  -H "Authorization: Bearer $FABRIC_API_KEY" \
  -H "X-Fabric-Admin-Token: $FABRIC_ADMIN_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "user_id": "<user_profiles.id UUID>",
    "github_installation_id": 12345678,
    "account_type": "Organization",
    "account_login": "acme",
    "repository_selection": "all"
  }'

The endpoint:

  • Validates the App can mint an installation token for the supplied id (proves the install actually belongs to this App, not a sibling deployment).
  • Writes connector_installs, github_app_installations, and connector_credentials in one transaction — credentials are routed through CredentialStore::finalize_install_in_tx so the AEAD encryption matches what the resolver expects.
  • Returns the newly-minted install_id.

The route requires two factors before any tenant work happens:

  • FABRIC_ADMIN_ENDPOINTS=1 on the deployment — when unset, the route returns 404 (endpoint invisible).
  • A deploy-admin shared secret in X-Fabric-Admin-Token, constant-time compared against the FABRIC_ADMIN_API_TOKEN env var. The tenant API key still authenticates the caller (and is the only thing that binds the install to a specific tenant_id), but the admin token gates who can call the endpoint at all — without it, a regular tenant API key cannot pre-claim a GitHub installation id meant for a different tenant. Missing/wrong header → 401; FABRIC_ADMIN_API_TOKEN unset while the endpoint is enabled → 500 with an explicit "admin auth not configured" message so the misconfig is loud.
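
As a rough illustration of the two-factor gate, the following sketch maps the environment and the X-Fabric-Admin-Token header to the status codes described above. It is not the daemon's handler: `admin_gate_status` is a hypothetical name, 200 stands in for "proceed to tenant authentication", and the order in which the 500 and 401 checks run is an assumption.

```python
import hmac
from typing import Mapping, Optional


def admin_gate_status(env: Mapping[str, str], admin_header: Optional[str]) -> int:
    """Return the HTTP status the gate would produce; 200 means 'proceed'."""
    if env.get("FABRIC_ADMIN_ENDPOINTS") != "1":
        return 404  # endpoint stays invisible
    expected = env.get("FABRIC_ADMIN_API_TOKEN")
    if not expected:
        return 500  # enabled, but admin auth is not configured
    if admin_header is None:
        return 401  # missing header
    # Constant-time comparison avoids leaking how many characters matched.
    if not hmac.compare_digest(admin_header.encode(), expected.encode()):
        return 401  # wrong token
    return 200
```

`hmac.compare_digest` is the standard-library way to get a constant-time comparison in Python; the tenant API key check would still run after this gate passes.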

Wiring a real public install handoff (state-token-bound tenant binding via OAuthStateStore) is tracked as a follow-up — at that point the admin token + endpoint env gate retire.

The connector is gated behind GITHUB_CONNECTOR_ENABLED=true — the webhook route returns 501 until all five envs (GITHUB_APP_ID, GITHUB_APP_PRIVATE_KEY_PEM, GITHUB_WEBHOOK_SECRET, FABRIC_INDEXER_DAEMON_URL_TEMPLATE, and FABRIC_MASTER_KEY) are set.
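
When the webhook route keeps answering 501, a quick pre-flight check of the environment can help. The helper below is a sketch for operators, not part of the connector; `missing_github_envs` and `webhook_ready` are hypothetical names, and treating a blank value as unset is an assumption.

```python
from typing import Mapping

REQUIRED_ENVS = (
    "GITHUB_APP_ID",
    "GITHUB_APP_PRIVATE_KEY_PEM",
    "GITHUB_WEBHOOK_SECRET",
    "FABRIC_INDEXER_DAEMON_URL_TEMPLATE",
    "FABRIC_MASTER_KEY",
)


def missing_github_envs(env: Mapping[str, str]) -> list:
    """List the required variables that are unset or blank."""
    return [name for name in REQUIRED_ENVS if not env.get(name, "").strip()]


def webhook_ready(env: Mapping[str, str]) -> bool:
    """True when the feature flag is on and all five variables are present."""
    return env.get("GITHUB_CONNECTOR_ENABLED") == "true" and not missing_github_envs(env)
```

Calling `missing_github_envs(os.environ)` on the daemon host would list whichever variables still need to be set.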

Environment

export GITHUB_CONNECTOR_ENABLED=true
export FABRIC_MASTER_KEY=<random-32+-byte secret>
export GITHUB_APP_ID=123456
export GITHUB_APP_PRIVATE_KEY_PEM="-----BEGIN RSA PRIVATE KEY-----

-----END RSA PRIVATE KEY-----"
export GITHUB_WEBHOOK_SECRET=<random-32+-byte string>
# Used to format per-(install, repo) indexer URLs. {install} and {repo}
# MUST land in separate DNS labels (or in path segments) — the 36-char
# install UUID plus any reasonable repo slug would overshoot the 63-char
# DNS-label limit if collapsed into a single label, and format_daemon_url
# rejects that at the boundary.
export FABRIC_INDEXER_DAEMON_URL_TEMPLATE=https://fabric-indexer.{install}.{repo}.up.railway.app
# Path-segment placement bypasses DNS label limits entirely if your daemon
# routes by URL path rather than by hostname:
#   FABRIC_INDEXER_DAEMON_URL_TEMPLATE=http://fabric-indexer/install/{install}/repo/{repo}
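
To make the DNS-label constraint concrete, here is a minimal sketch of template substitution with a label-length check. It borrows the name format_daemon_url from the comment above, but the signature and error type are assumptions; the real function may validate more (or differently).

```python
from urllib.parse import urlsplit

MAX_DNS_LABEL = 63  # maximum length of a single DNS label


def format_daemon_url(template: str, install: str, repo: str) -> str:
    """Substitute {install} and {repo}, then reject over-long host labels."""
    url = template.replace("{install}", install).replace("{repo}", repo)
    host = urlsplit(url).hostname
    if not host:
        raise ValueError(f"template did not produce an absolute URL: {url!r}")
    for label in host.split("."):
        if not label or len(label) > MAX_DNS_LABEL:
            raise ValueError(
                f"DNS label {label!r} is empty or longer than {MAX_DNS_LABEL} chars"
            )
    return url
```

With the install UUID and the repo slug in separate labels, each label stays well under 63 characters; collapsing them into one label (for example `{install}-{repo}`) is what trips the check. Path-segment templates pass because only the hostname is inspected.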

Slash commands

Post any of these as a comment on an issue or PR. The bot replies under the sentinel <!-- cognisos-fabric:slash-reply --> so its own replies are filtered from re-triggering.

| Verb | Args | What it does |
| --- | --- | --- |
| /fabric explain | <symbol> | Calls fabric_explain against the per-repo indexer daemon and replies with the explanation. |
| /fabric impact | <symbol> | Calls fabric_impact: surfaces call sites + reverse-deps of the symbol. |
| /fabric slice | <file path> | Calls fabric_slice for a token-bounded code slice anchored at the path. |
| /fabric query | <natural language> | Free-form query against the fabric MCP. |
| /fabric help | (none) | Renders the static help message inline. |
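
The command grammar in the table is small enough to sketch as a parser. This is an illustration rather than the connector's parser: `parse_slash_command` is a hypothetical name, and falling back to help for an unknown verb or a missing argument is an assumption about reasonable behaviour, not something stated above.

```python
from typing import Optional, Tuple

VERBS_WITH_ARG = {"explain", "impact", "slice", "query"}


def parse_slash_command(comment_body: str) -> Optional[Tuple[str, str]]:
    """Return (verb, argument) for the first /fabric line, or None."""
    for line in comment_body.splitlines():
        parts = line.strip().split(maxsplit=2)
        if not parts or parts[0] != "/fabric":
            continue  # also skips look-alikes such as "/fabricate"
        verb = parts[1].lower() if len(parts) > 1 else "help"
        arg = parts[2].strip() if len(parts) > 2 else ""
        if verb in VERBS_WITH_ARG and arg:
            return (verb, arg)
        # Assumed fallback: help, unknown verbs, and missing arguments
        # all render the static help message.
        return ("help", "")
    return None
```

Splitting with `maxsplit=2` keeps a multi-word natural-language query intact as a single argument.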

All replies are capped at 60 000 chars (UTF-8 char-wise truncation) so even runaway tool output never breaks GitHub's comment-size limit.
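
Char-wise truncation means the cut never lands inside a multi-byte UTF-8 sequence. The sketch below shows one way to apply the cap while keeping the slash-reply sentinel; `build_reply` is a hypothetical name, and counting the sentinel against the 60 000-char budget is an assumption.

```python
SLASH_REPLY_SENTINEL = "<!-- cognisos-fabric:slash-reply -->"
MAX_REPLY_CHARS = 60_000


def build_reply(tool_output: str) -> str:
    """Prefix the sentinel and cap the whole reply at MAX_REPLY_CHARS."""
    prefix = SLASH_REPLY_SENTINEL + "\n"
    budget = MAX_REPLY_CHARS - len(prefix)
    # Python slices str by code point, so a multi-byte character is either
    # kept whole or dropped whole, never split.
    return prefix + tool_output[:budget]
```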

Backfill

First install kicks off a three-phase walk over the installation's repos:

  1. ReposListing: paginate /installation/repositories?per_page=100.
  2. PrEnumerating: paginate /repos/{owner}/{repo}/pulls?state=all&per_page=100 for each repo.
  3. PrDiffFetching: GET /repos/{owner}/{repo}/pulls/{n} with Accept: application/vnd.github.v3.diff per PR.

Each Connector::backfill invocation makes exactly one GitHub API call so the framework's per-page rate gate charges github_general once per page. The cursor is bounded under PG's 100 KB row ceiling via PENDING_REPOS_CAP=1000 + PENDING_PRS_CAP=10000; when those queues fill, the FSM spills back to ReposListing with a repos_resume_page marker so a worker crash mid-walk resumes from the last committed cursor on the next claim.
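
One way to picture the bounded cursor is as a small state record that is updated after each API call. The sketch below covers only the repo queue and is a guess at the shape, not the connector's data model: the field `repos_exhausted`, the functions `absorb_repo_page` and `after_queues_drained`, and the "would another full page fit" rule are all assumptions. The PR queue and its PENDING_PRS_CAP would presumably be handled analogously.

```python
from dataclasses import dataclass, field
from enum import Enum

PENDING_REPOS_CAP = 1000
PER_PAGE = 100


class Phase(Enum):
    REPOS_LISTING = "ReposListing"
    PR_ENUMERATING = "PrEnumerating"
    PR_DIFF_FETCHING = "PrDiffFetching"


@dataclass
class Cursor:
    phase: Phase = Phase.REPOS_LISTING
    repos_resume_page: int = 1     # next /installation/repositories page to fetch
    repos_exhausted: bool = False  # hypothetical flag: listing hit its last page
    pending_repos: list = field(default_factory=list)
    pending_prs: list = field(default_factory=list)


def absorb_repo_page(cursor: Cursor, repos: list, has_more: bool) -> Cursor:
    """Fold the result of one repo-listing call into the cursor."""
    cursor.pending_repos.extend(repos)
    cursor.repos_resume_page += 1
    cursor.repos_exhausted = not has_more
    # Leave listing when it is finished or another full page would not fit.
    no_room = len(cursor.pending_repos) + PER_PAGE > PENDING_REPOS_CAP
    if cursor.repos_exhausted or no_room:
        cursor.phase = Phase.PR_ENUMERATING
    return cursor


def after_queues_drained(cursor: Cursor) -> Cursor:
    """With both queues empty, resume listing unless it already finished."""
    if not cursor.pending_repos and not cursor.pending_prs and not cursor.repos_exhausted:
        cursor.phase = Phase.REPOS_LISTING  # picks up at repos_resume_page
    return cursor
```

Persisting the cursor after every call is what makes a crash cheap: the next claim reloads the last committed cursor and repeats at most one page.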

Oversized PRs (GitHub returns HTTP 406 on the diff endpoint) yield events with oversized: true and no diff field — downstream can choose to skip or summarize them rather than failing the page.

Backfill events use namespaced kind pull_request.backfill_diff and id backfill:pr-diff:{repo_id}:{pr_number} so they never collide with live pull_request.opened webhooks on dedup.
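
The event shape described above can be sketched as follows. The `kind`, `id` format, and `oversized` flag come from this section; the function name `backfill_event`, the `diff` key holding the raw diff text, and raising on other status codes are assumptions for illustration.

```python
def backfill_event(repo_id, pr_number: int, status: int, diff_text: str = "") -> dict:
    """Build the backfill event for one PR diff response."""
    event = {
        "kind": "pull_request.backfill_diff",
        # Namespaced id keeps backfill rows apart from live webhook events.
        "id": f"backfill:pr-diff:{repo_id}:{pr_number}",
    }
    if status == 406:
        # GitHub refused to render the diff: flag it and carry no diff field.
        event["oversized"] = True
    elif status == 200:
        event["diff"] = diff_text
    else:
        raise RuntimeError(f"unexpected status {status} for PR {pr_number}")
    return event
```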

Emit

Connector::emit accepts five payload.kind values. Each is validated synchronously at the boundary so a bad payload fails with BadConfig rather than a confusing GitHub 422.

| Kind | Body fields | Notes |
| --- | --- | --- |
| github.pr_summary_comment | repo_full_name, pr_number, body, optional existing_comment_id | Body MUST contain the <!-- cognisos-fabric:pr-summary --> sentinel; existing_comment_id triggers PATCH (edit-in-place), absence triggers POST. |
| github.issue_comment | repo_full_name, issue_number, body | Free-form. |
| github.review_comment | repo_full_name, pr_number, commit_id, path, line, body, optional side (LEFT/RIGHT) | Line-anchored. |
| github.label | repo_full_name, issue_number, labels (non-empty array) | Whitespace-only entries are stripped; if the result is empty, returns BadConfig. |
| github.commit_status | repo_full_name, sha, state (error/failure/pending/success), optional target_url/description/context | Description must be ≤140 chars (GitHub's limit). |
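
The boundary checks in the table could look roughly like the validator below. It is a sketch under stated assumptions: `BadConfig` here is a local stand-in for the connector's error, `validate_emit` is a hypothetical name, and the payload is assumed to be a flat dict with `kind` alongside the body fields.

```python
class BadConfig(ValueError):
    """Local stand-in for the connector's BadConfig error."""


PR_SUMMARY_SENTINEL = "<!-- cognisos-fabric:pr-summary -->"
COMMIT_STATES = {"error", "failure", "pending", "success"}
REQUIRED_FIELDS = {
    "github.pr_summary_comment": ("repo_full_name", "pr_number", "body"),
    "github.issue_comment": ("repo_full_name", "issue_number", "body"),
    "github.review_comment": ("repo_full_name", "pr_number", "commit_id", "path", "line", "body"),
    "github.label": ("repo_full_name", "issue_number", "labels"),
    "github.commit_status": ("repo_full_name", "sha", "state"),
}


def validate_emit(payload: dict) -> dict:
    """Return a checked copy of the payload or raise BadConfig."""
    kind = payload.get("kind")
    if kind not in REQUIRED_FIELDS:
        raise BadConfig(f"unknown payload.kind: {kind!r}")
    missing = [name for name in REQUIRED_FIELDS[kind] if name not in payload]
    if missing:
        raise BadConfig(f"{kind} is missing fields: {missing}")
    checked = dict(payload)
    if kind == "github.pr_summary_comment" and PR_SUMMARY_SENTINEL not in payload["body"]:
        raise BadConfig("body must contain the pr-summary sentinel")
    if kind == "github.review_comment":
        side = payload.get("side")
        if side is not None and side not in ("LEFT", "RIGHT"):
            raise BadConfig("side must be LEFT or RIGHT")
    if kind == "github.label":
        # Drop whitespace-only entries, then require at least one label.
        labels = [label for label in payload["labels"] if label.strip()]
        if not labels:
            raise BadConfig("labels is empty after dropping blank entries")
        checked["labels"] = labels
    if kind == "github.commit_status":
        if payload["state"] not in COMMIT_STATES:
            raise BadConfig(f"invalid state: {payload['state']!r}")
        if len(payload.get("description") or "") > 140:
            raise BadConfig("description exceeds 140 chars")
    return checked
```

Failing here, before any HTTP request is made, is what turns a would-be GitHub 422 into an error that names the offending field.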

Every successful call writes a row to github_emitted_events keyed on (install_id, kind, repo_full_name, issue_or_pr_number, provider_id). Failures to record are logged but never rolled back — the GitHub call is the source of truth.

Self-emit dedup

Two filters keep the connector from re-ingesting its own writes when GitHub redelivers a webhook:

  • Tier-1 (in-memory, infallible): drops issue_comment events with sender.type == "Bot", and drops any comment whose body contains <!-- cognisos-fabric:pr-summary --> or <!-- cognisos-fabric:slash-reply -->.
  • Tier-2 (single SELECT against github_emitted_events): drops events whose comment.id matches a row this install wrote. Catches edge cases where GitHub re-attributes the comment to a User sender, or an upstream mirror tool strips the sentinel.

Tier-2 is best-effort: a DB failure logs and proceeds with Tier-1 survivors. Tier-1 alone is sufficient under normal operation.
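
The two tiers can be sketched as a pair of predicates over an issue_comment webhook payload. This is illustrative only: `tier1_drop` and `should_drop` are hypothetical names, and `emitted_lookup` stands in for the single SELECT against github_emitted_events (any callable that takes a comment id and returns a bool).

```python
from typing import Callable

SENTINELS = (
    "<!-- cognisos-fabric:pr-summary -->",
    "<!-- cognisos-fabric:slash-reply -->",
)


def tier1_drop(event: dict) -> bool:
    """In-memory check: bot senders and sentinel-bearing bodies are dropped."""
    if (event.get("sender") or {}).get("type") == "Bot":
        return True
    body = (event.get("comment") or {}).get("body") or ""
    return any(sentinel in body for sentinel in SENTINELS)


def should_drop(event: dict, emitted_lookup: Callable[[int], bool]) -> bool:
    """Apply Tier-1, then the best-effort Tier-2 lookup."""
    if tier1_drop(event):
        return True
    comment_id = (event.get("comment") or {}).get("id")
    if comment_id is None:
        return False
    try:
        return emitted_lookup(comment_id)
    except Exception:
        # Tier-2 is best-effort: on a lookup failure, keep the Tier-1 survivor.
        return False
```

A production version would log the Tier-2 failure before proceeding, as described above.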

Coordination with the Notion connector

The Notion skill imports GitHub issues as Notion Tasks. To avoid double-ingestion, v1 of the GitHub connector ingests issues/comments into Fabric but does NOT auto-create Notion Tasks — the Notion skill remains the only path that creates them. Provenance on every artifact carries connector_id = "github" so cross-connector queries can disambiguate.

Tables

  • github_app_installations(install_id, github_installation_id, tenant_id, account_login, account_type, repository_selection, app_id). RLS deny-all for app_role; read-only via service role (same shape as slack_workspace_installs).
  • github_emitted_events(install_id, kind, repo_full_name, issue_or_pr_number, provider_id, by_actor_login, emitted_at). Composite PK doubles as the Tier-2 dedup index.
  • github_repo_index_state(install_id, repo_id, daemon_url, status, last_indexed_sha, …). Tracks per-repo clone + index lifecycle.

Conflicts

A second tenant trying to install the same GitHub org gets its OWN install row — GitHub issues a fresh installation_id per App-account binding so no conflict exists. The exception is when an admin uses the same (App, account) pair across two deployments: that's a configuration error (one App per deployment), not a tenant conflict.

Health

The framework-wide install health endpoint applies — GET /api/v1/connectors/installs/:install_id/health (registered in routes/connectors.rs, identical shape for every connector). It dispatches to GithubConnector::health which returns:

  • Step 1 / disabled: { "status": "skeleton" }.
  • Wired and reachable: round-trips GET /app + GET /installation/{id} and surfaces { "status": "healthy", "github_installation_id": <id> }.
  • Token-mint failure: 503 + the upstream message.

Limits

  • One Railway service per (install_id, repo_id) for the indexer daemon. Cap repo count per tenant in v1; multi-repo-per-daemon is a follow-up.
  • PR diff size: 1 MB cap; oversized diffs are flagged in the backfill event.
  • The initial index of a 500 k-file repo can exceed Railway's default service timeout. The worker uses defer_job to extend the lock past the 5-min default.
  • GitHub App private key is read from env as a PEM string. Multi-region / per-tenant App keys are not supported in v1.