GitHub Connector
Two-way sync between a GitHub installation and the Fabric graph — PRs, issues, comments, reviews, and push events. Posts an auto-managed PR-summary comment and answers /fabric slash commands using the fabric-indexer code knowledge graph.
What it does
- Ingests pull requests, issues, issue/PR-review comments, review state, and pushes via GitHub App webhooks.
- Auto-comments on every opened/reopened/synchronized PR with a code-intel summary anchored by the sentinel <!-- cognisos-fabric:pr-summary -->. Subsequent commits edit that comment in place rather than posting duplicates.
- Answers slash commands posted as PR/issue comments: /fabric explain <symbol>, /fabric impact <symbol>, /fabric slice <path>, /fabric query <natural language>, /fabric help.
- Backfills historical PR diffs page-by-page on first install. The cursor is bounded under PG's 100 KB row ceiling and resumable across worker restarts.
- Emits back issue/PR comments, review comments, labels, and commit statuses via Connector::emit. Every outbound write is recorded in github_emitted_events so GitHub's redelivery can be deduped.
Unlike Slack, one install is bound to a single (installation_id, tenant_id) pair: a single GitHub org can be installed into multiple Fabric tenants without conflict because GitHub issues a distinct installation per (App, account) pair.
Prerequisites
- GitHub org/user admin who can install Apps on the target org.
- Public-internet-reachable deployment URL for GitHub's webhook POSTs. v1 deliberately does not ship a public install callback — operators run the admin endpoint below after the GitHub App install lands.
- FABRIC_MASTER_KEY set on the daemon. The installation token is encrypted at rest with this key.
- FABRIC_ADMIN_ENDPOINTS=1 and FABRIC_ADMIN_API_TOKEN=<random secret> set on the daemon. Both are required by the admin install endpoint that provisions each tenant's install row (see Provisioning the install row below). Without these the route returns 404 / 500 respectively.
- For the indexer-driven PR-summary / slash-command paths: one fabric-mcp-server Railway service per (install_id, repo_id), reached over HTTP JSON-RPC. The connector resolves daemon URLs by substituting {install} / {repo} into FABRIC_INDEXER_DAEMON_URL_TEMPLATE.
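The template substitution can be pictured with a minimal Python sketch. It is not the connector's actual format_daemon_url; the function body, the error type, and the choice to require both placeholders are assumptions made for illustration. It substitutes the two placeholders and rejects any hostname label longer than the 63-character DNS limit.

```python
from urllib.parse import urlsplit

MAX_DNS_LABEL = 63  # per-label limit for DNS hostnames


def format_daemon_url(template: str, install: str, repo: str) -> str:
    """Substitute {install}/{repo} and reject over-long DNS labels (sketch)."""
    if "{install}" not in template or "{repo}" not in template:
        # Assumption: both placeholders are mandatory in the template.
        raise ValueError("template must contain {install} and {repo}")
    url = template.replace("{install}", install).replace("{repo}", repo)
    host = urlsplit(url).hostname or ""
    for label in host.split("."):
        if not label or len(label) > MAX_DNS_LABEL:
            raise ValueError(f"invalid DNS label {label!r} in host {host!r}")
    return url
```

Placing the placeholders in path segments leaves the hostname untouched, so the label check passes regardless of how long the repo slug is.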
Create the GitHub App
1. Settings → Developer settings → GitHub Apps → New GitHub App (or use the manifest flow at https://docs.github.com/en/apps/sharing-github-apps/registering-a-github-app-from-a-manifest).
2. Paste the contents of github-app-manifest.json, replacing your-host.example.com with your deployment hostname.
3. After creation GitHub gives you three secrets to wire into the daemon:
   - App ID → GITHUB_APP_ID
   - Private key (download the .pem) → GITHUB_APP_PRIVATE_KEY_PEM (the full PEM body, including the -----BEGIN / -----END lines)
   - Webhook secret (you set this) → GITHUB_WEBHOOK_SECRET
4. Install the App: visit https://github.com/apps/<your-app-slug>/installations/new and pick the org + repos.
Provisioning the install row (v1)
v1 of the connector does not ship a public install-callback HTTP route, and the manifest above deliberately does not advertise redirect_url/callback_urls. Instead, an authenticated admin endpoint provisions the install. The endpoint is the only correct way to seed an install in v1 — raw SQL cannot work because connector_credentials.credentials_enc is encrypted per-tenant off FABRIC_MASTER_KEY, so the resolver fails to decrypt anything written outside the runtime crypto path. (Without a credentials row, every webhook + worker call later fails to mint an installation token.)
For UI integrators: the GitHub manifest exposes its install endpoint via install_path in the /api/v1/connectors listing. Clients walking that listing should prefer install_path (when present) over the auth_mode-implied default (/oauth/start for OAuth2). For github the value is /api/v1/connectors/github/admin/install; the framework also refuses /oauth/start for connectors that declare a custom install path, naming the correct endpoint in the error body.
After installing the App on GitHub (step 4 above), record the installation id, account login, account type, and repository selection from the GitHub UI, then call:
curl -X POST "$FABRIC_API/api/v1/connectors/github/admin/install" \
  -H "Authorization: Bearer $FABRIC_API_KEY" \
  -H "X-Fabric-Admin-Token: $FABRIC_ADMIN_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "user_id": "<user_profiles.id UUID>",
    "github_installation_id": 12345678,
    "account_type": "Organization",
    "account_login": "acme",
    "repository_selection": "all"
  }'

The endpoint:
- Validates the App can mint an installation token for the supplied id (proves the install actually belongs to this App, not a sibling deployment).
- Writes connector_installs, github_app_installations, and connector_credentials in one transaction. Credentials are routed through CredentialStore::finalize_install_in_tx so the AEAD encryption matches what the resolver expects.
- Returns the newly-minted install_id.
The route requires two factors before any tenant work happens:
- FABRIC_ADMIN_ENDPOINTS=1 on the deployment. When unset, the route returns 404 (endpoint invisible).
- A deploy-admin shared secret in X-Fabric-Admin-Token, constant-time compared against the FABRIC_ADMIN_API_TOKEN env var. The tenant API key still authenticates the caller (and is the only thing that binds the install to a specific tenant_id), but the admin token gates who can call the endpoint at all: without it, a regular tenant API key cannot pre-claim a GitHub installation id meant for a different tenant. Missing/wrong header → 401; FABRIC_ADMIN_API_TOKEN unset while the endpoint is enabled → 500 with an explicit "admin auth not configured" message so the misconfig is loud.
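The ordering of those checks can be sketched in Python as follows. This illustrates the status codes described above and is not the daemon's code; the function name admin_gate and the plain-dict env/header arguments are assumptions, and real header lookup would be case-insensitive.

```python
import hmac
from typing import Mapping, Optional


def admin_gate(env: Mapping[str, str], headers: Mapping[str, str]) -> Optional[int]:
    """Return the HTTP status to reject with, or None to proceed to tenant auth."""
    if env.get("FABRIC_ADMIN_ENDPOINTS") != "1":
        return 404  # endpoint stays invisible when the gate is off
    expected = env.get("FABRIC_ADMIN_API_TOKEN", "")
    if not expected:
        return 500  # "admin auth not configured": loud misconfiguration
    supplied = headers.get("X-Fabric-Admin-Token", "")
    # compare_digest avoids leaking how many leading bytes matched.
    if not hmac.compare_digest(supplied.encode(), expected.encode()):
        return 401
    return None
```

Only after this returns None would the tenant API key be resolved to a tenant_id and the install row written.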
Wiring a real public install handoff (state-token-bound tenant binding via OAuthStateStore) is tracked as a follow-up — at that point the admin token + endpoint env gate retire.
The connector is gated behind GITHUB_CONNECTOR_ENABLED=true — the webhook route returns 501 until all five envs (GITHUB_APP_ID, GITHUB_APP_PRIVATE_KEY_PEM, GITHUB_WEBHOOK_SECRET, FABRIC_INDEXER_DAEMON_URL_TEMPLATE, and FABRIC_MASTER_KEY) are set.
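Before pointing GitHub at the webhook URL, an operator could run a preflight along these lines to see why the route would still answer 501. This is a hypothetical helper, not something the connector ships; the function name and message wording are invented for illustration.

```python
from typing import List, Mapping

# The five variables the webhook route needs, as listed above.
REQUIRED_ENVS = (
    "GITHUB_APP_ID",
    "GITHUB_APP_PRIVATE_KEY_PEM",
    "GITHUB_WEBHOOK_SECRET",
    "FABRIC_INDEXER_DAEMON_URL_TEMPLATE",
    "FABRIC_MASTER_KEY",
)


def preflight(env: Mapping[str, str]) -> List[str]:
    """Return human-readable problems; an empty list means the gate should open."""
    problems = []
    if env.get("GITHUB_CONNECTOR_ENABLED") != "true":
        problems.append("GITHUB_CONNECTOR_ENABLED is not 'true'")
    problems += [f"{name} is unset" for name in REQUIRED_ENVS if not env.get(name)]
    return problems
```

Calling preflight(os.environ) in the daemon's environment (after importing os) prints nothing to fix when the list is empty.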
Environment
export GITHUB_CONNECTOR_ENABLED=true
export FABRIC_MASTER_KEY=<random-32+-byte secret>
export GITHUB_APP_ID=123456
export GITHUB_APP_PRIVATE_KEY_PEM="-----BEGIN RSA PRIVATE KEY-----
…
-----END RSA PRIVATE KEY-----"
export GITHUB_WEBHOOK_SECRET=<random-32+-byte string>
# Used to format per-(install, repo) indexer URLs. {install} and {repo}
# MUST land in separate DNS labels (or in path segments) — the 36-char
# install UUID plus any reasonable repo slug would overshoot the 63-char
# DNS-label limit if collapsed into a single label, and format_daemon_url
# rejects that at the boundary.
export FABRIC_INDEXER_DAEMON_URL_TEMPLATE=https://fabric-indexer.{install}.{repo}.up.railway.app
# Path-segment placement bypasses DNS label limits entirely if your daemon
# routes by URL path rather than by hostname:
# FABRIC_INDEXER_DAEMON_URL_TEMPLATE=http://fabric-indexer/install/{install}/repo/{repo}

Slash commands
Post any of these as a comment on an issue or PR. The bot replies under the sentinel <!-- cognisos-fabric:slash-reply --> so its own replies are filtered from re-triggering.
| Verb | Args | What it does |
|---|---|---|
| /fabric explain | <symbol> | Calls fabric_explain against the per-repo indexer daemon and replies with the explanation. |
| /fabric impact | <symbol> | Calls fabric_impact, which surfaces call sites + reverse-deps of the symbol. |
| /fabric slice | <file path> | Calls fabric_slice for a token-bounded code slice anchored at the path. |
| /fabric query | <natural language> | Free-form query against the fabric MCP. |
| /fabric help | (none) | Renders the static help message inline. |
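Parsing the verbs in the table might look like the following Python sketch. The function name and the fallback to help on an unknown verb or missing argument are assumptions; the source does not say how malformed commands are handled.

```python
from typing import Optional, Tuple

SLASH_REPLY_SENTINEL = "<!-- cognisos-fabric:slash-reply -->"
VERBS_WITH_ARGS = {"explain", "impact", "slice", "query"}


def parse_slash_command(body: str) -> Optional[Tuple[str, str]]:
    """Return (verb, args) for a /fabric comment, or None if it is not a command."""
    if SLASH_REPLY_SENTINEL in body:
        return None  # the bot's own reply must not re-trigger a command
    parts = body.strip().split(None, 2)  # ["/fabric", verb, rest-of-comment]
    if not parts or parts[0] != "/fabric":
        return None  # also rejects look-alikes such as "/fabricate"
    verb = parts[1].lower() if len(parts) > 1 else "help"
    args = parts[2].strip() if len(parts) > 2 else ""
    if verb in VERBS_WITH_ARGS and args:
        return (verb, args)
    # Assumption: bare "/fabric", unknown verbs, and missing args show help.
    return ("help", "")
```

For example, parse_slash_command("/fabric impact Connector::emit") yields ("impact", "Connector::emit"), which a dispatcher could map to the fabric_impact call.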
All replies are capped at 60 000 chars (UTF-8 char-wise truncation) so even runaway tool output never breaks GitHub's comment-size limit.
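Char-wise truncation is simple to express in Python, where strings are indexed by code point rather than by byte. The sketch below assumes the cap is applied to the whole reply body; the helper name is hypothetical.

```python
REPLY_CHAR_CAP = 60_000  # cap quoted above


def cap_reply(text: str, cap: int = REPLY_CHAR_CAP) -> str:
    """Truncate by character so a multi-byte UTF-8 sequence is never split."""
    return text if len(text) <= cap else text[:cap]
```

Truncating the encoded bytes instead could cut a character in half and produce invalid UTF-8, which is why the cap counts characters.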
Backfill
First install kicks off a three-phase walk over the installation's repos:
- ReposListing: paginate /installation/repositories?per_page=100.
- PrEnumerating: paginate /repos/{owner}/{repo}/pulls?state=all&per_page=100 for each repo.
- PrDiffFetching: GET /repos/{owner}/{repo}/pulls/{n} with Accept: application/vnd.github.v3.diff per PR.
Each Connector::backfill invocation makes exactly one GitHub API call, so the framework's per-page rate gate charges github_general once per page. The cursor is bounded under PG's 100 KB row ceiling via PENDING_REPOS_CAP=1000 + PENDING_PRS_CAP=10000. When those queues fill, the FSM spills back to ReposListing with a repos_resume_page marker, so that a worker crash mid-walk resumes from the last committed cursor on the next claim.
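The one-call-per-invocation walk can be sketched as a small state machine in Python. This is a simplified model, not the connector's cursor: apart from the three phase names, repos_resume_page, and the two caps, every field and method name here (including the api object and the extra Done phase) is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

PENDING_REPOS_CAP = 1000
PENDING_PRS_CAP = 10000


@dataclass
class Cursor:
    phase: str = "ReposListing"
    repos_resume_page: int = 1          # next repo-listing page to fetch
    repos_exhausted: bool = False
    pending_repos: List[str] = field(default_factory=list)
    prs_page: int = 1                   # next PR page for pending_repos[0]
    pending_prs: List[Tuple[str, int]] = field(default_factory=list)


def _settle(cur: Cursor) -> None:
    """Skip phases that have no work left; never touches the API."""
    if cur.phase == "PrEnumerating" and not cur.pending_repos:
        cur.phase = "PrDiffFetching"
    if cur.phase == "PrDiffFetching" and not cur.pending_prs:
        if cur.pending_repos:
            cur.phase = "PrEnumerating"
        elif not cur.repos_exhausted:
            cur.phase = "ReposListing"  # spill back, resuming at repos_resume_page
        else:
            cur.phase = "Done"


def step(cur: Cursor, api) -> bool:
    """Advance by exactly one API call; return False once the walk is finished."""
    if cur.phase == "Done":
        return False
    if cur.phase == "ReposListing":
        repos, has_next = api.list_repos(cur.repos_resume_page)
        cur.pending_repos.extend(repos)
        cur.repos_resume_page += 1
        cur.repos_exhausted = not has_next
        if cur.repos_exhausted or len(cur.pending_repos) >= PENDING_REPOS_CAP:
            cur.phase = "PrEnumerating"
    elif cur.phase == "PrEnumerating":
        repo = cur.pending_repos[0]
        numbers, has_next = api.list_prs(repo, cur.prs_page)
        cur.pending_prs.extend((repo, n) for n in numbers)
        if has_next:
            cur.prs_page += 1
        else:
            cur.pending_repos.pop(0)
            cur.prs_page = 1
        if len(cur.pending_prs) >= PENDING_PRS_CAP:
            cur.phase = "PrDiffFetching"
    else:  # PrDiffFetching
        repo, number = cur.pending_prs.pop(0)
        api.fetch_diff(repo, number)
    _settle(cur)
    return cur.phase != "Done"
```

Because all progress lives in the cursor, persisting it after every step is what lets a restarted worker pick up from the last committed position.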
Oversized PRs (GitHub returns HTTP 406 on the diff endpoint) yield events with oversized: true and no diff field — downstream can choose to skip or summarize them rather than failing the page.
Backfill events use namespaced kind pull_request.backfill_diff and id backfill:pr-diff:{repo_id}:{pr_number} so they never collide with live pull_request.opened webhooks on dedup.
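A backfill event with the id scheme and the oversized flag described above could be assembled as in this Python sketch. Only kind, id, oversized, and diff come from the text; the function name, the dict shape, and the handling of other status codes are assumptions.

```python
from typing import Any, Dict, Optional


def backfill_event(repo_id: int, pr_number: int, status: int,
                   diff: Optional[str]) -> Dict[str, Any]:
    """Build one backfill event from the diff endpoint's HTTP status (sketch)."""
    event: Dict[str, Any] = {
        "kind": "pull_request.backfill_diff",
        "id": f"backfill:pr-diff:{repo_id}:{pr_number}",
    }
    if status == 406:
        event["oversized"] = True  # GitHub refused the diff; no diff field
    elif status == 200:
        event["diff"] = diff
    else:
        # Assumption: any other status is surfaced as an error to the caller.
        raise RuntimeError(f"unexpected status {status} for PR {pr_number}")
    return event
```

The namespaced id keeps the event distinct from a live webhook for the same PR, so deduplication treats them as separate records.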
Emit
Connector::emit accepts five payload.kind values. Each is validated synchronously at the boundary so a bad payload fails with BadConfig rather than a confusing GitHub 422.
| Kind | Body fields | Notes |
|---|---|---|
| github.pr_summary_comment | repo_full_name, pr_number, body, optional existing_comment_id | Body MUST contain the <!-- cognisos-fabric:pr-summary --> sentinel; existing_comment_id triggers PATCH (edit-in-place), absence triggers POST. |
| github.issue_comment | repo_full_name, issue_number, body | Free-form. |
| github.review_comment | repo_full_name, pr_number, commit_id, path, line, body, optional side (LEFT/RIGHT) | Line-anchored. |
| github.label | repo_full_name, issue_number, labels (non-empty array) | Whitespace-only entries are stripped; if the result is empty, returns BadConfig. |
| github.commit_status | repo_full_name, sha, state (error/failure/pending/success), optional target_url/description/context | Description must be ≤140 chars (GitHub's limit). |
Every successful call writes a row to github_emitted_events keyed on (install_id, kind, repo_full_name, issue_or_pr_number, provider_id). Failures to record are logged but never rolled back — the GitHub call is the source of truth.
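The boundary validation for three of the kinds can be sketched in Python as below. The BadConfig class here is a stand-in for the connector's error, and the function names and return values are invented for illustration; only the rules themselves come from the table.

```python
from typing import Any, List, Mapping

PR_SUMMARY_SENTINEL = "<!-- cognisos-fabric:pr-summary -->"
COMMIT_STATES = {"error", "failure", "pending", "success"}


class BadConfig(ValueError):
    """Stand-in for the connector's BadConfig error."""


def pr_summary_method(payload: Mapping[str, Any]) -> str:
    """Validate a github.pr_summary_comment body and pick the HTTP method."""
    if PR_SUMMARY_SENTINEL not in payload.get("body", ""):
        raise BadConfig("body must contain the pr-summary sentinel")
    # An existing comment id means edit in place; otherwise create a new one.
    return "PATCH" if payload.get("existing_comment_id") is not None else "POST"


def clean_labels(payload: Mapping[str, Any]) -> List[str]:
    """Drop whitespace-only labels; an empty result is a config error."""
    labels = payload.get("labels")
    if not isinstance(labels, list):
        raise BadConfig("labels must be an array")
    kept = [label for label in labels if isinstance(label, str) and label.strip()]
    if not kept:
        raise BadConfig("labels must contain at least one non-blank entry")
    return kept


def check_commit_status(payload: Mapping[str, Any]) -> None:
    """Reject unknown states and descriptions over 140 characters."""
    if payload.get("state") not in COMMIT_STATES:
        raise BadConfig("state must be error, failure, pending, or success")
    description = payload.get("description")
    if description is not None and len(description) > 140:
        raise BadConfig("description must be at most 140 characters")
```

Failing here, before any HTTP request is made, is what turns a malformed payload into a local BadConfig instead of a 422 from GitHub.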
Self-emit dedup
Two filters keep the connector from re-ingesting its own writes when GitHub redelivers a webhook:
- Tier-1 (in-memory, infallible): drops issue_comment events with sender.type == "Bot", and drops any comment whose body contains <!-- cognisos-fabric:pr-summary --> or <!-- cognisos-fabric:slash-reply -->.
- Tier-2 (single SELECT against github_emitted_events): drops events whose comment.id matches a row this install wrote. Catches edge cases where GitHub re-attributes the comment to a User sender, or an upstream mirror tool strips the sentinel.
Tier-2 is best-effort: a DB failure logs and proceeds with Tier-1 survivors. Tier-1 alone is sufficient under normal operation.
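The two tiers compose roughly as in the Python sketch below. The lookup_emitted callable stands in for the SELECT against github_emitted_events and is an assumption, as are the function names; the payload fields (sender.type, comment.id, comment.body) follow GitHub's webhook payload shape.

```python
import logging
from typing import Any, Callable, Mapping

SENTINELS = (
    "<!-- cognisos-fabric:pr-summary -->",
    "<!-- cognisos-fabric:slash-reply -->",
)


def tier1_drop(event_name: str, payload: Mapping[str, Any]) -> bool:
    """In-memory filter: bot-authored issue comments and sentinel-bearing bodies."""
    sender_type = (payload.get("sender") or {}).get("type")
    if event_name == "issue_comment" and sender_type == "Bot":
        return True
    body = (payload.get("comment") or {}).get("body") or ""
    return any(sentinel in body for sentinel in SENTINELS)


def should_drop(event_name: str, payload: Mapping[str, Any],
                lookup_emitted: Callable[[int], bool]) -> bool:
    """Tier-1 first, then a best-effort Tier-2 lookup by comment id."""
    if tier1_drop(event_name, payload):
        return True
    comment_id = (payload.get("comment") or {}).get("id")
    if comment_id is None:
        return False
    try:
        return lookup_emitted(comment_id)
    except Exception:
        # A database failure is logged and the Tier-1 survivor is kept.
        logging.warning("tier-2 dedup lookup failed for comment %s", comment_id)
        return False
```

Running the cheap in-memory check first means the database is only consulted for comments that look like they came from a human.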
Coordination with the Notion connector
The Notion skill imports GitHub issues as Notion Tasks. To avoid double-ingestion, v1 of the GitHub connector ingests issues/comments into Fabric but does NOT auto-create Notion Tasks — the Notion skill remains the only path that creates them. Provenance on every artifact carries connector_id = "github" so cross-connector queries can disambiguate.
Tables
- github_app_installations: (install_id, github_installation_id, tenant_id, account_login, account_type, repository_selection, app_id). RLS deny-all for app_role; read-only via service role (same shape as slack_workspace_installs).
- github_emitted_events: (install_id, kind, repo_full_name, issue_or_pr_number, provider_id, by_actor_login, emitted_at). Composite PK doubles as the Tier-2 dedup index.
- github_repo_index_state: (install_id, repo_id, daemon_url, status, last_indexed_sha, …). Tracks per-repo clone + index lifecycle.
Conflicts
A second tenant trying to install the same GitHub org gets its OWN install row — GitHub issues a fresh installation_id per App-account binding so no conflict exists. The exception is when an admin uses the same (App, account) pair across two deployments: that's a configuration error (one App per deployment), not a tenant conflict.
Health
The framework-wide install health endpoint applies — GET /api/v1/connectors/installs/:install_id/health (registered in routes/connectors.rs, identical shape for every connector). It dispatches to GithubConnector::health which returns:
- Step 1 / disabled: { "status": "skeleton" }.
- Wired and reachable: round-trips GET /app + GET /installation/{id} and surfaces { "status": "healthy", "github_installation_id": <id> }.
- Token-mint failure: 503 + the upstream message.
Limits
- One Railway service per (install_id, repo_id) for the indexer daemon. Cap repo count per tenant in v1; multi-repo-per-daemon is a follow-up.
- PR diff size: 1 MB cap; oversized diffs are flagged in the backfill event.
- Initial index of a 500 k-file repo can exceed Railway's default service timeout. The worker uses defer_job to extend lock past the 5-min default.
- GitHub App private key is read from env as a PEM string. Multi-region / per-tenant App keys are not supported in v1.
