A multi-tenant, real-time infrastructure-monitoring SaaS. Four runtime surfaces over PostgreSQL + Redis, with tenant isolation, a flap-detecting alert engine, and guaranteed-delivery notifications.
A monorepo of npm workspaces. The browser talks only to the TLS-terminating web tier, which proxies REST and WebSocket to the API. Background work flows through Redis to a stateless worker pool. PostgreSQL is the single source of truth, reached exclusively through a tenant-scoping wrapper.
From a scheduled probe to a delivered alert and a live dashboard update — the path every monitor result travels.
The worker reconciles each enabled monitor into a BullMQ repeatable job (every: interval + jitter) on the checks queue — spreading load across 50k+ monitors.
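To illustrate the "interval + jitter" idea, here is a minimal sketch of how a stable per-monitor offset could be derived. The function names, the FNV-1a hash, and the 10% jitter ceiling are assumptions for illustration, not the project's actual code; the returned value is what would be handed to BullMQ as the repeat `every` option.

```typescript
// Hypothetical helper: map a monitor id to a stable number in [0, 1)
// using a 32-bit FNV-1a hash, so the same monitor always gets the same jitter.
function hashToUnit(id: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < id.length; i++) {
    h ^= id.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h / 0x100000000;
}

// Returns the repeat period in milliseconds: the configured interval plus a
// deterministic offset of at most `maxJitterRatio` of that interval.
function jitteredEveryMs(
  monitorId: string,
  intervalMs: number,
  maxJitterRatio = 0.1, // assumed ceiling, not taken from the source
): number {
  const jitter = Math.floor(hashToUnit(monitorId) * intervalMs * maxJitterRatio);
  return intervalMs + jitter;
}
```

Deriving the jitter from the monitor id rather than from a random source keeps reconciliation idempotent: re-running the scheduler produces the same period for the same monitor.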
HTTP/TCP/ICMP/DNS/push probe with timeout + assertions. Writes a check_results row, updates monitors.status, and publishes check:result.
up → degraded → down with N-consecutive / failure-ratio hysteresis. A confirmed transition to down opens an incident + timeline entry.
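The hysteresis can be pictured as a small state machine. The sketch below covers only the N-consecutive variant (the failure-ratio variant is omitted), and the threshold names and values are assumptions rather than the engine's real configuration.

```typescript
type Status = "up" | "degraded" | "down";

interface FlapState {
  status: Status;
  consecutiveFailures: number;
  consecutiveSuccesses: number;
}

// Hypothetical thresholds; the real engine's settings are not shown in the source.
interface Thresholds {
  degradedAfter: number; // consecutive failures before up -> degraded
  downAfter: number; // consecutive failures before the monitor is down
  recoverAfter: number; // consecutive successes before returning to up
}

// Fold one check result into the state. A single stray failure or success
// never changes the status on its own, which is what suppresses flapping.
function nextState(prev: FlapState, ok: boolean, t: Thresholds): FlapState {
  const consecutiveFailures = ok ? 0 : prev.consecutiveFailures + 1;
  const consecutiveSuccesses = ok ? prev.consecutiveSuccesses + 1 : 0;
  let status: Status = prev.status;
  if (consecutiveFailures >= t.downAfter) status = "down";
  else if (consecutiveFailures >= t.degradedAfter && status === "up") status = "degraded";
  else if (consecutiveSuccesses >= t.recoverAfter) status = "up";
  return { status, consecutiveFailures, consecutiveSuccesses };
}

// Only a confirmed transition into "down" should open an incident.
function opensIncident(prev: FlapState, next: FlapState): boolean {
  return prev.status !== "down" && next.status === "down";
}
```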
Each delivery is a row keyed by a deterministic idempotency_key (UNIQUE) → enqueued with retries + exponential backoff; exhausted attempts dead-letter, never silently drop.
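A rough sketch of the two mechanisms named above: a deterministic key and an exponential backoff schedule. Which fields feed the key, the base delay, and the cap are all assumptions here; the source only states that the key is deterministic and UNIQUE.

```typescript
import { createHash } from "node:crypto";

// Hypothetical key derivation: the same (incident, channel, event) triple
// always hashes to the same key, so a UNIQUE constraint on the column
// rejects a second insert for the same delivery.
function idempotencyKey(incidentId: string, channelId: string, event: string): string {
  return createHash("sha256")
    .update([incidentId, channelId, event].join("\u0000"))
    .digest("hex");
}

// Exponential backoff: the delay doubles with each attempt, up to a cap.
// `attempt` is 1-based; base and cap are illustrative values.
function backoffDelayMs(attempt: number, baseMs = 1000, capMs = 60_000): number {
  return Math.min(capMs, baseMs * 2 ** (attempt - 1));
}

// Once attempts are exhausted the row is marked dead instead of being dropped.
function nextAction(attemptsMade: number, maxAttempts: number): "retry" | "dead" {
  return attemptsMade >= maxAttempts ? "dead" : "retry";
}
```

The null-byte separator keeps `("a", "bc")` and `("ab", "c")` from colliding before hashing.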
The worker publishes to Redis pulse:events; the API re-emits to the org:<id> Socket.IO room; the dashboard updates within ~2s — no reload.
Each subsystem was chosen for a reason recorded in DECISIONS.md.
REST with /v1 path versioning; realtime rooms-per-tenant with a Redis adapter for horizontal scale.
SQL-transparent migrations (auditable for SOC2), no engine binary, clean tenant-scoping wrapper.
Repeatable jobs power the scheduler; retries + dead-letter give guaranteed-delivery notifications.
TanStack Query for server state, Recharts for latency charts, design tokens copied verbatim from the brand template.
Server-side opaque sessions in an HttpOnly cookie — revocable, XSS-safe; lockout + anti-enumeration.
One schema set shared API↔web; RFC 9457 problem+json errors with field-level detail.
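For readers unfamiliar with RFC 9457, a validation failure might be shaped roughly as follows. The `errors` extension member and the helper name are assumptions; RFC 9457 defines `type`, `title`, `status`, `detail`, and `instance`, and permits extension members such as a per-field list.

```typescript
// Hypothetical per-field entry carried as an extension member.
interface FieldError {
  field: string;
  message: string;
}

// The standard RFC 9457 members plus one assumed extension (`errors`).
interface ProblemDetails {
  type: string;
  title: string;
  status: number;
  detail?: string;
  instance?: string;
  errors?: FieldError[];
}

// Responses of this kind are served with this media type.
const PROBLEM_CONTENT_TYPE = "application/problem+json";

// Build a 422 problem document for a failed request-body validation.
function validationProblem(instance: string, errors: FieldError[]): ProblemDetails {
  return {
    type: "about:blank",
    title: "Unprocessable Content",
    status: 422,
    detail: `${errors.length} field(s) failed validation`,
    instance,
    errors,
  };
}
```

Because the same shape can be typed on both sides, the web client can map each `field` straight onto a form input without parsing free-text messages.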
Channel secrets encrypted at rest; outbound webhooks HMAC-signed; inbound signatures verified.
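Signing and verification can be sketched with Node's built-in crypto module. The choice of SHA-256 and hex encoding is an assumption, as are the function names; the source only says payloads are HMAC-signed and inbound signatures are verified.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Compute a hex HMAC over the exact bytes of the request body.
function signPayload(secret: string, body: string): string {
  return createHmac("sha256", secret).update(body).digest("hex");
}

// Verify a received signature in constant time. timingSafeEqual throws when
// the buffers differ in length, so the length check comes first.
function verifySignature(secret: string, body: string, signatureHex: string): boolean {
  const expected = Buffer.from(signPayload(secret, body), "hex");
  const received = Buffer.from(signatureHex, "hex");
  return expected.length === received.length && timingSafeEqual(expected, received);
}
```

Verification should run against the raw body as received, before any JSON parsing, since re-serializing can change the bytes and invalidate the signature.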
5-service compose stack; CI runs lint, typecheck, tests, build, audit, and the invariant lint.
Every tenant table carries an indexed org_id and is reached only through scopedDb(orgId), with Row-Level Security underneath as defense-in-depth.
| Table | Purpose | Key constraints |
|---|---|---|
| organizations | Tenant root. | UNIQUE slug |
| users | Members + role (owner/admin/operator/viewer); lockout fields. | UNIQUE email |
| sessions · api_keys | Opaque hashed session tokens; org-scoped ingest keys. | UNIQUE token_hash |
| monitors | Health checks (type, target, interval, assertions, status). | idx org_id, status |
| check_results · metrics | Time-series probe results & pushed OS metrics. | partition-ready |
| incidents · incident_events | Incident lifecycle + timeline (open→ack→resolve). | idx org_id, status |
| notification_channels · alert_rules | Channels (AES-GCM secrets) + routing rules. | FK channel_id |
| notifications | Guaranteed-delivery outbox (pending/sent/failed/dead). | UNIQUE idempotency_key |
| audit_logs · webhook_events | SOC2 audit trail; idempotent inbound webhook receipts. | UNIQUE provider,external_id |
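The tenant-scoping wrapper could look something like the sketch below. The `RawClient` interface, the method names, and the table allow-list are hypothetical; the point is only that every query the wrapper issues carries the org_id predicate, so callers cannot forget it.

```typescript
// Hypothetical minimal shape of the raw database client.
interface RawClient {
  query(sql: string, params: unknown[]): Promise<Record<string, unknown>[]>;
}

// Assumed allow-list of tenant tables; it also makes interpolating the
// table name into SQL safe, because only these literals can appear.
const TENANT_TABLES = new Set(["monitors", "incidents", "notifications"]);

function scopedDb(raw: RawClient, orgId: string) {
  // Throws synchronously for any table outside the allow-list.
  function assertTenantTable(table: string): void {
    if (!TENANT_TABLES.has(table)) throw new Error(`not a tenant table: ${table}`);
  }
  return {
    // org_id is always the first bound parameter, supplied by the wrapper.
    findById(table: string, id: string) {
      assertTenantTable(table);
      return raw.query(`SELECT * FROM ${table} WHERE org_id = $1 AND id = $2`, [orgId, id]);
    },
    listByStatus(table: string, status: string) {
      assertTenantTable(table);
      return raw.query(`SELECT * FROM ${table} WHERE org_id = $1 AND status = $2`, [orgId, status]);
    },
  };
}
```

With this shape, a lookup for a row owned by another tenant simply returns no rows, which is indistinguishable from a row that does not exist.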
Machine-checkable rules are enforced by scripts/invariant-lint.mjs in CI, the Verification Gate, and the Drift Detection Gate. Rules whose check is manual are prose-audited.
| ID | Rule | Check |
|---|---|---|
| INV-1 | All tenant DB access goes through scopedDb(orgId); the raw client is confined to packages/db. | forbidden-pattern |
| INV-2 | notifications.idempotency_key is UNIQUE — guaranteed delivery, no double-send. | unique-constraint |
| INV-3 | users.email is UNIQUE — anti-duplicate, single-lookup login. | unique-constraint |
| INV-4 | Inbound webhooks verify the signature before any DB write. | boundary-order |
| INV-5 | Auth login rate-limits before the password compare. | boundary-order |
| INV-10 | Backend services use the structured logger, never console.log. | forbidden-pattern |
| INV-11 | Every UI screen is routable and every workflow has a UI e2e to its terminal step. | ui-coverage |
| INV-12 | Colors/type/spacing come from the design-token layer, not ad-hoc values. | manual |
| INV-13 | Production enforces HTTPS-only + HSTS with an HTTP→HTTPS redirect. | manual |
| INV-14 | Channel secrets are AES-256-GCM encrypted at rest and never logged. | manual |
| INV-15 | Outbound custom-webhook delivery passes an SSRF egress guard before fetch. | manual |
STRIDE-modeled boundaries mapped to OWASP Top 10. Controls layer defense-in-depth around every crossing.
Argon2id + lockout + anti-enumeration, per-(org,ip) rate limits, Zod validation, body limits. A03 · A05 · A07
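The ordering "rate-limit first, then compare the password" can be shown with a small in-memory sketch. The fixed-window algorithm, the class and function names, and the synchronous `verifyPassword` callback are all assumptions; a real deployment would keep the buckets in Redis and the Argon2id verification would be asynchronous.

```typescript
// Hypothetical fixed-window limiter keyed by (org, ip).
class FixedWindowLimiter {
  private readonly hits = new Map<string, { windowStart: number; count: number }>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true while the caller is within `limit` attempts per window.
  allow(orgId: string, ip: string, now: number): boolean {
    const key = `${orgId}:${ip}`;
    const entry = this.hits.get(key);
    if (!entry || now - entry.windowStart >= this.windowMs) {
      this.hits.set(key, { windowStart: now, count: 1 });
      return true;
    }
    entry.count += 1;
    return entry.count <= this.limit;
  }
}

type LoginResult = "ok" | "invalid_credentials" | "rate_limited";

// The limiter runs before the (expensive) password compare, so a flood of
// guesses never reaches the hash function. The failure result is the same
// whether the user is unknown or the password is wrong.
function attemptLogin(
  limiter: FixedWindowLimiter,
  orgId: string,
  ip: string,
  now: number,
  verifyPassword: () => boolean, // stand-in for the real credential check
): LoginResult {
  if (!limiter.allow(orgId, ip, now)) return "rate_limited";
  return verifyPassword() ? "ok" : "invalid_credentials";
}
```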
scopedDb + Row-Level Security; cross-tenant access returns 404 (no existence leak). A01
Signature verified before write; idempotent on (provider, external_id). A08
SSRF egress guard (https-only, blocks private/loopback/link-local); HMAC-signed payloads; TLS. A10
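A simplified version of such an egress check is sketched below. The function names and the exact block-list are assumptions, and the sketch inspects only the URL itself: a production guard would also resolve hostnames and apply the same checks to every resolved address, which is omitted here.

```typescript
import { isIP } from "node:net";

// Private, loopback, link-local, and "this network" IPv4 ranges.
function isBlockedIPv4(ip: string): boolean {
  const [a, b] = ip.split(".").map(Number);
  return (
    a === 0 ||
    a === 10 ||
    a === 127 ||
    (a === 172 && b >= 16 && b <= 31) ||
    (a === 192 && b === 168) ||
    (a === 169 && b === 254)
  );
}

// Loopback, unspecified, link-local (fe80::/10), unique-local (fc00::/7),
// and IPv4-mapped addresses (blocked conservatively).
function isBlockedIPv6(ip: string): boolean {
  const lower = ip.toLowerCase();
  return (
    lower === "::1" ||
    lower === "::" ||
    /^fe[89ab][0-9a-f]:/.test(lower) ||
    /^f[cd][0-9a-f]{2}:/.test(lower) ||
    lower.startsWith("::ffff:")
  );
}

type EgressCheck = { ok: true } | { ok: false; reason: string };

// Hypothetical guard, meant to run before any outbound fetch to a user-supplied URL.
function checkEgressUrl(raw: string): EgressCheck {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return { ok: false, reason: "invalid url" };
  }
  if (url.protocol !== "https:") return { ok: false, reason: "https only" };
  // URL.hostname keeps the brackets around IPv6 literals; strip them.
  const host = url.hostname.replace(/^\[|\]$/g, "");
  if (host === "localhost" || host.endsWith(".localhost")) {
    return { ok: false, reason: "loopback host" };
  }
  const family = isIP(host);
  if (family === 4 && isBlockedIPv4(host)) return { ok: false, reason: "blocked ipv4 range" };
  if (family === 6 && isBlockedIPv6(host)) return { ok: false, reason: "blocked ipv6 range" };
  return { ok: true };
}
```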
Least-privilege creds, parameterized writes, validated external responses, Pino redaction. A03 · A05
Env-var secrets + AES-GCM at rest, HTTPS+HSTS, audit logging, CI dependency scanning. A02 · A06
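Encryption at rest with AES-256-GCM might be sketched as follows using Node's crypto module. The packed `iv.tag.ciphertext` storage format and the function names are assumptions; the 32-byte key is presumed to come from an environment variable, as the text suggests.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Encrypt a channel secret. A fresh 12-byte IV is generated per call, and the
// IV and auth tag are stored alongside the ciphertext (assumed format).
function encryptSecret(key: Buffer, plaintext: string): string {
  const iv = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  const tag = cipher.getAuthTag();
  return [iv, tag, ciphertext].map((part) => part.toString("base64")).join(".");
}

// Decrypt and authenticate. final() throws if the key is wrong or the
// stored value was modified, so tampering is detected rather than ignored.
function decryptSecret(key: Buffer, packed: string): string {
  const [iv, tag, ciphertext] = packed.split(".").map((part) => Buffer.from(part, "base64"));
  const decipher = createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(tag);
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString("utf8");
}
```

GCM requires that an IV is never reused with the same key, which is why the IV is random per encryption rather than derived from the row.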
One docker compose up -d --build brings up five services. HTTPS-only at the edge; the API and worker stay on the internal network.
nginx — TLS, HTTP→HTTPS redirect, HSTS, header buffers, serves the SPA + proxies the API.
Express + Socket.IO; runs migrations on boot; /api/health checks db + redis.
BullMQ scheduler / checker / evaluator / notifier; stateless & horizontally scalable.
Healthchecked; named volume; the single source of truth.
Queues, rate-limit buckets, and the realtime pub/sub bus.
21/21 functional smoke + 6/6 Playwright workflows green against the running HTTPS deploy.