Pulse

System Architecture

A multi-tenant, real-time infrastructure-monitoring SaaS. Four runtime surfaces over PostgreSQL + Redis, with tenant isolation, a flap-detecting alert engine, and guaranteed-delivery notifications.

Archetype · SaaS | Node 22 · TypeScript | Express 5 · Socket.IO | PostgreSQL · Drizzle | BullMQ · Redis | React · Vite · Tailwind | SOC2 · OWASP Top 10
Components

Runtime topology

A monorepo of npm workspaces. The browser talks only to the TLS-terminating web tier, which proxies REST and WebSocket to the API. Background work flows through Redis to a stateless worker pool. PostgreSQL is the single source of truth, reached exclusively through a tenant-scoping wrapper.

Client

Web SPA

15 screens · landing + authenticated app shell · realtime dashboards, tables, forms.
apps/web · React · Vite · Tailwind · TanStack Query · Recharts
HTTPS · WSS (HSTS, SameSite cookie)
Edge

Web tier — nginx

TLS termination · HTTP→HTTPS 301 · HSTS · 16KB header buffers · serves static SPA · proxies /v1, /api, and /socket.io.
deploy/nginx.conf
reverse proxy → api:4000
Application

API — Express + Socket.IO

REST /v1 (auth, monitors, incidents, channels, rules, reports, api-keys, ingest, webhooks, audit, health). Middleware: requestId → logger → helmet → HTTPS → rate-limit → session → CSRF → Zod → handler → RFC 9457 errors. WebSocket rooms-per-tenant.
apps/api
Messaging · Jobs

Redis / Valkey

BullMQ queues (schedule · checks · notifications) · token-bucket rate limits · Socket.IO pub/sub fan-out (pulse:events).
queues · rate-limit · realtime bus
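
The token-bucket rate limiting mentioned above can be illustrated with a minimal in-memory sketch. It is not the project's Redis-backed implementation: the `Bucket` shape, the `takeToken` name, and the capacity and refill parameters are all assumptions, and a shared version would need to perform the same read-refill-decrement step atomically in Redis.

```typescript
// Hypothetical in-memory token bucket (illustration only).
interface Bucket {
  tokens: number;    // tokens currently available
  updatedAt: number; // timestamp of the last refill, in ms
}

// Refills the bucket for the elapsed time, then tries to take one token.
// Returns true when the request is allowed, false when it should be rejected.
function takeToken(
  bucket: Bucket,
  nowMs: number,
  capacity: number,
  refillPerSec: number,
): boolean {
  const elapsedSec = Math.max(0, nowMs - bucket.updatedAt) / 1000;
  bucket.tokens = Math.min(capacity, bucket.tokens + elapsedSec * refillPerSec);
  bucket.updatedAt = nowMs;
  if (bucket.tokens < 1) return false;
  bucket.tokens -= 1;
  return true;
}
```

A bucket allows bursts up to `capacity` and then settles at `refillPerSec` requests per second, which is why it suits both login throttling and ingest limits.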

Worker pool

scheduler (repeatable jobs + jitter) · checker (http/tcp/icmp/dns/push) · evaluator (flap state machine + incidents) · notifier (5 adapters, retries/DLQ, idempotency).
apps/worker · stateless · horizontally scalable
scopedDb(orgId) · parameterized SQL
Data

PostgreSQL 16

14 org-scoped tables + Row-Level Security · partition-ready time-series (check_results, metrics) · UNIQUE constraints enforce idempotency & identity.
packages/db · Drizzle ORM · SQL migrations
External

Monitored targets

Servers, devices, services probed via ICMP · TCP · HTTPS · DNS, plus push-ingested OS metrics.

Notification channels

Email (SMTP) · SMS (Twilio) · Slack · Teams (Power Automate) · custom webhooks (HMAC-signed, SSRF-guarded).

Inbound webhooks

Twilio SMS-status callbacks — signature verified before any DB write.
Legend: HTTPS / REST · WebSocket (realtime) · Queue (BullMQ) · SQL (scoped + RLS)
Data flow

Anatomy of a check cycle

From a scheduled probe to a delivered alert and a live dashboard update — the path every monitor result travels.

Scheduler enqueues a jittered check

The worker reconciles each enabled monitor into a BullMQ repeatable job (every: interval + jitter) on the checks queue — spreading load across 50k+ monitors.
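
One way to derive the jitter is to hash the monitor id, so that reconciling the same monitor twice yields the same schedule. The sketch below assumes that approach and reads `every: interval + jitter` literally; `fnv1a`, `repeatSpec`, and `maxJitterMs` are hypothetical names, and the result is only the shape of the repeat setting, not a call into BullMQ.

```typescript
// FNV-1a 32-bit hash: stable across processes, so a monitor keeps its jitter.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // keep it an unsigned 32-bit value
  }
  return hash;
}

// Hypothetical repeat settings for one monitor.
interface RepeatSpec {
  every: number; // period in ms, interval plus a per-monitor jitter
}

// Adds a deterministic jitter in [0, maxJitterMs) to the base interval.
function repeatSpec(
  monitorId: string,
  intervalMs: number,
  maxJitterMs: number,
): RepeatSpec {
  const jitter = maxJitterMs > 0 ? fnv1a(monitorId) % maxJitterMs : 0;
  return { every: intervalMs + jitter };
}
```

Because the periods differ slightly per monitor, checks that share a nominal interval drift apart instead of firing in the same instant.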

Checker runs the protocol probe

HTTP/TCP/ICMP/DNS/push probe with timeout + assertions. Writes a check_results row, updates monitors.status, and publishes check:result.

Evaluator runs the flap-detecting state machine

up → degraded → down with N-consecutive / failure-ratio hysteresis. A confirmed transition to down opens an incident + timeline entry.
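
A minimal sketch of the N-consecutive half of this hysteresis follows. The thresholds, the `evaluate` function, and the rule that any failure short of the threshold marks the monitor degraded are assumptions made for illustration; the failure-ratio variant is not shown.

```typescript
type Status = "up" | "degraded" | "down";

interface MonitorState {
  status: Status;
  failures: number;  // consecutive failed checks
  successes: number; // consecutive passed checks
}

// Hypothetical thresholds: failures needed to go down, passes needed to recover.
interface Thresholds {
  downAfter: number;
  upAfter: number;
}

// Folds one check result into the state. A single pass or fail never flips
// the monitor between up and down; only a confirmed streak does.
function evaluate(
  prev: MonitorState,
  ok: boolean,
  t: Thresholds,
): { next: MonitorState; openIncident: boolean } {
  const failures = ok ? 0 : prev.failures + 1;
  const successes = ok ? prev.successes + 1 : 0;
  let status: Status = prev.status;
  if (!ok) {
    if (failures >= t.downAfter) status = "down";
    else if (prev.status !== "down") status = "degraded";
  } else if (successes >= t.upAfter) {
    status = "up";
  }
  // An incident opens only on the transition into down, not on every failure.
  const openIncident = status === "down" && prev.status !== "down";
  return { next: { status, failures, successes }, openIncident };
}
```

With `downAfter: 3`, a target that alternates between passing and failing stays degraded and never pages anyone, which is the point of flap detection.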

Notifier fans out — once per rule × channel

Each delivery is a row keyed by a deterministic idempotency_key (UNIQUE) → enqueued with retries + exponential backoff; exhausted attempts dead-letter, never silently drop.
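
The two mechanisms in this step can be sketched briefly. The key inputs (incident, rule, channel, event), the SHA-256 derivation, and the backoff base and cap are assumptions chosen for illustration; the source only states that the key is deterministic and UNIQUE and that retries back off exponentially.

```typescript
import { createHash } from "node:crypto";

// Derives the same key for the same logical delivery, so a second insert
// with that key hits the UNIQUE constraint instead of sending twice.
function idempotencyKey(
  incidentId: string,
  ruleId: string,
  channelId: string,
  event: string,
): string {
  // A NUL separator keeps ("ab", "c") and ("a", "bc") from colliding.
  const material = [incidentId, ruleId, channelId, event].join("\u0000");
  return createHash("sha256").update(material).digest("hex");
}

// Delay before retry number `attempt` (1-based): base, 2x base, 4x base, ...
// capped so late retries do not wait unboundedly long.
function backoffMs(attempt: number, baseMs = 1000, capMs = 60_000): number {
  return Math.min(capMs, baseMs * 2 ** (attempt - 1));
}
```

Once the retry budget is spent, the row would move to the dead state rather than being deleted, which keeps the failure visible for audit and replay.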

Realtime reaches the browser

The worker publishes to Redis pulse:events; the API re-emits to the org:<id> Socket.IO room; the dashboard updates within ~2s — no reload.

Stack

Technology & rationale

Each subsystem chosen for a reason recorded in DECISIONS.md.

API

Express 5 + Socket.IO

REST with /v1 path versioning; realtime rooms-per-tenant with a Redis adapter for horizontal scale.

Data

PostgreSQL + Drizzle

SQL-transparent migrations (auditable for SOC2), no engine binary, clean tenant-scoping wrapper.

Jobs

BullMQ + Redis

Repeatable jobs power the scheduler; retries + dead-letter give guaranteed-delivery notifications.

Web

React + Vite + Tailwind

TanStack Query for server state, Recharts for latency, design tokens copied verbatim from the brand template.

Auth

Argon2id sessions

Server-side opaque sessions in an HttpOnly cookie — revocable, XSS-safe; lockout + anti-enumeration.

Validation

Zod (shared)

One schema set shared API↔web; RFC 9457 problem+json errors with field-level detail.
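
As an illustration of field-level detail in a problem+json body, the sketch below maps validation issues to an `errors` extension member with JSON Pointers, in the style of the RFC 9457 examples. The `Issue` shape, the `validationProblem` helper, and the choice of status 422 are assumptions, not the project's actual error mapper.

```typescript
interface FieldError {
  pointer: string; // JSON Pointer (as a URI fragment) to the offending field
  detail: string;
}

// Standard RFC 9457 members plus an "errors" extension.
interface ProblemDetails {
  type: string;
  title: string;
  status: number;
  detail?: string;
  instance?: string;
  errors?: FieldError[];
}

// Hypothetical validator output: a path into the body plus a message.
interface Issue {
  path: Array<string | number>;
  message: string;
}

// Escapes one JSON Pointer segment per RFC 6901 ("~" first, then "/").
function escapeSegment(segment: string | number): string {
  return String(segment).replace(/~/g, "~0").replace(/\//g, "~1");
}

function validationProblem(issues: Issue[], instance?: string): ProblemDetails {
  return {
    type: "about:blank",
    title: "Unprocessable Content",
    status: 422,
    detail: `${issues.length} field(s) failed validation`,
    ...(instance ? { instance } : {}),
    errors: issues.map((i) => ({
      pointer: "#/" + i.path.map(escapeSegment).join("/"),
      detail: i.message,
    })),
  };
}
```

The response would be served with the `application/problem+json` content type, and the web app can use each pointer to attach the message to the matching form field.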

Crypto

AES-256-GCM · HMAC

Channel secrets encrypted at rest; outbound webhooks HMAC-signed; inbound signatures verified.
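
HMAC signing and verification can be sketched with Node's built-in crypto module. The signed string format (`timestamp.body`), the hex encoding, and the function names are assumptions; the source does not specify the wire format of its signatures.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Signs the timestamp together with the body so a captured payload cannot
// be replayed under a fresh timestamp.
function sign(secret: string, timestamp: string, body: string): string {
  return createHmac("sha256", secret).update(`${timestamp}.${body}`).digest("hex");
}

// Recomputes the signature and compares in constant time.
function verify(
  secret: string,
  timestamp: string,
  body: string,
  signature: string,
): boolean {
  const expected = Buffer.from(sign(secret, timestamp, body), "hex");
  const given = Buffer.from(signature, "hex");
  // timingSafeEqual throws on length mismatch, so check lengths first.
  return given.length === expected.length && timingSafeEqual(given, expected);
}
```

A receiver would also reject timestamps outside a short tolerance window; that check is left out here to keep the sketch small.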

Ops

Docker · GitHub Actions

5-service compose stack; CI runs lint, typecheck, tests, build, audit, and the invariant lint.

Data model

Core tables

Every tenant table carries an indexed org_id and is reached only through scopedDb(orgId), with Row-Level Security underneath as defense-in-depth.

| Table | Purpose | Key constraints |
|---|---|---|
| organizations | Tenant root. | UNIQUE slug |
| users | Members + role (owner/admin/operator/viewer); lockout fields. | UNIQUE email |
| sessions · api_keys | Opaque hashed session tokens; org-scoped ingest keys. | UNIQUE token_hash |
| monitors | Health checks (type, target, interval, assertions, status). | idx org_id, status |
| check_results · metrics | Time-series probe results & pushed OS metrics. | partition-ready |
| incidents · incident_events | Incident lifecycle + timeline (open→ack→resolve). | idx org_id, status |
| notification_channels · alert_rules | Channels (AES-GCM secrets) + routing rules. | FK channel_id |
| notifications | Guaranteed-delivery outbox (pending/sent/failed/dead). | UNIQUE idempotency_key |
| audit_logs · webhook_events | SOC2 audit trail; idempotent inbound webhook receipts. | UNIQUE provider, external_id |
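
The idea behind a tenant-scoping wrapper can be shown with a small query builder that always binds `org_id` as the first parameter. This is a sketch of the concept only: the real `scopedDb` is built on Drizzle and its API is not shown in this document, so the `select` method, the `Query` shape, and the table allow-list here are hypothetical.

```typescript
interface Query {
  text: string;      // parameterized SQL
  values: unknown[]; // bound parameters, org id first
}

// Hypothetical allow-list; table names never come from user input.
const TENANT_TABLES = new Set(["monitors", "incidents", "notifications"]);

function scopedDb(orgId: string) {
  if (!orgId) throw new Error("orgId is required");
  return {
    // Builds a SELECT whose first predicate is always the tenant filter.
    select(table: string, where: Record<string, unknown> = {}): Query {
      if (!TENANT_TABLES.has(table)) throw new Error(`not a tenant table: ${table}`);
      const cols = Object.keys(where);
      for (const c of cols) {
        if (c === "org_id") throw new Error("org_id is set by the wrapper");
        if (!/^[a-z_]+$/.test(c)) throw new Error(`bad column: ${c}`);
      }
      const clauses = ["org_id = $1", ...cols.map((c, i) => `${c} = $${i + 2}`)];
      return {
        text: `SELECT * FROM ${table} WHERE ${clauses.join(" AND ")}`,
        values: [orgId, ...cols.map((c) => where[c])],
      };
    },
  };
}
```

Because callers cannot supply or override `org_id`, a handler that forgets the tenant filter is impossible to write against this interface; Row-Level Security then covers any path that bypasses it.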
Governance

Architectural invariants (§9)

Machine-checkable rules are enforced by scripts/invariant-lint.mjs in CI, the Verification Gate, and the Drift Detection Gate. Rules whose check type is manual are prose-audited.

| ID | Rule | Check |
|---|---|---|
| INV-1 | All tenant DB access goes through scopedDb(orgId); the raw client is confined to packages/db. | forbidden-pattern |
| INV-2 | notifications.idempotency_key is UNIQUE: guaranteed delivery, no double-send. | unique-constraint |
| INV-3 | users.email is UNIQUE: anti-duplicate, single-lookup login. | unique-constraint |
| INV-4 | Inbound webhooks verify the signature before any DB write. | boundary-order |
| INV-5 | Auth login rate-limits before the password compare. | boundary-order |
| INV-10 | Backend services use the structured logger, never console.log. | forbidden-pattern |
| INV-11 | Every UI screen is routable and every workflow has a UI e2e to its terminal step. | ui-coverage |
| INV-12 | Colors/type/spacing come from the design-token layer, not ad-hoc values. | manual |
| INV-13 | Production enforces HTTPS-only + HSTS with an HTTP→HTTPS redirect. | manual |
| INV-14 | Channel secrets are AES-256-GCM encrypted at rest and never logged. | manual |
| INV-15 | Outbound custom-webhook delivery passes an SSRF egress guard before fetch. | manual |
Security

Trust boundaries

STRIDE-modeled boundaries mapped to OWASP Top 10. Controls layer defense-in-depth around every crossing.

B1 · Anonymous → API

Argon2id + lockout + anti-enumeration, per-(org,ip) rate limits, Zod validation, body limits. A03 · A05 · A07

B2 · Tenant → other tenants

scopedDb + Row-Level Security; cross-tenant access returns 404 (no existence leak). A01

B3 · Inbound webhooks

Signature verified before write; idempotent on (provider, external_id). A08

B4 · Outbound targets

SSRF egress guard (https-only, blocks private/loopback/link-local); HMAC-signed payloads; TLS. A10
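
A minimal sketch of such an egress check follows, assuming the caller has already resolved the hostname and passes in the addresses it intends to connect to. The function names and the exact blocked ranges are illustrative; anything that is not a plain IPv4 address (including IPv6) is rejected here, which a production guard would handle explicitly.

```typescript
// Parses dotted-quad IPv4 into a number, or null if it is not valid IPv4.
function ipv4ToInt(ip: string): number | null {
  const parts = ip.split(".");
  if (parts.length !== 4) return null;
  let n = 0;
  for (const p of parts) {
    if (!/^\d{1,3}$/.test(p)) return null;
    const octet = Number(p);
    if (octet > 255) return null;
    n = n * 256 + octet;
  }
  return n;
}

// "This network", private, loopback, and link-local ranges as [base, prefix].
const BLOCKED: Array<[string, number]> = [
  ["0.0.0.0", 8],
  ["10.0.0.0", 8],
  ["127.0.0.0", 8],
  ["169.254.0.0", 16],
  ["172.16.0.0", 12],
  ["192.168.0.0", 16],
];

function isBlockedIpv4(ip: string): boolean {
  const n = ipv4ToInt(ip);
  if (n === null) return true; // fail closed on anything unparseable
  return BLOCKED.some(([base, bits]) => {
    const start = ipv4ToInt(base)!;
    return n >= start && n < start + 2 ** (32 - bits);
  });
}

// Allows only https URLs whose every resolved address is public.
function egressAllowed(rawUrl: string, resolvedIps: string[]): boolean {
  let url: URL;
  try {
    url = new URL(rawUrl);
  } catch {
    return false;
  }
  if (url.protocol !== "https:") return false;
  return resolvedIps.length > 0 && resolvedIps.every((ip) => !isBlockedIpv4(ip));
}
```

Checking the resolved addresses rather than the hostname matters because a public-looking name can resolve to an internal address; the connection should then be made to the address that was checked.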

B5 · Worker → DB

Least-privilege creds, parameterized writes, validated external responses, Pino redaction. A03 · A05

Cross-cutting

Env-var secrets + AES-GCM at rest, HTTPS+HSTS, audit logging, CI dependency scanning. A02 · A06

Deployment

Local-stack topology

One docker compose up -d --build brings up five services. HTTPS-only at the edge; the API and worker stay on the internal network.

edge · :8443/:8080

web

nginx — TLS, HTTP→HTTPS redirect, HSTS, header buffers, serves the SPA + proxies the API.

app · internal

api

Express + Socket.IO; runs migrations on boot; /api/health checks db + redis.

app · internal

worker

BullMQ scheduler / checker / evaluator / notifier; stateless & horizontally scalable.

data

db — postgres:16

Healthchecked; named volume; the single source of truth.

data

redis

Queues, rate-limit buckets, and the realtime pub/sub bus.

verified

Smoke + e2e

21/21 functional smoke + 6/6 Playwright workflows green against the running HTTPS deploy.
