How it would scale
Suggested production architecture
The same layered design, hardened for a real customer. Crucially, everything runs inside the plant network, so operational data never leaves it.
What changes for production
- On-prem LLM (data privacy). Replace the cloud Claude API with a self-hosted open-weight model (Llama / Mistral via vLLM) running inside the plant network, so operational data never leaves it. The assistant layer is intentionally thin, so a smaller local model suffices, and the model is swappable behind one env var.
- Multi-model routing (faster, cheaper, more accurate). Split the two LLM jobs across models: a stronger model for tool routing and SQL reasoning (small output, so the premium costs little), and a leaner, faster model for narrating rows (the large-output step). Combined with prompt caching, this is expected to give more accurate SQL on hard questions, faster answers, and lower cost, with each model still swappable behind one env var.
- Real data ingestion. Stream live sensor data from the historian / PLCs into Exasol via CDC, replacing the mock data. The SQL views keep the same shape.
- Pooled connections. A long-running backend holds a pool of Exasol connections instead of connecting on every request, as the serverless prototype does.
- Security & governance. SSO and role-based access, per-plant data scoping, secrets in a vault, and network controls (VPC / security groups) instead of an IP allowlist.
- Operationalize. Configurable, plant-specific thresholds; alerting and escalation; scheduled monitoring; and auto-scaling or multiple clusters for workload isolation.
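
The per-task model selection described in the first two bullets could look roughly like the sketch below. It is an illustration only, not the project's code: the env var names (ROUTER_MODEL, NARRATOR_MODEL, LLM_BASE_URL), the default model names, and the model_for helper are all hypothetical. It also assumes the local models sit behind an OpenAI-compatible endpoint, which vLLM can expose.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ModelConfig:
    name: str      # model identifier sent to the serving endpoint
    base_url: str  # OpenAI-compatible endpoint inside the plant network


# Hypothetical env var names and placeholder defaults, one per LLM job.
TASK_ENV = {
    "routing": ("ROUTER_MODEL", "strong-local-model"),    # tool routing + SQL reasoning
    "narration": ("NARRATOR_MODEL", "lean-local-model"),  # narrating result rows
}


def model_for(task: str, env: Optional[Mapping[str, str]] = None) -> ModelConfig:
    """Resolve the model for a task; each model is swappable via one env var."""
    env = os.environ if env is None else env
    if task not in TASK_ENV:
        raise ValueError(f"unknown task: {task!r}")
    var, default = TASK_ENV[task]
    return ModelConfig(
        name=env.get(var, default),
        # Assumed address of a local model server; adjust to the deployment.
        base_url=env.get("LLM_BASE_URL", "http://localhost:8000/v1"),
    )


if __name__ == "__main__":
    print(model_for("routing"))
    print(model_for("narration"))
```

Keeping the lookup in one function means the rest of the assistant layer never names a specific model, so swapping Llama for Mistral (or merging both jobs onto one model) is a configuration change rather than a code change.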
Assumptions for the production design
- On-prem GPU. The customer can host a GPU on-premises (or in their private cloud) to run the local LLM, sized for the chosen model.
- A small model is enough. A self-hosted open-weight model (Llama or Mistral) is good enough for this thin routing and narration task, validated by evaluation. Smaller models may need a stricter router and some prompt tuning.
- Exasol inside the network. Exasol runs on-premises or in the customer’s private VPC, with network access to the plant data sources.
- A data pipeline exists. A real-time or near-real-time ingestion pipeline (historian, PLCs, CDC) exists or can be built to feed Exasol, and the SQL views keep the same shape.
- Identity provider. An SSO provider and role definitions exist, enabling authentication and per-plant data scoping.
- Ops capacity. There is capacity to run, patch, and monitor the on-prem stack: database, model server, and app.
- Data must stay local. Security and data-residency policy requires data to stay inside the plant network, which is the reason for an on-prem LLM rather than a cloud API.
- Calibrated thresholds. Risk thresholds and weights will be calibrated with the customer’s engineers, replacing the prototype’s illustrative defaults.
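
One way to make the thresholds and weights calibratable per plant is to keep the illustrative defaults in a single config object and layer plant-specific overrides on top. The sketch below assumes this approach; the signal names (temp_c, vibration_mm_s), the numeric defaults, the plant ID, and the scoring formula are all hypothetical placeholders, not the prototype's actual values.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class RiskConfig:
    # Illustrative defaults only; real values come from calibration
    # with the customer's engineers.
    temp_warn_c: float = 80.0
    vibration_warn_mm_s: float = 7.0
    weight_temp: float = 0.5
    weight_vibration: float = 0.5


# Hypothetical per-plant overrides, e.g. loaded from a config file or table.
PLANT_OVERRIDES = {
    "plant_a": {"temp_warn_c": 95.0},
}


def config_for(plant: str) -> RiskConfig:
    """Start from the defaults and apply any overrides for this plant."""
    return replace(RiskConfig(), **PLANT_OVERRIDES.get(plant, {}))


def risk_score(reading: dict, cfg: RiskConfig) -> float:
    """Weighted sum of each signal's ratio to its threshold, capped at 1.0."""
    temp = min(reading["temp_c"] / cfg.temp_warn_c, 1.0)
    vib = min(reading["vibration_mm_s"] / cfg.vibration_warn_mm_s, 1.0)
    return cfg.weight_temp * temp + cfg.weight_vibration * vib


if __name__ == "__main__":
    reading = {"temp_c": 40.0, "vibration_mm_s": 7.0}
    print(risk_score(reading, config_for("plant_a")))
    print(risk_score(reading, config_for("some_other_plant")))
```

Because overrides are data rather than code, recalibrating a plant does not require redeploying the application.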