Retraining
When changes auto-apply, when you need to retrain, and what the flow actually does.
Retraining re-embeds knowledge sources so they show up in retrieval. Some changes apply automatically; others need a manual push.
Automatic processing
When you add a new source — a file, URL, Q&A pair, or snippet — ChatbotGen processes it automatically in the background. You don't need to click anything for first-time indexing. The status moves from pending → processing/crawling → ready.
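The status flow above can be pictured as a small state machine. This is only an illustrative sketch: the names `SourceStatus`, `ALLOWED`, and `advance` are hypothetical, and it assumes that "crawling" applies to URLs while "processing" applies to other source types. Failure states are not described in this page, so none are modeled.

```python
from enum import Enum


class SourceStatus(Enum):
    PENDING = "pending"
    PROCESSING = "processing"  # assumption: files, Q&A pairs, snippets
    CRAWLING = "crawling"      # assumption: URLs
    READY = "ready"


# Forward transitions for first-time indexing, as described in the text.
ALLOWED = {
    SourceStatus.PENDING: {SourceStatus.PROCESSING, SourceStatus.CRAWLING},
    SourceStatus.PROCESSING: {SourceStatus.READY},
    SourceStatus.CRAWLING: {SourceStatus.READY},
    SourceStatus.READY: set(),
}


def advance(current: SourceStatus, new: SourceStatus) -> SourceStatus:
    """Return the new status, or raise if the transition is not allowed."""
    if new not in ALLOWED[current]:
        raise ValueError(f"cannot move from {current.value} to {new.value}")
    return new
```

For example, `advance(SourceStatus.PENDING, SourceStatus.CRAWLING)` succeeds, while jumping straight from pending to ready raises a `ValueError`.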
When you need to retrain manually
Certain edits aren't re-embedded automatically. Instead, ChatbotGen shows an unsaved-changes banner:
┌────────────────────────────────────────────────────────────┐
│ You have unsaved changes — click Retrain agent to apply. │
│ [ Retrain agent ] │
└────────────────────────────────────────────────────────────┘
Actions that raise this banner:
- Editing a Q&A pair
- Editing a text snippet
- Deleting any source
- Toggling a URL's "Exclude from search" flag
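One way to think about the banner is as a "dirty flag" that is set by the actions listed above and cleared by a retrain. The sketch below is a mental model only: the action names and the `KnowledgeBase` class are hypothetical and are not ChatbotGen identifiers.

```python
# Hypothetical action names standing in for the four actions listed above.
ACTIONS_REQUIRING_RETRAIN = {
    "edit_qa_pair",
    "edit_text_snippet",
    "delete_source",
    "toggle_url_exclude_from_search",
}


class KnowledgeBase:
    def __init__(self) -> None:
        self.pending_changes: list[str] = []

    def record(self, action: str) -> None:
        # Only the listed actions leave changes waiting for a retrain;
        # anything else (such as adding a new source) is ignored here.
        if action in ACTIONS_REQUIRING_RETRAIN:
            self.pending_changes.append(action)

    @property
    def show_banner(self) -> bool:
        return bool(self.pending_changes)

    def retrain(self) -> None:
        # Retraining applies the pending changes and clears the banner.
        self.pending_changes.clear()
```

In this model, recording `"add_source"` leaves `show_banner` false, while recording `"edit_qa_pair"` turns it on until `retrain()` is called.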
Clicking Retrain
Click Retrain agent. A green confirmation message appears:
Training started. Your chatbot will be ready shortly.
Your chatbot keeps serving the previous embeddings during retraining — no downtime. You can navigate away, open conversations, or work on another chatbot; the background worker continues.
Over-limit errors
If the total number of characters to be trained would exceed your plan's limit, the retrain is blocked with this error:
Training content (N chars) exceeds your plan limit (M chars). Remove some sources or upgrade your plan.
Two ways to unblock:
- Delete or exclude sources until you're back under the limit
- Upgrade your plan — see Plans & pricing
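The check behind this error can be sketched as a simple sum compared against the cap. This is a minimal illustration, not the product's code: `Source`, `training_chars`, and `check_plan_limit` are hypothetical names, the numbers in the usage note are made up, and the sketch assumes excluded sources do not count toward the total (which is what makes excluding a way to unblock).

```python
from dataclasses import dataclass


@dataclass
class Source:
    name: str
    chars: int
    excluded: bool = False


def training_chars(sources: list[Source]) -> int:
    """Total characters that would be trained (excluded sources skipped)."""
    return sum(s.chars for s in sources if not s.excluded)


def check_plan_limit(sources: list[Source], plan_limit: int) -> None:
    """Raise with the documented message if the total exceeds the cap."""
    total = training_chars(sources)
    if total > plan_limit:
        raise ValueError(
            f"Training content ({total} chars) exceeds your plan limit "
            f"({plan_limit} chars). Remove some sources or upgrade your plan."
        )
```

With two sources of 400,000 and 150,000 characters and an illustrative cap of 500,000, the check raises because 550,000 is over the limit. Marking the second source as excluded brings the total to 400,000 and the check passes.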
What retraining does under the hood
For each changed source, the pipeline:
- Re-extracts the text (for URLs, this means re-scraping; for files, re-parsing)
- Re-chunks into overlapping passages
- Re-embeds each passage and re-indexes it (pgvector HNSW for vector search, PostgreSQL full-text for keyword search)
- Atomically swaps the new embeddings in
Unchanged sources aren't touched. It's a targeted refresh, not a full rebuild.
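The steps above can be sketched in a few lines. This is a simplified, in-memory illustration under stated assumptions: the chunk size and overlap are arbitrary, `fake_embed` is a placeholder rather than a real embedding model, and `EmbeddingStore` is a hypothetical stand-in for the database. A real system built on PostgreSQL would more likely do the swap inside a transaction.

```python
import threading


def chunk_text(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into passages of `size` characters that overlap by `overlap`."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the last passage already reaches the end of the text
    return chunks


def fake_embed(passage: str) -> list[float]:
    """Placeholder embedding (assumption: stands in for a real model)."""
    return [float(len(passage)), float(sum(map(ord, passage)) % 997)]


class EmbeddingStore:
    """Holds live embeddings per source; readers always see a complete set."""

    def __init__(self) -> None:
        self._live: dict[str, list[list[float]]] = {}
        self._lock = threading.Lock()

    def get(self, source_id: str) -> list[list[float]]:
        with self._lock:
            return self._live.get(source_id, [])

    def retrain(self, changed: dict[str, str]) -> None:
        # Build new embeddings off to the side; old ones keep serving meanwhile.
        staged = {
            source_id: [fake_embed(p) for p in chunk_text(text)]
            for source_id, text in changed.items()
        }
        # Swap in one step; sources not in `changed` are left untouched.
        with self._lock:
            self._live = {**self._live, **staged}
```

Only the sources passed to `retrain` are rebuilt, which mirrors the targeted-refresh behavior described above. Because the new embeddings are staged before the swap, a reader calling `get` never sees a half-updated source.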
Retraining vs re-crawling
Two different actions:
- Re-crawl a URL — re-fetches the page, re-extracts, re-embeds. Use when a specific page's content changed.
- Retrain agent — re-embeds anything flagged as changed across all sources. Use after batch edits or imports.
If a page's content changed on the source site, re-crawl that URL. If you only edited Q&A pairs or text snippets locally, click Retrain agent.
Scheduling
There's no built-in scheduled retrain. If your content changes weekly, set a weekly reminder to visit the Knowledge page, re-crawl any URLs whose pages changed, and click Retrain agent to apply local edits. Scheduled re-indexing is on the roadmap.