Retraining

When changes auto-apply, when you need to retrain, and what the flow actually does.

Retraining re-embeds knowledge sources so they show up in retrieval. Some changes apply automatically; others need a manual push.

Automatic processing

When you add a new source (a file, URL, Q&A pair, or snippet), ChatbotGen processes it automatically in the background. You don't need to click anything for first-time indexing. The status moves from pending → processing/crawling → ready.
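If it helps to picture the status lifecycle (pending, then processing or crawling, then ready), here is a minimal sketch of it as a small state machine. The names SourceStatus, advance, and is_searchable are hypothetical, and it assumes that "crawling" applies to URLs while "processing" covers the other source types. It is not ChatbotGen's actual code.

```python
from enum import Enum


class SourceStatus(Enum):
    PENDING = "pending"
    PROCESSING = "processing"  # assumption: files, Q&A pairs, snippets
    CRAWLING = "crawling"      # assumption: URLs
    READY = "ready"


# Forward transitions only. Failure states are left out on purpose,
# because the text above does not describe them.
ALLOWED = {
    SourceStatus.PENDING: {SourceStatus.PROCESSING, SourceStatus.CRAWLING},
    SourceStatus.PROCESSING: {SourceStatus.READY},
    SourceStatus.CRAWLING: {SourceStatus.READY},
    SourceStatus.READY: set(),
}


def advance(current: SourceStatus, new: SourceStatus) -> SourceStatus:
    """Return the new status, or raise if the transition is not allowed."""
    if new not in ALLOWED[current]:
        raise ValueError(f"cannot move from {current.value} to {new.value}")
    return new


def is_searchable(status: SourceStatus) -> bool:
    """Assumption: only sources that reached 'ready' take part in retrieval."""
    return status is SourceStatus.READY
```

A new URL would go PENDING, CRAWLING, READY; anything that tries to skip backwards raises an error.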

When you need to retrain manually

Certain edits are not re-embedded automatically. Instead, the chatbot shows an unsaved-changes banner:

┌────────────────────────────────────────────────────────────┐
│  You have unsaved changes — click Retrain agent to apply.  │
│                                          [ Retrain agent ] │
└────────────────────────────────────────────────────────────┘

Actions that raise this banner:

  • Editing a Q&A pair
  • Editing a text snippet
  • Deleting any source
  • Toggling a URL's "Exclude from search" flag
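One way to think about the banner is as a "pending changes" list that only certain actions add to. The sketch below models that idea; the action names and the KnowledgeBase class are hypothetical labels for illustration, not identifiers from ChatbotGen.

```python
from dataclasses import dataclass, field

# Actions from the list above: they leave the index stale until you retrain.
NEEDS_RETRAIN = {"edit_qa", "edit_snippet", "delete_source", "toggle_exclude"}
# First-time indexing of a new source is applied automatically.
AUTO_APPLIED = {"add_file", "add_url", "add_qa", "add_snippet"}


@dataclass
class KnowledgeBase:
    pending_changes: list[str] = field(default_factory=list)

    def record(self, action: str) -> None:
        """Remember actions that need a manual retrain; ignore auto-applied ones."""
        if action in NEEDS_RETRAIN:
            self.pending_changes.append(action)
        elif action not in AUTO_APPLIED:
            raise ValueError(f"unknown action: {action}")

    @property
    def show_banner(self) -> bool:
        """The banner is visible while any change is waiting for a retrain."""
        return bool(self.pending_changes)

    def retrain(self) -> int:
        """Apply all pending changes and return how many were applied."""
        applied = len(self.pending_changes)
        self.pending_changes.clear()
        return applied
```

In this model, adding a URL never raises the banner, while editing a Q&A pair does until retrain() runs.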

Clicking Retrain

Hit Retrain agent. A green flash appears:

Training started. Your chatbot will be ready shortly.

Your chatbot keeps serving the previous embeddings during retraining — no downtime. You can navigate away, open conversations, or work on another chatbot; the background worker continues.

Over-limit errors

If the total number of trained characters would exceed your plan's limit, the retrain is blocked with:

Training content (N chars) exceeds your plan limit (M chars). Remove some sources or upgrade your plan.

Two ways to unblock:

  1. Delete or exclude sources until you're back under the limit
  2. Upgrade your plan — see Plans & pricing
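The check behind this error can be sketched as a simple sum over your sources. This is an illustration only: Source, trained_chars, and check_plan_limit are hypothetical names, and it assumes that excluded sources do not count toward the total, which is what the first unblock option suggests.

```python
from dataclasses import dataclass


@dataclass
class Source:
    name: str
    chars: int
    excluded: bool = False


class PlanLimitExceeded(Exception):
    """Raised when a retrain would go over the plan's character limit."""


def trained_chars(sources: list[Source]) -> int:
    # Assumption: excluded sources are not counted.
    return sum(s.chars for s in sources if not s.excluded)


def check_plan_limit(sources: list[Source], plan_limit: int) -> int:
    """Return the trained character count, or raise if it is over the limit."""
    total = trained_chars(sources)
    if total > plan_limit:
        raise PlanLimitExceeded(
            f"Training content ({total} chars) exceeds your plan limit "
            f"({plan_limit} chars). Remove some sources or upgrade your plan."
        )
    return total
```

With a 100-character limit and two sources of 80 and 40 characters, the check fails at 120; excluding the smaller source brings the total back to 80 and the retrain can proceed.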

What retraining does under the hood

For each changed source, the pipeline:

  1. Re-extracts the text (for URLs, this means re-scraping; for files, re-parsing)
  2. Re-chunks into overlapping passages
  3. Re-embeds and re-indexes the passages (pgvector HNSW vector index + PostgreSQL full-text search)
  4. Atomically swaps the new embeddings in

Unchanged sources aren't touched. It's a targeted refresh, not a full rebuild.
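To make the steps above concrete, here is a minimal in-memory sketch of a targeted refresh: overlapping chunking, re-embedding only the changed sources, and building a new index that replaces the old one in a single step. Everything here is an assumption for illustration: the chunk size and overlap, the fake_embed placeholder (a real system calls an embedding model), and the plain dict standing in for the pgvector and full-text indexes. Text extraction is skipped; sources is assumed to hold already-extracted text.

```python
import hashlib


def chunk_text(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into passages of up to `size` chars that share `overlap` chars."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the last passage already reaches the end of the text
    return chunks


def fake_embed(passage: str) -> list[float]:
    """Placeholder for a real embedding model: 4 deterministic floats."""
    digest = hashlib.sha256(passage.encode("utf-8")).digest()
    return [b / 255 for b in digest[:4]]


def retrain(index: dict, sources: dict[str, str], changed: set[str]) -> dict:
    """Return a new index in which only the changed sources are rebuilt."""
    new_index = dict(index)  # entries for unchanged sources are reused as-is
    for name in changed:
        if name not in sources:  # the source was deleted
            new_index.pop(name, None)
            continue
        passages = chunk_text(sources[name])
        new_index[name] = [(p, fake_embed(p)) for p in passages]
    return new_index  # the caller swaps this in with one assignment
```

Because retrain returns a fresh dict instead of mutating the old one, whatever is serving queries keeps reading the previous index until the caller reassigns it, which mirrors the no-downtime behavior described earlier.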

Retraining vs re-crawling

Two different actions:

  • Re-crawl a URL — re-fetches the page, re-extracts, re-embeds. Use when a specific page's content changed.
  • Retrain agent — re-embeds anything flagged as changed across all sources. Use after batch edits or imports.

If a URL's content changes on the source site, re-crawl that URL. If you only edited a local Q&A pair or text snippet, click Retrain agent.

Scheduling

There's no built-in scheduled retrain. If your content changes weekly, set a weekly reminder to visit the Knowledge page and hit Retrain. Scheduled re-indexing is on the roadmap.