How training works

The four source types, how status flows, the character budget meter, and the action bar that tells you what to do next.

"Training" in ChatbotGen means feeding your chatbot content so it can answer accurately. The Training tab is where that work happens. This page is the lay of the land — each source type has its own dedicated guide.

Four source types

Every chatbot can pull from four kinds of knowledge. The Training page has one tab per type, each with a live count:

┌────────────────────────────────────────────────┐
│  Files (3)   URLs (12)   Q&A (8)   Text (2)    │
└────────────────────────────────────────────────┘
  • Files — PDFs, DOCX, TXT uploads. See Upload files.
  • URLs — single pages or whole-site crawls. See Website URLs.
  • Q&A — handwritten question/answer pairs with alternate phrasings. See Questions & answers.
  • Text — raw text snippets you paste in. (Dedicated guide coming soon — for now, see the Text tab on your chatbot.)

All four feed the same retriever. The chatbot doesn't care where a fact came from — it searches across everything.

Status flow

Each individual source carries its own status:

  • Files: pending → processing → completed (or failed)
  • URLs: pending → crawling → completed (or failed)
  • Text: pending → processing → completed (or failed)
  • Q&A: no visible status — pairs become retrievable as soon as they're saved
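
The status flow above can be modeled as a small lookup table. This is only an illustrative sketch: the status names come from this page, but the function name `next_statuses`, the source-type keys, and the assumption that `completed` and `failed` are terminal are hypothetical, not ChatbotGen's actual implementation.

```python
# Hypothetical model of the per-source status flow described above.
# Status names are taken from this page; everything else is an assumption.
FLOWS = {
    "file": ["pending", "processing", "completed"],
    "url": ["pending", "crawling", "completed"],
    "text": ["pending", "processing", "completed"],
}


def next_statuses(source_type, status):
    """Return the statuses a source could move to next in this sketch."""
    if source_type == "qa":
        return []  # Q&A pairs carry no visible status
    if status in ("completed", "failed"):
        return []  # assumption: both are treated as terminal here
    flow = FLOWS[source_type]
    i = flow.index(status)
    # Assumption: a source can fail from any non-terminal step.
    return [flow[i + 1], "failed"]


print(next_statuses("url", "pending"))
```

Under these assumptions, a URL source in `pending` can only move to `crawling` or `failed`, which matches the order listed above.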

Status updates stream in live via WebSocket — no need to refresh.

The action bar

Right under the sub-nav, a state-aware banner tells you exactly what's happening. Its message changes based on overall chatbot state:

Situation            | Message
A crawl is running   | "Discovering pages — URLs will appear below as we find them…"
Training in progress | "Training in progress…"
Over plan limit      | "Over plan limit — reduce training content to retrain."
Waiting on sources   | "Preparing N source(s) — retrain unlocks once they're ready."
Unsaved changes      | "You have unsaved changes — click Retrain agent to apply."
Nothing to do        | "Agent is up to date"
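
One way to think about the banner is as a function from overall chatbot state to a single situation. The sketch below assumes the situations are checked in the order the table lists them. That priority order, the `ChatbotState` fields, and the function name are all hypothetical; this page does not say how ChatbotGen resolves overlapping states.

```python
from dataclasses import dataclass


@dataclass
class ChatbotState:
    # Hypothetical fields, named for this example only.
    crawl_running: bool = False
    training: bool = False
    over_limit: bool = False
    pending_sources: int = 0
    unsaved_changes: bool = False


def banner_situation(state):
    """Pick the situation label from the table, first match wins."""
    # Assumption: checks run in the same order as the table rows.
    if state.crawl_running:
        return "A crawl is running"
    if state.training:
        return "Training in progress"
    if state.over_limit:
        return "Over plan limit"
    if state.pending_sources > 0:
        return "Waiting on sources"
    if state.unsaved_changes:
        return "Unsaved changes"
    return "Nothing to do"


print(banner_situation(ChatbotState(pending_sources=2)))
```

With this ordering, a chatbot that is both over its plan limit and has unsaved changes would show the over-limit message, since retraining is blocked in that case anyway.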

The same banner shows aggregate stats (source count + characters used / max) and, when it makes sense, a Retrain agent button on the right.

Character budget

Right below the action bar, a Training content usage bar shows where you stand against your plan's total-characters-trained cap:

┌──────────────────────────────────────────────┐
│  Training content                            │
│  ████████████████░░░░░     380K / 400K       │
└──────────────────────────────────────────────┘

When your total would exceed the cap, the banner turns red and retrains are blocked. Either remove sources or upgrade your plan — see Plans & pricing.

What counts toward the character budget

Only the character count displayed on each source counts, not the size of the original file or page:

  • Files — extracted text (not raw file bytes)
  • URLs — extracted page content (excluded URLs are ignored in the live total)
  • Q&A — question + answer + variations combined
  • Text — the raw snippet length

So a 3 MB image-heavy PDF might extract very little, while a 300 KB text-only PDF might add hundreds of thousands of characters. Check the character count on each source after it finishes processing.
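
The counting rules above can be sketched as a simple sum. This is an approximation under stated assumptions: the function names and input shapes are hypothetical, and Python's `len()` counts Unicode code points, which may differ from how ChatbotGen counts characters.

```python
def qa_chars(question, answer, variations):
    """Q&A counts question + answer + all variations combined."""
    return len(question) + len(answer) + sum(len(v) for v in variations)


def total_chars(file_texts, url_pages, qa_pairs, text_snippets):
    """Estimate total training characters.

    file_texts:    extracted text per file (not raw bytes)
    url_pages:     (extracted_content, excluded) tuples
    qa_pairs:      (question, answer, variations) tuples
    text_snippets: raw pasted snippets
    """
    total = sum(len(t) for t in file_texts)
    # Excluded URLs are ignored in the live total.
    total += sum(len(content) for content, excluded in url_pages if not excluded)
    total += sum(qa_chars(q, a, vs) for q, a, vs in qa_pairs)
    total += sum(len(s) for s in text_snippets)
    return total


def over_cap(total, cap):
    """Retrains are blocked once the total exceeds the plan cap."""
    return total > cap


used = total_chars(
    ["abc"],
    [("hello", False), ("skipme", True)],
    [("q?", "a.", ["q2"])],
    ["xy"],
)
print(used, over_cap(used, 400_000))
```

Here the excluded URL contributes nothing, so the total is 3 + 5 + 6 + 2 characters.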

Next steps