Training¶
Training allows organizations to fine-tune AI models using their accumulated context (embeddings).
Concepts¶
| Concept | Description |
|---|---|
| Training Job | A model fine-tuning task |
| LoRA/DoRA | Parameter-efficient fine-tuning methods |
| Adapter | The trained model weights |
| Rollback | Revert to previous adapter |
Training Lifecycle¶
| Status | Description |
|---|---|
pending |
Job queued |
running |
Training in progress |
completed |
Successfully finished |
failed |
Training failed |
cancelled |
User cancelled |
Types¶
From types/trainings.ts:
interface StartTrainingRequest {
name: string;
description?: string;
epochs?: number;
learningRate?: number;
batchSize?: number;
// Additional config options
}
interface StartTrainingResponse {
jobId: string;
status: TrainingStatus;
}
interface ListJobsResponse {
jobs: TrainingJob[];
}
interface TrainingJob {
id: string;
name: string;
description?: string;
status: TrainingStatus;
progress?: number;
metrics?: TrainingMetrics;
createdAt: string;
startedAt?: string;
completedAt?: string;
}
interface TrainingMetrics {
loss?: number;
accuracy?: number;
epoch?: number;
}
type TrainingStatus =
| "pending"
| "running"
| "completed"
| "failed"
| "cancelled";
Hooks¶
useTrainings¶
hooks/trainings/use-trainings.ts
List jobs and create new training.
const {
jobs, // TrainingJob[]
isLoading,
createJob // (data: StartTrainingRequest) => Promise<StartTrainingResponse>
} = useTrainings(organizationSlug);
API Calls:
- GET /train/jobs - List all jobs
- POST /train - Start new training job
useTraining¶
hooks/trainings/use-training.ts
Manage a single training job.
const {
job, // TrainingJob
status, // TrainingStatus (polled)
isLoading,
isPolling, // Still running
cancelJob, // () => Promise<void>
rollbackJob // () => Promise<void>
} = useTraining(organizationSlug, jobId);
API Calls:
- GET /train/jobs/{id} - Job details
- GET /train/jobs/{id}/status - Job status (polled)
- DELETE /train/jobs/{id} - Cancel/delete job
- POST /train/rollback/{id} - Rollback to previous
Polling:
// Polls status while job is running
refetchInterval: (job) =>
["pending", "running"].includes(job?.status) ? 5000 : false
Pages¶
Training Overview¶
/portal/[slug]/[teamID]/training
Tutorial cards for training workflow: 1. Build context (embeddings) 2. Configure training job 3. Run training 4. Deploy adapter
New Training Job¶
/portal/[slug]/[teamID]/training/new
Configuration form: - Job name - Description - Training parameters: - Epochs - Learning rate - Batch size
const handleSubmit = async (data: StartTrainingRequest) => {
const response = await createJob(data);
router.push(`/portal/${slug}/${teamID}/training/jobs/${response.jobId}`);
};
Jobs List¶
/portal/[slug]/[teamID]/training/jobs
Table of all training jobs: - Name, status, progress - Created/completed dates - Actions (view, cancel)
Job Detail¶
/portal/[slug]/[teamID]/training/jobs/[jobID]
Job monitoring page: - Status indicator - Progress bar (while running) - Metrics (loss, accuracy per epoch) - Rollback action (if completed) - Cancel action (if running) - Delete action
export default function JobDetailPage({ params }) {
const { job, status, rollbackJob, cancelJob } = useTraining(
params.slug,
params.jobID
);
return (
<div>
<JobHeader job={job} status={status} />
{status === "running" && (
<ProgressBar value={job.progress} />
)}
{job.metrics && (
<MetricsDisplay metrics={job.metrics} />
)}
<JobActions
status={status}
onCancel={cancelJob}
onRollback={rollbackJob}
/>
</div>
);
}
Components¶
JobsTable¶
components/tables/trainings/jobs-table.tsx
Data table for training jobs: - Sortable columns - Status badges - Progress indicators - Action buttons
Access Control¶
Training is restricted to admin+ roles:
// In PortalSidebar
{isAdmin && (
<SidebarMenuItem>
<Link href={`/portal/${slug}/${teamID}/training`}>
Training
</Link>
</SidebarMenuItem>
)}
Workflow Example¶
- Prepare Context
- Upload relevant files to embeddings
- Embed session timelines
-
Ensure sufficient training data
-
Configure Job
- Go to
/training/new - Name the job (e.g., "Q1 Game Analysis Model")
-
Optionally adjust hyperparameters
-
Start Training
- Submit configuration
- Job enters
pendingstate -
Redirected to job detail page
-
Monitor Progress
- Status updates via polling
- View training metrics
-
Wait for completion
-
Handle Results
- Success: Adapter deployed automatically
-
Failed: Review logs, adjust config, retry
-
Rollback (if needed)
- If new adapter underperforms
- Click rollback to restore previous
Training Parameters¶
| Parameter | Description | Default |
|---|---|---|
epochs |
Training iterations | Varies |
learningRate |
Step size for optimization | Varies |
batchSize |
Samples per gradient update | Varies |
Metrics¶
During and after training:
| Metric | Description |
|---|---|
loss |
Training loss value |
accuracy |
Model accuracy |
epoch |
Current epoch number |
progress |
Percentage complete |