Training

Training allows organizations to fine-tune AI models using their accumulated context (embeddings).

Concepts

| Concept | Description |
|---|---|
| Training Job | A model fine-tuning task |
| LoRA/DoRA | Parameter-efficient fine-tuning methods |
| Adapter | The trained model weights |
| Rollback | Revert to previous adapter |

Training Lifecycle

Pending → Running → Completed/Failed

| Status | Description |
|---|---|
| pending | Job queued |
| running | Training in progress |
| completed | Successfully finished |
| failed | Training failed |
| cancelled | User cancelled |
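
The lifecycle above can be sketched as a small transition table. This is an illustration only: `NEXT_STATES`, `isTerminal`, and `canTransition` are hypothetical names, and the assumption that a job can be cancelled from both `pending` and `running` is not confirmed by the source.

```typescript
type TrainingStatus = "pending" | "running" | "completed" | "failed" | "cancelled";

// Assumed transition map (hypothetical). The source only shows
// Pending → Running → Completed/Failed plus a user-driven "cancelled" status.
const NEXT_STATES: Record<TrainingStatus, readonly TrainingStatus[]> = {
  pending: ["running", "cancelled"],
  running: ["completed", "failed", "cancelled"],
  completed: [],
  failed: [],
  cancelled: [],
};

// A status is terminal when no further transition is possible.
function isTerminal(status: TrainingStatus): boolean {
  return NEXT_STATES[status].length === 0;
}

// True when moving from `from` to `to` is allowed by the assumed map.
function canTransition(from: TrainingStatus, to: TrainingStatus): boolean {
  return NEXT_STATES[from].some((next) => next === to);
}
```

A UI could use `isTerminal` to decide when to stop refreshing a job.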

Types

From types/trainings.ts:

interface StartTrainingRequest {
  name: string;
  description?: string;
  epochs?: number;
  learningRate?: number;
  batchSize?: number;
  // Additional config options
}

interface StartTrainingResponse {
  jobId: string;
  status: TrainingStatus;
}

interface ListJobsResponse {
  jobs: TrainingJob[];
}

interface TrainingJob {
  id: string;
  name: string;
  description?: string;
  status: TrainingStatus;
  progress?: number;
  metrics?: TrainingMetrics;
  createdAt: string;
  startedAt?: string;
  completedAt?: string;
}

interface TrainingMetrics {
  loss?: number;
  accuracy?: number;
  epoch?: number;
}

type TrainingStatus = 
  | "pending"
  | "running" 
  | "completed"
  | "failed"
  | "cancelled";

Hooks

useTrainings

hooks/trainings/use-trainings.ts

Lists training jobs and starts new ones.

const {
  jobs,            // TrainingJob[]
  isLoading,
  createJob        // (data: StartTrainingRequest) => Promise<StartTrainingResponse>
} = useTrainings(organizationSlug);

API Calls:

- GET /train/jobs - List all jobs
- POST /train - Start new training job

useTraining

hooks/trainings/use-training.ts

Manage a single training job.

const {
  job,             // TrainingJob
  status,          // TrainingStatus (polled)
  isLoading,
  isPolling,       // Still running
  cancelJob,       // () => Promise<void>
  rollbackJob      // () => Promise<void>
} = useTraining(organizationSlug, jobId);

API Calls:

- GET /train/jobs/{id} - Job details
- GET /train/jobs/{id}/status - Job status (polled)
- DELETE /train/jobs/{id} - Cancel/delete job
- POST /train/rollback/{id} - Rollback to previous
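
To make the endpoint list for both hooks easier to scan, here is a minimal sketch that maps each call to a method and path. `RequestSpec` and `trainingRoutes` are hypothetical names introduced for illustration; how the hooks actually issue requests (base URL, auth headers, HTTP client) is not shown in the source.

```typescript
interface StartTrainingRequest {
  name: string;
  description?: string;
  epochs?: number;
  learningRate?: number;
  batchSize?: number;
}

// Hypothetical description of a request: method, path, optional JSON body.
interface RequestSpec {
  method: "GET" | "POST" | "DELETE";
  path: string;
  body?: unknown;
}

// Builds the path for a single job, escaping the id for safe use in a URL.
const jobPath = (id: string): string => `/train/jobs/${encodeURIComponent(id)}`;

const trainingRoutes = {
  listJobs: (): RequestSpec => ({ method: "GET", path: "/train/jobs" }),
  startJob: (body: StartTrainingRequest): RequestSpec => ({ method: "POST", path: "/train", body }),
  getJob: (id: string): RequestSpec => ({ method: "GET", path: jobPath(id) }),
  getStatus: (id: string): RequestSpec => ({ method: "GET", path: `${jobPath(id)}/status` }),
  cancelJob: (id: string): RequestSpec => ({ method: "DELETE", path: jobPath(id) }),
  rollback: (id: string): RequestSpec => ({
    method: "POST",
    path: `/train/rollback/${encodeURIComponent(id)}`,
  }),
};
```

Keeping the paths in one table like this means a renamed endpoint only has to change in one place.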

Polling:

// Polls status while job is running
refetchInterval: (job) => 
  ["pending", "running"].includes(job?.status) ? 5000 : false
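
The same polling rule can be written as a standalone function, which makes it testable outside a hook. This is a sketch under assumptions: `pollIntervalFor` and `pollUntilSettled` are hypothetical helpers, and `fetchStatus` stands in for whatever performs the GET /train/jobs/{id}/status call.

```typescript
type TrainingStatus = "pending" | "running" | "completed" | "failed" | "cancelled";

const POLL_INTERVAL_MS = 5000;

// Returns the delay before the next poll, or false once the job has settled
// (or when no status is known yet, matching the snippet above).
function pollIntervalFor(status: TrainingStatus | undefined): number | false {
  return status === "pending" || status === "running" ? POLL_INTERVAL_MS : false;
}

// Hypothetical loop: fetches the status repeatedly until polling should stop.
async function pollUntilSettled(
  fetchStatus: () => Promise<TrainingStatus>,
  onStatus: (status: TrainingStatus) => void,
): Promise<TrainingStatus> {
  for (;;) {
    const status = await fetchStatus();
    onStatus(status);
    const delay = pollIntervalFor(status);
    if (delay === false) {
      return status;
    }
    const waitMs: number = delay;
    await new Promise<void>((resolve) => {
      setTimeout(() => resolve(), waitMs);
    });
  }
}
```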

Pages

Training Overview

/portal/[slug]/[teamID]/training

Tutorial cards for training workflow:

1. Build context (embeddings)
2. Configure training job
3. Run training
4. Deploy adapter

New Training Job

/portal/[slug]/[teamID]/training/new

Configuration form:

- Job name
- Description
- Training parameters:
  - Epochs
  - Learning rate
  - Batch size

const handleSubmit = async (data: StartTrainingRequest) => {
  const response = await createJob(data);
  router.push(`/portal/${slug}/${teamID}/training/jobs/${response.jobId}`);
};
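
Before calling `createJob`, a form might check the request on the client. The sketch below is hypothetical: `validateStartTrainingRequest` and its rules (non-empty name, positive integer epochs and batch size, positive finite learning rate) are assumptions, since the source does not state which constraints the API enforces.

```typescript
interface StartTrainingRequest {
  name: string;
  description?: string;
  epochs?: number;
  learningRate?: number;
  batchSize?: number;
}

// Returns a list of human-readable problems; an empty list means the
// request passed these (assumed) checks.
function validateStartTrainingRequest(data: StartTrainingRequest): string[] {
  const errors: string[] = [];
  const isPositiveInteger = (value: number): boolean => Number.isInteger(value) && value >= 1;

  if (data.name.trim().length === 0) {
    errors.push("name is required");
  }
  if (data.epochs !== undefined && !isPositiveInteger(data.epochs)) {
    errors.push("epochs must be a positive integer");
  }
  if (data.learningRate !== undefined && !(Number.isFinite(data.learningRate) && data.learningRate > 0)) {
    errors.push("learningRate must be a positive number");
  }
  if (data.batchSize !== undefined && !isPositiveInteger(data.batchSize)) {
    errors.push("batchSize must be a positive integer");
  }
  return errors;
}
```

Omitted optional fields are left alone so the server-side defaults still apply.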

Jobs List

/portal/[slug]/[teamID]/training/jobs

Table of all training jobs:

- Name, status, progress
- Created/completed dates
- Actions (view, cancel)

Job Detail

/portal/[slug]/[teamID]/training/jobs/[jobID]

Job monitoring page:

- Status indicator
- Progress bar (while running)
- Metrics (loss, accuracy per epoch)
- Rollback action (if completed)
- Cancel action (if running)
- Delete action

export default function JobDetailPage({ params }) {
  const { job, status, isLoading, rollbackJob, cancelJob } = useTraining(
    params.slug,
    params.jobID
  );

  // job is undefined until the first fetch resolves, so guard before reading it
  if (isLoading || !job) {
    return null;
  }

  return (
    <div>
      <JobHeader job={job} status={status} />

      {status === "running" && (
        <ProgressBar value={job.progress} />
      )}

      {job.metrics && (
        <MetricsDisplay metrics={job.metrics} />
      )}

      <JobActions
        status={status}
        onCancel={cancelJob}
        onRollback={rollbackJob}
      />
    </div>
  );
}
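
The action rules listed for this page (rollback if completed, cancel if running, delete) can be captured in one function. This is a sketch, not the actual `JobActions` logic: `availableActions` is a hypothetical helper, and offering delete for every status is an assumption based on the list giving it no condition.

```typescript
type TrainingStatus = "pending" | "running" | "completed" | "failed" | "cancelled";
type JobAction = "cancel" | "rollback" | "delete";

// Maps a job status to the actions the detail page would offer.
function availableActions(status: TrainingStatus): JobAction[] {
  const actions: JobAction[] = [];
  if (status === "running") {
    actions.push("cancel");
  }
  if (status === "completed") {
    actions.push("rollback");
  }
  // Assumption: delete is always offered, since the page lists it unconditionally.
  actions.push("delete");
  return actions;
}
```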

Components

JobsTable

components/tables/trainings/jobs-table.tsx

Data table for training jobs:

- Sortable columns
- Status badges
- Progress indicators
- Action buttons

Access Control

Training is restricted to admin+ roles:

// In PortalSidebar
{isAdmin && (
  <SidebarMenuItem>
    <Link href={`/portal/${slug}/${teamID}/training`}>
      Training
    </Link>
  </SidebarMenuItem>
)}

// In training pages
<RoleProvider requireRole="admin">
  <TrainingPage />
</RoleProvider>
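
"admin+" implies an ordered set of roles where admin and anything above it qualifies. A minimal sketch of such a check follows. The role names `member`, `admin`, and `owner` and the helper `hasAtLeastRole` are assumptions for illustration; the source does not list the actual roles or show how `RoleProvider` compares them.

```typescript
// Hypothetical role ladder, lowest privilege first.
const ROLE_ORDER = ["member", "admin", "owner"] as const;
type Role = (typeof ROLE_ORDER)[number];

// True when `actual` ranks at or above `required` in the assumed ladder.
function hasAtLeastRole(actual: Role, required: Role): boolean {
  return ROLE_ORDER.indexOf(actual) >= ROLE_ORDER.indexOf(required);
}
```

Hiding the sidebar link only affects navigation, so the page-level check is the one that actually blocks access in the UI.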

Workflow Example

1. Prepare Context
   - Upload relevant files to embeddings
   - Embed session timelines
   - Ensure sufficient training data

2. Configure Job
   - Go to /training/new
   - Name the job (e.g., "Q1 Game Analysis Model")
   - Optionally adjust hyperparameters

3. Start Training
   - Submit configuration
   - Job enters pending state
   - Redirected to job detail page

4. Monitor Progress
   - Status updates via polling
   - View training metrics
   - Wait for completion

5. Handle Results
   - Success: Adapter deployed automatically
   - Failed: Review logs, adjust config, retry

6. Rollback (if needed)
   - If new adapter underperforms
   - Click rollback to restore previous

Training Parameters

| Parameter | Description | Default |
|---|---|---|
| epochs | Number of full passes over the training data | Varies |
| learningRate | Step size for optimization | Varies |
| batchSize | Samples per gradient update | Varies |
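
To see how `epochs` and `batchSize` interact, the number of gradient updates in a run can be estimated as epochs times the number of batches per epoch. The helper below is a hypothetical sketch; it assumes the last partial batch of each epoch is kept rather than dropped, which the source does not specify.

```typescript
// Estimates total gradient updates: ceil(sampleCount / batchSize) per epoch,
// multiplied by the number of epochs.
function totalGradientUpdates(sampleCount: number, batchSize: number, epochs: number): number {
  if (!Number.isInteger(sampleCount) || sampleCount < 0) {
    throw new RangeError("sampleCount must be a non-negative integer");
  }
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError("batchSize must be a positive integer");
  }
  if (!Number.isInteger(epochs) || epochs < 1) {
    throw new RangeError("epochs must be a positive integer");
  }
  const updatesPerEpoch = Math.ceil(sampleCount / batchSize);
  return updatesPerEpoch * epochs;
}

// 1000 samples at batch size 32 is 32 batches per epoch, so 3 epochs is 96 updates.
const exampleUpdates = totalGradientUpdates(1000, 32, 3);
```

Doubling the batch size roughly halves the number of updates for the same number of epochs.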

Metrics

During and after training:

| Metric | Description |
|---|---|
| loss | Training loss value |
| accuracy | Model accuracy |
| epoch | Current epoch number |
| progress | Percentage complete |
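
Because every metric is optional in the types, display code has to cope with missing values. The sketch below shows one way to build a one-line summary. `summarizeJob` is a hypothetical helper; it prints `loss` and `accuracy` as raw numbers and assumes `progress` is already on a 0 to 100 scale, as "Percentage complete" suggests.

```typescript
interface TrainingMetrics {
  loss?: number;
  accuracy?: number;
  epoch?: number;
}

// Builds a short summary from whichever values are present.
function summarizeJob(job: { progress?: number; metrics?: TrainingMetrics }): string {
  const metrics: TrainingMetrics = job.metrics ?? {};
  const parts: string[] = [];

  if (metrics.epoch !== undefined) {
    parts.push(`epoch ${metrics.epoch}`);
  }
  if (metrics.loss !== undefined) {
    parts.push(`loss ${metrics.loss.toFixed(4)}`);
  }
  if (metrics.accuracy !== undefined) {
    parts.push(`accuracy ${metrics.accuracy.toFixed(4)}`);
  }
  if (job.progress !== undefined) {
    parts.push(`progress ${Math.round(job.progress)}%`);
  }
  return parts.length > 0 ? parts.join(", ") : "no metrics yet";
}
```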