
Model Fine-Tuning

The platform currently supports the SFT training mode. SFT (Supervised Fine-Tuning) lets model developers select an open-source model, or a model they have already uploaded, and fine-tune it on their private datasets to create custom vertical models for their business needs.
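
The idea behind SFT can be sketched in a few lines. This is an illustrative example only, not the platform's internal code: a typical SFT objective minimizes cross-entropy on the response tokens while masking out the prompt tokens, so the model learns to produce the answer rather than to repeat the instruction.

```python
import numpy as np

def sft_loss(logits, labels, loss_mask):
    """logits: (T, V) next-token logits; labels: (T,) target token ids;
    loss_mask: (T,) 1.0 for response tokens, 0.0 for prompt tokens."""
    # Numerically stable log-softmax over the vocabulary dimension.
    logits = logits - logits.max(axis=-1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
    # Per-token negative log-likelihood of the target ids.
    nll = -log_probs[np.arange(len(labels)), labels]
    # Average only over response tokens; prompt tokens carry no loss.
    return (nll * loss_mask).sum() / loss_mask.sum()

rng = np.random.default_rng(0)
logits = rng.normal(size=(6, 10))
labels = rng.integers(0, 10, size=6)
mask = np.array([0, 0, 0, 1, 1, 1], dtype=float)  # last 3 tokens are the response
loss = sft_loss(logits, labels, mask)
```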

Create Task

In the console, open the product “Model Service Platform UModelVerse”, then choose “Model Fine-Tuning” from the feature menu and click “Create Task”.

  1. Model Selection

    The platform provides popular open-source models as presets and also supports importing private models from US3 object storage.

  2. Data Configuration

    Select the dataset to use for this training run.

    Prerequisite: Upload training data to the US3 storage space and associate it with dataset management. For more details, see Dataset Management.
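
Before uploading, training data is usually prepared as one JSON record per line (JSONL). The field names below are purely illustrative, not the platform's required schema; see Dataset Management for the actual format it expects.

```python
import json

# Hypothetical instruction-tuning samples; the keys "instruction", "input",
# and "output" are common conventions, not a documented platform requirement.
samples = [
    {"instruction": "Summarize the following support ticket.",
     "input": "Customer reports login failures after a password reset.",
     "output": "User cannot log in after resetting their password; account needs review."},
]

# Write one JSON object per line so the file can be streamed during training.
with open("train.jsonl", "w", encoding="utf-8") as f:
    for s in samples:
        f.write(json.dumps(s, ensure_ascii=False) + "\n")
```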

  3. Parameter Configuration

    The platform currently supports the LoRA fine-tuning method, which updates only the low-rank portion of the parameters during training, requiring fewer computing resources and speeding up the training process.

    Supported parameter configurations are as follows:

    Initial Learning Rate: The hyperparameter that controls the step size when updating weights during gradient descent.
    Training Rounds: The number of epochs (full passes over the training data); adjust the epoch count to the data scale.
    Batch Size: The number of samples the model processes before each parameter update.
    Maximum Sample Size: The maximum number of samples taken from each dataset.
    Computation Type: The numeric precision used for training, such as FP16, BF16, or FP32 (i.e., whether to use mixed-precision training).
    CheckPoint Save Interval: The number of steps between checkpoint saves during training.
    Maximum CheckPoint Saves: The maximum number of checkpoints to keep. Saving checkpoints adds to the training time.
    Log Save Interval: The number of steps between log saves.
    Truncation Length: The maximum length of a single training sample; anything beyond this length is automatically truncated.
    Learning Rate Scheduler: The policy that adjusts how the learning rate changes during training.
    Validation Set Ratio: The fraction of the total samples held out as the validation set.
    Use Flash Attention: Whether to enable Flash Attention to accelerate attention computation.
    LoRA Rank Value: The rank used during LoRA training, which controls how strongly the LoRA adapter influences the model. A larger rank means greater influence; choose a rank appropriate to the data volume.
    LoRA Scaling Factor: The scaling factor in LoRA training, used to keep the initial effective weights close to, or identical with, the pre-trained weights.
    LoRA Random Dropout: The rate at which neurons are randomly dropped during training, to prevent overfitting and improve the model's generalization.
    LoRA+ Learning Rate Ratio: The multiplier applied to the learning rate of matrix B in LoRA+.
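
The LoRA parameters above can be made concrete with a minimal sketch. This shows the assumed math (frozen weight W plus a low-rank update scaled by alpha / r), not the platform's actual implementation; here r corresponds to the "LoRA Rank Value" and alpha to the "LoRA Scaling Factor".

```python
import numpy as np

rng = np.random.default_rng(42)
d_out, d_in, r, alpha = 8, 8, 2, 16

W = rng.normal(size=(d_out, d_in))     # frozen pre-trained weight
A = rng.normal(size=(r, d_in)) * 0.01  # trainable low-rank factor, small init
B = np.zeros((d_out, r))               # trainable factor, zero init

def lora_forward(x):
    # Effective weight is W + (alpha / r) * B @ A; only A and B are trained.
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=d_in)
y = lora_forward(x)
```

Because B starts at zero, the low-rank branch contributes nothing at step 0, so the fine-tuned model initially matches the pre-trained weights exactly, which is what the scaling-factor description means by keeping the initial weights consistent with the pre-trained ones.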
  4. Output Configuration

    Configure the storage path for the trained model. Currently, only US3 storage spaces in the North China 2 region are supported.

  5. Confirm Configuration

    Reconfirm the configured information and billing details. Once confirmed, the training task can be submitted, and the platform will automatically allocate the training resources.

  6. View & Manage Tasks

    In the task list, you can view the real-time status of the fine-tuning tasks (including running status, token count, training duration, Loss graph, etc.).

    Operations available for tasks include Details, Terminate, Copy, Delete, and Publish.

    Details: View the task's runtime details, including training status, estimated time remaining, token count, Loss graph, etc.

    Terminate: When the task Loss graph shows anomalies, you can perform the “Terminate” operation to stop the current task, free computing resources, and stop billing.

    Copy: Copy the current task. You will be redirected to the creation page with the parameters carried over; adjust them to create a new fine-tuning task.

    Delete: Delete tasks that have failed, completed, or been terminated.

    Publish: You can publish completed training tasks. During publishing, select a specific checkpoint step for model deployment. The successfully published model will be automatically stored in “Model Management” for subsequent service deployment.