
Model Fine-Tuning

The platform currently supports the SFT training mode. SFT (Supervised Fine-Tuning) lets model developers select an open-source model, or a model they have already uploaded, and fine-tune it on their private datasets to create custom vertical models for their business needs.
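
The idea behind SFT can be sketched in a few lines. This is an illustrative example only, not the platform's internal code: a typical SFT objective minimizes cross-entropy on the response tokens while masking out the prompt tokens, so the model learns to produce the answer rather than to repeat the instruction.

```python
import numpy as np

def sft_loss(logits, labels, loss_mask):
    """logits: (T, V) next-token logits; labels: (T,) target token ids;
    loss_mask: (T,) 1.0 for response tokens, 0.0 for prompt tokens."""
    # Numerically stable log-softmax over the vocabulary dimension.
    logits = logits - logits.max(axis=-1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
    # Per-token negative log-likelihood of the target ids.
    nll = -log_probs[np.arange(len(labels)), labels]
    # Average only over response tokens; prompt tokens carry no loss.
    return (nll * loss_mask).sum() / loss_mask.sum()

rng = np.random.default_rng(0)
logits = rng.normal(size=(6, 10))
labels = rng.integers(0, 10, size=6)
mask = np.array([0, 0, 0, 1, 1, 1], dtype=float)  # last 3 tokens are the response
loss = sft_loss(logits, labels, mask)
```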

Create Task

In the console, open the product “Model Service Platform UModelVerse”, then choose “Model Fine-Tuning” from the feature menu and click “Create Task”.

  1. Model Selection

    The platform provides popular open-source models as presets and also supports importing private models from US3 object storage.

  2. Data Configuration

    Select the dataset to use for this training run.

    Prerequisite: Upload training data to the US3 storage space and associate it with dataset management. For more details, see Dataset Management.
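
Before uploading, training data is usually prepared as one JSON record per line (JSONL). The field names below are purely illustrative, not the platform's required schema; see Dataset Management for the actual format it expects.

```python
import json

# Hypothetical instruction-tuning samples; the keys "instruction", "input",
# and "output" are common conventions, not a documented platform requirement.
samples = [
    {"instruction": "Summarize the following support ticket.",
     "input": "Customer reports login failures after a password reset.",
     "output": "User cannot log in after resetting their password; account needs review."},
]

# Write one JSON object per line so the file can be streamed during training.
with open("train.jsonl", "w", encoding="utf-8") as f:
    for s in samples:
        f.write(json.dumps(s, ensure_ascii=False) + "\n")
```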

  3. Parameter Configuration

    The platform currently supports the LoRA fine-tuning method, which updates only the low-rank portion of the parameters during training, requiring fewer computing resources and speeding up the training process.

    Supported parameter configurations are as follows:

    Initial Learning Rate: The hyperparameter that controls the step size when updating weights during gradient descent.
    Training Rounds: The number of epochs (full passes over the training data); adjust the epoch count to the data scale.
    Batch Size: The number of samples the model processes before each parameter update.
    Maximum Sample Size: The maximum number of samples taken from each dataset.
    Computation Type: The numeric precision used for training, such as FP16, BF16, or FP32 (i.e., whether to use mixed-precision training).
    CheckPoint Save Interval: The number of steps between checkpoint saves during training.
    Maximum CheckPoint Saves: The maximum number of checkpoints to keep. Saving checkpoints adds to the training time.
    Log Save Interval: The number of steps between log saves.
    Truncation Length: The maximum length of a single training sample; anything beyond this length is automatically truncated.
    Learning Rate Scheduler: The policy that adjusts how the learning rate changes during training.
    Validation Set Ratio: The fraction of the total samples held out as the validation set.
    Use Flash Attention: Whether to enable Flash Attention to accelerate attention computation.
    LoRA Rank Value: The rank used during LoRA training, which controls how strongly the LoRA adapter influences the model. A larger rank means greater influence; choose a rank appropriate to the data volume.
    LoRA Scaling Factor: The scaling factor in LoRA training, used to keep the initial effective weights close to, or identical with, the pre-trained weights.
    LoRA Random Dropout: The rate at which neurons are randomly dropped during training, to prevent overfitting and improve the model's generalization.
    LoRA+ Learning Rate Ratio: The multiplier applied to the learning rate of matrix B in LoRA+.
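
The LoRA parameters above can be made concrete with a minimal sketch. This shows the assumed math (frozen weight W plus a low-rank update scaled by alpha / r), not the platform's actual implementation; here r corresponds to the "LoRA Rank Value" and alpha to the "LoRA Scaling Factor".

```python
import numpy as np

rng = np.random.default_rng(42)
d_out, d_in, r, alpha = 8, 8, 2, 16

W = rng.normal(size=(d_out, d_in))     # frozen pre-trained weight
A = rng.normal(size=(r, d_in)) * 0.01  # trainable low-rank factor, small init
B = np.zeros((d_out, r))               # trainable factor, zero init

def lora_forward(x):
    # Effective weight is W + (alpha / r) * B @ A; only A and B are trained.
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=d_in)
y = lora_forward(x)
```

Because B starts at zero, the low-rank branch contributes nothing at step 0, so the fine-tuned model initially matches the pre-trained weights exactly, which is what the scaling-factor description means by keeping the initial weights consistent with the pre-trained ones.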
  4. Output Configuration

    Configure the storage path for the trained model. Currently, only US3 storage spaces in the North China 2 region are supported.

  5. Confirm Configuration

    Reconfirm the configured information and billing details. Once confirmed, the training task can be submitted, and the platform will automatically allocate the training resources.

  6. View & Manage Tasks

    In the task list, you can view the real-time status of the fine-tuning tasks (including running status, token count, training duration, Loss graph, etc.).

    Operations available for tasks include Details, Terminate, Copy, Delete, and Publish.

    Details: View the task's runtime details, including training status, estimated time remaining, token count, Loss graph, etc.

    Terminate: When the task Loss graph shows anomalies, you can perform the “Terminate” operation to stop the current task, free computing resources, and stop billing.

    Copy: Copy the current task. You will be redirected to the creation page with the parameters carried over; adjust them to create a new fine-tuning task.

    Delete: Delete tasks that have failed, completed, or been terminated.

    Publish: You can publish completed training tasks. During publishing, select a specific checkpoint step for model deployment. The successfully published model will be automatically stored in “Model Management” for subsequent service deployment.