Model Fine-Tuning
The platform currently supports the SFT training mode. SFT (Supervised Fine-Tuning) lets model developers select an open-source model, or a model they have already uploaded, and fine-tune it on their private datasets to build custom vertical models for their business.
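In SFT, each training sample pairs a prompt with a reference answer, and the model is trained to reproduce the answer given the prompt. The exact dataset schema the platform expects is defined in Dataset Management; the minimal Python sketch below only illustrates the idea, and the field names in it are assumptions, not the required format.

```python
import json

# Illustration only: the platform's required schema is defined in
# "Dataset Management"; these field names are assumptions.
samples = [
    {
        "instruction": "Classify the support ticket's urgency.",
        "input": "Checkout page returns a 500 error for all users.",
        "output": "High",
    },
]

# SFT pairs each prompt with a reference answer; training teaches the
# model to reproduce the answer tokens given the prompt.
with open("train.jsonl", "w", encoding="utf-8") as f:
    for sample in samples:
        f.write(json.dumps(sample, ensure_ascii=False) + "\n")
```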
Create Task
Go to the product “Model Service Platform UModelVerse”, open the feature menu “Model Fine-Tuning”, and click “Create Task”.
Model Selection
The platform comes preloaded with popular open-source models and also supports importing private models from US3 object storage.
Data Configuration
Select the dataset to use for this training run.
Prerequisite: Upload the training data to a US3 storage space and register it in dataset management. For details, see Dataset Management.
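One possible upload path, sketched below, assumes the US3 bucket has S3-compatible access enabled; the endpoint, bucket name, and keys are placeholders, and this is not an official US3 example. Consult the US3 documentation for the tooling your account actually supports.

```python
import boto3  # sketch assumes US3's S3-compatible API is enabled

# All values below are placeholders: substitute your US3 endpoint,
# bucket name, and API keys.
s3 = boto3.client(
    "s3",
    endpoint_url="https://<your-us3-endpoint>",
    aws_access_key_id="<public-key>",
    aws_secret_access_key="<private-key>",
)
s3.upload_file("train.jsonl", "<your-bucket>", "datasets/train.jsonl")
```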
Parameter Configuration
The platform currently supports the LoRA fine-tuning method. LoRA freezes the pre-trained weights and trains only a small low-rank update to them, so it requires fewer computing resources and speeds up training.
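Conceptually, for a frozen weight matrix W, LoRA learns two small matrices A and B of rank r and applies W + (α/r)·BA, where α is the scaling factor. A minimal PyTorch sketch of the idea (not the platform's implementation):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wraps a frozen linear layer with a trainable low-rank update B @ A."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0,
                 dropout: float = 0.05):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)          # pretrained weights stay frozen
        self.lora_a = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scaling = alpha / rank          # the LoRA scaling factor
        self.dropout = nn.Dropout(dropout)   # LoRA random dropout

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Effective weight: W + (alpha / rank) * (B @ A); only A and B train.
        delta = (self.lora_b @ self.lora_a).T * self.scaling
        return self.base(x) + self.dropout(x) @ delta
```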
Supported parameter configurations are as follows:
| Parameter Name | Parameter Description |
| --- | --- |
| Initial Learning Rate | A hyperparameter that controls the size of weight updates during gradient descent. |
| Training Rounds | The number of epochs, i.e., full passes over the training data. Adjust based on data scale. |
| Batch Size | The number of samples the model processes before each parameter update. |
| Maximum Sample Size | The maximum number of samples taken from each dataset. |
| Computation Type | Whether to use mixed-precision training, e.g., FP16, BF16, or FP32. |
| CheckPoint Save Interval | The number of steps between checkpoint saves during training. |
| Maximum CheckPoint Saves | The number of checkpoints retained at the end of training. Saving checkpoints can increase training time. |
| Log Save Interval | The number of steps between log saves. |
| Truncation Length | The maximum length of a single training sample; samples exceeding this length are automatically truncated. |
| Learning Rate Scheduler | Controls how the learning rate changes over the course of training. |
| Validation Set Ratio | The percentage of the total samples used as the validation set. |
| Use Flash Attention | - |
| LoRA Rank Value | The rank used during LoRA training, which determines how strongly the LoRA adapter influences the model: a larger rank means greater influence. Choose an appropriate rank based on the data volume. |
| LoRA Scaling Factor | The scaling factor in LoRA training, used to adjust the adapter's contribution so the effective weights stay close to, or consistent with, the pre-trained weights. |
| LoRA Random Dropout | The rate at which neurons are randomly dropped during training to prevent overfitting and improve the model's generalization. |
| LoRA+ Learning Rate Ratio | The learning-rate multiplier applied to matrix B in LoRA+. |
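For orientation only, the sketch below shows how these parameters roughly map onto a common open-source fine-tuning stack (Hugging Face transformers and peft). It is an analogy for understanding the knobs, not the platform's actual backend configuration, and all values are illustrative.

```python
# Rough mapping of the table above onto transformers/peft settings;
# an analogy for orientation, not the platform's actual backend.
from peft import LoraConfig
from transformers import TrainingArguments

lora_cfg = LoraConfig(
    r=8,                # LoRA Rank Value
    lora_alpha=16,      # LoRA Scaling Factor
    lora_dropout=0.05,  # LoRA Random Dropout
)

train_args = TrainingArguments(
    output_dir="output",
    learning_rate=2e-4,             # Initial Learning Rate
    num_train_epochs=3,             # Training Rounds
    per_device_train_batch_size=8,  # Batch Size
    bf16=True,                      # Computation Type (BF16 mixed precision)
    save_steps=500,                 # CheckPoint Save Interval
    save_total_limit=3,             # Maximum CheckPoint Saves
    logging_steps=10,               # Log Save Interval
    lr_scheduler_type="cosine",     # Learning Rate Scheduler
)
```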
Output Configuration
Configure the storage path where the trained model will be stored. Currently, only US3 storage spaces in the North China 2 region are supported.
Confirm Configuration
Review the configured information and billing details. Once confirmed, the training task can be submitted, and the platform will automatically allocate training resources.
View & Manage Tasks
In the task list, you can view the real-time status of the fine-tuning tasks (including running status, token count, training duration, Loss graph, etc.).
The following operations are available for tasks: Details, Terminate, Copy, Delete, and Publish.
Details: View the task's runtime information, including training status, estimated time, token count, Loss graph, etc.
Terminate: When the task's Loss graph shows anomalies, use the “Terminate” operation to stop the current task, release its computing resources, and stop billing.
Copy: Duplicate the current task. You are redirected to a new task-creation page with the original parameters carried over; adjust them to create a new fine-tuning task.
Delete: Remove tasks that have failed, completed, or been terminated.
Publish: Completed training tasks can be published. During publishing, select a specific checkpoint step for model deployment (see the sketch below). A successfully published model is automatically stored in “Model Management” for subsequent service deployment.
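As context for what selecting a checkpoint step means, the sketch below shows how a specific checkpoint can be turned into a deployable model in an open-source LoRA workflow. It assumes peft-style checkpoint directories and uses placeholder names; it is not the platform's publishing mechanism.

```python
# Hypothetical sketch: deploying from a specific checkpoint step,
# assuming peft-style LoRA checkpoints; paths and names are placeholders.
from peft import PeftModel
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("<base-model>")       # placeholder
model = PeftModel.from_pretrained(base, "output/checkpoint-500")  # chosen step
model = model.merge_and_unload()  # merge LoRA weights for standalone serving
```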