Choosing a Fine-Tuning Method in Generative AI

OCI Generative AI fine-tunes each pretrained base model with a training method suited to that model. The following list shows the training methods that Generative AI uses for each base model:

meta.llama-3-70b-instruct
  • LoRA
cohere.command-r-16k
  • T-Few
cohere.command
  • T-Few
  • Vanilla
cohere.command-light
  • T-Few
  • Vanilla
Note

For information about the hyperparameters used for each training method, see Hyperparameters for Fine-Tuning a Model in Generative AI.
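
For illustration, the following minimal sketch shows how this mapping can translate into a fine-tuning request with the OCI Python SDK: you pick the training configuration that matches your base model and pass it when you create the custom model. The client, class, field, and hyperparameter names (GenerativeAiClient, CreateModelDetails, TFewTrainingConfig, and so on), the hyperparameter values, and all OCIDs are assumptions for illustration only; verify them against the current SDK reference and the hyperparameters topic referenced in the note.

```python
# Minimal sketch (not an official sample): choosing a training config that
# matches the base model and creating a fine-tuned model with the OCI Python
# SDK. Class, field, and hyperparameter names are assumptions to verify.
import oci
from oci.generative_ai import GenerativeAiClient
from oci.generative_ai.models import (
    CreateModelDetails,
    FineTuneDetails,
    ObjectStorageDataset,
    LoraTrainingConfig,
    TFewTrainingConfig,
    VanillaTrainingConfig,
)

# Training methods per base model, as listed in the table above.
methods_for_base_model = {
    "meta.llama-3-70b-instruct": (LoraTrainingConfig,),
    "cohere.command-r-16k": (TFewTrainingConfig,),
    "cohere.command": (TFewTrainingConfig, VanillaTrainingConfig),
    "cohere.command-light": (TFewTrainingConfig, VanillaTrainingConfig),
}

client = GenerativeAiClient(oci.config.from_file())  # default ~/.oci/config profile

# Example: fine-tune a Cohere model with T-Few (hyperparameter values are
# placeholders; see the hyperparameters topic for the supported ranges).
training_config = TFewTrainingConfig(
    total_training_epochs=3,
    learning_rate=0.01,
)

create_details = CreateModelDetails(
    display_name="my-tfew-fine-tune",                      # hypothetical name
    compartment_id="ocid1.compartment.oc1..example",       # placeholder OCID
    base_model_id="ocid1.generativeaimodel.oc1..example",  # placeholder OCID
    fine_tune_details=FineTuneDetails(
        dedicated_ai_cluster_id="ocid1.generativeaidedicatedaicluster.oc1..example",
        training_dataset=ObjectStorageDataset(
            namespace_name="my-namespace",   # placeholder Object Storage namespace
            bucket_name="fine-tune-data",    # placeholder bucket
            object_name="training.jsonl",    # placeholder training file
        ),
        training_config=training_config,
    ),
)

response = client.create_model(create_details)
print(response.data.lifecycle_state)
```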

Choosing Between T-Few and Vanilla

For the cohere.command and cohere.command-light models, OCI Generative AI offers two training methods: T-Few and Vanilla. Use the following guidelines, organized by feature, to choose the best training method for your use case.

Training methods for cohere.command and cohere.command-light
  • T-Few
  • Vanilla
Dataset Size
  • Use T-Few for small datasets (a few thousand samples or fewer).
  • Use Vanilla for large datasets (from a hundred thousand samples to millions of samples).

Using a small dataset with the Vanilla method might cause overfitting. Overfitting happens when the trained model performs well on the training data but can't generalize to unseen data.

Complexity
  • Use T-Few for format following or instruction following.
  • Use Vanilla to improve complex semantic understanding, such as improving a model's understanding of medical cases.
Hosting
  • Use T-Few if you plan to host several fine-tuned models on the same hosting dedicated AI cluster. If all the models are fine-tuned from the same base model, you can host them on the same cluster. This stacked-serving feature saves cost and performs well when user traffic to each T-Few fine-tuned model is relatively low (a sketch follows this list). See Adding Endpoints to Hosting Clusters.
  • Each model that's fine-tuned with the Vanilla method requires its own hosting dedicated AI cluster.
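
The following minimal sketch illustrates the stacked-serving pattern described above: several models fine-tuned with T-Few from the same base model are exposed as separate endpoints on one hosting dedicated AI cluster. The client and class names (GenerativeAiClient, CreateEndpointDetails) and all OCIDs are assumptions for illustration only; see Adding Endpoints to Hosting Clusters and the SDK reference for the supported workflow.

```python
# Minimal sketch (not an official sample): adding endpoints for several T-Few
# fine-tuned models to one hosting dedicated AI cluster. Class and field names
# are assumptions to verify against the OCI Python SDK reference.
import oci
from oci.generative_ai import GenerativeAiClient
from oci.generative_ai.models import CreateEndpointDetails

client = GenerativeAiClient(oci.config.from_file())

compartment_id = "ocid1.compartment.oc1..example"                          # placeholder
hosting_cluster_id = "ocid1.generativeaidedicatedaicluster.oc1..example"   # placeholder

# Hypothetical OCIDs of models fine-tuned with T-Few from the same base model.
tfew_model_ids = [
    "ocid1.generativeaimodel.oc1..tfewmodel1",
    "ocid1.generativeaimodel.oc1..tfewmodel2",
]

# Stacked serving: each fine-tuned model gets its own endpoint, but all the
# endpoints share the same hosting cluster, which keeps hosting costs down
# when traffic to each model is relatively low.
for model_id in tfew_model_ids:
    endpoint = client.create_endpoint(
        CreateEndpointDetails(
            compartment_id=compartment_id,
            dedicated_ai_cluster_id=hosting_cluster_id,
            model_id=model_id,
        )
    )
    print(endpoint.data.id, endpoint.data.lifecycle_state)

# A model fine-tuned with the Vanilla method can't share a cluster this way;
# it needs its own hosting dedicated AI cluster.
```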