ModelArguments
- class trove.modeling.model_args.ModelArguments(model_name_or_path=None, model_revision=None, torch_dtype=None, trust_remote_code=False, attn_implementation=None, encoder_class=None, pooling=None, normalize=None, loss=None, temperature=1.0, temperature_learnable=False, use_peft=False, lora_r=16, lora_alpha=32, lora_dropout=0.05, lora_target_modules=None, lora_modules_to_save=None, lora_task_type='FEATURE_EXTRACTION', use_rslora=False, load_in_8bit=False, load_in_4bit=False, bnb_4bit_quant_type='nf4', use_bnb_nested_quant=False)
Arguments which define the model and tokenizer to load.
- model_name_or_path: Optional[str] = None
  The model checkpoint for weights initialization.
- model_revision: Optional[str] = None
  The specific model version to use (can be a branch name, tag name, or commit id).
- torch_dtype: Optional[str] = None
  Override the default torch.dtype and load the model under this dtype. If auto is passed, the dtype will be automatically derived from the model's weights.
- trust_remote_code: bool = False
  Trust remote code when loading a model.
- attn_implementation: Optional[str] = None
  Which attention implementation to use. You can pass --attn_implementation=flash_attention_2, in which case you must install it manually by running pip install flash-attn --no-build-isolation.
- encoder_class: Optional[str] = None
  Name or alias of the PretrainedEncoder subclass that should be used as the encoder. If not specified, the PretrainedEncoder subclass that can load the given checkpoint is used.
- pooling: Optional[str] = None
  The type of pooling to use (options are 'mean', 'first_token', 'last_token'). Make sure the encoder class that you are using allows dynamically choosing the pooling method (i.e., supports setting this option).
- normalize: Optional[str] = None
  Whether to normalize the embedding vector or not (options are 'yes' and 'no'). Make sure the encoder class that you are using allows selecting the normalization dynamically (i.e., supports setting this option).
- loss: Optional[str] = None
  Name of the loss function to use for IR training.
- temperature: float = 1.0
  Scaling factor for similarity scores when calculating the loss. Its use is not enforced; it is up to the loss function to make use of this argument.
- temperature_learnable: bool = False
  If true, make the temperature a learnable parameter initialized to the value of the --temperature argument. WARNING: this feature is not complete yet; the learned temperature value is not saved to checkpoints, so you will not be able to load it later.
- use_peft: bool = False
  Whether to use PEFT for training.
- lora_r: Optional[int] = 16
  LoRA R value.
- lora_alpha: Optional[int] = 32
  LoRA alpha.
- lora_dropout: Optional[float] = 0.05
  LoRA dropout.
- lora_target_modules: Optional[List[str]] = None
  LoRA target modules.
- lora_modules_to_save: Optional[List[str]] = None
  Model layers to unfreeze and train.
- lora_task_type: str = 'FEATURE_EXTRACTION'
  The task_type to pass for LoRA (use SEQ_CLS for reward modeling).
- use_rslora: bool = False
  Use Rank-Stabilized LoRA (rsLoRA), which sets the adapter scaling factor to lora_alpha/√r instead of the original default of lora_alpha/r.
- load_in_8bit: bool = False
  Use 8-bit precision for the base model (works only with LoRA).
- load_in_4bit: bool = False
  Use 4-bit precision for the base model (works only with LoRA).
- bnb_4bit_quant_type: Optional[str] = 'nf4'
  The quantization type to use (fp4 or nf4).
- use_bnb_nested_quant: bool = False
  Use nested quantization.
- to_dict()
  Return a JSON-serializable view of the class attributes.
  Return type: Dict
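
Example
A minimal sketch of constructing ModelArguments for LoRA-based fine-tuning and serializing it with to_dict(). The field names, defaults, and the to_dict() method come from the signature above; the specific checkpoint and values are illustrative, and populating the same fields from the command line via transformers.HfArgumentParser is an assumption about typical usage, not something documented on this page.

    from trove.modeling.model_args import ModelArguments

    # Illustrative configuration: an encoder trained with LoRA adapters.
    model_args = ModelArguments(
        model_name_or_path="intfloat/e5-base-v2",  # any HF checkpoint; this value is only an example
        torch_dtype="bfloat16",
        attn_implementation="flash_attention_2",   # requires: pip install flash-attn --no-build-isolation
        pooling="mean",
        normalize="yes",
        temperature=0.05,
        use_peft=True,
        lora_r=16,
        lora_alpha=32,
        lora_dropout=0.05,
    )

    # to_dict() returns a JSON-serializable view of the attributes (documented above).
    print(model_args.to_dict())

    # Assumption: like other HF-style argument dataclasses, ModelArguments can also be
    # populated from the command line, e.g.:
    #   from transformers import HfArgumentParser
    #   (model_args,) = HfArgumentParser(ModelArguments).parse_args_into_dataclasses()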