ModelArguments

class trove.modeling.model_args.ModelArguments(model_name_or_path=None, model_revision=None, torch_dtype=None, trust_remote_code=False, attn_implementation=None, encoder_class=None, pooling=None, normalize=None, loss=None, temperature=1.0, temperature_learnable=False, use_peft=False, lora_r=16, lora_alpha=32, lora_dropout=0.05, lora_target_modules=None, lora_modules_to_save=None, lora_task_type='FEATURE_EXTRACTION', use_rslora=False, load_in_8bit=False, load_in_4bit=False, bnb_4bit_quant_type='nf4', use_bnb_nested_quant=False)

Arguments which define the model and tokenizer to load.
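
For illustration, a minimal sketch of constructing these arguments in Python; the checkpoint name and values below are placeholders, not recommendations:

    from trove.modeling.model_args import ModelArguments

    # Every field has a default, so set only what you need.
    model_args = ModelArguments(
        model_name_or_path="BAAI/bge-base-en-v1.5",  # placeholder checkpoint
        torch_dtype="bfloat16",
        pooling="mean",
        normalize="yes",
        temperature=0.05,                            # illustrative value
    )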

model_name_or_path: Optional[str] = None

The model checkpoint for weights initialization.

model_revision: Optional[str] = None

The specific model version to use (can be a branch name, tag name or commit id).

torch_dtype: Optional[str] = None

Override the default torch.dtype and load the model under this dtype. If auto is passed, the dtype will be automatically derived from the model’s weights.

trust_remote_code: bool = False

Trust remote code when loading a model.

attn_implementation: Optional[str] = None

Which attention implementation to use. You can pass --attn_implementation=flash_attention_2, in which case you must install flash attention manually by running pip install flash-attn --no-build-isolation.
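
For reference, values such as torch_dtype, trust_remote_code, and attn_implementation are typically forwarded to the Hugging Face from_pretrained call; a hedged sketch, not necessarily Trove's exact loading code:

    import torch
    from transformers import AutoModel

    model = AutoModel.from_pretrained(
        "my-checkpoint",                          # placeholder
        torch_dtype=torch.bfloat16,
        trust_remote_code=False,
        attn_implementation="flash_attention_2",  # requires: pip install flash-attn --no-build-isolation
    )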

encoder_class: Optional[str] = None

Name or alias of the PretrainedEncoder subclass that should be used as the encoder. If not specified, use the subclass of PretrainedEncoder that can load the given checkpoint.

pooling: Optional[str] = None

The type of pooling to use (options are ‘mean’, ‘first_token’, ‘last_token’). Make sure the encoder class that you are using allows dynamically choosing the pooling method (i.e., supports setting this option).

normalize: Optional[str] = None

Whether to normalize the embedding vector or not (options are ‘yes’ and ‘no’). Make sure the encoder class that you are using allows selecting the normalization dynamically (i.e., supports setting this option).
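
To make the pooling options and the normalization switch concrete, here is a generic sketch of what an encoder typically does with these two fields; this is illustrative logic, not Trove's actual encoder code:

    import torch
    import torch.nn.functional as F

    def pool_and_normalize(hidden_states, attention_mask, pooling="mean", normalize="yes"):
        """Illustrative pooling/normalization; actual encoder classes may differ."""
        if pooling == "mean":
            # Average token embeddings, ignoring padding positions.
            mask = attention_mask.unsqueeze(-1).float()
            emb = (hidden_states * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)
        elif pooling == "first_token":
            emb = hidden_states[:, 0]                      # e.g. the [CLS] position
        elif pooling == "last_token":
            # Index of the last non-padding token in each sequence.
            last = attention_mask.sum(dim=1).long() - 1
            emb = hidden_states[torch.arange(hidden_states.size(0)), last]
        else:
            raise ValueError(f"unknown pooling: {pooling}")
        if normalize == "yes":
            emb = F.normalize(emb, p=2, dim=-1)            # unit-length embeddings
        return emb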

loss: Optional[str] = None

Name of the loss function to use for IR training.

temperature: float = 1.0

Scaling factor for similarity scores when calculating the loss. Its use is not enforced; it is up to the loss function to make use of this argument.
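
As an example of how a loss function might consume this value, here is a sketch of a standard temperature-scaled InfoNCE-style contrastive loss; Trove's built-in losses may differ:

    import torch
    import torch.nn.functional as F

    def contrastive_loss(query_emb, passage_emb, temperature=1.0):
        """Sketch: the i-th passage is the positive for the i-th query."""
        scores = query_emb @ passage_emb.T / temperature   # (num_queries, num_passages)
        labels = torch.arange(scores.size(0), device=scores.device)
        return F.cross_entropy(scores, labels)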

temperature_learnable: bool = False

If true, make temperature a learnable parameter with its initial value set to the --temperature argument. WARNING: This feature is not complete yet and the learned temperature value is not saved to checkpoints, so you will not be able to load it later.

use_peft: bool = False

Whether to use PEFT or not for training.

lora_r: Optional[int] = 16

LoRA R value.

lora_alpha: Optional[int] = 32

LoRA alpha.

lora_dropout: Optional[float] = 0.05

LoRA dropout.

lora_target_modules: Optional[List[str]] = None

LoRA target modules.

lora_modules_to_save: Optional[List[str]] = None

Model layers to unfreeze and train.

lora_task_type: str = 'FEATURE_EXTRACTION'

The task_type to pass for LoRA (use SEQ_CLS for reward modeling)

use_rslora: bool = False

Use Rank-Stabilized LoRA (rsLoRA), which sets the adapter scaling factor to lora_alpha/√r instead of the original default of lora_alpha/r.
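
The PEFT fields above map naturally onto a peft.LoraConfig; a hedged sketch of how such a config could be built from them (an assumed mapping, not necessarily Trove's internal wiring):

    from peft import LoraConfig

    def build_lora_config(args):
        # args is a ModelArguments instance.
        return LoraConfig(
            r=args.lora_r,
            lora_alpha=args.lora_alpha,
            lora_dropout=args.lora_dropout,
            target_modules=args.lora_target_modules,
            modules_to_save=args.lora_modules_to_save,
            task_type=args.lora_task_type,
            use_rslora=args.use_rslora,
        )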

load_in_8bit: bool = False

Use 8-bit precision for the base model (works only with LoRA).

load_in_4bit: bool = False

Use 4-bit precision for the base model (works only with LoRA).

bnb_4bit_quant_type: Optional[str] = 'nf4'

Specify the quantization type (fp4 or nf4).

use_bnb_nested_quant: bool = False

Use nested quantization.
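
The three quantization fields correspond to the usual bitsandbytes settings; a hedged sketch of turning them into a transformers BitsAndBytesConfig (an assumed mapping, not necessarily Trove's exact code):

    import torch
    from transformers import BitsAndBytesConfig

    def build_quantization_config(args):
        # args is a ModelArguments instance; returns None when no quantization is requested.
        if args.load_in_4bit:
            return BitsAndBytesConfig(
                load_in_4bit=True,
                bnb_4bit_quant_type=args.bnb_4bit_quant_type,       # 'fp4' or 'nf4'
                bnb_4bit_use_double_quant=args.use_bnb_nested_quant,
                bnb_4bit_compute_dtype=torch.bfloat16,              # illustrative choice
            )
        if args.load_in_8bit:
            return BitsAndBytesConfig(load_in_8bit=True)
        return None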

to_dict()

Return a JSON-serializable view of the class attributes.

Return type:

Dict
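
This can be handy for logging or saving the configuration; a short sketch with placeholder values:

    import json
    from trove.modeling.model_args import ModelArguments

    args = ModelArguments(model_name_or_path="my-checkpoint", pooling="mean")  # placeholder values
    print(json.dumps(args.to_dict(), indent=2))  # prints all fields as a JSON object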