ModelArguments

class trove.modeling.model_args.ModelArguments(model_name_or_path=None, model_revision=None, torch_dtype=None, trust_remote_code=False, attn_implementation=None, encoder_class=None, pooling=None, normalize=None, loss=None, temperature=1.0, temperature_learnable=False, use_peft=False, lora_r=16, lora_alpha=32, lora_dropout=0.05, lora_target_modules=None, lora_modules_to_save=None, lora_task_type='FEATURE_EXTRACTION', use_rslora=False, load_in_8bit=False, load_in_4bit=False, bnb_4bit_quant_type='nf4', use_bnb_nested_quant=False)

Arguments which define the model and tokenizer to load.
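
For illustration, a minimal sketch of constructing these arguments in Python; the checkpoint name and values below are placeholders, not recommendations:

    from trove.modeling.model_args import ModelArguments

    # Every field has a default, so set only what you need.
    model_args = ModelArguments(
        model_name_or_path="BAAI/bge-base-en-v1.5",  # placeholder checkpoint
        torch_dtype="bfloat16",
        pooling="mean",
        normalize="yes",
        temperature=0.05,                            # illustrative value
    )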

model_name_or_path: Optional[str] = None

The model checkpoint for weights initialization.

model_revision: Optional[str] = None

The specific model version to use (can be a branch name, tag name or commit id).

torch_dtype: Optional[str] = None

Override the default torch.dtype and load the model under this dtype. If auto is passed, the dtype will be automatically derived from the model’s weights.

trust_remote_code: bool = False

Trust remote code when loading a model.

attn_implementation: Optional[str] = None

Which attention implementation to use. You can pass --attn_implementation=flash_attention_2, in which case you must install flash attention manually by running pip install flash-attn --no-build-isolation.
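
For reference, values such as torch_dtype, trust_remote_code, and attn_implementation are typically forwarded to the Hugging Face from_pretrained call; a hedged sketch, not necessarily Trove's exact loading code:

    import torch
    from transformers import AutoModel

    model = AutoModel.from_pretrained(
        "my-checkpoint",                          # placeholder
        torch_dtype=torch.bfloat16,
        trust_remote_code=False,
        attn_implementation="flash_attention_2",  # requires: pip install flash-attn --no-build-isolation
    )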

encoder_class: Optional[str] = None

Name or alias of the PretrainedEncoder subclass that should be used as the encoder. If not specified, use the subclass of PretrainedEncoder that can load the given checkpoint.

pooling: Optional[str] = None

The type of pooling to use (options are ‘mean’, ‘first_token’, ‘last_token’). Make sure the encoder class that you are using allows dynamically choosing the pooling method (i.e., supports setting this option).

normalize: Optional[str] = None

Whether to normalize the embedding vector or not (options are ‘yes’ and ‘no’). Make sure the encoder class that you are using allows selecting the normalization dynamically (i.e., supports setting this option).
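
To make the pooling options and the normalization switch concrete, here is a generic sketch of what an encoder typically does with these two fields; this is illustrative logic, not Trove's actual encoder code:

    import torch
    import torch.nn.functional as F

    def pool_and_normalize(hidden_states, attention_mask, pooling="mean", normalize="yes"):
        """Illustrative pooling/normalization; actual encoder classes may differ."""
        if pooling == "mean":
            # Average token embeddings, ignoring padding positions.
            mask = attention_mask.unsqueeze(-1).float()
            emb = (hidden_states * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)
        elif pooling == "first_token":
            emb = hidden_states[:, 0]                      # e.g. the [CLS] position
        elif pooling == "last_token":
            # Index of the last non-padding token in each sequence.
            last = attention_mask.sum(dim=1).long() - 1
            emb = hidden_states[torch.arange(hidden_states.size(0)), last]
        else:
            raise ValueError(f"unknown pooling: {pooling}")
        if normalize == "yes":
            emb = F.normalize(emb, p=2, dim=-1)            # unit-length embeddings
        return emb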

loss: Optional[str] = None

Name of the loss function to use for IR training.

temperature: float = 1.0

Scaling factor for similarity scores when calculating the loss. Its use is not enforced; it is up to the loss function to make use of this argument.
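
As an example of how a loss function might consume this value, here is a sketch of a standard temperature-scaled InfoNCE-style contrastive loss; Trove's built-in losses may differ:

    import torch
    import torch.nn.functional as F

    def contrastive_loss(query_emb, passage_emb, temperature=1.0):
        """Sketch: the i-th passage is the positive for the i-th query."""
        scores = query_emb @ passage_emb.T / temperature   # (num_queries, num_passages)
        labels = torch.arange(scores.size(0), device=scores.device)
        return F.cross_entropy(scores, labels)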

temperature_learnable: bool = False

If true, make temperature a learnable parameter with its initial value set to the --temperature argument. WARNING: This feature is not complete yet and the learned temperature value is not saved to checkpoints, so you will not be able to load it later.

use_peft: bool = False

Whether to use PEFT or not for training.

lora_r: Optional[int] = 16

LoRA R value.

lora_alpha: Optional[int] = 32

LoRA alpha.

lora_dropout: Optional[float] = 0.05

LoRA dropout.

lora_target_modules: Optional[List[str]] = None

LoRA target modules.

lora_modules_to_save: Optional[List[str]] = None

Model layers to unfreeze and train.

lora_task_type: str = 'FEATURE_EXTRACTION'

The task_type to pass for LoRA (use SEQ_CLS for reward modeling)

use_rslora: bool = False

Use Rank-Stabilized LoRA (rsLoRA), which sets the adapter scaling factor to lora_alpha/√r instead of the original default of lora_alpha/r.
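
The PEFT fields above map naturally onto a peft.LoraConfig; a hedged sketch of how such a config could be built from them (an assumed mapping, not necessarily Trove's internal wiring):

    from peft import LoraConfig

    def build_lora_config(args):
        # args is a ModelArguments instance.
        return LoraConfig(
            r=args.lora_r,
            lora_alpha=args.lora_alpha,
            lora_dropout=args.lora_dropout,
            target_modules=args.lora_target_modules,
            modules_to_save=args.lora_modules_to_save,
            task_type=args.lora_task_type,
            use_rslora=args.use_rslora,
        )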

load_in_8bit: bool = False

Use 8-bit precision for the base model (works only with LoRA).

load_in_4bit: bool = False

Use 4-bit precision for the base model (works only with LoRA).

bnb_4bit_quant_type: Optional[str] = 'nf4'

Specify the quantization type (fp4 or nf4).

use_bnb_nested_quant: bool = False

Use nested quantization.
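
The three quantization fields correspond to the usual bitsandbytes settings; a hedged sketch of turning them into a transformers BitsAndBytesConfig (an assumed mapping, not necessarily Trove's exact code):

    import torch
    from transformers import BitsAndBytesConfig

    def build_quantization_config(args):
        # args is a ModelArguments instance; returns None when no quantization is requested.
        if args.load_in_4bit:
            return BitsAndBytesConfig(
                load_in_4bit=True,
                bnb_4bit_quant_type=args.bnb_4bit_quant_type,       # 'fp4' or 'nf4'
                bnb_4bit_use_double_quant=args.use_bnb_nested_quant,
                bnb_4bit_compute_dtype=torch.bfloat16,              # illustrative choice
            )
        if args.load_in_8bit:
            return BitsAndBytesConfig(load_in_8bit=True)
        return None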

to_dict()

Return a JSON-serializable view of the class attributes.

Return type:

Dict
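
This can be handy for logging or saving the configuration; a short sketch with placeholder values:

    import json
    from trove.modeling.model_args import ModelArguments

    args = ModelArguments(model_name_or_path="my-checkpoint", pooling="mean")  # placeholder values
    print(json.dumps(args.to_dict(), indent=2))  # prints all fields as a JSON object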