DefaultEncoder

class trove.modeling.encoder_default.DefaultEncoder(args, training_args=None, preprocess_only=False, **kwargs)
classmethod can_wrap(*args, **kwargs)

Return True if this wrapper can wrap the specified model with the given arguments.

Subclasses of PretrainedEncoder should implement this method. We use this method to automatically choose the correct subclass to wrap different models.

Parameters:
  • model_name_or_path (str) – name of the model to wrap.

  • args (ModelArguments) – arguments that describe the model to wrap.

Return type:

bool

Returns:

True if this class can wrap the model, and False otherwise.
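As a rough illustration of how this selection mechanism might work, the sketch below shows a fallback wrapper whose can_wrap always succeeds, plus a dispatch helper that picks the first matching subclass. The class and function names here are hypothetical stand-ins, not the real trove implementation:

```python
class PretrainedEncoderSketch:
    """Hypothetical stand-in for trove's PretrainedEncoder base class."""

    @classmethod
    def can_wrap(cls, model_name_or_path, args):
        # Subclasses override this to claim models they know how to wrap.
        return False


class DefaultEncoderSketch(PretrainedEncoderSketch):
    @classmethod
    def can_wrap(cls, model_name_or_path, args):
        # The generic fallback: it can wrap any model.
        return True


def pick_wrapper(model_name_or_path, args, candidates):
    # Return the first registered wrapper whose can_wrap() accepts the model.
    for wrapper in candidates:
        if wrapper.can_wrap(model_name_or_path, args):
            return wrapper
    raise ValueError(f"no wrapper can handle {model_name_or_path!r}")
```

Because DefaultEncoder accepts any model, a dispatcher like this would typically try more specialized wrappers first and fall back to it last.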

__init__(args, training_args=None, preprocess_only=False, **kwargs)

A generic wrapper that can be used with any model.

You can customize the pooling and normalization through model_args options. This wrapper does not do any custom formatting for queries and passages. It just returns the main text (prefixed by the title, if available).

Parameters:
  • args (ModelArguments) – config for instantiating the model

  • training_args (TrainingArguments) – Not used by this wrapper.

  • preprocess_only (bool) – If True, do not load model parameters and only provide methods and attributes necessary for pre-processing the input data.

  • **kwargs – passed to transformers.AutoModel.from_pretrained.
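A minimal sketch of the preprocess_only behavior described above. ModelArgumentsSketch and EncoderInitSketch are hypothetical stand-ins (the real ModelArguments fields beyond model_name_or_path are assumptions here):

```python
from dataclasses import dataclass


@dataclass
class ModelArgumentsSketch:
    """Hypothetical stand-in for trove's ModelArguments."""
    model_name_or_path: str
    pooling: str = "mean"    # assumed name for the pooling option
    normalize: bool = True   # assumed name for the normalization option


class EncoderInitSketch:
    def __init__(self, args, training_args=None, preprocess_only=False, **kwargs):
        self.args = args
        self.model = None
        if not preprocess_only:
            # The real wrapper would load weights here, roughly:
            #   transformers.AutoModel.from_pretrained(
            #       args.model_name_or_path, **kwargs)
            self.model = object()  # placeholder for a loaded model


# With preprocess_only=True, no model parameters are loaded, but the
# attributes needed for pre-processing remain available.
enc = EncoderInitSketch(ModelArgumentsSketch("bert-base-uncased"),
                        preprocess_only=True)
```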

encode(inputs)

Calculate the embeddings for the tokenized input.

Return type:

Tensor
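Conceptually, encode pools the wrapped model's token-level hidden states into a single embedding per input, optionally normalizing it (as configured through model_args). The real method returns a Tensor; the pure-Python sketch below only illustrates the mean-pooling and L2-normalization steps and is not the actual implementation:

```python
import math


def mean_pool(token_embeddings, attention_mask):
    # Average only the non-padding token vectors (mask value 1).
    dim = len(token_embeddings[0])
    kept = [vec for vec, m in zip(token_embeddings, attention_mask) if m]
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


def l2_normalize(vec):
    # Scale the pooled vector to unit length.
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]
```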

format_query(text, **kwargs)

Format the query before passing it to tokenizer.

You can also ask for other parameters, such as dataset_name, if your model uses different formatting for different datasets (as intfloat/e5-mistral-7b-instruct does, for example).

Return type:

str

format_passage(text, title=None, **kwargs)

Format the passage before passing it to tokenizer.

You can also ask for other parameters, such as dataset_name, if your model uses different formatting for different datasets (as intfloat/e5-mistral-7b-instruct does, for example).

Return type:

str
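Since this wrapper applies no model-specific formatting, both methods reduce to returning the main text, with the passage prefixed by its title when one is available. A hedged sketch (the exact separator the real implementation uses between title and text is an assumption):

```python
def format_passage(text, title=None, **kwargs):
    # Sketch only: prefix the main text with the title if available.
    # The single-space separator is an assumption, not the confirmed behavior.
    if title:
        return f"{title} {text}"
    return text


def format_query(text, **kwargs):
    # The default wrapper applies no special query formatting.
    return text
```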

save_pretrained(*args, **kwargs)

Save model parameters.

It should replicate the signature and behavior of transformers.PreTrainedModel.save_pretrained.
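One common way to replicate that signature and behavior is to forward all arguments to the wrapped model's own save_pretrained. The sketch below is hypothetical (DummyModel stands in for a transformers.PreTrainedModel), not the actual trove code:

```python
class DummyModel:
    """Stand-in for a transformers.PreTrainedModel; records the call."""

    def __init__(self):
        self.saved_with = None

    def save_pretrained(self, *args, **kwargs):
        self.saved_with = (args, kwargs)


class EncoderSaveSketch:
    def __init__(self, model):
        self.model = model

    def save_pretrained(self, *args, **kwargs):
        # Forward everything so the signature and behavior mirror
        # transformers.PreTrainedModel.save_pretrained.
        return self.model.save_pretrained(*args, **kwargs)
```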