ContrieverEncoder

class trove.modeling.encoder_contriever.ContrieverEncoder(args, training_args=None, preprocess_only=False, **kwargs)
classmethod can_wrap(model_name_or_path, args)

Returns True if the model is one of the facebook/contriever or facebook/contriever-msmarco models.

We could also wrap models whose base model is one of these, but PretrainedEncoder takes care of those; this class handles only the main models.

Return type:

bool
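
For illustration, a minimal sketch of calling can_wrap; the args object is the same ModelArguments instance described under __init__ below, and the exact return values shown are assumptions:

>>> from trove.modeling.encoder_contriever import ContrieverEncoder
>>> # args is a trove ModelArguments instance (see __init__ below)
>>> ContrieverEncoder.can_wrap("facebook/contriever", args)   # one of the main models
True
>>> ContrieverEncoder.can_wrap("bert-base-uncased", args)     # unrelated model
False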

__init__(args, training_args=None, preprocess_only=False, **kwargs)

Wraps Contriever variants and provides the attributes and methods needed for data pre-processing.

Parameters:
  • args (ModelArguments) – config for instantiating the model

  • training_args (TrainingArguments) – Not used by this wrapper.

  • preprocess_only (bool) – If True, do not load model parameters and only provide the methods and attributes necessary for pre-processing the input data.

  • **kwargs – passed to transformers.AutoModel.from_pretrained.
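
A minimal instantiation sketch; the ModelArguments field name and import path are assumptions for illustration:

>>> from trove.modeling.encoder_contriever import ContrieverEncoder
>>> # ModelArguments import path assumed for illustration
>>> args = ModelArguments(model_name_or_path="facebook/contriever")   # assumed field name
>>> encoder = ContrieverEncoder(args)                        # loads the model weights
>>> preproc = ContrieverEncoder(args, preprocess_only=True)  # formatting helpers only, no weights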

encode(inputs)

Calculate the embeddings for the tokenized input.

Return type:

Tensor
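
A sketch of encoding a tokenized batch, assuming encode accepts the dict produced by a Hugging Face tokenizer (an assumption; the facebook/contriever checkpoint itself ships a BERT-base tokenizer with 768-dimensional hidden states):

>>> from transformers import AutoTokenizer
>>> tokenizer = AutoTokenizer.from_pretrained("facebook/contriever")
>>> batch = tokenizer(["what is dense retrieval?"], return_tensors="pt",
...                   padding=True, truncation=True)
>>> embeddings = encoder.encode(batch)  # encoder from the __init__ example above
>>> embeddings.shape                    # hidden size of the BERT-base backbone
torch.Size([1, 768])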

format_query(text, **kwargs)

Format the query before passing it to the tokenizer.

You can also ask for other parameters, e.g., dataset_name, if your model uses different formatting for different datasets (as intfloat/e5-mistral-7b-instruct does).

Return type:

str
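
For illustration only: Contriever does not use instruction prefixes, so format_query presumably returns the text essentially unchanged; the exact output below is an assumption, not confirmed by the source:

>>> encoder.format_query("what is dense retrieval?")
'what is dense retrieval?'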

format_passage(text, title=None, **kwargs)

Format the passage before passing it to the tokenizer.

You can also ask for other parameters, e.g., dataset_name, if your model uses different formatting for different datasets (as intfloat/e5-mistral-7b-instruct does).

Return type:

str
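
A sketch of formatting a passage with an optional title; exactly how the title is joined to the text is an assumption (shown here as a simple title-then-text concatenation):

>>> encoder.format_passage("It is the capital of France.", title="Paris")
'Paris It is the capital of France.'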

save_pretrained(*args, **kwargs)

Save model parameters.

It should replicate the signature and behavior of transformers.PreTrainedModel.save_pretrained.
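
A sketch of a save/reload round trip; pointing a fresh ModelArguments at the saved directory to reload is an assumed workflow, not confirmed by the source:

>>> encoder.save_pretrained("./contriever-finetuned")
>>> args = ModelArguments(model_name_or_path="./contriever-finetuned")  # assumed field name
>>> encoder = ContrieverEncoder(args)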