ContrieverEncoder
- class trove.modeling.encoder_contriever.ContrieverEncoder(args, training_args=None, preprocess_only=False, **kwargs)
- classmethod can_wrap(model_name_or_path, args)
Returns True if the model is an instance of facebook/contriever or facebook/contriever-msmarco. Models whose base model is one of these can also be wrapped, but PretrainedEncoder takes care of that; this method only handles the main models.
- Return type:
bool
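The check described above can be sketched as a simple membership test. This is a hypothetical illustration, not the actual trove implementation; the set name and function signature are assumptions:

```python
# Hypothetical sketch of can_wrap's name check; trove's real code may differ.
SUPPORTED_MODELS = {"facebook/contriever", "facebook/contriever-msmarco"}

def can_wrap(model_name_or_path: str) -> bool:
    # Only the two main checkpoints match here; models whose base model
    # is one of these are handled by PretrainedEncoder instead.
    return model_name_or_path in SUPPORTED_MODELS

print(can_wrap("facebook/contriever"))          # True
print(can_wrap("facebook/contriever-msmarco"))  # True
print(can_wrap("bert-base-uncased"))            # False
```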
- __init__(args, training_args=None, preprocess_only=False, **kwargs)
Wraps contriever variants and also provides necessary attributes and methods for data pre-processing.
- Parameters:
args (ModelArguments) – config for instantiating the model
training_args (TrainingArguments) – Not used by this wrapper.
preprocess_only (bool) – If True, do not load model parameters; only provide the methods and attributes necessary for pre-processing the input data.
**kwargs – passed to transformers.AutoModel.from_pretrained.
- encode(inputs)
Calculate the embeddings for the tokenized input.
- Return type:
Tensor
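Contriever builds a sentence embedding by mean-pooling the token representations over the attention mask. The sketch below illustrates that pooling step on toy tensors; it is not the trove code itself, and the function name is an assumption:

```python
import torch

def mean_pooling(token_embeddings: torch.Tensor,
                 attention_mask: torch.Tensor) -> torch.Tensor:
    """Average token embeddings, ignoring padded positions."""
    mask = attention_mask.unsqueeze(-1).to(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(dim=1)
    counts = mask.sum(dim=1).clamp(min=1e-9)
    return summed / counts

# Toy tensors standing in for a model's last hidden states.
hidden = torch.randn(2, 4, 8)                      # (batch, seq_len, dim)
mask = torch.tensor([[1, 1, 0, 0], [1, 1, 1, 1]])  # second row has no padding
embeddings = mean_pooling(hidden, mask)
print(embeddings.shape)  # torch.Size([2, 8])
```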
- format_query(text, **kwargs)
Format the query before passing it to tokenizer.
You can also request other parameters, such as dataset_name, if your model uses different formatting for different datasets (as intfloat/e5-mistral-7b-instruct does).
- Return type:
str
- format_passage(text, title=None, **kwargs)
Format the passage before passing it to tokenizer.
You can also request other parameters, such as dataset_name, if your model uses different formatting for different datasets (as intfloat/e5-mistral-7b-instruct does).
- Return type:
str
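A minimal sketch of what such formatting hooks might look like. The exact formatting trove applies is not shown in this reference, so the behavior here (queries passed through unchanged, passage title prepended when present) is an assumption for illustration only:

```python
from typing import Optional

def format_query(text: str, **kwargs) -> str:
    # Assumption: Contriever queries need no special prefix.
    return text.strip()

def format_passage(text: str, title: Optional[str] = None, **kwargs) -> str:
    # Assumption: prepend the title to the passage text when available.
    if title:
        return f"{title} {text}".strip()
    return text.strip()

print(format_query("what is dense retrieval?"))
print(format_passage("Contriever is an unsupervised dense retriever.",
                     title="Contriever"))
```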
- save_pretrained(*args, **kwargs)
Save model parameters.
It should replicate the signature and behavior of transformers.PreTrainedModel.save_pretrained.