ResultHeapq

class trove.containers.result_heapq.ResultHeapq(topk=100, special_docids=None)
__init__(topk=100, special_docids=None)

A simple class that uses heapq to keep track of the topk largest scores for each query.

You can also use an instance of this class as a callable in a PyTorch evaluation loop.

Parameters:
  • topk (int) – Number of items to hold in the heapq for each query.

  • special_docids (Optional[Dict]) – Documents, per query, whose similarity scores are collected regardless of their rank. Avoid this unless necessary, because it degrades performance. It can be in qrel format (i.e., special_docids[qid][docid]=some_score), in which case the scores are ignored, or a mapping from qid to a list of docids (i.e., special_docids[qid] = [docid1, docid2, ...]).
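
The two accepted shapes of special_docids carry the same information, since only the docids matter. The sketch below illustrates this with a hypothetical helper, normalize_special_docids, which is not part of the library; it only shows that both inputs reduce to the same set of docids per query.

```python
def normalize_special_docids(special_docids):
    """Hypothetical helper: map each qid to the set of its special docids."""
    # Iterating over a dict yields its keys, so set() handles both the
    # qrel-style input (dict of docid -> score) and the list-style input.
    return {qid: set(docs) for qid, docs in special_docids.items()}


qrel_style = {"q1": {"d1": 1, "d7": 0}}  # scores are ignored
list_style = {"q1": ["d1", "d7"]}

same = normalize_special_docids(qrel_style) == normalize_special_docids(list_style)
```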

reset_state()

Clear the data collected so far.

Return type:

None

add_triplet(qid, docid, score)

Add (docid, score) to the heapq for qid.

Parameters:
  • qid (str) – query id

  • docid (str) – document id

  • score (Union[int, float]) – similarity score between query and document.

Return type:

None
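
As an illustration of the idea behind this method, the following is a minimal sketch of top-k tracking with the standard heapq module. It is not the library's implementation: the names TOPK, heaps, and add_triplet_sketch are hypothetical, and storing entries as (score, docid) tuples is an assumption.

```python
import heapq
from collections import defaultdict

TOPK = 2  # hypothetical limit, standing in for the topk argument
heaps = defaultdict(list)  # qid -> min-heap of (score, docid) tuples


def add_triplet_sketch(qid, docid, score):
    """Keep only the TOPK highest-scoring documents for each query."""
    heap = heaps[qid]
    if len(heap) < TOPK:
        heapq.heappush(heap, (score, docid))
    else:
        # Push the new item, then drop the smallest one, so the heap
        # never grows beyond TOPK entries.
        heapq.heappushpop(heap, (score, docid))


add_triplet_sketch("q1", "d1", 0.2)
add_triplet_sketch("q1", "d2", 0.9)
add_triplet_sketch("q1", "d3", 0.5)
```

A min-heap keeps the lowest retained score at index 0, so each insertion costs O(log k) instead of re-sorting all scores seen so far.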

add_qrel_nested_dict(qrel_dict)

Import results in nested qrel dict format.

The input is similar to the output of the as_qrel_nested_dict() method.

Parameters:

qrel_dict (Dict) – Results in nested qrel dict format, where qrel_dict[qid][docid] is the similarity score between qid and docid. If qrel_dict contains the key 'topk_docs' or 'special_docs', each of qrel_dict['topk_docs'] and qrel_dict['special_docs'] is treated as a separate qrel object and merged into the corresponding internal collection. If neither key is present, qrel_dict is treated as a single qrel object and merged into self.topk_docs.

Return type:

None
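
The key check described above can be sketched as follows. The helper split_collections is hypothetical and only mirrors the documented behavior; how the library handles a missing one of the two keys is an assumption here (treated as empty).

```python
def split_collections(qrel_dict):
    """Hypothetical helper: decide which collection each part belongs to."""
    known = ("topk_docs", "special_docs")
    if any(key in qrel_dict for key in known):
        # Each present key holds its own qrel object; a missing key is
        # assumed to contribute nothing.
        return {key: qrel_dict.get(key, {}) for key in known}
    # No collection keys: the whole input is a single qrel object.
    return {"topk_docs": qrel_dict, "special_docs": {}}


plain = split_collections({"q1": {"d1": 0.3}})
keyed = split_collections({"special_docs": {"q1": {"d9": 0.1}}})
```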

as_qrel_nested_dict(collection=None)

Export the collected results as nested qrel dicts.

Parameters:

collection (Optional[Union[str, List[str]]]) –

The type of results to return. It can be one of the following:

  • None : Just return topk most similar documents

  • 'topk_docs' : same as None

  • 'special_docs' : just return the collected results for special documents

  • 'all' : return a dict with keys 'special_docs' and 'topk_docs' containing the corresponding results

Return type:

Union[Dict[str, Dict[str, float]], Dict[str, Dict[str, Dict[str, float]]]]

Returns:

The collected similarities so far in qrel format (i.e., qrel[qid][docid]=sim(qid, docid)). If asked for multiple collections of results, it returns a mapping from collection type to qrel.
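
To make the two return shapes concrete, here is a small sketch of how the collection argument selects the output. The function select_collection and the sample data are hypothetical; passing a list of collection names, which the type hint allows, is not covered.

```python
def select_collection(results, collection=None):
    """Hypothetical helper showing the return shape for each option."""
    if collection is None or collection == "topk_docs":
        return results["topk_docs"]  # qrel[qid][docid] = score
    if collection == "special_docs":
        return results["special_docs"]  # qrel[qid][docid] = score
    if collection == "all":
        # Mapping from collection type to qrel.
        return {
            "topk_docs": results["topk_docs"],
            "special_docs": results["special_docs"],
        }
    raise ValueError(f"unknown collection: {collection!r}")


sample = {
    "topk_docs": {"q1": {"d2": 0.9, "d3": 0.5}},
    "special_docs": {"q1": {"d7": 0.1}},
}
top_only = select_collection(sample)
everything = select_collection(sample, "all")
```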

as_sorted_lists(reverse=False)

Export the topk collected so far as a sorted list for each query.

Parameters:

reverse (bool) – Sort order flag, used as in sorted(..., reverse=reverse).

Return type:

Dict[str, List[Tuple]]

Returns:

A mapping from qid to a list of (doc_score, doc_id) tuples. Each list is sorted by doc_score.
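
The effect of the reverse flag can be sketched with plain Python. The function as_sorted_lists_sketch and the sample heaps are hypothetical and only illustrate the documented output format.

```python
def as_sorted_lists_sketch(heaps, reverse=False):
    """Sort each query's (doc_score, doc_id) tuples by score."""
    return {qid: sorted(items, reverse=reverse) for qid, items in heaps.items()}


sample_heaps = {"q1": [(0.5, "d3"), (0.9, "d2"), (0.7, "d5")]}

ascending = as_sorted_lists_sketch(sample_heaps)
descending = as_sorted_lists_sketch(sample_heaps, reverse=True)
```

With reverse=False the lowest score comes first; pass reverse=True to get the best-scoring document at index 0.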

export_result_dump(reset_state=False)

Export all the results collected so far in the same format as stored in this class.

You can use this to export and merge the results collected by multiple instances, for example when running a sharded evaluation in a distributed environment.

Parameters:

reset_state (bool) – If true, clear the data collected so far.

Return type:

Dict[str, Dict[str, List[Tuple[str, float]]]]

Returns:

A dict with keys 'topk_docs' and 'special_docs' which hold the content of attributes of this class with the same name.

merge_result_dump(results)

Merge the results exported from another instance of this class with its export_result_dump() method.

Parameters:

results (Dict) – The exported results from the other instances of ResultHeapq.

Return type:

None
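
For intuition, merging top-k results from two shards amounts to pooling the candidates for each query and keeping the k best. The sketch below shows that idea with heapq.nlargest; it is not the library's merge logic. The function merge_topk is hypothetical, and the (score, docid) tuple order is an assumption.

```python
import heapq


def merge_topk(dump_a, dump_b, topk):
    """Pool per-query candidates from two shards and keep the topk best."""
    merged = {}
    for qid in dump_a.keys() | dump_b.keys():
        combined = dump_a.get(qid, []) + dump_b.get(qid, [])
        # nlargest returns the topk items sorted from highest to lowest.
        merged[qid] = heapq.nlargest(topk, combined)
    return merged


shard_a = {"q1": [(0.9, "d2"), (0.5, "d3")]}
shard_b = {"q1": [(0.7, "d8")], "q2": [(0.1, "d4")]}
merged = merge_topk(shard_a, shard_b, topk=2)
```

Because each shard already holds its own top k, the merged top k over the union is the same as the top k over all documents.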

get_state_dict()

Export all the attributes of this class in a dictionary.

You can use the output of this method to recreate an identical instance of this class later.

Return type:

Dict

Returns:

A mapping from the attribute names of this class to their values.