ResultHeapq
- class trove.containers.result_heapq.ResultHeapq(topk=100, special_docids=None)
- __init__(topk=100, special_docids=None)
A simple class that uses heapq to keep track of the topk largest scores for each query.
You can also use an instance of this class as a callable in a PyTorch evaluation loop.
- Parameters:
topk (int) – Number of items to hold in the heapq for each query.
special_docids (Optional[Dict]) – Special set of documents for each query whose similarity scores are collected regardless of their ranking. Do not use this unless necessary, because it degrades performance. It can be in qrel format (i.e., special_docids[qid][docid] = some_score), in which case the score is ignored. It can also be a mapping from qid to a list of docids (i.e., special_docids[qid] = [docid1, docid2, ...]).
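The two accepted shapes of special_docids carry the same information once the scores are dropped. A minimal sketch of that equivalence follows; normalize_special_docids is a hypothetical helper written for illustration and is not part of trove.

```python
from typing import Dict, Set


def normalize_special_docids(special_docids) -> Dict[str, Set[str]]:
    """Hypothetical helper: reduce either input shape to {qid: set of docids}."""
    normalized = {}
    for qid, docs in special_docids.items():
        # Iterating a dict yields its keys (the docids), so qrel scores are
        # dropped; iterating a list yields the docids directly.
        normalized[qid] = set(docs)
    return normalized


qrel_style = {"q1": {"d1": 1, "d7": 0}}   # special_docids[qid][docid] = some_score
list_style = {"q1": ["d1", "d7"]}         # special_docids[qid] = [docid1, docid2, ...]

same = normalize_special_docids(qrel_style) == normalize_special_docids(list_style)
```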
- reset_state()
Clear the data collected so far.
- Return type:
None
- add_triplet(qid, docid, score)
Add (docid, score) to the heapq for qid.
- Parameters:
qid (str) – query id
docid (str) – document id
score (Union[int, float]) – similarity score between query and document.
- Return type:
None
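To illustrate the general technique of tracking the topk largest scores per query with heapq, here is a minimal sketch. TopKSketch is a hypothetical stand-in, not the actual ResultHeapq implementation, and storing items as (score, docid) tuples is an assumption.

```python
import heapq
from collections import defaultdict


class TopKSketch:
    """Hypothetical stand-in that keeps the topk largest scores per query."""

    def __init__(self, topk=100):
        self.topk = topk
        # One min-heap of (score, docid) tuples per qid (assumed layout).
        self.topk_docs = defaultdict(list)

    def add_triplet(self, qid, docid, score):
        heap = self.topk_docs[qid]
        item = (score, docid)
        if len(heap) < self.topk:
            heapq.heappush(heap, item)
        else:
            # Push the new item, then drop the smallest, so the heap
            # never holds more than topk entries.
            heapq.heappushpop(heap, item)


sketch = TopKSketch(topk=2)
for docid, score in [("d1", 0.2), ("d2", 0.9), ("d3", 0.5), ("d4", 0.1)]:
    sketch.add_triplet("q1", docid, score)

top = sorted(sketch.topk_docs["q1"], reverse=True)
```

Because the heap root is always the smallest retained score, each insertion costs O(log topk) regardless of how many documents are scored.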
- add_qrel_nested_dict(qrel_dict)
Import results in nested qrel dict format.
The input is similar to the output of the as_qrel_nested_dict() method.
- Parameters:
qrel_dict (Dict) – results in nested qrel dict format, where qrel_dict[qid][docid] is the similarity score between qid and docid. If qrel_dict contains the key topk_docs or special_docs, it assumes that each of qrel_dict['topk_docs'] and qrel_dict['special_docs'] is a separate qrel object and merges it into the corresponding internal collection. If neither key is present, it assumes that qrel_dict is a single qrel object and merges it into self.topk_docs.
- Return type:
None
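The dispatch rule described for qrel_dict can be sketched as follows. split_qrel_input is a hypothetical helper that only mirrors the documented behavior; how the library actually merges each part is not shown here.

```python
def split_qrel_input(qrel_dict):
    """Hypothetical helper: decide which collection each part of the input feeds."""
    known_keys = ("topk_docs", "special_docs")
    if any(key in qrel_dict for key in known_keys):
        # Each present key is treated as its own qrel object.
        return {key: qrel_dict.get(key, {}) for key in known_keys}
    # Otherwise the whole input is a single qrel object for topk_docs.
    return {"topk_docs": qrel_dict, "special_docs": {}}


flat = {"q1": {"d1": 0.9}}
nested = {
    "topk_docs": {"q1": {"d1": 0.9}},
    "special_docs": {"q1": {"d5": 0.1}},
}

from_flat = split_qrel_input(flat)
from_nested = split_qrel_input(nested)
```

One consequence of this rule is that a query id literally named 'topk_docs' or 'special_docs' would be read as a collection key, not as a query.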
- as_qrel_nested_dict(collection=None)
Export the collected results as nested qrel dicts.
- Parameters:
collection (Optional[Union[str, List[str]]]) –
The type of results to return. It can be one of the following:
None: return only the topk most similar documents
'topk_docs': same as None
'special_docs': return only the collected results for special documents
'all': return a dict with keys 'special_docs' and 'topk_docs' containing the corresponding results
- Return type:
Union[Dict[str, Dict[str, float]], Dict[str, Dict[str, Dict[str, float]]]]
- Returns:
The collected similarities so far in qrel format (i.e., qrel[qid][docid] = sim(qid, docid)). If asked for multiple collections of results, it returns a mapping from collection type to qrel.
- as_sorted_lists(reverse=False)
Export the topk collected so far as a sorted list for each query.
- Parameters:
reverse (bool) – Used to sort the scores, as in sorted(..., reverse=reverse).
- Return type:
Dict[str, List[Tuple]]
- Returns:
a mapping from qid to a list of (doc_score, doc_id) tuples. Each list is sorted by the doc_score field.
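As an illustration of the output shape, the sketch below sorts per-query lists of (doc_score, doc_id) tuples with Python's built-in sorted. The function name as_sorted_lists_sketch and the sample data are hypothetical; the library's internals may differ.

```python
# Assumed per-query storage: unsorted (doc_score, doc_id) tuples.
topk_docs = {
    "q1": [(0.5, "d3"), (0.9, "d2")],
    "q2": [(0.3, "d8")],
}


def as_sorted_lists_sketch(topk_docs, reverse=False):
    """Hypothetical helper: sort each query's tuples by score."""
    # Tuples compare element by element, so the score decides the order
    # and the doc id only breaks ties.
    return {qid: sorted(items, reverse=reverse) for qid, items in topk_docs.items()}


ascending = as_sorted_lists_sketch(topk_docs)
descending = as_sorted_lists_sketch(topk_docs, reverse=True)
```

With the default reverse=False the lowest score comes first, so a ranked list with the best document on top needs reverse=True in this sketch.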
- export_result_dump(reset_state=False)
Export all the results collected so far in the same format as stored in this class.
You can use this to export and merge the collected results from multiple collections, for example if you are doing a sharded evaluation in a distributed environment.
- Parameters:
reset_state (bool) – If true, clear the data collected so far.
- Return type:
Dict[str, Dict[str, List[Tuple[str, float]]]]
- Returns:
A dict with keys 'topk_docs' and 'special_docs', which hold the contents of the attributes of this class with the same names.
- merge_result_dump(results)
Merge the results exported from another instance of this class with its export_result_dump() method.
- Parameters:
results (Dict) – The exported results from another instance of ResultHeapq.
- Return type:
None
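The idea behind merging per-shard results in a sharded evaluation can be sketched with the standard library alone. merge_topk is a hypothetical helper, and the dump layout used here (qid mapped to a list of (score, docid) tuples) is an assumption made for illustration, not the exact format produced by export_result_dump().

```python
import heapq


def merge_topk(shard_dumps, topk):
    """Hypothetical helper: combine per-shard top-k lists into a global top-k."""
    pooled = {}
    for dump in shard_dumps:
        for qid, items in dump.items():
            # Pool candidates for the same query across shards.
            pooled.setdefault(qid, []).extend(items)
    # nlargest returns the topk highest-scoring tuples, best first.
    return {qid: heapq.nlargest(topk, items) for qid, items in pooled.items()}


shard_a = {"q1": [(0.9, "d2"), (0.5, "d3")]}
shard_b = {"q1": [(0.7, "d9")], "q2": [(0.4, "d1")]}

merged = merge_topk([shard_a, shard_b], topk=2)
```

Since each shard already keeps its own topk, the global topk for a query is always contained in the pooled candidates, so no shard needs to export its full score list.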
- get_state_dict()
Export all the attributes of this class in a dictionary.
You can use the output of this method to recreate an identical instance of this class later.
- Return type:
Dict
- Returns:
mapping from the name of attributes of this class to their value.