Monday, 30 March 2015

Week 2 Guiding Questions

What is the typical architecture of a text retrieval system?
A typical architecture of a text retrieval system includes:
              - tokenizer -> indexer (offline)
              - tokenizer -> scorer (online)
              - feedback process (offline and online)

What is an inverted index?
An inverted index maps every term in the vocabulary to the list of documents (postings) that contain it. It is the dominant indexing method for supporting basic search algorithms.
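
As a toy illustration (my own Python sketch, not from the lecture; here each posting stores just a docID and term frequency, though real systems may also store positions):

    docs = {
        1: "text retrieval is fun",
        2: "text mining and text retrieval",
        3: "evaluation of retrieval systems",
    }

    # Dictionary (vocabulary) -> postings list: the IDs of the documents
    # containing the term, each paired with the term's frequency in that document.
    inverted_index = {
        "text":       [(1, 1), (2, 2)],
        "retrieval":  [(1, 1), (2, 1), (3, 1)],
        "mining":     [(2, 1)],
        "evaluation": [(3, 1)],
        # ... one postings list for every remaining vocabulary term
    }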

Why is it desirable to compress an inverted index?
Inverted index compression is desirable mainly for speed: I/O takes more time than CPU decoding, so a smaller index means less I/O, not to mention the savings in disk space.
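
A minimal sketch of d-gaps plus gamma coding, using bit strings for readability (a real index would pack the bits; the helper names are mine):

    def to_dgaps(doc_ids):
        # Store the first docID as-is, then only the gaps between consecutive IDs.
        return [doc_ids[0]] + [b - a for a, b in zip(doc_ids, doc_ids[1:])]

    def gamma_encode(x):
        # Gamma code of x >= 1: the length of x's binary form in unary,
        # followed by the binary form without its leading 1-bit.
        binary = bin(x)[2:]                                # 9 -> "1001"
        return "1" * (len(binary) - 1) + "0" + binary[1:]  # 9 -> "1110" + "001"

    postings = [3, 7, 21, 22, 45]              # sorted docIDs for one term
    gaps = to_dgaps(postings)                  # [3, 4, 14, 1, 23]
    print("".join(gamma_encode(g) for g in gaps))

Gamma coding spends few bits on small numbers, and converting to d-gaps makes the numbers small for frequent terms, which is why the combination compresses postings well.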

How can we create an inverted index when the collection of documents does not fit into the memory?
When the collection does not fit into memory, we use sort-based methods on disk (a small sketch follows the steps below).
Sort-based methods:
– Step 1: Collect local (termID, docID, freq) tuples
– Step 2: Sort local tuples (to make “runs”)
– Step 3: Pair-wise merge runs
– Step 4: Output inverted file
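
Here is a rough Python sketch of the sort-based approach (my own illustration: documents arrive as (docID, text) pairs, terms are used directly instead of termIDs, and block_size stands in for the available memory):

    import heapq, itertools, os, tempfile
    from collections import Counter

    def write_run(tuples, run_dir, run_no):
        # Steps 1-2: collect local (term, docID, freq) tuples and sort them into a "run" on disk.
        path = os.path.join(run_dir, f"run{run_no}.txt")
        with open(path, "w") as f:
            for term, doc_id, freq in sorted(tuples):
                f.write(f"{term}\t{doc_id}\t{freq}\n")
        return path

    def read_run(path):
        with open(path) as f:
            for line in f:
                term, doc_id, freq = line.rstrip("\n").split("\t")
                yield term, int(doc_id), int(freq)

    def build_index(doc_stream, block_size=1000, run_dir=None):
        run_dir = run_dir or tempfile.mkdtemp()
        runs, block = [], []
        for doc_id, text in doc_stream:
            for term, freq in Counter(text.lower().split()).items():
                block.append((term, doc_id, freq))
            if len(block) >= block_size:
                runs.append(write_run(block, run_dir, len(runs)))
                block = []
        if block:
            runs.append(write_run(block, run_dir, len(runs)))
        # Step 3: merge the sorted runs (heapq.merge does a multi-way merge here
        # instead of the pair-wise merging described in the lecture).
        merged = heapq.merge(*(read_run(p) for p in runs))
        # Step 4: output the inverted file, one postings list per term.
        for term, group in itertools.groupby(merged, key=lambda t: t[0]):
            yield term, [(doc_id, freq) for _, doc_id, freq in group]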

How can we leverage an inverted index to score documents quickly?
– Exploit the inverted index to accumulate scores only for documents matching a query term (see the sketch below)
– Exploit Zipf's law to avoid touching the many documents that do not match any query term
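
A small sketch of the score accumulator idea (illustrative only; index maps a term to (docID, term frequency) postings, idf maps a term to its inverse document frequency, and the scoring formula is just a placeholder):

    from collections import defaultdict

    def score_query(query_terms, index, idf):
        # Accumulators are created only for documents that appear in some query
        # term's postings list; all other documents are never touched.
        acc = defaultdict(float)
        for term in query_terms:
            for doc_id, tf in index.get(term, []):
                acc[doc_id] += tf * idf.get(term, 0.0)   # simple TF-IDF-style score
        return sorted(acc.items(), key=lambda item: item[1], reverse=True)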

Why is evaluation so critical for research and application development in text retrieval?
Reason 1: Assess the actual utility of a TR system
Reason 2: Compare different systems and methods

How does Cranfield evaluation methodology work?
The Cranfield evaluation methodology works by building reusable test collections (documents, queries, and relevance judgments) and defining measures to quantify system performance.

How do we evaluate a set of retrieved documents?
Precision and Recall are the basic measures to evaluate a set of retrieved documents.

Precision measures to what extent the retrieved results are all relevant (are the retrieved results all relevant?).
Recall measures the completeness of coverage of the relevant documents (have all the relevant documents been retrieved?).
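
As a quick Python sketch (my own; retrieved and relevant are sets of document IDs):

    def precision_recall_f1(retrieved, relevant):
        hits = len(retrieved & relevant)                 # relevant documents that were retrieved
        precision = hits / len(retrieved) if retrieved else 0.0
        recall = hits / len(relevant) if relevant else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        return precision, recall, f1

    # 3 of the 5 retrieved documents are relevant; 4 documents are relevant overall.
    print(precision_recall_f1({1, 2, 3, 4, 5}, {2, 3, 5, 9}))   # (0.6, 0.75, ~0.67)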

How do you compute precision, recall, and F1?
Precision, recall, and F1 (the harmonic mean of precision and recall) are computed as follows:

Precision = (relevant documents retrieved) / (total documents retrieved)
Recall = (relevant documents retrieved) / (total relevant documents in the collection)
F1 = 2 * Precision * Recall / (Precision + Recall)

How do we evaluate a ranked list of search results?
We evaluate a ranked list of search results by looking at precision and recall at different cut-off points (wherever the user is assumed to stop browsing). We can also compare precision-recall (PR) curves, which showcase the utility of one TR system against another across the whole ranking. To summarize the curve with a single number we use average precision, which applies to one query (in contrast to mean average precision, which averages over many queries) and is sensitive to the rank of every relevant document, which makes it a good measure for comparing ranking methods.

How do you compute average precision? 
We compute average precision by summing the precision at the rank of each relevant document that is retrieved and dividing by the total number of relevant documents in the collection; relevant documents that are never retrieved therefore contribute zero.
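
In code (a sketch; ranked_rels holds the binary relevance judgments of the ranked results, top to bottom):

    def average_precision(ranked_rels, num_relevant):
        hits, total = 0, 0.0
        for rank, rel in enumerate(ranked_rels, start=1):
            if rel:
                hits += 1
                total += hits / rank            # precision at the rank of this relevant doc
        return total / num_relevant if num_relevant else 0.0

    # Relevant documents at ranks 1, 3 and 4; 5 relevant documents exist in total.
    print(average_precision([1, 0, 1, 1, 0], num_relevant=5))   # (1/1 + 2/3 + 3/4) / 5 ≈ 0.48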

How do you compute mean average precision (MAP) and geometric mean average precision (gMAP)?
MAP is computed by taking the arithmetic mean of average precision over a set of queries.
gMAP is computed by taking the geometric mean of average precision over the same set of queries.

In contrast to MAP, gMAP is affected more by low values, i.e., by queries on which the system performs poorly. If the goal is to improve a search engine on difficult queries, gMAP is preferred; which of the two to use ultimately depends on the user's needs.
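
Both means are easy to compute from the per-query AP values (a sketch; the epsilon guarding against AP = 0 in the geometric mean is my own assumption):

    import math

    def mean_average_precision(ap_values):
        # MAP: arithmetic mean of average precision over the queries.
        return sum(ap_values) / len(ap_values)

    def geometric_map(ap_values, eps=1e-6):
        # gMAP: geometric mean, computed via logs; eps avoids log(0).
        return math.exp(sum(math.log(ap + eps) for ap in ap_values) / len(ap_values))

    aps = [0.65, 0.50, 0.02]             # AP of three queries; the last one is a hard query
    print(mean_average_precision(aps))   # 0.39
    print(geometric_map(aps))            # ≈ 0.19, dragged down much more by the hard query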

What is mean reciprocal rank?
Mean reciprocal rank is the average reciprocal rank over a set of topics. Reciprocal rank = 1/r
where r is the rank position of the single relevant doc.
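
A sketch (each inner list holds the binary judgments of one topic's ranked results):

    def reciprocal_rank(ranked_rels):
        # 1/r, where r is the rank of the (single) relevant document; 0 if it was not retrieved.
        for rank, rel in enumerate(ranked_rels, start=1):
            if rel:
                return 1.0 / rank
        return 0.0

    def mean_reciprocal_rank(topics):
        return sum(reciprocal_rank(rels) for rels in topics) / len(topics)

    # Relevant doc at rank 2 for topic 1 and rank 1 for topic 2: MRR = (1/2 + 1/1) / 2
    print(mean_reciprocal_rank([[0, 1, 0], [1, 0, 0]]))   # 0.75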

Why is MAP more appropriate than precision at k documents when comparing two retrieval methods?
MAP is more appropriate because it reflects both precision and recall and is sensitive to the rank of every relevant document, whereas precision at k documents ignores everything below the cut-off.

Why is precision at k documents more meaningful than average precision from a user’s perspective?
Precision at k documents is more meaningful from a user's perspective because it describes exactly what the user sees: the fraction of the top k results (e.g., the first result page) that are relevant. Average precision depends on the total number of relevant documents in the collection, which the user never observes, so its value is harder to interpret directly.

How can we evaluate a ranked list of search results using multi-level relevance judgments?
We can evaluate a ranked list of search results using multi-level relevance judgments by using nDCG.
Main idea of nDCG@k documents:
– Measure the total utility of the top k documents (cut-off) to a user
– The utility of a lowly ranked document is discounted
– Normalize to ensure comparability across queries

How do you compute normalized discounted cumulative gain (nDCG)?
You compute nDCG@k as DCG@k divided by IdealDCG@k, where IdealDCG@k is the DCG@k of the ideal ranking (all judged documents sorted by relevance level).
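
A sketch of the computation (note that the discount convention varies; here rank 1 is not discounted and rank i >= 2 is divided by log2(i), while some definitions use log2(i + 1) for every rank):

    import math

    def dcg_at_k(rels, k):
        # rels: relevance levels (e.g. 0/1/2) of the ranked results, top to bottom.
        return sum(rel if i == 1 else rel / math.log2(i)
                   for i, rel in enumerate(rels[:k], start=1))

    def ndcg_at_k(rels, all_judged_rels, k):
        # IdealDCG@k comes from the ideal ranking: all judged documents sorted by relevance level.
        ideal_dcg = dcg_at_k(sorted(all_judged_rels, reverse=True), k)
        return dcg_at_k(rels, k) / ideal_dcg if ideal_dcg > 0 else 0.0

    # 3-level judgments: 2 = highly relevant, 1 = relevant, 0 = non-relevant.
    print(ndcg_at_k([2, 0, 1, 2], all_judged_rels=[2, 2, 1, 1, 0, 0], k=4))   # ≈ 0.71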

Why is normalization necessary in nDCG? Does MAP need a similar normalization?
Normalization is necessary to ensure comparability across queries: the raw DCG@k of a query depends on how many (highly) relevant documents it happens to have, so un-normalized values cannot be meaningfully averaged across queries. MAP does not need a similar step, because average precision uses binary judgments and is already divided by the total number of relevant documents, so it lies between 0 and 1 for every query.

Why is it important to perform a statistical significance test when we compare the retrieval accuracies of two search engine systems?
We need a statistical significance test because an average such as MAP can be misleading on its own: performance varies a great deal from query to query, and an observed difference between two systems may be caused by just a few queries. A significance test looks at the whole distribution of per-query scores and assesses the variance across queries, i.e., how likely it is that the observed difference arose by chance.
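
For example, with SciPy (an assumed dependency) one can run paired tests on the per-query average precision of the two systems; the numbers below are made up for illustration:

    from scipy import stats

    # Average precision of system A and system B on the same ten queries.
    ap_a = [0.20, 0.21, 0.22, 0.19, 0.17, 0.20, 0.21, 0.12, 0.30, 0.23]
    ap_b = [0.40, 0.41, 0.15, 0.35, 0.05, 0.40, 0.41, 0.10, 0.11, 0.22]

    # Paired tests: each query contributes one (A, B) pair of scores.
    t_stat, t_p = stats.ttest_rel(ap_a, ap_b)
    w_stat, w_p = stats.wilcoxon(ap_a, ap_b)
    print(f"paired t-test p = {t_p:.3f}, Wilcoxon signed-rank p = {w_p:.3f}")

A small p-value means the observed difference is unlikely to be due to chance alone.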
   
Key Phrases/Concepts
- Inverted index; postings
- Binary coding; unary coding; gamma-coding; d-gap
- Zipf’s law
- Cranfield evaluation methodology
- Precision; recall
- Average precision; mean average precision (MAP); geometric mean average precision (gMAP)
- Reciprocal rank; mean reciprocal rank
- F-measure
- Normalized discounted cumulative gain (nDCG)
- Statistical significance test
