Vector Similarity

Vector similarity measures the similarity between
- Two docs
- A question and a doc
- Docs within a genre
- Docs with the same sentiment
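
A minimal sketch of one way to compare two docs as vectors. The notes do not name a specific measure, so cosine similarity is used here as an assumption; the function name `cosine_similarity` and the sample sentences are illustrative only. Each doc is a sparse vector of raw term counts.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors given as {term: weight} dicts."""
    # Dot product over the terms of a; terms missing from b contribute 0.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Raw term counts as vector weights (an assumption; any weighting could be used).
doc1 = Counter("the cat sat on the mat".split())
doc2 = Counter("the cat lay on the rug".split())

print(cosine_similarity(doc1, doc2))
```

The same function works for a question and a doc: build a count vector for the question and compare it against each doc vector.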

Ad hoc Information Retrieval (IR)

Find the doc that best matches the query
Each doc is treated as an unordered set of terms (bag of words)
Each term in the doc is represented by a score (its weight in the vector)

TFIDF = a common term weight for the vector

Term frequency: number of times term t occurs in the document
Inverse document frequency:

$$
\mathrm{IDF}(t) = \log\left(\frac{\text{number of docs}}{\text{number of docs containing } t}\right)
$$

TFIDF(t) = TF(t) × IDF(t)
- Scores a term highly when it occurs frequently in the doc or query but in few docs overall
- Scores a term low when it is rare in the doc or occurs in most docs
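
The definitions above can be sketched as follows. This is an illustration under stated assumptions, not a reference implementation: the natural log is used (the notes do not fix a base), TF is the raw count, and the function name `tf_idf` and the toy corpus are made up for the example.

```python
import math
from collections import Counter


def tf_idf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per doc."""
    n_docs = len(docs)
    # Document frequency: number of docs that contain each term at least once.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)  # raw term counts in this doc
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights


# Toy corpus (hypothetical data for illustration).
corpus = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "the cat chased the dog".split(),
]
weights = tf_idf(corpus)

for term, weight in sorted(weights[0].items(), key=lambda kv: -kv[1]):
    print(f"{term}: {weight:.3f}")
```

In this corpus "the" occurs in every doc, so its IDF is log(3/3) = 0 and its weight vanishes despite its high term frequency, while "mat", which occurs in only one doc, gets the largest weight in the first doc.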

Evaluation Metrics: Precision, recall, F-measure

System Output = answers from a system
Answer Key = correct answers from humans
Correct = length (System Output ∩ Answer Key)
Precision = Correct ÷ length(System Output)
Recall = Correct ÷ length(Answer Key)
F = 2 ÷ (1/Precision + 1/Recall), i.e. the harmonic mean of Precision and Recall
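
A short sketch of these metrics, assuming System Output and Answer Key are sets of doc IDs. The function name `precision_recall_f`, the zero-division handling, and the sample IDs are assumptions made for the example.

```python
def precision_recall_f(system_output, answer_key):
    """Return (precision, recall, F) for two collections of answers."""
    system_output, answer_key = set(system_output), set(answer_key)
    correct = len(system_output & answer_key)
    # Guard against empty sets (an assumed convention: score 0 in that case).
    precision = correct / len(system_output) if system_output else 0.0
    recall = correct / len(answer_key) if answer_key else 0.0
    if precision == 0 or recall == 0:
        f = 0.0
    else:
        f = 2 / (1 / precision + 1 / recall)  # harmonic mean
    return precision, recall, f


# Hypothetical example: the system returns 4 docs, 2 of which are in the key of 3.
p, r, f = precision_recall_f({"d1", "d2", "d3", "d4"}, {"d2", "d4", "d5"})
print(p, r, f)
```

Here Correct = 2, so Precision = 2/4, Recall = 2/3, and F = 2 ÷ (2 + 1.5) = 4/7.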
