Vector Similarity

  • Vector similarity measures the similarity between

    • 2 docs

    • A question and a doc

    • Docs within a genre

    • Docs with the same sentiment

Ad hoc Information Retrieval (IR)

  • Find the doc that best matches the query

  • Each doc is treated as an unordered set of terms

  • Each term in the document is represented by a score (its weight in the doc's vector)
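
A minimal sketch of matching a query against docs once each is a vector of term scores. Cosine similarity is assumed here as the similarity measure; these notes do not name one. The names `cosine`, `best_match`, and the toy scores are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as {term: score}."""
    dot = sum(score * v.get(term, 0.0) for term, score in u.items())
    norm_u = math.sqrt(sum(score * score for score in u.values()))
    norm_v = math.sqrt(sum(score * score for score in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_u * norm_v)


def best_match(query, docs):
    """Return the name of the doc whose vector is most similar to the query."""
    return max(docs, key=lambda name: cosine(query, docs[name]))


# Toy term scores, made up for illustration.
docs = {
    "doc_a": {"cat": 2.0, "sat": 1.0},
    "doc_b": {"dog": 1.0, "sat": 1.0},
}
query = {"cat": 1.0}
winner = best_match(query, docs)
```

Because the vectors are dictionaries, term order plays no role, which matches the unordered view of a doc above.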

TFIDF = a common term weight for vectors

  • Term frequency: number of times term t occurs in the document

  • Inverse document frequency:

$$
\log\left(\frac{\#\ \text{docs}}{\#\ \text{docs containing } t}\right)
$$
  • TFIDF(t) = TF(t) × IDF(t)

    • Terms that occur frequently in the doc or query, but in few docs overall, score highly

    • Vice versa: terms that are rare in the doc, or that appear in most docs, score low
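
The weighting above can be sketched as follows, assuming raw counts for TF and the natural log for IDF (the notes do not fix a log base). The function name `tf_idf` and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized doc."""
    n_docs = len(docs)
    # Document frequency: number of docs that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)  # raw count of each term in this doc
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights


# Toy corpus, made up for illustration.
docs = [
    ["the", "cat", "sat"],
    ["the", "dog", "sat"],
    ["the", "cat", "ran", "ran"],
]
weights = tf_idf(docs)
```

In this corpus "the" appears in every doc, so its IDF is log(3/3) = 0 and its weight vanishes, while "ran" appears twice in a single doc and gets the largest weight, 2 × log(3).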

Evaluation Metrics: Precision, recall, F-measure

  • System Output = answers from a system

  • Answer Key = correct answers from humans

  • Correct = length(System Output ∩ Answer Key)

  • Precision = Correct ÷ length(System Output)

  • Recall = Correct ÷ length(Answer Key)

  • F = 2 ÷ (1/Precision + 1/Recall), the harmonic mean of precision and recall
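
A short sketch of the three metrics, assuming the system output and the answer key are both sets of item identifiers. The function name `precision_recall_f` and the sample ids are hypothetical, and returning 0 when a denominator would be zero is a convention chosen here, not something the notes specify.

```python
def precision_recall_f(system_output, answer_key):
    """Compute precision, recall, and F from two collections of answers."""
    system_output, answer_key = set(system_output), set(answer_key)
    correct = len(system_output & answer_key)
    precision = correct / len(system_output) if system_output else 0.0
    recall = correct / len(answer_key) if answer_key else 0.0
    if precision == 0.0 or recall == 0.0:
        f = 0.0  # harmonic mean is taken as 0 when either part is 0
    else:
        f = 2 / (1 / precision + 1 / recall)
    return precision, recall, f


# Sample ids, made up for illustration.
p, r, f = precision_recall_f({"d1", "d2", "d3", "d4"}, {"d2", "d3", "d5"})
```

Here two of the four system answers are in the answer key, so precision is 2/4, recall is 2/3, and F is 2 ÷ (2 + 1.5) = 4/7.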