Given
set of documents about topic (foreground)
set of documents about diverse topics (background)
Find ranked list of terms
Uses
terms for previously described tasks, search terms, forecasting, etc.
In-line term system
Find instances of terms (tokens)
Distributional Term system
Find term types
Ranks term types by how characteristic they are of a particular topic
Top n term types are kept; the rest are discarded
Uses metrics similar to TF-IDF
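A minimal sketch of the distributional ranking, assuming each document has already been reduced to a list of candidate terms. The scoring formula (foreground frequency times a smoothed log inverse background document frequency) is one TF-IDF-like choice picked for illustration, not necessarily the metric the system uses; `rank_terms` and `top_n` are hypothetical names.

```python
import math
from collections import Counter

def rank_terms(foreground, background, top_n=10):
    """Rank term types by how characteristic they are of the foreground.

    foreground, background: lists of documents, each a list of candidate terms.
    """
    # Term frequency over the whole foreground set.
    fg_tf = Counter(term for doc in foreground for term in doc)
    # Number of background documents containing each term.
    bg_df = Counter(term for doc in background for term in set(doc))
    n_bg = len(background)
    # Assumed TF-IDF-like score: frequent in foreground, rare in background.
    scores = {
        term: tf * math.log((1 + n_bg) / (1 + bg_df[term]))
        for term, tf in fg_tf.items()
    }
    # Highest score first; ties broken alphabetically for stable output.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:top_n]
```

Terms that appear in most background documents score near zero, so generic words drop out of the top n even when they are frequent in the foreground.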
Manual rule based chunker
identifies sequences of nouns and adjectives using POS tags
Identify technical words
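The rule-based chunker can be sketched as follows, assuming the input is already POS-tagged with Penn Treebank style tags (JJ* for adjectives, NN* for nouns). The requirement that a chunk end in a noun is an added assumption for illustration, and `chunk_noun_groups` is a hypothetical name.

```python
def chunk_noun_groups(tagged):
    """Return maximal runs of adjectives and nouns that end in a noun.

    tagged: list of (word, tag) pairs with Penn Treebank style tags (assumed).
    """
    chunks, current = [], []
    # A sentinel token flushes the last run at the end of the input.
    for word, tag in list(tagged) + [("", "END")]:
        if tag.startswith(("JJ", "NN")):
            current.append((word, tag))
            continue
        # Assumption: trim trailing adjectives so every chunk ends in a noun.
        while current and not current[-1][1].startswith("NN"):
            current.pop()
        if current:
            chunks.append(" ".join(w for w, _ in current))
        current = []
    return chunks
```

Any tagger that emits (word, tag) pairs could feed this function; the tag prefixes would need adjusting for a different tag set.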
well-formedness filter
eliminates ill-formed terms, terms without OOV or technical words, or names
Supplementary patterns
abbreviation patterns
terms matching regex patterns
A term is well formed if it:
is an abbreviation
is a single OOV (out-of-vocabulary) word
matches a regex pattern
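The three conditions listed above can be sketched as a single predicate. The abbreviation regex, the supplementary pattern, and the `vocabulary` set are all hypothetical stand-ins; the notes do not specify the actual patterns or word list.

```python
import re

# Assumed abbreviation shape: 2 to 6 uppercase letters or digits, starting with a letter.
ABBREVIATION = re.compile(r"^[A-Z][A-Z0-9]{1,5}$")
# Hypothetical supplementary pattern, e.g. "interleukin-2".
SUPPLEMENTARY = [re.compile(r"^[a-z]+-\d+$")]

def is_well_formed(term, vocabulary):
    """Apply the three well-formedness conditions from the notes.

    vocabulary: set of known lowercase words; anything outside it counts as OOV.
    """
    if ABBREVIATION.match(term):
        return True                      # condition 1: abbreviation
    if len(term.split()) == 1 and term.lower() not in vocabulary:
        return True                      # condition 2: a single OOV word
    # condition 3: matches one of the supplementary regex patterns
    return any(p.match(term) for p in SUPPLEMENTARY)
```

In a full filter this check would run alongside the chunker output, so multi-word chunks are accepted on other grounds and only the leftover candidates depend on these three rules.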
Time consuming
Yahoo search for an exact match of the term
Calculate relevance
Relevance = H^2 * T
H = 0-1 score based on a log function
T = percentage of top 10 hits that are articles or patents
Based on keyword search
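A minimal sketch of the relevance score H^2 * T. The notes only say the 0-1 score is based on a log function; this sketch assumes it is derived from the search hit count and saturates at a hypothetical `saturation` value. The search itself is not shown: `hits` and `top_results_are_technical` are assumed to come from whatever search backend is used.

```python
import math

def hit_score(hits, saturation=1_000_000):
    """Map a raw hit count to [0, 1] on a log scale (assumed form of H)."""
    if hits <= 0:
        return 0.0
    return min(1.0, math.log10(hits + 1) / math.log10(saturation + 1))

def relevance(hits, top_results_are_technical):
    """Compute H^2 * T for one term.

    hits: number of exact-match search hits for the term (assumed input to H).
    top_results_are_technical: booleans, one per top hit, True if the hit
        is an article or patent; only the first 10 are used.
    """
    h = hit_score(hits)
    top = top_results_are_technical[:10]
    # T is the fraction of the top hits that are articles or patents.
    t = sum(top) / len(top) if top else 0.0
    return h ** 2 * t
```

Squaring H penalizes terms with few hits more strongly than a linear weight would, while T keeps common but non-technical phrases from scoring well.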