How do we predict sentences of words? How do we distinguish sentences?
Statistically
N-gram probability predicts the occurrence of a word based on the occurrence of the N-1 previous words
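The N-gram idea above can be written out as a formula. This is a sketch in common textbook notation, not notation taken from these notes: w_n is the n-th word, and C(...) is an assumed symbol for how often a word sequence occurs in the training data. The second step is the usual maximum-likelihood estimate from counts.

```latex
P(w_n \mid w_1, \dots, w_{n-1})
  \approx P(w_n \mid w_{n-N+1}, \dots, w_{n-1})
  = \frac{C(w_{n-N+1} \dots w_{n-1}\, w_n)}{C(w_{n-N+1} \dots w_{n-1})}
```

For N = 2 (a bigram model) this reduces to P(w_n | w_{n-1}) = C(w_{n-1} w_n) / C(w_{n-1}).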
Syntactically
Depends on the logic of the writer
Phrases are sequences of words that form equivalence classes based on how all phrases fit together
Replacing an NP in a well-formed sentence with another NP will probably result in a well-formed sentence.
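A toy illustration of the NP substitution point above. The sentence frame and the list of noun phrases are made up for this sketch, and `substitute` is a hypothetical helper. The code only swaps strings; it does not check grammaticality.

```python
# Toy data: every entry is an NP, so each should fit the same slot.
noun_phrases = ["the dog", "my older sister", "a small red car"]

# A sentence frame with one NP slot (made-up example).
frame = "{phrase} disappeared around the corner"


def substitute(frame, phrases):
    """Fill the NP slot of the frame with each phrase in turn."""
    return [frame.format(phrase=p) for p in phrases]


sentences = substitute(frame, noun_phrases)
for s in sentences:
    print(s)
```

Each printed sentence keeps the same structure, which is the sense in which the NPs behave as one equivalence class. The notes say "probably" because substitution can still fail, for example on agreement ("the dogs runs").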
Probability distribution over sequences of words
Used to rank words by likelihood
Statistics on words are derived from a training corpus (Brown Corpus)
Then used to make predictions on other corpora
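A minimal sketch of the train-then-predict workflow above, assuming a bigram model (N = 2) with plain maximum-likelihood counts and no smoothing. The three training sentences are made up and stand in for a real corpus. `train_bigrams` and `rank_next_words` are hypothetical helper names, and the `<s>` / `</s>` boundary markers are a common convention rather than something from the notes.

```python
from collections import Counter, defaultdict


def train_bigrams(sentences):
    """Count bigrams; each sentence is a list of word tokens."""
    counts = defaultdict(Counter)
    for tokens in sentences:
        # Pad with assumed start/end markers so boundaries are counted too.
        padded = ["<s>"] + tokens + ["</s>"]
        for prev, word in zip(padded, padded[1:]):
            counts[prev][word] += 1
    return counts


def rank_next_words(counts, prev):
    """Return (word, probability) pairs for words seen after prev, most likely first."""
    following = counts.get(prev)
    if not following:
        return []  # prev never seen in training: nothing to rank
    total = sum(following.values())
    return [(w, c / total) for w, c in following.most_common()]


# Toy training corpus (made up, stands in for a real one).
train = [
    "the cat sat on the mat".split(),
    "the cat ate the fish".split(),
    "the dog sat on the rug".split(),
]

model = train_bigrams(train)
ranking = rank_next_words(model, "the")
for word, prob in ranking:
    print(f"{word}\t{prob:.3f}")
```

With real data the training sentences could be read from a corpus such as Brown, and the ranking would then be queried on text from a different corpus. Words or contexts never seen in training get no probability here, which is why practical models add smoothing.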