Distribution of Words in Sentences: N-grams, Phrase Structure Syntax and Parsing

  • How do we predict sequences of words in a sentence? How do we distinguish well-formed sentences from ill-formed ones?

  • Statistically

    • An N-gram probability predicts the occurrence of a word based on the N-1 words that precede it

  • Syntactically

    • Depends on the writer's logic: how the words are organized into phrases

    • Phrases are sequences of words that form equivalence classes (e.g., noun phrases, NPs) based on how they fit together with other phrases

    • Replacing an NP in a well-formed sentence with another NP will probably result in a well-formed sentence.
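A minimal sketch of the NP-substitution idea, assuming a hypothetical toy grammar in Chomsky normal form. The rules, the lexicon and the function name `cky_recognize` are illustrative and are not taken from these notes. The CKY recognizer checks whether the grammar can derive a word sequence as a sentence (S). Swapping one NP for another keeps the sentence recognized, while putting a bare noun where an NP belongs does not.

```python
from itertools import product

# Hypothetical toy grammar in Chomsky normal form (binary rules only).
BINARY_RULES = {
    ("NP", "VP"): {"S"},
    ("Det", "N"): {"NP"},
    ("V", "NP"): {"VP"},
}
# Hypothetical lexicon: word -> set of categories it can have.
LEXICON = {
    "the": {"Det"}, "a": {"Det"},
    "dog": {"N"}, "cat": {"N"},
    "saw": {"V"}, "chased": {"V"},
    "Mary": {"NP"},
}

def cky_recognize(words, start="S"):
    """Return True if the toy grammar derives `words` from `start` (CKY recognizer)."""
    n = len(words)
    if n == 0:
        return False
    # table[i][j] holds every category that can span words[i:j]
    table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        table[i][i + 1] = set(LEXICON.get(w, set()))
    # Fill longer spans by combining two adjacent shorter spans at split point k
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for left, right in product(table[i][k], table[k][j]):
                    table[i][j] |= BINARY_RULES.get((left, right), set())
    return start in table[0][n]

print(cky_recognize("the dog chased a cat".split()))  # True
print(cky_recognize("the dog chased Mary".split()))   # True: NP "a cat" replaced by NP "Mary"
print(cky_recognize("the dog chased cat".split()))    # False: bare N "cat" is not an NP
```

Because the grammar refers only to categories such as NP, any constituent labeled NP can fill any NP slot. This is the equivalence-class behavior described above.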

Statistical Model

  • A probability distribution over sequences of words

  • Used to rank words by likelihood
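A minimal sketch of an N-gram model with N = 2 (a bigram model), assuming maximum-likelihood estimates from counts: P(w_i | w_{i-1}) = count(w_{i-1} w_i) / count(w_{i-1}). The probability of a sentence is the product of these conditional probabilities, which is the chain rule under the bigram approximation. Candidate next words are ranked by the same estimate. The `<s>`/`</s>` boundary markers, the toy corpus and all function names are assumptions for illustration.

```python
from collections import Counter

BOS, EOS = "<s>", "</s>"  # assumed sentence-boundary markers

def train_bigram(sentences):
    """Count histories and bigrams; return an MLE estimator P(w | prev) and the bigram counts."""
    history = Counter()  # how often each token appears as the previous word
    bigram = Counter()   # how often each (prev, w) pair occurs
    for sent in sentences:
        tokens = [BOS] + sent + [EOS]
        history.update(tokens[:-1])
        bigram.update(zip(tokens[:-1], tokens[1:]))

    def prob(prev, w):
        # Unseen history: no evidence, so return 0.0 (no smoothing in this sketch)
        if history[prev] == 0:
            return 0.0
        return bigram[(prev, w)] / history[prev]

    return prob, bigram

def sentence_prob(prob, sent):
    """Chain rule under the bigram approximation: product of P(w_i | w_{i-1})."""
    tokens = [BOS] + sent + [EOS]
    p = 1.0
    for prev, w in zip(tokens[:-1], tokens[1:]):
        p *= prob(prev, w)
    return p

def rank_next(prob, bigram, prev):
    """Words seen after `prev`, most likely first (ties broken alphabetically)."""
    candidates = {w for (h, w) in bigram if h == prev}
    return sorted(candidates, key=lambda w: (-prob(prev, w), w))

train = [s.split() for s in ["the dog barks", "the dog runs", "the cat runs"]]
prob, bigram = train_bigram(train)
print(prob("the", "dog"))                           # 2/3, about 0.667
print(sentence_prob(prob, "the dog runs".split()))  # 1 * 2/3 * 1/2 * 1, about 0.333
print(rank_next(prob, bigram, "the"))               # ['dog', 'cat']
```

Raising N conditions each word on a longer history, at the cost of needing many more counts to estimate reliably.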

Training vs Dev/Test corpus

  • Statistics on words are derived from a training corpus (e.g., the Brown corpus)

  • These statistics are then used to make predictions on other corpora (dev/test)
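A minimal sketch of the training vs. dev/test setup, assuming a random shuffle and hypothetical 80/10/10 split fractions. It also measures how many bigrams in a held-out corpus never appear in the training corpus. The function names and the toy sentences are illustrative. If NLTK is installed and its Brown corpus data has been downloaded, `nltk.corpus.brown.sents()` returns sentences in the same list-of-tokens form and could be passed to `split_corpus` in place of the toy data.

```python
import random
from collections import Counter

def split_corpus(sents, dev_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle sentences and split into train/dev/test (fractions are assumed defaults)."""
    sents = list(sents)
    random.Random(seed).shuffle(sents)
    n_test = int(len(sents) * test_frac)
    n_dev = int(len(sents) * dev_frac)
    test = sents[:n_test]
    dev = sents[n_test:n_test + n_dev]
    train = sents[n_test + n_dev:]
    return train, dev, test

def bigrams(sent):
    tokens = ["<s>"] + sent + ["</s>"]  # assumed boundary markers
    return list(zip(tokens[:-1], tokens[1:]))

def unseen_bigram_rate(train_sents, test_sents):
    """Fraction of bigram tokens in the held-out corpus never seen in training."""
    seen = Counter(b for s in train_sents for b in bigrams(s))
    held_out = [b for s in test_sents for b in bigrams(s)]
    if not held_out:
        return 0.0
    return sum(1 for b in held_out if seen[b] == 0) / len(held_out)

train = [s.split() for s in ["the dog barks", "the dog runs", "the cat runs"]]
test = [s.split() for s in ["the cat barks"]]
print(unseen_bigram_rate(train, test))  # 0.25: ("cat", "barks") never occurs in training
```

Under pure maximum-likelihood estimates, the held-out sentence "the cat barks" would get probability 0 even though it is well-formed. This is why statistics taken from a training corpus are usually smoothed before being applied to other corpora.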