For this reason, a special kind of dictionary called a defaultdict is available. The above examples specified the default value of a dictionary entry to be the default value of a particular data type. However, we can specify any default value we like, simply by providing the name of a function that can be called with no arguments to create the required value.

Consider the following analysis involving

By convention in NLTK, a tagged token is represented using a tuple consisting of the token and the tag. We can create one of these special tuples from the standard string representation of a tagged token, using the function str2tuple(). Other corpora use a variety of formats for storing part-of-speech tags.

It is like a conventional dictionary, in that it gives you an efficient way to look things up. However, as we see from 3.1, it has a much wider range of uses. As we saw above, this gives us the key-value pairs.
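The default-value mechanism described above can be sketched as follows, assuming the special dictionary in question is Python's collections.defaultdict. The words and the "NOUN" default are illustrative choices, not corpus data.

```python
from collections import defaultdict

# A defaultdict takes a zero-argument callable that builds the default value.
counts = defaultdict(int)            # int() returns 0
for word in ["the", "cat", "the"]:
    counts[word] += 1

# Any zero-argument function works, so the default can be an arbitrary value.
pos = defaultdict(lambda: "NOUN")    # illustrative default tag (an assumption)
pos["colorless"] = "ADJ"
pos["ideas"]                         # missing key: an entry is created with "NOUN"

# items() yields the key-value pairs.
pairs = sorted(pos.items())
print(pairs)
```

Note that merely looking up a missing key adds it to the dictionary, which is why "ideas" appears among the key-value pairs.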

In contrast with the file fragment shown above, the corpus reader for the Brown Corpus represents the data as shown below.
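To make the contrast concrete, here is a minimal sketch of turning word/tag strings into (word, tag) tuples. NLTK ships its own function for this, nltk.tag.str2tuple; the helper split_tagged_token below is a hypothetical stand-in so that the example runs without NLTK, and the input line is made up for illustration rather than copied from the corpus.

```python
def split_tagged_token(text, sep="/"):
    """Split 'word/tag' into a (word, TAG) tuple at the last separator."""
    word, found, tag = text.rpartition(sep)
    if not found:
        # No separator present: treat the whole string as an untagged word.
        return (text, None)
    return (word, tag.upper())

# Illustrative line in a slash-separated format (not real corpus data).
raw = "The/at grand/jj jury/nn commented/vbd"
tagged = [split_tagged_token(token) for token in raw.split()]
print(tagged)
```

Splitting at the last separator keeps tokens that themselves contain a slash, such as "1/2/cd", in one piece.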

This will be useful when we come to developing automatic taggers, as they are trained and tested on lists of sentences, not words. Let's inspect some tagged text to see what parts of speech occur before a noun, with the most frequent ones first.

To begin with, we construct a list of bigrams whose members are themselves word-tag pairs. Note that the items being counted in the frequency distribution are word-tag pairs.
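The bigram approach to finding what precedes a noun can be sketched with the standard library alone. The tagged sentence is hand-made for illustration, and Counter stands in for a frequency distribution; with NLTK one would typically draw the pairs from a tagged corpus instead.

```python
from collections import Counter

# Hand-made tagged text (illustrative, not from a corpus).
tagged = [
    ("The", "DET"), ("old", "ADJ"), ("dog", "NOUN"),
    ("chased", "VERB"), ("the", "DET"), ("cat", "NOUN"),
    ("into", "ADP"), ("a", "DET"), ("garden", "NOUN"),
]

# Each bigram is a pair of (word, tag) pairs.
word_tag_pairs = list(zip(tagged, tagged[1:]))

# Keep the tag of the first member whenever the second member is a noun.
noun_preceders = [a[1] for (a, b) in word_tag_pairs if b[1] == "NOUN"]

freq = Counter(noun_preceders)
ranked = [tag for tag, _ in freq.most_common()]  # most frequent first
print(ranked)
```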

Since words and tags are paired, we can treat the word as a condition and the tag as an event, and initialize a conditional frequency distribution with a list of condition-event pairs.
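As a rough sketch of this idea without NLTK, a defaultdict of Counter objects can play the role of a conditional frequency distribution: the outer key is the condition (the word) and the inner Counter tallies the events (the tags). The tagged tokens are invented for illustration.

```python
from collections import Counter, defaultdict

# Hand-made (word, tag) pairs; "run" appears as both verb and noun.
tagged = [
    ("I", "PRON"), ("run", "VERB"), ("a", "DET"), ("long", "ADJ"),
    ("run", "NOUN"), ("and", "CONJ"), ("they", "PRON"), ("run", "VERB"),
]

# Condition = word, event = tag.
cfd = defaultdict(Counter)
for word, tag in tagged:
    cfd[word.lower()][tag] += 1

# Frequency-ordered list of tags for one word.
run_tags = [tag for tag, _ in cfd["run"].most_common()]
print(run_tags)
```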

This lets us see a frequency-ordered list of tags given a word.

We can reverse the order of the pairs, so that the tags are the conditions, and the words are the events. We will do this for the WSJ tagset rather than the universal tagset.

Finally, let's look for words that are highly ambiguous as to their part-of-speech tag.
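Both steps, reversing the pairs and searching for ambiguous words, can be sketched with plain Python containers. The tokens below are hand-made with Penn Treebank style tags for illustration, and the threshold of three distinct tags is an arbitrary assumption.

```python
from collections import Counter, defaultdict

# Hand-made (word, tag) pairs (illustrative, not corpus data).
tagged = [
    ("that", "DT"), ("cut", "NN"), ("was", "VBD"), ("deep", "JJ"),
    ("they", "PRP"), ("cut", "VBD"), ("costs", "NNS"),
    ("to", "TO"), ("cut", "VB"),
]

by_tag = defaultdict(Counter)   # condition = tag, event = word
by_word = defaultdict(set)      # distinct tags seen for each word
for word, tag in tagged:
    by_tag[tag][word] += 1
    by_word[word].add(tag)

# Words observed with the past-tense verb tag.
vbd_words = sorted(by_tag["VBD"])

# Words carrying at least three different tags count as highly ambiguous here.
ambiguous = sorted(w for w, tags in by_word.items() if len(tags) >= 3)
print(vbd_words, ambiguous)
```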

These techniques are useful in many areas, and tagging gives us a simple context in which to present them.

