.. -*- mode: rst -*-
.. include:: ../definitions.txt

========================
9. Feature Based Grammar
========================

--------------
 Introduction
--------------


-----------------
Agreement in CFGs
-----------------

The Problem
-----------

Consider the following contrasts:

.. _thisdog:
.. ex::
   .. ex::
      this dog
   .. ex::
      \*these dog

.. _thesedogs:
.. ex::
   .. ex::
      these dogs
   .. ex::
      \*this dog

In English, nouns are usually morphologically marked as being singular
or plural. The form of the demonstrative also varies in a
similar way; there is a singular form `this`:lx: and a plural form `these`:lx:.
Examples thisdog_ and thesedogs_ show that there are constraints on
the realization of demonstratives and nouns within a noun phrase:
either both are singular or both are plural. A similar kind
of constraint is observed with subjects and predicates:

.. _subjpredsg:
.. ex::
   .. ex::
      the dog runs
   .. ex::
      \*the dog run

.. _subjpredpl:
.. ex::
   .. ex::
      the dogs run
   .. ex::
      \*the dogs runs


Here again, we can see that morphological properties of the verb co-vary
with morphological properties of the subject noun phrase; this co-variance is
usually termed `agreement`:dt: The element which determines the
agreement, here the subject noun phrase, is called the agreement
`controller`:dt:, while the element whose form is determined by
agreement, here the verb, is called the `target`:dt:.
If we look further at verb agreement in English, we will see that
present tense verbs typically have two inflected forms: one for third person
singular, and another for every other combination of person and
number:

.. ex::
 +--------+-------------+--------+
 |        |singular     |plural  |
 +--------+-------------+--------+
 |1st per |I run        |we run  |
 +--------+-------------+--------+
 |2nd per |you run      |you run |
 +--------+-------------+--------+
 |3rd per |he/she/it    |they run|
 |        |runs         |        |
 +--------+-------------+--------+

We can make the role of morphological properties a bit more explicit as
illustrated in runs_ and run_. These representations indicate that the verb agrees with its
subject in person and number.

.. _runs:
.. ex::

 +----------+----------+----------+
 |the       |dog       |run-s     |
 +----------+----------+----------+
 |          |dog.3.SG  |run-3.SG  |
 +----------+----------+----------+


.. _run:
.. ex::

 +------------+------------+------------+
 |the         |dog-s       |run         |
 +------------+------------+------------+
 |            |dog-3.PL    |run.3.PL    |
 +------------+------------+------------+


Despite the undoubted interest of agreement as a topic in its own
right, we have introduced it here for another reason: we want to look
at what happens when  we try encode agreement constraints in a context-free grammar. 
Suppose we take as our starting point the very simple CFG in agcfg0_.

.. _agcfg0:
.. ex::
   .. parsed-literal::

     `S`:gc: |rarr| `NP VP`:gc:
     `NP`:gc: |rarr| `Det N`:gc: 
     `VP`:gc: |rarr| `V`:gc: 

     `Det`:gc: |rarr| 'this'
     `N`:gc: |rarr| 'dog'
     `V`:gc: |rarr| 'runs'

agcfg0_ allows us to generate the sentence `this dog runs`:lx:;
however, what we really want to do is also generate `these dogs
run`:lx: while blocking unwanted strings such as `*this dogs run`:lx:
and `*these dog runs`:lx:. The most straightforward approach is to
add new non-terminals and productions to the grammar which reflect our
number distinctions and agreement constraints (we ignore person for the time being):

.. _agcfg1:
.. ex::
   .. parsed-literal::

     `S_SG`:gc: |rarr| `NP_SG VP_SG`:gc:
     `S_PL`:gc: |rarr| `NP_PL VP_PL`:gc:
     `NP_SG`:gc: |rarr| `Det_SG N_SG`:gc: 
     `NP_PL`:gc: |rarr| `Det_PL N_PL`:gc: 
     `VP_SG`:gc: |rarr| `V_SG`:gc: 
     `VP_PL`:gc: |rarr| `V_PL`:gc: 

     `Det_SG`:gc: |rarr| 'this'
     `Det_PL`:gc: |rarr| 'these'
     `N_SG`:gc: |rarr| 'dog'
     `N_PL`:gc: |rarr| 'dogs'
     `V_SG`:gc: |rarr| 'runs'
     `V_PL`:gc: |rarr| 'run'

It should be clear that this grammar will do the required
task, but only at the cost of duplicating our previous set of
rules. Rule multiplication is of course more severe if we add in
person agreement constraints.

Exercises
---------

#. Augment agcfg1_ so that it will generate strings like `I am
   happy`:lx: and `she is happy`:lx: but not `*you is happy`:lx: or
   `*they am happy`:lx:.

#. Augment agcfg1_ so that it will correctly describe the following
   Spanish noun phrases:

   .. ex::
      .. ex::

	 +---------------------+--------------------+--------------------+
	 |un                   |cuadro              |hermos-o            |
	 +---------------------+--------------------+--------------------+
	 |INDEF.SG.MASC        |picture             |beautiful-SG.MASC   |
	 +---------------------+--------------------+--------------------+
	 |'a beautiful picture'                     |                    |
	 +------------------------------------------+--------------------+

      .. ex::

	 +---------------------+--------------------+--------------------+
	 |un-os                |cuadro-s            |hermos-os           |
	 +---------------------+--------------------+--------------------+
	 |INDEF-PL.MASC        |picture-PL          |beautiful-PL.MASC   |
	 +---------------------+--------------------+--------------------+
	 |'beautiful pictures'                      |                    |
	 +------------------------------------------+--------------------+

      .. ex::

	 +---------------------+--------------------+--------------------+
	 |un-a                 |cortina             |hermos-a            |
	 +---------------------+--------------------+--------------------+
	 |INDEF-SG.FEM         |curtain             |beautiful-SG.FEM    |
	 +---------------------+--------------------+--------------------+
	 |'a beautiful curtain'                     |                    |
	 +------------------------------------------+--------------------+

      .. ex::

	 +---------------------+--------------------+--------------------+
	 |un-as                |cortina-s           |hermos-as           |
	 +---------------------+--------------------+--------------------+
	 |INDEF-PL.FEM         |curtain-PL          |beautiful-SG.FEM    |
	 +---------------------+--------------------+--------------------+
	 |'beautiful curtains'                      |                    |
	 +------------------------------------------+--------------------+

.. In grammatical terms, we might say that both nouns and
   demonstratives have a property of `number`:gc: which can take the values singular or
   plural.


Using Attributes and Constraints
--------------------------------

We spoke informally of linguistic categories having *properties*; for
example, that a verb has the property of being plural. Let's try to
make this more explicit:

.. _num0:
.. ex::
   .. parsed-literal::

     `N`:gc:\ [`num`:feat: = `pl`:fval:\ ]

In num0_, we have introduced some  new notation which says that the category `N`:gc: has a 
`feature`:dt: called `num`:feat: (short for 'number') and that the
value of this feature is `pl`:fval: (short for 'plural'). We can add
similar annotations to other categories, and use them in lexical
entries:

.. _agcfg2:
.. ex::
   .. parsed-literal::

     `Det`:gc:\ [`num`:feat: = `sg`:fval:\ ] |rarr| 'this'
     `Det`:gc:\ [`num`:feat: = `pl`:fval:\ ]  |rarr| 'these'
     `N`:gc:\ [`num`:feat: = `sg`:fval:\ ] |rarr| 'dog'
     `N`:gc:\ [`num`:feat: = `pl`:fval:\ ] |rarr| 'dogs'
     `V`:gc:\ [`num`:feat: = `sg`:fval:\ ] |rarr| 'runs'
     `V`:gc:\ [`num`:feat: = `pl`:fval:\ ] |rarr| 'run'

Does this help at all? So far, it looks just like a slightly more
verbose alternative to what was specified in agcfg1_. Things become
more interesting when we allow *variables* over feature values, and use
these to state constraints. This is illustrated in agcfg3_.

.. _agcfg3:
.. ex::
   .. _srule:
   .. ex::
      .. parsed-literal::

        `S`:gc: |rarr| `NP`:gc:\ [`num`:feat: = `?n`:math:\ ] `VP`:gc:\ [`num`:feat: = `?n`:math:\ ]

   .. _nprule:
   .. ex::
      .. parsed-literal::

       `NP`:gc:\ [`num`:feat: = `?n`:math:\ ] |rarr| `Det`:gc:\ [`num`:feat: = `?n`:math:\ ] `N`:gc:\ [`num`:feat: = `?n`:math:\ ]

   .. _vprule:
   .. ex::
      .. parsed-literal::

       `VP`:gc:\ [`num`:feat: = `?n`:math:\ ] |rarr| `V`:gc:\ [`num`:feat: = `?n`:math:\ ]

We are using '`?n`:math:' as a variable over values of `num`:feat:; it can
be instantiated either to `sg`:fval: or `pl`:fval:. Its scope is
limited to individual rules. That is, within srule_, for example,
`?n`:math: must be instantiated to the same constant value; we can
read the rule as saying that whatever value `NP`:gc: takes for the feature
`num`:feat:, `VP`:gc: must take the same value. 

In order to understand how these feature constraints work, it's
helpful to think about how one would go about building a tree. Lexical
rules will admit the following local trees (trees of
depth one):

.. ex::
   .. _this:
   .. ex:: 
      .. tree:: (Det[num=sg] this) 
   .. _these:
   .. ex:: 
      .. tree:: (Det[num=pl] these) 

.. ex::
   .. _dog:
   .. ex:: 
      .. tree:: (N[num=sg] dog) 
   .. _dogs:
   .. ex:: 
      .. tree:: (N[num=pl] dogs) 

Now nprule_ says that whatever the `num`:feat: values of `N`:gc: and
`Det`:gc: are, they have to be the same. Consequently,  nprule_ will
permit this_ and dog_ to be combined into an `NP`:gc:  as shown in
good1_ and it will also allow these_ and dogs_ to be combined, as in
good2_. By contrast,  bad1_ and bad2_ are prohibited because the roots
of their
constituent local trees differ in their values for the `num`:feat: feature.

.. ex::
   .. _good1:
   .. ex::
      .. tree:: (NP[num=pl] (Det[num=sg] this)(N[num=sg] dog))

   .. _good2:
   .. ex::
      .. tree:: (NP[num=pl] (Det[num=pl] these)(N[num=pl] dogs))

.. ex::
   .. _bad1:
   .. ex::
      .. tree:: (NP[num=...] (Det[num=sg] this)(N[num=pl] dogs))

   .. _bad2:
   .. ex::
      .. tree:: (NP[num=...] (Det[num=pl] these)(N[num=sg] dog))

Rule vprule_ can be thought of as saying that `num`:feat: value of the
head verb has to be the same as the `num`:feat: value of the `VP`:gc:
mother. Combined with srule_, we derive the consequence that if the
`num`:feat: value of the subject head noun is `pl`:fval:, then so is
the `num`:feat: value of the `VP`:gc:\ 's head verb.

.. ex::
   .. tree:: (S (NP[num=pl] (Det[num=pl] these)(N[num=pl] dogs))(VP[num=pl] (V[num=pl] run)))

Grammar feat0cfg_ illustrates most of the ideas we have introduced so
far in this chapter, plus a couple of new ones.

.. _feat0cfg:
.. ex::
   ::

     %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
     % Grammar Rules
     %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
     Start -> S

     S -> NP[num=?n] VP[num=?n]

     % NP expansion rules
     NP[num=?n] -> N[num=?n] 
     NP[num=?n] -> PropN[num=?n] 
     NP[num=?n] -> Det[num=?n] N[num=?n]
     NP[num=pl] -> N[num=pl] 

     % VP expansion rules
     VP[tense=?t, num=?n] -> IV[tense=?t, num=?n]
     VP[tense=?t, num=?n] -> TV[tense=?t, num=?n] NP

     %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
     % Lexical Rules
     %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
     Det[num=sg] -> 'this' | 'every'
     Det[num=pl] -> 'these' | 'all'
     Det[num=?n] -> 'the' | 'some'

     PropN[num=sg]-> 'Kim' | 'Jody'

     N[num=sg] -> 'dog' | 'girl' | 'car' | 'child'
     N[num=pl] -> 'dogs' | 'girls' | 'cars' | 'children' 

     IV[tense=pres,  num=sg] -> 'disappears' | 'walks'
     TV[tense=pres, num=sg] -> 'sees' | 'likes'

     IV[tense=pres,  num=pl] -> 'disappear' | 'walk'
     TV[tense=pres, num=pl] -> 'see' | 'like'

     IV[tense=past, num=?n] -> 'disappeared' | 'walked'
     TV[tense=past, num=?n] -> 'saw' | 'liked'

First, you will notice that a feature annotation on a syntactic
category can contain more than one specification; for example,
`V`:gc:\ [`tense`:feat: = `pres`:fval:, `num`:feat: = `pl`:fval:\
]. In general, there is no upper bound on the number of features we
specify as part of our syntactic categories.

Second, we have used feature variables in lexical entries as well
as grammatical rules. For example, `the`:lx: has been assigned the
category `Det`:gc:\ [`num`:feat: = `?n`:math:]. Why is this?  Well,
you know that the definite article `the`:lx: can combine with both
singular and plural nouns. One way of describing this would be to add
two lexical entries to the grammar, one each for the singular and
plural versions of `the`:lx:. However, a more elegant solution is to
leave the `num`:feat: value `underspecified`:dt: and letting it agree
in number with whatever noun it combines with.

A final point to note about feat0cfg_ is that we have used ``%`` as an
escape symbol in order to add comments to the grammar.

In general, when we are trying to develop even a very small grammar,
it is convenient to put the rules in a file where they can be edited,
tested and revised. Assuming we have saved feat0cfg_ as a file named
'feat0.cfg', the function ``GrammarFile.read_file()`` allows us to
read the grammar into NLTK, ready for use in parsing.

.. doctest-ignore::
    >>> from nltk_lite.contrib.grammarfile import GrammarFile
    >>> from pprint import pprint
    >>> from nltk_lite import tokenize
    >>> g = GrammarFile.read_file('feat0.cfg')
    >>>

We can inspect the rules and the lexicon.

.. doctest-ignore::
    >>> print g.earley_grammar()
    Grammar with 7 productions (start state = Start[])
	Start -> S
	S -> NP[num=?n] VP[num=?n]
	NP[num=?n] -> N[num=?n]
	NP[num=?n] -> PropN[num=?n]
	NP[num=?n] -> Det[num=?n] N[num=?n]
        NP[num=pl] -> N[num=pl]
	VP[num=?n, tense=?t] -> IV[num=?n, tense=?t]
	VP[num=?n, tense=?t] -> TV[num=?n, tense=?t] NP
    >>> pprint(g.earley_lexicon())
     {'Jody': [PropN[num=sg]],
      'Kim': [PropN[num=sg]],
      'all': [Det[num=pl]],
      'car': [N[num=sg]],
      'cars': [N[num=pl]],
      'child': [N[num=sg]],
      'children': [N[num=pl]],
      'disappear': [IV[num=pl, tense=pres]],
      'disappeared': [IV[num=?n, tense=past]],
      'disappears': [IV[num=sg, tense=pres]],
      'dog': [N[num=sg]],
      'dogs': [N[num=pl]],
      'every': [Det[num=sg]],
      'girl': [N[num=sg]],
      'girls': [N[num=pl]],
      'like': [TV[num=pl, tense=pres]],
      'liked': [TV[num=?n, tense=past]],
      'likes': [TV[num=sg, tense=pres]],
      'saw': [TV[num=?n, tense=past]],
      'see': [TV[num=pl, tense=pres]],
      'sees': [TV[num=sg, tense=pres]],
      'some': [Det[num=?n]],
      'the': [Det[num=?n]],
      'these': [Det[num=pl]],
      'this': [Det[num=sg]],
      'walk': [IV[num=pl, tense=pres]],
      'walked': [IV[num=?n, tense=past]],
      'walks': [IV[num=sg, tense=pres]]}
     >>> 

Now we can tokenize a sentence and use the ``parse_n()`` function to
invoke the Earley chart parser.

.. doctest-ignore::
     >>> from nltk_lite import tokenize
     >>> sent = 'Kim likes children'
     >>> tokens = list(tokenize.whitespace(sent))
     >>> tokens 
     ['Kim', 'likes', 'children']
     >>> cp = g.earley_parser()
     >>> trees = cp.parse_n(tokens)
	       |.K.l.c.|
     Predictor |> . . .| Start -> * S 
     Predictor |> . . .| S -> * NP[num=?n] VP[num=?n] 
     Predictor |> . . .| NP[num=?n] -> * N[num=?n] 
     Predictor |> . . .| NP[num=?n] -> * PropN[num=?n] 
     Predictor |> . . .| NP[num=?n] -> * Det[num=?n] N[num=?n] 
     Predictor |> . . .| NP[num=pl] -> * N[num=pl] 
     Scanner   |[-] . .| PropN[num=sg] -> 'Kim' * 
     Completer |[-] . .| NP[num=sg] -> PropN[num=sg] * 
     Completer |[-> . .| S -> NP[num=sg] * VP[num=sg] 
     Predictor |. > . .| VP[num=?n, tense=?t] -> * IV[num=?n, tense=?t] 
     Predictor |. > . .| VP[num=?n, tense=?t] -> * TV[num=?n, tense=?t] NP 
     Scanner   |. [-] .| TV[num=sg, tense=pres] -> 'likes' * 
     Completer |. [-> .| VP[num=sg, tense=pres] -> TV[num=sg, tense=pres] * NP 
     Predictor |. . > .| NP[num=?n] -> * N[num=?n] 
     Predictor |. . > .| NP[num=?n] -> * PropN[num=?n] 
     Predictor |. . > .| NP[num=?n] -> * Det[num=?n] N[num=?n] 
     Predictor |. . > .| NP[num=pl] -> * N[num=pl] 
     Scanner   |. . [-]| N[num=pl] -> 'children' * 
     Completer |. . [-]| NP[num=pl] -> N[num=pl] * 
     Completer |. [---]| VP[num=sg, tense=pres] -> TV[num=sg, tense=pres] NP * 
     Completer |[=====]| S -> NP[num=sg] VP[num=sg] * 
     Completer |[=====]| Start -> S * 
     Completer |[=====]| [INIT] -> Start * 
     >>> 

Finally, we can inspect the resulting parse trees (in this case, a
single one).

.. doctest-ignore::
     >>> for tree in trees: print tree
     ... 
     ([INIT]:
       (Start:
	 (S:
	   (NP[num=sg]: (PropN[num=sg]: 'Kim'))
	   (VP[num=sg, tense=pres]:
	     (TV[num=sg, tense=pres]: 'likes')
	     (NP[num=pl]: (N[num=pl]: 'children'))))))
     >>>


Exercises
---------

#. Redo the previous two exercises, but using feat0cfg_ as your
   starting point.

.. Can we do cross-references?




Terminology
-----------


So far, we have only seen feature values like `sg`:fval: and
`pl`:fval. These simple values are usually called `atomic`:dt:
|mdash| that is, they can't be decomposed into subparts. A special
case of atomic values are `boolean`:dt: values, that is, values which
just specify whether a property is true or false of a category. For
example, we might want to distinguish `auxiliary`:dt: verbs such as
`can`:lx:, `may`:lx:, `will`:lx: and `do`:lx: with the boolean feature
`aux`:feat:. Thus, our lexicon for verbs might include entries such as
the following:

.. ex::
      .. parsed-literal::

        `V`:gc:\ [`tense`:feat: = `pres`:fval:, `aux`:feat: = `+`:math:\ ] |rarr| 'can'
        `V`:gc:\ [`tense`:feat: = `pres`:fval:, `aux`:feat: = `+`:math:\ ] |rarr| 'may'

        `V`:gc:\ [`tense`:feat: = `pres`:fval:, `aux`:feat: = `-`:math:\ ] |rarr| 'walks'
        `V`:gc:\ [`tense`:feat: = `pres`:fval:, `aux`:feat: = `-`:math:\ ] |rarr| 'likes'


A frequently used abbreviation for boolean features allows the value
to be prepended to the feature:

.. ex::
      .. parsed-literal::

        `V`:gc:\ [`tense`:feat: = `pres`:fval:, `+aux`:feat:\ ] |rarr| 'can'
        `V`:gc:\ [`tense`:feat: = `pres`:fval:, `-aux`:feat:\ ] |rarr| 'walks'


We have spoken informally of attaching 'feature annotations' to
syntactic categories. A more general
approach is to treat the whole category |mdash| that is, the
non-terminal symbol plus the annotation |mdash| as a bundle of
features. Consider, for example, the object we have written as ncat0_.

.. _ncat0:
.. ex::
      .. parsed-literal::

        `N`:gc:\ [`num`:feat: = `sg`:fval:\ ] 

The syntactic category `N`:gc:, as we have seen before, provides part
of speech information. This information can itself be captured as a
feature specification, as shown in ncat1_.

.. _ncat1:
.. ex::
      .. parsed-literal::

        [`pos`:feat: = `N`:fval:, `num`:feat: = `sg`:fval:\ ] 

In fact, we  regard ncat1_ as our 'official' representation of a
feature-based linguistic category, and
ncat0_ as a convenient abbreviation.
A bundle of feature-value pairs is called a `feature structure`:dt:
or an `attribute value matrix`:dt: (AVM). A feature structure which
contains a specification for the feature `pos`:feat: is a `linguistic
category`:dt:. 

In addition to atomic-valued features, we allow features whose values
are themselves feature structures. For example, we might want to group
together agreement features (e.g., person, number and gender) as a
distinguished part of a category, as shown in agr0_.

.. _agr0:
.. ex::
      .. parsed-literal::

        [pos: N
         agr: [per: 3
               num: pl
               gend: fem]]


In this case, we say that the feature `agr`:feat: has a `complex`:dt: value.

There is no particular significance to the *order* of features in a
feature structure. So agr0_ is equivalent to agr0_.

.. _agr1:
.. ex::
      .. parsed-literal::

        [agr: [num: pl
               per: 3
               gend: fem]
         pos: N]



Feature Structures in NLTK and Unification
------------------------------------------

Feature structures in NLTK are declared with the
``FeatureStructure()`` constructor. Atomic feature values can be strings or
integers.

     >>> from nltk_lite.parse.featurestructure import *
     >>> fs1 = FeatureStructure(tense='past', num='sg') 
     >>> print fs1
     [ num   = 'sg'   ]
     [ tense = 'past' ]
     >>> 

We can think of a feature structure as being like a Python dictionary,
and access its values by indexing in the usual way.

     >>> fs1 = FeatureStructure(per=3, num='pl', gend='fem')
     >>> print fs1['gend']
     fem
     >>> 

However, we cannot use this syntax to *assign* values to features:

     >>> fs1[case] = 'acc'
     Traceback (most recent call last):
       File "<stdin>", line 1, in ?
     NameError: name 'case' is not defined
     >>> 

We can also define feature structures which have complex values, as
discussed earlier.

     >>> fs2 = FeatureStructure(pos='N', agr=fs1)
     >>> print fs2
     [       [ gend = 'fem' ] ]
     [ agr = [ num  = 'pl'  ] ]
     [       [ per  = 3     ] ]
     [                        ]
     [ pos = 'N'              ]
     >>> print fs2['agr']
     [ gend = 'fem' ]
     [ num  = 'pl'  ]
     [ per  = 3     ]
     >>> print fs2['agr']['per']
     3
     >>> 

An alternative method of specifying feature structures in NLTK is to
use the ``parse()`` method of ``FeatureStructure``. This gives us the
facility to use square bracket notation for embedding one feature
structure within another.

     >>> FeatureStructure.parse("[pos='N', agr=[per=3, num='pl', gend='fem']]")
     [agr=[gend='fem', num='pl', per=3], pos='N']
     >>>


Feature structures are not inherently tied to linguistic objects; they are
general purpose structures for representing knowledge. For example, we
could encode information about a person in a feature structure:

     >>> person01 = FeatureStructure(name='Lee', telno='01 27 86 42 96', age=33)
     >>> >>> print person01
     [ age   = 33         ]
     [ name  = 'Lee'      ]
     [ telno = '01 27 86 42 96' ]
     >>> 

It is sometimes helpful to picture feature structures as graphs; more
specifically, `directed acyclic graphs`:dt: (DAGs). dag01_ is equivalent to
the feature structure ``person01`` just shown.

.. _dag01:
.. ex::
   .. image:: ../images/dag01.png
      :scale: 40

The feature names appear as labels on the directed arcs, and feature
values appear as labels on the nodes which are pointed to by the arcs.

Just as before, feature values can be complex:

.. _dag02:
.. ex::
   .. image:: ../images/dag02.png
      :scale: 40

When we look at such graphs, it is natural to think in terms of
paths through the graph. A `feature path`:dt: is a sequence of arcs
that can be followed from the root node. We will represent paths in NLTK as
tuples. Thus, ``('address', 'street')`` is a feature path whose value
in dag02_
is the string 'rue Pascal'.

Now let's consider a situation where Lee has a spouse named 'Kim', and
Kim's address is the same as Lee's.
We might represent this as dag04_.

.. _dag04:
.. ex::
   .. image:: ../images/dag04.png
      :scale: 40

However, rather than repeating the address
information in the feature structure, we can 'share' the same
sub-graph between different arcs:

.. _dag03:
.. ex::
   .. image:: ../images/dag03.png
      :scale: 40



In other words, the value of the path ``('address')`` in dag03_ is identical to
the value of the path ``('spouse', 'address')``.
DAGs such as dag03_ are said to involve `structure sharing`:dt: or
`reentrancy`:dt:. When two paths have the same value, they are said to
be `equivalent`:dt:.

There are a number of notations for representing reentrancy in
matrix-style representations of feature structures. In NLTK, we adopt
the following convention: the first occurrence of a shared feature structure 
is prefixed with an integer in parentheses, such as ``(1)``, and any
subsequent reference to that structure uses the notation
``'->(1)'``, as shown below.

     >>> fs=FeatureStructure.parse("[name='Lee', address=(1)[number=74, street='rue Pascal'], spouse=[name='Kim', address->(1)]]")
     >>> print fs
     [ address = (1) [ number = 74           ] ]
     [               [ street = 'rue Pascal' ] ]
     [                                         ]
     [ name    = 'Lee'                         ]
     [                                         ]
     [ spouse  = [ address -> (1)  ]           ]
     [           [ name    = 'Kim' ]           ]
     >>> 

The bracketed integer is sometimes called a `tag`:dt: or a
`coindex`:dt:. The choice of integer is not significant.
There can be any number of tags within a single feature structure.

     >>> fs = FeatureStructure.parse("[A='a', B=(1)[C='c'], D->(1), E->(1)]")
     >>> print fs
     [ A = 'a'             ]
     [                     ]
     [ B = (1) [ C = 'c' ] ]
     [                     ]
     [ D -> (1)            ]
     [ E -> (1)            ]
     >>> fs1 = FeatureStructure.parse("[A=(1)[], B=(2)[], C->(1), D->(2)]")
     >>> print fs
     [ A = (1) [] ]
     [            ]
     [ B = (2) [] ]
     [            ]
     [ C -> (1)   ]
     [ D -> (2)   ]
     >>> 

It is standard to think of feature structures as providing `partial
information`:dt: about some object, in the sense that we can order
feature structures according to how general they are. For example,
fs01_ is more general (less specific) than fs02_, which in turn is more general than fs03_.

.. ex::
   .. _fs01:
   .. ex::
      .. parsed-literal::

         [number: 74]

   .. _fs02:
   .. ex::
      .. parsed-literal::

         [number: 74
          street: 'rue Pascal']

   .. _fs03:
   .. ex::
      .. parsed-literal::

         [number: 74
          street: 'rue Pascal'
          city: 'Paris']

This ordering is called `subsumption`:dt:; a more general feature
structure `subsumes`:dt: a less general one. If `FS`:math:\
:subscript:`0` subsumes `FS`:math:\ :subscript:`1` (formally, we write
`FS`:math:\ :subscript:`0` |SquareSubsetEqual| `FS`:math:\
:subscript:`1`), then `FS`:math:\ :subscript:`1` must have all the
paths and path equivalences of `FS`:math:\ :subscript:`0`, and may
have additional paths and equivalences as well. Thus, dag04_ subsumes
dag03_, since the latter has additional path equivalences.. It should
be obvious that subsumption only provides a partial ordering on
feature structures, since some feature structures are
incommensurable. For example, fs04_ neither subsumes nor is subsumed
by fs01_.


.. _fs04:
.. ex::
   .. parsed-literal::

         [telno = '01 27 86 42 96']

So we have seen that some feature structures are more specific than
others. How do we go about specialising a given feature structure?
For example, we might decide that addresses should
consist of not just a street number and a street name, but also a
city. That is, we might want to *merge*  graph dag042_ with dag041_ to
yield dag043_.

.. ex::
     .. _dag041:
     .. ex::
	.. image:: ../images/dag04-1.png
	   :scale: 40

     .. _dag042:
     .. ex::
	.. image:: ../images/dag04-2.png
	   :scale: 40

     .. _dag043:
     .. ex::
	.. image:: ../images/dag04-3.png
	   :scale: 40



Merging information from two feature structures is called
`unification`:dt: and in NLTK is supported by the ``unify()`` method
defined in the ``FeatureStructure`` class.

     >>> fs1 = FeatureStructure(number=74, street='rue Pascal')
     >>> fs2 = FeatureStructure(city='Paris')
     >>> print fs1.unify(fs2)
     [ city   = 'Paris'      ]
     [ number = 74           ]
     [ street = 'rue Pascal' ]
     >>> 

Unification is formally defined as a binary operation: `FS`:math:\
:subscript:`0` |SquareIntersection| `FS`:math:\
:subscript:`1`. Unification is symmetric, so 

.. ex::
    `FS`:math:\ :subscript:`0` |SquareIntersection| `FS`:math:\
    :subscript:`1` = `FS`:math:\ :subscript:`1` |SquareIntersection|
    `FS`:math:\ :subscript:`0`.

The same is true in NLTK:

     >>> print fs2.unify(fs1)
     [ city   = 'Paris'      ]
     [ number = 74           ]
     [ street = 'rue Pascal' ]
     >>>

.. but >>> fs1.unify(fs2) is fs2.unify(fs1)
       False
   only works with repr()

If we unify two feature structures which stand in the subsumption
relationship, then the result of unification is the most specific of
the two:

.. ex::
    If `FS`:math:\ :subscript:`0` |SquareSubsetEqual| `FS`:math:\
    :subscript:`1`,  then `FS`:math:\ :subscript:`0`
    |SquareIntersection| `FS`:math:\ :subscript:`1` = `FS`:math:\
    :subscript:`1` 

For example, the result of unifying fs02_ with fs03_ is fs03_.

Unification between `FS`:math:\ :subscript:`0` and `FS`:math:\
:subscript:`1` will fail if the two feature structures share a path |pi|,
but the value of |pi| in `FS`:math:\ :subscript:`0` is a distinct
atom from the value of |pi| in `FS`:math:\ :subscript:`1`. In NLTK,
this is implemented by setting the result of unification to be
``None``.

     >>> fs0 = FeatureStructure(A='a')
     >>> fs1 = FeatureStructure(A='b')
     >>> fs2 = fs0.unify(fs1)
     >>> print fs2
     None
     >>>  

Now, if we look at how unification interacts with structure-sharing,
things become really interesting. First, let's define the NLTK version
of dag04_.

     >>> fs0=FeatureStructure.parse("[name='Lee', address=[number=74, street='rue Pascal'], spouse=[name='Kim', address=[number=74, street='rue Pascal']]]")
     >>> print fs0
     [ address = [ number = 74           ]               ]
     [           [ street = 'rue Pascal' ]               ]
     [                                                   ]
     [ name    = 'Lee'                                   ]
     [                                                   ]
     [           [ address = [ number = 74           ] ] ]
     [ spouse  = [           [ street = 'rue Pascal' ] ] ]
     [           [                                     ] ]
     [           [ name    = 'Kim'                     ] ]
     >>> 

what happens when we augment Kim's address with a specification
for `city`:feat:? (Notice that ``fs1`` includes the whole path from the root of
the feature structure down to `city`:feat:.)

     >>> fs1=FeatureStructure.parse("[spouse = [address = [city = 'Paris']]]")
     >>> print fs0.unify(fs1)
     [ address = [ number = 74           ]               ]
     [           [ street = 'rue Pascal' ]               ]
     [                                                   ]
     [ name    = 'Lee'                                   ]
     [                                                   ]
     [           [           [ city   = 'Paris'      ] ] ]
     [           [ address = [ number = 74           ] ] ]
     [ spouse  = [           [ street = 'rue Pascal' ] ] ]
     [           [                                     ] ]
     [           [ name    = 'Kim'                     ] ]
     >>> 

By contrast, the result is very different if ``fs1`` is unified with
the structure-sharing version, dag03_.

     >>> fs2=FeatureStructure.parse("[name='Lee', address=(1)[number=74, street='rue Pascal'], spouse=[name='Kim', address->(1)]]")
     >>> print fs2.unify(fs1)
     [               [ city   = 'Paris'      ] ]
     [ address = (1) [ number = 74           ] ]
     [               [ street = 'rue Pascal' ] ]
     [                                         ]
     [ name    = 'Lee'                         ]
     [                                         ]
     [ spouse  = [ address -> (1)  ]           ]
     [           [ name    = 'Kim' ]           ]
     >>>

Rather than just updating what was in effect Kim's 'copy' of Lee's address,
we have now updated *both* their addresses at the same time. More
generally, if a unification involves specialising the value of some
path |pi|, then that unification simultaneously specialises the value
of *any path that is equivalent to* |pi|.

As we have already seen, structure sharing can also be stated in NLTK
using variables such as ``?x``. 

     >>> fs1=FeatureStructure.parse("[address1=[number=74, street='rue Pascal']]")
     >>> fs2=FeatureStructure.parse("[address1=?x, address2=?x]")
     >>> print fs2
     [ address1 = ?x ]
     [ address2 = ?x ]
     >>> print fs2.unify(fs1)
     [ address1 = (1) [ number = 74           ] ]
     [                [ street = 'rue Pascal' ] ]
     [                                          ]
     [ address2 -> (1)                          ]
     >>> 

Exercises
---------

#. List two feature structures which subsume [A=?x, B=?x].

#. Ignoring structure sharing, give an informal algorithm for unifying
   two feature structures. 


---------------------------------
Extending a Feature-Based Grammar
---------------------------------


Subcatorization
---------------

In the chapter `Parsing <parse.html>`__, we proposed to augment our
category labels in order to represent different subcategories of
verb. More specifically,
we introduced labels such as `Vitr`:gc: and `Vtr`:gc: for intransitive
and transitive verbs respectively.  This allowed us to write rules
like the following:

.. _subcatcfg0:
.. ex::
   .. parsed-literal::

      `VP`:gc:  |rarr| `IV`:gc:  
      `VP`:gc:  |rarr| `TV NP`:gc: 

Although it is tempting to think of `IV`:gc: and `TV`:gc: as two
kinds of `V`:gc:, this is unjustified: from a formal point of view,
`IV`:gc: has no closer relationship with `TV`:gc: than it does,
say, with `NP`:gc:. As it stands, `IV`:gc: and `TV`:gc: are
unanalyzable nonterminal symbols from a CFG. One unwelcome consequence
is that we do not seem able to say anything about the class of verbs
in general. For example, we cannot say something like "All lexical
items of category `V`:gc: can be marked for tense", since `bark`:lx:,
say, is an item of category `IV`:gc:, not `V`:gc:.

Using features gives us some useful room for manoeuvre but there is no
obvious consensus on how to model subcategorization information. One
approach which has the merit of simplicity is due to Generalized
Phrase Structure Grammar (GPSG). GPSG stipulates that lexical
categories may bear a `subcat`:feat: whose values are integers. This
is illustrated in a modified portion of feat0cfg_, shown in subcatgpsg_.

.. _subcatgpsg:
.. ex::
   ::

     VP[tense=?t, num=?n] -> V[subcat=0, tense=?t, num=?n]
     VP[tense=?t, num=?n] -> V[subcat=1, tense=?t, num=?n] NP

     V[subcat=0, tense=pres,  num=sg] -> 'disappears' | 'walks'
     V[subcat=1, tense=pres, num=sg] -> 'sees' | 'likes'

     V[subcat=0, tense=pres,  num=pl] -> 'disappear' | 'walk'
     V[subcat=1, tense=pres, num=pl] -> 'see' | 'like'

     V[subcat=0, tense=past, num=?n] -> 'disappeared' | 'walked'
     V[subcat=1, tense=past, num=?n] -> 'saw' | 'liked'

When we see a lexical category like `V`:gc:\ [`subcat`:feat: =
`1`:fval:\ ], we can interpret the `subcat`:feat: specification as a
pointer to the rule in which `V`:gc:\ [`subcat`:feat: = `1`:fval:\ ]
is introduced as the head daughter in a `VP`:gc: expansion rule. By
convention, there is a one-to-one correspondence between
`subcat`:feat: values and rules which introduce lexical
heads. It's worth noting that the choice of integer which acts as a value for `subcat`:feat:
is completely arbitrary |mdash| we could equally well have chosen 3999
and 57 as our two values in subcatgpsg_.
On this approach, `subcat`:feat: can *only* appear on lexical
categories; it makes no sense, for example, to specify  a
`subcat`:feat: value on `VP`:gc:. 

An alternative treatment of subcategorization, due originally to a framework
known as categorial grammar, is represented in feature-based frameworks such as PATR
and Head-driven Phrase Structure Grammar. Rather than using
`subcat`:feat: values as a way of indexing rules, the `subcat`:feat:
value directly encodes the valency of a head (the list of
arguments that it can combine with). For example, a verb like
`put`:lx: which takes  `NP`:gc: and `PP`:gc: complements (`put the
book on the table`:lx) 
might be represented as subcathpsg0_:

.. _subcathpsg0:
.. ex::  `V`:gc:\ [`subcat`:feat: = `<NP, NP, PP>`:fval:\ ] 

This says that the verb can combine with three  arguments. The
leftmost element in the list is the subject `NP`:gc:, while everything
else |mdash| an `NP`:gc: followed by a `PP`:gc:  in this case |mdash| comprises the
subcategorized-for complements. When a verb like `put`:lx: is combined
with appropriate complements, the requirements which are specified in
the  `subcat`:feat: are discharged, and only a subject `NP`:gc: is
needed. This category, which corresponds to what is traditionally
thought of as `VP`:gc:, might be represented as follows.

.. _subcathpsg1:
.. ex::  `V`:gc:\ [`subcat`:feat: = `<NP>`:fval:\ ] 

Finally, a sentence is a kind of verbal category which has *no*
requirements for further arguments, and hence has a `subcat`:feat:
whose value is the empty list. The tree subcathpsg2_ shows how these
category assigments combine in a parse of `Kim put the book on the table`:lx:.

.. _subcathpsg2:
.. ex::
      .. tree:: (V[subcat=\<\>] (NP Kim)(V[subcat=\<NP\>](V[subcat=\<NP,\ NP,\ VP\>] put)<NP the\ book><PP on\ the\ table>))



Unbounded Dependency Constructions
----------------------------------

Consider the following contrasts: 

.. _gap1:
.. ex::
   .. _gap1a:
   .. ex::

      We liked the music.

   .. _ga1b:
   .. ex::

      \*We liked.

.. _gap2:
.. ex::
   .. _gap2a:
   .. ex::

      We put the card into the slot.

   .. _gap2b:
   .. ex::

      \*We put into the slot.

   .. _gap2c:
   .. ex::

      \*We put the card.

   .. _gap2d:
   .. ex::

      \*We put.

The verb `like`:lx: requires an `NP`:gc: complement, while
`put`:lx: requires both a following `NP`:gc: and `PP`:gc:. Examples
gap1_ and gap2_ show that these complements are *obligatory*:
omitting them leads to ungrammaticality. Yet there are contexts in
which obligatory complements can be omitted, as gap3_ and gap4_
illustrate.

.. _gap3:
.. ex::
   .. _gap3a:
   .. ex::

      She knows which music we like.

   .. _gap3b:
   .. ex::

      This music, we really like.

.. _gap4:
.. ex::
   .. _gap4a:
   .. ex::

      Which card did you put into the slot?

   .. _gap4b:
   .. ex::

      Which slot did you put the card into?



--------------------------------
 Adding Compositional Semantics
--------------------------------

Overview
--------

One of the goals of linguistic theory is to provide a systematic
correspondence between form and meaning. One widely adopted approach
to representing meaning |mdash| or at least, some aspects of meaning
|mdash| involves translating expressions of natural language in to
first order logic. From a computational point of view, a strong
argument in favour of first order logic is that it strikes a
reasonable balance between expressiveness and logical
tractability. On the one hand, it is flexible enough to represent many
aspects of the logical structure of natural language. On the other
hand, automated theorem proving for first order logic has received
much attention, and although inference in first order logic is not
decidable, in practice many reasoning problems are efficiently
solvable using modern theorem provers.

Standard textbooks on first order logic often contain exercises in
which the reader is required to translate between English and logic, as illustrated in km_ and wq_.\ [#]_

.. [#] These examples come, respectively, from D. Kalish and R. Montague (1964) *Logic: Techniques of
       Formal Reasoning*, Harcourt, Brace and World, p94, and W. v. Quine (1952) *Methods of Logic*, Routledge and Kegan Paul, p121.


.. _km:
.. ex::

   .. ex:: If all whales are mammals, then Moby Dick is not a fish.

   .. ex:: |forall|\ `x`:math:\ (whale(`x`:math:) |rarr| mammal(`x`:math:)) |rarr| |neg|\ fish(MD)

.. _wq:
.. ex::

   .. ex:: There is a painting that all critics admire.

   .. ex:: |exists|\ `y`:math:\ (painting(`y`:math:) |wedge| |forall|\ `x`:math:\ (critic(`x`:math:) |rarr|  admire(`x`:math:, `y`:math:)))


Although there are numerous subtle and thorny issues about how this
translation should be carried out in particular cases, we will put
these to one side. The main focus of our discussion will be on a
different problem: how can we systematically construct a semantic
representation for a sentence which proceeds in step with the
process of parsing that sentence?


Unfortunately, it is not within the scope of this chapter to introduce the syntax and
semantics of first order logic, so if you don't already have some
familiarity with it, we suggest you consult an appropriate source. 


The |lambda| calculus
---------------------

syntax of |lambda| calculus; functions; |beta| conversion 


Compositionality
----------------

Sample grammar

coordination

quantification and scope


Feature-based Semantics
-----------------------



-----------------
 Further Reading
-----------------

Gerald Gazdar, Ewan Klein, Geoffrey Pullum and Ivan Sag (1985)
*Generalized Phrase Structure Grammar*, Basil Blackwell.

Ivan A. Sag and Thomas Wasow (1999) *Syntactic Theory: A Formal
Introduction*, CSLI Publications.

Patrick Blackburn and Johan Bos
*Representation and Inference for Natural Language: A First Course in
Computational Semantics*, CSLI Publications


---------
Exercises
---------

1. 


.. include:: footer.txt
