Question: Find the head of a noun phrase in NLTK and a Stanford parse according to the rules for finding the head of an NP

Generally, the head of a noun phrase is the rightmost noun in the NP. As shown below, tree is the head of the parent NP:

(ROOT
  (S
    (NP
      (NP (DT The) (JJ old) (NN oak) (NN tree))
      (PP (IN from) (NP (NNP India))))
    (VP (VBD fell) (PRT (RP down)))))

Out[40]: Tree('S', [Tree('NP', [Tree('NP', [Tree('DT', ['The']), Tree('JJ', ['old']), Tree('NN', ['oak']), Tree('NN', ['tree'])]), Tree('PP', [Tree('IN', ['from']), Tree('NP', [Tree('NNP', ['India'])])])]), Tree('VP', [Tree('VBD', ['fell']), Tree('PRT', [Tree('RP', ['down'])])])])

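The code below, based on a Java implementation, uses a simple rule to find the head of an NP, but I need it to be based on the head-finding rules: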

from nltk.tree import Tree

parsestr = '(ROOT (S (NP (NP (DT The) (JJ old) (NN oak) (NN tree)) (PP (IN from) (NP (NNP India)))) (VP (VBD fell) (PRT (RP down)))))'

def traverse(t):
    # Leaves are plain strings with no label(); stop recursing there.
    if not isinstance(t, Tree):
        return
    if t.label() == 'NP':
        print('NP:' + str(t.leaves()))
        # Naive rule: the rightmost leaf is taken as the head, whatever its tag.
        print('NPhead:' + str(t.leaves()[-1]))
    # Recurse into every child, whether or not this node is an NP.
    for child in t:
        traverse(child)

tree = Tree.fromstring(parsestr)
traverse(tree)

The code above gives this output:

NP:['The', 'old', 'oak', 'tree', 'from', 'India']
NPhead:India
NP:['The', 'old', 'oak', 'tree']
NPhead:tree
NP:['India']
NPhead:India

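It gives the expected output for this sentence, but I need to add a condition so that only the rightmost noun is extracted as the head. At the moment the line below does not check whether the word is a noun (NN):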

print('NPhead:' + str(t.leaves()[-1]))

So, for the NP head condition in the code above, something like this (pseudocode):

t.leaves().getrightmostnoun()
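A minimal sketch of what such a getrightmostnoun() could do, assuming the NP's words and tags are available as (word, tag) pairs, which is what NLTK's Tree.pos() returns for a subtree. NLTK has no getrightmostnoun(); the name rightmost_noun and the noun tag set below are illustrative assumptions.

```python
# Hypothetical helper (not an NLTK API): works on the (word, tag) pairs
# that Tree.pos() returns for a subtree.
NOUN_TAGS = {'NN', 'NNS', 'NNP', 'NNPS'}  # Penn Treebank noun tags (assumed set)

def rightmost_noun(tagged_words):
    """Return the rightmost word whose tag is a noun tag, or None if there is none."""
    for word, tag in reversed(tagged_words):
        if tag in NOUN_TAGS:
            return word
    return None

# (word, tag) pairs for "The old oak tree from India", as Tree.pos() would give them
pairs = [('The', 'DT'), ('old', 'JJ'), ('oak', 'NN'), ('tree', 'NN'),
         ('from', 'IN'), ('India', 'NNP')]
print(rightmost_noun(pairs))      # India
print(rightmost_noun(pairs[:4]))  # tree
```

In traverse, t.leaves()[-1] would become rightmost_noun(t.pos()). For the outer NP this still returns India, because the rightmost noun sits inside the PP; that is exactly the case the Collins rules are meant to handle.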

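Michael Collins's thesis (Appendix A) includes head-finding rules for the Penn Treebank, so the head is not necessarily the rightmost noun. The condition above should therefore handle such cases.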
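A sketch of the NP rule as it is usually quoted from Collins (1999), Appendix A: if the last word is tagged POS, return it; otherwise search right to left for a child tagged NN, NNP, NNPS, NNS, NX, POS or JJR; then left to right for an NP; then right to left for $, ADJP or PRN; then for CD; then for JJ, JJS, RB or QP; otherwise return the last word. Please check the list against the thesis itself. To keep the example runnable without NLTK it uses a hand-rolled parser and (label, children) tuples in place of nltk.Tree; those helpers and their names are assumptions, and a head child that is neither a preterminal nor an NP is simplified to its last word instead of applying its own Collins rule.

```python
import re

# Trees are (label, children) tuples; leaves are plain strings.
def parse(s):
    """Parse a Penn-style bracketed string into nested (label, children) tuples."""
    tokens = re.findall(r'\(|\)|[^\s()]+', s)
    pos = 0

    def node():
        nonlocal pos
        pos += 1  # skip '('
        label = ''  # stays '' for a bare '((', like NLTK's Tree('', ...)
        if tokens[pos] not in ('(', ')'):
            label = tokens[pos]
            pos += 1
        children = []
        while tokens[pos] != ')':
            if tokens[pos] == '(':
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1  # skip ')'
        return (label, children)

    return node()

def last_leaf(node):
    """Return (word, tag) for the last word under node."""
    label, children = node
    last = children[-1]
    return (last, label) if isinstance(last, str) else last_leaf(last)

def is_preterminal(node):
    return len(node[1]) == 1 and isinstance(node[1][0], str)

def first_match(children, labels, right_to_left):
    """First child whose label is in labels, scanning in the given direction."""
    seq = reversed(children) if right_to_left else children
    for child in seq:
        if child[0] in labels:
            return child
    return None

def np_head(np):
    """Head word of an NP following the Collins-style NP rule (sketch)."""
    word, tag = last_leaf(np)
    if tag == 'POS':
        return word
    kids = [c for c in np[1] if not isinstance(c, str)]
    head = (first_match(kids, {'NN', 'NNP', 'NNPS', 'NNS', 'NX', 'POS', 'JJR'}, True)
            or first_match(kids, {'NP'}, False)
            or first_match(kids, {'$', 'ADJP', 'PRN'}, True)
            or first_match(kids, {'CD'}, True)
            or first_match(kids, {'JJ', 'JJS', 'RB', 'QP'}, True))
    if head is None:
        return word  # fall back to the last word
    if is_preterminal(head):
        return head[1][0]
    if head[0] == 'NP':
        return np_head(head)
    # Simplification: other categories would need their own head rules.
    return last_leaf(head)[0]

def noun_phrases(node):
    """Yield every NP subtree in pre-order, like filtering Tree.subtrees()."""
    if isinstance(node, str):
        return
    if node[0] == 'NP':
        yield node
    for child in node[1]:
        yield from noun_phrases(child)

sent = parse('(ROOT (S (NP (NP (DT The) (JJ old) (NN oak) (NN tree)) '
             '(PP (IN from) (NP (NNP India)))) (VP (VBD fell) (PRT (RP down)))))')
print([np_head(np) for np in noun_phrases(sent)])  # ['tree', 'tree', 'India']
```

For the outer NP no child is a noun, so the rule descends into the leftmost NP child and returns tree rather than India. With NLTK, the same logic could be written against Tree.label() and child iteration instead of tuples.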

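Take the following example, given in one of the answers:

(NP (NP the person) that gave (NP the talk)) went home

The head noun of the subject is person, but the last leaf node of the NP "the person that gave the talk" is talk.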
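The failure is easy to reproduce. This sketch builds the subject NP by hand as nested (label, children) tuples (an assumption, to avoid depending on NLTK) and compares a flat rightmost-noun scan over all tagged leaves with a scan over the NP's immediate children that descends into the leftmost NP child when no child is itself a noun. Both function names are hypothetical.

```python
NOUN_TAGS = {'NN', 'NNS', 'NNP', 'NNPS'}

def tagged_leaves(node):
    """Flatten a (label, children) tree into (word, tag) pairs, like Tree.pos()."""
    label, children = node
    if len(children) == 1 and isinstance(children[0], str):
        return [(children[0], label)]
    pairs = []
    for child in children:
        pairs.extend(tagged_leaves(child))
    return pairs

def flat_rightmost_noun(np):
    """Rightmost noun anywhere under the NP (the rule from the question)."""
    nouns = [w for w, t in tagged_leaves(np) if t in NOUN_TAGS]
    return nouns[-1] if nouns else None

def child_level_head(np):
    """Rightmost noun among the NP's own children, else recurse into the leftmost NP child."""
    children = np[1]
    nouns = [c for c in children if c[0] in NOUN_TAGS]
    if nouns:
        return nouns[-1][1][0]
    for child in children:
        if child[0] == 'NP':
            return child_level_head(child)
    return None

# (NP (NP the person) (SBAR that gave (NP the talk)))
subject_np = ('NP', [
    ('NP', [('DT', ['the']), ('NN', ['person'])]),
    ('SBAR', [('WHNP', [('WDT', ['that'])]),
              ('S', [('VP', [('VBD', ['gave']),
                             ('NP', [('DT', ['the']), ('NN', ['talk'])])])])]),
])
print(flat_rightmost_noun(subject_np))  # talk
print(child_level_head(subject_np))     # person
```

Looking only at the NP's own children keeps the relative clause from supplying the head.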

2017-09-18 14:38

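What is your question? - barny

@barny How to find the head of an NP - stackit

Please read the help page stackoverflow.com/help/mcve. In this case, show the output you do get: "doesn't work" is not enough for StackOverflow. Also, try adding a few more print statements to your code (e.g. one before traverse(child) and another on entry to traverse). Post the output of that execution trace, unless it immediately shows you the problem. - Prune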

Answer:

NLTK has a built-in way to turn a bracketed string into a Tree object (http://www.nltk.org/_modules/nltk/tree.html), see https://github.com/nltk/nltk/blob/develop/nltk/tree.py#L541.

>>> from nltk.tree import Tree
>>> parsestr='(ROOT (S (NP (NP (DT The) (JJ old) (NN oak) (NN tree)) (PP (IN from) (NP (NNP India)))) (VP (VBD fell) (PRT (RP down)))))'
>>> for i in Tree.fromstring(parsestr).subtrees():
...     if i.label() == 'NP':
...             print(i)
... 
(NP
  (NP (DT The) (JJ old) (NN oak) (NN tree))
  (PP (IN from) (NP (NNP India))))
(NP (DT The) (JJ old) (NN oak) (NN tree))
(NP (NNP India))


>>> for i in Tree.fromstring(parsestr).subtrees():
...     if i.label() == 'NP':
...             print(i.leaves())
... 
['The', 'old', 'oak', 'tree', 'from', 'India']
['The', 'old', 'oak', 'tree']
['India']

Note that the rightmost noun is not always the head noun of the NP, e.g.,

>>> s = '(ROOT (S (NP (NN Carnac) (DT the) (NN Magnificent)) (VP (VBD gave) (NP ((DT a) (NN talk))))))'
>>> Tree.fromstring(s)
Tree('ROOT', [Tree('S', [Tree('NP', [Tree('NN', ['Carnac']), Tree('DT', ['the']), Tree('NN', ['Magnificent'])]), Tree('VP', [Tree('VBD', ['gave']), Tree('NP', [Tree('', [Tree('DT', ['a']), Tree('NN', ['talk'])])])])])])
>>> for i in Tree.fromstring(s).subtrees():
...     if i.label() == 'NP':
...             print(i.leaves()[-1])
... 
Magnificent
talk

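Arguably, Magnificent can still be the head noun. Another example is when the NP contains a relative clause:

(NP (NP the person) that gave (NP the talk)) went home

The head noun of the subject is person, but the last leaf node of the NP "the person that gave the talk" is talk.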

2017-09-19 09:01

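I finally got it working, as shown in the code, but I just need to add a condition that checks whether the rightmost word is an NN - stackit

Check my updated question - stackit

So something like this for the NP head condition in the code above: t.leaves().getrightmostnoun() - stackit

Michael Collins's thesis (Appendix A) includes head-finding rules for the Penn Treebank, so the rightmost noun is not necessarily the head - stackit

Politely ask on the NLTK GitHub issues whether someone can help implement it if you are stuck. Better still, try implementing it yourself, open a pull request with your working code and ask for a code review; I'm sure the NLTK developers will help you through it. Or wait until someone else codes it =) - alvas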

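I was looking for a Python script that uses NLTK for this task and stumbled across this post. Here's the solution I came up with. It's a bit noisy and arbitrary, and it definitely doesn't always pick the right answer (e.g. for compound nouns). But I wanted to post it in case it helps others who need a solution that mostly works.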

#!/usr/bin/env python

from nltk.tree import Tree

examples = [
    '(ROOT (S (NP (NP (DT The) (JJ old) (NN oak) (NN tree)) (PP (IN from) (NP (NNP India)))) (VP (VBD fell) (PRT (RP down)))))',
    "(ROOT\n  (S\n    (NP\n      (NP (DT the) (NN person))\n      (SBAR\n        (WHNP (WDT that))\n        (S\n          (VP (VBD gave)\n            (NP (DT the) (NN talk))))))\n    (VP (VBD went)\n      (NP (NN home)))))",
    '(ROOT (S (NP (NN Carnac) (DT the) (NN Magnificent)) (VP (VBD gave) (NP ((DT a) (NN talk))))))'
]

def find_noun_phrases(tree):
    return [subtree for subtree in tree.subtrees(lambda t: t.label()=='NP')]

def find_head_of_np(np):
    noun_tags = ['NN', 'NNS', 'NNP', 'NNPS']
    top_level_trees = [child for child in np if isinstance(child, Tree)]
    ## search for a top-level noun
    top_level_nouns = [t for t in top_level_trees if t.label() in noun_tags]
    if len(top_level_nouns) > 0:
        ## if you find some, pick the rightmost one, just 'cause
        return top_level_nouns[-1][0]
    else:
        ## search for a top-level np
        top_level_nps = [t for t in top_level_trees if t.label()=='NP']
        if len(top_level_nps) > 0:
            ## if you find some, pick the head of the rightmost one, just 'cause
            return find_head_of_np(top_level_nps[-1])
        else:
            ## search for any noun
            nouns = [p[0] for p in np.pos() if p[1] in noun_tags]
            if len(nouns) > 0:
                ## if you find some, pick the rightmost one, just 'cause
                return nouns[-1]
            else:
                ## return the rightmost word, just 'cause
                return np.leaves()[-1]

for example in examples:
    tree = Tree.fromstring(example)
    for np in find_noun_phrases(tree):
        print("noun phrase:", " ".join(np.leaves()))
        head = find_head_of_np(np)
        print("head:", head)

For the examples discussed in the question and the other answers, this is the output:

noun phrase: The old oak tree from India
head: tree
noun phrase: The old oak tree
head: tree
noun phrase: India
head: India
noun phrase: the person that gave the talk
head: person
noun phrase: the person
head: person
noun phrase: the talk
head: talk
noun phrase: home
head: home
noun phrase: Carnac the Magnificent
head: Magnificent
noun phrase: a talk
head: talk

2018-05-18 10:25
