The Penn Treebank

GPTKB entity

Statements (61)
Predicate Object
gptkbp:instanceOf corpus
gptkbp:affiliation gptkb:University_of_Pennsylvania
gptkbp:citation Marcus, M. P., Santorini, B., & Marcinkiewicz, M. A. (1993). Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics, 19(2), 313-330.
gptkbp:contains syntactic annotation
annotated text
parsed sentences
part-of-speech annotation
gptkbp:createdBy gptkb:Tony_Rose
gptkb:Ann_Bies
gptkb:Ann_Warner
gptkb:Anthony_Kroch
gptkb:Beatrice_Santorini
gptkb:Beth_Sundheim
gptkb:Carolyn_Penstein_Rosé
gptkb:Catherine_Macleod
gptkb:Chris_Cieri
gptkb:Donald_Hindle
gptkb:Grace_Kim
gptkb:Kimberly_Babko
gptkb:Mark_Mandel
gptkb:Mary_Ann_Marcinkiewicz
gptkb:Mitch_Marcus
gptkb:Mitchell_Marcus
gptkb:Pamela_MacIntyre
gptkb:Robert_Ingria
gptkb:Stephanie_Strassel
gptkb:Ann_Taylor
gptkb:Jonathan_Wright
gptkb:David_Graff
gptkb:Aravind_Joshi
gptkb:Martha_Palmer
gptkb:Ralph_Weischedel
gptkb:Mark_Liberman
gptkb:Paul_Kingsbury
gptkb:Robert_MacIntyre
gptkb:Tom_Morton
gptkbp:field linguistics
natural language processing
computational linguistics
gptkbp:fundedBy gptkb:DARPA
gptkb:National_Science_Foundation
gptkb:US_Department_of_Defense
rdfs:label The Penn Treebank
gptkbp:includes gptkb:Brown_Corpus
gptkb:Switchboard_Corpus
gptkb:Wall_Street_Journal_corpus
gptkbp:language English
gptkbp:license LDC license
gptkbp:location gptkb:United_States
gptkbp:publishedBy gptkb:Linguistic_Data_Consortium
gptkbp:releaseYear 1992
gptkbp:size over 4.5 million words
gptkbp:url https://catalog.ldc.upenn.edu/LDC99T42
gptkbp:usedFor gptkb:machine_learning
corpus linguistics
syntactic parsing
NLP research
POS tagging
training parsers
gptkbp:bfsParent gptkb:Adam_Meyers
gptkbp:bfsLayer 7