Comparative Analysis of Natural Language Methods for Part-of-Speech Tagging
This project is a comparative analysis of large language models (LLMs) and traditional methods for part-of-speech tagging of natural language, using a tagged subset of the widely used Penn Treebank dataset and a domain-specific dataset from the BioNLP Shared Task (BioNLP-ST) challenge. LLMs, such as GPT-3 and BERT-style models, have shown remarkable performance on a range of NLP tasks, including part-of-speech tagging, and may offer advantages over traditional methods in capturing contextual information and handling long-range dependencies. Traditional models, on the other hand, have long been used for part-of-speech tagging, exemplified by PCFG-based parsers such as the Stanford Parser and by the taggers available in the Natural Language Toolkit (NLTK) library for Python. We evaluate the accuracy of several models and provide insights into the strengths and weaknesses of each approach. These results inform the choice of modeling technique for similar applications and contribute to the understanding of the trade-offs between LLMs and traditional methods such as PCFG-based parsers in part-of-speech tagging of natural language text.
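As a concrete point of reference for the traditional side of the comparison, a minimal sketch of a most-frequent-tag baseline is shown below — the kind of simple method that NLTK's taggers and PCFG parsers improve on. The toy tagged corpus and the `tag` function here are hypothetical illustrations, not the project's actual data or evaluation code; the tags follow the Penn Treebank tagset used in the study.

```python
from collections import Counter, defaultdict

# Toy tagged corpus standing in for a Penn Treebank subset (hypothetical data).
# Each sentence is a list of (word, Penn Treebank tag) pairs.
train = [
    [("The", "DT"), ("protein", "NN"), ("binds", "VBZ"), ("the", "DT"), ("receptor", "NN")],
    [("A", "DT"), ("cell", "NN"), ("divides", "VBZ")],
]

# Count how often each word (lowercased) receives each tag in training.
counts = defaultdict(Counter)
for sent in train:
    for word, t in sent:
        counts[word.lower()][t] += 1

def tag(tokens, default="NN"):
    """Tag each token with its most frequent training tag; unseen words get `default`."""
    return [
        (tok, counts[tok.lower()].most_common(1)[0][0] if tok.lower() in counts else default)
        for tok in tokens
    ]

print(tag(["The", "receptor", "binds"]))
# → [('The', 'DT'), ('receptor', 'NN'), ('binds', 'VBZ')]
```

A baseline like this captures no context at all, which is exactly the gap that contextual models such as BERT-style LLMs are expected to close, particularly on ambiguous or domain-specific tokens from the biomedical dataset.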