WOLFcon 2024 - Understanding and Using AI Workflows with FOLIO

23 September 2024


Natural Language Processing

Natural Language Processing [2], or NLP, is a set of machine learning technologies for interpreting, generating, and comprehending human languages. Some examples of NLP processes include:

  • Language translation - Translating text from one language to another.
  • Text summarization - Summarizing the main points from a large text.
  • Named Entity Recognition - Tagging words or phrases such as proper names of people, places, and concepts.
  • Part of Speech Tagging - Identifying grammar components like nouns, verbs, and adjectives in a text sample.
  • Sentiment analysis - Classifying the emotional or subjective tone in a text sample.
  • Text Generation - Generating text, usually based on a prompt.
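
To make one of the processes above concrete, here is a minimal sketch of sentiment analysis using only the Python standard library. It assumes a tiny, hypothetical word list (POSITIVE and NEGATIVE are invented for illustration); real NLP tools rely on trained models rather than a hand-written lexicon.

```python
import re

# Hypothetical, hand-picked lexicon for illustration only.
POSITIVE = {"helpful", "great", "excellent", "friendly", "easy"}
NEGATIVE = {"slow", "confusing", "broken", "poor", "difficult"}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def classify_sentiment(text):
    """Label text as positive, negative, or neutral by counting lexicon hits."""
    tokens = tokenize(text)
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    print(classify_sentiment("The staff were friendly and the catalog was easy to use."))
    print(classify_sentiment("The search page is slow and confusing."))
```

A word-count approach like this ignores negation and context ("not helpful" would still count as positive), which is one reason trained models are usually preferred.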

Use in Libraries

NLP has been used in libraries in the following ways:

  • Extracting entities from semi-structured text or library metadata.
  • Improving search functionality through semantic understanding.
  • Automating cataloging and classification processes.
  • Enhancing user interfaces with natural language queries.
  • Assisting in content recommendations based on user preferences.
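
As an illustration of the first use above, extracting entities from semi-structured metadata, the following sketch parses a personal name heading of the form "Surname, Forename, birth-death". The pattern and the parse_name_heading function are assumptions made for this example; real headings vary far more than this simple rule handles.

```python
import re

# Assumed heading shape: "Surname, Forename[, YYYY-[YYYY]][.]"
HEADING_PATTERN = re.compile(
    r"^(?P<surname>[^,]+),\s*(?P<forename>[^,]+?)"
    r"(?:,\s*(?P<birth>\d{4})-(?P<death>\d{4})?)?\.?$"
)


def parse_name_heading(heading):
    """Return a dict of name parts and dates, or None if the heading does not match."""
    match = HEADING_PATTERN.match(heading.strip())
    if match is None:
        return None
    return match.groupdict()


if __name__ == "__main__":
    for heading in ["Twain, Mark, 1835-1910", "Atwood, Margaret, 1939-", "Morrison, Toni"]:
        print(heading, "->", parse_name_heading(heading))
```

Rule-based extraction like this works when the metadata follows a predictable pattern; free text generally calls for a trained named entity recognition model.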

NLP Software

Choosing the right NLP software depends on your specific needs and technical expertise.

  • Annif - A platform that uses subject vocabularies such as FAST [1] to train a model on a corpus of data and then provide subject suggestions.
  • NLTK - An open-source platform and Python package that provides interfaces to a number of corpora and documents as well as a rich set of classification, tokenization, stemming, tagging, parsing, and semantic reasoning libraries.
  • spaCy - An open-source Python package that offers tooling for named-entity matching, text summarization, part of speech tagging, and sentiment analysis as well as tools for model training and large language model integrations.
  • CoreNLP - An open-source Java library that provides annotations including token and sentence boundaries, parts of speech, named entities, numeric and time values, dependency and constituency parsing, sentiment analysis, and quote attributions.
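
The general idea behind vocabulary-based subject suggestion, as described for Annif above, can be sketched in plain Python. This is not Annif's API or algorithm: the VOCABULARY dictionary and the suggest_subjects function are hypothetical, and the scoring is a simple term count rather than a trained model.

```python
import re
from collections import Counter

# Hypothetical mini-vocabulary: subject label -> indicative terms.
VOCABULARY = {
    "Machine learning": {"learning", "model", "training", "neural"},
    "Library catalogs": {"catalog", "cataloging", "metadata", "records"},
    "Natural language processing": {"language", "text", "tokenization", "parsing"},
}


def suggest_subjects(text, limit=2):
    """Rank subject labels by how often their indicative terms occur in the text."""
    tokens = Counter(re.findall(r"[a-z]+", text.lower()))
    scores = {}
    for label, terms in VOCABULARY.items():
        score = sum(tokens[term] for term in terms)
        if score:
            scores[label] = score
    # Highest score first; ties are broken alphabetically by label.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:limit]


if __name__ == "__main__":
    print(suggest_subjects("Training a model to enrich catalog metadata records"))
```

A production tool would learn the association between text and subjects from a training corpus instead of relying on a hand-written term list.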