Natural Language Processing

If you've ever used a translation app, had predictive text spell that tricky word for you, or said the words, "Alexa, what's the weather like tomorrow?" then you've enjoyed the products of natural language processing. 


It's no coincidence that we can now communicate with computers using human language - they were trained that way - and in this article, we're going to find out how. We'll begin by looking at a definition and the history behind natural language processing before moving on to the different types and techniques. Finally, we will look at the social impact natural language processing has had.

Definition of Natural Language Processing

Natural language processing (NLP) is a branch of artificial intelligence (AI) concerned with programming computers and computer software to 'learn' human languages. The goal of NLP is to create software that understands language as well as we do.

Natural language processing has roots in linguistics, computer science, and machine learning and has been around for more than 50 years (almost as long as the modern-day computer!).

Today, we can see the results of NLP in things such as Apple's Siri, Google's suggested search results, and language learning apps like Duolingo.

Fig. 1. We can talk to 'Alexa' because of natural language processing

History of Natural Language Processing

The beginnings of NLP as we know it today arose in the 1940s, after the Second World War. The global nature of the war had highlighted the importance of understanding multiple languages, and researchers hoped to create a 'computer' that could translate between them.

The creation of such a computer proved to be pretty difficult, and linguists such as Noam Chomsky identified issues regarding syntax. For example, Chomsky pointed out that some sentences appear to be grammatically correct even though their content is nonsense (his famous example being "Colorless green ideas sleep furiously"). He argued that for computers to understand human language, they would need to understand syntactic structures.

Syntactic structures - In 1957, Noam Chomsky released his highly influential book Syntactic Structures, in which he argued that syntax should be treated separately from semantics and that there must be a formal and standardized approach to analyzing syntax.

By the 1990s, NLP had come a long way: it now focused more on statistics than on hand-written linguistic rules, on 'learning' rather than translating, and made increasing use of machine learning algorithms. Using machine learning meant that NLP systems could recognize similar chunks of speech and no longer needed to rely on exact matches of predefined expressions. For example, software using NLP would understand both "What's the weather like?" and "How's the weather?".

In 2011, Apple released Siri, the first successful and publicly available NLP-powered virtual assistant.

How Does Natural Language Processing Work?

You're probably wondering by now how NLP works - this is where linguistics knowledge will come in handy.

NLP uses AI to take in real-world human language and perform processing tasks in order to turn the language into code the computer will understand. There are two parts to this process:

  • Pre-processing (sometimes referred to as data processing) - This involves breaking the language down and converting it into data that an algorithm can work with (a toy sketch of this step follows this list).

  • Algorithm development - Once the language has been turned into data, an algorithm must be developed to process and use it.
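
To make the pre-processing idea concrete, here is a minimal sketch in Python (the example text and helper function are hypothetical, chosen just for illustration): it lowercases a sentence, strips punctuation, splits it into word tokens, and counts them, turning raw language into a simple numeric representation an algorithm could work with. Real NLP libraries do much more, but the principle is the same.

```python
from collections import Counter

def preprocess(text):
    """Toy pre-processing: lowercase, strip punctuation, split into word tokens."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in text.lower())
    return cleaned.split()

# Turn raw language into data: a simple bag-of-words count an algorithm can use.
tokens = preprocess("What's the weather like? How's the weather?")
print(tokens)           # ['what', 's', 'the', 'weather', 'like', 'how', 's', 'the', 'weather']
print(Counter(tokens))  # Counter({'s': 2, 'the': 2, 'weather': 2, ...})
```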

Let's look at some of the most common pre-processing techniques now. These techniques are rooted in linguistics and linguistic analysis. We won't be looking at algorithm development today, as this is less related to linguistics.

Natural Language Processing Techniques

There are two main pre-processing types: syntactic and semantic analysis. Before we dive into these techniques, let's look at some definitions for these two terms.

Syntax - The arrangement and order of words within a sentence. The most basic syntactic structure in English is subject-verb-object (SVO).

Semantics - The branch of linguistics that looks at the meaning, logic, and relationship of and between words.

Syntactic Analysis

Syntactic analysis involves looking at the structure of a sentence as a whole - how its words are arranged and relate to one another - rather than analyzing individual words in isolation. There are several syntactic analysis techniques NLP utilizes.

Parsing

Parsing involves breaking a sentence down into each of its constituents. A constituent is a unit of language that serves a function in a sentence; they can be individual words, phrases, or clauses. For example, the sentence "The cat plays the grand piano." comprises two main constituents, the noun phrase (the cat) and the verb phrase (plays the grand piano). The verb phrase can then be further divided into two more constituents, the verb (plays) and the noun phrase (the grand piano).

Conducting a parsing analysis involves representing each sentence's constituents in a parse tree, like so:

Fig. 2. Example of a parse tree

Parse trees can show us the relationship between words in a sentence and how they work together to form constituents. For example, we can see that "the grand piano" is a constituent, but "plays the" isn't. This information can be turned into data for an NLP algorithm.
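
If you want to see a parse tree in code, below is a minimal sketch using NLTK's Tree class (assuming the nltk package is installed). Note that the bracketed structure is written by hand here; in practice a parser would produce it automatically.

```python
from nltk.tree import Tree

# Hand-written constituency structure for "The cat plays the grand piano."
# S = sentence, NP = noun phrase, VP = verb phrase, V = verb
parse = Tree.fromstring("(S (NP the cat) (VP (V plays) (NP the grand piano)))")

parse.pretty_print()   # draws the tree as ASCII art
print(parse[1])        # the VP constituent: (VP (V plays) (NP the grand piano))
print(parse.leaves())  # ['the', 'cat', 'plays', 'the', 'grand', 'piano']
```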

Stemming

Stemming is a morphological process that involves reducing conjugated words back to their root word.

Conjugation (adj. conjugated) - Inflecting a verb to show different grammatical meanings, such as tense, aspect, and person. Inflecting verbs typically involves adding suffixes to the end of the verb or changing the word's spelling.

Root word - Walk (verb)

Conjugations and related forms - walking, walked, walks, walker

Taking each word back to its original form can help NLP algorithms recognize that although the words may be spelled differently, they have the same essential meaning. It also means that only the root words need to be stored in a database, rather than every possible conjugation of every word.
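
A quick sketch of stemming using NLTK's Porter stemmer (assuming the nltk package is installed). Stemmers apply heuristic rules, so the output is a 'stem' that may not always be a dictionary word.

```python
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()

for word in ["walking", "walked", "walks", "walker"]:
    print(word, "->", stemmer.stem(word))

# The inflected forms reduce to 'walk'; Porter's rules typically leave
# 'walker' unchanged, since they are heuristic rather than dictionary-based.
```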

Text Segmentation

Text segmentation is the process of separating language into meaningful units, such as morphemes (e.g., un-, luck, -y), words, sentences, paragraphs, and intent (i.e., what is the purpose of the language? does it ask a question, provide a statement, or give an order?).
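
Here is a small sketch of text segmentation at the sentence and word level using NLTK's tokenizers (assuming nltk is installed; depending on your NLTK version, the tokenizer data package may be called 'punkt' or 'punkt_tab'). Morpheme-level segmentation is harder and usually needs dedicated tools.

```python
import nltk

# One-time download of tokenizer data (newer NLTK versions use "punkt_tab")
nltk.download("punkt", quiet=True)

text = "The cat plays the grand piano. What's the weather like?"

sentences = nltk.sent_tokenize(text)       # segment into sentences
words = nltk.word_tokenize(sentences[1])   # segment one sentence into words

print(sentences)  # ['The cat plays the grand piano.', "What's the weather like?"]
print(words)      # ['What', "'s", 'the', 'weather', 'like', '?']
```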

Semantic Analysis

Sometimes sentences follow all the syntactic rules yet make no semantic sense. This is why it's important to also conduct semantic analysis, which helps algorithms understand the tone, purpose, and intended meaning of language.

Sentiment Analysis

Sentiment analysis is an NLP technique that aims to understand whether the language is positive, negative, or neutral. It can also determine the tone of language, such as angry or urgent, as well as the intent of the language (i.e., to get a response, to make a complaint, etc.). Sentiment analysis works by finding vocabulary that exists within preexisting lists.

Adjectives like disappointed, wrong, incorrect, and upset would be picked up in the pre-processing stage and would let the algorithm know that the piece of language (e.g., a review) was negative.
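
Since sentiment analysis is described here as matching vocabulary against pre-existing lists, the toy sketch below does exactly that with two tiny, made-up word lists. Real systems use much larger lexicons (or machine learning), but the principle is the same.

```python
# Tiny, hand-made sentiment lexicons (illustrative only)
POSITIVE = {"great", "happy", "love", "excellent", "pleased"}
NEGATIVE = {"disappointed", "wrong", "incorrect", "upset", "terrible"}

def sentiment(text):
    """Classify text as positive, negative, or neutral by word-list matching."""
    words = {w.strip(".,!?").lower() for w in text.split()}
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(sentiment("I am very disappointed, the order was wrong."))  # negative
print(sentiment("Excellent service, I am so pleased!"))           # positive
```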

Disambiguation

Word disambiguation is the process of trying to remove lexical ambiguities. A lexical ambiguity occurs when it is unclear which meaning of a word is intended.

"I'll meet you at the bank."

The word bank has more than one meaning, so there is an ambiguity as to which meaning is intended here. By looking at the wider context, it might be possible to remove that ambiguity.

"I need to deposit some money, so I'll meet you at the bank."

Now we can see that the word bank is referring to a financial establishment and not a river bank or the verb to bank.

Removing lexical ambiguities helps to ensure the correct semantic meaning is being understood.
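
As a rough illustration of how wider context can resolve a lexical ambiguity, the sketch below scores each candidate sense of "bank" by how many of its cue words appear in the sentence (a simplified, Lesk-style heuristic; the senses and cue lists are made up for this example).

```python
# Hypothetical cue words for two senses of "bank" (illustrative only)
SENSES = {
    "financial institution": {"money", "deposit", "account", "loan", "cash"},
    "river bank":            {"river", "water", "fishing", "shore", "mud"},
}

def disambiguate(sentence):
    """Pick the sense whose cue words overlap most with the sentence."""
    context = {w.strip(".,!?").lower() for w in sentence.split()}
    scores = {sense: len(cues & context) for sense, cues in SENSES.items()}
    return max(scores, key=scores.get), scores

print(disambiguate("I'll meet you at the bank."))
# ('financial institution', {'financial institution': 0, 'river bank': 0})  -> no evidence either way
print(disambiguate("I need to deposit some money, so I'll meet you at the bank."))
# ('financial institution', {'financial institution': 2, 'river bank': 0})  -> context resolves it
```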

Natural Language Processing Examples

Now that we have a good idea of what NLP is and how it works, let's look at some real-world examples of how NLP affects our day-to-day lives.

Email filters

If you open up your email and look at the menu, you'll likely find different folders such as "spam" or "social." The emails you receive are automatically 'filtered' into these folders based on the vocabulary they contain, using the same kind of keyword matching described above for sentiment analysis.

Predictive text

One of the earliest uses of NLP was in predictive text. Today, predictive text uses NLP techniques and 'deep learning' to correct the spelling of a word, guess which word you will use next, and make suggestions to improve your writing.

Activity: Try sending a message using only predictive text. It's possible to create a whole message only using the suggested words proposed by predictive text. Thanks to NLP, these words will be unique and tailored to you and can create some very funny (and revealing) messages!
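
As a loose illustration of the "guess which word you will use next" idea, here is a toy bigram predictor trained on a few made-up messages. Real predictive text uses far more sophisticated (deep learning) models, but the underlying idea of learning which words tend to follow which is similar.

```python
from collections import Counter, defaultdict

# A toy 'message history' to learn from (made up for illustration)
history = [
    "what's the weather like today",
    "the weather is great today",
    "what's the plan for today",
]

# Count which word follows which (a simple bigram model)
following = defaultdict(Counter)
for message in history:
    words = message.split()
    for current, nxt in zip(words, words[1:]):
        following[current][nxt] += 1

def predict_next(word):
    """Suggest the word most often seen after `word` in the history."""
    counts = following.get(word)
    return counts.most_common(1)[0][0] if counts else None

print(predict_next("the"))      # 'weather' (seen twice after 'the')
print(predict_next("weather"))  # 'like' or 'is' (a tie; most_common picks one)
```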

Language apps

Natural language processing has made huge improvements to language translation apps. It can help ensure that the translation makes syntactic and grammatical sense in the new language rather than simply directly translating individual words.

Fig. 3. Language translation as we know it today wouldn't be possible without NLP

The Social Impact of Natural Language Processing

In 2016, the researchers Hovy & Spruit released a paper discussing the social and ethical implications of NLP. In it, they highlight that, until recently, it had not been deemed necessary to discuss the ethical considerations of NLP, mainly because conducting NLP research doesn't involve human participants. However, researchers are becoming increasingly aware of the social impact the products of NLP can have on people and society as a whole.

Here are some of the main issues they identified:

  • Exclusion - NLP systems may learn from dominant cultures, making them easier to use and more appropriate for people from those dominant cultures.

  • Overgeneralization - NLP may lead to software making widespread assumptions about things like our gender, age, religion, and sexual orientation.

  • Bias - Most NLP tools focus on English and can therefore produce richer data for English speakers than for others.1

Natural Language Processing - Key takeaways

  • Natural language processing (NLP) is a branch of artificial intelligence (AI) that assists in programming computer software to 'learn' human languages.
  • Natural language processing has roots in linguistics, computer science, and machine learning.
  • NLP uses AI to take in real-world human language and perform processing tasks to turn the language into code the computer will understand. There are two parts to this process: pre-processing and algorithm development.
  • Pre-processing involves breaking language down and converting it into data an algorithm can work with. Common pre-processing techniques include syntactic analysis (e.g., parsing, stemming, and text segmentation) and semantic analysis (e.g., sentiment analysis and disambiguation).
  • We can see examples of NLP in predictive text, email filters, language learning apps, virtual assistants (e.g., Siri), and more.

References

  1. D. Hovy & S. L. Spruit. The social impact of natural language processing. 2016.

Frequently Asked Questions about Natural Language Processing

What is natural language processing?

Natural language processing (NLP) is a branch of artificial intelligence (AI) concerned with programming computers and computer software to "learn" human languages. The goal of NLP is to create software that understands language as well as we do.

What is the main goal of natural language processing?

The main goal of natural language processing is for computers to understand human language as well as we do. It is used in software such as predictive text, virtual assistants, email filters, automated customer service, language translation, and more.

What are the main phases of natural language processing?

There are two main phases in natural language processing: pre-processing and algorithm development.

What are the challenges of natural language processing?

Challenges include:

  • Spelling mistakes
  • Strong accents
  • Lexical ambiguities
  • Unclear intent
  • Ethical considerations

What techniques are used in natural language processing?

There are many different ways to analyze language for natural language processing. Some techniques include syntactic analyses like parsing and stemming, or semantic analyses like sentiment analysis.

Test your knowledge

  • True or false: software using natural language processing would understand both "What's the weather like?" and "How's the weather?"
  • Natural language processing involves two processes. What are they?
  • Pre-processing involves two different types of analyses. What are they?
