What Is NLP? Natural Language Processing Explained - CIO
Natural language processing definition
Natural language processing (NLP) is the branch of artificial intelligence (AI) that deals with training computers to understand, process, and generate language. Search engines, machine translation services, and voice assistants are all powered by the technology.
While the term originally referred to a system's ability to read, it's since become a colloquialism for all computational linguistics. Subcategories include natural language generation (NLG) — a computer's ability to create communication of its own — and natural language understanding (NLU) — the ability to understand slang, mispronunciations, misspellings, and other variants in language.
The introduction of transformer models in the 2017 paper "Attention Is All You Need" by Google researchers revolutionized NLP, leading to the creation of generative AI models such as Bidirectional Encoder Representations from Transformers (BERT) and its smaller, faster, more efficient successor DistilBERT, Generative Pre-trained Transformer (GPT), and Google Bard.
How natural language processing works
NLP leverages machine learning (ML) algorithms trained on unstructured data, typically text, to analyze how elements of human language are structured together to impart meaning. Phrases, sentences, and sometimes entire books are fed into ML engines, where they're processed using grammatical rules, people's real-life linguistic habits, and the like. An NLP algorithm uses this data to find patterns and extrapolate what comes next. For example, a translation algorithm that recognizes that, in French, "I'm going to the park" is "Je vais au parc" will learn to predict that "I'm going to the store" also begins with "Je vais au." All the algorithm then needs is the word for "store" to complete the translation task.
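As a rough illustration of that pattern-finding idea, here is a minimal Python sketch (the toy corpus and function name are my own, purely for illustration) that counts which word tends to follow which and then predicts the most likely next word:

```python
from collections import Counter, defaultdict

# Toy corpus echoing the article's example; real systems train on far more data.
corpus = [
    "je vais au parc",
    "je vais au parc",
    "je vais au magasin",
]

# Count which word follows which (a bigram model).
bigram_counts = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for current_word, next_word in zip(words, words[1:]):
        bigram_counts[current_word][next_word] += 1

def predict_next(word):
    """Return the word most often seen after `word`, or None if unseen."""
    followers = bigram_counts.get(word)
    return followers.most_common(1)[0][0] if followers else None

print(predict_next("vais"))  # -> 'au'
print(predict_next("au"))    # -> 'parc' (its most frequent follower in this corpus)
```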
NLP applications
Machine translation is a powerful NLP application, but search is the most used. Every time you look something up in Google or Bing, you're helping to train the system. When you click on a search result, the system interprets it as confirmation that the results it has found are correct and uses this information to improve search results in the future.
Chatbots work the same way. They integrate with Slack, Microsoft Messenger, and other chat programs where they read the language you use, then turn on when you type in a trigger phrase. Voice assistants such as Siri and Alexa also kick into gear when they hear phrases like "Hey, Alexa." That's why critics say these programs are always listening; if they weren't, they'd never know when you need them. Unless you turn an app on manually, NLP programs must operate in the background, waiting for that phrase.
Transformer models take applications such as language translation and chatbots to a new level. Innovations such as the self-attention mechanism and multi-head attention enable these models to better weigh the importance of various parts of the input, and to process those parts in parallel rather than sequentially.
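To make that concrete, here is a minimal NumPy sketch of single-head scaled dot-product self-attention. The random projection matrices stand in for weights a trained transformer would learn; the shapes and names are illustrative only:

```python
import numpy as np

def self_attention(x, d_k=16, seed=0):
    """Single-head scaled dot-product self-attention over token vectors x: (seq_len, d_model)."""
    rng = np.random.default_rng(seed)
    d_model = x.shape[1]
    # In a real transformer these projections are learned; here they are random.
    W_q = rng.normal(size=(d_model, d_k))
    W_k = rng.normal(size=(d_model, d_k))
    W_v = rng.normal(size=(d_model, d_k))

    Q, K, V = x @ W_q, x @ W_k, x @ W_v
    scores = Q @ K.T / np.sqrt(d_k)                 # every position scores every other position
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax: how much each token attends to the others
    return weights @ V                              # each output mixes information from all positions in parallel

tokens = np.random.default_rng(1).normal(size=(5, 32))  # 5 tokens, 32-dimensional embeddings
print(self_attention(tokens).shape)                     # (5, 16)
```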
Rajeswaran V, senior director at Capgemini, notes that OpenAI's GPT-3 model has mastered language without using any labeled data. By relying on morphology — the study of words, how they are formed, and their relationship to other words in the same language — GPT-3 can perform language translation much better than existing state-of-the-art models, he says.
NLP systems that rely on transformer models are especially strong at NLG.
Natural language processing examples
Data comes in many forms, but the largest untapped pool of data consists of text — and unstructured text in particular. Patents, product specifications, academic publications, market research, news, not to mention social media feeds, all have text as a primary component, and the volume of text is constantly growing. Apply the technology to voice and the pool gets even larger. Here are three examples of how organizations are putting the technology to work:
Whether you're building a chatbot, voice assistant, predictive text application, or other application with NLP at its core, you'll need tools to help you do it. According to Technology Evaluation Centers, the most popular software includes:
There's a wide variety of resources available for learning to create and maintain NLP applications, many of which are free. They include:
Here are some of the most popular job titles related to NLP and the average salary (in US$) for each position, according to data from PayScale.
BERT Explained: What You Need To Know About Google's New Algorithm - Search Engine Journal
Google's newest algorithmic update, BERT, helps Google understand natural language better, particularly in conversational search.
BERT will impact around 10% of queries. It will also impact organic rankings and featured snippets. So this is no small change!
But did you know that BERT is not just any algorithmic update, but also a research paper and machine learning natural language processing framework?
In fact, in the year preceding its implementation, BERT caused a frenetic storm of activity in production search.
On November 20, I moderated a Search Engine Journal webinar presented by Dawn Anderson, Managing Director at Bertey.
Anderson explained what Google's BERT really is and how it works, how it will impact search, and whether you can try to optimize your content for it.
Here's a recap of the webinar presentation.
BERT, which stands for Bidirectional Encoder Representations from Transformers, is actually many things.
It's most popularly known as an ingredient/tool/framework in Google's search algorithm called Google BERT, which aims to help Search better understand the nuance and context of words in searches and better match those queries with helpful results.
BERT is also an open-source research project and academic paper. First published in October 2018 as "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding," the paper was authored by Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova.
Additionally, BERT is a natural language processing (NLP) framework that Google produced and then open-sourced so that the whole natural language processing research field could actually get better at natural language understanding overall.
You'll probably find that most mentions of BERT online are NOT about the Google BERT update.
There is lots of actual research on BERT being carried out by other researchers that isn't using what you would consider the Google BERT algorithm update.
BERT has dramatically accelerated natural language understanding (NLU) more than anything, and Google's move to open-source BERT has probably changed natural language processing forever.
The machine learning (ML) and NLP communities are very excited about BERT, as it takes a huge amount of heavy lifting out of being able to carry out research in natural language. It has been pre-trained on a lot of words – on the whole of the English Wikipedia (2,500 million words).
Vanilla BERT provides a pre-trained starting-point layer for neural networks across diverse machine learning and natural language tasks.
While BERT has been pre-trained on Wikipedia, it is fine-tuned on question-and-answer datasets.
One of those question-and-answer data sets it can be fine-tuned on is called MS MARCO: A Human Generated MAchine Reading COmprehension Dataset built and open-sourced by Microsoft.
It contains real Bing questions and answers (anonymized queries from real Bing users) built into a dataset for ML and NLP researchers to fine-tune on, and they then actually compete with each other to build the best model.
Researchers also compete over Natural Language Understanding with SQuAD (Stanford Question Answering Dataset). BERT now even beats the human reasoning benchmark on SQuAD.
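As a rough illustration of what such fine-tuned models can do, here is a short sketch using the Hugging Face transformers library (my own choice of tooling, not something from the webinar) with a publicly available DistilBERT checkpoint that has been fine-tuned on SQuAD:

```python
# Requires: pip install transformers torch
from transformers import pipeline

# A SQuAD-fine-tuned checkpoint; swap in any question-answering model you prefer.
qa = pipeline("question-answering", model="distilbert-base-uncased-distilled-squad")

context = (
    "BERT was first published in October 2018 by Jacob Devlin, Ming-Wei Chang, "
    "Kenton Lee, and Kristina Toutanova, and was later open-sourced by Google."
)
result = qa(question="When was BERT first published?", context=context)
print(result["answer"], round(result["score"], 3))  # e.g. 'October 2018' plus a confidence score
```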
Lots of the major AI companies are also building BERT versions:
There are things we humans understand easily that machines, including search engines, don't really understand at all.
The Problem with Words
The problem with words is that they're everywhere. More and more content is out there.
Words are problematic because plenty of them are ambiguous, polysemous, and synonymous.
BERT is designed to help solve ambiguous sentences and phrases that are made up of lots and lots of words with multiple meanings.
Ambiguity & Polysemy
Almost every other word in the English language has multiple meanings. In spoken language, it is even worse because of homophones and prosody.
For instance, "four candles" and "fork handles" sound nearly identical in some English accents. Another example: comedians' jokes are mostly based on wordplay, because words are very easy to misinterpret.
It's not very challenging for us humans, because we have common sense and context, so we can understand all the other words that frame the situation or the conversation – but search engines and machines can't.
This does not bode well for conversational search into the future.
Word's Context
"The meaning of a word is its use in a language." – Ludwig Wittgenstein, Philosopher, 1953
Basically, this means that a word has no meaning unless it's used in a particular context.
The meaning of a word literally changes as a sentence develops, due to the multiple parts of speech the word could be in a given context.
Case in point: in just the short sentence "I like the way that looks like the other one," the Stanford Part-of-Speech Tagger shows that the word "like" is considered to be two separate parts of speech (POS).
The word "like" may be used as different parts of speech including verb, noun, and adjective.
So literally, the word "like" has no meaning on its own; its meaning changes according to the meanings of the words that surround it.
The longer the sentence is, the harder it is to keep track of all the different parts of speech within the sentence.
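To see this for yourself, here is a small sketch using NLTK's part-of-speech tagger (a stand-in for the Stanford tagger mentioned above) on that same sentence:

```python
# Requires: pip install nltk  (on newer NLTK versions the resources below are
# named "punkt_tab" and "averaged_perceptron_tagger_eng" instead)
import nltk

nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

sentence = "I like the way that looks like the other one."
print(nltk.pos_tag(nltk.word_tokenize(sentence)))
# The two occurrences of "like" typically come back with different tags,
# e.g. VBP (verb) for the first and IN (preposition) for the second.
```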
On NLR & NLU
Natural Language Recognition Is NOT Understanding
Natural language understanding requires an understanding of context and common sense reasoning. This is VERY challenging for machines but largely straightforward for humans.
Natural Language Understanding Is Not Structured Data
Structured data helps to disambiguate but what about the hot mess in between?
Not Everyone or Thing Is Mapped to the Knowledge Graph
There will still be lots of gaps to fill. Here's an example.
As you can see here, we have all these entities and the relationships between them. This is where NLU comes in as it is tasked to help search engines fill in the gaps between named entities.
How Can Search Engines Fill in the Gaps Between Named Entities?
Natural Language Disambiguation
"You shall know a word by the company it keeps." – John Rupert Firth, Linguist, 1957
Words that live together are strongly connected:
Language models are trained on very large text corpora – collections of loads of words – to learn distributional similarity…
…and build vector space models for word embeddings.
The NLP models learn the weights of the similarity and relatedness distances. But even if we understand the entity (thing) itself, we need to understand a word's context.
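As a minimal sketch of that idea, the following example trains word embeddings on a toy corpus with the gensim library (my own choice of toolkit; any embedding library would do) and queries the resulting vector space:

```python
# Requires: pip install gensim
from gensim.models import Word2Vec

# A toy corpus: real models are trained on billions of words, so the numbers
# below are only illustrative.
sentences = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "rug"],
    ["you", "can", "feed", "and", "pet", "a", "cat"],
    ["you", "can", "feed", "and", "pet", "a", "dog"],
]

# Words that keep the same company end up close together in the vector space.
model = Word2Vec(sentences, vector_size=50, window=3, min_count=1, sg=1, epochs=200)

print(model.wv.similarity("cat", "dog"))      # relatedness score between two words
print(model.wv.most_similar("cat", topn=3))   # nearest neighbours in the embedding space
```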
On their own, single words have no semantic meaning so they need text cohesion. Cohesion is the grammatical and lexical linking within a text or sentence that holds a text together and gives it meaning.
Semantic context matters. Without surrounding words, the word "bucket" could mean anything in a sentence.
An important part of this is part-of-speech (POS) tagging:
Past language models (such as Word2Vec and GloVe) built context-free word embeddings. BERT, on the other hand, provides context.
To better understand how BERT works, let's look at what the acronym stands for.
B: Bi-directional
Previously, all language models (e.g., Skip-gram and Continuous Bag of Words) were uni-directional, so they could only move the context window in one direction – a moving window of "n" words (either left or right of a target word) – to understand a word's context.
Most language models are uni-directional. They can traverse the word's context window from only left to right or right to left – in one direction, but not both at the same time.
BERT is different. BERT uses bi-directional language modeling (which is a FIRST).
BERT can see the WHOLE sentence on either side of a word (contextual language modeling) and all of the words almost at once.
ER: Encoder Representations
What gets encoded is decoded. It's an in-and-out mechanism.
T: Transformers
BERT uses "transformers" and "masked language modeling".
One of the big issues with natural language understanding in the past has been not being able to understand what a word is referring to in context.
Pronouns, for instance. It's very easy to lose track of who somebody is talking about in a conversation. Even humans can struggle to keep track of who is being referred to all the time.
It's similar for search engines, but they struggle to keep track of who or what you mean when you say he, they, she, we, it, etc.
The attention part of transformers actually focuses on the pronouns and all the words' meanings that go together, to try and tie back to who or what is being spoken about in any given context.
Masked language modeling stops the target word from seeing itself: the mask prevents the word that's under focus from actually seeing itself during training.
When the mask is in place, BERT just guesses at what the missing word is. It's part of the fine-tuning process as well.
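Here is a short sketch of masked language modeling in action, using the Hugging Face fill-mask pipeline with a pre-trained BERT checkpoint (the library and model name are my own choices, not something from the webinar):

```python
# Requires: pip install transformers torch
from transformers import pipeline

# BERT was pre-trained by guessing masked-out words using context from both directions.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

for prediction in fill_mask("The man went to the [MASK] to buy some milk."):
    print(f'{prediction["token_str"]:>12}  {prediction["score"]:.3f}')
```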
What Types of Natural Language Tasks Does BERT Help With?
BERT will help with things like:
BERT advanced the state-of-the-art (SOTA) benchmarks across 11 NLP tasks.
How BERT Will Impact Search
BERT Will Help Google to Better Understand Human Language
BERT's understanding of the nuances of human language is going to make a massive difference in how Google interprets queries, because people are obviously searching with longer, questioning queries.
BERT Will Help Scale Conversational Search
BERT will also have a huge impact on voice search (as an alternative to problem-plagued Pygmalion).
Expect Big Leaps for International SEO
BERT has this mono-linguistic to multi-linguistic ability because a lot of patterns in one language do translate into other languages.
There is a possibility of transferring a lot of the learnings to different languages, even though BERT doesn't necessarily understand the other language itself fully.
Google Will Better Understand 'Contextual Nuance' & Ambiguous Queries
A lot of people have been complaining that their rankings have been impacted.
But I think that's probably more because Google in some way got better at understanding the nuanced context of queries and the nuanced context of content.
So perhaps, Google will be better able to understand contextual nuance and ambiguous queries.
Should You (or Can You) Optimize Your Content for BERT?
Probably not.
Google BERT is a framework of better understanding. It doesn't judge content per se. It just better understands what's out there.
For instance, Google BERT might suddenly understand more, and pages out there that are over-optimized might suddenly be impacted by something else, like Panda, because Google's BERT suddenly realized that a particular page wasn't that relevant for something.
That's not to say you should optimize for BERT; you're probably better off just writing naturally in the first place.
The Technology Behind OpenAI's Fiction-writing, Fake-news-spewing AI, Explained
Last Thursday (Feb. 14), the nonprofit research firm OpenAI released a new language model capable of generating convincing passages of prose. So convincing, in fact, that the researchers have refrained from open-sourcing the code, in hopes of stalling its potential weaponization as a means of mass-producing fake news.
While the impressive results are a remarkable leap beyond what existing language models have achieved, the technique involved isn't exactly new. Instead, the breakthrough was driven primarily by feeding the algorithm ever more training data—a trick that has also been responsible for most of the other recent advancements in teaching AI to read and write. "It's kind of surprising people in terms of what you can do with [...] more data and bigger models," says Percy Liang, a computer science professor at Stanford.
The passages of text that the model produces are good enough to masquerade as something human-written. But this ability should not be confused with a genuine understanding of language—the ultimate goal of the subfield of AI known as natural-language processing (NLP). (There's an analogue in computer vision: an algorithm can synthesize highly realistic images without any true visual comprehension.) In fact, getting machines to that level of understanding is a task that has largely eluded NLP researchers. That goal could take years, even decades, to achieve, surmises Liang, and is likely to involve techniques that don't yet exist.
Four different philosophies of language currently drive the development of NLP techniques. Let's begin with the one used by OpenAI.
Distributional semantics
Linguistic philosophy. Words derive meaning from how they are used. For example, the words "cat" and "dog" are related in meaning because they are used more or less the same way. You can feed and pet a cat, and you feed and pet a dog. You can't, however, feed and pet an orange.
How it translates to NLP. Algorithms based on distributional semantics have been largely responsible for the recent breakthroughs in NLP. They use machine learning to process text, finding patterns by essentially counting how often and how closely words are used in relation to one another. The resultant models can then use those patterns to construct complete sentences or paragraphs, and power things like autocomplete or other predictive text systems. In recent years, some researchers have also begun experimenting with looking at the distributions of random character sequences rather than words, so models can more flexibly handle acronyms, punctuation, slang, and other things that don't appear in the dictionary, as well as languages that don't have clear delineations between words.
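Here is a from-scratch sketch of that counting idea, assuming a tiny made-up corpus: build a word-word co-occurrence matrix and compare words by cosine similarity.

```python
import numpy as np

corpus = [
    "you can feed and pet a cat",
    "you can feed and pet a dog",
    "you can peel and eat an orange",
]

# Count how often each pair of words co-occurs within a small window.
vocab = sorted({w for line in corpus for w in line.split()})
index = {w: i for i, w in enumerate(vocab)}
counts = np.zeros((len(vocab), len(vocab)))

window = 2
for line in corpus:
    words = line.split()
    for i, w in enumerate(words):
        for j in range(max(0, i - window), min(len(words), i + window + 1)):
            if i != j:
                counts[index[w], index[words[j]]] += 1

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(counts[index["cat"]], counts[index["dog"]]))     # high: used in the same contexts
print(cosine(counts[index["cat"]], counts[index["orange"]]))  # low: used in different contexts
```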
Pros. These algorithms are flexible and scalable, because they can be applied within any context and learn from unlabeled data.
Cons. The models they produce don't actually understand the sentences they construct. At the end of the day, they're writing prose using word associations.
Frame semantics
Linguistic philosophy. Language is used to describe actions and events, so sentences can be subdivided into subjects, verbs, and modifiers — who, what, where, and when.
How it translates to NLP. Algorithms based on frame semantics use a set of rules or lots of labeled training data to learn to deconstruct sentences. This makes them particularly good at parsing simple commands—and thus useful for chatbots or voice assistants. If you asked Alexa to "find a restaurant with four stars for tomorrow," for example, such an algorithm would figure out how to execute the sentence by breaking it down into the action ("find"), the what ("restaurant with four stars"), and the when ("tomorrow").
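Here is a minimal sketch of that kind of deconstruction, using one hand-written rule instead of learned ones; the pattern and slot names are illustrative only:

```python
import re

# A tiny rule-based "frame" parser: split a command into an action, a what, and a when.
# Real assistants learn these frames from rules plus lots of labeled data.
PATTERN = re.compile(
    r"^(?P<action>find|book|show)\s+(?P<what>.+?)(?:\s+for\s+(?P<when>.+))?$",
    re.IGNORECASE,
)

def parse_command(utterance):
    match = PATTERN.match(utterance.strip())
    if not match:
        return None
    return {slot: value for slot, value in match.groupdict().items() if value}

print(parse_command("find a restaurant with four stars for tomorrow"))
# -> {'action': 'find', 'what': 'a restaurant with four stars', 'when': 'tomorrow'}
```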
Pros. Unlike distributional-semantic algorithms, which don't understand the text they learn from, frame-semantic algorithms can distinguish the different pieces of information in a sentence. These can be used to answer questions like "When is this event taking place?"
Cons. These algorithms can only handle very simple sentences and therefore fail to capture nuance. Because they require a lot of context-specific training, they're also not flexible.
Model-theoretical semantics
Linguistic philosophy. Language is used to communicate human knowledge.
How it translates to NLP. Model-theoretical semantics is based on an old idea in AI that all of human knowledge can be encoded, or modeled, in a series of logical rules. So if you know that birds can fly, and eagles are birds, then you can deduce that eagles can fly. This approach is no longer in vogue because researchers soon realized there were too many exceptions to each rule (for example, penguins are birds but can't fly). But algorithms based on model-theoretical semantics are still useful for extracting information from models of knowledge, like databases. Like frame-semantics algorithms, they parse sentences by deconstructing them into parts. But whereas frame semantics defines those parts as the who, what, where, and when, model-theoretical semantics defines them as the logical rules encoding knowledge. For example, consider the question "What is the largest city in Europe by population?" A model-theoretical algorithm would break it down into a series of self-contained queries: "What are all the cities in the world?" "Which ones are in Europe?" "What are the cities' populations?" "Which population is the largest?" It would then be able to traverse the model of knowledge to get you your final answer.
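Here is a toy sketch of that chain of queries, run against a hard-coded "model of knowledge"; the facts and figures below are rough and purely illustrative:

```python
# A tiny stand-in for a database or knowledge graph.
CITIES = [
    {"name": "Istanbul", "continent": "Europe", "population": 15_500_000},
    {"name": "Moscow",   "continent": "Europe", "population": 12_600_000},
    {"name": "London",   "continent": "Europe", "population": 9_000_000},
    {"name": "Tokyo",    "continent": "Asia",   "population": 14_000_000},
]

def largest_city_in(continent):
    # Query 1: all cities in the model. Query 2: keep those on the continent.
    # Query 3: read their populations. Query 4: take the largest.
    candidates = [city for city in CITIES if city["continent"] == continent]
    return max(candidates, key=lambda city: city["population"])["name"]

print(largest_city_in("Europe"))  # -> 'Istanbul'
```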
Pros. These algorithms give machines the ability to answer complex and nuanced questions.
Cons. They require a model of knowledge, which is time consuming to build, and are not flexible across different contexts.
Grounded semantics
Linguistic philosophy. Language derives meaning from lived experience. In other words, humans created language to achieve their goals, so it must be understood within the context of our goal-oriented world.
How it translates to NLP. This is the newest approach and the one that Liang thinks holds the most promise. It tries to mimic how humans pick up language over the course of their lives: the machine starts with a blank slate and learns to associate words with the correct meanings through conversation and interaction. In a simple example, if you wanted to teach a computer how to move objects around in a virtual world, you would give it a command like "Move the red block to the left" and then show it what you meant. Over time, the machine would learn to understand and execute the commands without help.
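Here is a toy sketch of learning from paired commands and demonstrations. It uses a simple cross-situational intersection rule, which is my own illustrative choice rather than the specific approach Liang has in mind; the block-world "situations" are made up:

```python
# Each command is paired with the demonstrated situation (which block moved, and where).
demonstrations = [
    ("move the red block to the left",   {"red", "left"}),
    ("move the blue block to the left",  {"blue", "left"}),
    ("move the red block to the right",  {"red", "right"}),
    ("move the blue block to the right", {"blue", "right"}),
]

# A word is associated with whatever is present in every situation where it is used.
meanings = {}
for command, situation in demonstrations:
    for word in command.split():
        meanings[word] = meanings.get(word, situation) & situation

print(meanings["red"])    # {'red'}: grounded to the red block
print(meanings["left"])   # {'left'}: grounded to the leftward move
print(meanings["block"])  # set(): words used everywhere wash out to nothing
```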
Pros. In theory, these algorithms should be very flexible and get the closest to a genuine understanding of language.
Cons. Teaching is very time intensive—and not all words and phrases are as easy to illustrate as "Move the red block."
In the short term, Liang thinks, the field of NLP will see much more progress from exploiting existing techniques, particularly those based on distributional semantics. But in the longer term, he believes, they all have limits. "There's probably a qualitative gap between the way that humans understand language and perceive the world and our current models," he says. Closing that gap would probably require a new way of thinking, he adds, as well as much more time.
