What Is NLP? Natural Language Processing Explained - CIO
Natural language processing definition
Natural language processing (NLP) is the branch of artificial intelligence (AI) that deals with training computers to understand, process, and generate language. Search engines, machine translation services, and voice assistants are all powered by the technology.
While the term originally referred to a system's ability to read, it's since become a colloquialism for all computational linguistics. Subcategories include natural language generation (NLG) — a computer's ability to create communication of its own — and natural language understanding (NLU) — the ability to understand slang, mispronunciations, misspellings, and other variants in language.
The introduction of transformer models in the 2017 paper "Attention Is All You Need" by Google researchers revolutionized NLP, leading to the creation of generative AI models such as Bidirectional Encoder Representations from Transformers (BERT) and its smaller, faster, more efficient successor DistilBERT, Generative Pre-trained Transformer (GPT), and Google Bard.
How natural language processing works
NLP leverages machine learning (ML) algorithms trained on unstructured data, typically text, to analyze how elements of human language are structured together to impart meaning. Phrases, sentences, and sometimes entire books are fed into ML engines, where they're processed using grammatical rules, people's real-life linguistic habits, and the like. An NLP algorithm uses this data to find patterns and extrapolate what comes next. For example, a translation algorithm that recognizes that, in French, "I'm going to the park" is "Je vais au parc" will learn to predict that "I'm going to the store" also begins with "Je vais au." All the algorithm then needs is the word for "store" to complete the translation task.
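As a rough illustration of that pattern-finding idea, here is a minimal Python sketch (the toy corpus and function name are my own, purely for illustration) that counts which word tends to follow which and then predicts the most likely next word:

```python
from collections import Counter, defaultdict

# Toy corpus echoing the article's example; real systems train on far more data.
corpus = [
    "je vais au parc",
    "je vais au parc",
    "je vais au magasin",
]

# Count which word follows which (a bigram model).
bigram_counts = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for current_word, next_word in zip(words, words[1:]):
        bigram_counts[current_word][next_word] += 1

def predict_next(word):
    """Return the word most often seen after `word`, or None if unseen."""
    followers = bigram_counts.get(word)
    return followers.most_common(1)[0][0] if followers else None

print(predict_next("vais"))  # -> 'au'
print(predict_next("au"))    # -> 'parc' (its most frequent follower in this corpus)
```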
NLP applications
Machine translation is a powerful NLP application, but search is the most used. Every time you look something up in Google or Bing, you're helping to train the system. When you click on a search result, the system interprets it as confirmation that the results it has found are correct and uses this information to improve search results in the future.
Chatbots work the same way. They integrate with Slack, Microsoft Messenger, and other chat programs where they read the language you use, then turn on when you type in a trigger phrase. Voice assistants such as Siri and Alexa also kick into gear when they hear phrases like "Hey, Alexa." That's why critics say these programs are always listening; if they weren't, they'd never know when you need them. Unless you turn an app on manually, NLP programs must operate in the background, waiting for that phrase.
Transformer models take applications such as language translation and chatbots to a new level. Innovations such as the self-attention mechanism and multi-head attention enable these models to better weigh the importance of various parts of the input, and to process those parts in parallel rather than sequentially.
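To make that concrete, here is a minimal NumPy sketch of single-head scaled dot-product self-attention. The random projection matrices stand in for weights a trained transformer would learn; the shapes and names are illustrative only:

```python
import numpy as np

def self_attention(x, d_k=16, seed=0):
    """Single-head scaled dot-product self-attention over token vectors x: (seq_len, d_model)."""
    rng = np.random.default_rng(seed)
    d_model = x.shape[1]
    # In a real transformer these projections are learned; here they are random.
    W_q = rng.normal(size=(d_model, d_k))
    W_k = rng.normal(size=(d_model, d_k))
    W_v = rng.normal(size=(d_model, d_k))

    Q, K, V = x @ W_q, x @ W_k, x @ W_v
    scores = Q @ K.T / np.sqrt(d_k)                 # every position scores every other position
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax: how much each token attends to the others
    return weights @ V                              # each output mixes information from all positions in parallel

tokens = np.random.default_rng(1).normal(size=(5, 32))  # 5 tokens, 32-dimensional embeddings
print(self_attention(tokens).shape)                     # (5, 16)
```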
Rajeswaran V, senior director at Capgemini, notes that OpenAI's GPT-3 model has mastered language without using any labeled data. By relying on morphology — the study of words, how they are formed, and their relationship to other words in the same language — GPT-3 can perform language translation much better than existing state-of-the-art models, he says.
NLP systems that rely on transformer models are especially strong at NLG.
Natural language processing examples
Data comes in many forms, but the largest untapped pool of data consists of text — and unstructured text in particular. Patents, product specifications, academic publications, market research, news, not to mention social media feeds, all have text as a primary component, and the volume of text is constantly growing. Apply the technology to voice and the pool gets even larger. Here are three examples of how organizations are putting the technology to work:
Whether you're building a chatbot, voice assistant, predictive text application, or other application with NLP at its core, you'll need tools to help you do it. According to Technology Evaluation Centers, the most popular software includes:
There's a wide variety of resources available for learning to create and maintain NLP applications, many of which are free. They include:
Here are some of the most popular job titles related to NLP and the average salary (in US$) for each position, according to data from PayScale.
BERT Explained: What You Need To Know About Google's New Algorithm - Search Engine Journal
Google's newest algorithmic update, BERT, helps Google understand natural language better, particularly in conversational search.
BERT will impact around 10% of queries. It will also impact organic rankings and featured snippets. So this is no small change!
But did you know that BERT is not just any algorithmic update, but also a research paper and machine learning natural language processing framework?
In fact, in the year preceding its implementation, BERT caused a frenetic storm of activity in production search.
On November 20, I moderated a Search Engine Journal webinar presented by Dawn Anderson, Managing Director at Bertey.
Anderson explained what Google's BERT really is and how it works, how it will impact search, and whether you can try to optimize your content for it.
Here's a recap of the webinar presentation.
BERT, which stands for Bidirectional Encoder Representations from Transformers, is actually many things.
It's most popularly known as an ingredient/tool/framework in Google's search algorithm called Google BERT, which aims to help Search better understand the nuance and context of words in searches and better match those queries with helpful results.
BERT is also an open-source research project and academic paper. First published in October 2018 as "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding," the paper was authored by Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova.
Additionally, BERT is a natural language processing (NLP) framework that Google produced and then open-sourced so that the whole natural language processing research field could actually get better at natural language understanding overall.
You'll probably find that most mentions of BERT online are NOT about the Google BERT update.
There is lots of actual research on BERT being carried out by other researchers that isn't using what you would consider the Google BERT algorithm update.
BERT has dramatically accelerated natural language understanding (NLU) more than anything, and Google's move to open-source BERT has probably changed natural language processing forever.
The machine learning (ML) and NLP communities are very excited about BERT, as it takes a huge amount of heavy lifting out of being able to carry out research in natural language. It has been pre-trained on a lot of words – on the whole of the English Wikipedia (2,500 million words).
Vanilla BERT provides a pre-trained starting-point layer for neural networks across diverse machine learning and natural language tasks.
While BERT has been pre-trained on Wikipedia, it is fine-tuned on question-and-answer datasets.
One of those question-and-answer data sets it can be fine-tuned on is called MS MARCO: A Human Generated MAchine Reading COmprehension Dataset built and open-sourced by Microsoft.
It contains real Bing questions and answers (anonymized queries from real Bing users) built into a dataset for ML and NLP researchers to fine-tune on, and they then actually compete with each other to build the best model.
Researchers also compete over Natural Language Understanding with SQuAD (Stanford Question Answering Dataset). BERT now even beats the human reasoning benchmark on SQuAD.
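As a rough illustration of what such fine-tuned models can do, here is a short sketch using the Hugging Face transformers library (my own choice of tooling, not something from the webinar) with a publicly available DistilBERT checkpoint that has been fine-tuned on SQuAD:

```python
# Requires: pip install transformers torch
from transformers import pipeline

# A SQuAD-fine-tuned checkpoint; swap in any question-answering model you prefer.
qa = pipeline("question-answering", model="distilbert-base-uncased-distilled-squad")

context = (
    "BERT was first published in October 2018 by Jacob Devlin, Ming-Wei Chang, "
    "Kenton Lee, and Kristina Toutanova, and was later open-sourced by Google."
)
result = qa(question="When was BERT first published?", context=context)
print(result["answer"], round(result["score"], 3))  # e.g. 'October 2018' plus a confidence score
```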
Lots of the major AI companies are also building BERT versions:
There are things we humans understand easily that machines, including search engines, don't really understand at all.
The Problem with Words
The problem with words is that they're everywhere. More and more content is out there.
Words are problematic because plenty of them are ambiguous, polysemous, and synonymous.
BERT is designed to help solve ambiguous sentences and phrases that are made up of lots and lots of words with multiple meanings.
Ambiguity & Polysemy
Almost every other word in the English language has multiple meanings. In spoken language, it is even worse because of homophones and prosody.
For instance, "four candles" and "fork handles" sound nearly identical in some English accents. Another example: comedians' jokes are mostly based on wordplay, because words are very easy to misinterpret.
It's not very challenging for us humans, because we have common sense and context, so we can understand all the other words that frame the situation or the conversation – but search engines and machines can't.
This does not bode well for conversational search into the future.
Word's Context
"The meaning of a word is its use in a language." – Ludwig Wittgenstein, Philosopher, 1953
Basically, this means that a word has no meaning unless it's used in a particular context.
The meaning of a word literally changes as a sentence develops, due to the multiple parts of speech the word could be in a given context.
Case in point: in just the short sentence "I like the way that looks like the other one," the Stanford Part-of-Speech Tagger shows that the word "like" is considered to be two separate parts of speech (POS).
The word "like" may be used as different parts of speech including verb, noun, and adjective.
So literally, the word "like" has no meaning on its own; its meaning changes according to the meanings of the words that surround it.
The longer the sentence is, the harder it is to keep track of all the different parts of speech within the sentence.
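To see this for yourself, here is a small sketch using NLTK's part-of-speech tagger (a stand-in for the Stanford tagger mentioned above) on that same sentence:

```python
# Requires: pip install nltk  (on newer NLTK versions the resources below are
# named "punkt_tab" and "averaged_perceptron_tagger_eng" instead)
import nltk

nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

sentence = "I like the way that looks like the other one."
print(nltk.pos_tag(nltk.word_tokenize(sentence)))
# The two occurrences of "like" typically come back with different tags,
# e.g. VBP (verb) for the first and IN (preposition) for the second.
```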
On NLR & NLU
Natural Language Recognition Is NOT Understanding
Natural language understanding requires an understanding of context and common sense reasoning. This is VERY challenging for machines but largely straightforward for humans.
Natural Language Understanding Is Not Structured Data
Structured data helps to disambiguate but what about the hot mess in between?
Not Everyone or Thing Is Mapped to the Knowledge Graph
There will still be lots of gaps to fill. Here's an example.
As you can see here, we have all these entities and the relationships between them. This is where NLU comes in as it is tasked to help search engines fill in the gaps between named entities.
How Can Search Engines Fill in the Gaps Between Named Entities?
Natural Language Disambiguation
"You shall know a word by the company it keeps." – John Rupert Firth, Linguist, 1957
Words that live together are strongly connected:
Language models are trained on very large text corpora – collections of loads of words – to learn distributional similarity…
…and build vector space models for word embeddings.
The NLP models learn the weights of the similarity and relatedness distances. But even if we understand the entity (thing) itself, we need to understand a word's context.
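As a minimal sketch of that idea, the following example trains word embeddings on a toy corpus with the gensim library (my own choice of toolkit; any embedding library would do) and queries the resulting vector space:

```python
# Requires: pip install gensim
from gensim.models import Word2Vec

# A toy corpus: real models are trained on billions of words, so the numbers
# below are only illustrative.
sentences = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "rug"],
    ["you", "can", "feed", "and", "pet", "a", "cat"],
    ["you", "can", "feed", "and", "pet", "a", "dog"],
]

# Words that keep the same company end up close together in the vector space.
model = Word2Vec(sentences, vector_size=50, window=3, min_count=1, sg=1, epochs=200)

print(model.wv.similarity("cat", "dog"))      # relatedness score between two words
print(model.wv.most_similar("cat", topn=3))   # nearest neighbours in the embedding space
```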
On their own, single words have no semantic meaning so they need text cohesion. Cohesion is the grammatical and lexical linking within a text or sentence that holds a text together and gives it meaning.
Semantic context matters. Without surrounding words, the word "bucket" could mean anything in a sentence.
An important part of this is part-of-speech (POS) tagging:
Past language models (such as Word2Vec and GloVe) built context-free word embeddings. BERT, on the other hand, provides context.
To better understand how BERT works, let's look at what the acronym stands for.
B: Bi-directional
Previously, all language models (e.g., Skip-gram and Continuous Bag of Words) were uni-directional, so they could only move the context window in one direction – a moving window of "n" words (either left or right of a target word) – to understand a word's context.
Most language models are uni-directional. They can traverse the word's context window from only left to right or right to left – in one direction, but not both at the same time.
BERT is different. BERT uses bi-directional language modeling (which is a FIRST).
BERT can see the WHOLE sentence on either side of a word (contextual language modeling) and all of the words almost at once.
ER: Encoder Representations
What gets encoded is decoded. It's an in-and-out mechanism.
T: Transformers
BERT uses "transformers" and "masked language modeling".
One of the big issues with natural language understanding in the past has been not being able to understand what a word is referring to in context.
Pronouns, for instance. It's very easy to lose track of who somebody is talking about in a conversation. Even humans can struggle to keep track of who is being referred to all the time.
It's similar for search engines, but they struggle to keep track of who or what you mean when you say he, they, she, we, it, etc.
The attention part of transformers actually focuses on the pronouns and all the words' meanings that go together, to try and tie back to who or what is being spoken about in any given context.
Masked language modeling stops the target word from seeing itself: the mask prevents the word that's under focus from actually seeing itself during training.
When the mask is in place, BERT just guesses at what the missing word is. It's part of the fine-tuning process as well.
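Here is a short sketch of masked language modeling in action, using the Hugging Face fill-mask pipeline with a pre-trained BERT checkpoint (the library and model name are my own choices, not something from the webinar):

```python
# Requires: pip install transformers torch
from transformers import pipeline

# BERT was pre-trained by guessing masked-out words using context from both directions.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

for prediction in fill_mask("The man went to the [MASK] to buy some milk."):
    print(f'{prediction["token_str"]:>12}  {prediction["score"]:.3f}')
```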
What Types of Natural Language Tasks Does BERT Help With?
BERT will help with things like:
BERT advanced the state-of-the-art (SOTA) benchmarks across 11 NLP tasks.
How BERT Will Impact Search
BERT Will Help Google to Better Understand Human Language
BERT's understanding of the nuances of human language is going to make a massive difference in how Google interprets queries, because people are obviously searching with longer, questioning queries.
BERT Will Help Scale Conversational Search
BERT will also have a huge impact on voice search (as an alternative to problem-plagued Pygmalion).
Expect Big Leaps for International SEO
BERT has this mono-linguistic to multi-linguistic ability because a lot of patterns in one language do translate into other languages.
There is a possibility of transferring a lot of the learnings to different languages, even though BERT doesn't necessarily understand the other language itself fully.
Google Will Better Understand 'Contextual Nuance' & Ambiguous Queries
A lot of people have been complaining that their rankings have been impacted.
But I think that's probably more because Google in some way got better at understanding the nuanced context of queries and the nuanced context of content.
So perhaps, Google will be better able to understand contextual nuance and ambiguous queries.
Should You (or Can You) Optimize Your Content for BERT?
Probably not.
Google BERT is a framework of better understanding. It doesn't judge content per se. It just better understands what's out there.
For instance, Google BERT might suddenly understand more, and pages out there that are over-optimized might suddenly be impacted by something else, like Panda, because Google's BERT suddenly realized that a particular page wasn't that relevant for something.
That's not to say you should optimize for BERT; you're probably better off just writing naturally in the first place.
The Technology Behind OpenAI's Fiction-writing, Fake-news-spewing AI, Explained
Last Thursday (Feb. 14), the nonprofit research firm OpenAI released a new language model capable of generating convincing passages of prose. So convincing, in fact, that the researchers have refrained from open-sourcing the code, in hopes of stalling its potential weaponization as a means of mass-producing fake news.
While the impressive results are a remarkable leap beyond what existing language models have achieved, the technique involved isn't exactly new. Instead, the breakthrough was driven primarily by feeding the algorithm ever more training data—a trick that has also been responsible for most of the other recent advancements in teaching AI to read and write. "It's kind of surprising people in terms of what you can do with [...] more data and bigger models," says Percy Liang, a computer science professor at Stanford.
The passages of text that the model produces are good enough to masquerade as something human-written. But this ability should not be confused with a genuine understanding of language—the ultimate goal of the subfield of AI known as natural-language processing (NLP). (There's an analogue in computer vision: an algorithm can synthesize highly realistic images without any true visual comprehension.) In fact, getting machines to that level of understanding is a task that has largely eluded NLP researchers. That goal could take years, even decades, to achieve, surmises Liang, and is likely to involve techniques that don't yet exist.
Four different philosophies of language currently drive the development of NLP techniques. Let's begin with the one used by OpenAI.
Distributional semantics
Linguistic philosophy. Words derive meaning from how they are used. For example, the words "cat" and "dog" are related in meaning because they are used more or less the same way. You can feed and pet a cat, and you feed and pet a dog. You can't, however, feed and pet an orange.
How it translates to NLP. Algorithms based on distributional semantics have been largely responsible for the recent breakthroughs in NLP. They use machine learning to process text, finding patterns by essentially counting how often and how closely words are used in relation to one another. The resultant models can then use those patterns to construct complete sentences or paragraphs, and power things like autocomplete or other predictive text systems. In recent years, some researchers have also begun experimenting with looking at the distributions of random character sequences rather than words, so models can more flexibly handle acronyms, punctuation, slang, and other things that don't appear in the dictionary, as well as languages that don't have clear delineations between words.
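Here is a from-scratch sketch of that counting idea, assuming a tiny made-up corpus: build a word-word co-occurrence matrix and compare words by cosine similarity.

```python
import numpy as np

corpus = [
    "you can feed and pet a cat",
    "you can feed and pet a dog",
    "you can peel and eat an orange",
]

# Count how often each pair of words co-occurs within a small window.
vocab = sorted({w for line in corpus for w in line.split()})
index = {w: i for i, w in enumerate(vocab)}
counts = np.zeros((len(vocab), len(vocab)))

window = 2
for line in corpus:
    words = line.split()
    for i, w in enumerate(words):
        for j in range(max(0, i - window), min(len(words), i + window + 1)):
            if i != j:
                counts[index[w], index[words[j]]] += 1

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(counts[index["cat"]], counts[index["dog"]]))     # high: used in the same contexts
print(cosine(counts[index["cat"]], counts[index["orange"]]))  # low: used in different contexts
```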
Pros. These algorithms are flexible and scalable, because they can be applied within any context and learn from unlabeled data.
Cons. The models they produce don't actually understand the sentences they construct. At the end of the day, they're writing prose using word associations.
Frame semantics
Linguistic philosophy. Language is used to describe actions and events, so sentences can be subdivided into subjects, verbs, and modifiers — who, what, where, and when.
How it translates to NLP. Algorithms based on frame semantics use a set of rules or lots of labeled training data to learn to deconstruct sentences. This makes them particularly good at parsing simple commands—and thus useful for chatbots or voice assistants. If you asked Alexa to "find a restaurant with four stars for tomorrow," for example, such an algorithm would figure out how to execute the sentence by breaking it down into the action ("find"), the what ("restaurant with four stars"), and the when ("tomorrow").
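Here is a minimal sketch of that kind of deconstruction, using one hand-written rule instead of learned ones; the pattern and slot names are illustrative only:

```python
import re

# A tiny rule-based "frame" parser: split a command into an action, a what, and a when.
# Real assistants learn these frames from rules plus lots of labeled data.
PATTERN = re.compile(
    r"^(?P<action>find|book|show)\s+(?P<what>.+?)(?:\s+for\s+(?P<when>.+))?$",
    re.IGNORECASE,
)

def parse_command(utterance):
    match = PATTERN.match(utterance.strip())
    if not match:
        return None
    return {slot: value for slot, value in match.groupdict().items() if value}

print(parse_command("find a restaurant with four stars for tomorrow"))
# -> {'action': 'find', 'what': 'a restaurant with four stars', 'when': 'tomorrow'}
```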
Pros. Unlike distributional-semantic algorithms, which don't understand the text they learn from, frame-semantic algorithms can distinguish the different pieces of information in a sentence. These can be used to answer questions like "When is this event taking place?"
Cons. These algorithms can only handle very simple sentences and therefore fail to capture nuance. Because they require a lot of context-specific training, they're also not flexible.
Model-theoretical semantics
Linguistic philosophy. Language is used to communicate human knowledge.
How it translates to NLP. Model-theoretical semantics is based on an old idea in AI that all of human knowledge can be encoded, or modeled, in a series of logical rules. So if you know that birds can fly, and eagles are birds, then you can deduce that eagles can fly. This approach is no longer in vogue because researchers soon realized there were too many exceptions to each rule (for example, penguins are birds but can't fly). But algorithms based on model-theoretical semantics are still useful for extracting information from models of knowledge, like databases. Like frame-semantics algorithms, they parse sentences by deconstructing them into parts. But whereas frame semantics defines those parts as the who, what, where, and when, model-theoretical semantics defines them as the logical rules encoding knowledge. For example, consider the question "What is the largest city in Europe by population?" A model-theoretical algorithm would break it down into a series of self-contained queries: "What are all the cities in the world?" "Which ones are in Europe?" "What are the cities' populations?" "Which population is the largest?" It would then be able to traverse the model of knowledge to get you your final answer.
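Here is a toy sketch of that chain of queries, run against a hard-coded "model of knowledge"; the facts and figures below are rough and purely illustrative:

```python
# A tiny stand-in for a database or knowledge graph.
CITIES = [
    {"name": "Istanbul", "continent": "Europe", "population": 15_500_000},
    {"name": "Moscow",   "continent": "Europe", "population": 12_600_000},
    {"name": "London",   "continent": "Europe", "population": 9_000_000},
    {"name": "Tokyo",    "continent": "Asia",   "population": 14_000_000},
]

def largest_city_in(continent):
    # Query 1: all cities in the model. Query 2: keep those on the continent.
    # Query 3: read their populations. Query 4: take the largest.
    candidates = [city for city in CITIES if city["continent"] == continent]
    return max(candidates, key=lambda city: city["population"])["name"]

print(largest_city_in("Europe"))  # -> 'Istanbul'
```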
Pros. These algorithms give machines the ability to answer complex and nuanced questions.
Cons. They require a model of knowledge, which is time consuming to build, and are not flexible across different contexts.
Grounded semantics
Linguistic philosophy. Language derives meaning from lived experience. In other words, humans created language to achieve their goals, so it must be understood within the context of our goal-oriented world.
How it translates to NLP. This is the newest approach and the one that Liang thinks holds the most promise. It tries to mimic how humans pick up language over the course of their lives: the machine starts with a blank slate and learns to associate words with the correct meanings through conversation and interaction. In a simple example, if you wanted to teach a computer how to move objects around in a virtual world, you would give it a command like "Move the red block to the left" and then show it what you meant. Over time, the machine would learn to understand and execute the commands without help.
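Here is a toy sketch of learning from paired commands and demonstrations. It uses a simple cross-situational intersection rule, which is my own illustrative choice rather than the specific approach Liang has in mind; the block-world "situations" are made up:

```python
# Each command is paired with the demonstrated situation (which block moved, and where).
demonstrations = [
    ("move the red block to the left",   {"red", "left"}),
    ("move the blue block to the left",  {"blue", "left"}),
    ("move the red block to the right",  {"red", "right"}),
    ("move the blue block to the right", {"blue", "right"}),
]

# A word is associated with whatever is present in every situation where it is used.
meanings = {}
for command, situation in demonstrations:
    for word in command.split():
        meanings[word] = meanings.get(word, situation) & situation

print(meanings["red"])    # {'red'}: grounded to the red block
print(meanings["left"])   # {'left'}: grounded to the leftward move
print(meanings["block"])  # set(): words used everywhere wash out to nothing
```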
Pros. In theory, these algorithms should be very flexible and get the closest to a genuine understanding of language.
Cons. Teaching is very time intensive—and not all words and phrases are as easy to illustrate as "Move the red block."
In the short term, Liang thinks, the field of NLP will see much more progress from exploiting existing techniques, particularly those based on distributional semantics. But in the longer term, he believes, they all have limits. "There's probably a qualitative gap between the way that humans understand language and perceive the world and our current models," he says. Closing that gap would probably require a new way of thinking, he adds, as well as much more time.
