
AI Boom Requires New Benchmarks For Natural Language Understanding

Aarne Talman's work helps us develop better benchmarks and, thus, better AI models, while also enabling new models that more closely mimic human language understanding.

Current benchmarks unable to measure language understanding capabilities of AI models

Talman says that current benchmarks measuring the language understanding capabilities of AI models do not actually measure what they claim to: the models can perform the tasks assigned to them by relying on incidental patterns in the datasets rather than on genuine understanding.

As part of his research, Talman not only assessed benchmarks for language understanding, but also developed methods to enhance AI models' language understanding.

"To the best of my knowledge, I was the first to apply the Stochastic Weight Averaging Gaussian (SWAG) method in the context of language understanding," Talman says. "This method enables the development of AI models with language understanding capabilities that better capture the uncertainty involved in human language understanding."

In his research, Talman also clarifies concepts of language understanding and opens up discussion on what requirements AI models must meet for us to say that they understand natural language.

Current benchmarks have been used to compare AI models in terms of their language understanding capabilities.

Talman also discusses the nature of language understanding more generally and considers the extent to which AI models are able to understand language. 

"Can we say that an AI model actually understands the language it reads?" he asks.

AI will play (and is already playing) a major role in our society. Language understanding is one of the cornerstones of intelligence. 

"It's important that we're able to develop better AI models that more closely match human language understanding. To do so, we must grasp what language understanding means and how it can be measured." 

Timely research on the capability of AI models to understand natural language

Aarne Talman, MSc, will defend his doctoral thesis Towards Natural Language Understanding: Developing and Assessing Approaches and Benchmarks on 23 February at 13.15 in the Doctoral Programme in Language Studies at the Department of Digital Humanities of the University of Helsinki's Faculty of Arts.

The public examination will take place in Banquet Room 303 at Unioninkatu 33. The event can also be attended via live stream.


Artificial Intelligence Is Learning To Understand People In Surprising New Ways

AI now detects personality traits from text and explains its reasoning, advancing psychology and ethical tech. (CREDIT: CC BY-SA 4.0)

A growing body of research shows that AI can detect key parts of your personality just from the words you use. It doesn't need long interviews or tests. Instead, it looks at your writing—social media posts, essays, or even everyday messages—and picks up signals about who you are.

Researchers are now taking this a step further. They've developed methods to peek inside AI's "mind" to understand how it makes those judgments. This new level of explainability could change how we assess personality, making it easier, faster, and more accurate across many areas of life—from therapy to education and even job hiring.

How AI Sees Personality in Words

A team of scientists from the University of Barcelona recently ran an in-depth study using cutting-edge AI models to explore personality detection. They tested models trained on large sets of written texts, comparing two well-known personality systems: the Big Five and the Myers-Briggs Type Indicator (MBTI).

From left to right, experts Daniel Ortiz, David Saeteros and David Gallardo, at the University of Barcelona. (CREDIT: University of Barcelona)

The Big Five model breaks personality into five traits: openness, conscientiousness, extraversion, agreeableness, and emotional stability. MBTI, often used in online quizzes and corporate settings, sorts people along four dichotomies: introvert vs. extrovert, sensing vs. intuitive, thinking vs. feeling, and judging vs. perceiving.
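An MBTI type is just a four-letter code, one letter per dichotomy. A small, hypothetical helper makes the structure concrete (the function name and layout are illustrative, not from the study):

```python
# One dict per dichotomy, in the order the letters appear in an MBTI code.
MBTI_AXES = [
    {"I": "introvert", "E": "extrovert"},
    {"S": "sensing", "N": "intuitive"},
    {"T": "thinking", "F": "feeling"},
    {"J": "judging", "P": "perceiving"},
]

def expand_mbti(code: str) -> list[str]:
    """Expand a four-letter MBTI code into its four dichotomy labels."""
    code = code.upper()
    if len(code) != 4:
        raise ValueError("MBTI codes have exactly four letters")
    return [axis[letter] for axis, letter in zip(MBTI_AXES, code)]

print(expand_mbti("INTP"))  # ['introvert', 'intuitive', 'thinking', 'perceiving']
```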

The study used two leading AI models: BERT and RoBERTa. These models are good at processing natural language. To train them, the researchers used essays and posts from people whose personality traits had already been measured using questionnaires.

Once the AI could detect patterns, the team turned to a method called integrated gradients. This tool reveals which words or phrases had the most impact on the model's predictions. The goal was to ensure that the AI wasn't just guessing based on random patterns or biases in the data.

According to the scientists, integrated gradients allow researchers to "open the black box" of AI. This means they can tell why the system makes a specific decision. For example, the word "hate" might seem negative, but in a phrase like "I hate to see others suffer," it may actually show empathy. Without context, a model might misinterpret that.
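Integrated gradients attributes a model's output to each input feature by integrating the gradient along a straight path from a baseline input to the actual input. A minimal NumPy sketch on a toy differentiable function (standing in for a classifier's logit; the weights and inputs are invented) also demonstrates the method's completeness axiom, which is what makes the attributions trustworthy:

```python
import numpy as np

def integrated_gradients(f, grad_f, x, baseline, steps=200):
    """IG_i = (x_i - baseline_i) * mean of dF/dx_i along the straight path."""
    alphas = (np.arange(steps) + 0.5) / steps          # midpoint rule
    path = baseline + alphas[:, None] * (x - baseline)
    grads = np.stack([grad_f(p) for p in path])
    return (x - baseline) * grads.mean(axis=0)

# Toy scoring function standing in for a model's output for one class.
w = np.array([0.5, -1.2, 2.0])
f = lambda x: np.tanh(x @ w)
grad_f = lambda x: (1 - np.tanh(x @ w) ** 2) * w

x, baseline = np.array([1.0, 0.3, -0.5]), np.zeros(3)
attributions = integrated_gradients(f, grad_f, x, baseline)

# Completeness axiom: attributions sum to f(x) - f(baseline).
assert np.isclose(attributions.sum(), f(x) - f(baseline), atol=1e-3)
```

In the study's setting, the inputs are token embeddings rather than three numbers, and the per-token attributions are what reveal which words drove a personality prediction.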


"This methodology has allowed us to visualize and quantify the importance of various linguistic elements in the model's predictions," the researchers said.

AI Models BERT and RoBERTa

BERT (Bidirectional Encoder Representations from Transformers) and RoBERTa (Robustly Optimized BERT Pretraining Approach) are both transformer-based language models developed for natural language understanding, but they differ in their training strategies, data usage, and certain architectural choices.

BERT, introduced by Google in 2018, was groundbreaking because it used bidirectional context in pretraining, meaning it considered both left and right context simultaneously to predict masked words. It was trained on the BookCorpus and English Wikipedia using two pretraining objectives: Masked Language Modeling (MLM) and Next Sentence Prediction (NSP).

The MLM task randomly masks some tokens and predicts them based on context, while NSP trains the model to determine whether one sentence logically follows another. This combination helped BERT excel in a variety of NLP tasks, such as question answering, sentence classification, and named entity recognition.
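The MLM corruption scheme is easy to sketch. In BERT's recipe, roughly 15% of tokens are selected; of those, 80% become a mask token, 10% are replaced by a random token, and 10% are left unchanged, with the model asked to recover the original in every case. A self-contained toy version (token strings instead of real subword IDs):

```python
import random

MASK = "[MASK]"
VOCAB = ["alpha", "beta", "gamma", "delta"]   # stand-in vocabulary

def mlm_mask(tokens, rng, p=0.15):
    """BERT-style masking: of selected tokens, 80% -> [MASK], 10% random, 10% kept."""
    out, targets = list(tokens), {}
    for i, tok in enumerate(tokens):
        if rng.random() < p:
            targets[i] = tok                  # model must predict the original
            roll = rng.random()
            if roll < 0.8:
                out[i] = MASK
            elif roll < 0.9:
                out[i] = rng.choice(VOCAB)    # random replacement
            # else: leave the token unchanged (but still predict it)
    return out, targets

rng = random.Random(42)
sentence = "the model predicts masked words from context".split()
corrupted, targets = mlm_mask(sentence, rng)
```

The `targets` dict holds exactly the positions the loss is computed over; unselected positions contribute nothing to the MLM objective.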

A schematic depiction of the BERT model and its training process. (CREDIT: Cameron R. Wolfe / Substack)

RoBERTa, released by Facebook AI in 2019, kept the core BERT architecture but modified the pretraining methodology to improve performance. It removed the NSP task entirely, as experiments showed that it wasn't necessary for strong downstream results.

RoBERTa was trained on a much larger dataset — including BookCorpus, English Wikipedia, CC-News, OpenWebText, and Stories — and used significantly more training steps with larger batch sizes. Additionally, it dynamically changed the masking pattern during training rather than keeping it fixed, which helped the model learn more robust language representations.
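The static-versus-dynamic masking difference comes down to when the mask positions are drawn: BERT's original preprocessing fixed them once, so every epoch saw the same corrupted sentences, while RoBERTa redraws them on each pass. A tiny self-contained sketch of the distinction:

```python
import random

def mask_positions(num_tokens, rng, p=0.15):
    """Choose which token positions to mask for one training pass."""
    return {i for i in range(num_tokens) if rng.random() < p}

# Static masking (original BERT preprocessing): computed once, reused every epoch.
static_mask = mask_positions(50, random.Random(7))
static_epochs = [static_mask for _ in range(4)]

# Dynamic masking (RoBERTa): a fresh draw on every epoch.
rng = random.Random(0)
dynamic_epochs = [mask_positions(50, rng) for _ in range(4)]

assert all(m == static_mask for m in static_epochs)
# The dynamic draws almost surely differ, so the model sees varied corruptions.
```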

In essence, BERT established the bidirectional transformer framework, while RoBERTa refined it through better optimization choices, more data, and altered training objectives. The result is that RoBERTa generally outperforms BERT on benchmark NLP tasks, not because of a radically different architecture, but due to more aggressive and well-tuned pretraining strategies. BERT remains historically significant as the foundation, while RoBERTa represents a direct, empirically improved successor.

Why the Big Five Beats MBTI

The study revealed that the Big Five model worked better with AI tools than MBTI. The researchers found that MBTI led the models to focus more on surface-level clues rather than deep, consistent patterns in language.

Occurrences for each letter of the MBTI in the Personality Café dataset. (CREDIT: David Saeteros, et al.)

"Despite being widely used in computer science and some applied fields of psychology, the MBTI model has serious limitations," they explained. "Our results indicate that the models tend to rely more on artefacts than on real patterns."

In contrast, the Big Five showed more reliable links between language and personality. AI predictions based on this model were more stable and matched known psychological patterns. This makes the Big Five a stronger choice for future research and practical use.

A New Tool for Psychology and Beyond

Automatic personality detection has wide applications. In psychology, it opens the door to more natural ways of understanding people. It could replace or support traditional personality tests. By studying language, therapists and researchers might spot changes in mood or personality over time. This could help in early diagnosis or track a patient's progress during treatment.

"With these methods, psychologists will identify linguistic patterns associated with different personality traits that, with traditional methods, might go unnoticed," the team explained. Beyond the clinic, AI-based personality analysis could help in hiring, customizing education, or building smarter digital assistants. It could even shape how social scientists study populations, making it easier to examine huge sets of written data.

In hiring, for instance, employers might use writing samples to learn if someone fits a certain work style. In education, teachers could better tailor learning based on student personality. AI could even help digital assistants, like chatbots or virtual tutors, respond more naturally by adjusting their behavior based on user traits.

Bar plot for the geometric mean positive attribution scores for Agreeableness. (CREDIT: David Saeteros, et al.)

The team emphasized the need for ethics and transparency in all uses. "It is important to stress that all such applications should be based on scientifically sound models and incorporate the explainability techniques we have explored, to ensure ethical and transparent use," they added.

The Future: Combining AI and Traditional Tests

Although this technology is powerful, researchers don't expect it to replace standard personality tests anytime soon. Instead, they see it working alongside traditional tools to give a richer, more complete view of someone's personality.

"We see an evolution towards a multimodal approach," they said. "Traditional assessments are combined with natural language analysis, digital behavior and other data sources to get a more complete picture." This mix of methods could make personality research more accurate and useful. For example, digital behavior, such as online activity or voice tone, might be added to written text. The team is also exploring tools like Whisper.Ai, which can turn spoken words into text, for future analysis.

AI models are especially helpful in places where people don't want to take long tests, or where there's a lot of writing to review. That makes them useful in real-life settings where time or access is limited.

The researchers plan to test their findings across different types of writing, platforms, languages, and cultures. They want to see if these language-personality links hold true for people in other countries or who speak other languages.

The researchers also aim to study other mental and emotional traits, not just personality. They are working with professionals in therapy and human resources to apply these tools in the real world. This helps make sure the AI has a useful, fair, and positive impact. The goal is not just better science, but better tools that work for people from all walks of life.

Research findings are available online in the journal PLOS One.

Note: The article above was provided by The Brighter Side of News.



Google Has Open Sourced SyntaxNet, Its AI For Understanding Language

If you tell Siri to set an alarm for 5 am, she'll set an alarm for 5 am. But if you start asking her which prescription pain killer is least likely to upset your stomach, she's not really gonna know what to do---just because that's a pretty complicated sentence. Siri is a long way from what computer scientists call "natural language understanding." She can't truly understand the natural way we humans talk---despite the way Apple portrays her in all those TV ads. In fact, we shouldn't really be talking about her as a "her" at all. Siri's personhood is a marketing fiction concocted by Apple---and not a very convincing one, at that.

Which is not to say that our digital assistants will never live up to their simulated humanity. So many researchers working at so many tech giants, startups, and universities are pushing computers towards true natural language understanding. And the state-of-the-art keeps getting better, thanks in large part to deep neural networks---networks of hardware and software that mimic the web of neurons in the brain. Google, Facebook, and Microsoft, among others, are already using deep neural nets to identify objects in photos and recognize the individual words we speak into digital assistants like Siri. The hope is that this same breed of artificial intelligence can dramatically improve a machine's ability to grasp the significance of those words, to understand how those words interact to form meaningful sentences.

Google is among those at the forefront of this research---such tech plays into both its primary search engine and the Siri-like assistant it operates on Android phones---and today, the company signaled just how big of a role this technology will play in its future. It open sourced the software that serves as the foundation for its natural language work, freely sharing it with the world at large. Yes, that's the way it now works in the tech world. Companies will give away some of their most important stuff as a way of driving a market forward.

This newly open source software is called SyntaxNet, and among natural language researchers, it's known as a syntactic parser. Using deep neural networks, SyntaxNet parses sentences in an effort to understand what role each word plays and how they all come together to create real meaning. The system tries to identify the underlying grammatical logic---what's a noun, what's a verb, what the subject refers to, how it relates to the object---and then, using this info, it tries to extract what the sentence is generally about---the gist, but in a form machines can read and manipulate.
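The output of a syntactic parser is a dependency tree: every word points at its head, with a label naming the grammatical relation. As a conceptual sketch (hand-annotated for illustration; SyntaxNet itself is a TensorFlow model, and the `Token` structure here is hypothetical), the sentence "Google open sourced SyntaxNet" might come out as:

```python
from collections import namedtuple

# Minimal representation of one parsed token: its position, surface form,
# part of speech, the index of its head word (0 = sentence root), and the
# grammatical relation linking it to that head.
Token = namedtuple("Token", "index word pos head relation")

parse = [
    Token(1, "Google",    "NOUN", 3, "nsubj"),   # subject of the verb
    Token(2, "open",      "ADV",  3, "advmod"),
    Token(3, "sourced",   "VERB", 0, "root"),    # the sentence root
    Token(4, "SyntaxNet", "NOUN", 3, "dobj"),    # direct object
]

def root(tokens):
    return next(t for t in tokens if t.head == 0)

def dependents(tokens, head_index):
    return [t.word for t in tokens if t.head == head_index]

print(root(parse).word)      # sourced
print(dependents(parse, 3))  # ['Google', 'open', 'SyntaxNet']
```

Downstream systems read meaning off this structure: knowing that "Google" is the subject and "SyntaxNet" the object of "sourced" is what separates understanding from keyword matching.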

"The accuracy we get substantially better than what we were able to get without deep learning," says Google research director Fernando Pereira, who helps oversee the company's work with natural language understanding. He estimates that the tool has cut the company's error rate by between 20 and 40 percent compared to previous methods. This is already helping to drive live Google services, including the company's all-important search engine.

Share and Share Alike

According to at least some researchers outside Google, SyntaxNet is the most advanced system of its kind---if not exactly leaps and bounds over the competition. Google previously released a research paper describing this work. "The results of that paper are quite good. They're pushing us forward a little bit," says Noah Smith, a professor of computer science at the University of Washington who specializes in natural language understanding. "But there are a lot of people who continue to work on this problem." What's perhaps most interesting about this project is that Google---an enormously powerful company that previously kept so much of its most important research to itself---continues to openly share such tools.

In sharing SyntaxNet, Google aims to accelerate the progress of natural language research, much as when it open sourced the software engine known as TensorFlow that drives all its AI work. By letting anyone use and modify SyntaxNet (which runs atop TensorFlow), Google gets more human brains attacking the problem of natural language understanding than if it kept the technology to itself. In the end, that could benefit Google as a business. But an open source SyntaxNet is also a way for the company to, well, advertise its work with natural language understanding. That could also benefit Google as a business.

Undoubtedly, with technology like SyntaxNet, Google is intent on pushing computers as far as it can towards real conversation. And in a competitive landscape that includes not just Apple's Siri but many other would-be conversant computers, Google wants the world to know just how good its tech really is.

Digital Assistants Everywhere

Google is far from alone in the personal assistant race. Microsoft has its digital assistant called Cortana. Amazon is finding success with its voice-driven Echo, a standalone digital assistant. And countless startups have also entered the race, including most recently Viv, a company started by two of the original designers of Siri. Facebook has even broader ambitions with a project it calls Facebook M, a tool that chats with you via text rather than voice and aims to do everything from scheduling your next appointment at the DMV to planning your next vacation.

Still, despite so many impressive names working on the problem, digital assistants and chatbots are still such a long way from perfect. That's because the underlying technologies that handle natural language understanding are still such a long way from perfect. Facebook M relies partly on AI, but more on real-life humans who help complete more complex tasks---and help train the AI for the future. "We are very far from where we want to be," Pereira says.

Indeed, Pereira describes SyntaxNet as a stepping stone to much bigger things. Syntactic parsing, he says, merely provides a foundation. So many other technologies are needed to take the output of SyntaxNet and truly grasp meaning. Google is open sourcing the tool in part to encourage the community to look beyond syntactic parsing. "We want to encourage the research community---and everyone who works on natural language understanding---to move beyond parsing, towards the deeper semantic reasoning that is necessary," he says. "We're basically telling them: 'You don't have to worry about parsing. You can take that as a given. And now you can explore harder.'"

Enter the Deep Neural Net

Using deep neural networks, SyntaxNet and similar systems do take syntactic parsing to a new level. A neural net learns by analyzing vast amounts of data. It can learn to identify a photo of a cat, for instance, by analyzing millions of cat photos. In the case of SyntaxNet, it learns to understand sentences by analyzing millions of sentences. But these aren't just any sentences. Humans have carefully labeled them, going through all the examples and identifying the role that each word plays. After analyzing all these labeled sentences, the system can learn to identify similar characteristics in other sentences.

Though SyntaxNet is a tool for engineers and AI researchers, Google is also sharing a pre-built natural language processing service that it has already trained with the system. They call it, well, Parsey McParseface, and it's trained for English, learning from a carefully labeled collection of old newswire stories. According to Google, Parsey McParseface is about 94 percent accurate in identifying how a word relates to the rest of a sentence, a rate the company believes is close to the performance of a human (96 to 97 percent).
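Parser accuracy figures like the 94 percent quoted here are typically attachment scores: the fraction of words whose syntactic head was identified correctly. A minimal sketch of the unlabeled variant (UAS), with invented head indices:

```python
def unlabeled_attachment_score(gold_heads, predicted_heads):
    """Fraction of tokens whose syntactic head was predicted correctly (UAS)."""
    if len(gold_heads) != len(predicted_heads):
        raise ValueError("parses must cover the same tokens")
    correct = sum(g == p for g, p in zip(gold_heads, predicted_heads))
    return correct / len(gold_heads)

# Head index for each of 5 tokens (0 marks the sentence root).
gold      = [2, 0, 2, 5, 2]
predicted = [2, 0, 2, 2, 2]   # one wrong attachment out of five
print(unlabeled_attachment_score(gold, predicted))  # 0.8
```

The labeled variant (LAS) additionally requires the relation label on each arc to match, so it is always at or below the unlabeled score.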

Smith points out that such a dataset can be limiting, just because it's Wall Street Journal-speak. "It's a very particular kind of language," he says. "It doesn't look like a lot of the language people want to parse." The eventual hope is to train these types of systems on a broader array of data drawn straight from the web, but this is much harder, because people use language on the web in so many different ways. When Google trains its neural nets with this kind of dataset, the accuracy rate drops to about 90 percent. The research here just isn't as far along. The training data isn't as good. And it's a harder problem. What's more, as Smith points out, research using languages other than English isn't as far along either.

In other words, a digital assistant that works like a real person sitting next to you is by no means a reality yet, but we are getting closer. "We are a very long way from building human capabilities," Pereira says. "But we're building technologies that are ever more accurate."





