
COMP_SCI 497: Deep Learning For Natural Language Processing

Prerequisites: CS-349 or consent of instructor

Description

In the first half of this course, we will explore the evolution of deep neural network language models, starting with n-gram models and proceeding through feed-forward neural networks, recurrent neural networks and transformer-based models.  In the second half of the course we will apply these models to natural language processing tasks, including question answering, text classification (including fakes detection), text summarization, text generation (including dialogue, neural machine translation and program synthesis) and natural language inference, among others.  After completing this course, students will be able to generalize these techniques to a wide variety of applied and research problems in natural language processing.
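
To make the starting point of that progression concrete, the short Python sketch below builds a count-based bigram language model; the toy corpus, function names and probabilities are illustrative examples only and are not drawn from the course materials.

    # Minimal count-based bigram language model (illustrative sketch only).
    from collections import Counter, defaultdict

    corpus = ["the cat sat on the mat", "the dog sat on the rug"]

    bigram_counts = defaultdict(Counter)
    unigram_counts = Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for prev, curr in zip(tokens, tokens[1:]):
            bigram_counts[prev][curr] += 1
            unigram_counts[prev] += 1

    def bigram_prob(prev, curr):
        # Maximum-likelihood estimate of P(curr | prev); 0.0 for unseen histories.
        if unigram_counts[prev] == 0:
            return 0.0
        return bigram_counts[prev][curr] / unigram_counts[prev]

    print(bigram_prob("the", "cat"))   # 0.25: "the" is followed by "cat" once out of four occurrences

Feed-forward, recurrent and transformer language models replace these raw counts with learned parameters, which is the trajectory the course description traces.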

COURSE COORDINATORS: Prof. David Demeter

COURSE INSTRUCTOR: Prof. David Demeter


Signal Processing And Machine Learning


Signal processing algorithms, architectures, and systems are at the heart of modern technologies that generate, transform, and interpret information across applications as diverse as communications, robotics and autonomous navigation, biotechnology, and entertainment. The growth in signal processing capability, from early, simpler, model-based, low-bandwidth applications to this current wide scope of impact, has been enabled by the past 50 years of dramatic advances in semiconductor technology, which made faster computation and high-density, rapidly accessible memory increasingly available and affordable. In the past ten years, machine learning and deep learning have continued this progress using data-driven methods that do not require explicit models. This focus area includes courses in theory, architectures, implementations, and specific applications.

Faculty-in-Charge: Dr. Tokunbo Ogunfunmi and Dr. Sally Wood

Program planning information for subareas in Signal Processing and Machine Learning

Each of the five subareas, described briefly below, has distinct core courses, although many subareas are closely related and programs will typically include some overlap. The programs for each subarea specify 6 units of required focus area courses and 14 units of subarea courses and electives. Recommendations for applied mathematics and breadth courses are also included. Consult with an academic advisor about these sample programs to match your specific interests and build on your prior academic coursework.

Machine Learning

Machine learning extracts information from data based on supervised and unsupervised learning methods. This includes understanding image content, spoken language, printed language, and large data sets for wide ranging applications such as autonomous driving, natural language processing, and medical data analysis. The rapid growth in AI with significant advances across such a broad application space is built on architectures and implementations enabled by increasing availability of GPU and FPGA parallel processing and large scale rapidly accessible data storage.

Computer Vision

Computer vision extracts content information from images, and it includes detection and identification of objects in an image, building three dimensional models of objects from image data, and interpreting scene information for both navigation and localization. Application areas include content searchable image data, augmented reality, robotics, and autonomous vehicles. Courses in this subarea cover both traditional model-based methods from image processing and current data-driven methods based on machine learning.

Image Processing

Image processing analyzes and improves images using both linear and nonlinear signal processing techniques to extract information from the image data. This includes image restoration, super-resolution, image reconstruction for medical imaging applications, image compression, mapping for high dynamic range, and image preprocessing, resizing, and enhancement.
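
As a hedged illustration of the kind of enhancement operation listed above, the sketch below applies unsharp masking to a synthetic image using SciPy; the toy image, blur width and gain are arbitrary choices, not part of the program description.

    # Unsharp masking: sharpen an image by boosting its high-frequency detail.
    import numpy as np
    from scipy import ndimage

    image = np.zeros((64, 64), dtype=float)
    image[16:48, 16:48] = 1.0                          # toy image: a bright square

    blurred = ndimage.gaussian_filter(image, sigma=2)  # low-pass version of the image
    sharpened = image + 1.5 * (image - blurred)        # add back amplified detail

    print(sharpened.min(), sharpened.max())            # edges are accentuated beyond [0, 1]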

Speech Processing

The speech processing area deals with the analysis, synthesis and coding of speech signals, the most common form of human verbal communication. Topics include linear predictive coding, waveform coding, quantization, predictive coding, transform coding, hybrid coding and sub-band coding; applications of speech coding in various systems; international and proprietary standards for speech and audio coding; real-time DSP implementation of speech coders; voice over internet protocol; recent advances in speech and speaker recognition and biometrics; and deep learning applications in speech processing.
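
As one hedged example of the coding techniques named above, the sketch below estimates linear predictive coding (LPC) coefficients for a single synthetic frame using the autocorrelation method and the Levinson-Durbin recursion; the toy signal, sampling rate and model order are arbitrary, and a real codec would add pre-emphasis, per-frame windowing and coefficient quantization.

    # LPC coefficient estimation via autocorrelation + Levinson-Durbin (illustrative).
    import numpy as np

    def lpc(frame, order):
        n = len(frame)
        # Autocorrelation r[k] = sum_n frame[n] * frame[n + k]
        r = [float(np.dot(frame[: n - k], frame[k:])) for k in range(order + 1)]
        a = [0.0] * (order + 1)          # predictor coefficients; a[0] is implicitly 1
        error = r[0]
        for i in range(1, order + 1):    # Levinson-Durbin recursion on the Toeplitz system
            acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
            k = -acc / error
            new_a = a[:]
            new_a[i] = k
            for j in range(1, i):
                new_a[j] = a[j] + k * a[i - j]
            a = new_a
            error *= 1.0 - k * k
        return np.array(a[1:]), error

    # Toy "voiced frame": a windowed 220 Hz sinusoid sampled at 8 kHz.
    fs = 8000
    t = np.arange(0, 0.03, 1.0 / fs)
    frame = np.sin(2 * np.pi * 220 * t) * np.hamming(len(t))
    coeffs, residual_energy = lpc(frame, order=10)
    print(coeffs, residual_energy)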

Theory and Methods

Signal processing theory and methods includes the foundational knowledge for this focus area. It includes courses in the basic theory of digital signal processing and filter design and the use of statistical methods for detection and estimation.
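
A minimal sketch of the filter-design side of this material, assuming SciPy is available: design a Butterworth low-pass filter and apply it to a noisy synthetic signal. The cutoff, order and test signal are arbitrary examples.

    # Design and apply a digital low-pass filter (illustrative sketch).
    import numpy as np
    from scipy import signal

    fs = 1000.0                                        # sampling rate in Hz (arbitrary)
    b, a = signal.butter(4, 50.0, btype="low", fs=fs)  # 4th-order Butterworth low-pass at 50 Hz

    t = np.arange(0, 1.0, 1.0 / fs)
    clean = np.sin(2 * np.pi * 10 * t)                 # 10 Hz tone we want to keep
    x = clean + 0.5 * np.sin(2 * np.pi * 200 * t)      # plus 200 Hz interference
    y = signal.filtfilt(b, a, x)                       # zero-phase filtering

    print(np.abs(y - clean)[100:-100].max())           # small residual: the interference is largely removed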


The Debate Over Neural Network Complexity: Does Bigger Mean Better?


Artificial intelligence (AI) has made tremendous progress since its inception, and neural networks have been central to much of that advancement. Neural networks, which apply learned weights to the variables in AI models, are an integral part of this modern-day technology.

Research is ongoing, and experts still debate whether bigger is better in terms of neural network complexity.

Traditionally, researchers have focused on constructing neural networks with a large number of parameters to achieve high accuracy on benchmark datasets. This approach has produced some of the most intricate neural networks to date, such as GPT-3 with more than 175 billion parameters, now followed by GPT-4, but it also comes with significant challenges.

For example, these models require enormous amounts of computing power, storage, and time to train, and they may be challenging to integrate into real-world applications.


Experts in the AI community have differing opinions on the importance of neural network complexity. Some argue that smaller, more efficient networks can achieve results comparable to larger models if they are trained effectively.

For instance, Chinchilla by Google DeepMind, a newer model comprising "just" 70 billion parameters, is reported to outperform Gopher, GPT-3, Jurassic-1 and Megatron-Turing NLG across a large set of language benchmarks. Likewise, LLaMA by Meta, at 65 billion parameters, shows that smaller models can match or exceed the performance of much larger ones.

Nevertheless, the ideal size and intricacy of neural networks remain a matter of debate in the AI community, raising the question: Does neural network complexity matter? 

The essence of neural network complexity

Neural networks are built from interconnected layers of artificial neurons that can recognize patterns in data and perform various tasks such as image classification, speech recognition, and natural language processing (NLP). The number of layers, the number of nodes in each layer and the weights connecting them determine the complexity of the neural network. The more nodes and layers a neural network has, the more complex it is.
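
A quick way to see how layers and nodes translate into complexity is to count parameters directly; the sketch below does so for a small fully connected network whose layer sizes are illustrative, not taken from any model discussed here.

    # Each fully connected layer of size n_in -> n_out contributes
    # n_in * n_out weights plus n_out biases.
    layer_sizes = [784, 512, 256, 10]   # e.g. an MNIST-style classifier

    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out + n_out   # weights + biases

    print(total)   # 535,818 parameters for this small network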

With the advent of deep learning techniques that require more layers and parameters, the complexity of neural networks has increased significantly. Deep learning algorithms have enabled neural networks to serve in a spectrum of applications, including image and speech recognition and NLP. The idea is that more complex neural networks can learn more intricate patterns from the input data and achieve higher accuracy. 

"A complex model can reason better and pick up nuanced differences," said Ujwal Krothapalli, data science manager at EY. "However, a complex model can also 'memorize' the training samples and not work well on data that is very different from the training set."

Larger is better

A paper presented in 2021 at the leading AI conference NeurIPS by Sébastien Bubeck of Microsoft Research and Mark Sellke of Stanford University explained why scaling an artificial neural network's size leads to better results. They found that neural networks must be larger than conventionally expected to avoid specific fundamental problems.

However, this approach also comes with a few drawbacks. One of the main challenges of developing large neural networks is the amount of computing power and time required to train them. Additionally, large neural networks are often challenging to deploy in real-world scenarios, requiring significant resources.

"The larger the model, the more difficult it is to train and infer," Kari Briski, VP of product management for AI software at Nvidia, told VentureBeat. "For training, you must have the expertise to scale algorithms to thousands of GPUs and for inference, you have to optimize for desired latency and retain the model's accuracy." 

Briski explained that complex AI models such as large language models (LLMs) are autoregressive: they compute over the full input context to decide which character or word is generated next. As a result, the generation step can be challenging to serve, depending on an application's requirements.
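
A minimal sketch of what that autoregressive loop looks like in code, using the small open GPT-2 model from the Hugging Face transformers library as a stand-in for the far larger proprietary systems Briski is describing (the model choice, prompt and 20-token budget are assumptions for illustration):

    # Greedy autoregressive decoding: each step re-reads the context and appends one token.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

    ids = tokenizer("The debate over neural network complexity", return_tensors="pt").input_ids
    with torch.no_grad():
        for _ in range(20):                                             # generate 20 tokens
            logits = model(ids).logits                                  # full-context forward pass
            next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)     # most likely next token
            ids = torch.cat([ids, next_id], dim=-1)

    print(tokenizer.decode(ids[0]))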

"Multi-GPU, multi-node inference are required to make these models generate responses in real-time," she said. "Also, reducing precision but maintaining accuracy and quality can be challenging, as training and inference with the same precision are preferred."

Best results from training techniques

Researchers are exploring new techniques for optimizing neural networks for deployment in resource-constrained environments. Another paper presented at NeurIPS 2021, by Stefanie Jegelka of MIT and researchers Andreas Loukas and Marinos Poiitis, showed that neural networks do not need to be as complex as commonly assumed and that strong results can be achieved through training techniques alone.

The paper revealed that the benefits of smaller-sized models are numerous. They are faster to train and easier to integrate into real-world applications. Moreover, they can be more interpretable, enabling researchers to understand how they make predictions and identify potential data biases.

Juan Jose Lopez Murphy, head of data science and artificial intelligence at software development firm Globant, said he believes that the relationship between network complexity and performance is, well, complex.

"With the development of "scaling laws", we've discovered that many models are heavily undertrained," Murphy told VentureBeat. "You need to leverage scaling laws for general known architectures and experiment on the performance from smaller models to find the suitable combination. Then you can scale the complexity for the expected performance."

He said that smaller models like Chinchilla and LLaMA, which reached strong performance with fewer parameters, make an interesting case that some of the potential embedded in larger networks is wasted, and that part of the performance potential of more complex models is lost to undertraining.

"With larger models, what you gain in the specificity, you may lose in reliability," he said." We don't yet fully understand how and why this happens — but a huge amount of research in the sector is going into answering those questions. We are learning more every day."

Different jobs require different neural schemes

Developing the ideal neural architecture for AI models is a complex and ongoing process. There is no one-size-fits-all solution, as different tasks and datasets require different architectures. However, several key principles can guide the development process. 

These include designing scalable, modular and efficient architectures, using techniques such as transfer learning to leverage pre-trained models and optimizing hyperparameters to improve performance. Another approach is to design specialized hardware, such as TPUs and GPUs, that can accelerate the training and inference of neural networks.

Ellen Campana, leader of enterprise AI at KPMG U.S., suggests that the ideal neural network architecture should be based on the data size, the problem to be solved and the available computing resources, ensuring that it can learn the relevant features efficiently and effectively.

"For most problems, it is best to consider incorporating already trained large models and fine-tuning them to do well with your use case," Campana told VentureBeat. "Training these models from scratch, especially for generative uses, is very costly in terms of compute. So smaller, simpler models are more suitable when data is an issue. Using pre-trained models can be another way to get around data limitations." 

More efficient architectures

The future of neural networks, Campana said, lies in developing more efficient architectures. Creating an optimized neural network architecture is crucial for achieving high performance.

"I think it's going to continue with the trend toward larger models, but more and more they will be reusable," said Campana. "So they are trained by one company and then licensed for use like we are seeing with OpenAI's Davinci models. This makes both the cost and the footprint very manageable for people who want to use AI, yet they get the complexity that is needed for using AI to solve challenging problems."

Likewise, Kjell Carlsson, head of data science strategy and evangelism at enterprise MLOps platform Domino Data Lab, believes that smaller, simpler models are always more suitable for real-world applications. 

"None of the headline-grabbing generative AI models is suitable for real-world applications in their raw state," said Carlsson. "For real-world applications, they need to be optimized for a narrow set of use cases, which in turn reduces their size and the cost of using them. A successful example is GitHub Copilot, a version of OpenAI's codex model optimized for auto-completing code."

The future of neural network architectures

Carlsson says that OpenAI is making models like ChatGPT and GPT-4 broadly available because we do not yet know more than a tiny fraction of the potential use cases.

"Once we know the use cases, we can train optimized versions of these models for them," he said. "As the cost of computing continues to come down, we can expect folks to continue the "brute force-ish" approach of leveraging existing neural network architectures trained with more and more parameters."

He believes that we should also expect breakthroughs where developers may come up with improvements and new architectures that dramatically improve these models' efficiency while enabling them to perform an ever-increasing range of complex, human-like tasks. 

Likewise, Amit Prakash, cofounder and CTO at AI-powered analytics platform ThoughtSpot, says that we will routinely see larger and larger models show up with stronger capabilities, but then smaller versions of those models will try to approximate the quality of the larger models' output.

"We will see these larger models used to teach smaller models to emulate similar behavior," Prakash told VentureBeat. "One exception to this could be sparse models or a mixture of expert models where a large model has layers that decide which part of the neural network should be used and which part should be turned off, and then only a small part of the model gets activated."

He said that ultimately, the key to developing successful AI models would be striking the right balance between complexity, efficiency and interpretability.






