Natural Language Processing (NLP)

Natural Language Processing (NLP) is a field of artificial intelligence focused on enabling computers to understand, interpret, and generate human language. The "natural" part points to the goal: AI language use that is meaningful and contextually relevant. NLP is used for tasks such as language translation, sentiment analysis, and speech recognition.

NLP sample by Seobility - License: CC BY-SA 4.0

Search engines leverage NLP to improve various aspects of search. Understanding what a user means by a search query, understanding what the different pages on the web are about, and knowing what questions those pages answer are all vital aspects of a successful search engine.

According to AWS, companies commonly use NLP for these automated tasks (a small code sketch follows the list):
•    Process, analyze, and archive large documents
•    Analyze customer feedback or call center recordings
•    Run chatbots for automated customer service
•    Answer who-what-when-where questions
•    Classify and extract text
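To make one of these tasks concrete, here is a minimal sentiment-analysis sketch in Python, the kind of thing used to analyze customer feedback. It assumes the Hugging Face transformers package is installed (my choice of toolkit, not one named by AWS); a default model is downloaded the first time the pipeline runs.

```python
from transformers import pipeline

# A minimal sentiment-analysis sketch. Assumes the "transformers"
# package is installed; a default model is fetched on first use.
classifier = pipeline("sentiment-analysis")

reviews = [
    "The support team resolved my issue in minutes.",
    "I waited an hour and the problem is still not fixed.",
]
for review in reviews:
    result = classifier(review)[0]  # e.g. {'label': 'POSITIVE', 'score': 0.99}
    print(f"{result['label']:8} ({result['score']:.2f})  {review}")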

NLP crosses over into other fields. Here are three.

Computational linguistics is the science of understanding and constructing human language models with computers and software tools. Researchers use computational linguistics methods, such as syntactic and semantic analysis, to create frameworks that help machines understand conversational human language. Tools like language translators, text-to-speech synthesizers, and speech recognition software are based on computational linguistics. 

Machine learning is a technology that trains a computer on sample data to improve its performance at a task. Human language has features like sarcasm, metaphors, variations in sentence structure, plus grammar and usage exceptions, that take humans years to learn. Programmers use machine learning methods to teach NLP applications to recognize and accurately interpret these features.

Deep learning is a specific field of machine learning that teaches computers to learn and think like humans. It involves a neural network that consists of data-processing nodes structured to resemble the human brain. With deep learning, computers recognize, classify, and correlate complex patterns in the input data.

Overview of NLP

Neural Networks and Artificial General Intelligence

neural network

A neural network is a type of deep learning model within the broader field of machine learning (ML) that simulates the human brain.

It was long thought that the way to add "intelligence" to computers was to try to imitate or model the way the brain works. That turned out to be a very difficult - some might say impossible - goal.

Neural networks process data through interconnected nodes or neurons arranged in layers—input, hidden, and output. Each node performs simple computations, contributing to the model’s ability to recognize patterns and make predictions.
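As a rough illustration of that layered structure, here is a minimal sketch in Python using NumPy. The layer sizes, random weights, and ReLU activation are illustrative assumptions, not taken from any particular model.

```python
import numpy as np

# A tiny network: 3 input nodes -> 4 hidden nodes -> 2 output nodes.
# Weights are random here, as they would be before training begins.
rng = np.random.default_rng(0)

W1 = rng.normal(size=(3, 4))  # input-to-hidden weights
W2 = rng.normal(size=(4, 2))  # hidden-to-output weights

def relu(x):
    return np.maximum(0, x)

x = np.array([0.5, -1.2, 3.0])  # one input example
hidden = relu(x @ W1)           # each hidden node: weighted sum + activation
output = hidden @ W2            # output layer combines hidden activations
print(output)
```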

These deep learning neural networks are effective in handling tasks such as image and speech recognition, which makes them a key component of many AI applications.

When neural networks begin "training," their weights are set randomly, so their first outputs are effectively guesses. A node on the input layer passes its signal, scaled by those random weights, to the nodes in the first hidden layer; those nodes pass their results to the next layer, and so on, until the signal reaches the output layer. Training then adjusts the weights so the outputs become less random and more accurate. If you have used any of the large language models (LLMs), then you have seen the result of this process at work. GPT-4 reportedly has around 100 layers, with tens or hundreds of thousands of nodes in each layer.
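To make "training adjusts the weights" concrete, here is a toy sketch of a single gradient-descent step on one weight; every number in it is invented for illustration.

```python
# One training step for a single weight w, one example (x, y),
# and squared error as the loss.
w = 0.8          # randomly initialized weight (the initial "guess")
x, y = 2.0, 3.0  # one training example

prediction = w * x
error = prediction - y
gradient = 2 * error * x  # d(loss)/dw for loss = (w*x - y)**2
w -= 0.1 * gradient       # nudge the weight to reduce the error
print(w, prediction)
```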

Have you ever clicked thumbs-up or thumbs-down to a computer’s suggestion? Then you have contributed to the reinforcement learning of that network.

I have found that predicting the future of technology is rarely accurate, and predictions on AI have generally been wrong. In 1970, one of the top AI researchers predicted that “in three to eight years, we will have a machine with the general intelligence of an average human being.” Well, that did not happen.

Most current AI systems are "narrow AI" which means they are specialized to perform specific tasks very well (such as recognizing faces) but lack the ability to generalize across different tasks. Human intelligence involves a complex interplay of reasoning, learning, perception, intuition, and social skills, which are challenging to replicate in machines.

The idea of reaching artificial general intelligence (AGI) has its own set of predictions, with experts holding varying opinions on when AGI might be achieved. I have seen optimistic estimates of a few decades and more conservative views spanning centuries or even beyond. It is hard to predict, but breakthroughs in AI research, particularly in areas like reinforcement learning, neural architecture search, and computational neuroscience, could accelerate progress toward AGI.

Is Your 2024 Phone Finally Smarter Than You?

playing chess against a smartphone

The prediction game is a tough one to win. I wrote a piece in 2013 titled "In 4 Years Your Phone Will Be Smarter Than You (and the rise of cognizant computing)." That would mean I should have checked back in 2017 to see if my predictions came to pass. Well, not my predictions, but those from an analysis by the market research firm Gartner. I did check back at the end of 2022, and now I'm checking in again after just a few years.

That original report predicted that the change wouldn't have as much to do with hardware as with the growth of data and computational ability in the cloud. That seems to be true about hardware. My smartphone in 2024 is not radically different from the one I had in 2017. More expensive, better camera, new apps, but still the same basic functions as back then. It looks about the same, too. No radical changes.

If phones seem smarter, it means that you have a particular definition of "smart." If smart means being able to recall information and make inferences, then my phone, my Alexa, and the Internet are all smarter than me. And in school, remembering information and making inferences are still a big part of being smart. But it's not all of it.

"Cognizant computing" was part of that earlier piece. That is software and devices that predict your next action based on personal data already gathered about you. It might at a low level suggest a reply to an email. At a high level, it might suggest a course of treatment to your doctor. The term "cognizant computing" doesn't seem to occur much anymore. In fact, looking for it today on Wikipedia brought the result "The page "Cognizant computing" does not exist."

It seems to have been grouped in with machine learning, natural language processing, computer vision, human-computer interaction, and any intelligent system that can perceive and understand its environment, interact with users in natural ways, and adapt its behavior based on changing circumstances. I think the average person would say to all that, "Oh, you mean AI?"

It's there in virtual assistants (like Siri, Alexa, or Google Assistant), personalized recommendation systems (such as those used by Netflix or Amazon), smart home devices, and various other domains where systems need to understand and respond to user needs effectively.

I asked a chatbot if it was an example of cognizant computing and it replied, "Yes, a chatbot can be considered an example of cognizant computing, particularly if it is designed to exhibit certain key characteristics."

The characteristics it meant are context awareness, personalization, adaptability, and natural interaction.

Chatbots can be aware of the context of a conversation: they may remember previous interactions with the user, understand the current topic of conversation, and adapt their responses accordingly. In these ways, a chatbot can personalize interactions, which shows its adaptability and its ability to learn from user interactions and improve over time. Using natural language processing (NLP) techniques to understand and generate human-like responses makes for more natural conversations between humans and machines.
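As a rough sketch of that context awareness, the loop below keeps the conversation history and hands it to the reply generator on every turn. The generate_reply function here is a hypothetical placeholder, not any real chatbot's API.

```python
# generate_reply() is a hypothetical stand-in for whatever model
# actually produces text; a real system would call one here.
def generate_reply(history: list[dict]) -> str:
    last = history[-1]["content"]
    return f"You said: {last!r}. Tell me more."

history = []
for user_message in ["Hi, I'm planning a trip.", "Somewhere warm in March."]:
    history.append({"role": "user", "content": user_message})
    reply = generate_reply(history)  # the full history provides context
    history.append({"role": "assistant", "content": reply})
    print(reply)
```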

Is my smartphone smarter than me in 2024? It is smarter, but I think I still have some advantages. I'll check in again in a few years.

The Wayback Machine

wayback

The Wayback Machine (part of the Internet Archive, at https://web.archive.org) has been making backups of the World Wide Web since 1996. Mark Graham, its director, describes it as "a time machine for the web." It does that by scanning hundreds of millions of webpages every day and storing them on the Archive's servers. To date, there are nearly 900 billion web pages backed up. Computer scientist Brewster Kahle says, "The average life of a webpage is a hundred days before it's changed or deleted."
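If you want to query the archive programmatically, the Wayback Machine exposes a public "availability" endpoint. Here is a minimal sketch in Python, using the requests package; the URL and date are sample inputs of my choosing.

```python
import requests

# Ask the Wayback Machine for the archived snapshot of a URL
# closest to a given date (YYYYMMDD).
resp = requests.get(
    "https://archive.org/wayback/available",
    params={"url": "serendipity35.net", "timestamp": "20090208"},
    timeout=10,
)
snapshot = resp.json().get("archived_snapshots", {}).get("closest")
if snapshot:
    print(snapshot["url"], snapshot["timestamp"])
else:
    print("No snapshot found.")
```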

The first time I heard the name "Wayback Machine" I immediately thought of the fictional time-traveling device used by Mister Peabody (a dog) and Sherman (a boy) in the animated cartoon The Adventures of Rocky and Bullwinkle and Friends. In one of the show's segments, "Peabody's Improbable History", the characters used the machine to witness, participate in, and often alter famous historical events.

Sherman and Peabody

It has been many years since I watched these cartoons, but I recall them as funny and educational. I might be wrong about the latter observation.

I visited the website today and searched this blog's URL https://www.serendipity35.net and found that our site has been saved 153 times between February 8, 2009, and May 3, 2024. This blog actually started in February 2006, as a little project in blogging I began with Tim Kellers when we were working at the New Jersey Institute of Technology. At that time it was hosted on NJIT's servers, so our URL was http://dl1.njit.edu/serendipity, for which there is no record. Perhaps the university did not allow the Wayback Machine to crawl its servers.

serendipity35 2009

According to Wikipedia's entry, the Wayback Machine's software has been developed to "crawl" the Web and download all publicly accessible information and data files on webpages, the Gopher hierarchy, the Netnews (Usenet) bulletin board system, and downloadable software. The information collected by these "crawlers" does not include all the information available on the Internet, since much of the data is restricted by the publisher or stored in databases that are not accessible. To overcome inconsistencies in partially cached websites, Archive-It.org was developed in 2005 by the Internet Archive as a means of allowing institutions and content creators to voluntarily harvest and preserve collections of digital content and create digital archives.

Crawls are contributed from various sources, some imported from third parties and others generated internally by the Archive. For example, crawls have been contributed by the Sloan Foundation and Alexa, crawls have been run by the Internet Archive on behalf of NARA and the Internet Memory Foundation, and others mirror Common Crawl.
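For a sense of what "crawling" means at its simplest, here is a minimal Python sketch that fetches one page and collects the links a crawler would follow next. Real crawlers add robots.txt checks, deduplication, and polite scheduling; this is only the core idea.

```python
from html.parser import HTMLParser
from urllib.request import urlopen

# Collect the href targets of all <a> tags on a page: the set of
# links a crawler would queue up to visit next.
class LinkCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

page = urlopen("https://web.archive.org").read().decode("utf-8", errors="replace")
collector = LinkCollector()
collector.feed(page)
print(collector.links[:10])
```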


A screenshot from the blog from a decade ago (2014).

Searching on another website of mine - Poets Online - I find pages from 2003, when it was hosted on the free hosting platform Geocities. There are broken links and missing images, but they give a taste of what the site was back then, in the days before customizable CSS and templated websites. The archive also has a page from March of this year, and most of the links and some of the images come through.

The online Wayback Machine is not the one that sparked my time-traveling imagination as a child. Yes, I wanted to accompany Sherman and Mr. Peabody, but I will have to be content with the time travel of looking at things from my past, online and offline.

Screenshot from a DVD of Rocky and Bullwinkle cartoons. Fair use.