Folksonomy Taxonomy Fauxonomy

I wrote about the topic of folksonomy back in 2006. The word joins folk + taxonomy and refers to the collaborative but informal way in which information is being categorized on the web.

As users, usually voluntarily, assign keywords or "tags" (from hashtags) to images, posts or data, a folksonomy emerges. These things are found on sites that share photographs, personal libraries, bookmarks, social media and blogs which often allow tags for each entry.

Taxonomy is a more familiar and very formal process. You are probably familiar with scientific classifications and might have studied the taxonomy of organisms. Remember learning about Domain, Kingdom, Phylum, Class, Order, Family, Genus, and Species? As an avid gardener, I encounter the taxonomy of plants regularly.

There are taxonomies that are not considered "scientific" because they include sociological factors. In academia, many of us know Bloom's Taxonomy - the classification of educational objectives and the theory of mastery learning.

Non-scientific classification systems are referred to as folk taxonomies, but the academic community does not always accept folksonomy into either area. In fact, some who support scientific taxonomies have dubbed folksonomies as fauxonomies.

Others see folksonomy as a part of the path to creating a semantic web. It's a web that contains computer-readable metadata that describes its content. This metadata (tags) allows for precision searching.

If you have ever tried to get a group of readers or graders to agree on how to evaluate writing using a rubric, you might understand how hard it would be to get the creators of web content to tag content in a consistent and reliable way.

Some examples of standards for tagging include Dublin Core and the RSS file format used for blogs and podcasts. All of this really grew out of the use of XML. Extensible Markup Language (XML) is a general-purpose markup language (as is HTML) that was at least partially created to facilitate the sharing of data across different systems, particularly systems connected via the Internet.
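To make the idea of standardized, computer-readable tags concrete, here is a minimal sketch of a Dublin Core style record built with Python's standard library. The namespace is the published Dublin Core element set; the wrapper element name, the function name, and all the sample values are assumptions made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core element set (dc:title, dc:creator, dc:subject...).
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def describe_item(title, creator, date, tags):
    """Build a small metadata record; each tag becomes a dc:subject element."""
    record = ET.Element("metadata")  # wrapper element name is an assumption
    for name, value in (("title", title), ("creator", creator), ("date", date)):
        ET.SubElement(record, f"{{{DC_NS}}}{name}").text = value
    for tag in tags:
        ET.SubElement(record, f"{{{DC_NS}}}subject").text = tag
    return record


# Hypothetical photo and tags, for illustration only.
record = describe_item(
    title="Sunset over the bay",
    creator="Example Photographer",
    date="2006-07-04",
    tags=["sunset", "bay", "summer"],
)
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

Because every record uses the same element names, a search tool can ask for "all items whose dc:subject is sunset" without guessing how each site labels its tags.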

Folksonomies do have advantages. They are user-generated and therefore easy (inexpensive) to implement. Metadata in a folksonomy (for example, the photo tags on Flickr.com) comes from individuals interacting with content, not administrators at a distance. This type of taxonomy conveys information about the people who create the tags, and a kind of user community portrait may emerge. Some sites allow you to then link to other content from like-minded taggers. (We have similar taste in photos or music, so let's check out each other's links.) Users become engaged.

There are problems: idiosyncratic tagging actually makes searches LESS precise. Some people post items and add many hashtags in the hopes of having their content found in a search on that tag. They may even add irrelevant tags for that reason. Tagging your post with the names of currently popular people or adding "free, nude, realestate, vacations" even though none of those are relevant to your content might cause someone searching for those things to find your content - but that person is likely to be unhappy at landing at your place.
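The precision problem is easy to see in a small sketch. Assume four hypothetical posts about the same subject, each tagged in a slightly different way. A search on the raw tags finds only one of them, while a search on normalized tags finds all four. The normalization rule here (lowercase, then drop "#", spaces, hyphens and underscores) is just one assumed convention, and it does nothing about deliberately irrelevant tags.

```python
import re
from collections import defaultdict


def normalize(tag):
    """Lowercase a tag and drop '#', whitespace, hyphens and underscores."""
    return re.sub(r"[#\s_\-]+", "", tag.lower())


def build_index(posts, key=lambda tag: tag):
    """Map each tag (after applying key) to the set of post ids that use it."""
    index = defaultdict(set)
    for post_id, tags in posts.items():
        for tag in tags:
            index[key(tag)].add(post_id)
    return index


# Hypothetical posts, all about the same subject but tagged differently.
posts = {
    "post-1": ["#RealEstate"],
    "post-2": ["real estate"],
    "post-3": ["real-estate"],
    "post-4": ["realestate"],
}

raw_index = build_index(posts)
clean_index = build_index(posts, key=normalize)

print(sorted(raw_index["realestate"]))    # ['post-4']
print(sorted(clean_index["realestate"]))  # ['post-1', 'post-2', 'post-3', 'post-4']
```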

 

Is Your Phone Smarter Than You Yet?

      Image by Chen from Pixabay

Predictions can be interesting, but people rarely look back at them to see if they were correct. I wrote a post titled "In 4 Years Your Phone Will Be Smarter Than You (and the rise of cognizant computing)." It has had more than 969,000 views since I posted it in November 2013. Next year will be 10 years since that prediction. Is your phone smarter than you yet?

That was not my prediction; it was an analysis from the market research firm Gartner. They weren't as concerned with hardware as with data and cloud computational ability. I said then that phones will appear smarter than you IF you equate smarts with being able to recall information and make inferences. Surely, those two things are part of being "smart" but not all of it.

"Smart" is also defined sometimes as being knowledgeable of something especially through personal experience, mindful, even cognizant of the potential dangers. Cognizant is a synonym for aware. I have been reading a lot about artificial intelligence lately. While cognizant computing does use algorithms to anticipate users' needs, doing so doesn't approach actual "consciousness."

If an app has my browsing history, purchase records, financial information, and whatever is available somewhere on the cloud (known or unbeknownst to me), it can be pretty good at predicting some things about me.

Cognitive computing isn't the same thing, though so much of all this seems to overlap. Cognitive computing is part of cognitive science and attempts to simulate the human thought process.

As I said, these things overlap, at least to someone like myself who isn't really working in these fields. Maybe it makes a kind of sense that AI, cognitive and cognizant computing, signal processing, machine learning, natural language processing, speech and vision recognition, human-computer interaction and probably a dozen others I'm forgetting all blur together. I suspect that all these things will converge at some point in the future to create the ultimate AI.

I don't see as many mentions these days of the Internet of Things (IoT) as I did a decade ago. Internet-enabled objects exist in my home as "appliances." This morning I was checking my Ecobee app, which is my wireless home energy monitor. I assume that it already is, and in the future will be even better as, a kind of cognizant device that monitors my home's environmental conditions and makes adjustments based on my settings and the three sensors that monitor our activity. It knows that no one is upstairs and so drops the temperature there, though no lower than what I have told it. It also suggests changes to my settings and reminds me to change the filter every three months. I always do that on the solstices and equinoxes anyway, but if I miss that date by a day or two, it adjusts the next change accordingly. Quite a fussy and OCD device. It could connect to my Alexa devices but I haven't allowed that yet. Maybe one day it will just do it on its own and tell me "It's for your own good, Kenneth."

It Is Way Past the Time to Update the Telecommunications Act of 1996

Image by Pete Linforth from Pixabay

If you have been using the Internet for the past 25 years, you know how radically it has changed. And yet, no comprehensive regulations have been updated since then.

The news is full of complaints about tech companies getting too big and too powerful. Social media is often the focus of complaints. We often hear that these companies are resistant to changes and regulations, but that is not entirely true. 

On Facebook's site concerning regulations, they say "To keep moving forward, tech companies need standards that hold us all accountable. We support updated regulations on key issues."

Facebook may be at the center of fears and complaints, but they keep growing. Two billion users and growing.

There are four issues that they feel need new regulations.

Combating foreign election interference
We support regulations that will set standards around ads transparency and broader rules to help deter foreign actors, including existing US proposals like the Honest Ads Act and Deter Act.

Protecting people’s privacy and data
We support updated privacy regulations that will set more consistent data protection standards that work for everyone.

Enabling safe and easy data portability between platforms
We support regulation that guarantees the principle of data portability. If you share data with one service, you should be able to move it to another. This gives people choice and enables developers to innovate.

Supporting thoughtful changes to Section 230
We support thoughtful updates to internet laws, including Section 230, to make content moderation systems more transparent and to ensure that tech companies are held accountable for combatting child exploitation, opioid abuse, and other types of illegal activity.

The Telecommunications Act of 1996 was the first major overhaul of telecommunications law in almost 62 years. Its main goal was stated as allowing "anyone [to] enter any communications business -- to let any communications business compete in any market against any other." The FCC said that they believed the Act had "the potential to change the way we work, live and learn." They were certainly correct in that. But they continued that they expected that it would affect "telephone service -- local and long distance, cable programming and other video services, broadcast services and services provided to schools."

And it did affect those things. But communications went much further and much faster than the government did, and now the government needs to play some serious catch-up. It is much harder to catch up than it is to keep up.

 

Probability

I took one course in statistics. I didn't enjoy it. The ideas in it could have been interesting, but the presentation of them was not.

I came across a video by Cassie Kozyrkov that asks "What if I told you I can show you the difference between Bayesian and Frequentist statistics with one single coin toss?" Cassie is a data scientist and statistician. She founded the field of Decision Intelligence at Google, where she serves as Chief Decision Scientist. She has another one of those jobs that didn't exist in my time of making career decisions.

Most of us probably had some math teacher use a coin toss to illustrate simple probability. I'm going to toss this quarter. What are the odds that it is heads-up? 50/50. The simple lesson is that even if it has come up tails 6 times in a row, the odds for toss 7 are still 50/50.
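If you would rather see that lesson than take it on faith, a short simulation can check it. This is only a sketch: it assumes a perfectly fair coin modeled with Python's random module, and the function name and seed are arbitrary choices. It looks at every toss that comes right after six tails in a row and reports how often that next toss is heads.

```python
import random


def heads_rate_after_tails_streak(num_tosses, streak=6, seed=2013):
    """Toss a fair coin num_tosses times. Return the share of heads among the
    tosses that immediately follow `streak` or more tails in a row, plus the
    number of such tosses."""
    rng = random.Random(seed)
    tails_run = 0     # length of the current run of tails
    followed = 0      # tosses made right after a qualifying streak
    heads_after = 0   # how many of those tosses came up heads
    for _ in range(num_tosses):
        is_heads = rng.random() < 0.5
        if tails_run >= streak:
            followed += 1
            heads_after += is_heads
        tails_run = 0 if is_heads else tails_run + 1
    return heads_after / followed, followed


rate, samples = heads_rate_after_tails_streak(500_000)
print(f"{samples} tosses followed 6 tails in a row; {rate:.1%} of them were heads")
```

The reported share should land very close to 50%, because the coin has no memory of the streak.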

But after she tosses it and covers it, she asks, "What is the probability that the coin in my palm is heads-up now?" She says that the answer you give in that moment is a strong hint about whether you’re inclined towards Bayesian or Frequentist thinking.

The Frequentist: “There’s no probability about it. I may not know the answer, but that doesn’t change the fact that if the coin is heads-up, the probability is 100%, and if the coin is tails-up, the probability is 0%.”

The Bayesian: “For me, the probability is 50% and for you, it’s whatever it is for you.”

Cassie's video about this goes much deeper - too deep for my current interests. However, I am intrigued by the idea that if the parameter may not be a random variable (Frequentist) you can consider your ability to get the right answer, but if you let the parameter be a random variable (Bayesian), there's no longer any notion of right and wrong. She says, "If there’s no such thing as a fixed right answer, there’s no such thing as getting it wrong."

I'll let that hang in the air here for you to consider.



If you do have an interest to go deeper, try:
Frequentist vs Bayesian fight - your questions answered
An 8 minute statistics intro
Statistical Thinking playlist
Controversy about p-values (p as in probability)

 

Law of Large Numbers

Image by Thomas Wolter from Pixabay

A recent episode of the PBS program NOVA took me back to my undergraduate statistics course. It was a course I didn't want to take because I have never been a math person and I assumed that is what the course was about. I was wrong. 

The interesting episode is on probability and prediction and its approach reminded me of the course which also turned out to be surprisingly interesting. Program and course were intended for non-math majors and the producers and professor focused on everyday examples.

I suggest you watch the NOVA episode. You will learn about things that are currently in the news and that you may not have associated with statistics, such as the wisdom of crowds, herd immunity, herd thinking and mob thinking.

For example, the wisdom of crowds is why, when a contestant on a Who Wants to Be a Millionaire type of program asks the audience and 85% of a few hundred people answer "B," there's an excellent chance that "B" is the correct answer. And larger samples get more accurate. Why is that?

One of the things I still recall from that class that the program highlighted was the law of large numbers. The law of large numbers states that as a sample size grows, its mean gets closer to the average of the whole population. It was proposed by the 16th-century mathematician Gerolamo Cardano but was proven by Swiss mathematician Jakob Bernoulli in 1713.

It works for many situations from the stock market to a roulette wheel. I recall that we learned about the "Gambler’s Fallacy." The fallacy catches gamblers who don't know enough math or statistics: they assume a run of one result makes the other result "due." They stand by the wheel and see that red has won once and black has now won 5 times in a row. Red is due to win, right? Wrong. Red versus black is much like a coin flip. The odds are the same on every spin, close to 50/50. The casino knows that. They even list which colors and numbers have come up on a screen to encourage you to believe the fallacy.

Flip the coin or spin the wheel 10 times and it could be heads or red 9 times. Flip or spin 500 times and it will come out a lot closer to 50-50.
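The law of large numbers is easy to watch in action. The sketch below assumes a fair coin simulated with Python's random module (the seed and the sample sizes are arbitrary) and prints the fraction of heads for a small, a medium and a large number of flips. Exact values depend on the seed, so none are quoted here, but the large run should sit much closer to 0.5 than the small one typically does.

```python
import random


def heads_fraction(num_flips, rng):
    """Flip a fair coin num_flips times and return the fraction of heads."""
    heads = sum(rng.random() < 0.5 for _ in range(num_flips))
    return heads / num_flips


rng = random.Random(1713)  # fixed seed so the run is repeatable
results = {n: heads_fraction(n, rng) for n in (10, 500, 100_000)}
for n, fraction in results.items():
    print(f"{n:>7} flips: {fraction:.3f} heads")
```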

The "house edge" in roulette exists because of the green zero pockets on the wheel. European roulette has a single zero, which gives the house an edge of 2.70%. American roulette adds a double zero, and the edge rises to 5.26%.
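The house edge can be worked out by counting pockets. The sketch below assumes the standard layouts (18 red, 18 black, plus one green zero on a European wheel or two on an American wheel) and an even-money bet on red that pays 1 to 1. The function name is just for illustration.

```python
from fractions import Fraction


def house_edge(zero_pockets, red=18, black=18):
    """House edge on an even-money bet on red that pays 1 to 1."""
    total = red + black + zero_pockets
    p_win = Fraction(red, total)
    # Expected return per unit staked: win +1 with p_win, lose 1 otherwise.
    expected_return = p_win * 1 + (1 - p_win) * (-1)
    return -expected_return


for name, zeros in (("European (single zero)", 1),
                    ("American (zero and double zero)", 2)):
    print(f"{name}: {float(house_edge(zeros)):.2%}")
# European (single zero): 2.70%
# American (zero and double zero): 5.26%
```

The edge works out to 1/37 on a single-zero wheel and 2/38 on a double-zero wheel, which is why a red/black bet is close to a coin flip but never quite one.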

Knowing about probability greatly increases your accuracy in making predictions. And more data makes that accuracy possible.

 

Strong and Weak AI

Image by Gerd Altmann from Pixabay

Ask several people to define artificial intelligence (AI) and you'll get several different definitions. If some of them are tech people and the others are just regular folks, the definitions will vary even more. Some might say that it means human-like robots. You might get the answer that it is the digital assistant on their countertop or inside their mobile device.

One way of differentiating AI that I don't often hear is by the two categories of weak AI and strong AI.

Weak AI (also known as “Narrow AI”) simulates intelligence. These technologies use algorithms and programmed responses and generally are made for a specific task. When you ask a device to turn on a light or what time it is or to find a channel on your TV, you're using weak AI. The device or software isn't doing any kind of "thinking" though the response might seem to be smart (as in many tasks on a smartphone). You are much more likely to encounter weak AI in your daily life.

Strong AI is closer to mimicking the human brain. At this point, we could say that strong AI is “thinking” and "learning" but I would keep those terms in quotation marks. Those definitions of strong AI might also include some discussion of technology that learns and grows over time which brings us to machine learning (ML), which I would consider a subset of AI.

ML algorithms are becoming more sophisticated and it might excite or frighten you as a user that they are getting to the point where they are learning and executing based on the data around them. When they find patterns in data that no one has labeled for them, it is called "unsupervised ML." That means that the AI does not need to be explicitly told what to look for. In the sci-fi nightmare scenario, the AI no longer needs humans. Of course, that is not even close to true today as the AI requires humans to set up the programming and supply the hardware and its power. I don't fear the AI takeover in the near future.

But strong AI and ML can go through huge amounts of data that they are connected to and find useful patterns. Some of those are patterns and connections that it is unlikely that a human would find. Recently, you may have heard of the attempts to use AI to find a coronavirus vaccine. AI can do very tedious, data-heavy and time-intensive tasks in a much faster timeframe.

If you consider what your new smarter car is doing when it analyzes the road ahead, the lane lines, objects, your speed, the distance to the car ahead and hundreds or thousands of other factors, you see AI at work. Some of that is simpler weak AI, but more and more it is becoming stronger. Consider all the work being done on autonomous vehicles over the past two decades, much of which has found its way into vehicles that still have drivers.

Of course, cybersecurity and privacy become key issues when data is shared. You may feel more comfortable in allowing your thermostat to learn your habits or your car to learn about how you drive and where you drive than you are about letting the government know that same data. Discover the level of data we share online doing financial operations or even just visiting sites, making purchases and searching, and you'll find the level of paranoia rising. I may not know who you are reading this article, but I suspect someone else knows and is more interested in knowing than I am.