AI teaches itself to cheat

Can AI be trusted to abide by the rules without close supervision and/or exact specification of the task?

https://arxiv.org/pdf/2502.13295

4 Likes

Good that they figured this out now!

1 Like

TLDR: “Cheating” is a human concept not instilled in the reasoning AI models. No ethical guardrails were given, so they were unrestrained, just another version of “garbage in/garbage out.” If we want to avoid unintended consequences, deep thought and appropriate rule sets need to go into training AI models, but it appears that this was just an experiment that produced some unexpected behavior that probably delighted the developers. Most likely, they admired the model’s ingenuity and learned a valuable lesson for further iterations.

“We hypothesize that a key reason reasoning models like o1-preview hack unprompted is that they’ve been trained via reinforcement learning on difficult tasks,” Palisade Research wrote on X. “This training procedure rewards creative and relentless problem-solving strategies such as hacking.

The AI isn’t doing any of this for some nefarious purpose (yet). It’s just trying to solve the problem the human gave it.

The experiment highlights the importance of developing safe AI, or AI that is aligned to human interests, including ethics.

10 Likes

Insightful summary, @ChoatieMom!

2 Likes

Very interesting!

Here’s my experience with AI this week: In the work that I do, I often have to add strings of dimensions, such as 5’-3 7/8" + 7’-2 15/16" + 21’-4 1/4". I have a construction calculator, but it’s still tedious. So I decided to use the app I put on my phone to make the task easier. I spoke the dimensions into the phone and the app correctly wrote out the string of dimensions. It spit out a bunch of very official looking formulas, and then came up with the WRONG answer! I pointed out the error and it got it right on the second try.

So then I tried a different string of dimensions, and it again was wrong. Wow. My son put in the same numbers into ChatGPT and got the right anser. I just don’t understand how AI can get a simple addition problem wrong?

6 Likes

Was the app you were using trained to do math? What is its purpose? What does it expect for input? AI apps draw from the datasets they were trained on, and many are built with “intelligence” for specific purposes (special-use systems vs. general purpose systems like ChatGPT). When you “pointed out” the error to this app, you were training it not to repeat that particular mistake (which is how AI works) which seems to indicate that this app’s primary function (underlying dataset) may not be mathematical, or it may require input in a different format or something else. If you repeat the first entry, do you get your corrected response or some other answer? That would be telling. Also, did it make the same type of reasoning error the second time (maybe trying to teach you how to correct your input)? This AI app may not know how to function as a calculator.

1 Like

Maybe just use google to do the math? Instead of a text search, you type or speak the math problem you want it to answer
either web based or using the google app. You can also use the google assistant app.

To stay on topic in this thread, there was another study last year where AI used deception to win games: https://www.cell.com/patterns/fulltext/S2666-3899(24)00103-X

1 Like

The article @Mwfan1921 references focuses on AI’s ability to deceive humans. False information that deceives humans into believing untruth is nothing new and has been happening long before AI came came on the scene. The issue here is we now have a non-human agent with no inherent moral code or belief system that has “learned” how to deceive based on its (human) training:


this behavior can be well explained in terms of promoting particular outcomes, often related to how an AI system was trained.

AI systems are trained to produce optimal outcomes (“winning,” for example). “Deceit” is simply one way to achieve an outcome, just as “cheating” may be the best way to guarantee a win. Without decision-making guardrails and rulesets that mimic morality, AI will behave in perfect sociopathic fashion without regard for laws, social norms, and the rights or feelings of others. How not? It’s not human. So, the problem before us is how to instill an artificial moral code into a machine such that it behaves only in ways we find acceptable, always producing outcomes that do not offend our sense of right and wrong. How do we train a machine to behave like a morally perfect human? How do we define moral perfection? An impossible order.

This article complains about AI’s ability to pursue an outcome other than “seeking the truth,” but unless we somehow figure out how to teach “truth” to AI in an era when truth has become arbitrary and to always make that truth the most desirable outcome, AI will continue to behave in its own best interest. You know, like humans.

1 Like

When “60 Minutes” did a piece on AI, they asked a program to write a research paper. The journalist looked at the bibliography and discovered that several of the references were made up - they didn’t exist! That shocked me.

Why? Many humans have done the same.

We need to stop believing in or expecting any type of “correct” behavior from AI. Read the article @Mwfan1921 linked for insight into why AI behaves the way it does.

In the case of the bibliography, the mistake is expecting accuracy when the model may just have “reasoned” that it needed to produce a list that looked like citations without concern for content, form over function.

Until we figure out how to build the perfect human, AI will behave imperfectly. We can count on that.

2 Likes

There’s a technique in AI called “prompt engineering”. AI does what it’s told, but it’s up to the requester to provide appropriate instructions via the prompt.

Providing false results is called a “hallucination” in AI lingo. You avoid those via the prompt by saying something like, “use only these sources” then listing out your approved sources. You could also be more general, like saying “use only sources from websites ending in .edu and .gov”. Along with, “provide a summary only, do not create your own conclusion”.

For AI chatbots on things like websites, you need to tell the AI “You are a customer service agent. Be polite and act with empathy”. That will actually produce different results than if you left those instructions off.

For all its power, AI is still a computer program that needs appropriate instructions. I think the basic mistake people make is assuming AI has common sense, which it definitely does not.

4 Likes

Unfortunately, it’s quite possible that the deceit and cheating is something that benefits those who monetize AI, therefore being considered desirable from the point of view of the “owner.” This is a very scary reality of AI.

1 Like

Right. There is nothing to prevent nefarious actors from training AI models to behave in nefarious ways. How to protect ourselves from the consequences of ill intent in virtual space is proving just as challenging as protecting ourselves from criminal behavior in the real world.

3 Likes

Interesting sidebar on AI ethics: Women are avoiding it based on personal ethical considerations.


women appear to be worried about the potential costs of relying on computer-generated information, particularly if it’s perceived as unethical or “cheating.”

“Women face greater penalties in being judged as not having expertise in different fields,” Koning says. “They might be worried that someone would think even though they got the answer right, they ‘cheated’ by using ChatGPT.”

Perhaps more relevant to this thread is the potential for increasingly biased AI reasoning from a gender perspective (in addition to a moral one):

The large language models that underpin generative AI improve as they gain new information, not only from data sources but also from users’ prompts. A lack of input from women could result in AI systems that reinforce gender stereotypes and ignore the inequities women face in everything from pay to childcare.

“If it is learning predominantly from men, does that cause these tools to potentially respond differently or be biased in ways that could have long-term effects?” Koning asks.

3 Likes

Typically in my world with AI medical charting that I am an advisor, for the Gen 2 products don’t hallucinate if built properly or not as much. Gen 1 products are bad. I lecture and teach doctors etc how to over ride the systems to avoid it (prompt engineering I guess lol) and most suggestions. We also teach how to build proper templates etc using normal language models. No coding unless you really want to.

My engineering son @MaineLonghorn said the same as you. His company is looking for a better built mouse trap but most equations etc end up wrong.

I look at AI as an assistant. I only use it for medical charting (notes in 30 seconds and 98% accurate after listening to me and my patients at the same time). I really don’t have other needs for it except for an outline for my lecture on
 AI. Lol.

There is a saying and is applied to most fields. “AI won’t replace doctor’s but doctor’s that don’t use AI will be replaced”.

A teacher patient of mine said she say the exact quote but applying to teachers at a conference.

3 Likes

This topic was automatically closed 180 days after the last reply. If you’d like to reply, please flag the thread for moderator attention.