
First published on Strange Loop Canon, April 23, 2024.

On goal drift and reduced reliability. Or, why can’t LLMs play Conway’s Game of Life?

Over the past few years, whenever a problem has been held up as something LLMs cannot do, they have gone on to solve it. And yet, even as they pass these tests with flying colors, they still fail at seemingly easy questions, and it’s unclear why.

So for the past few weeks, I’ve been obsessed with pinning down the failure modes of LLMs. What I found is admittedly a little strange, but I think it’s interesting: AI’s failures tell us more about what it can do than its successes.

The starting point was a much larger question: the many jobs that LLMs will end up doing will have to be evaluated task by task. But that led me to ask how we can find the limits of their reasoning ability, so that we know how far to trust it.

As I’ve written before, LLMs are hard to evaluate: it’s difficult to separate their reasoning ability from what they have absorbed in training. So I wanted a way to test their ability to reason iteratively, on questions whose answers they couldn’t simply recall.
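Conway’s Game of Life, named in the title, is exactly that kind of test: a deterministic, iterated rule where each step must be computed, not recalled. As a concrete picture of what such a step involves, here is a minimal sketch (the function name `life_step` and the toroidal wrap-around boundary are my choices for illustration, not details from the article):

```python
def life_step(grid):
    """Advance a Game of Life board (list of lists of 0/1) by one generation.

    Rules: a live cell with 2 or 3 live neighbors survives; a dead cell
    with exactly 3 live neighbors becomes alive; everything else is dead.
    The board wraps around at the edges (toroidal), an assumption here.
    """
    rows, cols = len(grid), len(grid[0])

    def live_neighbors(r, c):
        # Count the 8 surrounding cells, wrapping at the edges.
        return sum(
            grid[(r + dr) % rows][(c + dc) % cols]
            for dr in (-1, 0, 1) for dc in (-1, 0, 1)
            if (dr, dc) != (0, 0)
        )

    return [
        [1 if (n := live_neighbors(r, c)) == 3 or (n == 2 and grid[r][c])
         else 0
         for c in range(cols)]
        for r in range(rows)
    ]

# A "blinker" (three cells in a row) oscillates with period 2.
blinker = [[0] * 5 for _ in range(5)]
for c in (1, 2, 3):
    blinker[2][c] = 1
```

Every generation depends on the previous one, so there is no shortcut: answering “what does this board look like after N steps?” requires carrying out N dependent computations correctly in sequence.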
