affenlehrer

Understanding questions and summarizing information is something LLMs are quite good at. If they actually look up the sources and "read" the websites, they are often able to give good answers. If they don't use tools and just answer from what they "remember", the information often contains hallucinations.

So from a user perspective I think search will get better for specific questions.

However, the traffic to websites and everything the LLM omits are lost. If the LLM gives you the answer, you don't learn about the author of the information, the design of the website, the nuanced and maybe thoughtful story the author built around the information, or all the other things the author put there.

The models and methods are improving. Especially through tool use (Internet search, MCP, using programming languages), the model output improves a lot. Reasoning models are allowed to admit mistakes during thinking ("wait, that's wrong"); in normal conversation they will never say that unless you point out the mistake. Otherwise, they basically predict tokens, the inference engine selects one, and they go with whatever was selected.

It's a bit like when you misremember something (the Mandela effect): you're confident it's correct, so you don't double-check and just go with it. They usually don't even know how confident they are; they have no introspection.

In software development, LLMs, due to their high "compression", often hallucinate (misremember) methods and parameters that don't exist in an API, or that only exist in a different API, or they don't know about newer versions of the API. Many of those errors are caught when the code doesn't compile or unit tests fail, but some of them stay (e.g. if the model created the unit tests and they don't test what they're supposed to test).

Also, a bit like humans, the models often don't have the whole codebase in context, so they make assumptions about the rest. Since they have no introspection, they often don't double-check whether those assumptions are correct.

In the case of frontend design, they often can't "see" the output, or at least not the way we do. They don't really know whether something looks "good" or not (depending on their training).

Verification with other agents can help, but fundamentally those agents have the same issues. It's a workaround.

I'm actually not sure if the bubble will pop. I believe LLMs can be useful in some fields, but they're not the path to AGI. They're also way too resource-intensive and used in a lot of situations where it's dangerous or doesn't make sense.

However, it's not that the AI researchers don't know about the limitations. They have been trying to work around the issues of LLMs, with some success, for years now, and they kind of have to, because LLMs kind of work and bring publicity. Behind the scenes, the AI craze also brought money for research in other directions with different fundamentals, e.g. JEPA, world models, diffusion models, logic-based models, energy-based models, small recursive models, and a lot of optimizations to make things faster and cheaper to compute.

The bubble could pop if one of the major companies does something stupid and their stock tanks, but as long as money is pumped in, there is also actual progress on new fundamentals, and if those are developed before the bubble pops, we might get "real" AI or AGI.

It can't not hallucinate. It's just predicting (not even selecting) next tokens. It doesn't know what it knows and what it doesn't know. It can't introspect. It just gives probabilities for all possible tokens in its vocabulary based on the context window, and the inference engine selects the next one (based on its settings). Without the correct answer in the context window, it can only make a prediction based on its (fixed) neural net parameters, and these are severely limited, even for big models. What I mean is, they basically "learned" the whole Internet and compressed the whole thing into a few hundred billion or a few trillion parameters. That's an insane compression ratio, and the compression is lossy. For niche information, the results are similar to the "unimportant" details in highly compressed JPGs: you can make out the general image, but the fine details are just mush. The LLM itself doesn't know this; it just gives wrong predictions.
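The split between "the model gives probabilities" and "the inference engine selects" can be sketched in a few lines of Python. This is a minimal illustration, assuming a toy vocabulary and made-up logit values (hypothetical, not from any real model): softmax with a temperature turns raw scores into probabilities, and two example engine settings either take the most likely token (greedy) or sample one at random weighted by probability.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Turn raw model scores (logits) into probabilities that sum to 1."""
    # Dividing by the temperature sharpens (T < 1) or flattens (T > 1) the distribution.
    scaled = [x / temperature for x in logits]
    # Subtract the max before exponentiating for numerical stability.
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_pick(vocab, logits):
    """Engine setting 1: always take the most probable token."""
    best = max(range(len(logits)), key=lambda i: logits[i])
    return vocab[best]


def sample_pick(vocab, logits, temperature=1.0, rng=None):
    """Engine setting 2: draw one token at random, weighted by probability."""
    rng = rng or random.Random()
    probs = softmax(logits, temperature)
    return rng.choices(vocab, weights=probs, k=1)[0]


if __name__ == "__main__":
    # Hypothetical toy vocabulary and logits; a real model has far more tokens.
    vocab = ["Paris", "Lyon", "Berlin"]
    logits = [3.0, 1.0, 0.5]
    print(softmax(logits))
    print(greedy_pick(vocab, logits))  # deterministic: always the top token
    print(sample_pick(vocab, logits, temperature=0.8, rng=random.Random(0)))
```

Greedy selection is deterministic, while sampling adds variety but can also pick a low-probability token. In both cases the selection only sees the probabilities, never whether the token is actually correct.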

For what it does, I think the result is extremely impressive, but the way it works is severely limited.

Not sure if it was there from the beginning, but it was originally developed by Linus Torvalds, and he can be quite harsh to Linux contributors.

I'm not so sure. It's possible, but I believe it's quite hard to find a vampire template where the head and hands fit so well. Also, one of the hands comes out of the chest and looks weird.

Thank you. I have to say I'm a bit disappointed but I'm glad your cat is happy and well!

Our cat is around 15 years old but, unfortunately, very sick.

Please tell me about the cat on Adderall. I need to know.

Bamboo raids the chat

Yeah, and allowing it specifically adds goblin analogies to pretty much anything you talk about, at least in my experience. I kinda like it, though.

I usually allow it to speak about goblins

I've had similar issues when trying to order stuff from NL with a German postal code (Postleitzahl).

Time to hoard toilet paper and ivermectin I guess

Do something out of your comfort zone. Maybe something where you have to concentrate fully on what you're doing in the moment, like climbing or downhill mountain biking or something like that, but stay safe (instructor, safety equipment, etc.).

The crafting system was pretty shit.

What tune? Is this something like Peanut Butter Jelly Time or Banana Phone?


@affenlehrer@feddit.org

Posts: 5, Comments: 695, Joined: 1 yr. ago


The future

YouTube really showing top quality in recent update


Even my own code sometimes.

crap

Mistakes were made [AzulCrescent]

Which grass?


Mint

Sayonara

Ladies and gentlemen, we have reached peak Agentic AI Coding - Goblin instructions in OpenAI's Codex system prompt


Can't order new shiny things

Sesame Street would make a lot more sense if Cookie Monster was rebranded as Coffee Monster

WHO head tells countries to prepare for more hantavirus cases

Have you ever just felt so bored with life that you wanna die?

Anon collects minerals

Put tha lime in tha coconut

Playtime

Amazon demands proof of productivity from employees


Chinese products and brands

Do xenomorphs, if prepared correctly, taste like shrimp?

"Trump Derangement Syndrome" as a mental illness