What do you think AI still struggles with the most?
Sean Parker
4 replies
I’m not exactly a prompt engineer, but I’ve managed to get decent results. Yet, whether it’s few-shot or chat-based, there are still times when no amount of prompt tweaking can get AI to match human-level quality or deliver the desired outcome. What do you think are some cases where AI just can’t quite hit the mark?
Replies
Ben Syverson@bensyverson
Mattebox
Yeah, it's strange—sometimes AI falls down on stuff that feels like common sense. I've spent a ton of time over the past 12 months using LLMs to do categorization. What I've learned is that you can get 80% results pretty easily, but if you need 99%, it's nearly impossible.
Take a simple example, such as categorizing products on Product Hunt to separate B2C and B2B SaaS based on their descriptions. This will require extensive experimentation just to get to 80%. Some observations:
🤷‍♂️ Giving it lots of descriptive categories can be better than giving it a binary choice. Rather than just "B2B" and "B2C," you'll probably see better results with concrete categories (a long list of "Business scheduling software", "Personal scheduling software", etc).
💭 Pseudo chain-of-thought can help. For example, have the AI generate a thought about what category something might be and why, then down-select to a few candidates, then make its final selection (all in the same response).
⚖️ Occasionally adding a "judge" pass can help. The first LLM makes the call, and the second one critiques it. It can help to have the judge generate arguments for & against before rendering its verdict.
💡 Another approach is to tell the LLM to categorize the product without giving it the categories. Then have a second LLM map the generated category to a list of valid categories (something it does pretty well).
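To make a couple of these concrete, here's a rough Python sketch of the pseudo chain-of-thought prompt and the two-pass "label freely, then map" idea. Everything in it is an assumption for illustration: `LLM` is a hypothetical stand-in for whatever client you use (any function that takes a prompt string and returns text), and the prompt wording, the `Final:` line format, and the function names are made up here, not a tested recipe.

```python
from typing import Callable, Optional, Sequence

# Hypothetical stand-in for a real client: prompt text in, completion text out.
LLM = Callable[[str], str]


def match_category(answer: str, valid: Sequence[str]) -> Optional[str]:
    """Return the valid category equal to `answer` (case-insensitive), else None."""
    cleaned = answer.strip().lower()
    return next((c for c in valid if c.lower() == cleaned), None)


def parse_final(response: str, valid: Sequence[str]) -> Optional[str]:
    """Read the last 'Final:' line of a response and validate it against the list."""
    for line in reversed(response.splitlines()):
        if line.strip().lower().startswith("final:"):
            return match_category(line.split(":", 1)[1], valid)
    return None  # the model ignored the requested format


def categorize_cot(llm: LLM, description: str, valid: Sequence[str]) -> Optional[str]:
    """Pseudo chain-of-thought: thought, candidates, and final pick in one response."""
    prompt = (
        "Categorize the product below.\n"
        f"Valid categories: {', '.join(valid)}\n"
        "Reply in exactly this format:\n"
        "Thought: <what the product is and who it is for>\n"
        "Candidates: <two or three likely categories>\n"
        "Final: <one category from the list>\n\n"
        f"Product: {description}"
    )
    return parse_final(llm(prompt), valid)


def categorize_two_pass(
    labeler: LLM, mapper: LLM, description: str, valid: Sequence[str]
) -> Optional[str]:
    """Pass 1 labels the product with no category list; pass 2 maps that label."""
    free_form = labeler(
        f"In a few words, what kind of product is this?\n\n{description}"
    ).strip()
    mapping_prompt = (
        f"Which of these categories best matches '{free_form}'?\n"
        f"Categories: {', '.join(valid)}\n"
        "Answer with the category name only."
    )
    return match_category(mapper(mapping_prompt), valid)
```

Both functions return `None` when the model's answer isn't in the valid list, so you can count those cases, retry them, or hand them to a "judge" pass instead of silently accepting a made-up category.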
Anyway, I guess the takeaway is don't get me started on categorization. 😅
AI is still learning right now, but soon it may outpace humans in almost everything.
Hey there! Honestly, I think AI still struggles with understanding context and nuance, especially in creative writing or sarcasm. It's like trying to teach a robot to get your jokes—some things just need that human touch! 😂