General Discussion
ChatGPT Blows Mapmaking 101: A Comedy of Errors (Gary Marcus, Substack). And AI models flunk finance, too.
https://garymarcus.substack.com/p/chatgpt-blows-mapmaking-101

I once again put it to the test. It was very good at some things, but not others. It was very good at giving me bullshit, my dog-ate-my-homework excuses, offering me a bar graph after I asked for a map, falsely claiming that it didn't know how to make maps.
A minute later, as I turned to a different question, I discovered that it turns out ChatGPT does know how to draw maps. Just not very well.
-snip-
The recent vals.ai financial benchmarks showed comparable failures reading basic financial reports. (On questions like "Which Geographic Region has Airbnb (NASDAQ: ABNB) experienced the most revenue growth from 2022 to 2024?" performance was near zero.)
-snip-
See the Substack article for the maps, errors, and chatbot responses.
The April 2025 financial benchmark test results Gary mentioned are at
https://www.vals.ai/benchmarks/finance_agent-04-22-2025
and they're dismal.
The foundation models are currently ill-suited to perform open-ended questions expected of entry-level finance analysts.
A majority of models struggled with tool use in general, and with information retrieval in particular, leading to inaccurate answers, most notably among smaller models like Llama 4 Scout and Mistral Small 3.1.
Models on average performed best in the simple quantitative (37.57% average accuracy) and qualitative retrieval (30.79% average accuracy) tasks. These tasks are easy but time-intensive for finance analysts.
On our hardest tasks, the models perform much worse. Ten models scored 0% on the Trends task, and the best performance on this task was only 28.6% by Claude Sonnet 3.7.
-snip-
The report concluded that "none of the existing AI models exceed 50% accuracy" and they "still have a long way to go before they can be deployed reliably and trusted in the finance industry."
No kidding.
But according to the AI bros and those duped by them, we should
keep using generative AI models like this in every possible way,
change laws and regulations to give the AI companies whatever they want,
allow every bit of intellectual property to be used for free to train the AI,
accept degradation of education and the environment and the internet by genAI,
and approve and applaud hundreds of billions of dollars invested in AI
because it's AI!!!!
Hugin
(37,440 posts)
It'll get better! It's leeearning!
No, not really. Besides blaming the victim, most users aren't even interacting with the teaching cycles, other than their scraped texts, artwork, and personal data being used as source material. That's an entirely different part of the process, which is usually not interactive.
hlthe2b
(112,818 posts)
took from newspaper reports only. I know this because I know the case tallies in Colorado, which were not even included, even though they have been reported to CDC and confirmed. Even the May 2009 data on the CDC page is more up-to-date. (31 states had reported cases by May 9, and ChatGPT reported only the 10 states with the most cases as of, supposedly, May 13.) Worthless.
SUCKS... Students who do not do their own research and depend on AI are going to fail (all of us), miserably.
highplainsdem
(60,014 posts)
that even when the teachers can point to reliable sources proving the chatbot is wrong, the kids will still assume the chatbot is correct. Which is both sad and scary.