Apple researchers find LLMs rely more on pattern-matching than logical reasoning, struggling with complex questions.

Apple researchers have raised concerns about the mathematical reasoning abilities of large language models (LLMs), finding that model accuracy varies significantly with slight changes to a question's wording or numbers. This suggests LLMs rely more on probabilistic pattern-matching than on genuine logical reasoning. To assess these capabilities more rigorously, the team introduced the GSM-Symbolic benchmark, which showed that performance degrades as questions grow more complex, underscoring the models' limitations in reliable reasoning.
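The core idea behind this kind of evaluation can be sketched in a few lines: take a fixed word problem, vary the surface details (names and numbers) while keeping the underlying arithmetic identical, and check whether a model's answers stay consistent across variants. This is an illustrative sketch only, not the actual GSM-Symbolic code; the template, names, and value ranges here are invented for the example.

```python
import random

# Hypothetical sketch of symbolic perturbation: templatize one
# GSM8K-style problem and generate variants whose wording changes
# while the ground-truth answer follows the same formula (k * d).
TEMPLATE = ("{name} picks {k} apples every day for {d} days. "
            "How many apples does {name} have in total?")

NAMES = ["Sophie", "Liam", "Mia", "Noah"]

def make_variant(rng: random.Random) -> tuple[str, int]:
    """Return one (question, answer) pair for a symbolic instantiation."""
    name = rng.choice(NAMES)
    k = rng.randint(2, 9)   # apples per day
    d = rng.randint(2, 9)   # number of days
    question = TEMPLATE.format(name=name, k=k, d=d)
    return question, k * d  # the answer is unaffected by the chosen name

rng = random.Random(0)
variants = [make_variant(rng) for _ in range(3)]
for question, answer in variants:
    print(question, "->", answer)
```

If a model truly reasons, its accuracy should be identical across such variants; the Apple paper's finding is that it often is not.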

October 11, 2024