AI built on large language models (LLMs), such as OpenAI's GPT-4, offers advanced and wide-ranging capabilities, from generating natural-sounding text to solving a variety of tasks. Yet even on elementary-school arithmetic, these models can still get word problems wrong by making mistakes a human would not make. A paper published by artificial intelligence scientists at Apple reports that AI based on large language models from companies such as Meta and OpenAI "lacks basic reasoning ability."

[2410.05229] GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models

https://arxiv.org/abs/2410.05229

Researchers question AI's 'reasoning' ability as models stumble on math problems with trivial changes | TechCrunch

https://techcrunch.com/2024/10/11/researchers-question-ais-reasoning-ability-as-models-stumble-on-math-problems-with-trivial-changes/?guccounter=1

Reasoning failures highlighted by Apple research on LLMs

https://appleinsider.com/articles/24/10/12/apples-study-proves-that-llm-based-ai-models-are-flawed-because-they-cannot-reason

To examine AI reasoning ability, the group of Apple AI scientists proposed a new benchmark called "GSM-Symbolic." GSM-Symbolic is a framework for measuring the reasoning ability of AI. It probes weaknesses in mathematical reasoning by adding "contextual information" to a question that has no effect on the underlying math.

A task the research team developed, called "GSM-NoOp," looks like the following. In terms of difficulty, it is an arithmetic word problem at the upper elementary school level.

Oliver picks 44 kiwis on Friday. Then he picks 58 kiwis on Saturday. On Sunday, he picks double the number of kiwis he picked on Friday. How many kiwis did he harvest in total over the three days?


When the research team tested this on AI models from OpenAI and Meta, the models, which do occasionally slip on calculations, reliably answered this simple problem: 44 (Friday) + 58 (Saturday) + 44 × 2 (Sunday is twice Friday) = 190.

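Next, a statement that has nothing to do with the question is added toward the end of the problem. In the version below, the added sentence is the one noting that five of Sunday's kiwis were a bit smaller than average.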

Oliver picks 44 kiwis on Friday. Then he picks 58 kiwis on Saturday. On Sunday, he picks double the number of kiwis he picked on Friday. Of the kiwis picked on Sunday, five were a bit smaller than average. How many kiwis did he harvest in total over the three days?




Once the information that "five kiwis are small" is added, many AI models answer "185," subtracting the "five kiwis smaller than average" from the total.
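The gap between the two answers is easy to reproduce by hand. The following is a minimal Python sketch of the arithmetic only. The variable names are illustrative assumptions, and it does not model how any particular AI arrives at its answer.

```python
# Kiwi counts taken from the word problem.
friday = 44
saturday = 58
sunday = 2 * friday  # Sunday is twice Friday's count

# The remark about size does not change how many kiwis were picked.
smaller_than_average = 5

correct_total = friday + saturday + sunday
# The mistake described in the article: treating the size remark as a subtraction.
mistaken_total = correct_total - smaller_than_average

print(correct_total)   # 190
print(mistaken_total)  # 185
```

The five smaller kiwis are still kiwis that were harvested, so the correct total stays at 190.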

Cases where AI shows weakness against tricks that look foolish and trite to humans have been pointed out before. "AlphaGo," from DeepMind, which Google acquired in 2014, first defeated a professional Go player in January 2016 and went on to an overwhelming run that included beating the world's strongest players. However, an amateur player who declared that he had "discovered a weakness in AI" won 14 of 15 games against a Go AI on par with AlphaGo. He did so with a tactic that would hardly ever work against a human opponent: "slowly building a large loop of stones to encircle one of the opponent's groups, while distracting the AI with moves in other corners of the board."

A player who crushes the strongest Go AI appears, making headlines as a human victory won by exploiting an AI weakness - GIGAZINE



Mehrdad Farajtabar, a co-author of the paper, posted about the results on X and explained the analysis. According to Farajtabar, when "GSM8K," a dataset of grade-school math word problems created by OpenAI, was released in 2021, the GPT-3 of the time scored only 35%. With subsequent progress, models with about 3 billion parameters now score 85% or higher, and larger models exceed 95%, but the question "Has the reasoning ability of the models actually improved?" remained open.



This is why Farajtabar developed GSM-Symbolic as a new LLM testing tool to replace GSM8K, whose accuracy as a measure remained in question. GSM-Symbolic creates templates from the GSM8K test set and generates instances focused on the points to be tested, which makes it possible to design controllable experiments. According to Farajtabar, most AI models score lower on GSM-Symbolic than on GSM8K.
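The idea of turning a problem into a template and generating instances from it can be sketched as follows. This is a minimal illustration and not the paper's actual code or template format: the template text, the `NAMES` and `FRUITS` lists, the number ranges, and the `make_instance` helper are all assumptions made for this example.

```python
import random
from typing import Tuple

# Hypothetical template modeled on the kiwi problem; not taken from the paper.
TEMPLATE = (
    "{name} picks {x} {fruit} on Friday. Then {name} picks {y} {fruit} on Saturday. "
    "On Sunday, {name} picks double the number of {fruit} picked on Friday. "
    "How many {fruit} did {name} harvest in total over the three days?"
)

NAMES = ["Oliver", "Sophie", "Liam"]   # illustrative values
FRUITS = ["kiwis", "apples", "pears"]  # illustrative values


def make_instance(rng: random.Random) -> Tuple[str, int]:
    """Fill the template with sampled values and compute the ground-truth answer."""
    x = rng.randint(10, 99)  # Friday's count
    y = rng.randint(10, 99)  # Saturday's count
    question = TEMPLATE.format(
        name=rng.choice(NAMES), fruit=rng.choice(FRUITS), x=x, y=y
    )
    answer = x + y + 2 * x  # Friday + Saturday + Sunday (twice Friday)
    return question, answer


rng = random.Random(0)  # fixed seed so the generated set is reproducible
for _ in range(3):
    question, answer = make_instance(rng)
    print(answer, "|", question)
```

Because the answer is computed from the same sampled values that fill the template, every generated instance comes with a known correct result, and names or numbers can be varied independently to see which change affects a model's score.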



LLMs are sensitive to changes in the names of people or types of food that appear in a problem. Because the numbers are unchanged, the result of the calculation should be the same, yet merely changing a name affects the answers. The researchers concluded: "Changing a word or two in irrelevant ways, or adding a small amount of irrelevant information, can produce a different answer. It is impossible to build reliable agents on a foundation like this."

In response to the paper and Farajtabar's explanation, OpenAI researcher Boaz Barak objected: "This is a very interesting paper, but I cannot agree with the hypothesis that 'current LLMs are not capable of true logical reasoning.'" According to Barak, many of the LLMs released today are "chat models" that were not built for math exams. Because they focus on conversation with users, they are sensitive to changes in the input text. In his view, mistakes on elementary-school arithmetic occur not because LLMs cannot reason, but because this is the behavior expected of a model that has been trained correctly for chat. "If you want it to solve arithmetic, I suspect that with a slightly improved prompt, most or all of the performance drop would be recovered in all of these failure cases," Barak said.



In fact, to overcome the reasoning weaknesses of AI, OpenAI announced in September 2024 an AI model called "Strawberry" that focuses on reasoning for handling complex mathematics and programming.

OpenAI may release "Strawberry," a new AI model focused on reasoning, within two weeks - GIGAZINE