On February 18, 2025, OpenAI released SWE-Lancer, an open-source benchmark for evaluating the coding performance of AI models.

[2502.12115] SWE-Lancer: Can Frontier LLMs Earn $1 Million from Real-World Freelance Software Engineering?

https://arxiv.org/abs/2502.12115

Introducing the SWE-Lancer benchmark | OpenAI

https://openai.com/index/swe-lancer/

SWE-Lancer is a benchmark that measures whether AI can carry out freelance software engineering tasks that were paid out for a combined total of about $1 million (about ¥150 million). It can test two kinds of work: independent engineering tasks, ranging from $50 (about ¥7,500) bug fixes to $32,000 (about ¥4.8 million) feature implementations, and managerial tasks in which the model chooses between competing technical implementation proposals.

The task prices used in SWE-Lancer reflect actual market value: the harder the task, the higher the price.

OpenAI reports that "when we measured the performance of AI models with SWE-Lancer, current models were still unable to solve the majority of the tasks." The paper OpenAI published shows that, out of the $1 million worth of tasks, GPT-4o, o1, and Claude 3.5 Sonnet completed tasks worth roughly $300,000 (about ¥45 million) to $400,000 (about ¥60 million).

OpenAI states: "By mapping model performance to monetary value, we hope SWE-Lancer enables greater research into the economic impact of AI model development."
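
The idea of mapping performance to money can be illustrated with a minimal Python sketch. It assumes, as a simplification, that a model earns a task's full listed payout only when its solution is judged correct, with no partial credit. The `Task` type, the function names, and the toy data below are hypothetical and are not taken from the SWE-Lancer repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """A hypothetical benchmark task with its freelance payout in US dollars."""
    task_id: str
    payout_usd: int
    solved: bool  # True if the model's submission was judged correct


def total_earnings(tasks: list[Task]) -> int:
    """Sum the payouts of solved tasks (assumed all-or-nothing, no partial credit)."""
    return sum(t.payout_usd for t in tasks if t.solved)


def earnings_rate(tasks: list[Task]) -> float:
    """Fraction of the total available payout that the model earned."""
    available = sum(t.payout_usd for t in tasks)
    return total_earnings(tasks) / available if available else 0.0


def pass_rate(tasks: list[Task]) -> float:
    """Fraction of tasks solved, counting every task equally regardless of price."""
    return sum(t.solved for t in tasks) / len(tasks) if tasks else 0.0


if __name__ == "__main__":
    # Toy data for illustration only; these are not real SWE-Lancer tasks or results.
    tasks = [
        Task("bugfix-a", 50, True),
        Task("bugfix-b", 250, True),
        Task("feature-c", 1_000, False),
        Task("feature-d", 32_000, False),
    ]
    print(total_earnings(tasks))          # 300
    print(f"{pass_rate(tasks):.2f}")      # 0.50
    print(f"{earnings_rate(tasks):.4f}")  # 0.0090 (300 / 33300)
```

With the toy data, the model solves half of the tasks by count but earns under 1% of the available payout, because a single $32,000 task dominates the total. This is why a dollar-weighted score can tell a different story from a plain pass rate.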

OpenAI has also open-sourced SWE-Lancer to support future research. The source code is available on GitHub.

GitHub - openai/SWELancer-Benchmark: This repo contains the dataset and code for the paper "SWE-Lancer: Can Frontier LLMs Earn $1 Million from Real-World Freelance Software Engineering?"

https://github.com/openai/SWELancer-Benchmark