OpenAI releases the AI benchmark "SWE-Lancer," measuring whether models can handle tasks on the level of $1 million in freelance engineering commissions

On February 18, 2025, OpenAI released "SWE-Lancer," an open-source benchmark for evaluating the coding performance of AI models.
[2502.12115] SWE-Lancer: Can Frontier LLMs Earn $1 Million from Real-World Freelance Software Engineering?
https://arxiv.org/abs/2502.12115
https://openai.com/index/swe-lancer/

Today we're launching SWE-Lancer - a new, more realistic benchmark to evaluate the coding performance of AI models. SWE-Lancer includes over 1,400 freelance software engineering tasks from Upwork, valued at $1 million USD total in real-world payouts. https://t.co/c3pFcL41uK (OpenAI (@OpenAI), February 18, 2025)
SWE-Lancer is a benchmark that measures whether AI can carry out tasks that freelance software engineers took on for a total of about $1 million (about 150 million yen). The tasks range from bug fixes worth $50 (about 7,500 yen) to feature implementations worth $32,000 (about 4.8 million yen). The benchmark tests both independent engineering tasks and management tasks, in which the model chooses among technical implementation proposals.
SWE-Lancer tasks span the full engineering stack, from UI/UX to systems design, and include a range of task types, from $50 bug fixes to $32,000 feature implementations. SWE-Lancer includes both independent engineering tasks and management tasks, where models choose between… pic.twitter.com/3Dg8bjHOSk (OpenAI (@OpenAI), February 18, 2025)
The task prices used in SWE-Lancer reflect real-world market value: the harder the task, the higher the price.
SWE-Lancer task prices reflect real-world market value. Harder tasks demand higher payments. pic.twitter.com/0FGWm88RE8 (OpenAI (@OpenAI), February 18, 2025)
OpenAI reports that "when we measured the performance of AI models using SWE-Lancer, current AI models were still unable to solve the majority of tasks." The paper published by OpenAI shows that, out of $1 million worth of tasks, the tasks completed by GPT-4o, o1, and Claude 3.5 Sonnet were worth roughly $300,000 (about 45 million yen) to $400,000 (about 60 million yen).
Current frontier models are unable to solve the majority of tasks. pic.twitter.com/GP3C3UR3cB (OpenAI (@OpenAI), February 18, 2025)
OpenAI states, "By mapping model performance to monetary value, we hope SWE-Lancer enables more research into the economic impact of AI model development."
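To make the idea of mapping performance to monetary value concrete, here is a minimal sketch of payout-weighted scoring. It is an illustration only, not code from the SWE-Lancer repository: the `TaskResult` type, the function names, and the three sample tasks are hypothetical, and only the $50 and $32,000 price points come from the article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskResult:
    """One benchmark task outcome (hypothetical structure for illustration)."""
    task_id: str        # hypothetical identifier
    payout_usd: float   # what the task paid out in the real world
    solved: bool        # whether the model's solution was accepted


def earned_usd(results):
    """Total payout of the tasks the model solved."""
    return sum(r.payout_usd for r in results if r.solved)


def earn_rate(results):
    """Share of the total available payout that the model earned."""
    total = sum(r.payout_usd for r in results)
    return earned_usd(results) / total if total else 0.0


def pass_rate(results):
    """Plain fraction of tasks solved, ignoring price."""
    return sum(r.solved for r in results) / len(results) if results else 0.0


# Invented sample data: two cheap tasks solved, one expensive task failed.
results = [
    TaskResult("bug-fix", 50.0, True),
    TaskResult("mid-task", 1_000.0, True),
    TaskResult("feature", 32_000.0, False),
]

print(f"earned: ${earned_usd(results):,.0f}")
print(f"pass rate: {pass_rate(results):.1%}, earn rate: {earn_rate(results):.1%}")
```

In this invented example the model solves two of three tasks yet earns only a small share of the available money, because the unsolved task carries most of the payout. That is the sense in which a dollar-weighted score rewards solving harder, more expensive tasks.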
OpenAI has also open-sourced SWE-Lancer to support future research. The SWE-Lancer source code is available on GitHub.
GitHub - openai/SWELancer-Benchmark: This repo contains the dataset and code for the paper "SWE-Lancer: Can Frontier LLMs Earn $1 Million from Real-World Freelance Software Engineering?"
https://github.com/openai/SWELancer-Benchmark
