OpenAI develops "Rule-Based Rewards (RBR)," a method for improving AI safety without relying on humans
OpenAI, the developer of ChatGPT and GPT-4, has developed "Rule-Based Rewards (RBR)," a new approach for improving the safety and effectiveness of language models. By using AI itself, RBR is said to make AI behave safely without requiring human data collection.
Improving Model Safety Behavior with Rule-Based Rewards | OpenAI
Rule Based Rewards for Language Model Safety
(PDF file) https://cdn.openai.com/rule-based-rewards-for-language-model-safety.pdf
Until now, OpenAI has used a method called "RLHF," which fine-tunes language models from human feedback using reinforcement learning. OpenAI now presents RBR as a more efficient and flexible alternative for making sure that language models follow instructions and comply with safety guidelines.
RBR is said to resolve two problems that are common with human feedback: it is costly and time-consuming, and it is prone to bias. RBR first defines propositions such as "being judgmental," "containing disallowed content," "referring to safety policies," and "disclaimers," then forms rules from them so that the AI can produce safe and appropriate responses in a variety of scenarios.
When dealing with harmful or sensitive topics, OpenAI classifies the desired model behavior into three categories: "hard refusal," "soft refusal," and "comply." Incoming requests are sorted into these categories according to the safety policy.
For example, a "hard refusal" applies to requests such as "how to make a bomb." A hard refusal consists of a brief apology and a statement that the model cannot answer the question. A "soft refusal" applies to cases such as questions related to self-harm: the response acknowledges the user's emotional state but still declines the request. For "comply," the model should go along with the user's request and is expected to respond appropriately.
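As a rough illustration of how propositions and the three response categories could fit together, the Python sketch below scores a response against the rules for its target category. Everything in it is an assumption made for illustration: the names `ResponseType`, `RULES`, and `score_response`, the extra propositions `apology`, `refuses`, and `acknowledges_emotion`, and the simple fraction-based score are hypothetical and are not taken from OpenAI's paper. The proposition values are also supplied by hand here, whereas the article describes AI being used for this kind of judgment.

```python
from enum import Enum


class ResponseType(Enum):
    HARD_REFUSAL = "hard_refusal"
    SOFT_REFUSAL = "soft_refusal"
    COMPLY = "comply"


# Hypothetical rules: for each category, the propositions that must be true
# and the propositions that must be false for a response to be acceptable.
RULES = {
    ResponseType.HARD_REFUSAL: {
        "required": {"apology", "refuses"},
        "forbidden": {"judgmental", "disallowed_content", "mentions_safety_policy"},
    },
    ResponseType.SOFT_REFUSAL: {
        "required": {"refuses", "acknowledges_emotion"},
        "forbidden": {"judgmental", "disallowed_content"},
    },
    ResponseType.COMPLY: {
        "required": set(),
        "forbidden": {"refuses", "disallowed_content"},
    },
}


def score_response(target: ResponseType, propositions: dict[str, bool]) -> float:
    """Return the fraction of the target category's rules that the response satisfies."""
    rule = RULES[target]
    # Propositions missing from the dict are treated as false.
    checks = [propositions.get(name, False) for name in sorted(rule["required"])]
    checks += [not propositions.get(name, False) for name in sorted(rule["forbidden"])]
    return sum(checks) / len(checks)


# A brief apology plus a refusal satisfies every hard-refusal rule.
polite = {"apology": True, "refuses": True}
print(score_response(ResponseType.HARD_REFUSAL, polite))  # 1.0

# The same refusal delivered in a judgmental tone breaks one of five rules.
lecturing = {"apology": True, "refuses": True, "judgmental": True}
print(score_response(ResponseType.HARD_REFUSAL, lecturing))  # 0.8
```

Keeping the rules in a plain table means a policy change only requires editing the table, which is one way to read the flexibility claimed for RBR.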
In OpenAI's experiments, models trained with RBR were reportedly shown to be safer than models trained with human feedback, and cases of inappropriate answers that do not follow the safety policy also decreased. RBR is also reported to have greatly reduced the need for large amounts of human data, making the training process faster and cheaper.
On the other hand, according to OpenAI, RBR works well for tasks with clear rules but is not well suited to more subjective tasks such as essay writing. OpenAI therefore proposes combining RBR with human feedback, so that the model adheres to specific guidelines while human input covers the more nuanced aspects.
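One simple way to picture such a combination is an additive reward, where a score from a reward model trained on human feedback is added to a weighted rule-based score. The sketch below is only an illustration under assumptions: the functions `combined_reward` and `pick_best`, the `rbr_weight` parameter, and all of the numbers are hypothetical, and the article does not say how OpenAI actually weights the two signals.

```python
def combined_reward(human_feedback_score: float, rule_score: float,
                    rbr_weight: float = 1.0) -> float:
    """Add a weighted rule-based score to a human-feedback score (hypothetical scheme)."""
    return human_feedback_score + rbr_weight * rule_score


def pick_best(candidates: dict[str, tuple[float, float]], rbr_weight: float = 1.0) -> str:
    """Return the name of the candidate response with the highest combined reward."""
    return max(
        candidates,
        key=lambda name: combined_reward(*candidates[name], rbr_weight=rbr_weight),
    )


# Made-up scores for two responses to a request that the policy says to refuse:
# (human-feedback score, rule-based score).
candidates = {
    "detailed_answer": (0.9, 0.0),  # looks helpful, but violates the rules
    "polite_refusal": (0.4, 1.0),   # less "helpful", but follows the rules
}

print(pick_best(candidates))                  # polite_refusal
print(pick_best(candidates, rbr_weight=0.0))  # detailed_answer
```

With the rule term switched off, the response that merely looks helpful wins, which is the gap the rule-based signal is meant to close in this toy setup.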
OpenAI also stated, "Going forward, we plan to conduct research to understand the various RBR components more comprehensively, as well as human evaluations to verify the effectiveness of RBR in a range of applications, including domains beyond safety."
According to OpenAI, RBR has so far been applied to models including GPT-4 and GPT-4o mini, and the company plans to implement it in all of its AI models going forward.