Salesforce AI Research, the AI research division of cloud computing provider Salesforce, has released "MINT-1T," an open-source multimodal dataset containing one trillion text tokens.

GitHub - mlfoundations/MINT-1T: MINT-1T: A one trillion token multimodal interleaved dataset.

https://github.com/mlfoundations/MINT-1T

MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens

https://blog.salesforceairesearch.com/mint-1t/



Developing AI requires datasets containing enormous amounts of text and images, so the open-source release of a high-quality dataset is a major benefit to the field.

Salesforce AI Research has now released a multimodal dataset called "MINT-1T" as open source. MINT-1T contains one trillion text tokens and 3.4 billion images, and it reportedly also includes data that earlier datasets did not draw on, such as PDFs and papers from the preprint server ArXiv.

As the figure below shows, previous open-source datasets such as OBELICS and MMC4 top out at 115 billion tokens, so MINT-1T represents a substantial increase in token count.
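To put the size gap in concrete terms, here is a minimal sketch that computes the ratio from the two rounded token counts quoted above. The constant and function names are illustrative only and are not part of MINT-1T or any Salesforce tooling.

```python
# Token counts as quoted in this article (rounded figures, not exact counts).
MINT_1T_TOKENS = 1_000_000_000_000      # one trillion text tokens
PREVIOUS_MAX_TOKENS = 115_000_000_000   # largest earlier open dataset: 115 billion


def scale_factor(new_tokens: int, old_tokens: int) -> float:
    """Return how many times larger new_tokens is than old_tokens."""
    return new_tokens / old_tokens


ratio = scale_factor(MINT_1T_TOKENS, PREVIOUS_MAX_TOKENS)
print(f"MINT-1T is about {ratio:.1f}x the largest previous open dataset")
```

With these rounded inputs the ratio comes out a little under 9, which is in line with the roughly 10x framing in the Salesforce blog title.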



Below are sample documents from MINT-1T. Text appears alongside the images, and the samples include a variety of graphs, heat maps, and similar figures. Salesforce AI Research said, "The key principles behind curating MINT-1T are scale and diversity," and "To improve the diversity of MINT-1T, we go beyond HTML documents and also include web PDFs and ArXiv papers. We found that these additional sources improve domain coverage, particularly for scientific documents."



The graph below compares the performance of XGen-MM, an AI model developed by Salesforce AI Research, when trained on MINT-1T (left) and when trained on OBELICS (right). Overall performance is higher when the model is trained on MINT-1T.