Overseas engineer gives a serious explanation of Japanese "mojibake," showing a knowledge of Japan that would put Japanese people to shame

by Whooym
"Mojibake" (文字化け), where characters are not displayed properly and become unreadable, is reportedly a word that overseas engineers also understand as "Mojibake." Paul O'Leary McCann, who develops natural language processing (NLP) software in Tokyo, has explained the different kinds of mojibake.
A Field Guide to Japanese Mojibake
https://www.dampfkraft.com/mojibake-field-guide.html
◆UTF-8
UTF-8 is the most common character encoding on the internet and has become widespread in Japan in recent years. Opening sample text written in UTF-8 as Shift JIS gives the result described below.
First, the following is used as the sample text.
吾輩は猫である。名前はまだない。
エンコードの設定を間違えると文字が化けてしまう。
東京タワーの高さは333mです。
Encoding the text above in UTF-8 and then opening it as Shift JIS produces the following mojibake.
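The same effect can be reproduced in a few lines of Python. This is a minimal sketch, assuming Python's built-in `utf-8` and `shift_jis` codecs stand in for the software used in the article. The helper name `garble` is ours for illustration, and bytes that cannot be decoded are shown as U+FFFD rather than whatever a particular editor would display.

```python
SAMPLE = (
    "吾輩は猫である。名前はまだない。\n"
    "エンコードの設定を間違えると文字が化けてしまう。\n"
    "東京タワーの高さは333mです。"
)


def garble(text: str, written_as: str, read_as: str) -> str:
    """Encode text with one codec, then decode the same bytes with another."""
    raw = text.encode(written_as)
    # errors="replace" turns undecodable bytes into U+FFFD instead of raising.
    return raw.decode(read_as, errors="replace")


if __name__ == "__main__":
    print(garble(SAMPLE, "utf-8", "shift_jis"))
```

Plain ASCII such as `333m` passes through unchanged, because both encodings represent it with the same single bytes.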

McCann commented on this mojibake: "You can see that certain characters appear frequently, and Daily Portal Z has an interesting article that explores what they mean. According to it, the character 繧 that shows up in the mojibake refers to 繧繝 (ungen), a colored stripe pattern used in textiles."
In the image below, the border of the tatami mat that Ashikaga Yoshimitsu is sitting on is an "ungen border" (繧繝縁). "This originally signified high rank, but today you can see it on the hina dolls used for Hinamatsuri," McCann explains.
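Why the same few kanji keep appearing can be checked directly. In UTF-8, hiragana and katakana all begin with the bytes E3 81, E3 82, or E3 83, and Shift JIS reads each of those two-byte pairs as one kanji. The sketch below is only an illustration under that observation: the function names are hypothetical and the codecs are Python's built-in ones.

```python
from collections import Counter


def utf8_lead_pairs(text: str) -> Counter:
    """Count the first two UTF-8 bytes (as hex) of each character."""
    return Counter(ch.encode("utf-8")[:2].hex() for ch in text)


def as_shift_jis(hex_pair: str) -> str:
    """Decode a two-byte hex string as a single Shift JIS character."""
    return bytes.fromhex(hex_pair).decode("shift_jis")


if __name__ == "__main__":
    # Each distinct byte pair maps to one recurring kanji in the garbled text.
    for pair, count in utf8_lead_pairs("エンコード").most_common():
        print(pair, count, as_shift_jis(pair))
```

For the word エンコード, the only pairs are `e382` and `e383`, which Shift JIS reads as 繧 and 繝, the two characters that make up 繧繝.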

McCann also used this knowledge to work out that a garbled signboard appearing in the anime "Urasekai Picnic" (裏世界ピクニック), broadcast in 2021, was UTF-8 text displayed as Shift JIS.
After seeing a lot of mojibake over the years I realized that different encoding pairs have different visual textures because of the particular kinds of garbage that come out. This scene from Urasekai Picnic has UTF8 rendered as SJIS, a very common combination. pic.twitter.com/798nEE8G1I— Paul O'Leary McCann (@polm23) October 31, 2021
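Identifying the encoding pair can also be attempted mechanically by reversing the mistake: re-encode the garbled text with the codec it was wrongly read as, then decode it with a guessed original codec. The following is a rough sketch rather than a robust detector. The codec list and the function name `candidate_repairs` are assumptions for illustration, and it only works when no bytes were lost during garbling.

```python
from itertools import permutations

CODECS = ["utf-8", "shift_jis", "euc_jp"]


def candidate_repairs(garbled: str) -> dict:
    """Map each (written_as, read_as) guess to the text it would recover."""
    results = {}
    for written_as, read_as in permutations(CODECS, 2):
        try:
            # Undo the wrong decode, then decode with the guessed original codec.
            raw = garbled.encode(read_as)
            results[(written_as, read_as)] = raw.decode(written_as)
        except (UnicodeEncodeError, UnicodeDecodeError):
            continue  # this pair cannot explain the garbled text losslessly
    return results


if __name__ == "__main__":
    garbled = "エンコー".encode("utf-8").decode("shift_jis")
    for pair, text in candidate_repairs(garbled).items():
        print(pair, text)
```

More than one guess may decode without error, so a human (or a language model) still has to judge which candidate is readable Japanese.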
◆Shift JIS
Shift JIS used to be the most commonly used character encoding on Japanese-language sites. Most sites today have switched to UTF-8, but Shift JIS is reportedly still used for email on old feature phones, the so-called "garakei."
Opening the sample text from above, written in Shift JIS, as UTF-8 gives the following.
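This direction can be sketched in Python as well. Most Shift JIS byte sequences are not valid UTF-8, so a decoder that substitutes errors produces long runs of the replacement character U+FFFD. The function name below is hypothetical, and real applications may show something other than U+FFFD for invalid bytes.

```python
def open_sjis_as_utf8(text: str) -> str:
    """Encode text as Shift JIS, then read those bytes as if they were UTF-8."""
    raw = text.encode("shift_jis")
    # Invalid UTF-8 sequences become U+FFFD instead of raising an error.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    garbled = open_sjis_as_utf8("東京タワーの高さは333mです。")
    print(garbled)
    print(garbled.count("\ufffd"), "replacement characters")
```

The ASCII portion `333m` stays readable, which is often the first clue that the underlying data is intact and only the decoding is wrong.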

McCann commented: "What you often see in Shift JIS mojibake is unusual variants of common characters. The most noticeable is 髙, a variant of 高 that is called 'hashigo-daka' (ladder taka) because its lines are connected, and it is often used in surnames. 﨑 ('tatsusaki'), in which the 大 in 崎 is replaced by 立, is a similar case, but it seems to be less common than 髙."
◆EUC-JP
EUC-JP, developed for UNIX, shares its underlying standard with ISO-2022-JP (described below) but uses a simpler encoding scheme. It was used in much the same way as Shift JIS but never became as widespread.
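The shared standard is the JIS X 0208 character set, and for those characters the relationship between the two encodings is simple enough to show in code: an EUC-JP byte is the 7-bit JIS code with the high bit set. The sketch below assumes the text contains only JIS X 0208 characters (no ASCII, half-width kana, or JIS X 0212), and the function name is ours.

```python
def euc_to_jis_bytes(text: str) -> bytes:
    """Clear the high bit of each EUC-JP byte to get 7-bit JIS X 0208 codes."""
    return bytes(b & 0x7F for b in text.encode("euc_jp"))


if __name__ == "__main__":
    word = "エンコー"
    print(euc_to_jis_bytes(word))       # bare 7-bit codes
    print(word.encode("iso2022_jp"))    # same codes wrapped in ESC sequences
```

ISO-2022-JP carries the same 7-bit codes but has to switch character sets with escape sequences, which is the extra complexity EUC-JP avoids.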
Displaying the sample text, written in EUC-JP, as UTF-8 looks like this.

Opening it as Shift JIS does not display it correctly either.

McCann comments: "What is interesting about the mojibake you get from opening EUC-JP as Shift JIS is that half-width katakana show up a lot. This is because Shift JIS represents half-width katakana with a single byte."
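The half-width katakana effect is easy to measure. EUC-JP encodes each JIS X 0208 character as two bytes in the range A1 to FE, and Shift JIS treats any byte from A1 to DF as a single half-width character. The following is a small sketch with hypothetical helper names, using Python's built-in codecs.

```python
def is_halfwidth_kana(ch: str) -> bool:
    """True for U+FF61 to U+FF9F: half-width katakana and its punctuation."""
    return "\uff61" <= ch <= "\uff9f"


def euc_as_shift_jis(text: str) -> str:
    """Encode text as EUC-JP, then read those bytes as Shift JIS."""
    return text.encode("euc_jp").decode("shift_jis", errors="replace")


if __name__ == "__main__":
    garbled = euc_as_shift_jis("エンコードの設定を間違えると文字が化けてしまう。")
    share = sum(map(is_halfwidth_kana, garbled)) / len(garbled)
    print(garbled)
    print(f"{share:.0%} of the characters are half-width kana")
```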
◆ISO-2022-JP
ISO-2022-JP is rarely used outside of email, but you reportedly still run into it from time to time. Because it has many variants, however, text that is readable on one system may be unreadable on another.
Opening sample text written in ISO-2022-JP as UTF-8, Shift JIS, or EUC-JP gives the following result, regardless of which of those encodings is used.

According to McCann, ISO-2022-JP text produces the same mojibake when opened as UTF-8, Shift JIS, or EUC-JP because none of those encodings interpret any character as an escape.
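This can be verified with a short experiment. ISO-2022-JP output consists entirely of 7-bit bytes, with escape sequences marking where Japanese text starts and ends, and the other three encodings read 7-bit bytes as plain ASCII. The sketch below assumes Python's built-in codecs. The function name is hypothetical.

```python
def open_iso2022jp_as(text: str, read_as: str) -> str:
    """Encode text as ISO-2022-JP, then read those bytes with another codec."""
    raw = text.encode("iso2022_jp")  # 7-bit bytes plus ESC sequences
    return raw.decode(read_as, errors="replace")


if __name__ == "__main__":
    word = "エンコー"
    for codec in ("utf-8", "shift_jis", "euc_jp"):
        # repr() makes the otherwise invisible ESC bytes visible as \x1b.
        print(codec, repr(open_iso2022jp_as(word, codec)))
```

All three lines print the same ASCII-looking string, beginning with the escape sequence that switches to Japanese and ending with the one that switches back to ASCII.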
