Google develops MusicLM, an AI that composes music automatically from text
A Google research team has developed MusicLM, an AI that composes music to match the text you enter, much as Stable Diffusion and DALL·E generate images automatically from text.
[2301.11325] MusicLM: Generating Music From Text
https://doi.org/10.48550/arXiv.2301.11325
MusicLM
https://google-research.github.io/seanet/musiclm/examples/
https://techcrunch.com/2023/01/27/google-created-an-ai-that-can-generate-music-from-text-descriptions-but-wont-release-it/
MusicLM was trained on a dataset totaling 280,000 hours of music, and it composes music exactly as instructed in text, for example "a memorable saxophone solo and a singing voice" or "Berlin techno from the '90s."
Ok now (restrospectively, on high-level) it's kinda simple.
given an training item:
- extract MuLan tokens (M), extract w2v-BERT (S), SS tokens (A)
- train model for M → S.
- train model for [M;S] → A
both done by decoder-only transformers. pic.twitter.com/d1BEsu6ZCx— Keunwoo Choi (@keunwoochoi) January 27, 2023
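The two training stages in the tweet (M → S, then [M;S] → A) can be sketched as plain token pipelines. This is a hypothetical illustration, not Google's code: the helper names `make_training_pairs`, `decode`, and `generate` are invented here, token ids are plain integers, and each trained decoder-only transformer is replaced by a simple "next-token" function. Real models sample from a predicted distribution rather than decoding greedily as this sketch does.

```python
from typing import Callable, List, Sequence, Tuple

Tokens = List[int]
# Hypothetical stand-in for a trained decoder-only transformer:
# given the sequence so far, return the next token id.
NextToken = Callable[[Sequence[int]], int]


def make_training_pairs(m: Sequence[int], s: Sequence[int], a: Sequence[int]
                        ) -> Tuple[Tuple[Tokens, Tokens], Tuple[Tokens, Tokens]]:
    """Build (condition, target) pairs for the two stages described in the tweet."""
    stage1 = (list(m), list(s))            # MuLan tokens M -> w2v-BERT tokens S
    stage2 = (list(m) + list(s), list(a))  # [M; S] -> SoundStream tokens A
    return stage1, stage2


def decode(next_token: NextToken, condition: Sequence[int], length: int) -> Tokens:
    """Greedy autoregressive decoding: extend the condition one token at a time."""
    seq = list(condition)
    for _ in range(length):
        seq.append(next_token(seq))
    return seq[len(condition):]  # keep only the newly generated tokens


def generate(m: Sequence[int], semantic_lm: NextToken, acoustic_lm: NextToken,
             n_semantic: int, n_acoustic: int) -> Tokens:
    """Inference: M -> S with the first model, then [M; S] -> A with the second."""
    s = decode(semantic_lm, m, n_semantic)
    return decode(acoustic_lm, list(m) + s, n_acoustic)
```

With the toy stand-ins `lambda q: (q[-1] + 1) % 1024` (semantic stage) and `lambda q: (q[-1] * 2) % 1024` (acoustic stage), `generate([5, 7], ..., 3, 3)` first produces S = [8, 9, 10] and then A = [20, 40, 80]. In the paper, the acoustic tokens are then turned back into a waveform by the SoundStream decoder.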
The paper Google published includes examples of songs actually generated by MusicLM. For example, here is the output for the prompt "The main soundtrack of an arcade game. It is fast-paced and upbeat, with a catchy electric guitar riff. The music is repetitive and easy to remember, but with unexpected sounds, like cymbal crashes or drum rolls."
"A rising synth is playing an arpeggio with a lot of reverb. It is backed by pads, sub bass line and soft drums. This song is full of synth sounds creating a soothing and adventurous atmosphere. It may be playing at a festival during two songs for a buildup."
"Slow tempo, bass-and-drums-led reggae song. Sustained electric guitar. High-pitched bongos with ringing tones. Vocals are relaxed with a laid-back feel, very expressive"
Entering something as simple as "relaxing jazz" gives a result like this.
By specifying time ranges, you can also join several musical styles into a single piece. For example, the following track was generated from the prompt "jazz song (0:00-0:15) pop song (0:15-0:30) rock song(0:30-0:45) death metal song (0:45-1:00) rap song (1:00-1:15) string quartet with violins (1:15-1:30) epic movie soundtrack with drums (1:30-1:45) scottish folk song with traditional instruments (1:45-2:00)".
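The time-ranged prompt has a regular shape: a description followed by a `(m:ss-m:ss)` range (the paper's examples page calls this "story mode"). Below is a hedged sketch of how such a prompt could be split into segments and looked up by playback time. The regular expression and the helper names `parse_story_prompt` and `prompt_at` are assumptions for illustration; the sketch says nothing about how MusicLM itself consumes the segments internally.

```python
import re
from typing import List, Optional, Tuple

# (start_seconds, end_seconds, description)
Segment = Tuple[int, int, str]

# A description, optional whitespace, then "(m:ss-m:ss)".
_SEGMENT = re.compile(r"\s*(.+?)\s*\((\d+):(\d{2})-(\d+):(\d{2})\)")


def parse_story_prompt(prompt: str) -> List[Segment]:
    """Split a time-ranged prompt into (start, end, text) segments."""
    segments: List[Segment] = []
    for match in _SEGMENT.finditer(prompt):
        text, m0, s0, m1, s1 = match.groups()
        start = int(m0) * 60 + int(s0)
        end = int(m1) * 60 + int(s1)
        segments.append((start, end, text))
    return segments


def prompt_at(segments: List[Segment], t: float) -> Optional[str]:
    """Return the description active at time t (seconds), or None if none covers it."""
    for start, end, text in segments:
        if start <= t < end:
            return text
    return None
```

Applied to the prompt above, this yields eight segments; at 50 seconds the active description is "death metal song". Note that the missing space in "rock song(0:30-0:45)" is tolerated by the optional whitespace in the pattern.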
The examples also include music based on a painting and its description rather than a short prompt. The following piece was generated from Salvador Dalí's The Persistence of Memory, using the Encyclopædia Britannica's description of the work as input to MusicLM.
Songs can also include vocals and choruses. However, they only sound like vocals or choruses: the lyrics barely resemble English and are essentially meaningless words.
Yesterday, Google published a paper on a new AI model called MusicLM.
The model generates 24 kHz music from rich captions like "A fusion of reggaeton and electronic dance music, with a spacey, otherworldly sound. Induces the experience of being lost in space." pic.twitter.com/XPv0PEQbUh— Product Hunt (@ProductHunt) January 27, 2023
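To put the 24 kHz figure in perspective, the sketch below computes how many samples and codec tokens a clip needs. Only the 24 kHz sample rate comes from this article; the hop size of 480 samples, the 12 residual quantizers, and the 1,024-entry codebooks are the SoundStream settings as we recall them from the MusicLM paper, so treat them as assumptions and check the paper before relying on them. The function name `audio_budget` is hypothetical.

```python
import math

SAMPLE_RATE_HZ = 24_000  # from the article: MusicLM outputs 24 kHz audio
HOP_SIZE = 480           # assumption: SoundStream stride (samples per frame)
NUM_QUANTIZERS = 12      # assumption: residual vector quantizer depth
CODEBOOK_SIZE = 1024     # assumption: entries per quantizer codebook


def audio_budget(seconds: float) -> dict:
    """Samples, codec frames, codec tokens, and bitrate for a clip of the given length."""
    samples = int(SAMPLE_RATE_HZ * seconds)
    frames = samples // HOP_SIZE                     # one embedding per HOP_SIZE samples
    tokens = frames * NUM_QUANTIZERS                 # one token per quantizer per frame
    bits_per_token = int(math.log2(CODEBOOK_SIZE))   # 1024 entries -> 10 bits
    frame_rate = SAMPLE_RATE_HZ // HOP_SIZE          # frames per second
    bitrate_bps = frame_rate * NUM_QUANTIZERS * bits_per_token
    return {"samples": samples, "frames": frames, "tokens": tokens, "bitrate_bps": bitrate_bps}
```

Under these assumptions, one second of audio is 24,000 samples, 50 frames, and 600 tokens at 6,000 bits per second (6 kbps); a 30-second clip needs 18,000 tokens.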
The Google research team has not released MusicLM to the public, citing the many ethical issues that systems like it raise. According to the team, MusicLM tends to reproduce songs from its training dataset directly in the music it generates: in one experiment, about 1% of the music the system generated turned out to be copied directly from the dataset. The team wrote: "We acknowledge the risk of potential misappropriation of creative content associated with this use case. We strongly emphasize the need for more future work in tackling these risks."
MusicLM itself has not been released, but MusicCaps, the dataset used to evaluate MusicLM, is publicly available here:
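One simple way to express a figure like "about 1% copied" is the share of generated token sequences that appear verbatim in the training data. The sketch below computes that share using exact matching only. The function name `exact_match_rate` is hypothetical, and this is a simplified illustration, not the measurement procedure used in the paper.

```python
from typing import Iterable, Sequence, Set, Tuple


def exact_match_rate(generated: Iterable[Sequence[int]],
                     training: Iterable[Sequence[int]]) -> float:
    """Fraction of generated token sequences that occur verbatim in the training set."""
    # Tuples are hashable, so a set gives O(1) average-time membership checks.
    train_set: Set[Tuple[int, ...]] = {tuple(seq) for seq in training}
    gen = [tuple(seq) for seq in generated]
    if not gen:
        return 0.0
    hits = sum(1 for seq in gen if seq in train_set)
    return hits / len(gen)
```

For example, if one of four generated sequences matches a training sequence, the rate is 0.25. Exact matching misses near-copies, which is one reason a real memorization study needs a looser similarity measure.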
MusicCaps | Kaggle
https://www.kaggle.com/datasets/googleai/musiccaps
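A minimal sketch for reading the MusicCaps captions in Python. It assumes the Kaggle download is a CSV file with a header row containing `ytid`, `start_s`, `end_s`, and `caption` columns (other columns are ignored); these column names, the helper `load_musiccaps`, and the `CaptionedClip` type are assumptions, so check them against the actual download.

```python
import csv
from typing import Iterable, List, NamedTuple


class CaptionedClip(NamedTuple):
    ytid: str        # YouTube video id (assumed column name)
    start_s: float   # clip start in seconds (assumed column name)
    end_s: float     # clip end in seconds (assumed column name)
    caption: str     # free-text music description (assumed column name)


def load_musiccaps(lines: Iterable[str]) -> List[CaptionedClip]:
    """Parse CSV lines (an open file or a list of strings) into CaptionedClip records."""
    return [
        CaptionedClip(row["ytid"], float(row["start_s"]), float(row["end_s"]), row["caption"])
        for row in csv.DictReader(lines)
    ]
```

Usage, with an assumed file name: `with open("musiccaps-public.csv", newline="", encoding="utf-8") as f: clips = load_musiccaps(f)`. To our understanding, the dataset provides YouTube IDs and time spans rather than audio files, so the audio has to be obtained separately.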