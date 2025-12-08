Visual Bank Inc.

【Translation】

Japanese Crime-Themed Monologue Speech Corpus for ASR and Language Modeling

Long-form natural Japanese speech for ASR training, Conversational AI evaluation, and Educational AI research

Visual Bank Inc. (Minato-ku, Tokyo; CEO: Saneyuki Nagai; hereinafter “Visual Bank”) has released the Japanese Single-Speaker Crime-Themed Monologue Speech Corpus through Qlean Dataset, its AI training data solution operated by its subsidiary Amana Images Inc.

This dataset contains single-speaker narrative audio on topics related to incidents and crimes, and is designed for applications in Automatic Speech Recognition (ASR), Natural Language Processing (NLP), and the development of generative AI foundation models.



The dataset consists of continuous explanatory monologues covering historical cases, legal and institutional topics, and social issues related to crime.

It is structured as long-form monologue speech that includes natural topic shifts, context-dependent narration, structured presentation of opinions, and episodic explanations.

All recordings consist of natural, unscripted speech.

The total recording duration is approximately 350 hours, with individual recordings ranging from 5 to 40 minutes in length.

The dataset includes male and female speakers ranging in age from their 20s to their 50s and is delivered as 44.1 kHz MP3 files suitable for training and evaluation.
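As a rough illustration of what these figures imply for long-form processing, the sketch below derives the range of recording counts consistent with roughly 350 hours at 5 to 40 minutes per file, the number of fixed-length windows a single recording would need, and its decoded sample count. The 30-second window and all function names are hypothetical assumptions, not properties of the dataset. The channel count of the MP3 files is not stated, so sample counts are given per channel, and because the total duration is approximate, the derived counts are approximate too.

```python
import math

# Figures stated in the release.
TOTAL_HOURS = 350                 # approximate total duration
MIN_MINUTES, MAX_MINUTES = 5, 40  # per-recording length range
SAMPLE_RATE_HZ = 44_100           # distributed sample rate

# Hypothetical: a fixed ASR input window (30 s is a common choice, not a dataset property).
WINDOW_SECONDS = 30


def recording_count_bounds(total_hours, min_minutes, max_minutes):
    """Range of file counts consistent with the total duration and per-file lengths."""
    total_minutes = total_hours * 60
    # Fewest files if every file is as long as possible; most if every file is as short as possible.
    return math.ceil(total_minutes / max_minutes), math.floor(total_minutes / min_minutes)


def windows_per_recording(minutes, window_seconds=WINDOW_SECONDS):
    """Number of fixed-length windows needed to cover one recording (the last may be partial)."""
    return math.ceil(minutes * 60 / window_seconds)


def samples_per_channel(minutes, sample_rate_hz=SAMPLE_RATE_HZ):
    """Decoded PCM samples per channel for a recording of the given length."""
    return int(minutes * 60 * sample_rate_hz)


if __name__ == "__main__":
    print(recording_count_bounds(TOTAL_HOURS, MIN_MINUTES, MAX_MINUTES))  # (525, 4200)
    print(windows_per_recording(MIN_MINUTES), windows_per_recording(MAX_MINUTES))  # 10 80
    print(samples_per_channel(MAX_MINUTES))  # 105840000
```

Under these assumptions, the longest recordings span about 80 windows each, which is why context carried across window boundaries matters when evaluating ASR on long-form monologues.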

Because the dataset contains natural speech with explanatory and domain-specific content related to crime, it is suited for evaluating AI models that require contextual understanding, long-form audio processing, and semantic comprehension.

It can be used to improve ASR accuracy in professional settings, to expand the domain knowledge of generative AI systems, and to evaluate dialogue models in academic and education-oriented research.

Dataset Overview - “Japanese Single-Speaker Crime-Themed Monologue Speech Corpus”

[Table 2: https://prtimes.jp/data/corp/108024/table/113_2_69a22f9451f4256d33a26fe9d5baaf92.jpg?v=202512080358 ]

Use Case Examples - “Japanese Single-Speaker Crime-Themed Monologue Speech Corpus”

【Academic Research】

- ASR research on long-form Japanese monologues: The dataset enables evaluation of Japanese ASR systems on context-dependent narration that includes natural topic transitions in the crime domain.
- Evaluation of NLP models for contextual understanding and summarization: Its long-form monologue structure supports tasks such as semantic unit extraction, discourse analysis, and benchmarking of summarization models.

【Industrial Applications】

- Improving the accuracy of AI systems that handle domain-specific speech input: Because the dataset includes specialized vocabulary related to crime and institutional explanations, it can be used to improve speech processing for call centers, knowledge-base search AI, and domain-specific conversational AI.
- Strengthening multimodal generative AI pipelines (speech → text → semantic understanding): Natural monologue audio can help improve performance on tasks such as speech-based summarization and explanatory text generation.

【Education / Public-Sector Use】

- AI research for judicial and social education applications: Explanatory audio on crime topics can serve as foundational material for developing AI systems that support educational content, including automated explanation generation and speech understanding.

About Qlean Dataset

Qlean Dataset is a commercial-use-ready AI training data solution provided by Amana Images Inc., a subsidiary of Visual Bank Inc.

It supports diverse data types, including images, video, audio, 3D, and text, enabling both research and commercial AI development in a legally safe environment.

Through collaborations with data partners such as Chiba Lotte Marines Co., Ltd. and Toyo Keizai Inc., Qlean Dataset continuously expands its specialized, industry-relevant lineup known as the “AI Data Recipe.”

By reducing the operational burden of data collection and preparation, Qlean Dataset helps build legally compliant and risk-free AI development environments.

▶ Qlean Dataset: https://qleandataset.visual-bank.co.jp/en

▶ AI Data Recipe: https://qleandataset.visual-bank.co.jp/en/lineup

Key Features of Qlean Dataset

- Full consent obtained from all subjects; compliant with GDPR and CCPA
- Existing datasets deliverable within one business day
- Custom data collection and recording available

▶ Contact: https://qleandataset.visual-bank.co.jp/en/contact

About Visual Bank Inc.

Visual Bank Inc. is a Tokyo-based startup building next-generation data infrastructure to maximize AI development capabilities under the mission “Unlock the potential of all data.”

The company operates THE PEN, an AI-assisted creative tool for manga artists, and wholly owns Amana Images Inc., which provides the Qlean Dataset service.

CEO: Saneyuki Nagai

Address: C-Cube Minami Aoyama Building 6F, 7-1-7 Minami-Aoyama, Minato-ku, Tokyo 107-0062

Corporate Site: https://visual-bank.co.jp/en

Amana Images: https://qleandataset.visual-bank.co.jp/en/company-overview