
Automatic audio transcription of plenary sessions and committee meetings

Brazil - Senate

Use case ID: 071

Author: Federal Senate of Brazil

Date: 15 October 2024 

 

Objective:  

Automatically transcribe the audio of legislative events, such as plenary sessions and committee meetings, using artificial intelligence (AI) models. The audio may also undergo diarization (splitting it into segments by speaker turn) and, subsequently, speaker classification (attributing each segment to a specific senator).

 Actors: 

  • Parliamentary record and editorial analysts
  • Escriba: an information system for managing the daily activities of the Parliamentary Record and Editorial Secretariat, such as audio transcription and text proofreading tasks 

Prerequisites: 

  • Integration with Escriba
  • AI models for automatic speech recognition (ASR)
  • AI models for audio diarization
  • AI models for speaker classification, trained on audio recordings from the 81 senators in office
  • AI robots that collect audio recordings and control the transcription, diarization and speaker classification processes

Scenario: 

  1. AI robot side:
    1. A set of audio recordings is placed in a remote directory following a predefined hierarchy.
    2. Each AI robot monitoring that remote directory collects an audio recording and begins the transcription process.
    3. After transcription, the robot may also perform diarization and speaker classification, depending on the system configuration set by an admin user.
    4. The transcribed text resulting from the AI processing is properly formatted as a JSON file and placed in a different remote directory.
    5. When requested by the user, Escriba reads the JSON files from that remote directory.
  2. User side:
    1. A user accesses Escriba.
    2. The user views the audio trails associated with them.
    3. The user initiates the process of listening to and transcribing the audio recordings.
    4. The user selects the option to use the audio transcripts produced previously by the AI robots.
    5. The user views and, if necessary, adjusts the transcripts returned by the AI robots.
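
The robot-side steps above can be sketched in Python. This is a minimal illustration, not the Senate's implementation: the function name `process_pending`, the `.wav` extension, the JSON layout (`audio`, `segments`) and the three callables standing in for the ASR, diarization and speaker classification models are all assumptions made for the example.

```python
import json
from pathlib import Path
from typing import Callable, Optional


def process_pending(
    inbox: Path,
    outbox: Path,
    transcribe: Callable[[Path], list],
    diarize: Optional[Callable[[Path, list], list]] = None,
    classify: Optional[Callable[[Path, list], list]] = None,
) -> list:
    """Transcribe every audio file in `inbox` that has no JSON result yet.

    `diarize` and `classify` are optional, mirroring a configuration in
    which an admin user enables or disables those stages (hypothetical API).
    """
    written = []
    for audio in sorted(inbox.rglob("*.wav")):  # assumed audio extension
        # Mirror the input directory hierarchy in the output directory.
        target = (outbox / audio.relative_to(inbox)).with_suffix(".json")
        if target.exists():
            continue  # this recording was already processed
        segments = transcribe(audio)
        if diarize is not None:
            segments = diarize(audio, segments)
        if classify is not None:
            segments = classify(audio, segments)
        target.parent.mkdir(parents=True, exist_ok=True)
        payload = {"audio": audio.name, "segments": segments}  # assumed layout
        target.write_text(
            json.dumps(payload, ensure_ascii=False, indent=2), encoding="utf-8"
        )
        written.append(target)
    return written
```

Calling `process_pending` repeatedly (for example from a scheduler) would approximate a robot that monitors the directory; recordings that already have a JSON result are skipped.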

Alternate flows:  

  • A user accesses Escriba to view their associated audio trails.
  • The user initiates the process of listening to and transcribing the audio recordings.
  • The user selects the option to use the audio transcripts produced previously by the AI robots.
  • The user notices that no audio transcripts are available, or that the results are of low quality.
  • The user discards the transcripts produced by the AI model and performs the transcription task as usual.
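
The "no audio transcripts are available" branch of this flow could be handled as in the sketch below. It assumes the same hypothetical JSON layout (a top-level object with a `segments` list); `load_ai_transcript` is an illustrative name, not part of Escriba. Judging whether a transcript is of low quality is left to the user, as in the flow above.

```python
import json
from pathlib import Path
from typing import Optional


def load_ai_transcript(json_path: Path) -> Optional[list]:
    """Return the AI-produced segments, or None if nothing usable exists.

    A None result signals the caller to fall back to manual transcription.
    """
    try:
        data = json.loads(json_path.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        # Missing or unreadable file, or malformed JSON.
        return None
    segments = data.get("segments") if isinstance(data, dict) else None
    if not segments:
        return None  # empty or absent transcript
    return segments
```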

Expected results: 

  • Acceleration of the overall transcription process performed by legislative analysts
  • Efficient transcription, diarization and speaker classification, providing timely results for legislative analysts to conduct their daily tasks
  • Effective transcription, diarization and speaker classification, with high-quality results

Potential challenges: 

  • State-of-the-art ASR and diarization models require substantial computing infrastructure (RAM and GPUs) to deliver results that are both fast and accurate.
  • ASR models may be imprecise in some circumstances. For instance, a senator’s name may be transcribed incorrectly.
  • ASR models, especially multimodal generative ones, may return a different transcription of the same audio each time they are prompted, so results are not consistent across runs.
  • ASR models, especially multimodal generative ones, may withhold their results if they detect harmful content in the submitted audio.
  • Multimodal generative AI models for ASR require carefully written prompts to produce good transcription results.
  • It is challenging to detect events of interest in the audio recording, such as background noises and bell rings.
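
One possible mitigation for misrecognized senator names, offered here only as an illustration and not described in the use case itself, is to post-correct a transcribed name against the known roster with fuzzy matching. The sketch uses `difflib.get_close_matches` from the Python standard library; the function name `correct_name`, the `cutoff` value and the roster entries in the usage note are placeholders.

```python
import difflib


def correct_name(heard: str, roster: list, cutoff: float = 0.8) -> str:
    """Replace `heard` with the closest roster name, if one is close enough.

    Matching is case-insensitive; when no roster entry reaches `cutoff`,
    the transcribed text is returned unchanged.
    """
    by_lower = {name.lower(): name for name in roster}
    matches = difflib.get_close_matches(
        heard.lower(), list(by_lower), n=1, cutoff=cutoff
    )
    return by_lower[matches[0]] if matches else heard
```

With a placeholder roster such as `["Maria Oliveira", "João Pereira"]`, the input `"Maria Olivera"` would be mapped to `"Maria Oliveira"`, while an unrelated name is left as transcribed. A cutoff that is too low risks replacing the name of a guest speaker with a senator's name, so the threshold would need tuning on real data.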

Data requirements: 

  • Audio samples from all members of the Federal Senate

Integrations with other systems: 

  • Escriba

Success metrics: 

  • Transcription quality as measured by the word error rate (WER) metric
  • Average transcription time
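
As an illustration of the two metrics, WER is conventionally defined as the number of word substitutions, deletions and insertions needed to turn the reference transcript into the model's output, divided by the number of words in the reference. The sketch below computes it with a word-level edit distance; it applies no text normalization (case, punctuation), which a real evaluation would need to define. The function names are illustrative.

```python
from statistics import mean


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / words in reference."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Word-level Levenshtein distance, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def average_seconds(durations: list) -> float:
    """Average transcription time over a batch of recordings, in seconds."""
    return mean(durations)
```

For example, comparing the reference "a sessão está aberta" with the output "a sessão esta aberta" gives one substitution over four reference words, a WER of 0.25.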

The Use cases for AI in parliaments collection is published by the IPU’s Centre for Innovation in Parliament as part of the Parliamentary Data Science Hub’s project to create guidelines for AI governance in parliaments.

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International licence. It may be freely shared and reused with acknowledgement of the author and the IPU. 

A use case describes how a system should work. It is used to plan, develop and measure implementation. A use case is not the same as a case study, which is a descriptive text of an actual project’s implementation. Please note that this use case is provided “as is” and neither the IPU nor the author accepts any responsibility for its use.

For more information about the IPU’s work on artificial intelligence, please visit www.ipu.org/AI or contact [email protected]