Translation of natural-language queries into SPARQL queries for parliamentary open data
Public engagement and open parliament
Italy - Senate
Use case ID: 042
Author: Senate of Italy
Date: 18 June 2024
Objective:
Develop an artificial intelligence (AI) system that translates natural-language queries into SPARQL queries, so that users can interact with parliamentary open data described by a structured ontology, making the data more accessible and usable.
Actors:
- Parliamentary researchers
- Members of the public interested in parliamentary data
- AI development team
- Data ontology specialists
Prerequisites:
- Comprehensive ontology representing parliamentary data
- Natural language processing (NLP) or large language model (LLM) system trained on parliamentary language and SPARQL syntax
- Access to the parliamentary open data repository
Scenario:
- The user inputs a natural-language query (e.g. “What were the voting results for the health reform bill in 2023?”).
- The system processes the input with the NLP system or LLM, identifying the key entities and the intent of the query.
- The system translates the processed query into a corresponding SPARQL query.
- The SPARQL query is executed against the SPARQL endpoint that exposes the parliamentary data modelled by the ontology.
- The system retrieves the relevant data and presents it to the user in a user-friendly format, such as a table or a summary.
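The scenario steps can be sketched in Python. This is a minimal, hypothetical sketch of one possible approach: template-based translation, where the NLP/LLM step is assumed to return a structured intent with entities, and the code fills a per-intent SPARQL template. The use case does not prescribe this method; an LLM could also generate the query text directly. The `ex:` namespace, the class and property names (`ex:Vote`, `ex:onBill`, `ex:yesVotes`, and so on), `ParsedQuery`, `to_sparql` and `bindings_to_rows` are all illustrative placeholders, not the Senate's actual ontology or software. The last function flattens a response in the W3C SPARQL 1.1 Query Results JSON format into table rows for display.

```python
# Hypothetical sketch of the scenario steps: structured intent -> SPARQL -> table.
# The ex: namespace and every class/property name below are placeholders,
# not the Senate's actual ontology.
from dataclasses import dataclass

PREFIX = "PREFIX ex: <http://example.org/parliament#>"  # hypothetical namespace


@dataclass
class ParsedQuery:
    """What the NLP/LLM step is assumed to return: an intent plus entities."""
    intent: str      # e.g. "vote_results"
    bill_topic: str  # e.g. "health reform"
    year: int        # e.g. 2023


def escape_literal(text: str) -> str:
    """Escape text for use inside a double-quoted SPARQL string literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def to_sparql(q: ParsedQuery) -> str:
    """Fill a per-intent query template with the extracted entities."""
    if q.intent != "vote_results":
        raise ValueError(f"no SPARQL template for intent {q.intent!r}")
    topic = escape_literal(q.bill_topic.lower())
    return f"""{PREFIX}
SELECT ?bill ?date ?yes ?no ?abstain WHERE {{
  ?vote a ex:Vote ;
        ex:onBill ?bill ;
        ex:date ?date ;
        ex:yesVotes ?yes ;
        ex:noVotes ?no ;
        ex:abstentions ?abstain .
  ?bill ex:title ?title .
  FILTER(CONTAINS(LCASE(?title), "{topic}"))
  FILTER(YEAR(?date) = {int(q.year)})
}}
ORDER BY ?date"""


def bindings_to_rows(results: dict) -> list[dict[str, str]]:
    """Flatten a SPARQL 1.1 Query Results JSON document into table rows."""
    columns = results["head"]["vars"]
    rows = []
    for binding in results["results"]["bindings"]:
        # A variable left unbound in a solution is absent from its binding.
        rows.append({c: (binding[c]["value"] if c in binding else "") for c in columns})
    return rows


if __name__ == "__main__":
    print(to_sparql(ParsedQuery("vote_results", "health reform", 2023)))
```

Escaping the user-supplied text before placing it in the template prevents a question from altering the structure of the generated query. The template approach also keeps every generated query within the shapes the ontology actually supports.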
Alternate flows:
- If the natural-language query is ambiguous or incomplete, the system requests clarification or additional information from the user before proceeding with the translation.
- If the requested data is not available or the ontology does not cover the query scope, the system informs the user and suggests possible modifications to the query.
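The two alternate flows come down to a small decision step before and after query execution. The following is a hypothetical sketch that assumes each intent declares the entities ("slots") it needs. A missing slot triggers a clarification request. An intent with no mapping means the ontology does not cover the question. An empty result produces suggested modifications. `REQUIRED_SLOTS`, `Response`, `handle` and the intent names are illustrative assumptions, and `run_query` stands for whatever translation-and-execution function the system uses.

```python
# Hypothetical sketch of the two alternate flows: ask for missing details,
# or report unsupported/empty queries with suggestions.
from dataclasses import dataclass, field
from typing import Callable

# Assumed mapping from intent to the entities needed to build its query.
REQUIRED_SLOTS = {
    "vote_results": ["bill_topic", "year"],
    "bill_status": ["bill_topic"],
}


@dataclass
class Response:
    kind: str      # "clarify", "unsupported", "no_data" or "answer"
    message: str
    rows: list = field(default_factory=list)


def handle(intent: str, entities: dict,
           run_query: Callable[[str, dict], list]) -> Response:
    """Decide whether to ask the user, report a gap, or return results.

    run_query is supplied by the caller (e.g. translation plus endpoint call)
    and returns a list of result rows.
    """
    if intent not in REQUIRED_SLOTS:
        # The ontology (and hence the template set) does not cover this question.
        return Response("unsupported",
                        "This kind of question is not covered yet; "
                        "try asking about votes or the status of a bill.")
    missing = [slot for slot in REQUIRED_SLOTS[intent] if not entities.get(slot)]
    if missing:
        # Ambiguous or incomplete query: ask before translating.
        return Response("clarify", "Could you specify: " + ", ".join(missing) + "?")
    rows = run_query(intent, entities)
    if not rows:
        # Data not available: suggest ways to modify the query.
        hints = ["use a broader keyword for the bill"]
        if "year" in entities:
            hints.append("try a different year")
        return Response("no_data", "No matching data found; " + "; ".join(hints) + ".")
    return Response("answer", f"{len(rows)} result(s) found.", rows)
```

Because the clarification check runs before translation, an incomplete question never produces a half-filled SPARQL query.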
Expected results:
- Access to parliamentary data is improved through an intuitive interface.
- Engagement and transparency are increased by making data easily accessible to non-experts.
- Relevant data are efficiently retrieved and presented, saving time for researchers and the public.
Potential challenges:
- Ensuring the accuracy of the model in understanding and translating queries
- Handling complex or multi-faceted queries that may not map directly to SPARQL
- Maintaining the system’s ability to understand and translate evolving parliamentary language and new data structures
Data requirements:
- A detailed and up-to-date ontology of parliamentary data
- A large dataset of historical natural-language queries paired with their corresponding SPARQL queries, for training the NLP model or LLM
- Continuous updates to both the model and the ontology to handle new data and queries
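A paired training dataset needs basic hygiene before use. The sketch below assumes a hypothetical JSON Lines file in which each line holds one `{"question": ..., "sparql": ...}` object. It rejects malformed records, drops repeated questions, and applies a cheap structural check to the SPARQL text. The file layout and the names `load_pairs` and `looks_like_sparql` are assumptions for illustration. The check is deliberately rough and is not a parser.

```python
# Hypothetical loader for a training set of natural-language/SPARQL pairs
# stored as JSON Lines: one {"question": ..., "sparql": ...} object per line.
import json
import re

QUERY_FORM = re.compile(r"\b(SELECT|ASK|CONSTRUCT|DESCRIBE)\b", re.IGNORECASE)


def looks_like_sparql(text: str) -> bool:
    """Cheap sanity check: a query form keyword and balanced braces.

    This is not a parser; braces inside string literals can fool it.
    """
    if not QUERY_FORM.search(text):
        return False
    depth = 0
    for ch in text:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


def load_pairs(lines):
    """Return (pairs, rejected): unique (question, sparql) tuples and a reject count."""
    pairs, seen, rejected = [], set(), 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            rejected += 1
            continue
        if not isinstance(record, dict):
            rejected += 1
            continue
        question = str(record.get("question", "")).strip()
        sparql = str(record.get("sparql", "")).strip()
        if not question or not looks_like_sparql(sparql):
            rejected += 1
            continue
        key = " ".join(question.lower().split())  # normalise case and spacing
        if key in seen:
            continue  # keep only the first pair for a repeated question
        seen.add(key)
        pairs.append((question, sparql))
    return pairs, rejected
```

For a production dataset, a full SPARQL parser (for example the one shipped with rdflib) and a trial run against the endpoint would give a much stronger check than the brace count used here.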
Integrations with other systems:
- Integration with the existing parliamentary data repository
- Web-based and mobile interfaces for user interaction
- Analytics tools for monitoring system performance and user interaction patterns
Success metrics:
- Accuracy rate of translated SPARQL queries, e.g. the share of translated queries that return the correct results
- User satisfaction scores based on query results
- Reduction in time taken to retrieve relevant parliamentary data
- Number of queries successfully processed without requiring human intervention
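The success metrics can be computed from an evaluation log. In this hypothetical sketch, each log entry is assumed to record the rows returned by the generated query, the rows returned by a hand-written reference query for the same question, whether a person had to intervene, the time taken with the system and with a manual baseline, and an optional 1-5 satisfaction rating. These field names and the rating scale are assumptions. Accuracy is measured here as execution accuracy: results are compared rather than query text, because two different SPARQL strings can be equivalent.

```python
# Hypothetical evaluation over a log of test questions. Each entry is assumed to
# hold the rows returned by the generated query ("predicted"), the rows of a
# hand-written reference query ("gold"), whether a person had to step in,
# the time taken with the system and with a manual baseline, and an optional
# 1-5 satisfaction rating.
from collections import Counter
from statistics import mean, median


def as_bag(rows):
    """Rows as a multiset, so result order does not matter."""
    return Counter(tuple(sorted(row.items())) for row in rows)


def evaluate(log):
    """Return the four success metrics for a non-empty evaluation log."""
    n = len(log)
    correct = sum(as_bag(e["predicted"]) == as_bag(e["gold"]) for e in log)
    autonomous = sum(not e["needed_human"] for e in log)
    with_system = median(e["seconds"] for e in log)
    baseline = median(e["baseline_seconds"] for e in log)
    ratings = [e["rating"] for e in log if e.get("rating") is not None]
    return {
        "execution_accuracy": correct / n,
        "autonomous_rate": autonomous / n,
        "median_time_reduction": 1 - with_system / baseline,
        "mean_satisfaction": mean(ratings) if ratings else None,
    }
```

Medians are used for retrieval time so that a few unusually slow queries do not dominate the reported reduction.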