Systems development: Deployment patterns

About this sub-guideline

This sub-guideline is part of the guideline Systems development. It can be read in conjunction with the sub-guideline Systems development: Deployment and implementation. Refer to the main guideline for context and an overview.

Background

The choice of technology and infrastructure significantly influences the robustness, security, scalability, performance and ease of supervision of AI systems. Together, these factors contribute to the overall reliability of such systems.

Different deployment patterns come with their own set of trade-offs. Some may incur higher costs, while others might require more extensive management resources. As a result, there is no one-size-fits-all “best approach” to AI system deployment. Instead, the key to successful deployment lies in carefully assessing parliament’s needs, resources and goals, and then selecting an approach that offers the best balance of features and practicality for parliament’s particular situation.

Characteristics of deployment patterns

The deployment of AI systems involves following several patterns and practices to ensure that models perform effectively and reliably in production environments. When designing the AI system architecture, parliament should therefore consider the deployment pattern characteristics discussed below.

Deployment architecture

The deployment architecture of an AI system is determined by two key factors: how the AI algorithm responds to requests, and where the AI model is hosted.

Request-handling patterns:

  • Batch processing: Data is processed in large batches at scheduled intervals, making this pattern suitable for non-time-sensitive tasks. 

  • Online serving: Requests are handled in real time as they come in, making this pattern ideal for applications requiring immediate responses.
  • Streaming: Under this pattern, data streams are continuously processed, enabling near-real-time analysis and predictions.
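
The three request-handling patterns can be contrasted in a minimal sketch. The predict function below is a hypothetical stand-in for a trained model (it simply counts words), and all function names are illustrative assumptions rather than part of any particular framework.

```python
from typing import Iterable, Iterator, List


def predict(text: str) -> int:
    """Hypothetical stand-in for a trained model: counts the words in a text."""
    return len(text.split())


def serve_online(request: str) -> int:
    """Online serving: answer a single request as soon as it arrives."""
    return predict(request)


def run_batch(documents: List[str]) -> List[int]:
    """Batch processing: score a whole collection in one scheduled run."""
    return [predict(doc) for doc in documents]


def process_stream(events: Iterable[str]) -> Iterator[int]:
    """Streaming: lazily yield one prediction per event as it flows past."""
    for event in events:
        yield predict(event)
```

The model call is identical in all three cases. What differs is when it is invoked and how much data it sees at once, which is why the same model can often be redeployed under a different pattern if requirements change.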

Hosting location types:

  • On-premises: Models are deployed on local servers, often for the purpose of enhanced security or to meet specific compliance requirements.

  • Cloud: Models are hosted on cloud platforms, offering benefits such as scalability, flexibility and reduced infrastructure management.
  • Edge: Models are deployed on edge devices, providing low-latency predictions and offline capabilities, making this approach suitable for Internet of Things (IoT) and mobile applications.
  • Hybrid: This approach combines on-premises, cloud and edge deployments to optimize performance and resource usage based on specific needs.

The choice of deployment architecture depends on factors such as data sensitivity, response-time requirements, available resources and the specific use case of the AI system.

Scalability

It is important to understand how many requests the AI system is expected to receive, on average and at peak, and how that demand is likely to change over the system’s life cycle. These factors determine which of the following scalability characteristics the deployment needs:

  • Horizontal scaling: Adding more instances of the model server to handle increased load

  • Vertical scaling: Enhancing the capacity of existing servers (e.g. by adding more memory or faster central processing units (CPUs))

  • Auto-scaling: Automatically adjusting the number of model instances based on demand
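
As a rough illustration of auto-scaling, the sketch below derives an instance count from the observed load. The function name, the per-instance capacity figure and the minimum and maximum bounds are all assumptions chosen for the example; real platforms apply their own rules and smoothing.

```python
import math


def desired_instances(requests_per_second: float,
                      capacity_per_instance: float,
                      min_instances: int = 1,
                      max_instances: int = 10) -> int:
    """Return how many model instances are needed for the observed load.

    capacity_per_instance is an assumed, measured figure: the number of
    requests per second that one instance can serve within its latency target.
    """
    if capacity_per_instance <= 0:
        raise ValueError("capacity_per_instance must be positive")
    needed = math.ceil(requests_per_second / capacity_per_instance)
    # Clamp to the configured bounds so that costs stay predictable.
    return max(min_instances, min(max_instances, needed))
```

For example, with an assumed capacity of 50 requests per second per instance, a load of 120 requests per second calls for three instances. The upper bound acts as a cost ceiling, and the lower bound keeps the service available during quiet periods.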

Latency and throughput

When deploying AI systems, two critical performance metrics to consider are latency and throughput:

  • Latency refers to the time it takes for the AI model to respond to a request, which is particularly crucial for real-time applications. 
  • Throughput measures the number of requests the AI model can process per unit of time, which is essential for high-volume applications. 

It is important to establish acceptable values for both latency and throughput to ensure that the system meets the specific needs of the application for which it is intended, and that it can handle the expected workload efficiently.
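
One way to make "acceptable values" concrete is to check measured figures against agreed targets. The sketch below assumes that latencies have already been collected in milliseconds over a known time window. The 95th-percentile target and the function names are illustrative assumptions, not prescribed thresholds.

```python
import math
from typing import Sequence


def percentile(values: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile (pct is a whole number from 1 to 100)."""
    if not values:
        raise ValueError("no measurements supplied")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def throughput(request_count: int, window_seconds: float) -> float:
    """Requests completed per second over the measurement window."""
    return request_count / window_seconds


def meets_targets(latencies_ms: Sequence[float], window_seconds: float,
                  max_p95_ms: float, min_rps: float) -> bool:
    """True if both the latency target and the throughput target are met."""
    return (percentile(latencies_ms, 95) <= max_p95_ms
            and throughput(len(latencies_ms), window_seconds) >= min_rps)
```

A percentile is used rather than an average because a small number of very slow responses can be hidden by the mean while still affecting users.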

Model management 

Effective AI model management is crucial throughout the entire life cycle of an AI system. However, it becomes particularly important once the AI system is put into operation. A well-designed model management strategy should address several key aspects:

  • Versioning: This involves keeping track of different versions of the model, ensuring traceability and the ability to roll back if needed. Proper versioning allows teams to manage changes, compare performance across iterations and maintain a clear history of the model’s changes over time.
  • Life cycle management: This approach encompasses the tools and processes for deploying, monitoring, updating and, eventually, retiring models. The aim is to ensure that models are properly maintained throughout their operational life, from initial deployment through to eventual replacement.
  • A/B testing: This practice involves running multiple versions of a model simultaneously to compare their performance. A/B testing allows teams to make data-driven decisions about which model version performs best in real-world conditions before full deployment.
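
Versioning and rollback can be sketched with a small in-memory registry. This is an illustration only: the ModelRegistry class and its method names are hypothetical, and a production setup would normally rely on persistent storage or a dedicated model registry tool.

```python
from typing import Any, Dict, List, Optional


class ModelRegistry:
    """Hypothetical in-memory registry that tracks versions and promotions."""

    def __init__(self) -> None:
        self._versions: Dict[str, Any] = {}
        self._history: List[str] = []  # promoted versions, oldest first

    def register(self, version: str, model: Any) -> None:
        """Store a model under a unique version label."""
        if version in self._versions:
            raise ValueError(f"version {version!r} already registered")
        self._versions[version] = model

    def promote(self, version: str) -> None:
        """Make a registered version the active one."""
        if version not in self._versions:
            raise KeyError(version)
        self._history.append(version)

    def rollback(self) -> str:
        """Revert to the previously active version and return its label."""
        if len(self._history) < 2:
            raise RuntimeError("no earlier version to roll back to")
        self._history.pop()
        return self._history[-1]

    @property
    def active(self) -> Optional[str]:
        """Label of the version currently in production, if any."""
        return self._history[-1] if self._history else None
```

Keeping the promotion history separate from the stored versions is what makes rollback traceable: the registry can always say which version was live before the current one.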

Monitoring and observability

  • Performance metrics: Monitoring metrics such as response time, throughput and resource utilization

  • Drift detection: Identifying when the model’s performance degrades owing to changes in data distribution

  • Alerting: Setting up alerts for anomalies or performance degradation
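
Drift detection can be illustrated with the population stability index (PSI), one of several possible measures, which compares the distribution of a feature in recent data against a reference sample. The sketch below is a simplified version under stated assumptions: equal-width bins taken from the reference range, and a small floor value to avoid division by zero. Any alert threshold applied to the result is a convention to be agreed locally, not a fixed rule.

```python
import math
from typing import List, Sequence


def psi(expected: Sequence[float], actual: Sequence[float],
        bins: int = 10, eps: float = 1e-6) -> float:
    """Population stability index between a reference and a recent sample."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all values are equal

    def shares(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            # Values outside the reference range fall into the end bins.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # Floor each share at eps so the logarithm is always defined.
        return [max(c / len(values), eps) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

A value of zero means that the two samples fall into the bins in identical proportions. Larger values indicate a stronger shift and can be wired into the alerting described above.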

Security

  • Access control: Ensuring that only authorized users and applications can interact with the model

  • Data privacy: Protecting sensitive data and adhering to regulations (e.g. GDPR)

  • Model security: Safeguarding models against adversarial attacks and data poisoning
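
Access control can be sketched as a check performed before any request reaches the model. In the example below, the client registry, the demonstration keys and the action names are all hypothetical. Keys are stored as hashes and compared in constant time using Python's standard hmac.compare_digest function.

```python
import hashlib
import hmac
from typing import Dict, Set


def _digest(key: str) -> str:
    """Hash an API key so that the plain key is never stored."""
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


# Hypothetical client registry: hashed API key -> permitted actions.
CLIENTS: Dict[str, Set[str]] = {
    _digest("demo-key-clerk"): {"predict"},
    _digest("demo-key-admin"): {"predict", "deploy"},
}


def is_authorized(api_key: str, action: str,
                  clients: Dict[str, Set[str]] = CLIENTS) -> bool:
    """Return True only if the key is known and allowed to perform the action."""
    presented = _digest(api_key)
    for stored, actions in clients.items():
        # Constant-time comparison avoids leaking information through timing.
        if hmac.compare_digest(presented, stored):
            return action in actions
    return False
```

This covers only the first bullet above. Data privacy and model security need separate measures, such as data minimization and input validation.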

Continuous integration/continuous deployment (CI/CD)

  • Automation: Automating the deployment process to reduce errors and deployment time

  • Testing: Including automated testing (unit, integration, regression) in the deployment pipeline

  • Rollbacks: Providing mechanisms for quickly reverting to previous versions in case of issues
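
The interaction between automated testing and rollbacks can be sketched as a gate: the candidate version goes live only if every check passes, and otherwise the current version stays in place. The function and check names below are illustrative assumptions. In practice, this logic lives in the CI/CD tool's own pipeline configuration.

```python
from typing import Callable, Dict, Optional, Tuple


def run_pipeline(candidate: str, current: str,
                 checks: Dict[str, Callable[[str], bool]]
                 ) -> Tuple[str, Optional[str]]:
    """Return the version that should be live and the name of any failed check.

    Each check receives the candidate version label and returns True on success.
    """
    for name, check in checks.items():
        if not check(candidate):
            # Keep (or revert to) the current version on the first failure.
            return current, name
    return candidate, None
```

Returning the name of the failed check alongside the live version gives operators an immediate pointer to what blocked the release.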

Resource management

  • Hardware acceleration: Utilizing graphics processing units (GPUs), tensor processing units (TPUs) or other accelerators for improved performance

  • Resource allocation: Managing resources to optimize cost and performance

  • Integration with existing systems: Providing APIs for integration with other systems and services

  • Data pipelines: Integrating with data ingestion and pre-processing pipelines

  • Feedback loops: Implementing systems to collect feedback from model predictions to improve future performance

Resilience and fault tolerance

  • Redundancy: Having multiple instances or backups to ensure availability

  • Failover: Automatically switching to backup systems in case of failure

  • Retry logic: Implementing mechanisms to handle transient failures
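
Retry logic and failover can be combined, as in the sketch below. It assumes that temporary problems are signalled by a dedicated exception type, here the hypothetical TransientError, and that backends are plain callables. The delay doubles after each failed attempt (exponential backoff).

```python
import time
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for failures that are worth retrying."""


def call_with_retry(operation: Callable[[], T], attempts: int = 3,
                    base_delay: float = 0.1,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Call operation, retrying transient failures with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except TransientError:
            if attempt == attempts - 1:
                raise  # retries exhausted: let the caller decide
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")


def call_with_failover(backends: Sequence[Callable[[], T]],
                       **retry_options) -> T:
    """Try each backend in order, moving on once its retries are exhausted."""
    last_error: Optional[Exception] = None
    for backend in backends:
        try:
            return call_with_retry(backend, **retry_options)
        except TransientError as error:
            last_error = error
    raise RuntimeError("all backends failed") from last_error
```

Only errors marked as transient are retried. Retrying a request that fails for a permanent reason, such as invalid input, would add load without any chance of success.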

Auditability and explainability

In most cases, it is mandatory to keep audit logs of predictions, inputs and system interactions.

In addition to auditing, explainability tools can be used to interpret AI model decisions, thus improving trust and compliance.
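
An audit log entry for a single prediction might look like the sketch below. The field names are illustrative assumptions. The input is stored as a hash rather than in full, which is one possible design choice where inputs may contain sensitive data. Where rules require the full input to be retained, it would be stored instead.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def audit_record(user: str, model_version: str, payload: Dict[str, Any],
                 prediction: Any, now: Optional[datetime] = None) -> str:
    """Build one JSON audit log line for a single prediction."""
    timestamp = (now or datetime.now(timezone.utc)).isoformat()
    # Sort keys so that the same input always produces the same hash.
    canonical = json.dumps(payload, sort_keys=True, ensure_ascii=False)
    record = {
        "timestamp": timestamp,
        "user": user,
        "model_version": model_version,
        "input_sha256": hashlib.sha256(canonical.encode("utf-8")).hexdigest(),
        "prediction": prediction,
    }
    return json.dumps(record, sort_keys=True)
```

Recording the model version with every entry links the audit trail to model versioning, so that any past prediction can be traced to the exact model that produced it.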

Combinations of deployment patterns

Various combinations of characteristics are often seen in AI use cases. These are detailed below:

Model-as-a-service (MaaS)

  • Characteristics: Exposing models via web APIs for easy integration
  • Use cases: Real-time predictions, microservices architecture
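
A model exposed as a web API can be sketched with Python's standard http.server module. The predict function is a hypothetical stand-in for a real model, and the JSON field names, version label and port number are assumptions for illustration. A production service would normally add authentication, logging and a production-grade web server.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Tuple

MODEL_VERSION = "demo-0.1"  # hypothetical version label


def predict(text: str) -> float:
    """Hypothetical stand-in model: share of upper-case characters."""
    return sum(ch.isupper() for ch in text) / len(text) if text else 0.0


def handle_predict(body: bytes) -> Tuple[int, dict]:
    """Turn a raw request body into an HTTP status code and a JSON-ready reply."""
    try:
        text = json.loads(body)["text"]
    except (ValueError, KeyError, TypeError):
        return 400, {"error": "expected a JSON object with a 'text' field"}
    if not isinstance(text, str):
        return 400, {"error": "'text' must be a string"}
    return 200, {"model_version": MODEL_VERSION, "score": predict(text)}


class PredictHandler(BaseHTTPRequestHandler):
    """Thin HTTP wrapper around handle_predict."""

    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        status, result = handle_predict(self.rfile.read(length))
        data = json.dumps(result).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def make_server(port: int = 8080) -> HTTPServer:
    """Create (but do not start) a local server; call serve_forever() to run it."""
    return HTTPServer(("127.0.0.1", port), PredictHandler)
```

Keeping the request logic in handle_predict, separate from the HTTP handler, means that the same function can be tested directly or reused behind a different web framework.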

Model embedded in applications

  • Characteristics: Embedding models directly in applications, either locally or via a microservice
  • Use cases: Edge computing, offline capabilities, low-latency requirements

Containerized deployment

  • Characteristics: Packaging models in containers (e.g. Docker) for consistent deployment across environments
  • Use cases: Cloud deployments, microservices, scalable architectures

Serverless deployment

  • Characteristics: Using serverless computing platforms to deploy models
  • Use cases: Event-driven applications, cost optimization for intermittent workloads

On-demand/batch processing

  • Characteristics: Deploying models that run on demand or process large batches of data periodically
  • Use cases: Data-processing pipelines, periodic analytics

Streaming analytics

  • Characteristics: Deploying models to analyse data from streaming sources and generate predictions as the data arrives
  • Use cases: Real-time analytics, IoT applications

A/B testing and canary releases

  • Characteristics: Testing new models on a subset of traffic before full deployment
  • Use cases: Incremental updates, risk minimization
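
Routing a subset of traffic to a new model can be sketched with hash-based bucketing. Each user is assigned a stable bucket from 0 to 99, and users whose bucket falls below the canary percentage receive the new version. The version labels and function names are illustrative assumptions.

```python
import hashlib


def bucket(user_id: str) -> int:
    """Map a user identifier to a stable bucket between 0 and 99."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def choose_version(user_id: str, canary_percent: int,
                   stable: str = "v1", canary: str = "v2") -> str:
    """Send roughly canary_percent of users to the canary version."""
    return canary if bucket(user_id) < canary_percent else stable
```

Because the bucket depends only on the user identifier, a given user sees the same version on every request, and raising the percentage gradually only ever moves users from the stable version to the canary.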

Federated learning

  • Characteristics: Training models across multiple decentralized devices or servers while keeping data local
  • Use cases: Privacy-sensitive applications, distributed data sources
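
The aggregation step at the heart of federated learning can be sketched as a weighted average of locally trained parameters, in the style of federated averaging. Only the parameters and sample counts leave each site, not the underlying data. The function name and the flat list representation of model weights are simplifying assumptions.

```python
from typing import List, Sequence, Tuple


def federated_average(updates: Sequence[Tuple[List[float], int]]) -> List[float]:
    """Combine local model weights into a global model.

    Each update is (weights, sample_count). Sites with more local data
    contribute proportionally more to the average.
    """
    total = sum(count for _, count in updates)
    if total <= 0:
        raise ValueError("at least one update with samples is required")
    size = len(updates[0][0])
    averaged = [0.0] * size
    for weights, count in updates:
        if len(weights) != size:
            raise ValueError("all updates must have the same number of weights")
        for i, weight in enumerate(weights):
            averaged[i] += weight * count / total
    return averaged
```

For example, combining weights [1.0, 2.0] from a site with one sample and [3.0, 4.0] from a site with three samples gives [2.5, 3.5]. A full system would repeat this over many training rounds and would typically add protections for the transmitted parameters.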

Below are some suggestions and characteristics for parliaments to consider when planning and executing AI deployments that are scalable, reliable, safe and efficient:

  • Ensure best fit: Select a deployment pattern according to the specific use case, performance requirements and domain constraints.
  • Monitor and iterate: Continuously monitor deployed models and iterate based on user feedback and performance metrics.
  • Maintain security: Implement robust security practices to protect models and data in production environments.
  • Optimize resources: Efficiently manage resources to balance performance and cost, leveraging approaches such as containerization and serverless architectures where appropriate.

The Guidelines for AI in parliaments are published by the IPU in collaboration with the Parliamentary Data Science Hub in the IPU’s Centre for Innovation in Parliament. This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International licence. It may be freely shared and reused with acknowledgement of the IPU. For more information about the IPU’s work on artificial intelligence, please visit www.ipu.org/AI or contact the IPU.