How to use service mesh to improve AI model security

Discover the invisible hero behind secure, observable, and flexible AI deployments

June 16, 2025
Maarten Vandeperre
Related topics: Artificial intelligence, Data Science, Integration, Platform engineering, Service Mesh, System Design
Related products: Red Hat AI, Red Hat OpenShift Service Mesh

    Deploying machine learning models has come a long way from manually uploading pickle files or calling Python scripts from cron jobs. Today, enterprise-grade AI must meet the same expectations as any other critical application: uptime, scalability, security, observability, and controlled rollout. As in many Kubernetes-native environments, one of the most effective ways to achieve that is by introducing a service mesh.

    At first glance, a service mesh might seem like overkill, yet another layer of abstraction that complicates your stack. But under the hood, it delivers the kind of foundational capabilities that make real-world AI feasible, not just flashy, especially when you’re dealing with hybrid cloud environments, large language models, or compliance-sensitive data.

    Let’s explore why that is, and how a service mesh becomes the secret weapon for production-grade AI.

    What is a service mesh?

    A service mesh is an infrastructure layer that manages how services communicate with each other in a microservice architecture. It transparently handles service-to-service communication, taking care of routing, security, observability, and resilience without requiring changes to application code. It typically works by injecting a lightweight proxy (called a sidecar) alongside each service instance, intercepting and managing all inbound and outbound traffic (Figure 1). This allows teams to enforce policies like mutual TLS, retries, timeouts, and circuit breaking centrally. 

    Service mesh also provides out-of-the-box telemetry, such as distributed tracing and detailed traffic metrics. It simplifies operations in complex, distributed environments, especially in Kubernetes. In AI deployments, it plays a critical role in securing model APIs, validating new model versions, and ensuring high availability.

    Figure 1: A service mesh takes control of all incoming and outgoing requests from applications by injecting proxies into the application pods. (Created by Maarten Vandeperre, licensed under Apache 2.0.)

    Model deployment strategies that work

    AI is inherently iterative. No matter how good your training metrics look, there’s no substitute for real-world testing. But how do you validate a new model in production without compromising the user experience or exposing customers to regressions?

    Here’s where service mesh shines.

    Imagine you're running a chatbot that uses a recommendation model. You’ve trained a new version that improves product suggestions based on customer behavior. But you don’t want to just flip a switch. With a service mesh, you can mirror traffic to this new model, sending the same user queries it would receive in production, but without affecting the real responses. This shadow mode testing gives you confidence before rollout.

    Alternatively, you might want to perform a canary deployment, routing only a small slice of users (or perhaps a test suite) to the new model, for example based on headers: say, 5% of traffic from internal test accounts. If performance holds, you gradually increase the share. If something goes wrong (e.g., a spike in latency, strange output, or infrastructure instability), you can roll back immediately without restarting pods or touching the application logic.

    These strategies are hard to implement manually. A service mesh lets you configure them declaratively, often with a single change in your traffic policy. The result? Safer model evolution, without sleepless nights.
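
    To make that routing rule concrete, here is a minimal sketch, in Python, of the decision a canary policy encodes: send a request to the new model version when it carries a test header or falls within a small traffic share. In a real mesh this logic lives in the proxy's declarative traffic policy rather than in application code, and the header name and the 5% share below are illustrative assumptions, not a specific product's API.

    ```python
    import random

    CANARY_SHARE = 0.05              # assumed canary share: 5% of traffic
    CANARY_HEADER = "x-canary-user"  # hypothetical header set for internal test accounts

    def pick_model_version(headers: dict) -> str:
        """Return the model version that should serve this request.

        Mirrors the kind of rule a service mesh evaluates in its proxy:
        explicit opt-in via a header, otherwise a weighted random split.
        """
        if headers.get(CANARY_HEADER) == "true":
            return "recommendation-model-v2"  # internal testers always hit the canary
        if random.random() < CANARY_SHARE:
            return "recommendation-model-v2"  # 5% of regular traffic
        return "recommendation-model-v1"      # everyone else stays on the stable version

    if __name__ == "__main__":
        print(pick_model_version({"x-canary-user": "true"}))  # -> recommendation-model-v2
        print(pick_model_version({}))                         # -> usually recommendation-model-v1
    ```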

    Figure 2: A service mesh enables different deployment strategies like mirroring, canary releases, and blue/green deployments. (Created by Maarten Vandeperre, licensed under Apache 2.0.)

    Mirroring in a service mesh allows you to silently send real production traffic to a second service instance (e.g., a new model version or updated inference server) without impacting the user-facing application. While the mirrored service's responses are discarded, they can be fully logged and monitored. This makes it ideal for safely comparing models or server versions using real-world data and production infrastructure (i.e., even just a version bump of the inference server can already cause different LLM behavior). It helps validate improvements, detect regressions, or even benchmark resource efficiency (e.g., if the new model is faster or cheaper to run) without risking production stability.

    Figure 3 is an example of how different models can respond to the same prompt. We'll run the experiment in a Jupyter Notebook, but this is not always possible: you may not have access to production data, or at least not to up-to-date production data, even though that is exactly what your evaluations should be based on. We will use the following prompt for our evaluation:

    Figure 3: Example prompt to evaluate two models against each other, run from a Jupyter Notebook (experimenting, but not enterprise-grade). (Created by Maarten Vandeperre, licensed under Apache 2.0.)

    Now that we have our prompt, we will validate the responses to see whether both results are good enough (Figure 4). If they are, we could switch to the smaller model to reduce our resource consumption (i.e., costs).

    Figure 4: Model validation result: different models can produce very different outcomes, so validating models (and monitoring them continuously) is key. (Created by Maarten Vandeperre, licensed under Apache 2.0.)

    We performed this evaluation through Jupyter Notebooks, but the same process could run when you apply the mirroring approach with a service mesh, which has the added value of running on a production system with up-to-date production data.
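
    As a rough illustration of that notebook experiment, the sketch below sends the same prompt to two model-serving endpoints and records their answers and latencies for a side-by-side review. The endpoint URLs, model names, and the OpenAI-compatible completions payload are assumptions; adapt them to whatever your inference servers actually expose.

    ```python
    import time
    import requests  # assumes the requests package is available in the notebook

    PROMPT = "Summarize the customer's last three orders and suggest a follow-up product."

    # Hypothetical inference endpoints for the current (large) and candidate (small) models.
    ENDPOINTS = {
        "large-model-v1": "http://model-v1.models.svc:8080/v1/completions",
        "small-model-v2": "http://model-v2.models.svc:8080/v1/completions",
    }

    def query(url: str, prompt: str) -> tuple[str, float]:
        """Send one prompt to an OpenAI-style completions endpoint; return (text, seconds)."""
        start = time.perf_counter()
        response = requests.post(url, json={"prompt": prompt, "max_tokens": 256}, timeout=60)
        response.raise_for_status()
        return response.json()["choices"][0]["text"], time.perf_counter() - start

    for name, url in ENDPOINTS.items():
        answer, seconds = query(url, PROMPT)
        print(f"--- {name} ({seconds:.2f}s) ---")
        print(answer.strip())
    ```

    If the smaller model's answers hold up, the same comparison can later run against mirrored production traffic instead of a hand-picked prompt.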

    Don’t let AI models crash your cluster

    Unlike static applications, AI models (especially foundation models like LLaMA or GPT derivatives) can demand enormous compute resources. One heavy inference request can max out your GPU; ten can bring your node to its knees.

    Service mesh introduces intelligent throttling mechanisms that prevent such self-inflicted denial-of-service scenarios (Figure 5). For example, you might define a policy that limits incoming requests to your LLM backend to 20 per second. Any burst beyond that is queued, rejected gracefully, or redirected.

    Consider a retail analytics dashboard powered by a backend model. Without throttling, a team-wide sales report on Black Friday could accidentally bring down the inference engine. With throttling enabled through the mesh, you ensure graceful degradation rather than full system failure.

    This is especially relevant when AI models are shared services across multiple teams (i.e., avoiding so-called “internal DDoS attacks”), and even more so when they’re exposed, directly or indirectly, to the internet.
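
    The sketch below illustrates the mechanism behind such a policy: a token bucket that admits at most 20 requests per second and rejects the overflow. With a service mesh you would express this limit declaratively in the mesh's rate-limiting configuration instead of implementing it yourself; the numbers here are simply the example values from above.

    ```python
    import time

    class TokenBucket:
        """Minimal token bucket: refills at `rate` tokens per second, holds at most `capacity`."""

        def __init__(self, rate: float = 20.0, capacity: float = 20.0):
            self.rate = rate
            self.capacity = capacity
            self.tokens = capacity
            self.last_refill = time.monotonic()

        def allow(self) -> bool:
            """Return True if a request may pass, False if it should be rejected or queued."""
            now = time.monotonic()
            self.tokens = min(self.capacity, self.tokens + (now - self.last_refill) * self.rate)
            self.last_refill = now
            if self.tokens >= 1:
                self.tokens -= 1
                return True
            return False

    bucket = TokenBucket()  # 20 requests per second, as in the example policy above
    admitted = sum(bucket.allow() for _ in range(100))
    print(f"{admitted} of 100 burst requests admitted; the rest would be queued or rejected")
    ```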

    Figure 5: A service mesh enables throttling (i.e., rate limiting) on its managed components and applications. (Created by Maarten Vandeperre, licensed under Apache 2.0.)

    Observability: Shedding light on the black box

    Machine learning models are often described as black boxes. That’s true for their decision logic, but it doesn’t have to be true for their infrastructure behavior.

    Service mesh provides out-of-the-box observability into service-level performance (Figure 6). This includes request counts, success/error rates, and latency percentiles. If your sentiment analysis model suddenly shows 30% more failed requests, or if its median response time increases from 200ms to 800ms, you’ll see it on your dashboard before users start complaining.

    Even more valuable is distributed tracing. When you chain multiple services together (e.g., a frontend → API gateway → vector search → model inference), you get a full trace of each request as it flows through the system. You can pinpoint bottlenecks, misrouted calls, or retry storms with precision.

    With this visibility, AI incidents become debuggable events, not mysteries.
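
    One practical detail is worth calling out: in most proxy-based meshes, the sidecars generate the spans, but your services still need to forward the incoming trace headers on their outbound calls so that those spans stitch together into a single trace. A minimal sketch, assuming W3C Trace Context (traceparent) headers and a hypothetical downstream vector-search service:

    ```python
    import requests  # assumes the requests package is installed

    # Headers that carry trace context; the proxy injects them, the application passes them along.
    TRACE_HEADERS = ["traceparent", "tracestate", "x-request-id"]

    def forward_trace_context(incoming_headers: dict) -> dict:
        """Copy trace-context headers from the incoming request onto the outgoing one."""
        return {h: incoming_headers[h] for h in TRACE_HEADERS if h in incoming_headers}

    def call_vector_search(incoming_headers: dict, query: str) -> dict:
        """Call a downstream service (hypothetical URL) while keeping the trace intact."""
        response = requests.post(
            "http://vector-search.ai.svc:8080/search",  # hypothetical downstream service
            json={"query": query},
            headers=forward_trace_context(incoming_headers),
            timeout=5,
        )
        response.raise_for_status()
        return response.json()
    ```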

    Figure 6: When configured properly, a service mesh delivers monitoring and observability out of the box. (Created by Maarten Vandeperre, licensed under Apache 2.0.)

    Forcing traffic through safe paths

    AI security is a serious matter. Models trained on proprietary or personal data must be handled with care. You don’t want every service in your cluster to directly call the model. In fact, you might want to enforce architectural rules, like requiring all AI calls to pass through a centralized API layer where input validation, logging, and business rules live.

    A service mesh can enforce this communication topology at the network layer (Figure 7). (You could design an alternative architecture around Kafka, but that is beyond the scope of this article.)

    Suppose your public web frontend is compromised. If it can call the model directly, an attacker might be able to jailbreak the prompt or extract embeddings. But if your mesh is configured so that only the API (gateway) can talk to the model, and only over mTLS, then the attack surface is significantly reduced. No bypasses. No shortcuts.

    In other words, the service mesh becomes your contract enforcer, not just a router.

    Figure 7: A service mesh can enforce secure communication topology at the network layer. (Created by Maarten Vandeperre, licensed under Apache 2.0.)

    Building a network that heals

    In production, things will go wrong. Models will time out, APIs will crash, and nodes will reboot.

    Service mesh gives you resilience out of the box. If the model is slow or unavailable, you can automatically trigger retries or timeouts. If it fails repeatedly, the mesh can open a circuit breaker (i.e., stopping traffic to the faulty service and protecting the rest of your system from cascading failures), as shown in Figure 8.

    For example, suppose you’ve integrated an external ML API to enrich your user profiles and the service suddenly starts timing out; without a mesh, your app might hang or crash.

    With a mesh, you can detect this quickly and switch to a fallback path, log the event, and alert your team without a developer touching the code. This kind of graceful degradation is crucial in environments where user experience and uptime are non-negotiable.
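
    For contrast, here is a minimal sketch of what that behavior looks like when hand-rolled in application code: a timeout, a retry, and a simple circuit breaker that switches to a fallback once the external API keeps failing. This is exactly the plumbing the mesh lets you declare centrally instead of writing and maintaining per service; the URL and thresholds below are illustrative assumptions.

    ```python
    import time
    import requests  # assumes the requests package is installed

    FAILURE_THRESHOLD = 3    # consecutive failures before the circuit opens
    COOL_DOWN_SECONDS = 30   # how long to keep the circuit open before trying again
    _failures = 0
    _circuit_open_until = 0.0

    def enrich_profile(user_id: str) -> dict:
        """Call an external ML enrichment API with a timeout, one retry, and a circuit breaker."""
        global _failures, _circuit_open_until

        if time.monotonic() < _circuit_open_until:
            return {"user_id": user_id, "enriched": False}  # circuit open: fall back immediately

        for _ in range(2):  # initial attempt plus one retry
            try:
                response = requests.get(
                    f"https://ml-enrichment.example.com/profiles/{user_id}",  # hypothetical API
                    timeout=2,
                )
                response.raise_for_status()
                _failures = 0
                return response.json()
            except requests.RequestException:
                _failures += 1

        if _failures >= FAILURE_THRESHOLD:
            _circuit_open_until = time.monotonic() + COOL_DOWN_SECONDS  # open the circuit
        return {"user_id": user_id, "enriched": False}  # fallback: degrade gracefully
    ```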

    Figure 8: When configured properly, a service mesh gives you resilience out of the box. (Created by Maarten Vandeperre, licensed under Apache 2.0.)

    Security without custom code

    One of the more under-appreciated benefits of service mesh is its ability to enforce zero-trust principles automatically.

    Every pod-to-pod communication is encrypted using mutual TLS, which means both the sender and receiver are authenticated, and the traffic is encrypted. You don’t have to write any TLS logic in your code. The mesh handles certificate issuance, rotation, and revocation behind the scenes, reducing the cognitive load on developers (Figure 9).

    In regulated industries like healthcare or finance, this can mean the difference between compliance and audit failure. In a world where LLMs are increasingly trained on or exposed to sensitive data, this level of encryption is essential.

    Figure 9: Every pod-to-pod communication can be encrypted using mutual TLS. (Created by Maarten Vandeperre, licensed under Apache 2.0.)

    Keep infrastructure out of the code

    As developers, our job is to build applications, not to wrangle retries, TLS certs, failover logic, or observability plumbing.

    With a service mesh, infrastructure concerns are moved to a centralized and standardized configuration, which GitOps approaches can handle (Figure 10). Developers can rely on platform engineers to set policies, enforce them centrally, and apply them consistently across services. No more reinventing the wheel, and no more inconsistencies between teams.

    This separation of concerns improves maintainability, speeds up onboarding, and reduces burnout.

    Figure 10: All application configuration and infrastructure definitions are described in configuration files within a version control system such as Git. (Created by Maarten Vandeperre, licensed under Apache 2.0.)

    Service mesh is the hero of AI in production

    AI isn’t just about training clever models. It’s about running them responsibly at scale. That includes security, reliability, testing, and performance. A service mesh delivers all of this by acting as a transparent layer of intelligence between your services. It offers the guardrails, observability, and deployment flexibility to turn fragile experiments into robust products.

    So the next time you’re preparing to deploy a model, ask the following questions: Is it accurate? Is it secure, observable, and resilient? Is it enterprise-grade? Chances are, the answer starts with a service mesh.

    Check out our article, How Kafka improves agentic AI.

     
