Service-to-service communication becomes hard to govern once a system has many independently deployed services.
Each service needs discovery, identity, encryption, authorization, timeouts, retries, metrics, and trace context. Implementing all of that inside every codebase creates duplicated logic and uneven behavior across languages and teams.
A service mesh moves a large part of that networking responsibility into the platform.
Applications still make normal HTTP, gRPC, or TCP calls. The mesh applies policy, records telemetry, and forwards traffic through a managed data path.
The goal is to govern east-west traffic consistently without reimplementing the same networking, security, and telemetry behavior in every service.
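As a rough illustration of that managed data path, the sketch below models a proxy that sits in front of an unmodified application handler, applies a caller allow-list, records one telemetry entry per request, and forwards permitted calls. `SidecarProxy`, `orders_app`, and the service names are hypothetical names invented for this example. It is a toy in-process model, not how any particular mesh is implemented.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set, Tuple

# Hypothetical type for illustration: an application handler maps a path to a body.
Handler = Callable[[str], str]


@dataclass
class SidecarProxy:
    """Toy stand-in for a mesh proxy placed in front of one workload."""
    service: str
    upstream: Handler                 # the application's own handler, unchanged
    allowed_callers: Set[str]         # policy supplied by the platform (assumed shape)
    request_log: List[Tuple[str, str, int]] = field(default_factory=list)

    def handle(self, caller: str, path: str) -> Tuple[int, str]:
        # 1. Apply policy before the application sees the request.
        if caller not in self.allowed_callers:
            self.request_log.append((caller, path, 403))
            return 403, "denied by mesh policy"
        # 2. Forward to the unmodified application handler.
        body = self.upstream(path)
        # 3. Record telemetry for the completed call.
        self.request_log.append((caller, path, 200))
        return 200, body


def orders_app(path: str) -> str:
    # The application contains no authorization or metrics code.
    return f"orders response for {path}"


proxy = SidecarProxy("orders", orders_app, allowed_callers={"checkout"})
print(proxy.handle("checkout", "/orders/42"))
print(proxy.handle("analytics", "/orders/42"))
print(proxy.request_log)
```

The application function stays a plain handler. The allow-list and the request log live entirely in the wrapper, which is the separation the mesh aims for.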
A service mesh is an infrastructure layer for managing communication between workloads.
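It usually has two parts: a data plane of proxies that sit on the request path and handle traffic for each workload, and a control plane that distributes configuration, policy, and certificates to those proxies.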
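The mesh can provide service discovery, workload identity and certificate rotation, encryption in transit, authorization policy, timeouts and retries, controlled traffic rollouts, and uniform metrics and traces.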
The mesh does not remove the need for good application behavior. Services still need correct timeouts, idempotency, fallback semantics, schema compatibility, and domain-level authorization.
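Idempotency is a good example of behavior the mesh cannot supply. If a proxy retries a request after a timeout, the handler may see the same operation twice. The sketch below shows one common way to stay correct, assuming the caller sends a stable idempotency key with each logical request. `PaymentService`, `charge`, and the key format are hypothetical names for illustration.

```python
from typing import Dict


class PaymentService:
    """Toy handler that stays correct when the same request arrives twice."""

    def __init__(self) -> None:
        self.balance = 100
        self._seen: Dict[str, int] = {}   # idempotency key -> stored result

    def charge(self, idempotency_key: str, amount: int) -> int:
        # A retried request carries the same key, so return the stored result
        # instead of charging a second time.
        if idempotency_key in self._seen:
            return self._seen[idempotency_key]
        self.balance -= amount
        self._seen[idempotency_key] = self.balance
        return self.balance


svc = PaymentService()
first = svc.charge("req-1", 30)
retry = svc.charge("req-1", 30)   # e.g. a proxy retried after a timeout
print(first, retry, svc.balance)  # prints: 70 70 70
```

A production version would need durable storage and expiry for the keys. The point here is only that the deduplication decision belongs to the application, because only it knows what counts as the same operation.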
Without a mesh, each application owns its own networking behavior.
That model works for small systems. It becomes expensive when every language needs the same security policy, telemetry format, retry behavior, certificate rotation, and traffic rollout mechanism.
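To make the duplication concrete, here is a minimal sketch of the kind of retry helper each service might carry when there is no mesh. The function name, defaults, and the choice of `ConnectionError` as the retryable failure are assumptions for this example. Every team and language would need an equivalent, and each copy can drift in attempt counts, backoff, and which errors it retries.

```python
import time
from typing import Callable, Dict, TypeVar

T = TypeVar("T")


def call_with_retries(fn: Callable[[], T], attempts: int = 3,
                      base_delay: float = 0.01) -> T:
    """Call fn, retrying on ConnectionError with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fn()
        except ConnectionError:
            # Give up after the final attempt and surface the error.
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))
    raise AssertionError("unreachable")


# Hypothetical upstream that fails twice before succeeding.
calls: Dict[str, int] = {"count": 0}


def flaky_inventory_lookup() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("upstream reset")
    return "in stock"


result = call_with_retries(flaky_inventory_lookup)
print(result, calls["count"])  # prints: in stock 3
```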
With a mesh, platform policy moves closer to the network path.
The result is a consistent enforcement point for concerns that should not be reimplemented differently in every service.
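One way to picture a single enforcement point is a traffic rollout rule that is defined once and evaluated the same way wherever it runs. The sketch below assumes a hypothetical `RolloutPolicy` with a canary percentage and a hypothetical `pick_version` helper that buckets requests by a stable hash of the request ID. Real meshes express this through their own routing configuration, so treat this only as an illustration of the idea.

```python
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class RolloutPolicy:
    """Hypothetical platform-owned rule: percent of traffic sent to the canary."""
    service: str
    canary_percent: int  # expected range 0..100


def pick_version(policy: RolloutPolicy, request_id: str) -> str:
    # crc32 is stable across processes, so the same request ID lands in the
    # same bucket no matter which proxy evaluates the policy.
    bucket = zlib.crc32(request_id.encode("utf-8")) % 100
    return "canary" if bucket < policy.canary_percent else "stable"


policy = RolloutPolicy(service="orders", canary_percent=10)
versions = [pick_version(policy, f"req-{i}") for i in range(1000)]
print(versions.count("canary"), "of 1000 requests routed to canary")
```

Because the rule is data rather than code inside each service, changing the canary percentage does not require redeploying any application.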