As networks, services and operations shift to microservice architectures, operations engineers are concerned about how network and service performance will hold up at scale.
How will microservice architectures scale?
Telecoms networks tend to push scale and performance requirements to the extreme. This can make it difficult to marry the microservice architectures that underpin clouds to the operational and performance needs of telecoms infrastructure. Where the IT infrastructure ends and the network begins is a live question today.
At the most atomic level, once networks, services and operations move to microservice architectures, they become separate environments that must interact with each other through messaging, orchestration or both.
There are plenty of opportunities for things to go wrong from a telecoms services perspective, even when the microservices themselves are functioning. The underlying IT infrastructure can also fail, which in turn affects everything above it – including, for example, the on-demand availability of 5G core network functions at enterprise scale. Underlying cloud infrastructure is becoming part of the service model, which means its performance – and the performance of its microservices architecture under load – now matters to CSP network operations and engineering teams.
One challenge, says TM Forum CTO George Glass, is that resource allocation and prioritization become very complex when networks move to microservices, particularly because telecoms services must meet varying service level agreements (SLAs) and key performance indicators (KPIs). “I need to work out many steps in the background about whether the resources are there, who gets them, how that is prioritized in case of conflict, and what happens if something fails,” says Glass. A best-effort-only approach will not work.
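To make Glass's point concrete, the following sketch shows one simple way an allocator might decide who gets scarce capacity when SLA tiers conflict. It is a hypothetical illustration, not a TM Forum or CSP implementation: the `SliceRequest` type, the capacity units and the priority scale are all assumptions, and a greedy pass in priority order is only one of many possible policies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SliceRequest:
    """A hypothetical request for capacity from a service with an SLA tier."""
    service: str
    units: int     # capacity units requested (illustrative unit)
    priority: int  # lower number = stricter SLA, served first


def allocate(capacity: int, requests: list[SliceRequest]) -> tuple[list[str], list[str]]:
    """Grant requests in SLA-priority order until capacity runs out.

    Returns (granted, rejected) service names. Rejections are reported
    explicitly so the caller decides what happens next, instead of every
    service silently getting a degraded best-effort share.
    """
    granted: list[str] = []
    rejected: list[str] = []
    remaining = capacity
    # sorted() is stable, so requests with equal priority keep arrival order.
    for req in sorted(requests, key=lambda r: r.priority):
        if req.units <= remaining:
            remaining -= req.units
            granted.append(req.service)
        else:
            # Too big for what is left; smaller lower-priority requests may still fit.
            rejected.append(req.service)
    return granted, rejected


requests = [
    SliceRequest("video-streaming", units=40, priority=3),
    SliceRequest("factory-urllc", units=50, priority=1),
    SliceRequest("iot-metering", units=30, priority=2),
]
print(allocate(100, requests))  # (['factory-urllc', 'iot-metering'], ['video-streaming'])
```

With 100 units available, the two stricter-SLA services are served and video streaming is explicitly rejected, leaving the caller to handle that case rather than letting every service degrade together.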
Consider orchestrators configured to repeatedly request network slice resources whenever network services respond with “unavailable”. Rather than stopping, they all call more often in the face of an underlying infrastructure failure. Suddenly, this cascading effect looks like a brute-force or denial-of-service attack on the 5G core network. Though this is an extreme example, it’s the kind of failure that may occur if microservices-based networks are not designed with awareness of the services and customers they support.
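A common general-purpose defense against this kind of retry storm is to make callers back off rather than escalate. The sketch below shows capped exponential backoff with full jitter, a widely used technique that the article itself does not prescribe; `ServiceUnavailable` and the call being retried are hypothetical stand-ins for an orchestrator's request for slice resources.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class ServiceUnavailable(Exception):
    """Hypothetical error raised when a network service answers "unavailable"."""


def call_with_backoff(
    call: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `call` using capped exponential backoff with full jitter.

    After the n-th failure the caller waits a random time in
    [0, min(max_delay, base_delay * 2**(n - 1))], so callers that failed
    together do not retry in lockstep. Once max_attempts is reached, the
    error is re-raised instead of retrying forever.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    failures = 0
    while True:
        try:
            return call()
        except ServiceUnavailable:
            failures += 1
            if failures >= max_attempts:
                raise  # surface the outage rather than adding more load
            cap = min(max_delay, base_delay * 2 ** (failures - 1))
            sleep(random.uniform(0, cap))
```

Randomizing the wait spreads retries from many orchestrators over time, and the attempt cap means a persistent infrastructure failure is surfaced instead of amplified. Production systems often pair this with a circuit breaker that stops calls entirely for a cooling-off period.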
As it turns out, orchestration may not provide the complete answer to governing cloud and telecoms complexity together. “One thing is certain: you will fail in any scaling strategy if you think of the design as an orchestration,” says Maria Eugenia Armijo-Marchant, Content Platform Expert in the Office of the CTO at Telecom Argentina. In Armijo-Marchant’s experience, microservices must be designed to be as autonomous as possible, rather than “thinking that you can rely on one service to keep track of or conduct a bunch of other microservices” as an orchestrator might.
Instead, Armijo-Marchant argues for an event-based approach in which each microservice “is not keeping track of what others do” but instead monitors queues so it can do its job when needed.
She explains that events can serve as the building blocks of a process: any task starts with an event, or “a task in a counter”, and finishes with events, whether failures or successes. In this approach, microservice messages are treated as events rather than as a kind of end-to-end signaling mechanism. If each message is treated as an event, there is no “need to keep track of some ‘result’ of the event” and “each microservice is just reacting – the most atomic action you can think of”, Armijo-Marchant says.
This event-centric approach also allows a CSP to design for failure. In a microservice environment any component could fail, making it necessary to “design for failure by assuming you cannot rely on the availability of any member to be safe”, Armijo-Marchant says.
By using an event-based approach, each microservice has only a few tasks to perform: “one task per event”, says Armijo-Marchant. In performing its tasks, a microservice may in turn produce one or more events. Everything, including a failure, is an event that can be reacted to immediately.
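The event-driven pattern Armijo-Marchant describes can be sketched in a few lines. This is an illustrative example only, assuming an in-process `queue.Queue` in place of a real message broker; the event names, the `Event` type and the `activate_slice` handler are hypothetical. The handler does one task per event, keeps no record of what other services do, and reports both success and failure as new events for others to react to.

```python
import queue
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Event:
    """An event on a queue: a type name plus a payload (illustrative schema)."""
    kind: str
    payload: dict = field(default_factory=dict)


def activate_slice(event: Event) -> list[Event]:
    """Hypothetical task: one task per event, with the outcome emitted as events."""
    slice_id = event.payload.get("slice_id")
    if not slice_id:
        return [Event("slice.activation_failed", {"reason": "missing slice_id"})]
    # ... the real activation work would happen here ...
    return [Event("slice.activated", {"slice_id": slice_id})]


# This service only reacts to the event types it subscribes to.
HANDLERS: dict[str, Callable[[Event], list[Event]]] = {
    "slice.activation_requested": activate_slice,
}


def process_next(inbox: queue.Queue, outbox: queue.Queue) -> bool:
    """React to one queued event; return False when there is nothing to do."""
    try:
        event = inbox.get_nowait()
    except queue.Empty:
        return False
    handler = HANDLERS.get(event.kind)
    if handler is not None:
        try:
            results = handler(event)
        except Exception as exc:
            # Even an unexpected crash becomes an event others can react to.
            results = [Event(f"{event.kind}.error", {"error": str(exc)})]
        for result in results:
            outbox.put(result)
    inbox.task_done()
    return True


inbox: queue.Queue = queue.Queue()
outbox: queue.Queue = queue.Queue()
inbox.put(Event("slice.activation_requested", {"slice_id": "s-42"}))
inbox.put(Event("slice.activation_requested", {}))
while process_next(inbox, outbox):
    pass
```

In a real deployment the outbox would be a broker topic that other microservices monitor, so a notification service, for example, could react to `slice.activated` without the activation service knowing it exists.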
This is the approach services like Netflix and Spotify have proven, Armijo-Marchant argues. She adds, however, that “one main principle for any microservice is to make its own health transparent and available” and if something goes wrong “just replace it”.
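Making health transparent and replacing rather than repairing can likewise be sketched. In the hypothetical example below, each instance publishes a heartbeat and a reconciler swaps out any instance whose heartbeat is older than a timeout; the `Instance` type, the `spawn` function and the 10-second timeout are assumptions for illustration. Container platforms offer built-in equivalents, such as Kubernetes liveness probes, which restart containers that fail their health checks.

```python
import itertools
from dataclasses import dataclass

_instance_ids = itertools.count(1)


@dataclass
class Instance:
    """A hypothetical microservice instance that publishes its own health."""
    name: str
    last_heartbeat: float

    def heartbeat(self, now: float) -> None:
        # The instance makes its health visible by reporting "still alive".
        self.last_heartbeat = now

    def is_healthy(self, now: float, timeout: float) -> bool:
        return now - self.last_heartbeat <= timeout


def spawn(now: float) -> Instance:
    """Start a fresh, hypothetical instance (a stand-in for launching a container)."""
    return Instance(name=f"instance-{next(_instance_ids)}", last_heartbeat=now)


def reconcile(pool: list[Instance], now: float, timeout: float = 10.0) -> list[Instance]:
    """Keep healthy instances; replace (never repair) any with a stale heartbeat."""
    return [inst if inst.is_healthy(now, timeout) else spawn(now) for inst in pool]


pool = [spawn(now=0.0), spawn(now=0.0)]
pool[0].heartbeat(now=12.0)  # only the first instance keeps reporting
new_pool = reconcile(pool, now=15.0)
```

Here the first instance is kept because it reported recently, while the silent second instance is simply replaced with a fresh one.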
Look out for our upcoming report on how CSPs are using microservices across their businesses and taking on design and scale challenges.