Dictionary definitions of recovery suggest many things, from salvage to sports. The latter always gives good value for analogies, don’t you think? How about a recovery stroke in golf; playing from the rough to the fairway (or a bunker to the green) – hands up if you’re familiar with that one.
Recovery also covers recuperation (as in convalescence) in addition to repossession and retrieval. In disaster planning – getting closer to technology and SIP call recovery – recovery means the steps to be taken to return all operations and systems to their normal status. In electronic commerce – getting closer still – it’s the ability of a system to be restored so that processing can resume and transactions, aborted due to a failure, can be resubmitted.
Recovery is also the process of regaining possession or control of something that’s been lost, like our SIP call. But has it been lost? Has control over the call been lost? Or is it like Dan Quayle saying, “This President is going to lead us out of this recovery”?
Here’s the scenario…
A gateway is used to interconnect emergency calls between a telco’s SS7 (Signalling System No. 7) network and an ESInet (that’s an Emergency Services IP Network according to the National Emergency Number Association – NENA), to which SIP-based PSAPs (public safety answering points) are connected. Calls are made by users of legacy wired and wireless networks via the carrier’s central office switch, and the specialised gateway directs them to SIP-based location and routing elements in the ESInet so that the calls can be delivered to the appropriate PSAP. With an established call in progress, the SS7-to-SIP gateway is in the middle, and a caller is talking to a PSAP call taker.
In our hypothetical setting, sometime during the call, the RTP (Real-time Transport Protocol) voice media gets interrupted – for some reason (it doesn’t matter what). Is the call lost? Has either the gateway or the SIP end point lost control over the call? Can it be recovered?
The call is only lost if one party hangs up. The call taker will not hang up, and NENA’s next-generation i3 specification mandates that the SIP-based equipment receiving the call from the gateway and routing it onwards should not force the call to stop. The caller is likely to hang up if (s)he thinks there’s nobody listening at the other end, but that will happen only after a time measured in seconds and a few panicky attempts at “Hello! Hello! Are you still there?”
To avoid the caller hanging up and exacerbating an already fraught situation by forcing a redial, the gateway has to take responsibility. Its task is to deal with the loss of RTP, between it and the other SIP end point, within milliseconds. It’s important for the gateway to play its part before the expiry of time out #1 (that’s ‘caller time out’ to you). It’s not acceptable for the gateway to wait until the caller hangs up – after all, nobody wants the caller to suffer the ultimate time out, because (s)he wasn’t able to reconnect. It’s vital to ensure connection to the caller is maintained.
But at this precise point, the call hasn’t been lost. It hasn’t failed, nor has it been aborted and, importantly, the gateway is still in control of the call: both the SS7 leg and the SIP leg. Crucially, the outgoing SIP call monitoring functionality in the gateway has detected that the RTP stream received from the remote SIP end point has ceased. The onus is on the gateway to automatically ‘recover’ the call by releasing the failed SIP call leg and establishing a replacement call to an alternative SIP application server. Needless to say, the time that must elapse without receiving an RTP packet from the remote end, before call recovery is invoked, is configurable in the gateway.
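For the curious, here’s roughly what that configurable ‘no RTP received’ check might look like. It’s a minimal sketch in Python, not the gateway’s actual code: the class name, the timeout_ms parameter and the injectable clock are all assumptions made for illustration.

```python
import time


class RtpInactivityMonitor:
    """Tracks how long it has been since RTP last arrived on one call leg.

    Hypothetical illustration only; names and structure are assumptions.
    """

    def __init__(self, timeout_ms, clock=time.monotonic):
        self.timeout_s = timeout_ms / 1000.0  # configurable silence threshold
        self._clock = clock                   # injectable so the logic can be tested
        self._last_packet = clock()           # start timing from call setup

    def on_rtp_packet(self):
        """Call whenever an RTP packet is received from the remote end."""
        self._last_packet = self._clock()

    def media_lost(self):
        """True once no packet has arrived for longer than the threshold."""
        return (self._clock() - self._last_packet) > self.timeout_s
```

The gateway would call on_rtp_packet() for every packet received on the SIP leg and poll media_lost() on a short timer; a True result is the cue to invoke call recovery.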
In a well-engineered implementation, the IP-SR (IP-based selective router) or ACD (automatic call distribution) equipment receiving SIP calls from the gateway will have multiple elements. That will be for reasons of both redundancy and scalability (i.e., capacity). Those additional servers present the gateway with one (or more) ‘secondary’ SIP servers to which to direct the ‘recovery’ call. A SIP header in the new INVITE (it’s not a re-INVITE), built from the original call attributes, enables the alternative server to associate the two calls and route the ‘recovery’ call back to the original call taker – within time out #1.
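To make the correlation idea concrete, here’s a small Python sketch. The article doesn’t name the header, so X-Original-Call-ID below is purely a made-up placeholder, as are the function names and the dictionary of active calls; a real deployment would use whatever header the gateway and server vendors have agreed on.

```python
import uuid

# Hypothetical header name, invented for this sketch (not from any specification).
ORIGINAL_CALL_HEADER = "X-Original-Call-ID"


def build_recovery_invite_headers(original_call_id):
    """Headers for a brand-new INVITE (new Call-ID) that points back at the old call."""
    return {
        "Call-ID": uuid.uuid4().hex,             # new dialog, so not a re-INVITE
        ORIGINAL_CALL_HEADER: original_call_id,  # lets the server link the two calls
    }


def find_call_taker(headers, active_calls):
    """Server side: look up who was handling the original call, if anyone."""
    return active_calls.get(headers.get(ORIGINAL_CALL_HEADER))
```

The point of the design is that the recovery call is a fresh call in SIP terms, yet it carries enough of the original call’s identity for the alternative server to hand it to the same call taker.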
If there is more than one alternative SIP server, the gateway is able – a bit like Robert the Bruce and his legendary spider – to try, and try again, until it succeeds with the ‘recovery’ call. If we overturned and misappropriated Dollo’s Law – in defiance of nature – we’d say the gateway was able to automatically retrieve and return (restore or re-establish) our SIP call to its previous status. That may not be as good as recovering lost youth, but in terms of preserving the quality of life, it’s infinitely preferable and, unlike some plastic surgery, it’s affordable.
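The spider’s persistence can be sketched in a few lines of Python. This is an illustration under assumptions, not the gateway’s real logic: place_call is a hypothetical callable that sends the recovery INVITE to one server and returns True if the call is answered, and the server names in the usage note are invented.

```python
def recover_call(servers, place_call):
    """Try each alternative SIP server in order; return the first that accepts.

    place_call(server) is a hypothetical callable supplied by the caller.
    Returns None if every server refuses or is unreachable.
    """
    for server in servers:
        try:
            if place_call(server):
                return server
        except OSError:
            # Network-level failure reaching this server: try the next one.
            continue
    return None
```

Called as recover_call(["sip-b.example", "sip-c.example"], place_call), it works through the list until one server takes the call. In practice the whole loop would have to finish inside time out #1, so each attempt would need its own short deadline.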