17 Nov
Zscaler and Microsoft Teams Media Optimisation
I want to say up front that this is largely an opinion piece, based on what I’ve seen in the field with Microsoft Teams and Zscaler. I’d love to see a Zscaler whitepaper covering the expected media establishment behaviour for Teams when Zscaler Internet Access is deployed, but in all the years I’ve worked with the product I’ve found very little information from Zscaler on how Zscaler is expected to behave with Teams media call flows specifically.
The crux of this op-ed is based entirely on what I’ve observed while troubleshooting: Zscaler introduces sub-optimal behaviour into Teams call flows, specifically when establishing media, that does not take into account the default behaviour of the Teams VOIP stack. That stack relies on the Interactive Connectivity Establishment (ICE) protocol, which determines the most suitable path and candidate for establishing media with the called party.
Whilst Zscaler provide support for Skype for Business Online (SfBO), and it’s possible SfBO is affected by the same issue, Teams is a very different proposition because: A) SfBO has been deprecated and replaced by Teams, and in time Teams will have a far greater footprint than SfBO for voice workloads; B) VOIP is a time-sensitive workload that cannot be treated the same as any other Microsoft 365 workload; and C) with the move to the Cloud, a quality voice experience in Teams is essential to adoption. Teams call flows and the default behaviour of its media stack must therefore be taken into account when designing any solution, all the more so if Enterprise Voice/PSTN is involved.
Secure Web Gateway (SWG) solutions have become a necessary part of securing Cloud applications. The network no longer represents the security boundary or perimeter: application workloads are now delivered from the Cloud rather than from your datacentre, so the security boundary has moved to the Cloud as well, rendering the traditional network perimeter redundant. It also means user identity is now a core part of the security boundary. Zscaler helps complete the Cloud security story by securing the internet perimeter with a ‘Zero Trust’ security model, augmenting the native capabilities of the M365 security stack, which are largely focused on identity and endpoint security.
Zscaler provides deep integration with Microsoft 365. To deliver traffic into Microsoft 365 as quickly as possible, Zscaler peers with Microsoft 365 datacentres in a number of locations globally, usually with round-trip times of only a few milliseconds.
This does imply there may be some coverage gaps, as Zscaler doesn’t have a presence in or near every Microsoft datacentre that Teams media services are delivered from.
As a rule Microsoft prefer users to connect to the Internet via the shortest path possible. This generally requires local internet egress and localised DNS resolution. Any internet proxy in the network path needs to be configured to bypass M365 traffic so that A) the traffic isn’t subject to inspection, and B) traffic isn’t back-hauled via a central egress point that may not be in the same geography as the user. Traffic then hits the Azure ‘Front Door’ or ‘POP’ nearest to the user, ensuring the best quality of service and experience because traffic stays within region. Zscaler encourages and supports this direct internet connectivity model, given how widely Azure CDN POP locations are distributed around the globe.
M365 connectivity guidance can be found here. It covers a number of reasons why proxies should be avoided for M365 workloads. If bypassing a proxy is not possible, Microsoft also provide guidance here on the implications for quality of experience and the additional proxy configuration required to allow UDP traffic.
Allowing UDP is an important consideration, even more so for VOIP, as I will expand on later in this post.
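To make the bypass idea concrete, here is a minimal Python sketch of the routing decision a PAC file or firewall policy would encode: Teams media over UDP goes direct, everything else goes via the proxy. The IPv4 ranges and UDP ports 3478 to 3481 are the Teams media entries from Microsoft’s published Microsoft 365 endpoint list as I understand them at the time of writing; treat them as an assumption and check the current list, which Microsoft updates regularly. The function name `route_for` is hypothetical.

```python
import ipaddress

# Teams media ("Optimize") IPv4 ranges and UDP ports as published by Microsoft
# at the time of writing. Assumption: verify against the current endpoint list.
TEAMS_MEDIA_NETWORKS = [
    ipaddress.ip_network("13.107.64.0/18"),
    ipaddress.ip_network("52.112.0.0/14"),
    ipaddress.ip_network("52.122.0.0/15"),
]
TEAMS_MEDIA_UDP_PORTS = range(3478, 3482)  # 3478 to 3481 inclusive


def route_for(dest_ip: str, dest_port: int, protocol: str) -> str:
    """Return 'DIRECT' for Teams media that should bypass the proxy, else 'PROXY'."""
    addr = ipaddress.ip_address(dest_ip)
    # IPv6 addresses never match an IPv4 network, so they fall through to PROXY.
    is_teams_media = (
        protocol.lower() == "udp"
        and dest_port in TEAMS_MEDIA_UDP_PORTS
        and any(addr in net for net in TEAMS_MEDIA_NETWORKS)
    )
    return "DIRECT" if is_teams_media else "PROXY"


print(route_for("52.113.10.20", 3478, "udp"))  # DIRECT
print(route_for("52.113.10.20", 443, "tcp"))   # PROXY
```

In practice this decision lives in a PAC file, proxy bypass list or firewall policy rather than in application code; the sketch just shows how narrow the bypass needs to be.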
Teams media default behaviour
Media establishment in the VOIP world is somewhat analogous to the OSPF routing protocol, in that media expects to take the “Open Shortest Path First”. Call flows with multiple lengthy network hops can introduce delay (latency) and jitter into a call, both of which have a direct impact on the user. Teams operates on the same premise and during call setup will always attempt to establish media via the shortest network path. This means media should flow point to point where possible, and a Cloud Teams media relay is always the last resort. This is by design in the ICE protocol: a list of candidate IP addresses is returned to the caller’s client, and ICE then selects the best candidate to connect to. ICE also uses NAT traversal techniques to ensure the call can be established through firewalls and routers. Essentially, ICE will try every available connectivity method and transport protocol to establish media, and if media cannot be established the call will fail.
It’s important to note that media can be established in a variety of ways, depending on the type of call, whether the user is on-net or off-net, and whether any network devices are preventing media from being established with a preferred candidate.
To illustrate this, I’ve detailed Teams P2P and multi-party media behaviour, and how media is expected to flow when ICE is unimpeded by proxies, firewalls or any other network devices in the call flow.
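The fallback behaviour described above can be sketched in a few lines of Python. This is a deliberately simplified model, not the Teams implementation: `Candidate`, `select_media_path` and the `is_reachable` callback are hypothetical names, the candidate ordering follows the one used in this post, and real ICE (RFC 8445) runs connectivity checks on candidate pairs, paces them, and learns peer-reflexive candidates during the checks themselves. What it does show is the shape of the logic: UDP on every candidate before TCP, the most direct candidate first, the relay last, and a failed call if nothing answers.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Simplified preference order from this post: most direct first, relay last.
TYPE_ORDER = {"host": 0, "prflx": 1, "srflx": 2, "relay": 3}
TRANSPORT_ORDER = ("udp", "tcp")  # TCP is only tried once UDP has failed everywhere


@dataclass(frozen=True)
class Candidate:
    kind: str     # "host", "prflx", "srflx" or "relay"
    address: str  # illustrative address string


def select_media_path(
    candidates: list[Candidate],
    is_reachable: Callable[[Candidate, str], bool],
) -> Optional[tuple[Candidate, str]]:
    """Return the first (candidate, transport) that passes a connectivity check.

    is_reachable is a hypothetical stand-in for a real connectivity check.
    Returns None when nothing works, i.e. the call fails.
    """
    ordered = sorted(candidates, key=lambda c: TYPE_ORDER[c.kind])
    for transport in TRANSPORT_ORDER:
        for cand in ordered:
            if is_reachable(cand, transport):
                return cand, transport
    return None


# Example: UDP is blocked by a proxy, and only the relay is reachable over TCP.
cands = [Candidate("relay", "relay.example"), Candidate("host", "10.0.0.20")]
print(select_media_path(cands, lambda c, t: t == "tcp" and c.kind == "relay"))
```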
P2P On-Net user
As both users are on the LAN and a network path exists between them, media is established point to point. The Teams client signals up to the Teams service in the Cloud over the internet, Teams then provides the candidate list to “Teams User A”, and because the ‘host’ candidate (Teams User B’s IP address) is always the preferred candidate in any Teams call, media is established with the host candidate.
Note: Teams will always try to send media over UDP, as UDP is a connectionless protocol that offers a better user experience than TCP for real-time traffic. TCP is only used if UDP is unavailable, e.g. when UDP is blocked by an upstream network device such as a proxy or firewall.
P2P Off-Net user
Both users are off-net, so no direct connectivity exists between them to establish media. The most direct path is still P2P though, so Teams selects the ‘Peer Reflexive’ candidate, which is essentially the NAT client IP address, and media is then established through NAT via each user’s firewall/router.
P2P On or Off-Net user using media relay
Teams User B is behind a firewall, so media cannot be established over the WAN directly to User B. The Teams Transport Relay is therefore used to establish the call. In this scenario the ‘Relay’ or ‘Server Reflexive’ candidate is used to establish media via the internet, leveraging STUN or TURN. STUN and TURN are NAT traversal techniques: STUN was originally known as “Simple Traversal of UDP through NAT” (now “Session Traversal Utilities for NAT”), and TURN stands for “Traversal Using Relays around NAT”.
The Teams Transport Relay is analogous to the Skype for Business Edge server role, which also provides media relay functionality for Skype for Business users connecting to a remote called party, e.g. an off-net user. Transport Relays can be used whether a user is on-net or off-net; their use is largely determined by a lack of direct connectivity to the called party.
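For the NAT traversal piece, the sketch below shows how a client learns its public (server-reflexive) address with STUN as defined in RFC 5389: it sends a 20-byte Binding Request and decodes the XOR-MAPPED-ADDRESS attribute from the success response. It only builds and parses bytes (no socket I/O), handles IPv4 only, and the function names are hypothetical; whether a Teams Transport Relay returns exactly this attribute set is an assumption on my part.

```python
import os
import socket
import struct
from typing import Optional

MAGIC_COOKIE = 0x2112A442  # fixed value defined by RFC 5389
BINDING_REQUEST = 0x0001
BINDING_SUCCESS = 0x0101
XOR_MAPPED_ADDRESS = 0x0020
FAMILY_IPV4 = 0x01


def build_binding_request() -> tuple[bytes, bytes]:
    """Return (message, transaction_id) for an attribute-less Binding Request."""
    txn_id = os.urandom(12)
    header = struct.pack("!HHI", BINDING_REQUEST, 0, MAGIC_COOKIE)
    return header + txn_id, txn_id


def parse_xor_mapped_address(msg: bytes, txn_id: bytes) -> Optional[tuple[str, int]]:
    """Return the (ip, port) the STUN server saw, i.e. the public NAT mapping."""
    if len(msg) < 20:
        return None
    msg_type, length, cookie = struct.unpack("!HHI", msg[:8])
    if msg_type != BINDING_SUCCESS or cookie != MAGIC_COOKIE or msg[8:20] != txn_id:
        return None
    offset, end = 20, min(20 + length, len(msg))
    while offset + 4 <= end:
        attr_type, attr_len = struct.unpack("!HH", msg[offset:offset + 4])
        value = msg[offset + 4:offset + 4 + attr_len]
        if attr_type == XOR_MAPPED_ADDRESS and len(value) >= 8 and value[1] == FAMILY_IPV4:
            # Port is XORed with the top 16 bits of the cookie, the IPv4 address with all 32.
            port = struct.unpack("!H", value[2:4])[0] ^ (MAGIC_COOKIE >> 16)
            addr = struct.unpack("!I", value[4:8])[0] ^ MAGIC_COOKIE
            return socket.inet_ntoa(struct.pack("!I", addr)), port
        offset += 4 + ((attr_len + 3) & ~3)  # attribute values are padded to 4 bytes
    return None
```

In a real client the request would be sent from the same UDP socket that will later carry media, because the NAT mapping being discovered belongs to that source address and port.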
Multi-Party On or Off-Net user
As Teams is in the Cloud, all users, regardless of location, have to traverse the internet for the media stream delivered from the MCU to their Teams client.
As the scenarios above illustrate, the order of candidate preference is always:
- Host Candidate
- Peer Reflexive
- Server Reflexive
- Relay
(The first two, host and peer reflexive, are direct connections between the A and B parties; the latter two go via a media or transport relay.)
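This ordering lines up with the type preferences RFC 8445 recommends for ICE: 126 for host, 110 for peer reflexive, 100 for server reflexive and 0 for relayed candidates. The Python sketch below applies the RFC’s candidate-priority and candidate-pair-priority formulas; the local preference of 65535 and component ID 1 are assumed defaults, and I have not confirmed that Teams uses these exact values.

```python
# Recommended ICE type preferences from RFC 8445.
TYPE_PREFERENCE = {"host": 126, "prflx": 110, "srflx": 100, "relay": 0}


def candidate_priority(kind: str, local_pref: int = 65535, component_id: int = 1) -> int:
    """priority = 2^24 * type preference + 2^8 * local preference + (256 - component ID)."""
    return (TYPE_PREFERENCE[kind] << 24) + (local_pref << 8) + (256 - component_id)


def pair_priority(controlling: int, controlled: int) -> int:
    """Pair priority = 2^32 * MIN(G, D) + 2 * MAX(G, D) + (1 if G > D else 0)."""
    g, d = controlling, controlled
    return (min(g, d) << 32) + 2 * max(g, d) + (1 if g > d else 0)


for kind in sorted(TYPE_PREFERENCE, key=candidate_priority, reverse=True):
    print(kind, candidate_priority(kind))
# host 2130706431
# prflx 1862270975
# srflx 1694498815
# relay 16777215
```

The pair formula is why a host-to-host pair on the LAN always outranks any pair involving a relay: the smaller of the two candidate priorities dominates the result.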
Likewise, UDP is always preferred over TCP for media in Teams. This matters because “UDP is a protocol optimised for getting data packets to their destination in a timely fashion; it’s designed for real-time services like VoIP where it’s important to keep the data stream going”. If packets have to be retransmitted, as happens with packet loss on a call using TCP, later packets are held back until the retransmission arrives. This can cause unacceptable levels of jitter, and the end user hears garbled or delayed audio in their headset. TCP is designed for reliable, in-order delivery rather than timeliness, so it is poorly suited to real-time traffic.
In almost every case TCP should be avoided for VOIP, as it’s sub-optimal for real-time communication.
Taking into account Teams’ designed behaviour for media establishment brings me to the problem. On a number of deployments I’ve seen Zscaler proxying media. I absolutely expect Zscaler to be involved in call setup, as it is responsible for authorisation to M365 for Teams (although it won’t be inspecting any of the signalling traffic, which is already encrypted with TLS), but I don’t expect media to be pinned via a Zscaler POP and then onwards to Teams. I see no benefit to having Zscaler in the media path, and its impact is particularly worrying for Enterprise Voice deployments, where dial tone is expected to be a reliable service.
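The retransmission problem can be shown with a toy model of head-of-line blocking. The Python sketch below sends one voice frame every 20 ms (an assumed packetisation interval), loses one frame, and compares when each frame reaches the application: over UDP the lost frame is simply skipped, while over TCP the retransmission timer (200 ms here, also an assumption) plus in-order delivery holds back every later frame until the gap is filled. It ignores jitter buffers, congestion control and the fact that TCP carries a byte stream rather than discrete frames; `delivery_times` is a hypothetical name.

```python
from typing import Optional

FRAME_MS = 20  # assumed packetisation interval for a voice codec


def delivery_times(
    n_frames: int,
    one_way_ms: int,
    lost_index: int,
    rto_ms: int,
    reliable: bool,
) -> list[Optional[int]]:
    """Time (ms) each frame reaches the application; None means never delivered."""
    times: list[Optional[int]] = [i * FRAME_MS + one_way_ms for i in range(n_frames)]
    if not reliable:
        # UDP: the lost frame never arrives; later frames are unaffected.
        times[lost_index] = None
        return times
    # TCP: the lost frame is resent rto_ms after its original send time...
    repaired = lost_index * FRAME_MS + rto_ms + one_way_ms
    times[lost_index] = repaired
    # ...and in-order delivery holds every later frame until the gap is filled.
    for i in range(lost_index + 1, n_frames):
        times[i] = max(times[i], repaired)
    return times


print(delivery_times(6, 40, 1, 200, reliable=False))  # [40, None, 80, 100, 120, 140]
print(delivery_times(6, 40, 1, 200, reliable=True))   # [40, 260, 260, 260, 260, 260]
```

In the TCP case frames 2 to 5 arrive in a single burst, 120 to 180 ms later than they would over UDP, which is exactly the kind of delay variation a jitter buffer struggles to hide.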
This scenario is problematic for a few reasons:
- If a Zscaler POP is not near the Teams service, the call path is lengthened, which could introduce additional latency or jitter, contrary to ICE’s designed behaviour.
- The best path for media is always the most direct or shortest path: the Teams client establishing media directly with the called party, and only via a Transport Relay if direct connectivity isn’t possible.
- Media is encrypted using SRTP, so there is no added benefit in inserting Zscaler into the media path.
- VOIP is moving to the Cloud en masse, and Teams supports Direct Routing with Cloud-hosted SBCs. It’s therefore very important that media is optimised as far as possible, especially as QoS is not available over the internet. Any network configuration that introduces additional delay could impact adoption as well as the end user experience.
- Because Zscaler is still effectively proxying media, either through its proxy or its inline firewalling, it is using TCP instead of UDP. This is fine for most M365 workloads but sub-optimal for Teams media specifically.
- Media Bypass (MB) could be impacted. MB is designed to shorten the call path and reduce transcoding or media processing overhead. As media is always pinned at the Zscaler POP, MB may not be possible.
This call flow could become a very popular deployment architecture as more customers move their voice workloads into the Cloud, alongside increasing uptake of Teams and of Zscaler, which is almost mandatory alongside M365. Enterprise Voice is a complicated enough workload, and any impact on voice has to be understood prior to deployment.
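To put rough numbers on the first point, the Python sketch below adds up round-trip time along a path and compares it to a budget. Every figure is hypothetical and chosen purely for illustration, as is the 100 ms round-trip budget; substitute measurements from your own network and the targets in Microsoft’s current Teams network guidance. `Leg`, `path_rtt` and `headroom` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Leg:
    name: str
    rtt_ms: float  # round-trip time contributed by this leg (hypothetical)


def path_rtt(legs: list[Leg]) -> float:
    """Total round-trip time of a path made of consecutive legs."""
    return sum(leg.rtt_ms for leg in legs)


def headroom(legs: list[Leg], budget_ms: float = 100.0) -> float:
    """Remaining RTT budget; negative means the path is over budget."""
    return budget_ms - path_rtt(legs)


# Hypothetical example: local breakout versus media pinned via a distant POP.
direct = [Leg("client to local ISP", 10), Leg("ISP to Microsoft edge", 8)]
via_pop = [
    Leg("client to Zscaler POP in another region", 45),
    Leg("POP to Microsoft edge near the POP", 3),
    Leg("Microsoft backbone back to the Teams media region", 40),
]

for label, legs in (("direct", direct), ("via POP", via_pop)):
    print(f"{label}: {path_rtt(legs)} ms RTT, {headroom(legs)} ms headroom")
```

The detour is paid on every packet for the whole call, and it sits on top of any TCP retransmission effects.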
Some questions come to mind:
- How are Teams Phones using Direct Routing (DR) impacted when they need to access the internet via Zscaler, given they won’t be able to run the ZApp client?
- What’s the impact for VTCs such as Microsoft Teams Rooms (MTRs) or Surface Hubs? Is installing ZApp on them a supported configuration?
- How does Zscaler affect Media Bypass if MB is required?
- Is maintaining session state for established calls possible during a Zscaler outage?
- Is a Zscaler configuration possible that removes Zscaler from media establishment entirely?
Secure Web Gateways are a vital part of any M365 deployment. In my opinion Zscaler is easily the best SWG available, thanks to its deep integration with Microsoft 365, such as with Microsoft Cloud App Security (MCAS), Microsoft’s Cloud App Security Broker (CASB), which helps increase Cloud security. But real-time communication is now also part of the Cloud journey. Hopefully, in time, answers will become available to the many questions I have about Zscaler and Teams media optimisation.