We’ve Got Answers: Upstream Providers and the Reality of SLAs

Hosted by Angelique Medina and Archana Kesavan


Watch on YouTube – The Internet Report – Ep. 23: Sep 7 – Sep 13, 2020

This is The Internet Report, where we uncover what’s working and what’s breaking on the Internet—and why. It was another quiet week on the Internet, so we wanted to spend some time answering your questions around some recent outages. Catch this episode as we discuss how you can understand the upstream relationships of the services you rely on to assess your risk profile. We also cover why SLAs fall short in protecting your business in the event of an outage, and why you need to proactively collaborate with your providers to solve issues faster.

Finally, don’t forget to leave a comment here or on Twitter, tagging @ThousandEyes and using the hashtag #TheInternetReport.

Catch up on past episodes of The Internet Report here.

Listen on Transistor – The Internet Report – Ep. 23: Sep 7 – Sep 13, 2020

ThousandEyes T-shirt Offer

Follow Along with the Transcript

Angelique Medina:

This is The Internet Report, where we uncover what’s working and what’s breaking on the Internet and why. I’m Angelique Medina, and I’m joined by my cohost, Archana Kesavan.

Archana Kesavan:

Hey, guys.

Angelique Medina:

So, last week was a quiet week, so we thought we’d take the opportunity to answer some of the more interesting questions that have come in. One came in through Twitter and one came in on our YouTube channel, so we’re just going to dive right in.

Angelique Medina:

So taking the first one, I think that one was related to the Level 3 outage that happened a couple of weeks back.

Archana Kesavan:

Right.

Angelique Medina:

When we were looking at a few examples of how enterprises had been impacted, one of the things we saw was that one particular company only had one active peer. They did have two peers, but only one was active and the other one was passive. And so someone had asked, “How do I know whether or not an application or a service that I rely on is just using a single provider? Because I want to know what my risk profile is, and if something happens to their provider, are they going to go down?” So let’s maybe dive into some of the ways that you can figure that out.

Archana Kesavan:

Yeah. So I’m just going to share my screen right here. There are a couple of ways that you could do this, and we’ll take the example of the service that we looked at in more depth as a part of the Level 3 outage. It was GoToMeeting, owned by LogMeIn. So, if you go to bgpview.io, you have the option to enter their ASN. That can be the first place you could start. So let’s go back and see what this brings up. This is the ASN of the service that you’re interested in, and from here you can find out how many upstream providers they have and what the peering ecosystem looks like.

Archana Kesavan:

I think the cool thing about this particular service, and this is a free service that we are looking at right now, is you can actually go down and investigate this from…

Angelique Medina:

Based on the prefix. Right?

Archana Kesavan:

So, the exact prefix that we were looking at as a part of the outage two weeks ago was 68.64.14.0/24. And then you click on routing, and you can actually see the upstream providers in there. As we’re doing it today, we see two providers here, GTT and Level 3. And obviously you can use ThousandEyes to understand this as well. The interesting thing is that when we went back to right at the time of the outage, or actually right before the outage, you could see that this particular service had only one upstream, or at least only one active upstream, which was Level 3. But as of today, they have two upstream providers, Level 3 and GTT. And that’s exactly why, on this site, you’re able to see two providers.
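As a rough illustration of what a routing view like this is summarizing: the upstream providers of a service are the ASNs that appear immediately before the origin ASN in the BGP AS paths observed for its prefix. The sketch below assumes you already have a list of AS paths from some source (a route collector or a looking glass, for example). The paths and ASNs are hypothetical values from the documentation range, not data from the outage, and the function names are made up for this example.

```python
from collections import Counter


def upstreams_from_paths(as_paths, origin_asn):
    """Count how often each ASN appears directly before the origin ASN."""
    counts = Counter()
    for path in as_paths:
        # Collapse AS-path prepending (consecutive repeats of the same ASN).
        deduped = [asn for i, asn in enumerate(path) if i == 0 or asn != path[i - 1]]
        # Only consider paths that actually end at the origin we care about.
        if len(deduped) >= 2 and deduped[-1] == origin_asn:
            counts[deduped[-2]] += 1
    return counts


def is_single_homed(upstream_counts):
    """True when every observed path enters the origin through one upstream."""
    return len(upstream_counts) == 1


# Hypothetical AS paths (documentation ASNs), listed from vantage point to origin.
ORIGIN = 64500
paths = [
    [64510, 64501, 64500],
    [64511, 64501, 64500, 64500],  # origin prepended once
    [64509, 64502, 64500],
]

upstreams = upstreams_from_paths(paths, ORIGIN)
for asn, seen in sorted(upstreams.items()):
    print(f"upstream AS{asn} seen on {seen} path(s)")
print("single upstream only:", is_single_homed(upstreams))
```

An upstream that is configured but passive may not show up in any observed path, which is why a point-in-time view can show only one active upstream even when two exist.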

Angelique Medina:

Yeah. So BGPView is one method to look it up. There are other methods and other tools that can be used. I think the nice thing that you can see here is that you also get that historical context, so you can see changes over time. You can understand maybe how they reacted in response to an incident, that there’s clearly some design behind the scenes for this, and you can chart that out. So, that’s a nice thing here.

Archana Kesavan:

Especially if you are the actual provider of the service itself. With this view, you get to go back in time and see how you were performing, and any recovery measures that you put in place are also reflected. But this is really cool. They learned from an unfortunate incident that happened a couple of weeks ago, and it’s really nice to see that now they have two upstream providers in here.

Angelique Medina:

Well, the other thing, too, is certainly if you’re the application provider, you can monitor yourself and you can see when there are issues. But if you rely on a critical service, and this goes back to the question of how you can tell… Yes, you can use this historical view. You can see that, but you can also alert on changes. So if there is suddenly a new peer that’s thrown into the mix, you can get an alert on that. You can check it out and see, “Okay, what’s going on with my provider? What are they doing? Did they lose a peer? Did they gain one? What’s happening?” So that’s also a really nice…
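The idea of alerting on peer changes boils down to comparing two snapshots of a provider’s upstream set. Here is a minimal sketch of that comparison, assuming you collect the snapshots yourself on some schedule. The ASNs are hypothetical documentation values, and `diff_upstreams` and `alert_message` are illustrative names rather than functions from any particular monitoring product.

```python
def diff_upstreams(previous, current):
    """Return which upstream ASNs were gained or lost between two snapshots."""
    previous, current = set(previous), set(current)
    return {
        "gained": sorted(current - previous),
        "lost": sorted(previous - current),
    }


def alert_message(change):
    """Build a short alert string, or return None when nothing changed."""
    if not change["gained"] and not change["lost"]:
        return None
    parts = []
    if change["gained"]:
        parts.append("gained " + ", ".join(f"AS{a}" for a in change["gained"]))
    if change["lost"]:
        parts.append("lost " + ", ".join(f"AS{a}" for a in change["lost"]))
    return "Upstream change detected: " + "; ".join(parts)


# Hypothetical snapshots: one upstream before, two upstreams after.
before = {64501}
after = {64501, 64502}

change = diff_upstreams(before, after)
message = alert_message(change)
if message:
    print(message)
```

In practice the message would go to whatever notification channel you already use, and a lost upstream on a single-homed provider is the case that deserves the most urgent follow-up.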

Archana Kesavan:

Definitely, yeah. That’s totally fair. I think the second question also relates to this whole issue of who you hold accountable, how you recover, and what happens once there’s an outage. It was around SLAs, more in terms of: what is the impact on SLAs? How are these penalty fees computed for different types of outages? And really the question was, “Who wins?”

Angelique Medina:

The answer is no one, no one wins.

Archana Kesavan:

No one wins, to be honest. Because even if you were able to prove to your service provider that they did miss their SLAs, and I’m not saying just an ISP but any service provider, the damage that the outage might’ve caused really outweighs the penalty that you might try to recover.

Angelique Medina:

Yeah. So, let’s just take cloud providers. SLA contracts can vary depending on who you are, so it’s hard to say what the outcome would be in some of these circumstances because they can be very individual. The other thing is that a lot of services don’t offer SLAs, so that’s not even something necessarily available to you. And to your point, even if there is an SLA and you are able to prove who was at fault, there’s not a lot of teeth to them. They’re not necessarily going to compensate you for the damage that’s been done to your own service. And then the other thing to keep in mind, if we just use the Level 3 outage as an example, is that it had such a massive impact that it affected a lot of service providers and a lot of enterprises. So if you relied on one of those service providers or an enterprise’s service, but you weren’t Level 3’s customer, you have no recourse with Level 3.
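To make the “not a lot of teeth” point concrete, here is a small worked sketch of how a tiered service credit is commonly calculated and how it compares with the cost of the outage itself. Every number here is a hypothetical assumption for illustration: the credit tiers, the monthly fee, the outage length, and the revenue figure are not taken from any real provider’s SLA.

```python
def availability_percent(total_minutes, downtime_minutes):
    """Monthly availability as a percentage of total minutes."""
    return 100.0 * (total_minutes - downtime_minutes) / total_minutes


def credit_percent(availability, tiers):
    """Pick the credit for the lowest threshold the availability falls under.

    tiers is a list of (availability_threshold_percent, credit_percent).
    """
    for threshold, credit in sorted(tiers):
        if availability < threshold:
            return credit
    return 0


# Hypothetical SLA terms and outage figures (assumptions, not real contract data).
TIERS = [(99.9, 10), (99.0, 25)]   # below 99.9% earns 10% credit, below 99.0% earns 25%
MONTHLY_FEE = 5000                 # what you pay the provider per month
MINUTES_IN_MONTH = 30 * 24 * 60    # 30-day month
OUTAGE_MINUTES = 300               # a five-hour outage
REVENUE_LOST_PER_MINUTE = 200      # your own estimated loss while down

availability = availability_percent(MINUTES_IN_MONTH, OUTAGE_MINUTES)
credit = MONTHLY_FEE * credit_percent(availability, TIERS) / 100
loss = OUTAGE_MINUTES * REVENUE_LOST_PER_MINUTE

print(f"availability: {availability:.2f}%")
print(f"service credit: {credit:.2f}")
print(f"estimated business loss: {loss:.2f}")
```

Under these assumed numbers the credit is a small fraction of the estimated loss, and it is capped by what you pay the provider rather than by what the outage cost you.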

Archana Kesavan:

Your digital delivery supply chain might have different providers who themselves have different providers in the mix, but SLAs are still a very siloed metric. So every provider looks at their own region, of sorts, and from their own lens. And they’re like, “Okay, well, my service was up and available. If you couldn’t reach it, then that’s not my problem.” So how do you really impose an SLA there?

Angelique Medina:

Yeah. So that’s where it’s good to think about it not from the standpoint of, “Okay, what can I recover?” but to be much more proactive: when you start to see issues, how do you then go about remediating them before you’re impacted? So not really thinking about this from a reactive standpoint of, “I’ll just wait and then have my penalties paid to me.”

Archana Kesavan:

It becomes very much a finger-pointing exercise, with “it’s your problem, so this happened,” and I think in this whole interconnected web that we are a part of, that approach doesn’t necessarily work for the good of the service or the larger good. You want to make sure that you’re able to collaborate and get past the issue rather than go down the path of an “I have an SLA, so I don’t have to worry about this” type of approach.

Angelique Medina:

Yeah. But then at the end of the day, to your point, you have to understand your whole ecosystem, how it’s all working, and be very active in managing that, because it’s not really about something that you can do directly. To your point, it’s about collaboration, and it’s almost like governance. Really managing your vendors, and their vendors as well, is how you’re going to be able to ensure that you don’t need to worry about things like SLAs.

Archana Kesavan:

Sounds like utopia.

Angelique Medina:

So, that was a bit of a quick run-through of a couple of the things that we thought were particularly interesting. I think that’s probably about it for this week.

Archana Kesavan:

Yeah. Those were the couple of questions that came in over the last few weeks. And yeah, it’s been a quiet week, which is great.

Angelique Medina:

Absolutely. All right. Well, that’s our show. If you do subscribe, and we recommend that you do, of course, you get a free t-shirt. So all you have to do is send an email to InternetReport@thousandeyes.com, and give us your address and your t-shirt size, and we’ll get that right over to you.

Archana Kesavan:

All right. See you next week.

Angelique Medina:

Until next time.
