Traffic Director & Envoy-Based L7 ILB for Production-Grade Service Mesh & Istio (Cloud Next ’19)


PRAJAKTA JOSHI: Hello, everyone. It’s a pleasure to be here
at Next with all of you. I’m Prajakta. I’m a product manager
in cloud networking. MIKE COLUMBUS: Yeah,
thanks for being here. My name’s Mike Columbus. I manage a specialist team
of networking professionals that help customers like
yourselves on their journey to Google Cloud. PRAJAKTA JOSHI: So
what does it really take to build and manage
services in today’s world? The world is hybrid. The world is multi-cloud. Customers have hybrid and
multi-cloud deployments because they want
higher availability, they want to mix and match the
best of breed solutions from each of the clouds, or
because they cannot move all of their services to cloud. Now, there’s no doubt that
hybrid and multi-cloud deployments are here to stay. But one of the questions we
hear all the time from customers is how can we go and manage
these deployments seamlessly without all of the toil? One of the big
trends that we see is the move from monoliths
to microservices. And they’re moving
to them for agility and then also for manageability. But imagine you had a
monolith, you took it, you chopped it up into
120 microservices. How are you going to deploy,
secure, connect, manage these microservices? What you really need is a way to
create new services uniformly. You also need services
management infrastructure, which you obviously don’t want
to create, build, or manage. And you need security,
resiliency, and observability for your services. And that is where
Service Mesh comes in. Now, Service Mesh is a
very powerful abstraction that’s becoming
increasingly popular to deliver
microservices and also for multi-cloud applications. So think a few years back. Like, what exactly
is Service Mesh? Think of boxes, appliances
a few years back. You had all of these
complex network appliances. What did you do to them? Then came software-defined
networking. And you took this complex
box, you dis-aggregated it. You created a very
simple forwarding plane. And you moved all
of the complexity up to the control plane. And then these
simple data planes were controlled by
that control plane. Think of Service Mesh as
exactly SDN for services. Now, what are you doing here? You’re taking a
complex application. You’re removing all of the
networking code from it. You’re putting a lot of
it as a sidecar proxy. And then you need
something to go manage all of these sidecar proxies. And that is where the control
plane of the Service Mesh comes into the picture. So let’s take a closer
look at the Service Mesh. What is a mesh? Essentially, a
mesh abstracts away the network from
your application so your individual services
or the apps you write don’t necessarily have to
be aware of the network. They only know about
their local sidecar proxy. Now, each proxy obviously has
to be configured and managed. And so that’s where the Service
Mesh control plane comes in. It takes all of these
distributed proxies and configures them
into something larger. And then you get visibility. You get resiliency. You get traffic control,
security and observability. So that is actually the value
add of your Service Mesh. Think of the data plane. We talked about a sidecar proxy. One of the most popular
sidecar proxies is Envoy. It was built at Lyft. It is a very high
performance proxy. It has eventually consistent service discovery. And the reason we like it is that it’s extensible. And it has a protocol
called xDS v2, which it uses between the
control plane and the proxy. And so you could use
any other proxy that supports this xDS protocol. And therefore, you can use the Envoy that Istio provides, you can use an Envoy that is your own, or you could use another proxy that is compliant with this xDS v2 protocol. So it doesn’t lock you into any specific proxy or even any specific control plane. So the biggest benefit
of Service Mesh is that it decouples
development from operations. And so as a developer,
you don’t need to worry about writing
and maintaining policies and the networking code
inside your application at all. One of the really nice
things about Service Mesh is it doesn’t make
any assumptions. Now, it doesn’t make
assumptions about whether your service is in cloud or on-prem, or whether it is a VM-based service or a container-based service. So it assumes things are heterogeneous. It assumes they are spread out. And that is what it affords
you in that it gives you this unifying abstraction to
create services and obviously manage them at scale. So one of the most popular
Service Mesh technologies, which is open source, is Istio. What is Istio? It’s essentially giving
you three main things– one is advanced networking
capabilities and routing capabilities– but without you
having to write that code– enhanced security,
and it is also giving you a lot
of observability into what is happening as
your services communicate with each other. Let’s take a closer
look at Istio. Now, look at the data plane. That is where you
have your service. Sitting next to it
is a sidecar proxy. And then on the top
is the control plane. So it has three main pieces– pilot, mixer, and then there’s
the Istio security bit. Pilot is going to give
Envoys the configuration. It’s going to do
the load balancing. And then it’s also going
to do service discovery. What you have on
the security side is things such as mutual
TLS between your services. And then observability lets you
map out all of your services, along with the traffic flow and other things that are happening as your services communicate with each other. Now, one of the other
things that you would also need for your Service
Mesh is you need something that guards your entry point. And that is Istio
gateway or your ingress. And now, this is where you would
essentially deploy something like a global load
balancer in front. The reason is that you
need the DDoS defense. You would put your Istio
gateway on instances behind. And then that becomes your
entry point to the mesh. So if you think of
it, you’ve taken all of the traffic routing,
all of the customization, all of the programmability right
to the edge of your service mesh with this one. One of the key pieces
of Istio is pilot. This is the one that
goes and manages and configures all
of the Envoy proxies. It is the one that
knows about all of the other Envoy instances. It’s the one that provides
service discovery. And it also delivers
advanced traffic control. Now, our customers
love Service Mesh. But they always say, we
don’t want to go and build these control planes. We don’t want to
go manage pilot. And so what did we do? If you weren’t there in
the keynote this morning, we launched Traffic Director–
so you are one of the first people to hear about
it after that launch– our new GCP-managed service
for the Service Mesh. And what it really
does is it helps you power up service meshes
of all types, of all sizes. You get global load
balancing, which is huge. And we’ll talk about
it a little bit more. You get centralized
health checking. So there is somebody that’s
offloading your proxies from health checking every
other application instance that is in your network. You also get traffic
driven autoscaling. So as your traffic grows, your
services scale up and down. And then you get traffic control
capabilities like canaries, and so on that we’ll talk
about a little bit more. One of the nice things
is Traffic Director is GCP-managed,
which means you get a production grade [INAUDIBLE]. And so if something happens,
you can call our support. You don’t have to worry
about it yourself. So if you look at this,
you see Traffic Director in the control plane. It’s managing the
sidecar proxies. One of the nice things
about Traffic Director is it works out of the box
for VMs and containers. So if you look at
the service here, imagine that your front-end
service is essentially self-managed docker. Your shopping cart
service is deployed in VMs because you possibly had
that service from before. And then your payment
service could be again in VMs or it could be a GKE service. And so what this means
is you have a unified way to do traffic management
for any type of service without having to
have a separate implementation or a solution for each
one of these types. Now, a lot of you here
are likely our global load balancing customers. And one of the questions
we would always get is when are you bringing
global load balancing to microservices that are
internal, not external facing? And that is where
Traffic Director is going to bring
global load balancing to your internal microservices
and to the Service Mesh. And to take a closer
look at it, I’d like to invite Mike to
give you a deep dive into the global load
balancing aspect. MIKE COLUMBUS: Thanks, Prajakta. Good job. So in the next
section, I’m going to talk about global load
balancing with Traffic Director and also have the pleasure of
doing a few live demos in front of my new friends here. So no pressure there. And then finally, we’ll
take a deeper look at what happened under the
covers when I ran these demos. We’re going to take
a few different views into the environment. So it should be fun. But before we get
to that, I think it’s really important to
understand the data model that Traffic Director uses. So I’m sure a lot of
folks in the audience have deployed a load balancer
on Google Cloud Platform. This data model
should not look– it should look similar, right? So the first thing that you need
is a global forwarding rule. This consists of an IP,
a port, and a protocol. And what’s a little
bit different about this global
forwarding rule is that the IP only
has significance to the services when they try
to talk to other services inside of the mesh. The IP can really
be anything, right? And I’ll get into
a little bit more about how that works later. So think of it this way. For every service inside the
mesh that needs a unique VIP, you’re going to need
a global forwarding rule for that service. Global forwarding rules
point to target HTTP proxies. A good way to think about these
are the actual service proxies inside of your mesh, right? So this is really powerful. These proxies are the ones that
are getting traffic redirected to it and creating the
policy that’s distributed throughout the mesh. So within a target HTTP
proxy, think of this as really just a config pointer. And in this target HTTP proxy,
you’ll configure a URL map. The URL map has been
enhanced to allow for things like rule matches and route actions, really powerful things that provide granular traffic routing policies on top of just looking at the host or URL path as part of the L7 mesh. The URL map points to
your back end service. This back end service is global. And this is where you’re going
to configure things like health checking, along with
some enhancements that are part of this launch,
which are traffic policies that Prajakta will cover later. In each back end service, you
can configure your back ends. These can consist of managed
instance groups for VMs, or network end point groups
for containers or GKE. So what’s really cool
about the solution is you don’t have to be running
containers or Kubernetes to take advantage
of this, as long as you can install a
service proxy on your VM, or on your pod, or
on your container, you can leverage a
lot of the power that comes from this platform.
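To make that data model concrete, here is a rough sketch of how the chain could be created with gcloud for one service in the mesh. The resource names and the VIP are made up for illustration (as Mike says, the IP can really be anything), and exact flags can vary by gcloud release:

    # Global backend service (health checking and traffic policies live here)
    gcloud compute health-checks create http cart-hc --port 80
    gcloud compute backend-services create cart-service --global \
        --load-balancing-scheme=INTERNAL_SELF_MANAGED \
        --protocol=HTTP --health-checks=cart-hc

    # Backends: a managed instance group of VMs (a network endpoint group
    # would be added instead for GKE containers)
    gcloud compute backend-services add-backend cart-service --global \
        --instance-group=cart-mig --instance-group-zone=us-central1-a \
        --balancing-mode=UTILIZATION --max-utilization=0.8

    # URL map -> target HTTP proxy -> global forwarding rule (the service VIP)
    gcloud compute url-maps create td-url-map --default-service=cart-service
    gcloud compute target-http-proxies create td-proxy --url-map=td-url-map
    gcloud compute forwarding-rules create cart-vip --global \
        --load-balancing-scheme=INTERNAL_SELF_MANAGED --network=default \
        --address=10.0.0.1 --ports=80 --target-http-proxy=td-proxy

The INTERNAL_SELF_MANAGED load-balancing scheme is what marks these resources as Traffic Director configuration rather than a classic managed load balancer.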
view of the Traffic Director service mesh. We’ve shown this a few times. It’s also what maps to our
demo, so it’s pretty significant here. So what we have here
is a web service that’s behind our
global load balancer. So we have two back end
managed instance groups. They happen to be running
Docker containers, but they’re managed
instance groups of VMs deployed in US
central and Asia southeast. So depending on where
that client hits our edge, our front end network
is going to route them to the optimal healthy back
end with available capacity. From there, we
have a cart service and a payment service that
is mirrored in each region. So what’s really cool
about Traffic Director is we have a global view of
available service endpoints. And our infrastructure
keeps a map of what is the RTT
between these end points. And the default policy
is that we’re always going to route traffic to
the healthiest endpoint with available serving capacity. So we’ll start with
the closest zone and fail over to regions and
redirect traffic accordingly. Now, I want to
take a minute just to contrast this
architecture with how you might do this in a
traditional three tier app. You’re going to have some
sort of middle edge proxy load balancer and you’re going
to configure it on a hop by hop basis. With Traffic
Director, think of it as embedding a client side
load balancer at every service, and the policy is also
embedded, as a sidecar, directly on that service. So once traffic gets
directed to the proxy, the proxy makes a really
intelligent routing decision for the next endpoint. And that’s a direct
communication, without having to funnel
traffic through something else to make a load
balancing decision. So here we can see the
view of Maya in California. I guess I’m Maya today, but
I’ll be running the demo here from the Moscone Center. So under normal
circumstances, I’m going to hit the
global load balancer. That’s going to direct
me to the web front end. The web front end is going
to call the cart service, and then that’s going to
call the payment service. So we’re going to buy
a lot of things today. But under normal
circumstances, default policy, it’s going to route me as
such, through US central. For Shen, in Singapore– and I have a client that’s going
to emulate a Singapore client– I’m going to be routed
through the US– I’m sorry, the
Asia-southeast1 web service, and then throughout the
mesh in that region. So what we’re also going to do,
and this animation shows this, is that we’re going to fail
the US-central1 cart service and observe the behavior. So because Traffic
Director has global view, and tries to route things
optimally, once the US-central1 service fails, you’re going
to see that we get almost instantaneously directed
to the healthy capacity in Asia-southeast1. And then we’ll fail
the server back, and we’ll take a look at this. So if we can switch
to the podium. Very good. So here you can see we have a
Traffic Director demo project. This is where the
environment’s contained. And I want to show
a few things, so I can prove to you that this
is a real environment. So the first thing I want to
look at is our instance groups. So we have a few. The first, you can see, is
that we have two web back ends. These are back ends behind
the global load balancer. This is our web service, and
we have two managed instance groups of one instance. I don’t recommend doing
a managed instance group of one
instance, but I guess it allows us to be fairly frugal
for this demo environment. Next, I’ll call out
the cart service, which is this managed
instance group here. So that’s running as a
VM managed instance group with Envoy installed on each
in this instance group VM. Finally we have our
payment service. This is running on
Container Engine, or GKE. So we have a deployment,
which is our V1. And later in this
presentation, we’ll show a traffic splitting
canary style release, but that’s running on GKE. And we can jump over into
our Kubernetes engine, and we can take a look
at these workloads, or these deployments. Now these manifests have
an annotation on them that says “Use network
endpoint groups.” So if you’re not familiar with– I know I described this
a little bit earlier, but if you’re not familiar
with what network endpoint groups are, think
of it as a port and IP pair that allows us to
directly target the container, rather than the node. And why this is
important is, if you’re familiar with how
Kubernetes works, there’s something called
the kube-proxy that’s almost like a second hop load
balancer, that would then select a pod behind
that service. This allows us to directly
target the pods themselves. And there’s a controller
that gets spun up. So as pods are scaled up
or down, or they go away, or new services
are attached, that is going to program the endpoint
awareness for Traffic Director to know where to send traffic. So a little bit
more on that later.
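For reference, the annotation Mike is describing typically goes on the Kubernetes Service object. A hypothetical manifest for the payment service might look like this (names and ports are illustrative):

    apiVersion: v1
    kind: Service
    metadata:
      name: payment
      annotations:
        # Ask GKE to create standalone network endpoint groups (NEGs) so
        # Traffic Director can target pod IP:port endpoints directly,
        # bypassing the kube-proxy hop.
        cloud.google.com/neg: '{"exposed_ports": {"80": {}}}'
    spec:
      selector:
        app: payment
      ports:
      - port: 80
        targetPort: 8080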
And finally, before we jump in and start buying things, we have our network services. We have Traffic
Director in the UI. So here, you can see
that we have our cart and our payment
back end services. And they each– thank
god, they’re healthy. But they each have a back
end in Asia and US central. So that’s the environment. So what I’ll do first is– this is a local client. This is me, Mike, in California. And you can see that this front
end is served by US-central1. So the global load balancer
directed us to the web service in US-central1, which is good. The other thing you might note
is that this domain is secured. Just a little side
note, this is using a Google Cloud managed cert
on our global load balancer. It’s also using
Identity-Aware Proxy. So if you’re trying
to get to the site, while I trust
everyone here, I can’t have you stopping the services. So we’re just going
to leave it at that. So normal connection here. I’m on the web front end. Let’s say we want to
buy a $300 popsicle. Kind of crazy. We’ll add this to our cart. And you can see now this
changed to say this is the cart service in US central. And then we’ll
confirm the purchase, and we can enter our
shipping information. But it’s interesting, I
don’t see any shipping price being calculated here. Wouldn’t it be cool
if we had that? So it’s a little bit of a
tease to our V2 of payments. The next view we’ll take,
let’s buy something as Shen, in Singapore. So you can see that
Shen was directed– I can refresh this
so you don’t think– So Shen was directed to
Asia southeast region, being a Singapore client. We can get some green slip-ons. And we’re directed to the cart
service in Asia-southeast1. And, as you might
guess, also the payment service in Asia-southeast1. But what would happen
if we failed a service? So what I’m going to do
is stop the cart service in US-central1. So this is not actually
like turning down the VM. It’s a managed
instance group of one. But we’re basically
telling the service to now return 500
errors that are going to fail the health check. So now let’s see what happens. So let’s go back to our– this is me, connecting locally. So we’ll go to
td.gcpnetworking.com. All right. So we’re still going
to the web front end. This is, again, the global
load balancer directing us to the web. Let’s buy a clock, and
we’ll add it to our cart. And you’ll notice, this is
now served by Asia southeast. So Traffic Director
had no local capacity. It’s a managed
instance group of one. If it had something in another
zone, it would have used that, but we want to make
this more dramatic and fail across region. So we did that. And when we confirm
the purchase, it stays in Asia, because
the payment service is the closest next hop, or
endpoint, for the cart service. So the final thing
we’ll do here is let’s start our cart service back up. This will now allow
the cart service to pass the health
check, and hopefully we see everything
stick in US central. So let’s buy something else. Anyone have any preferences? Goldfish bowl? What was it? Radio? I heard radio first. Where’s radio? Oh, yeah. It’s a bargain at $300. All right. We’ll add this to our cart. And the cart, I brushed by
it, but it was US central, and we stuck on the US central
for the payment service. So if we can switch
back to the slides, I just want to cover
what happened there. Oh, cool. So end point awareness. How does Traffic
Director and associated– these service proxies
know where to go? The way this works for
VMs is we have something called Back End Manager. Back End Manager is
informing Traffic Director of endpoint awareness,
and then that’s getting sent to the
relevant service proxies so they know where
everything lives. I described this a little bit
for Kubernetes and our GKE. The network endpoint
group controller is talking to Traffic
Director, based on the annotation that is part of the
deployment manifest. And that’s how we get
end point awareness and we know where the pods are. So what happened when
something failed– when I failed that service,
the US-central1 cart service? We’re health checking all of the endpoints as a service, which is very powerful. And then once the shopping
cart service went down, again, Traffic Director removed,
within the service mesh, that path– the shopping cart end point
for US central for that path. So we’re creating this
L7 overlay network here, which is really cool. So from the data
plane, what happened? Again, this is
client side decision, so I’m going to show this
from the context of I came in from the global load
balancer, terminated on the web service in US-central1. And via control plane, we
know that that endpoint, the cart service in US-central1, was unhealthy. So we’re going to look at this
from the perspective of the web service. So the way this works,
we have our VM here. We have our proxy
installed, and that’s got a persistent connection
to Traffic Director. And the proxy is listening on
port 15001 on the local host. And then the web service
is bound on port 80 for the actual IP
address of the VM. So the web service wants to
talk to the cart service. It doesn’t know that any service is down, it just has a VIP. This is, again, the
global forwarding rule. So the app– I’m sorry. The app, the web service,
requests the cart service, which is this VIP
that I configured. And then there are iptables rules that configure netfilter so that either everything that goes outbound, or just traffic to the VIPs in my service, gets redirected as part of the OUTPUT chain to the service proxy. And, again, the service
proxy has a configuration. It intercepts the
request, and now it can apply the policy
that was configured as part of Traffic Director,
which targets the endpoint. So it knows all the– it’s targeting the
healthiest endpoint, which actually sends that
traffic over our global VPC to the Asia-southeast1 endpoint.
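The interception step itself is plain netfilter. A minimal sketch of the kind of rule involved, assuming Envoy runs under its own user ID (1337 here is just an example) so its own egress is not re-intercepted:

    # Redirect outbound TCP traffic (except Envoy's own) to the sidecar
    # listening on localhost:15001, via the nat table's OUTPUT chain.
    iptables -t nat -A OUTPUT -p tcp \
        -m owner ! --uid-owner 1337 \
        -j REDIRECT --to-ports 15001

In practice the rule can also be scoped with -d to just the VIP range used by the mesh, which is the "specific to the VIPs" option mentioned above.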
So I’ll hand it back to Prajakta. PRAJAKTA JOSHI: Thank you, Mike. So now– [APPLAUSE] MIKE COLUMBUS: I wondered
if I was going to get– PRAJAKTA JOSHI: So
now a quick warning. We’re going to take
you through a whirlwind of different features. We obviously won’t have enough
time to spend on each of them, but we wanted you
to know about them. And then feel free to get a
hold of us after the talk. So one of the most interesting
features that Traffic Director provides is traffic control. If many of you have
used Envoy, you’ve already used these features. So let’s take a quick look. The biggest thing about– before, you had to actually go
architect all of these traffic control capabilities into
either your application or some system. Here, you get all of that
without doing anything to your application. So if you think of how you configure this, think of it as two buckets. One is the routing rules, and
one is the traffic policies. Routing rules look at the
client traffic that’s coming in, and then they figure
out where to send it and what to do with it. The traffic policies are more
about your service that’s being accessed or
being connected to. And so what does
that service want? That’s what you express
through traffic policy. So let’s go a little bit
deeper into the routing rules. Again, reminding you
about the data model that Mike showed,
which is you have the URL map, that’s where
your routing rules reside. And then you have this notion
of a backend service in GCP. And that’s where
your traffic policies reside, because they go and
apply to your back end service. So what kind of things
would you want to do? One of the things
that you’d want to do is, essentially, imagine
you have a version A running of your service. And you decided to roll
out 2.0 in staging. And you want to
actually split traffic, because you want to see how
the other new version is doing. That’s what you
can seamlessly do with Traffic Director
and the Envoy with what we call
traffic splitting. So you simply put weights,
like you see here, and I can split the traffic. Another interesting thing you
can do is traffic steering. So imagine that in
here, you’re looking at certain characteristics
of your incoming traffic and saying, if the
user agent is Android, send it to the back end
service that’s actually dealing with Android stuff, and
if the user agent is iPhone, send it to the
back end instances in the service that’s dealing
with the iPhone stuff. And you can do
very rich matches. You can also take a
bunch of actions– traffic splitting, redirects. You can do URL rewrites, and
then several other things as well.
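As a sketch, a user-agent based steering rule in the URL map could look roughly like this. The backend service names are made up, and real configurations reference full resource self-links:

    pathMatchers:
    - name: steer-by-client
      defaultService: global/backendServices/frontend-generic
      routeRules:
      - priority: 0
        matchRules:
        - prefixMatch: /
          headerMatches:
          - headerName: user-agent
            regexMatch: .*Android.*
        # Android clients go to the Android-specific backend service
        service: global/backendServices/frontend-android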
Another very interesting thing is fault injection. So as you roll
out your service, you do want to test the
resiliency of your service. And you want to test it against
things like delays, aborts, and so on. And so, as a part of the fault
injection, what happens here is that the client,
when it sends a request to the
backend service, there’s a delay that can be
introduced by the Traffic Director, say on a
percentage of your requests that are getting sent
to the backend service. And you can also have a certain percentage of your requests aborted. So in this way, you can actually
test the resiliency of your app without really building out
any of this infrastructure yourself.
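In URL map terms, a hypothetical route action for this might look like the following, delaying one slice of requests and aborting another (the backend service name and percentages are illustrative):

    routeAction:
      weightedBackendServices:
      - backendService: global/backendServices/payment
        weight: 100
      faultInjectionPolicy:
        delay:
          fixedDelay:
            seconds: 2
          percentage: 10     # add a 2-second delay to 10% of requests
        abort:
          httpStatus: 503
          percentage: 5      # abort 5% of requests with a 503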
This is one of my personal favorite features, which is mirroring. So what you can do here is
imagine you have your main app. You can create a shadow app. And then as the real traffic
comes to your main app, you can shadow all of that
traffic to your shadow app. And the mirrored traffic is treated as fire and forget, which means the shadow app gets the request, but its response is ignored. So it’s a very nice way to
actually go and test out a bunch of things,
such as binaries with production traffic. Or you can debug errors that are
happening in production using the shadow debug service. This is a feature I really love, personally.
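In the URL map, mirroring is a small addition to the route action. A sketch, with a hypothetical shadow backend service:

    routeAction:
      weightedBackendServices:
      - backendService: global/backendServices/payment
        weight: 100
      requestMirrorPolicy:
        # Every request is copied, fire and forget, to the shadow service
        backendService: global/backendServices/payment-shadow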
Now let’s come to the second part of it. There are actually
several other features that I didn’t cover
in the routing rules, but coming to traffic policies. Now this is the backend
service telling the client, here is how I’d
like you to behave, or here is how the
traffic should be load balanced across my instances. And so it’s from the perspective
of a backend service. So each of these
policies are obviously specified per backend service. Now let’s take an example
of one such policy, which is load balancing. So initially, Mike showed you the global load balancing, which Traffic Director
and our global systems used to figure out which
is the zone and region to send your traffic to. Now, once it lands there, you
will have multiple instances there. And so that’s when the second
tier of your load balancing kicks in. And then it figures
out which is the best, or the optimal, instance
to use for your traffic. So you can see one and two, and
that’s what it is referring to. In the second stage,
essentially you have several different
LB algorithms that you can turn on. So things like round robin,
least request, ring hash, random, and several
others, as well. And you can turn
on session affinity if that’s important to your app.
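On the backend service, that second-tier choice is just a couple of fields. A fragment of a backend service configuration, with illustrative values (field names follow the Compute Engine API):

    localityLbPolicy: LEAST_REQUEST   # or ROUND_ROBIN, RING_HASH, RANDOM, ...
    sessionAffinity: CLIENT_IP        # optional, if your app needs affinity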
So, circuit breakers. How many of you here are familiar with circuit breaking? I think it is a topic
we could all talk about for an entire day, but my aha
moment about circuit breaking came when I was talking to
Georgie, who is a distinguished engineer at Walmart. And he said there are
several circuit breaking solutions that exist. But one of the really
nice things about Envoy and Traffic Director
in this solution is that it’s from
the perspective of a backend service. So the back end
service will say, I can accept a max of five
connections from your client. And so when the client tries
to send more than that, the proxy on the client
just throttles itself. So it’s like back
pressure to the client, but it’s from the perspective
of a backend service. And this part was
what was missing in several of the other
solutions that existed. So you had to guess what
the back end service wanted from a client’s perspective. So this is the thing
that is actually huge about circuit breaking. You can also do things
like outlier detection, and then you can eject out the
instances that actually don’t conform to whatever
policy it is that you want for your backend service.
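Both of these are also expressed on the backend service. A rough fragment, with made-up numbers, could look like this:

    circuitBreakers:
      maxConnections: 5        # "I can accept a max of five connections per client"
      maxPendingRequests: 20
      maxRequests: 100
    outlierDetection:
      consecutiveErrors: 5     # eject an endpoint after 5 consecutive 5xx responses
      interval:
        seconds: 10
      baseEjectionTime:
        seconds: 30
      maxEjectionPercent: 50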
So with that, I wanted to invite Mike again to give you a closer look at one of the features we just picked, traffic splitting, which is one of our very popular features. MIKE COLUMBUS: All right. Thanks, Prajakta. So we get to buy more things. So as part of this demo,
we weren’t calculating shipping costs before. We want to deploy a
version two of our payment that does charge for shipping. So if we can switch
back to the podium. Thanks. So while Prajakta was
speaking, I did a few things. And I want to talk
through what I did. So I am in our Cloud Shell code
editor, which is super helpful. And I have a few shell
scripts that I ran. The first was actually creating
the deployment for the version 2 of the payment service. So I just deployed that
and that’s running in GKE. The next thing I did
was I added the– let me make this
a little smaller– I created the backend service
for our new delivery V2 version of our deployment. I created a health check
for it, and I specified the flag that made it global. The next thing I did was I
grabbed the network endpoint group name for each cluster– remember, it was US central
and Asia-southeast1– and I added that as a backend
to the backend service I just created.
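A hedged reconstruction of those shell steps, with hypothetical names, might look like this:

    gcloud compute health-checks create http payment-v2-hc --port 80
    gcloud compute backend-services create payment-v2 --global \
        --load-balancing-scheme=INTERNAL_SELF_MANAGED \
        --protocol=HTTP --health-checks=payment-v2-hc

    # One standalone NEG per cluster/zone, as created by the GKE NEG controller
    gcloud compute backend-services add-backend payment-v2 --global \
        --network-endpoint-group=payment-v2-neg \
        --network-endpoint-group-zone=us-central1-a \
        --balancing-mode=RATE --max-rate-per-endpoint=5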
So that was creating the Traffic Director constructs. And then the final thing I
did was update the URL map to split the traffic
50/50 across the two backend services. And what that did
was actually deploy this yaml file, which
I called canary. So I have a really
simple path matcher, and it basically takes the
default service and splits it. So you can see I have
weighted backend services, and I give a weight of 50, or
50%, to the backend service. This was the V1 version,
and this is the version 2.
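A sketch of what such a canary URL map file could contain, assuming backend services named payment and payment-v2 and applied with something along the lines of "gcloud compute url-maps import":

    pathMatchers:
    - name: payment-canary
      defaultService: global/backendServices/payment
      routeRules:
      - priority: 0
        matchRules:
        - prefixMatch: /
        routeAction:
          weightedBackendServices:
          - backendService: global/backendServices/payment
            weight: 50
          - backendService: global/backendServices/payment-v2
            weight: 50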
And I might as well show it. And then, in Kubernetes Engine, you can see if I go
into the workloads, each cluster now has the
non-delivery version and then the delivery version of V2. So let’s buy something. Let’s see what we
want to buy now. Let’s buy a surfboard. It’s almost summer. So what I’ll do is now, when
I add this to the cart– again, we’re served
by US central cart. And you can see now I
have delivery charge. And this is going to
be fairly variable. If I refresh this
page a few times, you can see it’s about half– as you’d expect,
about half the time, we have the shipping cost. So basically, what’s happening
is the client side proxy, when the cart service talks
to the payment service, half the time it’s choosing
the delivery deployment, half the time it isn’t. And I could weight
this accordingly. So just one example
of how this works. But pretty powerful. And you might say, well, I
haven’t put in my address. How does it know
the delivery cost? It’s a demo. Maybe that’ll be V3. So thank you. I’ll hand it back
over to Prajakta. PRAJAKTA JOSHI: Can
we get the slides? MIKE COLUMBUS: Yeah. PRAJAKTA JOSHI: So we spoke
about Traffic Director, and we spoke about
the sidecar Envoy. We actually have another
interesting product called L7 ILB, and I’ll talk
a little bit more about it. A lot of you may be using load balancers. And what you’re saying is
just give me a simple L7 load balancer that brings in all
of the capabilities of Envoy, but I don’t want to deal
with Traffic Director, and I don’t want
to deal with Envoy. All you care about is, essentially, a load balancer for your internal services that brings in all of these capabilities. And that is what
L7 ILB gives you. So think of you’ve
got a front end, and then you’ve got
your shopping cart, and you’ve got your payments. And you simply put the L7 ILB,
just like any middle proxy load balancer, and then you load
balance your instances behind. You’re essentially not injecting
any proxies on your back end instances. The other side of it is
there’s another set of you who have your existing deployments. And you want to bring in
all of this service mesh and all of these
modern paradigms into your existing brownfield. And that is the other
use case of L7 ILB. So it helps you bring service
mesh to your brownfield without disruption. So let’s take a look. Imagine that you’ve got
two services, the front end and the shopping cart. You were able to put in the Envoy proxies. You configure Traffic
Director for it. Now you have this really
ancient payment service, and there is no way you can go
and inject a proxy in there. So what do you do? You still want to bring in all of the modernization and the capabilities that you would get with Envoy. And so what you’re
doing there is, essentially, you
bring in the L7 ILB, because it lets you have
all of the capabilities, but without having to
go and inject anything in your back end instances. So it’s a really good way to
start migrating your services towards these modern paradigms. And this is what it looks
like under the hood. So what we did for you is we
abstracted out Traffic Director and we abstracted out
the managed middle proxy pool of Envoys. And it looks like any other
internal load balancer to you, and you simply
configure it that way. So it’s as simple as what you
saw in the previous slide, but then this is what
is under the hood.
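In its later productized form, the regional internal HTTP(S) load balancer, the configuration really does look like any other load balancer you set up. A very rough sketch with hypothetical names; exact flags, and the proxy-only subnet this style of load balancer needs, vary by release:

    gcloud compute backend-services create payment-ilb-be \
        --load-balancing-scheme=INTERNAL_MANAGED --region=us-central1 \
        --protocol=HTTP --health-checks=payment-hc

    gcloud compute url-maps create payment-ilb-map \
        --default-service=payment-ilb-be --region=us-central1

    gcloud compute target-http-proxies create payment-ilb-proxy \
        --url-map=payment-ilb-map --region=us-central1

    gcloud compute forwarding-rules create payment-ilb-vip \
        --load-balancing-scheme=INTERNAL_MANAGED --region=us-central1 \
        --network=default --subnet=default --address=10.0.0.10 --ports=80 \
        --target-http-proxy=payment-ilb-proxy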
So we do believe this is a very good way for you to get started with the service
mesh, especially in your brownfield
deployments, or if you’re just looking for that next
level of load balancing compared to what you have to do. So we spoke about a
bunch of features, but, honestly, that’s
just the start. We have several exciting
features on the roadmap. Some of these we just
started working on. Some of these will come
sometime this year or next. But one of the first important things we will do is an integration with all of the security features that you get with
Istio, which is things like mTLS, RBAC, and so on. The second big
integration we will do is with a number of
observability solutions. And we will have more than one. So you can– even today, you
can actually go and integrate your own– like if you’re
using, for instance, Apache Skywalking, or any of the
other tools, you can go integrate those as well. The third important thing
is that we always get asked, so Traffic Director is amazing
for services that are in GCP. What do we do with our services
that are sitting on-prem? And so that is where we do want
to support hybrid and multi cloud with Traffic Director. So your instances stay
wherever they are today, but then you use Traffic
Director to go and manage them. Then the fourth thing is
a lot of you use Istio. And so if you’re using
Pilot, Traffic Director is a very good fit for
a fully managed service. So which means we have
to support Istio APIs. So you can actually
take the Pilot and swap it out with
Traffic Director. So that is the whole integration
story into Istio and Anthos that you will hear more
about in the coming year. One of the most interesting
areas of innovation is federation. This is probably the part where
we’ve just started thinking, but there are other service
mesh control planes out there. Some are legacy, some
are totally different implementations. But there needs to be a
way to federate these. And you will hear us talking
a lot more about federation in the coming months,
or coming year. So hybrid, I pointed out. This is what it looks like. You’ve got Traffic Director,
you’ve got your instances on-prem, and then you’ve got the
same Traffic Director managing instances that are in GCP. So that’s a hybrid story. This is what Istio Anthos
integration looks like. So you can see that there’s
a layer of Istio APIs that you can use
for configuration. And obviously, that should
work whether your instances are in GCP or on-prem. And then the key
benefit of this is you will get global
load balancing for all of your services. You will get a highly
managed control plane, which you don’t have to
worry about managing, even for your services on-prem. And one of the
nice things, again, is that we use the xDSv2
APIs for communicating between the control
plane and the data plane. What that means is
you’re not locked in into Traffic Director. So you can swap it
out back for Pilot, or another
implementation, as long as it uses the open xDSv2 APIs. So this is what federation
could look like. This is probably a place we’ll
innovate with the community, as well. And so you will
hear more about it in terms of Istio and
several other communities that we are a part of. But what is the need
for a federation? What is driving the need
for us to go look at this? First thing is there are
a lot of implementations where you want a flat mesh. So you want seamless melding
of what’s on-prem and in GCP. The second one is if you
have Traffic Director control things that are
on-prem, you should assume that the connectivity
between on-prem and GCP will fail. And so you need a
backup control plane. So there’s a whole aspect
of resiliency there. The third one is
sometimes, you want to have very controlled
collaboration, because entity A might be
a totally different setup from what’s in GCP. And you want to federate state, things like config and identity, but in a very controlled manner. And so you need
proper primitives, interfaces, and abstractions
to make that happen. And then the last
one is people are going to have variety
of control planes. And as we introduce Traffic
Director, for instance, we don’t expect
people to throw away all of their control planes
at once, because they want to test it out. And then they slowly
introduce it into a variety of
their deployments. And so we do assume
that the world will have multiple service
mesh control planes, and we have to figure out a way
to operate in that environment, and then help you move your
services over to Traffic Director. Or, if you choose to keep
the other ones, that too. So what do we have,
overall, in GCP? We spoke about a
bunch of things today, gave you a whirlwind tour of
the Traffic Director and L7 ILB capabilities. So let’s, again, go back
to your starting point. If you are a cloud
load balancing user, you can think of Traffic
Director as bringing in Envoy. And this could be in the form
of just a managed middle proxy. So you use it like any
other load balancer. So it’s just like an L7 ILB
that’s there for your services, but with the primitives and
the modern traffic control capabilities that you get
with Envoy and with Istio. Then, if you’re one step ahead
in your modernization phase, or if you have new apps that
you want to bring in there, that’s when you would use
Traffic Director with– just like with a
traditional service mesh, where you’ve got the
Envoy as a sidecar proxy. And then, the main thing is
that, in both of these cases, you don’t really have to worry
about managing the control plane, because you
actually want to worry about your apps
and your business. The last thing any of
you here want to do is co-manage these
control planes. So that is what it’s
letting you do in here. One of the nice things
is that this works out of the box for
Kubernetes, like I mentioned before– self-managed
Kubernetes, GKE, Docker, VMs. Because one of the things I hear
very frequently is service mesh is for containers. Actually, that is
not true at all, because if you took a monolith
and you chopped it up– because if you containerize a monolith, you’re not going to
get any benefits. So if you’re taking a
monolith and chopping it up, the first thing you’re
actually going to need is services management
before you even containerize. So one of the misconceptions that we want to remove with Traffic Director is that service mesh is only for containers. It is for VM services and for containers. It is for greenfield and
it is for brownfield. It is for enterprises,
cloud natives. It is an abstraction that
should work across the board. And then, of
course, we have gRPC, which is another project in
which Google is involved. That is an excellent
interface if you have apps that support gRPC. And then most of our testing
for all of Traffic Director has been with Envoy. We really like the
Envoy proxy, because we believe it delivers almost
like a universal data plane. But any other proxy that
supports xDS can be brought in. There is nothing that locks
you in, either into Traffic Director or into Envoy. You can bring your
own Envoy, as well. The second part of
it is, so you’re coming now from the
service management angle. And so that is where,
if you see here, this is when you
think about an Anthos, or you think about
GKE On-prem, or you think about Traffic
Director or Istio, what it’s really
helping you do is to deliver hybrid multi-cloud
services at scale seamlessly without toil. And so this is a portfolio
of products that’s going to help you do that. Imagine that you added
a few more to it. And as we integrate
with Anthos and Istio, it will be much
more easier for you to migrate from any of the
things you are using currently to Traffic Director or L7 ILB. Again, in here, there’s nothing
that necessitates Kubernetes, but on-prem, we have
several customers who have brought in GKE On-prem. And then in Google Cloud,
the fastest way to containers is using the Google
Kubernetes engine. So this is the last part. I wanted to–
there is a customer we have called [? Mercadi. ?]
So this example is actually inspired by them. This is how they are
thinking of a blueprint to go and modernize at
their pace without toil. So in their case,
they had, essentially, a monolith on-prem. And what they went to
is something like this. And you can notice that
there’s no monolith on-prem. So how did they do it? What they did is– and you
can just follow the numbers. What they did first
is they, essentially, did nothing to the monolith. They came to cloud, and
the monolith, of course, was on-prem. They came to cloud. They put a Google
global load balancer– this was for DDOS defense,
this was for TLS termination, and so on. And they put API
gateway instances behind the global load balancer. So now your traffic
comes in, it seamlessly gets routed through the
API Gateway to on-prem. So that is the first set
of things that they did. Then they started chopping
away at the monolith. What they did is they took
a chunk of the monolith and then they created
service A. So you see the number two, there. They had a gRCP
interface for it. Once they did that, they
actually chopped out service B. One of the
things you’ll notice is every time they chopped out
anything from the monolith, they created it as a
containerized service. So it was created as a
containerized service in cloud, instead of doing it on-prem. Once they’re done
with service B, there were some services that
could not be moved to cloud. So what do you do there? So they chunked
out the service C. You can see the
number four, there. And then they put it on-prem. But then they used GKE
On-prem to manage it, because they were using GKE,
which is a Google Kubernetes Engine in Google Cloud. And so GKE On-Prem gives
you the same primitives, but to manage things on-prem. And you must have heard a lot of
it in talks in this conference, as well. So now they have GKE
On-Prem managing service C. They say, now we are
ready to try out the service mesh and some of the
newer capabilities of the modern paradigms. And so they have now put Envoy
proxy on service B and service C. This is so that the
services could communicate. But now you need a control
plane, because something has to go manage those
proxies, and something has to make sure that
they get configuration, and so on and so forth. So they first brought in
Istio, because Traffic Director wasn’t available there. And then Istio
controlled service B, and Istio on-prem controlled
service C. Now after this, what will happen is now that
Traffic Director is available, the Pilot from Istio can be
replaced with Traffic Director, which is what will happen. And then eventually,
when we support hybrid, you can see the
arrow six, which will be able to go control those
services with the proxies on-prem. And by this point of time,
because you’ve chopped up all of these services
from your monolith, you can see that you’ve totally
taken away the monolith itself. So if you see here that it’s not
that people just bring in all of these technologies one shot. They have a strategy,
and they bring it in the way that
makes sense for them. This is just one strategy. There’s another talk
we are doing tomorrow, which describes several
of these strategies. But this is a good way
to bring in service mesh. And of course, there’s
the brownfield strategy where you could totally use
L7 ILB as your starting point. So our hope was that– I know we gave you a whirlwind
tour of Traffic Director, L7 ILB, how it fits into
the whole ecosystem, but our hope is that using
these GCP services, and also open source services,
you can manage your services seamlessly
and modernize from your starting point, at
your pace, and without toil. Thank you. [APPLAUSE] Oh, yeah. And one last thing. We have a survey. We’d really appreciate
it if you filled it. We also have an online Dory. So Dory is where you
can post your questions, and then Mike and I
will respond to them and post them back online. So thank you. Thank you for being
here for the talk. Thanks. [MUSIC PLAYING]
