Micro Services in the Cloud : Azure Service Fabric Mesh : The What and Why ?

Months ago, Microsoft announced the introduction of Azure Service Fabric Mesh (SFM), a new, easy and scalable managed service to deploy your container-based micro services applications.

Many of you are skeptical about Microsoft, a feeling that I have seen in many threads and discussions on the net. In this post, I will share my experience with this new service (even if it’s still in an early phase, public preview) in order to explain the What and the Why.

  • What is Service fabric Mesh ?
  • Why Service Fabric Mesh?

1- What is Service Fabric Mesh ?

For people who are familiar with Service Fabric or Azure Service Fabric, Service Fabric Mesh is a managed Service Fabric offer. It’s supposed to deliver the same Azure Service Fabric features, but in a managed way : you don’t create, manage or operate the underlying infrastructure, you only deploy your services directly to Service Fabric. Note that SFM does not always provide exactly the same features, but rather equivalent features and possibly new ones (the feature timelines of the two offers are not always aligned). The overall concept, however, is the same.

For people new to this area, SFM is a managed service (which means that the infrastructure is hidden and managed by Azure) where you deploy your services (to be more precise, your containers) in a very simple, beautiful and powerful way.

Here is the Microsoft overview article of Service Fabric Mesh, where more “commercial” details and sentences are used, but to be honest I believe in this service.

Take my previous blog post, where I explained the Micro Services architecture in simple words, using a well-known example : the HRA application.

SNAG-0039.png
HRA Application services

The Application is composed of 4 services. If we map a Service to a container, we will obtain 4 containers. You will ask your developer to create 4 docker images, 1 image for each service. You will ask your developer to upload/push these images to a registry, let’s say Azure Container Registry (an equivalent to Docker Hub).

SNAG-0040.png
3 docker images pushed to Azure Container registry

Then comes Azure Service Fabric Mesh : you just have to prepare an ARM JSON template (see Azure ARM templates) where you describe your application architecture and configuration : I want to deploy 4 services, each service uses image x, each service should be highly available with at least y instances, and each instance should have N CPU cores and M GB of RAM. Each service listens internally on port A, B, C… and only the Portal service is published to the internet on port 443.

–> Isn’t this magic and beautiful ? Believe me, the structure of the ARM template is very easy to understand, which makes DevOps very approachable.

Once you deploy this ARM template, within less than 5 minutes, your application is ready to be used.
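
To make the idea more concrete, here is a minimal Python sketch of the kind of declaration described above. It is not the real ARM template schema : the field names (replicaCount, memoryInGB, exposedToInternet), the registry name myregistry.azurecr.io, the image tags, the instance counts, the CPU/RAM values and the internal ports are all assumptions chosen for illustration.

```python
import json


def describe_service(name, image, instances, cpu, memory_gb, port, public=False):
    """Describe one service as a plain dict (illustrative layout, NOT the ARM schema)."""
    return {
        "name": name,
        "image": image,
        "replicaCount": instances,  # "at least y instances"
        "resources": {"cpu": cpu, "memoryInGB": memory_gb},
        "listenPort": port,
        "exposedToInternet": public,
    }


REGISTRY = "myregistry.azurecr.io"  # hypothetical registry name

services = [
    # Only the portal is published to the internet, on port 443.
    describe_service("portal", f"{REGISTRY}/hra-portal:1.0", 2, 1, 2, 443, public=True),
    # Internal ports below are made-up values standing in for "port A, B, C".
    describe_service("service1", f"{REGISTRY}/hra-service1:1.0", 2, 0.5, 1, 8081),
    describe_service("service2", f"{REGISTRY}/hra-service2:1.0", 2, 0.5, 0.5, 8082),
    describe_service("service3", f"{REGISTRY}/hra-service3:1.0", 2, 2, 4, 8083),
]

application = {"name": "hra", "services": services}

if __name__ == "__main__":
    print(json.dumps(application, indent=2))
```

The point of the sketch is only that the whole application fits in one declarative document : what to run, how many instances, how much CPU and RAM, and which port is public.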

To conclude : Azure Service Fabric Mesh is the service provided by Azure that combines:

  • The Power of Azure Service Fabric
  • Managed service (PaaS)

2- Why Service Fabric Mesh ?

Many of you will ask : Why should I use SFM, and not another service like :

  • Kubernetes : Azure Kubernetes Service, which provides the power of the Cloud (Azure) and Kubernetes
  • Azure Container Service
  • Amazon ECS
  • Amazon EKS
  • Or create your own cluster on top of IaaS (like a swarm cluster)

Even though this is a matter of preference, the following are my arguments for why I love SFM and why it’s my preferred cloud container service (for micro services)

2.1- Fully managed

SFM is fully managed, which means that you define only what matters to you : the containers. SFM will manage all the underlying compute, storage and networking infrastructure. HA is also provided by simply defining at least 2 instances for each container.

2.2- ARM resources

What I like in SFM is that the components are first class ARM resources, which means that you can manage, deploy and configure them using ARM calls (ARM templates, API, Terraform in the future…), and benefit from RBAC. The following are examples of ARM resources :

  • Application : Your SFM application is an ARM resource
  • Service : Each “micro service” is an ARM resource
  • Network : The network in which your application is deployed is an ARM resource

2.3- Enterprise features

SFM is currently in preview, but according to many discussions and the road map, many enterprise features have been or will be added that make the integration of our applications easier and more powerful:

  • HA volume: You will be able to attach a Highly Available, fast (SSD) volume to a set of containers. This gives your services simultaneous and concurrent access to a local and fast storage location. You can share anything between your services thanks to this shared volume
  • Reliable Collections/Dictionary: If you don’t know what we mean by “Reliable”, here is a small description : your services can have access to a highly available, near-real-time synchronized collection/dictionary that your application and services can use to store and exchange values. This is very useful for stateful services, where each instance of your service should access an accurate and synced state.
    • Example : My service, which runs as 2 containers (c1 and c2), must store the status of the user session during its lifetime (a token for example). With a Reliable Dictionary, you can store the token as a key/value pair in a dictionary that your service can get/set anytime. See here for a video with a more detailed explanation
  • Services Communication and routing: It’s very easy today for Service 1 to communicate with Service 2. No need to use the IP or the hostname. Service1 can communicate with Service2 just by sending requests to http://Service2:port/abcd (using the Service name). In the future, SFM will support Envoy, which is a sort of reverse proxy for micro services : your services communicate only with it, and it orchestrates the communications. It supports many features like routing, circuit breaking, authentication, transformation… This native support is a key reason why I like SFM
  • Rolling updates : When you have a new version of your Service1, you can deploy it safely. SFM will replace your containers one by one to ensure your service is not affected
  • Auto-scale Rules : In the future, SFM will support auto-scale rules in order to scale the container instances automatically.
  • Scaling performance : I have seen SFM scale from 1 to 500 containers in less than 30s
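
As a small illustration of the name-based communication mentioned in the list above, here is a Python sketch that builds the address Service1 would call. The helper name service_url and the port 8080 are assumptions made for the example; only the http://Service2:port/abcd shape comes from the text.

```python
def service_url(service_name, port, path=""):
    """Build the URL of another service from its name, as seen from inside the application."""
    # No IP address or hostname lookup: the service name itself is the host part.
    return f"http://{service_name}:{port}/{path.lstrip('/')}"


if __name__ == "__main__":
    print(service_url("Service2", 8080, "abcd"))
```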

And more is coming…

2.4- Pricing

NB : This depends on the pricing units, which are not yet disclosed

The pricing of SFM is very simple : You are charged per CPU core and RAM fraction you allocate to your containers.

For my HRA app, I can use the following configuration :

  • Portal : 1 core, 2 GB RAM
  • Service 1 : 0.5 cores, 1 GB RAM
  • Service 2 : 0.5 cores, 0.5 GB RAM
  • Service 3 : 2 cores, 4 GB RAM

I can then deploy x instances per Service, and apply a scaling plan to scale In or Out accordingly –> I will pay just for what I consume, reducing the entry price
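
Since the unit prices are not disclosed, the following Python sketch only adds up what would be billed : the total cores and RAM allocated for a given number of instances per service. The per-instance figures come from the list above; the function names and the price_per_core / price_per_gb parameters are placeholders of mine, not real Azure prices.

```python
# (cores, RAM in GB) allocated to ONE instance of each service.
ALLOCATION = {
    "Portal": (1, 2),
    "Service 1": (0.5, 1),
    "Service 2": (0.5, 0.5),
    "Service 3": (2, 4),
}


def total_allocation(instances):
    """Return (total cores, total RAM in GB) for a dict {service name: instance count}."""
    cores = sum(ALLOCATION[name][0] * count for name, count in instances.items())
    ram_gb = sum(ALLOCATION[name][1] * count for name, count in instances.items())
    return cores, ram_gb


def estimated_cost(instances, price_per_core, price_per_gb):
    """Cost for one billing period, given HYPOTHETICAL unit prices."""
    cores, ram_gb = total_allocation(instances)
    return cores * price_per_core + ram_gb * price_per_gb


if __name__ == "__main__":
    two_of_each = {name: 2 for name in ALLOCATION}
    print(total_allocation(two_of_each))
```

With one instance of each service the application allocates 4 cores and 7.5 GB of RAM; with two instances of each, 8 cores and 15 GB.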

3- Conclusion

SFM is a new player (still a baby) on the Container as a Service market, but what differentiates it from other solutions is that it is fully managed, and that the management experience is unique (ARM). I have used AKS, and I really like it and would recommend it alongside SFM (even though AKS still lacks some management features, and is not a fully managed service).

Check the following links for more information:


An easy way to understand Micro Services Architecture (MSA)

Hi,

During the last few years, we have been talking more and more about Micro Services, and how applications should be designed in this style. This is very interesting, but what is lacking when talking with “Microservices” fans is a simple explanation of why it’s better, and of how I can quickly understand the concept and imagine my application in this style.

My aim in this blog is to save you pages and pages of reading by presenting a quick and simple way to understand MSA (Micro Services Architecture)

1- Our sample application

To make my explanations clearer, they will be supported each time by a sample application that we will call : Hotel Reservation Application (HRA). This is a typical example that many people use, and I insist on using it so that the picture is always clear.

HRA provides the following features:

  • Search and find a hotel room (By City, price, or both)
  • Check the availability of the hotel room during a period
  • Make a reservation
  • Send an email to the user in order to confirm the reservation

For the simplicity of the example, the above features are enough for a user to make a reservation

2- The application architecture after a quick classical reflection

Without any software architecture background and using only “good sense”, I would imagine my application like the following (we can do better in fact)

SNAG-0038

The application has two tiers:

  • The frontal part which is a Web Portal http://hra.com
  • The back-end part which is a huge server containing my application core features
    • It stores and retrieves the information from a SQL DB

What is bad with this application ?

10 years ago, no one would have criticized this application, but today everyone will have at least one comment:

  • What if the Back-end server is down ? My whole application will be down
    • In fact no : with horizontal scaling, we can achieve high availability by creating at least 2 instances of the back-end server. The web portal will talk to a load balancer, and the load balancer will distribute traffic to the healthy servers

It’s no longer a question of high availability; today other questions and challenges have arisen:

  • I need to update my “check availability” module : This is very complicated since it coexists with all the other modules : you need to check dependencies and interactions with the other modules, you need to make tons of verifications and finally you need to redeploy the whole application –> Long application update life-cycle, Risk
  • We have to upgrade the Java or .NET version so that our “Find hotel” module supports a very nice feature that will make the search 10 times faster. Unfortunately, this will break the “check availability” module since it uses an old library that is not compatible with the new version. Two possible paths:
    • Wait for all the other modules to be upgraded to support the new framework version –> Time waste, long application update life-cycle
    • Force the development of the other modules to support the new framework version –> Cost
  • We want to entirely change the programming language (JAVA to .NET or .NET to JAVA) –> Kidding me ? Unless you have solid arguments to justify the huge cost and effort to the Business, this is impossible (an argument like “JAVA is dead”, which is impossible too)
  • The “Find Hotel” module is in very high demand today, so we are horizontally scaling out our application. Unfortunately, the other modules do not need to be scaled out since they are underutilized. But because the application is ONE, the whole application has to be scaled, leading to wasted resources (since each module consumes by default x resources when idle) –> Non-optimal resource usage

–> Did you catch it ? Monolithic applications have many drawbacks and limitations that we can summarize in 3 main points:

  • Limited and poor Flexibility and Extensibility
  • Very long update and release life-cycle
  • Non-optimal resource usage

–> It no longer keeps up with the growing “business need for change”

3- Micro Services Architecture : It’s no more than good sense

Let me use the limitations of the Monolithic application listed above, to propose a new architecture for my application.

SNAG-0039.png

Am I a genius ? No, it’s just natural reasoning. I have just decoupled the modules, so that each module becomes a Service (a small application) that runs on its own server.

Each Service can communicate with the other services using a known method, and can even be fully isolated from communicating with the other services. Let us take an example:

  1. The user connects to the portal
    • Involved components : Web Portal
  2. The user asks for all the hotel rooms in Paris and selects a Hotel
    • Involved Components : Web Portal, Service 1, DB
  3. The user checks the availability of the room between 20 and 30 September
    • Involved Components : Web Portal, Service 2, DB
  4. The user makes a reservation for that room
    • Involved Components : Web Portal, Service 3, DB

Wow, we have just demonstrated that each Service is independent of the others. And here is my contribution to the community : WHAT DO WE MEAN BY INDEPENDENT ? After all, without Service 1 I can’t find the hotel room needed for step 3.

The characteristics of MSA are the following (Inspired by this excellent article)

  • Service Independence
  • Single Responsibility
  • Self-Containment

3.1- Service Independence

My explanation of Service Independence is the following : During its execution, an Independent Service should not rely on any other service to complete a task.

NB : The data store (like the database or a file) is not considered as a service

Our application can testify:

  • In order to find a hotel, I will submit a request to the Find Hotel service with a set of parameters (city and price). When the service handles the request and starts its execution, it will only communicate with the DB to achieve its goal. Then it will send back the result to the portal. The task is complete
  • In order to check the availability, I will submit a request to the Check Availability Service with a set of parameters (hotel room, period). When the service handles the request and starts its execution, it will only communicate with the DB to achieve its goal. Then it will send back the result to the portal. The task is complete
  • In order to reserve the room, I will submit a request to the Make Reservation Service with a set of parameters (hotel room, period, availability OK, CardID). When the service handles the request and starts its execution, it will only communicate with the DB to achieve its goal. Then it will send back the result to the portal. The task is complete

Note that during its execution, a service does not rely on other services to complete. The service only needs inputs to start.
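
The three requests above can be sketched in a few lines of Python. This is only an illustration under my own assumptions : the in-memory dict standing in for the SQL DB, the function names, the sample rooms and prices, and the use of plain day numbers (20 to 30) for the period are all invented for the example.

```python
# In-memory stand-in for the SQL DB. The data store is not counted as a service.
db = {
    "rooms": [
        {"room_id": 1, "hotel": "Hotel A", "city": "Paris", "price": 120},
        {"room_id": 2, "hotel": "Hotel B", "city": "Paris", "price": 300},
        {"room_id": 3, "hotel": "Hotel C", "city": "Lyon", "price": 90},
    ],
    "reservations": [],  # entries: (room_id, first_day, last_day)
}


def find_hotel(db, city, max_price):
    """Service 1: needs only its inputs and the DB."""
    return [r for r in db["rooms"] if r["city"] == city and r["price"] <= max_price]


def check_availability(db, room_id, first_day, last_day):
    """Service 2: True when no stored reservation overlaps the requested period."""
    return not any(
        rid == room_id and first_day <= end and start <= last_day
        for rid, start, end in db["reservations"]
    )


def make_reservation(db, room_id, first_day, last_day, availability_ok, card_id):
    """Service 3: relies on the availability flag passed in by the portal."""
    if not availability_ok:
        return None
    db["reservations"].append((room_id, first_day, last_day))
    return {"room_id": room_id, "period": (first_day, last_day), "card_id": card_id}


# The portal chains the calls; no service ever calls another service.
rooms = find_hotel(db, "Paris", 150)
available = check_availability(db, rooms[0]["room_id"], 20, 30)
booking = make_reservation(db, rooms[0]["room_id"], 20, 30, available, "CARD-1")
```

Notice that the output of one service becomes an input of the next only through the portal : this is what “independent” means here.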

3.2- Single Responsibility

Single responsibility means that the service is responsible for providing a single business capability that, using good sense, cannot be divided further. When I say “using good sense”, it’s because we can divide anything until we reach the CRUD operations. Single Responsibility means that the service has a logical business function which can be seen as indivisible. For example, in our application, the Check Availability Service has a single responsibility : to check if a hotel/room is available. The Reservation Service, on the other hand, is quite different, since it contains two functionalities that can logically be divided (a good candidate for an enhancement)

 

3.3- Self Containment

This is an important property, especially for those who are responsible for developing the application. Self Containment means that the Service should contain all the necessary “code” to run without the need for external interactions or modules. This is closely related to property 1 (Service Independence), but at the code level. Ensure that the Service does not rely on external modules or dependencies when executing. The code lifecycle should be independent by all means. For example, updating a “Function” related to Service 1 should not break Service 2.

4- What next ?

This post is a simple introduction to the Micro Services Architecture. The goal is to demystify the concept and to get you thinking about how to design your application, or a new application, using MSA. Keep in mind that:

  • Not all applications are suitable for an MSA
  • Switching from a monolithic app to MSA has to be considered carefully and wisely : it may have multiple advantages, but it also requires a huge effort
  • There is no ONE MSA for an application. Two Software Architects may come up with different designs. Study them deeply and identify the best one, according at least to the 3 principles mentioned above.

Note : What do you think of Service 3 in my sample application, any remarks ? Can it be enhanced : Why and How ?

The oldest “bug” for Windows domain joined computers

Hi,

When you have a computer joined to an Active Directory domain, and you decide to leave the domain for a work-group, you are “surprisingly” asked to enter credentials for an account with permissions to remove this computer from the domain. I know this is not always a bug, but in all my cases, I just wanted to leave the domain.

So what credentials do you need to provide : anything will work, just type something and click OK.

 

SNAG-0037.png

Nice Saturday!

Azure Service Fabric news from //BUILD

Hi all,

Azure Service Fabric is one of my preferred Azure services today, as it embraces the microservices era. At //BUILD, a new SF offer was introduced : Azure Service Fabric Mesh, a fully managed and server-less SF brand. The following points summarize the session, which you can find here : https://www.youtube.com/watch?v=0ab2wIGMbpY

  • Deploying Service Fabric Clusters locally (Standalone)
    • Now you can use a JSON manifest file to describe the nodes where to deploy, certificates to install, configuration to set… This file will be uploaded to Azure, and Azure will provide the required packages to make the installation
  • Single Portal experience
    • View and manage the Azure SF clusters and the Local Clusters via the Azure portal
  • Azure Service Fabric Mesh
    • A fully managed SF running on Azure. You no longer need to manage the SF infrastructure like VMSS, LBs or any other thing.
    • Supports containers
    • Creating Applications and Services can now be done via ARM templates, as Applications and Services are first class ARM resources !! This is just amazing
    • All the .yaml files that are used to define the Service Fabric resources are regrouped into a JSON file
  • Secrets Store (Secrets Resource)
    • Built into SF : Applications and Services have a managed identity with AAD, and can access Azure Key Vault to get secrets and certificates. Secrets and certificates rollover is supported too.
  • Volume Resource
    • A volume resource presented to the containers, two types :
      • Backed by Azure File Storage
      • Backed by SF Volume Disk (replicated local disks)
  • Diagnostic and monitoring
    • Containers will write stdout/stderr to a volume, and Application Insights will analyse these data from that volume.
    • Azure Monitor for SF metrics
  • Reliable Collections
    • Enhancements and new features, the application and the reliable collection are now separated
  • Intelligent traffic routing
    • Great news about this, as new routing features are added
    • Introduction of Envoy. This will simplify services communication

Service Fabric Mesh Pricing

This was not disclosed, waiting for it.

Understanding Azure CosmosDB Partitioning and Partition Key

Hi,

Many of you have asked me about the real meaning of Cosmos DB partitions, the partition key and how to choose a partition key if needed. This post is all about this.

1- Cosmos DB partitions, what is it ?

The official Microsoft article explains partitions in CosmosDB well, but to simplify the picture:

  • When you create a container in a CosmosDB database (a Collection in the case of the SQL API), CosmosDB will provision capacity for that container
  • If the container capacity is more than 10GB, then CosmosDB requires an additional piece of information to create it : WHY ?

When CosmosDB provisions a container, it will reserve capacity over its compute and storage resources. The Storage and Compute resources are called Physical Partitions.

Within the physical partitions, CosmosDB uses Logical Partitions. The maximum size of a Logical Partition is 10GB

You get it now : When the size of a container exceeds (or can exceed) 10GB, CosmosDB needs to spread the data over multiple Logical Partitions.

The following picture shows 2 collections (containers):

  • Collection 1 : The Size is 10 GB, so CosmosDB can place all the documents within the same Logical Partition (Logical Partition 1)
  • Collection 2 : The size is unlimited (greater than 10 GB), so CosmosDB has to spread the documents across multiple logical partitions

SNAG-0022.png

“Spreading the documents across multiple Logical Partitions is called partitioning”

NB1: CosmosDB may distribute documents across Logical Partitions within different physical partitions. Logical Partitions of the same container do not necessarily belong to the same physical partition, but this is managed by CosmosDB

NB2: Partitioning is mandatory if you select Unlimited storage for your container, and supported if you choose 1000RU/s and more

NB3: Switching between partitioned and un-partitioned containers is not supported. You need to migrate your data

2- Why partitioning matters ?

This is the first question I asked myself : Why do I need to know about this partitioning stuff ? The service is managed, so why should I care about this if CosmosDB will automatically distribute my documents across partitions ?

The answer is :

  • CosmosDB does not manage Logical partitioning
  • Partitioning has an impact on Performance and is tied to the partition Size limit

2.1- Performance

When you query a Container, CosmosDB will look into the documents to get the required results (I keep it simple here, because the real process is more elaborate). When a request spans multiple Logical Partitions, it consumes more Request Units, so the Request Charge per Query will be greater
–> HINT : With this constraint, it’s better that queries don’t span multiple logical partitions, so it’s better that the documents related to the same query stay within the same logical partition

2.2- Size

When you request the creation of a new document, CosmosDB will place it within a Logical Partition. The question is how CosmosDB will distribute the documents between the Logical Partitions. A Logical Partition can’t exceed 10GB, so CosmosDB must distribute documents intelligently between the Logical Partitions. This looks easy : a mechanism like round robin could be enough. But this is not true ! With round robin, your documents would be spread over N Logical Partitions, and we have seen that queries over multiple Logical Partitions consume a lot of RUs, so this is not optimal


We now have the Performance-Size dilemma : How can CosmosDB deal with these two factors ? How can we find the best configuration to :

  1. Keep ‘documents of the same query’ under the same Logical Partition
  2. Avoid reaching the 10GB limit easily

–> The answer is : CosmosDB can’t deal with this, you have to deal with it by choosing a Partition Key


3- The Partition Key

My definition: The Partition Key is a HINT to tell CosmosDB where to place a document, and whether two documents should be stored within the same Logical Partition. The partition key is a value within the JSON document

NB : The PartitionKey must be submitted during a query to CosmosDB

Let me explain this with an example:

Suppose we have a multi-tenant Hotel application that manages hotel rooms (reservations for example). Each room is represented by a document where all the room’s information is located. The Hotel is identified by a hotelid and the room by an id

The document structure is like the following:

{
  "hotelid": "",
  "name": "",
  "room": {
    "id": "",
    "info": {
      "info1": "",
      "info2": ""
    }
  }
}

The following is the graphical view of the JSON document:

SNAG-0033

Suppose we have documents of 6 rooms:

SNAG-0027

SNAG-0028

SNAG-0029

SNAG-0030

SNAG-0031

SNAG-0034.png

3.1- No Partition Key

If you create a container with a size of 10GB, the container will not be partitioned, and all the documents will be created within the same Logical Partition. So all your documents together must not exceed 10GB.

3.2- Partition Key : Case 1

Partition Key = /hotelid

In this case, when CosmosDB creates the 6 documents based on /hotelid, it will spread the documents over 3 Logical Partitions, because there are 3 distinct /hotelid values.

  • Logical Partition 1 : /hotelid = 2222
    • 3 documents
  • Logical Partition 2 : /hotelid = 3333
    • 2 documents
  • Logical Partition 3 : /hotelid = 4444
    • 1 document
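
The grouping effect can be mimicked with a short Python sketch. It only groups documents by their /hotelid value; how CosmosDB actually places the data internally is not modelled here, and the room ids (101, 102…) are made-up values.

```python
from collections import defaultdict

# Six room documents: 3 for hotel 2222, 2 for hotel 3333, 1 for hotel 4444.
documents = [
    {"hotelid": "2222", "room": {"id": "101"}},
    {"hotelid": "2222", "room": {"id": "102"}},
    {"hotelid": "2222", "room": {"id": "103"}},
    {"hotelid": "3333", "room": {"id": "201"}},
    {"hotelid": "3333", "room": {"id": "202"}},
    {"hotelid": "4444", "room": {"id": "301"}},
]


def group_by_partition_key(docs, key):
    """Group documents by the value of their partition key property."""
    partitions = defaultdict(list)
    for doc in docs:
        partitions[doc[key]].append(doc)
    return dict(partitions)


partitions = group_by_partition_key(documents, "hotelid")

if __name__ == "__main__":
    for value, docs in partitions.items():
        print(f"/hotelid = {value}: {len(docs)} document(s)")
```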

What are the pros and limits of this partitioning scheme ?

  • Pro
    • Documents from the same hotel will be placed on a distinct Logical Partition
    • Each Hotel can have documents up to 10GB
    • Queries across the same hotel will perform well since they will not span multiple Logical Partitions
  • Limits
    • All rooms of the same hotel will be placed within the same Logical Partition
    • The 10GB limit may be reached when the rooms count grows

3.3- Partition Key : Case 2
NB : CosmosDB supports only 1 JSON property for the Partition Key, so in my case I will create a new property called PartitionKey

Suppose that after making a calculation, we figured out that each Hotel will generate 16GB of documents. This means that I need the documents to be spread over two Logical Partitions. How can I achieve this ?

  • The /hotelid has only 1 distinct value per Hotel, so it’s not a good partition key
  • I need to find a value that can have at least 2 distinct values for a Hotel
  • I know that each Hotel has multiple rooms, and therefore multiple room ids

The idea is to create a new JSON property called PartitionKey, which can have two values :

  • hotelid-1 if the roomid is odd
  • hotelid-2 if the roomid is even

This way:

  • When you create a new document (which contains a room), you have to look at the roomid : if it’s even then PartitionKey = hotelid-2, if it’s odd then PartitionKey = hotelid-1
  • Cosmos will then place even rooms within one Logical Partition, and odd rooms within another Logical Partition

–> Result : The hotel documents will span two Logical Partitions, so 20 GB of storage
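
The odd/even rule can be written as a tiny Python function. This is a sketch that assumes the room id is numeric (or a numeric string); the function names are mine. The first helper just redoes the arithmetic : 16 GB of documents with a 10 GB limit needs 2 Logical Partitions.

```python
import math

LOGICAL_PARTITION_LIMIT_GB = 10


def partitions_needed(expected_size_gb):
    """Minimum number of Logical Partitions needed to hold the expected size."""
    return math.ceil(expected_size_gb / LOGICAL_PARTITION_LIMIT_GB)


def partition_key(hotelid, room_id):
    """hotelid-1 for odd room ids, hotelid-2 for even room ids."""
    suffix = 1 if int(room_id) % 2 == 1 else 2
    return f"{hotelid}-{suffix}"


if __name__ == "__main__":
    print(partitions_needed(16))
    print(partition_key("2222", 101), partition_key("2222", 102))
```

The value returned by partition_key is what you would store in the PartitionKey property when creating the room document.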

What are the pros and limits of this partitioning scheme ?

  • Pro
    • Documents related to the same hotel will be placed on 2 Logical Partitions
  • Limits
    • Queries related to the same hotel will be spread across 2 Logical Partitions, which will result in an additional request charge

4- How to choose the Partition Key?

This is the most difficult exercise when designing your future CosmosDB data structure. Here are some recommendations to guide your thinking:

  • What is the expected size per document ? This tells you the level of partitioning you will need. Think about the examples above. If each document is 100KB max, then you can have up to about 105k documents per Logical Partition (10GB / 100KB), which means 105k rooms per hotel (more than enough), so /hotelid is a good partition key with regard to the Size constraint
  • If you are facing several candidate partition keys and are unable to decide, do the following:
    • Do not use a partition key that will hit the Size constraint quickly : Reaching the Size limit makes the application unusable
    • Choose the Partition Key that will consume less Request Charge. How to predict that ? You have to determine the most used queries across your application, and choose the best Partition Key according to them.
  • Add new properties to your JSON document (PartitionKey), even if they are not really useful otherwise, just to achieve good Partitioning
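
The “105k documents” figure in the first recommendation is just a division. Here is the arithmetic as a short Python sketch, assuming binary units (1 GB = 1024³ bytes, 1 KB = 1024 bytes) and that every document has the maximum size; the function name is mine.

```python
LOGICAL_PARTITION_LIMIT_BYTES = 10 * 1024**3  # 10 GB per Logical Partition


def max_documents(max_document_size_kb):
    """How many documents of the given maximum size fit in one Logical Partition."""
    return LOGICAL_PARTITION_LIMIT_BYTES // (max_document_size_kb * 1024)


if __name__ == "__main__":
    print(max_documents(100))  # documents of 100 KB each
```

With 100 KB documents this gives 104857 documents, hence the rounded 105k.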

5- I have determined a good Partition Key, but I am afraid of hitting the 10 GB limit per Logical Partition ?

This is the most asked question after choosing the Partition Key : What if all the documents with the same Partition Key value hit the 10GB limit !!

Like in the example above, try to find a mandatory value that gives you the X factor you want. The idea is to ask : Can I find an additional property that I can use in my partition key ?

NB : The Request Charge will be multiplied by X, but at least I can predict it

This was simple in my case (a factor of 2), but if you need a larger factor X, you can use a Bucket calculator function : you just provide how many Logical Partitions you want to spread your documents across. Here is a good blog post about the subject.
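
To give an idea of what such a bucket calculator can look like, here is a minimal Python sketch. It is my own illustration, not the function from the blog post mentioned above : it hashes the room id with SHA-256 and takes the remainder modulo the number of buckets, so that the same room always lands in the same bucket. The function names and the sample id "room-17" are assumptions.

```python
import hashlib


def bucket_for(value, bucket_count):
    """Map any value to a stable bucket number between 1 and bucket_count."""
    digest = hashlib.sha256(str(value).encode("utf-8")).hexdigest()
    return int(digest, 16) % bucket_count + 1


def bucketed_partition_key(hotelid, room_id, bucket_count):
    """Build a PartitionKey value such as '<hotelid>-<bucket>'."""
    return f"{hotelid}-{bucket_for(room_id, bucket_count)}"


if __name__ == "__main__":
    # Spread the documents of hotel 2222 over 4 Logical Partitions.
    print(bucketed_partition_key("2222", "room-17", 4))
```

Because the bucket is derived from the room id alone, you can recompute the PartitionKey at query time without storing any extra mapping.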

Hope that this article helps.

Cheers

Azure Virtual Machine Serial Access : Finally available

Hi all,

5 years after the first feedback request, Azure has finally added the Console Access feature to its Virtual Machine service.

The history ?

In the past, accessing a virtual machine was only possible via the network. Anything preventing you from accessing (managing) it through the network path (ssh, rdp, remote powershell…) had a dramatically bad impact –> Redeploy

For example :

  • You have enabled the firewall on the Virtual Machine –> Redeploy
  • You have a blue screen (that you can fix by changing a setting) –> Redeploy
  • Your screen is stuck on the ‘Please hit a key to  continue’ –> Redeploy

Today, Azure has added the Serial console access feature, which means that you can access the Virtual Machine just as if you were accessing it via the console port –> No need for network connectivity to the OS

This is a long-awaited and much-wanted feature that is currently in Public Preview, check it here : https://azure.microsoft.com/en-us/blog/virtual-machine-serial-console-access/

Future improvements

  • Adding F8 key support to access the early-stage boot screens
  • Adding RDP support for Windows, because only cmd or powershell administration is provided today

Enjoy !!

 

Azure Networking Cross connectivity : The Options

Hi all,

I’m continually working on designing Cloud solutions, and specifically, Azure based Cloud solutions. One of the building blocks when you start dealing with Azure is the Networking infrastructure that you need to build.

One of the challenges that you will certainly encounter is how to design the cross connectivity model between the different networks. Cross connectivity can involve the following networks :

    • On-premises DC to Azure VNET
    • Azure VNET to Azure VNET
    • Azure VNET to Azure VNET in a different region
    • ROBO to Azure VNET

1- The options

The following table shows the different options that you can use to cross connect different networks to Azure VNET :

Network | Options
On-premises DC | Express Route; Site to Site (S2S) VPN; 3rd party S2S VPN
Azure VNET | VNET to VNET (VPN); VNET Peering; Express Route; 3rd party S2S VPN
Azure VNET Different Region | VNET to VNET (VPN); Regional VNET Peering; Express Route; 3rd party S2S VPN
ROBO | Site to Site (S2S) VPN; 3rd party S2S VPN

2- Understanding the options

2.1- Site to Site VPN

The Site to Site VPN is a connectivity option that can be used to connect an Azure VNET to any network over the internet, using VPN technology. The S2S VPN is the fastest way to establish a trusted private connection between your network and an Azure VNET.

The S2S VPN requires that you deploy an Azure VPN Gateway on the Azure VNET (an Azure managed gateway), and establish a VPN connection with a compatible VPN device on your side. The Azure VPN Gateway provides a 99.9% SLA (under the hood, 2 VPN gateway instances in Active/Passive mode). In addition, you can achieve a more resilient setup with the Active/Active configuration (No published SLA)

a- Requirements and prerequisites

  • A VPN Gateway deployed on each VNET (2 VNETs can’t use the same VPN gateway)
  • A compatible VPN device on your side (Note that even if your device is not listed, it can be used as long as it supports the VPN configuration required by Azure)

b- Pros and Cons

Pros :

  • The fastest way to establish cross connectivity
  • No special configuration (VPN over internet)
  • A good solution for ROBO

Cons :

  • No quality SLA (internet : latency, jitter…)
  • A maximum of 1.25 Gbps

c- Pricing

You will pay for :

2.2- Express Route

ExpressRoute (ER) is the Microsoft offer that enables customers to establish a low latency, private and high bandwidth network connection to the Azure data-centers. Without entering into the technical details, ER is a Layer 3 private connection to Azure networks. It travels through a dedicated circuit from your data-center to the Azure networks, without going over the Internet.

ExpressRoute connectivity (picture credit : Microsoft)

ER is an enterprise offer for customers that require a high bandwidth, low latency connection to their Azure workloads. It provides different bandwidth options, from 50 Mbps to 10 Gbps, and an SLA of 99.9%.
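To get a feel for what these bandwidth figures mean in practice, the sketch below estimates the ideal time needed to move a given volume of data over a link. It is a back-of-the-envelope calculation under my own assumptions: decimal units (1 GB = 8000 megabits), a fully available link and no protocol or VPN overhead. The function name is hypothetical.

```python
def transfer_time_hours(data_gb, bandwidth_mbps):
    """Ideal time, in hours, to move `data_gb` gigabytes over a `bandwidth_mbps` link."""
    megabits = data_gb * 8 * 1000  # decimal units: 1 GB = 8000 megabits
    return megabits / bandwidth_mbps / 3600


# Moving 1 TB (1000 GB) over the bandwidths mentioned in this post
for label, mbps in [("ER 50 Mbps", 50), ("S2S VPN 1.25 Gbps", 1250), ("ER 10 Gbps", 10000)]:
    print(f"{label}: {transfer_time_hours(1000, mbps):.2f} h")
```

Real throughput will be lower, especially over a VPN where the internet path adds latency and jitter.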

a- Requirements and prerequisites

  • An ER Gateway deployed to your VNET (An ER circuit can be shared between different VNETs)
  • An Exchange/Network provider that can provide the connection to Azure ER

b- Pros and Cons

Pros :

  • High bandwidth, low latency, private connection to Azure
  • An ER circuit can be shared between different VNETs, providing both a full mesh connection between the VNETs and connectivity to on-premises for all VNETs
  • Can be extended to connect VNETs in other regions
  • The possibility to use Azure Microsoft peering (and Public Peering) to reach other Microsoft services directly without going through the internet : Azure PaaS services (Web Apps, Azure SQL..), Office 365 services…

Cons :

  • Can take time to prepare and establish the circuit (weeks to months : contract with a network/exchange provider)
  • The cost can be significant for high bandwidth/unlimited tiers
  • Cannot be easily used for ROBO

c- Pricing

You will pay for :

  • The deployed ER Gateway (Per hour)
  • The outbound data leaving Azure in case you subscribed to the metered plan
  • The ER Premium Add-on, in case you wish to enable premium features like sharing the ER circuit with VNETs outside the geopolitical region where the ER circuit is established
  • The ER circuit

2.3- VNET Peering

VNET Peering is an Azure technology that allows you to link/peer/connect 2 or more Azure Virtual Networks, with a few clicks and without deploying any additional resource. 2 peered VNETs behave like one bigger VNET, that’s it. Imagine putting a wire between two networks and starting to exchange traffic between them : this is VNET peering.

VNET peering establishes a private, LAN-like connectivity between 2 or more virtual networks. Resources within the virtual networks will see each other just like they were on the same one.

a- Requirements and prerequisites

  • 2 or more Virtual Networks (Note that peering VNETs in different regions is currently in preview, and not available in all regions)

b- Pros and Cons

Pros :

  • Easy to configure (few clicks)
  • LAN-like performance

Cons :

  • Not negligible cost (The pricing model is per volume, so not very predictable)

c- Pricing

You will pay for :

  • The data going in and out of the VNET. For example, if you send 1 GB from VNET 1 to VNET 2, you will pay for 1 GB leaving VNET 1 and for 1 GB entering VNET 2
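The billing rule above (each GB is charged once when leaving the source VNET and once when entering the destination VNET) can be sketched as follows. The function name and the rates are hypothetical placeholders, not Azure prices; check the official pricing page for the real numbers.

```python
def peering_cost(gb_transferred, egress_rate_per_gb, ingress_rate_per_gb):
    """Cost of sending data between two peered VNETs.

    Every GB is billed twice: once leaving the source VNET (egress)
    and once entering the destination VNET (ingress).
    """
    return gb_transferred * (egress_rate_per_gb + ingress_rate_per_gb)


# Hypothetical rates, for illustration only
EGRESS_RATE = 0.01
INGRESS_RATE = 0.01

# 500 GB sent from VNET 1 to VNET 2 in a month
print(peering_cost(500, EGRESS_RATE, INGRESS_RATE))
```

Because the bill follows the volume, the monthly cost is only as predictable as your traffic.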

2.4- VNET to VNET

The VNET to VNET connectivity is just an S2S VPN between two VNETs, provided by the Azure VPN Gateways. It’s an additional, cost-effective cross connectivity option.

a- Requirements and prerequisites

  • A VPN Gateway on each VNET

b- Pros and Cons

Pros :

  • Simple to configure
  • Cost-effective as traffic between 2 VNETs within the same region is free, you pay only for the VPN Gateways (per hour)

Cons :

  • Performance/latency
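Since VNET to VNET is billed per gateway hour while VNET peering is billed per GB, there is a traffic volume below which peering is the cheaper of the two. The sketch below estimates that break-even point for two VNETs in the same region. All the rates, the 730-hour month and the function name are my own assumptions for illustration, not Azure prices.

```python
def break_even_gb(gateway_rate_per_hour, peering_rate_per_gb, hours=730, gateways=2):
    """Monthly traffic (GB) at which VNET peering costs as much as VNET to VNET.

    VNET to VNET: `gateways` VPN Gateways billed per hour, same-region traffic free.
    VNET peering: every GB billed twice (leaving one VNET, entering the other).
    """
    gateways_cost = gateways * gateway_rate_per_hour * hours
    return gateways_cost / (2 * peering_rate_per_gb)


# Hypothetical rates: 0.04 per gateway hour, 0.01 per GB each way
print(round(break_even_gb(0.04, 0.01)))
```

Below that volume peering is cheaper on paper, above it the two gateways are, but remember that the VPN option comes with the performance/latency drawback.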

c- Pricing

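You will pay for :

  • The VPN Gateway deployed on each VNET (per hour). Traffic between 2 VNETs within the same region is free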

2.5- 3rd party S2S VPN

You can opt for establishing network cross connectivity using your own technology, by using a virtualized VPN device (or a device that uses any tunneling protocol). By deploying a Virtual Machine that runs your software (it must be supported by Azure, like a Linux-based Virtual Appliance), you can establish a connection to your other networks and then route traffic to your VNET using Route Tables (UDR).
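To illustrate how a route table steers traffic towards the virtual appliance, here is a minimal sketch of a longest prefix match lookup, which is the way Azure picks a route for a destination. The prefixes, the appliance address and the hop labels are hypothetical, and the sketch ignores the priority rules between user-defined, BGP and system routes.

```python
import ipaddress

# Hypothetical route table: (address prefix, next hop)
ROUTES = [
    ("10.1.0.0/16", "VNET local"),
    ("192.168.0.0/16", "Virtual appliance 10.1.0.4"),  # remote networks behind the appliance
    ("0.0.0.0/0", "Internet"),
]


def next_hop(destination, routes=ROUTES):
    """Return the next hop of the most specific route that contains `destination`."""
    ip = ipaddress.ip_address(destination)
    matches = []
    for prefix, hop in routes:
        network = ipaddress.ip_network(prefix)
        if ip in network:
            matches.append((network.prefixlen, hop))
    if not matches:
        return None
    # The longest prefix (most specific route) wins
    return max(matches, key=lambda match: match[0])[1]


print(next_hop("192.168.10.20"))  # goes through the virtual appliance
print(next_hop("10.1.2.3"))       # stays inside the VNET
```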

a- Requirements and prerequisites

  • A Virtual Network Appliance supported by Azure

b- Pros and Cons

Pros :

  • Keep your Enterprise technology

Cons :

  • High Availability : Most HA protocols, like VRRP, are not supported on Azure
  • Cost : That depends
  • Bandwidth/Latency : Traffic is over internet
  • Additional Management (Not a managed service)

c- Pricing

You will pay for :

  • The Virtual Machines you deploy
  • The outbound data leaving Azure

3- How to choose between the solutions

In a coming post, i will share a design i have recommended to one of my customers, showing one of the architectures that we can build using Express Route and VNET peering. But that design fits the context of that particular customer. Choosing which technology to use depends on many factors including :

  • Budget : We saw that ER and Peering are relatively expensive compared to S2S VPN and VNET2VNET
  • Needs : I don’t need an ER between my ROBO and Azure if i have little data to exchange. But i need VNET peering if low latency is mandatory between my workloads spread across VNETs
  • Time To Market : Establishing a S2S VPN is way quicker than establishing an ER circuit, so an emergency may leave you with no choice, at least for the short term

My recommendations

I can’t just recommend something without knowing the context and the needs, but in general, i see the picture as follows :

  • If you are in a Hybrid configuration for the mid/long term (more than 1 year), then providing an Enterprise connection between your datacenter and Azure is crucial. East-West traffic requires high bandwidth / low latency connections, so ExpressRoute is the only good choice.
  • If your workloads are spread between VNETs, and the latency/bandwidth matters, VNET peering is the best choice. If the connection quality is not mandatory, then you can opt for VNET2VNET connectivity, or you can share your ExpressRoute if it already exists.
  • The small and medium ROBOs can use S2S VPNs to connect to Azure. If the performance matters, then an alternative architecture may be needed. Establishing ER for a ROBO is neither practical nor cost effective. So you can opt for a hybrid architecture where all your offices are connected to a POP with high bandwidth / low latency links, and an ER links the POP to Azure.
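These rules of thumb can be summarized as a small decision helper. This is only a sketch of my reading of the recommendations above: the function name, the parameter names and the labels are hypothetical, and the S2S VPN fallback for a short-term datacenter connection is an assumption based on the Time To Market point.

```python
def recommend_connectivity(source, long_term_hybrid=False,
                           performance_matters=False, has_expressroute=False):
    """Map the rules of thumb above to a connectivity option (illustrative only)."""
    if source == "datacenter":
        # Assumption: a short-term or urgent need falls back to S2S VPN
        return "ExpressRoute" if long_term_hybrid else "S2S VPN"
    if source == "vnet":
        if performance_matters:
            return "VNET peering"
        return "Shared ExpressRoute" if has_expressroute else "VNET2VNET"
    if source == "robo":
        if performance_matters:
            return "POP + ExpressRoute"
        return "S2S VPN"
    raise ValueError(f"unknown source network: {source!r}")


print(recommend_connectivity("datacenter", long_term_hybrid=True))
print(recommend_connectivity("vnet", performance_matters=True))
print(recommend_connectivity("robo"))
```

A real decision also has to weigh the budget and the time to market, which a few booleans cannot capture.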