Microsoft Azure: The ways to upload VHDs to Azure : AZcopy

Hi All,

This week is all about the ways to upload VHDs to Azure. And guess what, I’m not done yet. Every day, I’m discovering something new. Remember my blog post about the Azure PowerShell command Add-AzureVHD, which lets you upload VHDs to Azure. It’s a powerful command that enables scripting and automation. But we saw some drawbacks with it, like the MD5 hash calculation, which can take considerable time and can’t be skipped. However, I don’t know why I never used AzCopy for this. I had used AzCopy before to copy VHDs between two storage accounts, but I forgot that it can also upload files (VHDs) from on-premises to Azure.

AzCopy is a utility that gives you many options, such as copying files between Azure storage accounts and subscriptions, copying files from on-premises to Azure, making batch copies… The good news is that the MD5 hash calculation is optional (it is not done by default). Use AzCopy if your disk read throughput is low and you can’t afford to wait.

Keep in mind that AzCopy does not convert dynamic VHDs to fixed VHDs during upload. You need to upload fixed-type VHDs.
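Since AzCopy uploads the file as-is, it can help to check the VHD type before starting a long upload. Here is a minimal Python sketch, assuming the standard VHD footer layout (the last 512 bytes of the file, starting with the "conectix" cookie, with a big-endian disk type field at byte offset 60). The function names are mine and are not part of AzCopy or Azure PowerShell.

```python
import struct

VHD_FOOTER_SIZE = 512
# Disk type values as defined in the VHD footer
DISK_TYPES = {2: "fixed", 3: "dynamic", 4: "differencing"}


def disk_type_from_footer(footer):
    """Return the disk type name stored in a 512-byte VHD footer."""
    if len(footer) != VHD_FOOTER_SIZE:
        raise ValueError("a VHD footer must be exactly 512 bytes")
    if footer[:8] != b"conectix":
        raise ValueError("VHD footer cookie not found")
    # The disk type is a 4-byte big-endian integer at offset 60
    (disk_type,) = struct.unpack(">I", footer[60:64])
    return DISK_TYPES.get(disk_type, "unknown ({})".format(disk_type))


def vhd_disk_type(path):
    """Read the footer at the end of a VHD file and return its disk type."""
    with open(path, "rb") as f:
        f.seek(0, 2)  # move to the end of the file to learn its size
        size = f.tell()
        if size < VHD_FOOTER_SIZE:
            raise ValueError("file is too small to be a VHD")
        f.seek(size - VHD_FOOTER_SIZE)
        return disk_type_from_footer(f.read(VHD_FOOTER_SIZE))
```

For example, vhd_disk_type(r"D:\vhds\disk1.vhd") (a hypothetical path) should return "fixed" for a VHD that is ready to be uploaded with AzCopy.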

Download AZCopy here

Look for this Microsoft article describing the AZCopy options

Look here for this step by step blog

Microsoft Azure: The ways to upload VHDs to Azure (v2)

  UPDATE 12-22-2016

This post is also applicable for Add-AzureRmVHD with Azure Powershell 1.0 for Azure Resource Manager

  UPDATE 

  • This post replaces my previous post Microsoft Azure: The ways to upload VHDs to Azure (Retired). The aim is to add important information related to CloudBerry Explorer
  • Described another utility in this post

 

Hi all,

More and more customers are moving to Azure, moving at least some workloads to Azure, or just starting to use Azure. In any of these cases, you may want to move some VMs to Azure so you can start using them in a production or test scenario. Before I continue: if you plan to migrate a whole platform to Azure, uploading VHD by VHD is not suitable for you, and you should look for a more automated and complete solution. See my blog here, where I describe that topic in detail.

But if you want to upload some VHDs to Azure, and you are lost in googling and binging, I hope you will find your answers here. So let’s begin:

To upload virtual hard disks to Azure, you can use several tools:

CloudBerry Explorer for Microsoft Azure Blob Storage

This is just an excellent tool, and my favorite. CloudBerry Explorer offers a handsome UI where you can drag and drop VHDs between your local disks and Azure Blob storage, and vice versa. You can start many simultaneous uploads, pause and resume uploads, and view the remaining upload time, all this for free. In fact, CloudBerry Explorer comes in a free edition and a PRO edition. The PRO edition lets you do more things, like creating upload rules, creating analytic reports, multithreaded uploading, encryption, compression… But if you just want to upload some VHDs, the free version is really great. Update: Only VHDs stored as page blobs are supported in Azure. CloudBerry copies files as block blobs by default, so you should use the Copy As Page blob button at the top of the window. So, what are you waiting for? Start now here

Add-AzureVHD

This is a PowerShell command provided by Microsoft Azure PowerShell. Add-AzureVHD is great if you want to script the upload of several VHDs. You can download Azure PowerShell here. Follow this blog to begin with the Add-AzureVHD command. Add-AzureVHD is a powerful way to upload VHDs to Azure, but to be honest you may hit some limits and drawbacks.

Azure storage explorer

Azure Storage Explorer is a free utility for viewing and acting on Microsoft Azure Storage. Azure Storage Explorer allows you to upload VHDs to Azure blobs and to perform several additional operations. It can be downloaded here and is currently a preview version. I will not rate this tool because I only used it for 5 minutes. The upload experience was awful (no upload status, no upload percentage…) and the UI freezes unexpectedly.

Azure Drive Explorer

Azure Drive Explorer is a client/server tool that allows you not only to upload VHDs to Azure blobs, but also to upload files inside VHDs in Azure. Azure Drive Explorer requires that you install its server component in Azure (deploy packages into Azure) and then use the client component to make the uploads. If you want to test it, just go here and good luck. PS: I have not tested or used this tool, so it will be up to you to rate it.

Voila, I’m done here. If you want to follow my recommendation, use CloudBerry Explorer. If you want to automate, script and batch uploads, and if the disk(s) where your VHDs are located provide high read throughput, and your VHDs are dynamic, you can use the Azure PowerShell command.

Microsoft Azure : Optimize VHD upload to Azure : Add-AzureVHD

Hi all,

It was an interesting week for me, dealing with 9 terabytes of VHDs to upload to Azure. To be honest, I was surprised by the time it took, because all the calculations we had made to predict the total upload time were unfortunately wrong. How and why?

To upload VHDs to Azure, I used the Azure PowerShell cmdlet Add-AzureVHD. You get Add-AzureVHD by downloading and installing the Azure PowerShell module: Download HERE. Install the module on the machine from which you will initiate the upload.

The aim of this post is not to show how to use the Add-AzureVHD command, but to give you hints to get the best out of it.

The upload process

When you upload a VHD to Azure using Add-AzureVHD, the following steps are conducted:

Step 1 : Hash calculation : An MD5 hash is calculated against the VHD. This step can’t be skipped or avoided. The aim is to be able to check the VHD integrity after its upload to Azure.

Step 2: Blob creation in Azure: A page blob with the same size as the VHD is created in Azure.

Step 3: Empty data blocks detection : (For fixed VHDs only) The process looks for empty data blocks to avoid copying blank data to Azure.

Step 4 : Upload : The data is uploaded to Azure
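To see why Step 1 costs so much, here is a minimal Python sketch of how an MD5 hash of a large file is typically computed: the whole file has to be read from disk, chunk by chunk. This is an illustration using Python's hashlib, not the actual code of Add-AzureVHD, and the 4 MB chunk size is an arbitrary assumption.

```python
import hashlib
import io


def md5_of_stream(stream, chunk_size=4 * 1024 * 1024):
    """Return the hex MD5 digest of a binary stream, read chunk by chunk."""
    digest = hashlib.md5()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # empty bytes means end of stream
            break
        digest.update(chunk)
    return digest.hexdigest()


def md5_of_file(path):
    """Hash a file on disk; every byte is read exactly once."""
    with open(path, "rb") as f:
        return md5_of_stream(f)


# Small in-memory demo, no VHD needed
print(md5_of_stream(io.BytesIO(b"abc")))  # 900150983cd24fb0d6963f7d28e17f72
```

Because every byte is read once, the duration is driven by the file size and the disk read throughput, which is what the optimization hints below are about.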

How to optimize

Step 1 : Hash calculation

The hash calculation depends on three factors: Disk speed, VHD size and the processor speed. Let’s optimize each factor:

  • Disk speed: The higher the read throughput is, the faster the hash calculation will be. If your VHD is placed on a SATA disk with 60 MB/s read throughput, the hash calculation will run at about 480 Mbit/s. So for a VHD of 500 GB, the hash calculation will need more than two hours. Place your VHD on fast disks to save a significant amount of time.
  • VHD size: The bigger your VHD is, the more time the hash calculation will need. The question is whether we can optimize it. The answer lies in using dynamic VHDs. A dynamic VHD file is only about as large as the data it contains. Imagine a 500 GB fixed VHD containing just 100 GB of data, and the waste of going through 400 GB of blank blocks to calculate the hash. In addition, you may compact your dynamic VHDs before uploading them to Azure, since compacting can reduce the VHD file size. You should know that the blob created in Azure during the upload will be equal to the VHD size for fixed-size VHDs and to the maximum size for dynamic VHDs. But you don’t have to worry about that when compacting your dynamic VHD, because you can later expand your VHD in Azure if you want a larger size.
  • Processor speed : The hash calculation is a mathematical operation, so it’s clear that the faster the processor is, the faster the calculation will be. However, today’s processors are fast enough to handle such operations, and the bottleneck here is the disk read throughput, unless you are using an old 1 GHz dual-core processor to calculate the hash of a VHD located on a RAID 10 SSD array on a 10 Gbit/s FC SAN. You can take a look at Task Manager during a hash calculation to see the processor usage.
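The disk speed arithmetic above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: it assumes the hash is purely limited by disk read throughput and that 1 GB = 1024 MB. The function names are mine.

```python
def mb_per_s_to_mbit_per_s(mb_per_s):
    """Convert megabytes per second to megabits per second."""
    return mb_per_s * 8


def hash_time_hours(file_size_gb, read_mb_per_s):
    """Estimated hash duration in hours, assuming the disk is the bottleneck."""
    size_mb = file_size_gb * 1024
    return size_mb / read_mb_per_s / 3600


print(mb_per_s_to_mbit_per_s(60))          # 480
print(round(hash_time_hours(500, 60), 2))  # 2.37 (500 GB fixed VHD)
print(round(hash_time_hours(100, 60), 2))  # 0.47 (100 GB dynamic VHD file)
```

With the same 60 MB/s disk, hashing a 100 GB dynamic VHD file takes about half an hour instead of more than two hours for the 500 GB fixed VHD.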

Step 2: Blob creation in Azure

In this step, a blob with the same size as the VHD is allocated in Azure. There is nothing to optimize here.

Step 3: Empty data blocks detection

This step is only performed if the VHD to be uploaded is a fixed-size VHD. The Azure command scans the VHD to look for empty data blocks. I really like this step because it can save an enormous amount of upload time. Imagine that you want to upload a 500 GB fixed-size VHD of which only 100 GB are really used! With empty data block detection, only the 100 GB of data are uploaded, which saves the time of sending 400 GB of blank blocks (four times the time needed for the real data). On the other hand, this step is time consuming, because the whole VHD is processed to look for empty data. For example, processing a 500 GB VHD will take more than one hour. This is why, again, uploading a dynamic VHD is more advantageous: there is no empty data to scan.
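To illustrate the idea behind this step, here is a minimal Python sketch that walks through a file block by block and counts the blocks that contain something other than zeros. It is not the algorithm used by Add-AzureVHD (I don’t know its internals); the 2 MB block size and the function name are assumptions for the example.

```python
import io


def count_blocks(stream, block_size=2 * 1024 * 1024):
    """Return (total_blocks, non_empty_blocks) for a binary stream."""
    total = 0
    non_empty = 0
    while True:
        block = stream.read(block_size)
        if not block:  # end of stream
            break
        total += 1
        # A block is empty when every byte in it is zero
        if block.count(0) != len(block):
            non_empty += 1
    return total, non_empty


# Tiny in-memory demo with 4-byte blocks: only the second block holds data
demo = io.BytesIO(b"\x00" * 4 + b"\x00\x01\x00\x00" + b"\x00" * 4)
print(count_blocks(demo, block_size=4))  # (3, 1)
```

The sketch shows both sides of the trade-off: only the non-empty blocks need to be uploaded, but every block still has to be read from disk to find that out.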

Step 4 : Upload

This is the final step: the data is uploaded to Azure. The only optimization is to have a fast internet connection (fast upload link).

Lessons from this experience:

  • The VHDs to be uploaded should be dynamically expanding VHDs.
  • If your VHDs are fixed size, convert them to dynamic before uploading. You will save a significant amount of time (hash calculation + empty data block detection).
  • If your VHDs are already dynamic, try to compact them to the minimal size (hash calculation gain). You can then expand them to the desired size in Azure.
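To put these lessons into numbers, here is a rough Python estimate comparing a 500 GB fixed VHD holding 100 GB of data with the same disk as a 100 GB dynamic VHD file. The assumptions are mine and are deliberately simple: the hash and the empty block scan both run at the disk read speed (60 MB/s), the upload link is a hypothetical 100 Mbit/s, and 1 GB = 1024 MB. Real timings will differ.

```python
def transfer_hours(size_gb, mb_per_s):
    """Hours needed to process size_gb at a given MB/s rate."""
    return size_gb * 1024 / mb_per_s / 3600


def total_hours(file_gb, data_gb, disk_mb_per_s, link_mbit_per_s, fixed):
    """Estimated total time: hash + optional empty block scan + upload."""
    hash_h = transfer_hours(file_gb, disk_mb_per_s)
    # Only fixed VHDs go through the empty data block detection
    scan_h = transfer_hours(file_gb, disk_mb_per_s) if fixed else 0.0
    # Only real data is uploaded; convert Mbit/s to MB/s
    upload_h = transfer_hours(data_gb, link_mbit_per_s / 8)
    return hash_h + scan_h + upload_h


fixed_h = total_hours(500, 100, 60, 100, fixed=True)
dynamic_h = total_hours(100, 100, 60, 100, fixed=False)
print(round(fixed_h, 2), round(dynamic_h, 2))  # 7.02 2.75
```

Under these assumptions the upload itself takes the same time in both cases; the whole difference comes from reading 500 GB twice instead of reading 100 GB once.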