Thread: 33 messages, 6 authors, 2021-10-11

RE: [Patch v5 0/3] Introduce a driver to support host accelerated access to Microsoft Azure Blob for Azure VM

From: Long Li <longli@microsoft.com>
Date: 2021-08-07 18:29:13
Also in: linux-hyperv, lkml

On Thu, Aug 05, 2021 at 06:24:57PM +0000, Long Li wrote:

On 8/5/21 12:00 AM, longli@linuxonhyperv.com wrote:
From: Long Li <longli@microsoft.com>

Azure Blob storage [1] is Microsoft's object storage solution for
the cloud. Users or client applications can access objects in Blob
storage via HTTP, from anywhere in the world. Objects in Blob
storage are accessible via the Azure Storage REST API, Azure
PowerShell, Azure CLI, or an Azure Storage client library. The
Blob storage interface is not designed to be a POSIX compliant
interface.

Problem: When a client accesses Blob storage via HTTP, it must go
through the Blob storage boundary of Azure and get to the storage
server through multiple servers. This is also true for an Azure VM.

Solution: For an Azure VM, the Blob storage access can be
accelerated by having the Azure host execute the Blob storage
requests to the backend storage server directly.

This driver implements a VSC (Virtual Service Client) for
accelerating Blob storage access for an Azure VM by communicating
with a VSP (Virtual Service Provider) on the Azure host. Instead
of using HTTP to access the
Blob storage, an Azure VM passes the Blob storage request to the
VSP on the Azure host. The Azure host uses its native network to
perform Blob storage requests to the backend server directly.

This driver doesn't implement Blob storage APIs. It acts as a fast
channel to pass user-mode Blob storage requests to the Azure host.
The user-mode program using this driver implements Blob storage
APIs and packages the Blob storage request as structured data for
the VSC. The request data is modeled as three user-provided buffers
(request, response and data buffers) that are patterned on the
HTTP model used by existing Azure Blob clients. The VSC passes
those buffers to the VSP for Blob storage requests.

The driver optimizes Blob storage access for an Azure VM in two ways:

1. The Blob storage requests are performed by the Azure host to
the Azure Blob backend storage server directly.

2. It allows the Azure host to use transport technologies (e.g.
RDMA) that are available to the Azure host but not to the VM to
reach the Azure Blob backend servers.
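
As a rough illustration of the three-buffer request model described above, a user-mode program might describe its request, response and data buffers to the VSC along the following lines. This is only a sketch: the structure layout, the field names and the `build_request` helper are assumptions made for illustration and are not the driver's actual ioctl ABI, which is defined by the patches themselves.

```python
import ctypes


class BlobRequest(ctypes.Structure):
    """Hypothetical descriptor for one Blob request (not the real ABI)."""
    _fields_ = [
        ("request_len", ctypes.c_uint32),     # bytes used in the request buffer
        ("response_len", ctypes.c_uint32),    # capacity of the response buffer
        ("data_len", ctypes.c_uint32),        # capacity of the data buffer
        ("reserved", ctypes.c_uint32),        # padding to keep pointers aligned
        ("request_buffer", ctypes.c_uint64),  # user address of the request
        ("response_buffer", ctypes.c_uint64), # user address of the response
        ("data_buffer", ctypes.c_uint64),     # user address of the blob data
    ]


def build_request(request: bytes, response_capacity: int, data_capacity: int):
    """Allocate the three buffers and fill in a descriptor that points at them.

    The buffers are returned alongside the descriptor so that they stay
    alive for as long as the caller holds on to them.
    """
    req_buf = ctypes.create_string_buffer(request)
    resp_buf = ctypes.create_string_buffer(response_capacity)
    data_buf = ctypes.create_string_buffer(data_capacity)
    desc = BlobRequest(
        request_len=len(request),
        response_len=response_capacity,
        data_len=data_capacity,
        request_buffer=ctypes.addressof(req_buf),
        response_buffer=ctypes.addressof(resp_buf),
        data_buffer=ctypes.addressof(data_buf),
    )
    return desc, (req_buf, resp_buf, data_buf)


# Placeholder payload; a real client would serialize an actual Blob request.
desc, buffers = build_request(b"example-request", 4096, 1 << 20)
```

A real caller would then hand the descriptor to the driver through its ioctl interface, using the command number and layout from the patch series, neither of which is reproduced here.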

Test results using this driver for an Azure VM:
100 Blob clients running on an Azure VM, each reading 100GB Block
Blobs. (10 TB total read data)
With REST API over HTTP: 94.4 mins
Using this driver: 72.5 mins
Performance (measured in throughput) gain: 30%.
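
The 30% figure follows from the two run times, since the same amount of data is read in both cases. A small worked check, assuming decimal units (1 TB = 1000 GB) for the absolute throughput numbers; the function names are just for illustration:

```python
# Figures taken from the test results above.
TOTAL_TB = 10.0        # 100 clients x 100 GB each
HTTP_MINUTES = 94.4    # REST API over HTTP
DRIVER_MINUTES = 72.5  # using the driver


def throughput_gb_per_s(total_tb: float, minutes: float) -> float:
    """Average aggregate throughput in GB/s (assumes 1 TB = 1000 GB)."""
    return total_tb * 1000.0 / (minutes * 60.0)


def throughput_gain(baseline_minutes: float, new_minutes: float) -> float:
    """Relative throughput gain when the same data is read in less time."""
    return baseline_minutes / new_minutes - 1.0


http_rate = throughput_gb_per_s(TOTAL_TB, HTTP_MINUTES)
driver_rate = throughput_gb_per_s(TOTAL_TB, DRIVER_MINUTES)
gain = throughput_gain(HTTP_MINUTES, DRIVER_MINUTES)

print(f"HTTP:   {http_rate:.2f} GB/s")
print(f"Driver: {driver_rate:.2f} GB/s")
print(f"Gain:   {gain:.1%}")
```

Under that assumption the aggregate rates come out at roughly 1.77 GB/s over HTTP and 2.30 GB/s with the driver, and 94.4 / 72.5 - 1 is about 0.30.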

[1] https://docs.microsoft.com/en-us/azure/storage/blobs/storage-blobs-introduction

Is the ioctl interface the only user space interface provided by
this kernel driver? If so, why has this code been implemented as a
kernel driver instead of e.g. a user space library that uses vfio to
interact with a PCIe device? As an example, Qemu supports many
different virtio device types.

Hyper-V presents one such device for the whole VM. This device is
used by all processes on the VM. (The test benchmark used 100
processes.)

Hyper-V doesn't support creating one device for each process. We cannot
use VFIO in this model.

I still think this "model" is totally broken and wrong overall.  Again, you are
creating a custom "block" layer with a character device, forcing all userspace
programs to use a custom library (where is it at?) just to get their data.

The Azure Blob library (with source code) is available in the following languages:
Java: https://github.com/Azure/azure-sdk-for-java/tree/main/sdk/storage/azure-storage-blob
JavaScript: https://github.com/Azure/azure-sdk-for-js/tree/main/sdk/storage/storage-blob
Python: https://github.com/Azure/azure-sdk-for-python/tree/main/sdk/storage/azure-storage-blob
Go: https://github.com/Azure/azure-storage-blob-go
.NET: https://github.com/Azure/azure-sdk-for-net/tree/main/sdk/storage/Azure.Storage.Blobs
PHP: https://github.com/Azure/azure-storage-php/tree/master/azure-storage-blob
Ruby: https://github.com/azure/azure-storage-ruby/tree/master/blob
C++: https://github.com/Azure/azure-sdk-for-cpp/tree/main/sdk/storage#azure-storage-client-library-for-c

There's a reason the POSIX model is there, why are you all ignoring it?

The Azure Blob APIs are not designed to be POSIX compatible. This driver is used
to accelerate Blob access for a Blob client running in an Azure VM. It doesn't attempt
to modify the Blob APIs. Changing the Blob APIs will break the existing Blob clients.

Thanks,
Long