
Azure Tables Storage (Day 19 of 31)

List of all posts from this series: http://vunvulearadu.blogspot.ro/2014/11/azure-blog-post-marathon-is-ready-to.html

Short Description 
Azure Table Services is part of Azure Storage and gives us the ability to store large amounts of data in a structured way. It lets us store collections of entities in so-called tables. Looking at this service from the NoSQL side, we could say that a table in Azure Tables is the equivalent of a collection in the NoSQL world.


Main Features 
Some of the features of Azure Tables are similar to those of Azure Blob Storage. This happens because both services are built on the same infrastructure and share a common base.
Different entity schemas in the same table
We are allowed to store entities with different schemas in the same table. For example, we can store a 'Car' and a 'House' entity in the same table as long as, at the moment when we retrieve an entity, we can detect its type, as sketched below.
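A minimal sketch of this, assuming hypothetical 'Car' and 'House' classes and a CloudTable reference named 'table' obtained as in the hierarchy sketch further down. The partition key doubles as a type discriminator, so the type can be detected at retrieval time.

public class Car : TableEntity
{
    public Car() { }                                   // parameterless constructor required by the client library
    public Car(string plate) : base("Car", plate) { }  // PartitionKey = "Car", RowKey = license plate
    public string Model { get; set; }
}

public class House : TableEntity
{
    public House() { }
    public House(string address) : base("House", address) { }
    public int Rooms { get; set; }
}

// Both entity types live in the same table.
table.Execute(TableOperation.Insert(new Car("B-01-XYZ") { Model = "Dacia" }));
table.Execute(TableOperation.Insert(new House("Main St. 10") { Rooms = 4 }));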
Size
The size of a single table is practically unlimited; we can store very large collections of entities in an Azure Table without any kind of problem (a table can grow beyond 1 TB). The upper limit is the maximum size of the storage account, the same as for Azure Blob Storage – 500 TB (for now).
RESTful and OData
All access is made over a REST API that supports the OData standard. This means we can query the storage very easily from any kind of device.
Tables Count
There is no real limit on the number of tables that we can create under the same storage account. Theoretically, we can create as many tables as we want, as long as we have enough space.
Simple hierarchy
The hierarchy of the Azure Table database is very simple. Under a storage account we have 0..N tables. Each table can contain 0..M entities. Each entity can have from 3 to 255 properties (including the 3 system properties). The sketch below shows this hierarchy in code.
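A minimal sketch of the hierarchy with the .NET storage client (the 'assets' table name and the connection string setting are placeholders); the other sketches in this post reuse the 'table' variable defined here:

// Storage account -> table client -> table
CloudStorageAccount storageAccount = CloudStorageAccount.Parse(
    CloudConfigurationManager.GetSetting("StorageConnectionString"));
CloudTableClient tableClient = storageAccount.CreateCloudTableClient();
CloudTable table = tableClient.GetTableReference("assets");
table.CreateIfNotExists(); // creates the table only if it does not already exist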
Partition Key
It is used to group similar entities within a table. The partition key is used by Azure Tables to distribute a table across different nodes when the table becomes too big.
Row Key
The unique ID of an entity within a partition. In a table, we can have multiple entities with the same row key value as long as the partition key of each entity is different.
Entity Key (ID)
The unique ID of each entity is formed from the partition key + row key. This key is unique per table and can be used to retrieve entities, as in the sketch below.
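A point lookup by the full entity key is the fastest query Azure Tables can serve. A sketch, reusing the hypothetical 'Car' class and 'table' reference from above:

// Retrieve a single entity by PartitionKey + RowKey
TableOperation retrieve = TableOperation.Retrieve<Car>("Car", "B-01-XYZ");
TableResult result = table.Execute(retrieve);
Car car = (Car)result.Result; // null when no entity matches the key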
Timestamp
This is the system property that tells us when the entity was changed for the last time. It can be used successfully, together with the ETag, to detect whether the entity value changed during a specific time period (see the concurrency sketch below).
This value cannot be set by the client. It is set automatically by the system, and any change made to this property will be ignored by the system.
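A sketch of the optimistic concurrency this enables, again using the hypothetical 'Car' entity: the ETag captured at read time is sent back with the update, and the service rejects the write if the entity changed in between.

Car current = (Car)table.Execute(TableOperation.Retrieve<Car>("Car", "B-01-XYZ")).Result;
current.Model = "Logan";
try
{
    table.Execute(TableOperation.Replace(current)); // sends current.ETag with the request
}
catch (StorageException ex)
{
    if (ex.RequestInformation.HttpStatusCode == 412)
    {
        // 412 Precondition Failed: another writer changed the entity after we read it
    }
}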
Batch Support
We have the ability to execute a batch of operations over Azure Tables. A batch can contain a maximum of 100 actions, and the changes from a batch can affect only one partition of a table. A sketch follows below.
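A minimal sketch of such an entity group transaction; all operations target the same "Car" partition (names and values are illustrative) and are executed atomically.

TableBatchOperation batch = new TableBatchOperation();
batch.Insert(new Car("B-02-ABC") { Model = "Duster" });
batch.Insert(new Car("B-03-DEF") { Model = "Sandero" });
table.ExecuteBatch(batch); // fails as a whole if any single operation fails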
Property Supported Types
The range of types supported for properties is quite large, from int and string to bool or byte array. Below you can find the list of all supported types:

  • byte[]
  • bool
  • DateTime
  • Double
  • Guid
  • String
  • Int32
  • Int64
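For illustration, a hypothetical entity class exercising each of the supported property types:

public class SampleEntity : TableEntity
{
    public byte[] Payload { get; set; }     // byte[]
    public bool IsActive { get; set; }      // bool
    public DateTime CreatedOn { get; set; } // DateTime
    public double Score { get; set; }       // Double
    public Guid ExternalId { get; set; }    // Guid
    public string Name { get; set; }        // String
    public int Count { get; set; }          // Int32
    public long BigCount { get; set; }      // Int64
}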

Security
There are two different methods to manage the access to your table content:

  • Shared Access Signature – you have full control to manage read, write and delete access at table level (even down to a range of partition and row keys) for each client. In this way different users can have different access levels (see the sketch after this list)
  • ACL (stored access policies) – similar to a Shared Access Signature, but it gives us a better management mechanism. For a specific SAS key we can manage the access rights from the backend without having to revoke it
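A minimal sketch of issuing a table-level SAS token that grants query-only access for one hour; the token can be handed to a client that does not hold the account key.

SharedAccessTablePolicy policy = new SharedAccessTablePolicy
{
    Permissions = SharedAccessTablePermissions.Query,
    SharedAccessExpiryTime = DateTimeOffset.UtcNow.AddHours(1)
};
string sasToken = table.GetSharedAccessSignature(policy);
// A client can then access the table with:
// new CloudTable(table.Uri, new StorageCredentials(sasToken))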

Pagination Support
When we execute queries that retrieve large amounts of data, we can use the continuation token to iterate over all the entities returned by the query (see the sketch after this list). This token contains 3 parts:

  • NextTableName
  • NextPartitionKey
  • NextRowKey
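A sketch of segment-by-segment iteration over the hypothetical 'Car' table from above; the continuation token carries the three parts listed and becomes null once all entities have been returned.

TableQuery<Car> query = new TableQuery<Car>();
TableContinuationToken token = null;
do
{
    TableQuerySegment<Car> segment = table.ExecuteQuerySegmented(query, token);
    foreach (Car entity in segment.Results)
    {
        Console.WriteLine(entity.RowKey);
    }
    token = segment.ContinuationToken; // null when there are no more results
} while (token != null);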

Query
Azure Tables have query support. The queries support the following components:

  • $filter – used to filter entities based on client rules
  • $top – returns the first N entities 
  • $select – returns only the properties that the client requests

$filter
It has support for basic filter rules such as equal, not equal, greater than, greater than or equal, less than, and less than or equal. On top of this we have support for boolean operators (and, or, not). A sketch follows below.
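A sketch of building a $filter expression from the .NET client instead of writing raw OData, reusing the hypothetical 'Car' partition:

string filter = TableQuery.CombineFilters(
    TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.Equal, "Car"),
    TableOperators.And,
    TableQuery.GenerateFilterCondition("RowKey", QueryComparisons.GreaterThanOrEqual, "B-02"));
TableQuery<Car> filteredQuery = new TableQuery<Car>().Where(filter);
// Translates to: $filter=(PartitionKey eq 'Car') and (RowKey ge 'B-02')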
Redundancy
Azure Table storage gives us multiple options when we talk about redundancy. By default we have redundancy at data center level – this means that at all times there are 3 copies of the same content (and you pay for only one). On top of this there are other redundancy options that I will describe below (plus the default one that we already talked about):

  • LRS (Locally Redundant Storage) – content is replicated 3 times in the same data center (a facility unit within a region)
  • ZRS (Zone Redundant Storage) – content is replicated 3 times across 2 data centers (facilities) within the same region where 2 data centers are available; otherwise, the content is replicated across 2 different regions
  • GRS (Geo Redundant Storage) – content is replicated 6 times across 2 regions (3 times in the primary region and 3 times in the secondary region)
  • RA-GRS (Read Access Geo Redundant Storage) – content is replicated in the same way as for GRS, but you also get read-only access to the secondary region. With plain GRS, even though the data exists in the secondary region, you cannot access it directly

Limitations 
  • Like other NoSQL stores, Azure Tables cannot handle complex joins, foreign keys or stored procedures. Querying is supported, but keep in mind that queries are fast only over the indexed properties (partition key and row key). 
  • Each entity can have a maximum of 255 properties (including the 3 system properties). The maximum size of an entity is 1 MB.
  • A batch can contain a maximum of 100 actions, can be executed only over a single partition of a table, and has a maximum payload size of 4 MB.
  • The length of a table name can be between 3 and 63 characters. 
  • The maximum size of the partition key or row key is 1 KB (each of them).

Applicable Use Cases
Below you can find 3 use cases where I would use Azure Tables.
Logs
I would use Azure Tables to store large amounts of logs, split into different tables based on date and source. In this way the traceability and cleanup steps are pretty simple. One possible key design is sketched below.
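An illustrative key design for such a log entity (all names are hypothetical): the date groups entries into partitions that are easy to purge day by day, while an inverted-ticks row key keeps the newest entries first within a partition.

public class LogEntry : TableEntity
{
    public LogEntry() { }
    public LogEntry(DateTime when, string source)
        : base(when.ToString("yyyyMMdd"),      // PartitionKey: one partition per day
               string.Format("{0:D19}-{1}",    // RowKey: inverted ticks + GUID for uniqueness
                   DateTime.MaxValue.Ticks - when.Ticks, Guid.NewGuid()))
    {
        Source = source;
    }
    public string Source { get; set; }
    public string Message { get; set; }
}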
Software update assignments
If we have a system that needs to store the software update assignments for each device, then we could consider using Azure Tables. We can store the assignments of each system in its own table and group the assignments of each system by application type using the partition key.
Content tags and descriptions
If we are using Azure Blobs, for example, to store our binary content, then we may need a place to store the list of tags for each piece of binary content, plus descriptions and other information. For this case we could use Azure Tables with success.

Code Sample 


// Required namespaces:
//   using Microsoft.Azure;                      // CloudConfigurationManager
//   using Microsoft.WindowsAzure.Storage;       // CloudStorageAccount
//   using Microsoft.WindowsAzure.Storage.Table; // CloudTableClient, TableQuery, ...

// Retrieve storage account from connection string
CloudStorageAccount storageAccount = CloudStorageAccount.Parse(
    CloudConfigurationManager.GetSetting("StorageConnectionString"));

// Create the table client
CloudTableClient tableClient = storageAccount.CreateCloudTableClient();

// Create the CloudTable that represents the "people" table.
CloudTable table = tableClient.GetTableReference("people");

// Define the query, and only select the Email property
TableQuery<DynamicTableEntity> projectionQuery = new TableQuery<DynamicTableEntity>().Select(new string[] { "Email" });

// Define an entity resolver to work with the entity after retrieval.
EntityResolver<string> resolver = (pk, rk, ts, props, etag) => props.ContainsKey("Email") ? props["Email"].StringValue : null;

foreach (string projectedEmail in table.ExecuteQuery(projectionQuery, resolver, null, null))
{
    Console.WriteLine(projectedEmail);
}

Source: http://azure.microsoft.com/en-us/documentation/articles/storage-dotnet-how-to-use-tables/
Pros and Cons 
Pros

  • Unlimited number of tables and entities
  • Very high maximum database size (the same 500 TB limit as for Azure Storage)
  • Fast and easy to query
  • Very fast querying over partition key and row key (plus timestamp)
  • Cheap

Cons

  • Not all properties are indexed (only partition key and row key)
  • Limited number of properties per entity
  • Querying on properties that are not indexed requires downloading the entities from Azure to the client machine


Pricing
When you start to calculate the cost of Azure Table Storage you should take into account the following things:

  • Capacity (size)
  • Number of Transactions
  • Outbound traffic
  • Traffic between facilities (data centers)


Conclusion
Yes, Azure Tables are a good option if we need a NoSQL solution to store content in (key, value) format. If you need more than that, I recommend looking at DocumentDB. There are use cases where Azure Tables are the perfect solution for you.
Personal note: Take into account the number of transactions that you execute, because it can affect the costs at the end of the month.
