
NoSQL in 5 minutes

NoSQL is one of 2013's trends. Three or four years ago we rarely heard about a project using NoSQL; nowadays the number of projects using non-relational databases is extremely high. In this article we will look at the advantages and challenges we may face when we use NoSQL. In the second part of the article we will analyze several non-relational solutions and their benefits.

What is NoSQL?
The easiest definition would be: NoSQL is a database that does not follow the rules of a relational database management system (RDBMS). A non-relational database is not based on the relational model. Data is not grouped in tables; therefore there is no mathematical relationship between them.
These databases are built to run on a large cluster. Data in such storage does not have a predefined schema. For this reason, any new field can be added without any problem. NoSQL appeared and developed around web applications; consequently, the vast majority of its features are those that a web application needs.

Benefits and risks
A non-relational database model is a flexible one. Depending on the solution we use, we can have a very loose model that can be changed at minimal cost. Many NoSQL solutions are not model-based at all. Even where a model exists, as in Cassandra and HBase, which have a predefined model, adding a new field is easy. There are also solutions that can store any kind of data structure without defining a model, such as those storing key-value pairs or documents. In NoSQL taxonomy, a document is seen as a record from a relational database, and collections are seen as tables. The main difference is that in a table all records have the same structure, while a collection can hold documents with different fields.
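The difference between a table and a collection can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the collection is a plain in-memory list of dictionaries, and the `products` data and the `find` helper are hypothetical names, not the API of any particular NoSQL product.

```python
# A "collection" modeled as a plain list of dictionaries (hypothetical data).
# Unlike rows in a table, the documents do not share the same set of fields.
products = [
    {"_id": 1, "name": "Laptop", "price": 900},
    {"_id": 2, "name": "T-shirt", "price": 15, "size": "M"},
    {"_id": 3, "name": "E-book", "price": 5, "format": "epub"},
]


def find(collection, **criteria):
    """Return the documents whose fields match all the given criteria.

    Documents that lack a requested field simply do not match.
    """
    missing = object()  # sentinel, so a stored None can still be matched
    return [
        doc
        for doc in collection
        if all(doc.get(field, missing) == value for field, value in criteria.items())
    ]


# Adding a new field needs no schema change: we just store it.
products.append({"_id": 4, "name": "Mug", "price": 7, "color": "blue"})

print(find(products, size="M"))      # only the T-shirt has a "size" field
print(find(products, color="blue"))  # only the new document has a "color" field
```

In a relational table, the `size`, `format` and `color` columns would have to be added to the schema first, and every row would carry them.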
Non-relational databases are much more scalable than the classical ones. To scale a relational database we need more powerful servers; we cannot simply add machines with an ordinary configuration to the cluster. This is due to the way in which a relational database works, which makes adding a new node expensive.
The way in which a non-relational database is built easily allows horizontal scaling. Moreover, these databases are well suited to virtualization and the cloud.
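One common way to spread data over several ordinary machines is to pick the node from a hash of the record key. The sketch below assumes a simple modulo scheme; the node names, the key format and the `node_for` function are hypothetical.

```python
import hashlib


def node_for(key: str, nodes: list[str]) -> str:
    """Pick the node responsible for a key from a stable hash of that key."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]


nodes = ["node-a", "node-b", "node-c"]  # three regular servers (hypothetical names)
keys = [f"user:{i}" for i in range(1000)]

# Group the keys by the node that would store them.
placement = {}
for key in keys:
    placement.setdefault(node_for(key, nodes), []).append(key)

for node in nodes:
    print(node, len(placement.get(node, [])))
```

With plain modulo hashing, adding a node moves most keys to a different machine, which is why real systems usually rely on consistent hashing or a similar scheme.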
Taking into account the size of the databases and the growing number of transactions, a relational database is much more expensive than NoSQL. Solutions like Hadoop can process a lot of data. They scale horizontally extremely well, which makes them very attractive.
Concerning costs, a non-relational database is a lot cheaper. We do not need custom hardware or special features to create a very powerful cluster. Using a few regular servers, we can have an efficient database.
Certainly, NoSQL is not all milk and honey. Most of the solutions are rather new on the market compared to relational databases. For this reason some important functionality may be missing, such as data mining and business intelligence. NoSQL has evolved to meet the requirements of web applications, which is the main reason why features that are not needed on the web are missing. That does not mean they cannot be found at all; rather, they are not yet mature enough or are specific to the problem that a given NoSQL solution is trying to solve.
Because they are so new to the market, many NoSQL solutions are pre-production versions, which cannot always be used in the enterprise world. The lack of official support for some products could be a blocker for medium and large projects.
The syntax used to query a NoSQL database is different from a simple SQL query. We usually need some programming knowledge. The number of NoSQL experts is much lower than the number of SQL experts. Administration may be a nightmare, because support for administrators is presently weak.
In addition, support for ACID and transactions is not common in NoSQL storage. The queries that can be written are fairly simple, and some storage systems do not allow us to "join" collections, so we have to write the code to do this ourselves.
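When the storage cannot join collections, the join moves into application code. A minimal sketch, assuming two small in-memory collections; the `customers` and `orders` data and the function name are hypothetical.

```python
# Two hypothetical collections that the storage cannot join for us.
customers = [
    {"_id": "c1", "name": "Ana"},
    {"_id": "c2", "name": "Dan"},
]
orders = [
    {"_id": "o1", "customer_id": "c1", "total": 40},
    {"_id": "o2", "customer_id": "c2", "total": 15},
    {"_id": "o3", "customer_id": "c1", "total": 25},
]


def join_orders_with_customers(orders, customers):
    """Attach the customer name to each order (an inner join done by hand)."""
    # Index one side by its key so each lookup is O(1) instead of a scan.
    customers_by_id = {customer["_id"]: customer for customer in customers}
    joined = []
    for order in orders:
        customer = customers_by_id.get(order["customer_id"])
        if customer is not None:  # orders without a known customer are skipped
            joined.append({**order, "customer_name": customer["name"]})
    return joined


for row in join_orders_with_customers(orders, customers):
    print(row["_id"], row["customer_name"], row["total"])
```

In a real application each side would come from a separate query, so the cost of the extra round trip is part of the trade-off.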
All these issues will be solved in time, and the question we must ask ourselves when we think about architecture and believe NoSQL could help is "Why not?"

The most widely used NoSQL solutions
There are countless NoSQL solutions on the market. There is no universal solution that solves all the problems we have. For this reason, when we want to integrate NoSQL we may end up with several types of storage. We may identify several problems within our application that require a NoSQL solution, and we may need a different solution for each of these cases. This adds extra complexity, because we would have more than one storage system to integrate.
MongoDB
This is one of the most used types of storage. All content is stored in the form of documents. Over these collections of documents we can perform any kind of dynamic query to extract different data. In many ways MongoDB is the closest to a relational database. All the data we want to store is kept as a hash, which facilitates information retrieval. Basic CRUD operations are fast on MongoDB.
It is a good solution when you need to store a lot of data that must be accessed in a very short time. MongoDB can be used successfully when we do not perform many insert, update and delete operations, so that information remains unchanged for a period of time. It also works well when the stored properties are used in queries and/or indexes, for example in a voting system, a CMS or a storage system for comments. Another case in which it can be used is storing the lists of categories and products in an online store. Because it is oriented toward queries and the list of products does not change every two seconds, queries on them will be fast.
Another benefit is auto-sharding. A MongoDB database can very easily be spread across two or three servers. The mechanism for distributing data is documented.
Cassandra
It is second on the list of NoSQL storage solutions used in eCommerce. This storage can become our friend when we have data that changes frequently. If the problem we want to solve is dominated by insertions and modifications of stored data, then Cassandra is our solution. Compared to inserts and updates, any query on our data is much slower. This storage is oriented more toward writes than toward queries that retrieve data. While in MongoDB the data we work with is seen as documents with a hash attached to each of them, Cassandra stores all content in the form of columns.
In MongoDB, the data we access may not be the latest version. Cassandra, by contrast, can guarantee that the data we obtain through queries is the latest version, provided reads and writes use a suitable consistency level. So if we access an email that is stored with the help of Cassandra, we get the latest version of the message. This solution can be installed in multiple data centers in different locations, providing support for failover and backup, and therefore extremely high availability.
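Why a read can be sure to see the latest write is easiest to show with a toy model of quorum reads and writes. This is a simplified sketch and not Cassandra's API: the `Replica` class and both functions are hypothetical, and replicas are chosen deterministically to show the worst case. The rule it illustrates is that with N replicas, a write acknowledged by W of them and a read that asks R of them must overlap whenever R + W > N.

```python
class Replica:
    """One node holding key -> (timestamp, value)."""

    def __init__(self):
        self.data = {}

    def write(self, key, timestamp, value):
        current = self.data.get(key)
        # Keep only the newest version of each key.
        if current is None or timestamp > current[0]:
            self.data[key] = (timestamp, value)

    def read(self, key):
        return self.data.get(key)


def quorum_write(replicas, w, key, timestamp, value):
    """Store the value on the first `w` replicas (worst case for readers)."""
    for replica in replicas[:w]:
        replica.write(key, timestamp, value)


def quorum_read(replicas, r, key):
    """Ask the last `r` replicas and return the value with the newest timestamp."""
    answers = [replica.read(key) for replica in replicas[len(replicas) - r:]]
    answers = [answer for answer in answers if answer is not None]
    if not answers:
        return None
    return max(answers, key=lambda answer: answer[0])[1]


replicas = [Replica() for _ in range(3)]                   # N = 3
quorum_write(replicas, 3, "cart:42", 1, ["book"])          # old version on every replica
quorum_write(replicas, 2, "cart:42", 2, ["book", "pen"])   # new version, W = 2

fresh = quorum_read(replicas, 2, "cart:42")  # R = 2, so R + W > N
stale = quorum_read(replicas, 1, "cart:42")  # R = 1, so R + W == N
print(fresh)  # ['book', 'pen']
print(stale)  # ['book']
```

The second read shows what happens when the condition does not hold: the single replica that was asked never received the newer version.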
Cassandra can be successfully used as a tool for logging. In such a system we have many writes, and the queries are rare and quite simple. For this reason it is also an ideal solution when you have an eCommerce application where we need a storage system for the shopping cart. Insert and update operations will be done quickly, and each query will bring the latest version of the shopping cart, which is very important when we perform check-out.
Cassandra came to be used in the financial industry, being ideal due to the performance of insert operations. In this environment data changes very often, with share prices changing at every moment.
CouchDB
If most of the operations we perform are just insert and read, with no update, then CouchDB is a much better solution. This storage is targeted only at read and write operations.
Besides this, we have efficient support for predefined queries and for controlling the different versions that stored data may have. As a consequence, update operations are not so fast. Of all the storage systems presented so far, this is the first that guarantees ACID properties, through the versioning system it implements.
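The idea of versioned documents can be sketched as a store that accepts an update only when the caller presents the revision it last read. This is a minimal in-memory illustration, not CouchDB's API: the `VersionedStore` class is hypothetical, and revisions are plain integers here as a simplification.

```python
class ConflictError(Exception):
    """Raised when an update is based on an outdated revision."""


class VersionedStore:
    def __init__(self):
        self._docs = {}  # doc_id -> (revision, body)

    def get(self, doc_id):
        revision, body = self._docs[doc_id]
        return revision, dict(body)  # return a copy so callers cannot mutate the store

    def put(self, doc_id, body, rev=None):
        """Create the document (rev=None) or update it from a known revision."""
        current = self._docs.get(doc_id)
        current_rev = current[0] if current else None
        if rev != current_rev:
            raise ConflictError(f"{doc_id}: expected revision {current_rev}, got {rev}")
        new_rev = (current_rev or 0) + 1
        self._docs[doc_id] = (new_rev, dict(body))
        return new_rev


store = VersionedStore()
rev1 = store.put("page:home", {"title": "Home"})

# Two editors read the same revision of the document.
rev_a, doc_a = store.get("page:home")
rev_b, doc_b = store.get("page:home")

doc_a["title"] = "Welcome"
rev2 = store.put("page:home", doc_a, rev=rev_a)  # the first update succeeds

doc_b["title"] = "Start"
try:
    store.put("page:home", doc_b, rev=rev_b)     # the second is based on an old revision
    conflict = False
except ConflictError:
    conflict = True

print(rev1, rev2, conflict)
```

The extra revision check on every write is one reason updates cost more in this model than in a store that simply overwrites the value.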
Another feature of this storage is its support for replication. CouchDB is a good solution when we want to take the database offline, for example on a mobile device that does not have an internet connection. Through this functionality we get support for a distributed architecture with replication in both directions.
It can be a solution for applications on mobile devices that do not have round-the-clock internet connectivity. It is also very useful for a CMS or CRM, where we need versioning and predefined queries.
HBase
This database is entirely integrated into Hadoop. It is intended to be used when we need to perform data analysis. HBase is designed to store large amounts of data that could not normally be stored in an ordinary database.
It can work in memory without any problem, and the data it stores can be compressed. It is one of the few NoSQL databases that support this feature. Due to its particularities, HBase is used together with Hadoop. In some cases, when working with tens or hundreds of millions of records, HBase is worth using.
Membase
As the name implies, this non-relational database can stay in memory. It is a perfect solution when very low latency is needed, and content replication is an easy process.
It is very common in game backends, especially for online games. Many systems that need to manipulate or display real-time data use Membase storage. In these cases Membase may not be the only storage layer that the application uses.
Redis
This storage is perfect when the number of updates we need to make to our data is very high. It is optimized for such operations. It is based on a very simple key-value model; therefore the queries that can be made are very limited. Although we have support for transactions, support for clustering is not yet mature enough. This can become a problem when the data we want to store does not fit in memory, because the size of the database is tied to the amount of available RAM.
Redis is quite interesting when we have real-time systems that need to communicate. In these cases Redis is one of the best solutions. There are several stock-market applications using this storage.

What does the future hold for us?
We see an increasing number of applications that use NoSQL. This does not mean that relational databases will disappear. The two types of storage will continue to exist and will often coexist. Hybrid applications, which use both relational databases and NoSQL, are becoming more common. Also, an application does not need to use only a single database. There are solutions using two or more NoSQL databases. A good example is an eCommerce application that uses MongoDB to store the list of items and categories, and Cassandra to store the shopping cart of each client.
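The eCommerce example can be sketched as an application that talks to two stores, each behind its own small interface. The classes below are in-memory stand-ins with hypothetical names; in a real system `CatalogStore` would wrap a document database and `CartStore` a write-oriented one.

```python
class CatalogStore:
    """Stand-in for a document store holding products (read-mostly data)."""

    def __init__(self, products):
        self._products = {product["_id"]: product for product in products}

    def get(self, product_id):
        return self._products[product_id]

    def by_category(self, category):
        return [p for p in self._products.values() if p["category"] == category]


class CartStore:
    """Stand-in for a write-optimized store holding one cart per client."""

    def __init__(self):
        self._carts = {}  # client_id -> {product_id: quantity}

    def add_item(self, client_id, product_id, quantity=1):
        cart = self._carts.setdefault(client_id, {})
        cart[product_id] = cart.get(product_id, 0) + quantity

    def get_cart(self, client_id):
        return dict(self._carts.get(client_id, {}))


class Shop:
    """Application code that combines data from both stores."""

    def __init__(self, catalog, carts):
        self.catalog = catalog
        self.carts = carts

    def cart_total(self, client_id):
        cart = self.carts.get_cart(client_id)
        return sum(self.catalog.get(pid)["price"] * qty for pid, qty in cart.items())


catalog = CatalogStore([
    {"_id": "book", "category": "media", "price": 10},
    {"_id": "pen", "category": "office", "price": 2},
])
shop = Shop(catalog, CartStore())
shop.carts.add_item("client-1", "book")
shop.carts.add_item("client-1", "pen", quantity=3)
print(shop.cart_total("client-1"))  # 16
```

Keeping each store behind its own interface lets the application swap one product for another without touching the code that combines the data.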

Conclusion
In conclusion, we can say that NoSQL databases must be part of our area of knowledge. Compared to relational databases we have many options, and each of these does one thing very well. In the NoSQL world we do not have a single storage that solves all the problems we may have; each type of storage solves different problems. The future belongs neither to non-relational databases nor to relational ones. The future belongs to applications that use both types of storage, depending on their needs.
