
Charts and dashboards from a time perspective (real time, near-real time, consolidation)

In this post we will talk about how we can group charts, dashboards and reports into different categories based on how fast we need to ingest and update the data inside them.
Time is a relative term, especially when you put it together with business insights and application reporting. There are two important aspects related to time from a business and application insights perspective.

1. Time Granularity
The first one is related to time granularity. In the beginning, most business stakeholders require the granularity to be as small as possible, until they realize that there is not much insight at that level and that, in order to understand something, the time granularity needs to be increased.
Most of the systems that are available now on the market allow us to change the time interval (granularity) on the fly, enabling us to navigate inside our data from a different time perspective.
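To make the idea of changing granularity on the fly more concrete, here is a minimal sketch that re-buckets the same raw readings at two different granularities. The function name bucket_average and the sample readings are hypothetical, and averaging is only one possible aggregation; real tools usually do this for you.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def bucket_average(points, granularity):
    """Average (timestamp, value) pairs into fixed-size time buckets."""
    size = granularity.total_seconds()
    sums = defaultdict(lambda: [0.0, 0])
    for ts, value in points:
        # Align each timestamp to the start of its bucket.
        start = ts.timestamp() // size * size
        sums[start][0] += value
        sums[start][1] += 1
    return {
        datetime.fromtimestamp(start, tz=timezone.utc): total / count
        for start, (total, count) in sorted(sums.items())
    }


base = datetime(2024, 1, 1, tzinfo=timezone.utc)
# Hypothetical readings: one value every 10 seconds for 2 minutes.
readings = [(base + timedelta(seconds=10 * i), float(i)) for i in range(12)]

per_minute = bucket_average(readings, timedelta(minutes=1))  # 2 buckets
per_hour = bucket_average(readings, timedelta(hours=1))      # 1 bucket
```

The raw data stays the same; only the granularity passed in changes, which is what lets a stakeholder move from a very detailed view to a coarser one that is easier to read.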

2. Time Interval
The second aspect is the time interval from the moment the data arrives inside our storage system until the moment it can be displayed inside a specific report. I’m not saying that it is impossible to have all the data at moment 0 inside your reports, but it can be extremely expensive and you might not even need it.
From my experience, I’ve seen business stakeholders who at the beginning say that the data needs to be displayed inside the reports immediately. After 2 or 3 meetings they realize that the report is generated for the previous week and there is no extra value whether the data for the report is ready immediately or after 1 day. Of course, this has a direct impact on the cost and on how you implement the solution. Also, terms like immediately or real-time are relative. Each stakeholder has a different understanding of them, even when they are in the same team. This is why it is important to clarify them from the beginning.

Taking this into consideration, I started to group reports and any kind of application insights into 4 different categories:

  • Real Time
  • Near-real Time
  • Reporting
  • Consolidation

These categories are defined based on the time interval that is allowed to exist from the moment the data arrives inside the backend system until the moment when it can be displayed inside the charts (reports).

It is important to know that a specific chart can be in multiple categories (e.g. Near-real time and Reporting). Even so, you should treat it as a different chart for each category, because the solutions that you need to use might be different for each category.

Real Time
In this category I group any kind of insights that need to be displayed inside a chart or dashboard with a maximum delay of 2-3 seconds. These are the kind of views that are used by maintenance and support teams to get real time insights about the system.
In general, when you have such a system you will try not to modify the input data in any way. Not because this is not possible, but because any ETL process might increase the latency between the moment when you receive the data and the moment when you can display it.
Nowadays, it is a real trend for companies to require this kind of chart, but the reality is that not too many of them really need a real time chart. For example, if you don’t have a team that is monitoring the system 24/7, there is not much use in having a chart like this.
In addition to this, not all the data that is produced by a system needs to be inside a real time chart. A good example is when you monitor a pool of servers. You care about their state and you might want to know the current state of each server and the applications that are running on them. There is no extra value in seeing in real time that the number of processes running on the servers is changing, or other similar things. The exception is when you are debugging, but for this we have Performance Counters and other ways to collect metrics.

Real time charts are used especially in trading and for monitoring production lines inside factories, for example. In most cases, even if the initial requirement is defined around real time, once you start to clarify the business use case and what is required you realize that it is nothing more than a near-real time chart.

Near-real Time
This is the kind of chart where data is updated every few seconds. The latency from the moment the data arrives in the backend until the moment when it is displayed is less than 30-60 seconds. Most real time charts are in reality near-real time, where there is no impact on the business if data is displayed with a latency of 30-60 seconds.
From the latency perspective, I would say that even a latency of a few minutes (2-5 minutes) is still near-real time. For this kind of system, the most important element is not the update interval, but the ability to drill inside the data and get different perspectives on the same data. In this way, the support team will be able to identify issues before they even happen.
Most of the reporting and monitoring tools that are available nowadays are part of this category. From a running cost perspective, a real-time system can cost 3x or even 5x more than a near-real time one.
A pretty new but powerful tool that can be used for this kind of chart and insight is Azure Time Series Insights, which has great capabilities and can dynamically change the way you look at data.

Reporting
This is the category of charts and reporting capabilities that I usually like to call classical. These are the kind of reports that can be generated every hour, every day or every few days. There is nothing special from this perspective.
The new versions of the systems that are available for this category offer the ability to create dynamic reports where a user can drill down inside the data based on their needs.

Consolidation
Many times this category overlaps with the previous one. This happens because the tools that are used to create consolidation charts are in general the same as the ones used for reporting. There are only some edge cases when, because of the data volume and complexity, Hadoop and other similar tools are used to pre-process the data before pushing it into the reporting systems.
The charts that are part of the consolidation category are the kind of charts that you generate once per week, month or quarter and that are used by the business stakeholders to get a high level view of their business and to get a status.
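The four categories above can be summarized as a small lookup on the maximum delay that a chart can tolerate. This is only a sketch: the cut-off values are assumptions derived from the ranges mentioned in this post (2-3 seconds, up to a few minutes, up to a few days), and the names CATEGORY_LIMITS and classify are hypothetical. The real limits should be agreed with the stakeholders.

```python
from datetime import timedelta

# Upper latency bound per category. The exact cut-offs are assumptions
# based on the ranges in this post, not fixed rules.
CATEGORY_LIMITS = [
    ("Real Time", timedelta(seconds=3)),
    ("Near-real Time", timedelta(minutes=5)),
    ("Reporting", timedelta(days=6)),
]


def classify(max_latency):
    """Return the category for the largest acceptable data-to-chart delay."""
    for name, limit in CATEGORY_LIMITS:
        if max_latency <= limit:
            return name
    # Anything refreshed weekly or less often is treated as Consolidation.
    return "Consolidation"


print(classify(timedelta(seconds=45)))  # Near-real Time
print(classify(timedelta(weeks=1)))     # Consolidation
```

Writing the limits down like this, even on paper, is a simple way to force the discussion about what "immediately" means for each chart.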

Mixing them
Reporting and Consolidation categories can be mixed with success in the same tool, allowing users to have a high-level overview while at the same time being able to drill down inside the data.
In some cases, Near-real time and Reporting categories can be combined, but I don’t recommend this. The biggest problem is the different perspectives that can exist. For Near-real time, most of the charts are built around the time perspective, while for Reporting the charts can be built around other dimensions, not only time.
From a Real time chart, it is easy to get a near-real time perspective, so these two go hand in hand.

Conclusion
Based on the data refresh time and how fast we can process new data, we can generate a different perspective of the same chart. Even if it is trendy to be able to display real time or near-real time application insights, ask yourself if this is really required and what the tradeoffs and costs are. There is no sense in offering this kind of solution if there is no real business requirement and it adds no extra value.
In addition to this, it is not the same thing to store real time data that is sent with a frequency of 10 milliseconds vs 5 minutes. Storage and processing costs are different. The tools and mechanisms that are used for this will also be different.
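A quick back-of-the-envelope calculation shows how big the gap is. The sketch below assumes a single source that sends exactly one data point per interval; points_per_day is a hypothetical helper, and real payload sizes and costs depend on your system.

```python
MS_PER_DAY = 24 * 60 * 60 * 1000


def points_per_day(interval_ms):
    """Number of data points one source sends per day at a fixed interval."""
    return MS_PER_DAY // interval_ms


fast = points_per_day(10)             # one point every 10 milliseconds
slow = points_per_day(5 * 60 * 1000)  # one point every 5 minutes

print(fast)          # 8640000
print(slow)          # 288
print(fast // slow)  # 30000
```

Under these assumptions, the 10 millisecond feed produces 30,000 times more data points per source than the 5 minute one, before multiplying by the number of sources.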
YES, a solution can have insights from all these 4 categories. YES, you might even find ways to store them in the same storage type or system. And YES, you’ll need to be able to create static data points for the Reporting and Consolidation categories. You might not want to process 100GB of data where the data frequency is 10 milliseconds to generate a report for the last 3 months where the data point perspective is 1 day.
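One way to create such static data points is to roll the raw stream up into one summary row per day as the data is read, so that the report only has to touch the daily rows. The sketch below is a minimal illustration: the function name daily_rollup, the summary fields and the sample data are assumptions, and in a real system this step would usually be done by the storage or processing platform.

```python
from datetime import datetime, timedelta, timezone


def daily_rollup(points):
    """Stream (timestamp, value) pairs into one summary row per UTC day."""
    rollup = {}
    for ts, value in points:
        day = ts.astimezone(timezone.utc).date()
        row = rollup.get(day)
        if row is None:
            rollup[day] = {"count": 1, "min": value, "max": value, "sum": value}
        else:
            # Update the running summary without keeping the raw values.
            row["count"] += 1
            row["min"] = min(row["min"], value)
            row["max"] = max(row["max"], value)
            row["sum"] += value
    return rollup


start = datetime(2024, 1, 1, tzinfo=timezone.utc)
# Hypothetical raw data: one reading every 6 hours for 2 days.
raw = [(start + timedelta(hours=6 * i), float(i)) for i in range(8)]

static_points = daily_rollup(raw)
for day, row in static_points.items():
    print(day, row["count"], row["min"], row["max"], row["sum"] / row["count"])
```

Because only a running summary is kept for each day, the memory used grows with the number of days in the report and not with the number of raw readings.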
