I am working on a real-time analytical system that uses a database schema designed without much thought given to performance. The current design is essentially a flat-table approach: the measurable fields and non-measurable fields are all combined in a single table.
There is no intermediate layer between the UI and the database; the UI queries the database directly, applies aggregations, and returns the results. The table has about 100 measurable fields and 100 non-measurable fields. Data is currently refreshed every X minutes and could later go up to every X seconds.
What is the best data modeling approach for this kind of system?
I initially thought of taking a dimensional modeling approach and creating an intermediate layer for a cube, which could handle much of the pre-calculation. But how to handle a refresh of the cube every minute is not clear to me.
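To make the refresh concern concrete, here is a minimal Python sketch of one common approach: incrementally folding only the rows that arrived since the last refresh into a pre-aggregated cube, instead of recomputing every aggregate from scratch. The field names (`region`, `product`, `value`) are hypothetical, purely for illustration:

```python
def refresh_cube(cube, new_rows, dims=("region", "product")):
    """Fold newly arrived fact rows into a pre-aggregated cube.

    cube:     dict mapping a dimension-key tuple -> (sum, count)
    new_rows: iterable of dicts carrying the dimension fields plus a
              measurable field 'value' (hypothetical names)

    Only the delta since the last refresh is processed, so a
    per-minute (or even per-second) refresh stays cheap regardless
    of how large the fact table has grown.
    """
    for row in new_rows:
        key = tuple(row[d] for d in dims)
        total, count = cube.get(key, (0.0, 0))
        cube[key] = (total + row["value"], count + 1)
    return cube

# Example: two refresh cycles over incoming batches.
cube = {}
refresh_cube(cube, [
    {"region": "EU", "product": "A", "value": 10.0},
    {"region": "EU", "product": "A", "value": 20.0},
])
refresh_cube(cube, [
    {"region": "US", "product": "A", "value": 5.0},
])

# The UI now reads averages from the small cube instead of
# scanning and aggregating the wide fact table on every query.
eu_total, eu_count = cube[("EU", "A")]
print(eu_total / eu_count)  # 15.0
```

In a real deployment the same idea would typically live in the database itself (e.g., an incrementally maintained summary table or materialized view keyed by a high-water mark such as a timestamp), but the principle is the same: the per-refresh cost is proportional to the delta, not the full history.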
In short, I am trying to understand how to redesign the existing DB schema to optimize the performance of the system and reduce query response time.
Any suggestions/thoughts on this would be highly appreciated.
It might be a good idea to re-think the database in the backend.
I couldn't find any information in your question about which database is being used, but I presume it is a row-based database such as MySQL, Oracle, or MSSQL with the standard engines.
Since you have a large number of columns, you could switch to a column-oriented database such as HBase on Hadoop or Cassandra. Depending on your requirements, this might give the biggest performance boost.
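As a rough illustration of why column orientation helps here, consider this small Python sketch (toy data, hypothetical field names): aggregating one measure over a row store means touching every wide record, while a column store scans just the one array it needs.

```python
# Row layout: each record carries all ~200 fields, so even a
# single-measure aggregation drags every field through memory.
rows = [
    {**{"f%d" % i: i for i in range(200)}, "amount": float(r)}
    for r in range(1000)
]

# Column layout: each field is stored contiguously on its own,
# so an aggregation over 'amount' reads only that column.
columns = {"amount": [float(r) for r in range(1000)]}

row_sum = sum(row["amount"] for row in rows)   # touches wide rows
col_sum = sum(columns["amount"])               # touches one column

assert row_sum == col_sum
```

Real column stores add compression and vectorized scans on top of this layout, which is why analytical queries over a handful of your 100 measurable fields tend to benefit the most.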