High-Volume Data

The Challenge

  1. Large-scale retail chains and insurance companies accumulate data on sales, inventory, shipments, deliveries, and customers every day.
  2. As this data grows, traditional BI systems struggle either to store it or to apply the required business logic over it, which leads to performance problems.
  3. BI tools connect to data in one of two modes, live or extract:
    Live Mode: creates a live connection to the data source, giving access to the latest information at any time. It breaks down when processing is done inside the BI tool, because every calculation has to query the data source again to fetch the data it needs.
    Extract Mode: creates a snapshot of the data at a point in time, which speeds up the analytical logic applied over it. However, the BI file grows larger with every refresh, and handling it depends on the hardware specification of the machine running it.
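To make the live-versus-extract trade-off concrete, here is a minimal Python sketch. `SourceDatabase`, `LiveConnection`, and `ExtractConnection` are hypothetical classes, not Tableau APIs; the sketch only counts round trips to the source and shows when each mode sees newly added rows.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SourceDatabase:
    """Hypothetical stand-in for an operational source system."""
    rows: list[dict]
    query_count: int = 0

    def query(self) -> list[dict]:
        # Every call is a round trip to the source system.
        self.query_count += 1
        return list(self.rows)


class LiveConnection:
    """Live mode: always current, but re-queries the source for every calculation."""

    def __init__(self, source: SourceDatabase) -> None:
        self.source = source

    def total_sales(self) -> float:
        return sum(row["amount"] for row in self.source.query())


class ExtractConnection:
    """Extract mode: computes over a local snapshot, only as fresh as the last refresh."""

    def __init__(self, source: SourceDatabase) -> None:
        self.source = source
        self.snapshot: list[dict] = []
        self.refresh()

    def refresh(self) -> None:
        # One query copies the source data; the snapshot grows as the source grows.
        self.snapshot = self.source.query()

    def total_sales(self) -> float:
        # No round trip to the source here.
        return sum(row["amount"] for row in self.snapshot)


db = SourceDatabase(rows=[{"amount": 10.0}, {"amount": 5.0}])
live = LiveConnection(db)
extract = ExtractConnection(db)  # one query to build the snapshot
for _ in range(3):
    live.total_sales()     # one query each time
    extract.total_sales()  # no queries
print(db.query_count)  # 4
```

In this toy model the live connection pays a source query on every calculation, while the extract pays once per refresh and returns stale totals until the next refresh.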

The Solution

The proposed solution works at the architectural level: it manages the data model by using Tableau Online, Tableau’s cloud platform, and its ability to host extracted data sources stored in the Hyper format. (Hyper is Tableau’s in-memory data engine technology, optimized for fast data ingest and analytical query processing on large or complex data sets.)
Solution Architecture: Hybrid Model

  1. In the proposed model, data is extracted from the source systems and stored in Tableau Online as separate, standalone Hyper file data sources that Tableau dashboards can connect to.
  2. These extracts are refreshed at regular intervals according to business requirements. On-premises data reaches the cloud platform through Tableau Bridge, which ensures data security.
  3. Each published cloud data source can serve as many dashboards as required.
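The hybrid model's key property is "refresh once, read many". The Python sketch below models it with hypothetical `CloudSite`, `PublishedExtract`, and `Dashboard` classes; they are not part of any Tableau API, and the `fetch` callable merely stands in for whatever pulls on-premises data through Tableau Bridge.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class PublishedExtract:
    """Hypothetical model of a standalone extract hosted on the cloud site."""
    name: str
    rows: list[dict]
    version: int = 0


class CloudSite:
    """Hypothetical registry of published extracts (not a Tableau API)."""

    def __init__(self) -> None:
        self.extracts: dict[str, PublishedExtract] = {}

    def publish(self, name: str, rows: list[dict]) -> None:
        self.extracts[name] = PublishedExtract(name, list(rows))

    def refresh(self, name: str, fetch: Callable[[], list[dict]]) -> None:
        # A scheduled refresh calls the source once, regardless of how many
        # dashboards use this extract.
        extract = self.extracts[name]
        extract.rows = list(fetch())
        extract.version += 1


class Dashboard:
    """A dashboard holds only the name of the published data source it uses."""

    def __init__(self, site: CloudSite, source_name: str) -> None:
        self.site = site
        self.source_name = source_name

    def total(self, column: str) -> float:
        # Reads the hosted extract; never touches the on-premises source.
        return sum(row[column] for row in self.site.extracts[self.source_name].rows)


site = CloudSite()
site.publish("sales", [{"amount": 10.0}])
dashboards = [Dashboard(site, "sales") for _ in range(3)]
site.refresh("sales", lambda: [{"amount": 10.0}, {"amount": 5.0}])
print([d.total("amount") for d in dashboards])  # [15.0, 15.0, 15.0]
```

Because dashboards reference the published source by name rather than holding their own copy, adding another dashboard in this sketch adds no load on the source system.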

The Outcome

  1. The incremental refresh feature in Tableau Online made it possible to fetch new data at regular intervals and have it processed and stored, ready to be displayed in the dashboards.
  2. The data volume was manageable because the cloud server could be scaled easily.
  3. Dashboard response times dropped to milliseconds when handling and retrieving data from millions of records.
  4. KPIs involving complex visualizations or computations handled the data volume with ease and returned results quickly.
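Tableau's incremental refresh identifies new rows by a column you choose (for example an ID or a date) and adds only rows whose value is greater than the highest value already in the extract. The function below is a hypothetical in-memory sketch of that idea; a real refresh pushes the filter to the source as a query rather than scanning every row in Python.

```python
from __future__ import annotations


def incremental_refresh(extract: list[dict], source: list[dict], key: str) -> list[dict]:
    """Return the extract plus source rows whose `key` exceeds the extract's maximum.

    Hypothetical helper: `key` should be a column that only increases
    (an ID or load timestamp). Rows edited or deleted in the source after
    they were extracted are not picked up.
    """
    high_water_mark = max((row[key] for row in extract), default=None)
    if high_water_mark is None:
        # Empty extract: the first refresh behaves like a full load.
        new_rows = list(source)
    else:
        new_rows = [row for row in source if row[key] > high_water_mark]
    return extract + new_rows


extract = [{"id": 1, "amount": 10}, {"id": 2, "amount": 20}]
source = [
    {"id": 1, "amount": 10},
    {"id": 2, "amount": 25},  # edited in the source after extraction
    {"id": 3, "amount": 30},
]
refreshed = incremental_refresh(extract, source, "id")
print([(r["id"], r["amount"]) for r in refreshed])  # [(1, 10), (2, 20), (3, 30)]
```

In this sketch each refresh only processes the rows added since the last one, which is why frequent refreshes stay cheap as the table grows; the edited row with `id` 2 keeps its old value, so a periodic full refresh is still needed if historical rows can change.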