Real-Time Dashboard with Open-Source Tools

Overview

A real-time dashboard is a dashboard that contains charts that are automatically updated with the most current data available. The typical use case is to load a chart with some historical data first and then live update it as new data comes in. In this tutorial, you will learn how to build such real-time dashboards with only open-source tools and without any third-party services.

The main challenge of building such a dashboard is to design a proper architecture to react to changes in data all the way up from the database to the charts on the frontend. The part from the server to the frontend is a simple one, since we have a lot of technologies and frameworks built to handle real-time data updates. Going from database to server is much trickier. The underlying problem is that most of the databases, which are good for analytic workload, don't provide out-of-the-box ways to subscribe to changes in the data. Instead, they are designed to be polled.

Cube, which acts as a middleman between your database and analytics dashboard, can provide a real-time WebSockets-based API for the frontend, while polling the database for changes in data.

You can check out the demo of real-time dashboard built with Cube here.

Alt Text

On the frontend, Cube provides an API to load initial historical data and subscribe to all subsequent updates.

import cubejs from '@cubejs-client/core';
import WebSocketTransport from '@cubejs-client/ws-transport';

const cubejsApi = cubejs({
  transport: new WebSocketTransport({
    authorization: CUBEJS_TOKEN,
    apiUrl: 'ws://localhost:4000/'
  })
});

cubejsApi.subscribe({
  measures: ['Logs.count'],
  timeDimensions: [{
    dimension: 'Logs.time',
    granularity: 'hour',
    dateRange: 'last 1440 minutes'
  }]
}, (e, result) => {
  if (e) {
    // handle new error
  } else {
    // handle new result set
  }
});

In our tutorial, we are going to use React as a frontend framework. Cube has a @cubejs-client/react package, which provides React components for easy integration of Cube into the React app. It uses React hooks to load queries and subscribes for changes.

import { useCubeQuery } from '@cubejs-client/react';

const Chart = ({ query, cubejsApi }) => {
  const {
    resultSet,
    error,
    isLoading
  } = useCubeQuery(query, { subscribe: true, cubejsApi });

  if (isLoading) {
    return <div>Loading...</div>;
  }

  if (error) {
    return <pre>{error.toString()}</pre>;
  }

  if (!resultSet) {
    return null;
  }

  return <LineChart resultSet={resultSet}/>;
};

In this tutorial, I'll show you how to build a real-time dashboard either with MongoDB or BigQuery. The same approach could be used for any databases that Cube supports.

Update from April 2023. This guide was authored more than 3 years ago and certain parts (e.g., generation of the front-end boilerplate code) are not relevant anymore. Please see up-to-date front-end guides in the blog.

Cube API with MongoDB

Feel free to jump to the next part if you want to use BigQuery instead of MongoDB

For quite a long time, doing analytics with MongoDB required additional overhead compared to modern SQL RDBMS and Data Warehouses associated with aggregation pipeline and MapReduce practices. To fill this gap, MongoDB released the MongoDB connector for BI, which acts as a MySQL server on top of your MongoDB data. Under the hood, it bridges existing aggregation mechanisms to the MySQL protocol, allowing standard MySQL clients to connect and issue SQL queries.

Setting up MongoDB and BI Connector

If you don’t have a MongoDB instance, you can download it here. The BI Connector can be downloaded here. Please make sure you use the MongoDB version that supports the MongoDB connector for BI.

After the BI connector has been installed, please start a mongod instance first. If you use the downloaded installation, it can be started from its home directory like so:

$ bin/mongod

The BI connector itself can be started the same way:

$ bin/mongosqld

Please note that mongosqld resides in another bin directory. If everything works correctly, you should see a success log message in your shell for the mongosqld process:

[initandlisten] waiting for connections at 127.0.0.1:3307

If you’re using the MongoDB Atlas, you can use this guide to enable BI connector.

Getting a Sample Dataset

You can skip this step if you already have data for your dashboard.

We host a sample events collection, which you can use for a demo dashboard. Use the following commands to download and import it.

$ curl https://cube.dev/downloads/events-dump.zip > events-dump.zip
$ unzip events-dump.zip
$ bin/mongorestore dump/stats/events.bson

Please make sure to restart the MongoDB BI connector instance in order to generate an up-to-date MySQL schema from the just added collection.

Creating Cube Application

We are going to use Cube CLI to create our new Cube application with the MongoBI driver:

$ npx cubejs-cli create real-time-dashboard -d mongobi

Go to the just created real-time-dashboard folder and update the .env file with your MongoDB credentials.

CUBEJS_DB_HOST=localhost
CUBEJS_DB_NAME=stats
CUBEJS_DB_PORT=3307
CUBEJS_DB_TYPE=mongobi
CUBEJS_API_SECRET=SECRET
CUBEJS_DEV_MODE=true

Now let's start a Cube development server.

$ npm run dev

This starts a development server with a playground. We'll use it to generate Cube schema, test our data and, finally, build a dashboard. Open http://localhost:4000 in your browser.

Cube uses the data schema to generate an SQL code, which will be executed in your database. Data schema is a JavaScript code, which defines measures and dimensions and how they map to SQL queries.

Cube can generate a simple data schema based on the database’s tables. Select the events table and click “Generate Schema.”

Alt Text

Once the schema is generated, we can navigate to the “Build” tab and select some measures and dimensions to test out the schema. The "Build" tab is a place where you can build sample charts with different visualization libraries and inspect how that chart was created, starting from the generated SQL all the way up to the JavaScript code to render the chart. You can also inspect the JSON query, which is sent to the Cube backend.

Alt Text

Although auto-generated schema is a good way to get started, in many cases you'd need to add more complex logic into your Cube schema. You can learn more about data schema and its features here. In our case, we want to create several advanced measures and dimensions for our real-time dashboard.

Replace the content of schema/Events.js with the following.

cube(`Events`, {
  sql: `SELECT * FROM stats.events`,

  refreshKey: {
    sql: `SELECT UNIX_TIMESTAMP()`
  },

  measures: {
    count: {
      type: `count`
    },

    online: {
      type: `countDistinct`,
      sql : `${anonymousId}`,
      filters: [
        { sql: `${timestamp} > date_sub(now(), interval 3 minute)` }
      ]
    },

    pageView: {
      type: `count`,
      filters: [
        { sql: `${eventType} = 'pageView'` }
      ]
    },

    buttonClick: {
      type: `count`,
      filters: [
        { sql: `${eventType} = 'buttonCLicked'` }
      ]
    }
  },

  dimensions: {
    secondsAgo: {
      sql: `TIMESTAMPDIFF(SECOND, timestamp, NOW())`,
      type: `number`
    },

    anonymousId: {
      sql: `anonymousId`,
      type: `string`
    },

    eventType: {
      sql: `eventType`,
      type: `string`
    },

    timestamp: {
      sql: `timestamp`,
      type: `time`
    }
  }
});

First, we define measures for our dashboard. The count measure is just a simple count of all total events; pageView and buttonClick are counts of corresponding events. The online measure is a bit more complex. It returns the number of unique users who performed any event in the last 3 minutes.

Within dimensions we have simple anonymousId, eventType, and timestamp, which just shows the values from corresponding columns. We've also defined a secondsAgo dimension, which calculates the number of seconds since the event's occurrence.

Lastly, we are setting a custom refreshKey. It controls the refresh of the Cube in-memory cache layer. Setting it to SELECT UNIX_TIMESTAMP() will refresh the cache every second. You need to carefully select the best refresh strategy depending on your data to get the freshest data when you need it, but, at the same time, not overwhelm the database with a lot of unnecessary queries.

We will use these measures and dimensions in the next part, when we create a frontend dashboard app with React and Chart.js.

Cube API with BigQuery

Google BigQuery is a serverless and highly scalable data warehouse. It is designed to quickly process complex queries on large datasets. It uses SQL as a query language, which makes it easy to get started.

There are a few things worth mentioning before we proceed. BigQuery isn't designed for transactional queries like CRUD operations. It takes around 2 seconds to run a simple query like SELECT * FROM bigquery-public-data.object LIMIT 10 on a 100 KB table with 500 rows. Also, BigQuery is slower on small datasets than traditional relational databases, such as MySQL or Postgres.

BigQuery is a paid service, where customers are charged based on query and storage costs. Real-time data streaming is a paid feature as well; you can check its pricing here. There are best practices on how to control the amount of processed data per query in order to reduce the cost. We'll talk about them later in this part.

Prerequisites

You are going to need a Google Cloud Platform (GCP) account in order to use BigQuery. If you don't have it yet, please refer to this guide to set it up and then come back here to continue our tutorial.

Once you have a GCP project with billing enabled (by starting a free trial or using a coupon, for example), you can move on to the next steps.

As a dataset, we'll use a sample public events dataset—cubejs-examples.stats.events. Feel free to use your own dataset if you have one.

Creating a Cube Application

We are going to use Cube CLI to create our new Cube application with the BigQuery driver:

$ npx cubejs-cli create real-time-dashboard -d bigquery

Now, we need to configure credentials to access BigQuery. Cube uses environment variables to manage database credentials. To connect to BigQuery, we need to set two variables: CUBEJS_DB_BQ_PROJECT_ID and CUBEJS_DB_BQ_KEY_FILE.

The first one is simply your project ID, which you can copy from the lift of your projects. The CUBEJS_DB_BQ_KEY_FILE variable should point to the Service Account Key File. To get this file, you need to create a new service account on IAM -> Service accounts page. Add BigQuery Data Viewer and BigQuery Job User roles to this service account and then generate a new key file. Download it and place it into the real-time-dashboard folder.

Your real-time-dashboard/.env file should look like the following.

CUBEJS_DB_BQ_PROJECT_ID=cubejs-examples
CUBEJS_DB_BQ_KEY_FILE=./cubejs-examples-f1c5cbc00a18.json
CUBEJS_DB_TYPE=bigquery
CUBEJS_API_SECRET=SECRET

Data Schema

Cube uses the data schema to generate an SQL code, which will be executed in your database. Data schema is a JavaScript code, which defines measures and dimensions and how they map to SQL queries. You can learn more about data schema and its features here.

As mentioned before, we are going to use data from a public BigQuery table—cubejs-examples.stats.events. Inside the project folder, create the schema/Events.js file with the following content.

cube(`Events`, {
  sql: `
    SELECT
      *
    FROM
      stats.events
    WHERE ${FILTER_PARAMS.Events.timestamp.filter('timestamp')}`,

  refreshKey: {
    sql: `
      SELECT
        count(*)
      FROM
        stats.events
      WHERE ${FILTER_PARAMS.Events.timestamp.filter('timestamp')}`
  },

  measures: {
    count: {
      type: `count`
    },

    online: {
      type: `countDistinct`,
      sql : `${anonymousId}`,
      filters: [
        { sql: `${timestamp} > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 3 MINUTE)` }
      ]
    },

    pageView: {
      type: `count`,
      filters: [
        { sql: `${eventType} = 'pageView'` }
      ]
    },

    buttonClick: {
      type: `count`,
      filters: [
        { sql: `${eventType} = 'buttonCLicked'` }
      ]
    }
  },

  dimensions: {
    anonymousId: {
      sql: `anonymousId`,
      type: `string`
    },

    eventType: {
      sql: `eventType`,
      type: `string`
    },

    timestamp: {
      sql: `timestamp`,
      type: `time`
    }
  }
});

The sql property of the cube defines the SQL that will be used to generate a table that will be queried by a cube. It usually takes the form of a SELECT * FROM table query. In our case, you can see we are using FILTER_PARAMS here. Usually you don't need to pass filters to the sql property and filtering is done automatically by Cube, but in the case of BigQuery partitioned tables, you need to do that. The events table is partitioned by a timestamp and cannot be queried without a filter over the timestamp column. BigQuery partitioned tables is an excellent way to reduce the cost and improve the performance of our queries.

Next, we define measures for our dashboard. The count measure is just a simple count of all total events; pageView and buttonClick are counts of corresponding events. The online measure is a bit more complex. It returns the number of unique users who performed any event in the last 3 minutes.

Within dimensions we have the simple anonymousId, eventType, and timestamp, which just shows the values from corresponding columns. We've also defined a secondsAgo dimension, which calculates the number of seconds since the event's occurrence.

Lastly, we are setting a custom refreshKey. It controls the refresh of the Cube in-memory cache layer. We're making it to count the number of rows in our table. This way Cube will not issue unnecessary queries against BigQuery, which would help to keep our billing low.

Feel free to play around with measures and dimensions in the playground. Please make sure you always select some date range, since it is required because of the partitioning.

Alt Text

We will use these measures and dimensions in the next part, when we create a frontend dashboard app with React and Chart.js.

Frontend Dashboard

In the previous part we've successfully configured a database, BigQuery or MongoDB, and created a Cube schema for the real-time dashboard. Now it is time to build a dashboard itself!

Cube Playground can generate a boilerplate frontend app. It is a convenient way to start developing a dashboard or analytics application. You can select your favorite frontend framework and charting library and Playground will generate a new application and wire all things together to work with the Cube API.

We'll use React and Chart.js in our tutorial. To generate a new application, navigate to "Dashboard App,” select "React Antd Static" with "Chart.js", turn on the "Web Socket Transport (Real-time)" switch, and click on the “Create dashboard app” button.

Alt Text

It could take a while to generate an app and install all the dependencies. Once it is done, you will have a dashboard-app folder inside your Cube project folder. To start a dashboard app, either go to the “Dashboard App” tab in the playground and hit the “Start” button, or run the following command inside the dashboard-app folder:

$ npm start

Make sure the Cube backend process is up and running since our dashboard uses its API. The frontend application is running on http://localhost:3000.

To add a chart on the dashboard, you can either edit the dashboard-app/src/pages/DashboardPage.js file or use Cube Playground. To add a chart via Playground, navigate to the "Build" tab, build a chart you want, and click the "Add to Dashboard" button.

Configure Cube for Real-Time Data Fetch

We need to do a few things for real-time support in Cube. First, WebSockets transport should be enabled by setting the CUBEJS_WEB_SOCKETS environment variable to true.

Check that the following line is present in the .env file:

CUBEJS_WEB_SOCKETS=true

Next, we need to update the cube.js file to pass a few additional options to the Cube server.

Update the content of the cube.js file the following.

module.exports = {
  processSubscriptionsInterval: 1000,
  orchestratorOptions: {
    queryCacheOptions: {
      refreshKeyRenewalThreshold: 1,
    }
  },
};

We have passed two configuration options to Cube. The first, processSubscriptionsInterval, controls the polling interval. The default value is 5 seconds; we are setting it to 1 second to make it slightly more real-time.

The second, refreshKeyRenewalThreshold, controls how often the refreshKey is executed. The default value of this option is 120, which is 2 minutes. In the previous part, we've changed refreshKey to reset a cache every second, so it doesn't make sense for us to wait an additional 120 seconds to invalidate the refreshKey result itself, that’s why we are changing it to 1 second as well.

These are all the updates we need to make on the backend part. Now, let's update the code of our dashboard app. First, let's check that the @cubejs-client/ws-transport package is installed in the package.json file in the dashboard-app folder. It provides a WebSocket transport to work with the Cube real-time API:

{
  "name": "dashboard-app",
  "version": "0.1.0",
  "private": true,
  "dependencies": {

    // ...

    "@cubejs-client/ws-transport": "^0.23.11",

    // ...

Now, we need to update how we request a query itself in the src/components/ChartRenderer.js. Make the following changes:

  const ChartRenderer = ({
-   vizState
+   vizState, cubejsApi
  }) => {
    const {
      query,
      chartType
    } = vizState;
    const component = TypeToMemoChartComponent[chartType];
-   const renderProps = useCubeQuery(query);
+   const renderProps = useCubeQuery(query, { subscribe: true, cubejsApi });
    return component && renderChart(component)(renderProps);
  };

That's it! Now you can add more charts to your dashboard, perform changes in the database, and see how charts are updating in real time.

You can also check this online live demo with various charts displaying real-time data.

In the next part, we'll talk about how to deploy our application, both the backend and the frontend.

Deployment

Now, let's deploy our Cube API and the dashboard application. In this tutorial, we'll deploy both the Cube API and the dashboard application to Heroku.

Cube API Deployment

There are multiple ways you can deploy a Cube API; you can learn more about them here in the docs.

The tutorial assumes that you have a free Heroku account. You'd also need a Heroku CLI; you can learn how to install it here.

First, let's create a new Heroku app. Run the following command inside your Cube project folder.

$ heroku create real-time-dashboard-api

We also need to provide credentials to access the database. I assume you have your database already deployed and externally accessible. The example below shows setting up credentials for MongoDB.

$ heroku config:set \
  CUBEJS_DB_TYPE=mongobi \
  CUBEJS_DB_HOST=<YOUR-DB-HOST> \
  CUBEJS_DB_NAME=<YOUR-DB-NAME> \
  CUBEJS_DB_USER=<YOUR-DB-USER> \
  CUBEJS_DB_PASS=<YOUR-DB-PASSWORD> \
  CUBEJS_API_SECRET=<YOUR-SECRET> \
  --app real-time-dashboard-api

Then, we need to create two files for Docker. The first file, Dockerfile, describes how to build a Docker image. Add these contents:

FROM cubejs/cube:latest

COPY . .

The second file, .dockerignore, provides a list of files to be excluded from the image. Add these patterns:

node_modules
npm-debug.log
dashboard-app
.env

Now we need to build the image, push it to the Heroku Container Registry, and release it to our app:

$ heroku container:login
$ heroku container:push web --app real-time-dashboard-api
$ heroku container:release web --app real-time-dashboard-api

Let's also provision a free Redis server provided by Heroku:

$ heroku addons:create heroku-redis:hobby-dev --app real-time-dashboard-api

Great! You can run the heroku open --app real-time-dashboard-api command to open your Cube API and see this message in your browser:

Cube server is running in production mode.

Dashboard App Deployment

The dashboard app should be deployed as a static website.

To do so on Heroku, we need to create the second Heroku app. Run the following command inside the dashboard-app folder:

$ heroku create real-time-dashboard-web

Then, enable the static website build pack:

$ heroku buildpacks:set https://github.com/heroku/heroku-buildpack-static.git

Next, we need to create the static.json file under the dashboard-app folder with the following contents:

{
  "root": "build/"
}

Also, we need to set Cube API URL to the newly created Heroku app URL. In the src/App.js file, change this line:

- const API_URL = "http://localhost:4000";
+ const API_URL = "https://real-time-dashboard-api.herokuapp.com";

Finally, we need to run the npm run build command and make sure the build folder is tracked by Git. By default, .gitignore excludes that folder, so you need to remove it from .gitignore.

Once done, commit your changes and push to Heroku. 🚀

$ git add -A
$ git commit -am "Initial"
$ git push heroku master

That's it! You can run the heroku open command to open your dashboard application in your browser and see it working with Cube.

Congratulations on completing this guide! 🎉

I’d love to hear from you about your experience following this guide. Please send any comments or feedback you might have in this Slack Community. Thank you and I hope you found this guide helpful!

Real-Time Dashboard with Open-Source Tools

Overview

Cube API with MongoDB

Setting up MongoDB and BI Connector

Getting a Sample Dataset

Creating Cube Application

Cube API with BigQuery

Prerequisites

Creating a Cube Application

Data Schema

Frontend Dashboard

Configure Cube for Real-Time Data Fetch

Deployment

Cube API Deployment

Dashboard App Deployment

Upgrade your data stack today

More on React

Building a Next.js Dashboard with Dynamic Charts and SSR

Building a nivo Dashboard with Cube

Building a visx Dashboard with Cube