written by
5000fish Team

Why Should I Pull API Data Into a Data Lake for Reporting?


The reality is that more businesses are becoming cloud-first. It’s becoming increasingly common for nearly all business processes to take place in the cloud. The cloud is easily accessible and offers real efficiencies: it eliminates much of the need for companies to make expensive up-front investments in hardware and infrastructure, and it allows for redundancy and backups across multiple locations.

The cloud has also driven increased use of software-as-a-service (SaaS) apps. One of the downsides is that these apps often don’t include out-of-the-box reporting features or the ability to combine data from multiple sources. And they make it even harder to access your data, because the only access is via complex APIs.

For this reason, we find many customers are looking for a business intelligence (BI) tool to report from these SaaS apps.

We also find many people asking whether DashboardFox can hook up to SaaS APIs. For a long time, the answer was no. That may sound like a bad answer – especially when tools like Power BI, Tableau, Google Data Studio, and others do provide that ability. Why not DashboardFox?

Historically, we didn’t support API queries because there were a lot of limitations to the data that you could pull directly from APIs. Typically, each API endpoint only gives you a slice of data that, without additional queries, provides little value outside an eye-candy report.

That is, unless you bring your API data into a data lake first – then you can start to treat it like a relational database, complete with current and historical data.

That’s what we implemented in DashboardFox. More on that later, but let’s talk more about the limits in API reporting and why the Data Lake is so important.

API Data at a Glance

Let’s look at what an API is and does. API stands for application programming interface, and it is a way for programs to communicate with each other. It lets programs release only limited information rather than granting access to all their data.

An API exposes endpoints that each map to a specific feature in a SaaS product. For example, suppose your SaaS product is a CRM. In that case, there may be an endpoint for contacts (and another for each specific contact’s details), one for locations (and another for each specific location’s details), and one for companies (and another for each specific company’s details).

No single endpoint conveys all the information you need, so pulling a complete view of a contact, location, and company requires multiple API calls. On top of that, many APIs have rate limits on how often and how fast you can call them, and some even limit the amount of information you can pull.
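
Once you have to make many calls, your integration code also has to respect those rate limits. Here’s a minimal sketch in Python, using the requests library against a hypothetical endpoint and API key, of the kind of retry-and-back-off handling that typically becomes necessary (this is an illustration, not any vendor’s actual API):

```python
import time
import requests

# Hypothetical endpoint and key for illustration; real SaaS APIs differ.
BASE_URL = "https://api.example-crm.com/v1"
API_KEY = "your-api-key"

def get_with_backoff(path, max_retries=5):
    """Call an endpoint and back off politely when the API rate-limits us."""
    for attempt in range(max_retries):
        resp = requests.get(
            f"{BASE_URL}{path}",
            headers={"Authorization": f"Bearer {API_KEY}"},
            timeout=30,
        )
        if resp.status_code != 429:  # 429 = Too Many Requests
            resp.raise_for_status()
            return resp.json()
        # Respect the server's Retry-After header if it sends one,
        # otherwise wait exponentially longer on each attempt.
        wait = int(resp.headers.get("Retry-After", 2 ** attempt))
        time.sleep(wait)
    raise RuntimeError(f"Gave up on {path} after {max_retries} rate-limited attempts")

contacts = get_with_backoff("/contacts")
```

That’s already more plumbing than a business user building a report should ever have to think about.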

The API infrastructure doesn’t lend itself well to a codeless, self-service building experience. When building ad hoc reports or exploring data, you want to pull together multiple fields across multiple tables. In some cases, you need to create relationships that don’t exist natively in the API endpoints.

So, instead of opening up reporting to your business users, which is a primary purpose of BI software, the responsibility has to remain with your developers. This means less access to data for the analysts and a higher cost of data analytics.

Confused? Let’s illustrate this with a couple of examples.

Examples of Pulling Data via APIs

Let’s say your software queries the API to get a list of open invoices for a specific customer by location. You probably first have to pull an API call for the list of open invoices, and in that list, there may be a link to a customer ID for each invoice. You then have to query the API to convert the customer ID into the full set of customer information. The customer record may have a location ID, and you have to query the API to get the location information. So now, to do an ad hoc report on several open invoices by customer and location, you’ll end up spawning multiple API calls. That’s not efficient (or easy).
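
To make that concrete, here’s a rough sketch of what that chain of calls looks like in Python. The endpoint paths and field names (customer_id, location_id, and so on) are made up for illustration; the point is the fan-out of requests needed just to assemble one report row:

```python
import requests

BASE_URL = "https://api.example-billing.com/v1"  # hypothetical API for illustration
session = requests.Session()
session.headers.update({"Authorization": "Bearer your-api-key"})

# 1. One call for the list of open invoices.
invoices = session.get(f"{BASE_URL}/invoices", params={"status": "open"}).json()

report_rows = []
for invoice in invoices:
    # 2. One extra call per invoice to turn customer_id into customer details.
    customer = session.get(f"{BASE_URL}/customers/{invoice['customer_id']}").json()
    # 3. And another call per customer to resolve the location.
    location = session.get(f"{BASE_URL}/locations/{customer['location_id']}").json()
    report_rows.append({
        "invoice_number": invoice["number"],
        "amount": invoice["amount"],
        "customer": customer["name"],
        "city": location["city"],
    })

# 100 open invoices => roughly 201 API calls for one simple ad hoc report.
```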

In another example, you query the API for a list of help desk tickets for Customer B. You get a result showing that a ticket is open, assigned to Joe, with a status of “pending.” Keep in mind that the API gives only information on the current state – it has no concept of history or an audit trail unless the SaaS vendor specifically provides an endpoint for that data (typically, they don’t).

You query the API again the next day and find that the ticket is now assigned to Megan and its status is “closed.” That’s it – you have no information on when the changes happened or what else took place. So now you can’t do any reports on the lifecycle of that ticket, because the data simply isn’t there.

You have no way of reporting on how many tickets get transferred or how long it takes a ticket to go from “pending” to “closed” – exactly the data you’d most likely want to report on. In some cases, even if the SaaS application shows you this information in its own interface, it may not expose that data via its API endpoints.

Additionally, some APIs limit how far back you can go. Want to do a month-over-month or year-over-year comparison? You're out of luck if the API doesn’t give you access to that history.

So, What Can I Do?

A great alternative that solves some of these problems is to have a service regularly query the API and store the results in a data lake (for our purposes, a data lake is essentially a database – a repository that holds your data in its raw format).
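
As a rough sketch of that idea (not DashboardFox’s internal implementation), here’s a small Python job you could run on a schedule – say hourly via cron – that pulls a hypothetical tickets endpoint and appends each snapshot to a local SQLite table with a timestamp, so history is never lost:

```python
import json
import sqlite3
from datetime import datetime, timezone

import requests

API_URL = "https://api.example-helpdesk.com/v1/tickets"  # hypothetical endpoint
DB_PATH = "data_lake.db"

def ingest_tickets():
    tickets = requests.get(
        API_URL, headers={"Authorization": "Bearer your-api-key"}, timeout=30
    ).json()
    fetched_at = datetime.now(timezone.utc).isoformat()

    conn = sqlite3.connect(DB_PATH)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS tickets_raw (
            fetched_at TEXT,
            ticket_id  TEXT,
            status     TEXT,
            assignee   TEXT,
            raw_json   TEXT
        )
    """)
    # Append a snapshot row per ticket; never overwrite, so history accumulates.
    conn.executemany(
        "INSERT INTO tickets_raw VALUES (?, ?, ?, ?, ?)",
        [(fetched_at, t["id"], t["status"], t.get("assignee"), json.dumps(t))
         for t in tickets],
    )
    conn.commit()
    conn.close()

if __name__ == "__main__":
    ingest_tickets()  # run this on a schedule (cron, Task Scheduler, etc.)
```

Because every fetch is stamped and kept, the “who was this ticket assigned to last week?” question from the earlier example becomes answerable.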

What are the benefits? You’ll have all your data at your fingertips, even if you stop paying for your SaaS app. That historical data could be critical in the future for benchmarking, troubleshooting, research, etc.

You can leverage database features like views to do powerful analysis of your data, making it simple for users to build rich reports in a codeless report builder like DashboardFox. (Don’t be intimidated by database views – our service team can set that up for you too.)
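
To give a flavor of what such a view might look like, here’s a sketch (SQL run through Python’s sqlite3 so the example is self-contained) that joins hypothetical raw API tables into one flat, report-ready table. The table and column names are made up for illustration:

```python
import sqlite3

conn = sqlite3.connect("data_lake.db")

# Hypothetical raw tables populated by the API ingestion job; the view then
# presents them as one flat table a codeless report builder can query directly.
conn.executescript("""
    CREATE TABLE IF NOT EXISTS invoices_raw  (invoice_number TEXT, amount REAL,
                                              status TEXT, customer_id TEXT);
    CREATE TABLE IF NOT EXISTS customers_raw (customer_id TEXT, name TEXT,
                                              location_id TEXT);
    CREATE TABLE IF NOT EXISTS locations_raw (location_id TEXT, city TEXT);

    CREATE VIEW IF NOT EXISTS open_invoices_by_location AS
    SELECT i.invoice_number,
           i.amount,
           c.name AS customer_name,
           l.city AS location_city
    FROM invoices_raw i
    JOIN customers_raw c ON c.customer_id = i.customer_id
    JOIN locations_raw l ON l.location_id = c.location_id
    WHERE i.status = 'open';
""")
conn.commit()
conn.close()
```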

Historical analysis, forecasting, and ad hoc exploration of data become easier.

Data migration into new tools also becomes easier – and even if you never migrate, you still have the data on hand for reference, which is critical. The important thing is to have not just a bunch of CSV files but an actual database that your team and tools can query.

How Can DashboardFox Help Pull API Data into a Data Lake?

So, as we mentioned before, we used to tell people, “No, DashboardFox cannot use API endpoints.” We required customers to have their own data lake or database already set up.

Not anymore.

We recently launched our API integration, built around all the best practices mentioned above – that’s how we implemented our API reporting feature.

DashboardFox allows you to connect to any API endpoint that returns JSON as a result (which is almost all of them).

Plus, we support all the major authentication methods – OAuth2, API keys, and HTTP authentication (we’ll explain that jargon in another post). The bottom line is that we can connect to almost any API endpoint.

Once connected, we take that JSON result and automatically convert it into a relational database structure – aka, a data lake.
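
As a simplified illustration of that idea (not DashboardFox’s actual code, and with a made-up record shape), here’s how a nested JSON contact record can be split into parent and child rows so it fits ordinary relational tables:

```python
# A nested JSON record as a typical CRM API might return it (made-up fields).
contact_json = {
    "id": "c-101",
    "name": "Jane Doe",
    "company": {"id": "co-7", "name": "Acme Corp"},
    "phones": [
        {"type": "work", "number": "555-0100"},
        {"type": "mobile", "number": "555-0199"},
    ],
}

# Flatten into relational-style rows: one parent row, plus child rows that
# reference the parent by key -- the same shape a database table expects.
contact_row = {
    "contact_id": contact_json["id"],
    "name": contact_json["name"],
    "company_id": contact_json["company"]["id"],
    "company_name": contact_json["company"]["name"],
}
phone_rows = [
    {"contact_id": contact_json["id"], "type": p["type"], "number": p["number"]}
    for p in contact_json["phones"]
]

print(contact_row)
print(phone_rows)
```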

And then you can schedule regular data fetches to update and populate those tables.

From that point, everything you can already do in DashboardFox, you can do with that API-based data because it’s stored in the database, not stuck in the cloud behind an API.

Of course, there’s a lot more that DashboardFox can offer your business: self-hosted deployment for data protection, affordable pricing that particularly benefits small and medium-sized businesses, and a dedicated team that will answer your questions without making you file tickets.

The best thing you can do is give us a call to talk about what we can offer to your business. Better yet, schedule a free live demo with our team because we believe that actions speak louder than words.

Tags: SaaS, Microsoft Power BI, Google Data Studio, Tableau, data lake, API data, Self-Service BI, Self-Hosted, Ad-hoc Reporting, JSON