5 min read

Three Tips for Designing Data Integrations

You've just been tasked with building a data integration to save your organization a lot of time on a wasteful service desk process that's..
Three Tips for Designing Data Integrations

You've just been tasked with building a data integration to save your organization a lot of time on a wasteful service desk process that's driving your service triage team and your clients nuts. Thankfully you've gotten pretty good business requirements about what should map to what, and now it's time for you to build it! But how do you get started?

The way you design a data integration is important. Here are some tips I have learned over the last ten years that you may find helpful as you get started:

Tip #1. Design one directional workflow first.

There tends to be a misconception among developers (and even business stakeholders) that bi-directional integrations are mirror images of each other. They are not. How you handle data coming in to an app is often quite different then the messages you send out of it to another system. Here's a very simple example. Let's say you have a service desk integration between ServiceNow and Autotask:

  1. In Service Now, you have an incident. The incident includes a priority field.
  2. In Autotask, the incident will create a ticket. The ticket also includes a priority field.
  3. When Autotask updates a ticket, we want to also update status and priority back to the incident in ServiceNow.

Seems simple enough. However, these fields often don't have the same valid values. In ServiceNow you might have priorities:

  • high
  • medium
  • low

In Autotask, you may have priorities:

  • critical
  • high
  • normal
  • low

Mapping data from ServiceNow into Autotask, therefore, will be different from mapping in the other direction, because the rules are not identical going the other direction (they're not "reflexive").

In fact, I've found that about the only time that the data flows are reflexive is when you are doing a fully bidirectional data sync (same data, same fields propagate in both directions). This tends to happen only with CRM and sometimes eCommerce/inventory information.

So, you will have two different workflows of data in each direction. This can be very confusing, especially once you start testing. Therefore, I recommend getting the integration working in one direction first, and then implementing the "return trip".

Tip #2. Identify business triggers early

It's important to really understand the business part of an integration to know when to actually push data between the applications in play. For example:

  1. in a service desk integration, usually the initiating system creates a ticket but it doesn't get sent to the triaging system until it meets certain criteria (i.e. we want IT tickets to go to our MSP, but non-IT tickets we will handle on our own).
  2. in a CRM integration an opportunity may need to reach maturity (i.e. updated to "won" or "lost" before it becomes of interest to other systems (accounting, PSA etc).

Thankfully most modern SaaS applications have a built-in methods to create notifications when certain capabilities are met. One of the first things I look at when designing an integration is to assess the webhook capability of the apps involved. What's a webhook?

webhook: an HTTP-based notification message sent to a configurable destination when a particular condition occurs in an application

Note, they may not always be called webhooks:

  • In Autotask a ticket "webhook" is actually a combination of workflow rules and extension callouts.
  • In Salesforce, webhook capability is built through a combination of workflow rules and outbound messages.
  • In ServiceNow, webhook capability can be configured by script actions that send a message when a system event occurs.

Notice the pattern. There's a "when does it get sent", and a "what gets sent" aspect to webhooks. The "when" is really important to get right and can only be determined once you have discussed the integration scenario with the end users involved.

What happens if you can't use webhooks? Usually I'd consider the "polling option". This involves checking the source system (via a query API) for records that meet certain conditions. If records meeting the criteria are found, then they are queued for processing into the target system. If you have to use this choice, then you'll need to settle on how often you will check the source system. Polling is always a second choice, because it likely will use more resources and be slower than a near near-real time webhook notification.

Whether you use webhooks or polling, you still need to understand the business processes and criteria that determines when to initiate the data transfer.

Tip #3. Configure webhooks with minimal payloads.

If you're going to use webhooks, there's another question you have to answer, "what gets sent?"

Some systems have very robust capabilities for assembling custom webhook payloads. What's in a webhook payload? A CRM contact payload might look something like this:

{
  "id": "AB49EF934",
  "first_name": "Derek",
  "last_name": "Gau",
  "email": "derek@apigrate.com",
  "created": "2024-01-22T17:52:35Z",
  "updated": "2024-01-22T17:52:35Z",
  "status": "active"
}

The advantage of including information in a payload is obvious. You have the information sent directly from the application at the time of the event, and you can immediately use it for processing.

But...

Most of time webhook payloads don't contain nearly enough information needed for a business integration workflow. For example, what if I've configured user-defined fields I want to also use on my contact, but they aren't included in the webhook? What if I need to also use company information for the contact?

For this reason, I often minimize (or ignore) information included on the webhook payload. In other words, from the above example, I design the integration to only use the id field, effectively:

{
  "id": "AB49EF934"
}

...and instead I immediately retrieve all the information via API lookup that I'll need for the rest of the business transaction workflow. This gives some advantages:

  1. you get the absolute most recent version of the data from the source entity
  2. you usually get a more complete set of data by getting the whole source entity
  3. it minimizes the data in-transit, which can protect personally identifiable information from man-in-the-middle attacks
  4. it reduces code sensitivity in your middleware if the webhook payload is ever modified. In other words, as long as the id field is in the payload, other field additions/subtractions don't break the integration.

The obvious "con" of this approach is that it does introduce a little bit more latency (you have to do an extra API "read" to get the full set of source data). However, in my experience, I've found that I almost always have to do API lookups in the source system anyway and a few fractions of a second more usually doesn't make a big difference. More importantly, from a maintenance perspective, I have definitely seen integrations break because of unintended configuration changes or system updates that have altered webhook payloads. Minimizing your payloads as outlined maximizes the robustness of your automation code.

I hope these tips help you get off to a good start with your next integration project!