SSJS Efficiency: Overcoming the 30-Minute Limit
Server Side JavaScript (SSJS) activities can take a long time to execute. However, the maximum runtime of an Automation Studio activity is 30 minutes. There’s no way to extend this limit and scripts that run longer than 30 minutes will be stopped, potentially ruining your progress.
During my time as a Marketing Cloud Developer, I’ve seen many scripts that do not care about this limit. They either process too little data or run as long as they need to, and if they are stopped, they simply fail. The first is inefficient, while the second is bad practice, as it can lead to data corruption or inconsistent data.
So, how can we deal with this?
The options
There are several ways to manage this limit, all of which require you to split the job into smaller tasks. You are probably already doing this by loading data via WSProxy in batches of 2,500 records and processing them in a loop.
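If you have not used that pattern yet, here is a minimal sketch of a batched WSProxy retrieve (the Data Extension and column names are just placeholders):
var api = new Script.Util.WSProxy();
var req = api.retrieve(
    "DataExtensionObject[MyDataExtension]",
    ["EmailAddress", "isChecked"],
    { Property: "isChecked", SimpleOperator: "equals", Value: false }
);
// req.Results holds the current batch (up to 2,500 rows);
// req.HasMoreRows and req.RequestID let you page through the rest.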
However, when it comes to processing records in a single script activity, you could take different approaches:
- Small batches: we load a small amount of data and process it. We only run one batch per script activity execution. This is a simple approach, but it is often inefficient, as the script does not use most of the allocated time.
- Finished or failed: we load lots of data (or all of it) and try to process it. If the script fails, we just run it again. This might be a good approach for simple, low-volume tasks. When attempting to process a large amount of data, however, this approach may result in frequent failures, leading to duplicates or missing data. If nothing else, it will make our automation fail.
- Empirical approach: we tweak the script until it runs within the limit. We try bigger or smaller batches, add or remove repetitions… In the end, we use this experience to pick the most suitable option. This is rather simple to do, but it can be time-consuming and can still lead to failures.
These are the approaches I see most often. However, I would like to introduce another approach that I’ve used in the past with great success:
- Elapsed time: we keep track of the time and stop the script when it’s about to run out. This is a good approach for complex tasks that can be split into smaller jobs. Add a couple more steps to keep track of the progress, or to load more data when the current batch runs out, and you have a robust solution!
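To make the idea concrete, here is a minimal sketch of the pattern (hasWork, processNextBatch and saveProgress are purely illustrative placeholders for your own logic):
var MAX_RUNTIME = 27 * 60000; // stop well before the 30-minute limit
var startTime = new Date().getTime();
while (hasWork() && (new Date().getTime() - startTime) < MAX_RUNTIME) {
    processNextBatch(); // one small, self-contained chunk of the job
}
saveProgress(); // persist where we stopped so the next run can continue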
Example
Consider this hypothetical use case: your company wants you to import and check a daily list of email addresses delivered through SFTP. You want to check whether the email addresses are valid and whether they are already in your database, so you don’t import them again. We could also create the Subscribers, but that’s a task for another day. Plus, you are expecting a huge list of email addresses, but have no idea how many. And since this is an article on SSJS, let’s try this with, surprise-surprise, SSJS.
Automation
We will need two automations to handle this. The first automation will run the daily import of emails from SFTP and store them in a Data Extension (EmailsToValidate). Let’s not focus on this part, as it is as simple as an Import activity and an SQL query.
Our Data Extension will have the following structure:
Name & CustomerKey: EmailsToValidate
- EmailAddress: Text(254)
- isChecked: Boolean
- isValid: Boolean
- validationStatus: Text(50)
Our main automation will run hourly and will process the data in this Data Extension. It will check each email address’s validity. Checked emails will always be marked with isChecked = true. Valid email addresses will get isValid = true and validationStatus = 'TO_BE_ADDED'. Emails that already exist as Subscribers will get HAS_SUBSCRIBER. Invalid emails will receive INVALID_EMAIL.
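For example, marking an invalid address boils down to a single UpdateData call like this (a standalone sketch; the full helper appears in the final script below):
Platform.Function.UpdateData(
    "EmailsToValidate",
    ["EmailAddress"], ["not-an-email"],
    ["isChecked", "isValid", "validationStatus"],
    [true, false, "INVALID_EMAIL"]
);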
Splitting the job
The first thing we need to do is split the work into smaller steps. Let’s determine what we need to do for each email address:
- Load the data from the Data Extension. We do not want to load already-processed data, so let’s load only the records that are not checked yet.
- Check if the email address is valid. Keep it simple for this demo by using the IsEmailAddress() function.
- Check if the email address already exists in Subscribers. In this case, let’s use the Subscriber.Retrieve() function.
- Save the results back to the Data Extension.
Great! So we have 4 basic steps. Steps 1 & 4 can be run for multiple records at once. Steps 2 & 3 are single-record operations, so we call these in a loop.
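A rough shape of that split could look like this (loadUncheckedEmails, isEmailValid, subscriberExists and saveResults are placeholders for the functions we will write next):
var batch = loadUncheckedEmails();                    // step 1 - batched load
for (var i = 0; i < batch.length; i++) {
    batch[i].isValid = isEmailValid(batch[i]);        // step 2 - per record
    batch[i].hasSubscriber = subscriberExists(batch[i]); // step 3 - per record
}
saveResults(batch);                                   // step 4 - batched save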
Timing your runs
Let’s create a simple timer object. This object will keep track of the time and let us know when we are about to run out. We of course need to be able to set the time limit. Let’s set it to 27 minutes; this way, we still have time to finalize things like logging and cleanups.
var MAX_RUNTIME = 27 * 60000;
function Timer() {
this.startTime = new Date().getTime();
this.isOverRuntime = function() {
var currentTime = new Date().getTime();
var elapsedTime = currentTime - this.startTime;
return elapsedTime > MAX_RUNTIME;
}
};
Now we can use this to periodically check whether we are about to run out of time:
try {
var timer = new Timer();
// WORK PREPARATION:
// MAIN PROCESSING:
while (!timer.isOverRuntime()) {
// DATA PROCESSING:
}
// FINALIZATION (CLEANUP) LOGIC:
} catch (err) {
debug(err);
}
Getting more data
I have written a simple Data Getter utility (you can find it in the final script). This utility remembers its progress while getting the data and can fetch more when needed. You can also set the size of the batch.
As part of the loop, we will keep removing the processed records from the data variable. When there is no more data in the batch, we will get more. If we are out of data entirely, we will stop the script. Also, let’s not forget to add the data checks to the while loop condition:
var getter = new DataGetter();
var data = getter.getData();
// MAIN PROCESSING:
while (!timer.isOverRuntime() && (getter.moreData || data.length > 0)) {
if (data.length < 1) {
data = getter.getData();
}
var record = data.shift(); // cut the first record from the array
}
The Script
Finally, we can add the rest of the script. You can find the function handleRecord() that validates the email and saves the result. There’s a way to improve it. Have you found it?
By the way, our script activity can easily run twice in the allocated hour since we do not use the full 30 minutes (but only 27-28 minutes).
<script runat=server language="JavaScript">
Platform.Load("core", "1.1.1");
var IS_PROD = false; // switch for your dev/prod
var DE_NAME = "EmailsToValidate";
var BATCH_SIZE = IS_PROD ? 300 : 2;
var MAX_RUNTIME = IS_PROD ? 27 * 60000 : 1000;
/* UTILS: */
function debug(msg) {
// Debug to cloud page is not always the best:
if (typeof(msg) === 'object') {
Write('\n' + Platform.Function.Stringify(msg));
} else {
Write('\n' + msg);
}
}
function Timer() {
this.startTime = new Date().getTime();
this.isOverRuntime = function() {
var currentTime = new Date().getTime();
var elapsedTime = currentTime - this.startTime;
return elapsedTime > MAX_RUNTIME;
}
this.printTime = function() {
var currentTime = new Date().getTime();
var elapsedTime = currentTime - this.startTime;
debug('Time: ' + elapsedTime/1000 + ' s (of ' + MAX_RUNTIME/1000 + ' s).');
}
};
function DataGetter() {
// could be even more generic:
var api = new Script.Util.WSProxy();
this.config = {
name: DE_NAME,
cols: [ "EmailAddress", "isChecked" ],
filter: {
Property: "isChecked",
SimpleOperator: "equals",
Value: false
},
opts: { BatchSize: BATCH_SIZE },
props: {}
};
this.moreData = true;
this.reqID = null;
this.getData = function() {
var result = [];
if (this.moreData) {
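// Continue the previous retrieve (if there was one) instead of starting from the first batch again: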
if (this.reqID) this.config.props.ContinueRequest = this.reqID;
var req = api.retrieve("DataExtensionObject[" + this.config.name + "]", this.config.cols, this.config.filter, this.config.opts, this.config.props);
if (req) {
this.moreData = req.HasMoreRows;
this.reqID = req.RequestID;
var results = req.Results;
for (var k in results) {
var props = results[k].Properties;
var o = {};
for (var i in props) {
var key = props[i].Name;
var val = props[i].Value;
if (key.indexOf("_") != 0) o[key] = val;
}
result.push(o);
}
}
}
return result;
}
}
function updateRecord(email, isValid, status) {
var result = Platform.Function.UpdateData(
DE_NAME,
// filter
[ "EmailAddress"],
[ email ],
[ "isChecked", "isValid", "validationStatus" ],
[ true, isValid, status ]
);
}
function hasSubscriber(emailAddress) {
var results = Subscriber.Retrieve({ Property: "EmailAddress", SimpleOperator: "equals", Value: emailAddress });
return typeof(results) === 'object' && results.length > 0;
}
function handleRecord(record) {
var ok = false;
var status = 'unknown';
if (record && record.EmailAddress && IsEmailAddress(record.EmailAddress)) {
if (!hasSubscriber(record.EmailAddress)) {
ok = true;
status = 'TO_BE_ADDED';
} else {
ok = true;
status = 'HAS_SUBSCRIBER';
}
} else {
status = 'INVALID_EMAIL';
}
// one-by-one updates are not the best - consider batching them:
updateRecord(record.EmailAddress, ok, status);
return ok;
}
/* MAIN: */
try {
var timer = new Timer();
// WORK PREPARATION:
var recordsProcessed = 0;
var errors = 0;
var getter = new DataGetter();
var data = getter.getData();
// MAIN PROCESSING:
while (!timer.isOverRuntime() && (getter.moreData || data.length > 0)) {
if (data.length < 1) {
data = getter.getData();
}
var record = data.shift();
// do something with the record:
var ok = handleRecord(record);
recordsProcessed++;
if (!ok) { errors++ }
}
// FINALIZATION (CLEANUP) LOGIC:
debug('Records processed: ' + recordsProcessed + '. Invalid emails: ' + errors + '.');
timer.printTime();
} catch (err) {
debug(err);
}
</script>
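By the way, one of the improvements hinted at in the comments is replacing the one-by-one UpdateData calls with batched updates. A possible sketch using WSProxy’s updateBatch could look like this (it assumes EmailAddress is the Data Extension’s primary key; collect the changes in the loop first):
var api = new Script.Util.WSProxy();
var updates = [];
// inside the processing loop, collect the changes instead of calling updateRecord():
updates.push({
    CustomerKey: DE_NAME, // the DE must use EmailAddress as its primary key
    Properties: [
        { Name: "EmailAddress", Value: record.EmailAddress },
        { Name: "isChecked", Value: "true" },
        { Name: "isValid", Value: ok ? "true" : "false" },
        { Name: "validationStatus", Value: status }
    ]
});
// after the loop (or every few hundred records), write everything in one call:
var result = api.updateBatch("DataExtensionObject", updates);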
Final thoughts
This simple example shows how you can keep your script activities within the 30-minute limit while getting as much done as possible. The key is to keep track of the progress and of the time - we did that with the DataGetter() utility and the Timer utility. This way, you can run multiple jobs in the same script activity and still stay within the limit.
A couple of other takeaways I’ve learned over the years:
- Batches over one-to-one operations.
- Sometimes smaller batches are better than bigger ones.
- Avoid wait/sleep functions (like when trying to fetch results of API calls) - instead of waiting, save the state in a DE and check the results in another script activity/execution (there is a small sketch of this after the list).
- Prepare for failures - they happen and you need to be able to recover from them. Logs and statuses saved to DEs are your friends.
- Sometimes it’s better to re-run the job than to lose the data - even if you run it multiple times. But only sometimes.
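For the wait/sleep point above, the state-saving alternative can be as simple as writing the pending request to a Data Extension and letting a later execution finish the work (the DE, column names and helper here are made up for the example):
var requestId = callSomeExternalApi(); // illustrative - returns an ID we can check later
// Instead of sleeping until the async call finishes, store its state:
Platform.Function.UpsertData(
    "PendingApiCalls",
    ["requestId"], [requestId],
    ["status"], ["PENDING"]
);
// A later script activity retrieves rows with status = 'PENDING'
// and checks whether the results are ready.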
If you have read through this article and still have questions, feel free to reach out to me. Also, I’ve left a couple of small ideas you can improve this script with. Can you find them?