The mysterious case of the Zombie Automation Plan

Yesterday we worked with our customer to deploy some integrations that trigger Sitecore Marketing Automation plans. The plans do their thing and then push messages onto an Azure Service Bus topic and the messages end up on a Dynamics Contact Activity feed. To test the integrations, I created an Automation Plan that would be triggered from an event and then pushed some test content to the service bus. All appeared to be working, so I deleted the plan and uploaded the “real” plans ready for action.

Next morning, lots of messages had been processed, but the Activity Feeds on the Dynamics contacts had double the expected number of activity records: one that was correct, and another whose content looked a lot like my test automation plan output. No worries, I thought, it must still be there. Go and check in /System/Marketing Control Panel/Automation Plans. Nope, not there.

OK, so maybe it’s orphaned in the Master database. Search for the Item by ID (which conveniently enough was included in the service bus message). Nope, not there either.

Hmm, so it must be cached by the maengine.exe. Go to Azure Portal, restart ma-ops app service. Turns out this doesn’t recycle the WebJob, so I edited the maengine.exe.config and that recycled the WebJob.

Trigger a plan as a test. Nope, still getting duplicate messages on the bus.

Alright, what about Deploy Marketing Definitions? Cool, go do that, trigger the plan again. No luck – still getting duplicate messages.

What was the cause?

Well, Automation Plans are not actually retrieved from the Master database, since the MA engine doesn’t have access to the Sitecore databases directly. It uses the REFDATA service which in turn pulls from the refdata database (which is where the data is sent when you click the magical Deploy Marketing Definitions button), so that was the next place to check.

The refdata database contains a table called xdb_refdata.Definitions, which holds the deployed definitions.

A query of the xdb_refdata.DefinitionTypes table revealed that Automation Plans have a TypeID of 2DE34E0A-5FB6-4AE0-ACFC-69ACF97076B4, so the query to find all Automation Plans in the refdata Definitions table was:

SELECT *
FROM [xdb_refdata].[Definitions]
WHERE TypeID = '2DE34E0A-5FB6-4AE0-ACFC-69ACF97076B4';
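If the type ID is not known up front, it can be read out of the refdata database first. The sketch below lists the definition types and then counts definitions per TypeID. It assumes nothing about the DefinitionTypes columns beyond whatever SELECT * returns, so spotting the Automation Plan row is left to the eye.

```sql
-- List every definition type; look for the row that describes Automation Plans.
SELECT *
FROM [xdb_refdata].[DefinitionTypes];

-- Count deployed definitions per type, split by the active flag,
-- using only columns that appear elsewhere in this post.
SELECT TypeID, IsActive, COUNT(*) AS DefinitionCount
FROM [xdb_refdata].[Definitions]
GROUP BY TypeID, IsActive
ORDER BY TypeID, IsActive;
```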

I can see the plans, but which is the zombie plan? The Data column is varbinary(max) and the ID column has no relation to the ItemID in the Master database, so to find out which plan doesn’t belong we need to get the plan data from the Data column:

SELECT id, CAST(Data AS varchar(max)) AS planData
FROM [xdb_refdata].[Definitions]
WHERE TypeID = '2DE34E0A-5FB6-4AE0-ACFC-69ACF97076B4'
ORDER BY LastModified;
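When there are many plans, scanning planData by eye gets tedious. Assuming some recognisable text from the unwanted plan (its name, say) appears in the serialised data, a filter like this could narrow the list. The search string is a placeholder, not a value from the real system.

```sql
-- Hypothetical search text: replace with something unique to the plan you are hunting.
-- Note that %, _ and [ in the text are treated as LIKE wildcards.
DECLARE @needle varchar(200) = 'My Test Plan';

SELECT id, IsActive, LastModified,
       CAST(Data AS varchar(max)) AS planData
FROM [xdb_refdata].[Definitions]
WHERE TypeID = '2DE34E0A-5FB6-4AE0-ACFC-69ACF97076B4'
  AND CAST(Data AS varchar(max)) LIKE '%' + @needle + '%'
ORDER BY LastModified;
```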

Then it was just a case of looking at the planData output and finding the zombie plan. Great, but what now? Kill the zombies with fire? The Definition items were referenced in two other tables, xdb_refdata.DefinitionCultures and xdb_refdata.DefinitionMonikers, and I wasn’t keen to start deleting stuff in Production, so instead I marked the offending Definition items as IsActive=0:

UPDATE [xdb_refdata].[Definitions]
SET IsActive=0
WHERE id in ('B143F382-7014-4E55-A764-0180D82AED74','33BFBC5E-C3B9-49C0-9E37-710BA146AA79');
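Since this is a hand edit against Production, one cautious way to run it is inside a transaction, checking the affected rows before committing. This is a sketch of that habit rather than a record of what was run at the time.

```sql
BEGIN TRANSACTION;

UPDATE [xdb_refdata].[Definitions]
SET IsActive = 0
WHERE id IN ('B143F382-7014-4E55-A764-0180D82AED74',
             '33BFBC5E-C3B9-49C0-9E37-710BA146AA79');

-- Check that only the intended rows changed before making it permanent.
SELECT id, IsActive, LastModified
FROM [xdb_refdata].[Definitions]
WHERE id IN ('B143F382-7014-4E55-A764-0180D82AED74',
             '33BFBC5E-C3B9-49C0-9E37-710BA146AA79');

-- Run ONE of these by hand once the SELECT output has been checked.
-- Don't leave the transaction open: it holds locks on the table.
-- COMMIT TRANSACTION;
-- ROLLBACK TRANSACTION;
```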

Run the tests again: no more duplicate messages coming through. Just to be sure, I deployed the marketing definitions again, but the data remained set to IsActive=0, so all was well.

The moral of the story is this: Don’t delete Marketing Automation Plans without deactivating them.

Postscript:

The above approach still had issues: some contacts enrolling in plans were blocked, and the following error occurred (although not all the time):

[12/02/2019 00:03:54 > fdf166: INFO] 2019-12-02 00:03:54 ERR An exception occurred during processing for worker '35f908d6-638e-433e-aef6-9184a80378f4'
[12/02/2019 00:03:54 > fdf166: INFO] Sitecore.Framework.Conditions.PostconditionException: Postcondition 'planDefinition should not be null' failed.
[12/02/2019 00:03:54 > fdf166: INFO] at Sitecore.Framework.Conditions.EnsuresValidator`1.ThrowExceptionCore(String condition, String additionalMessage, ConstraintViolationType type)
[12/02/2019 00:03:54 > fdf166: INFO] at Sitecore.Framework.Conditions.Throw.ValueShouldNotBeNull[T](ConditionValidator`1 validator, String conditionDescription)
[12/02/2019 00:03:54 > fdf166: INFO] at Sitecore.Framework.Conditions.ValidatorExtensions.IsNotNullT

The key was in the “planDefinition should not be null” message. The zombie plan was still taking new enrolments, but with its definition marked inactive the engine presumably could no longer load it.

To fix this, I needed to re-activate the plans in refdata and set the EndDate of the plan, but the Data column is varbinary(max) so a bit of SQL was needed:

SELECT id, cast(data as varchar(max)) as planData
FROM [xdb_refdata].[Definitions]
WHERE id ='33BFBC5E-C3B9-49C0-9E37-710BA146AA79'

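-- Round-trip the edited plan through varbinary and print it,
-- to confirm the cast gives back the same text before updating.
-- Note that PRINT shows at most 8,000 characters of a varchar value.
DECLARE @planData varchar(max);
DECLARE @binPlanData varbinary(max);
DECLARE @test varchar(max);
SET @planData = '{-- insert the modified planData from SELECT query above with EndDate set to a date in the past --}';
SET @binPlanData = CAST(@planData AS varbinary(max));
SET @test = CAST(@binPlanData AS varchar(max));

PRINT @test;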

UPDATE [xdb_refdata].[Definitions]
SET Data = @binPlanData, IsActive=1
WHERE id = '33BFBC5E-C3B9-49C0-9E37-710BA146AA79';
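Hand-editing the serialised plan is error-prone. If the plan data is JSON, as it appeared to be here, and the database is SQL Server 2016 or later (or Azure SQL), the built-in JSON functions could make the change instead. The sketch below only previews the result and writes nothing back. The '$.EndDate' path and the date format are assumptions about the plan structure, so compare the output with the original planData before using it in an UPDATE.

```sql
DECLARE @id uniqueidentifier = '33BFBC5E-C3B9-49C0-9E37-710BA146AA79';
DECLARE @planData varchar(max);
DECLARE @modified varchar(max);

-- Read the current plan as text, using the same cast as the SELECT above.
-- If several rows share this id, narrow the WHERE clause to the one you want.
SELECT @planData = CAST(Data AS varchar(max))
FROM [xdb_refdata].[Definitions]
WHERE id = @id;

IF ISJSON(@planData) = 1
BEGIN
    -- Assumption: EndDate is a top-level property holding an ISO 8601 string.
    SET @modified = CAST(
        JSON_MODIFY(@planData, '$.EndDate', '2019-01-01T00:00:00Z')
        AS varchar(max));

    -- Preview only. Keep the text as varchar before casting to varbinary:
    -- an nvarchar value would be stored as UTF-16 bytes instead.
    SELECT @planData AS originalPlanData,
           @modified AS modifiedPlanData,
           CAST(@modified AS varbinary(max)) AS newData;
END
ELSE
    PRINT 'planData is missing or is not valid JSON; edit it by hand instead.';
```

If the preview looks right, the newData value is what would go into the Data column in place of @binPlanData.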

Next time I am going to use a flamethrower.

Automate Everything!

Recently I had the privilege and good fortune to present at the 2019 Sitecore Symposium in Orlando, Florida. Below are links to the slides and notes from my presentation on customising Sitecore Marketing Automation plans.

Slides

Code

Github repository containing sample code, utilities and content package: https://github.com/parrya/ma-demo

Troubleshooting and debugging tips

See this blog post for some tips on creating custom activity types in Sitecore Marketing Automation plans.

Resources

Sitecore docs:
https://doc.sitecore.com/developers/92/sitecore-experience-platform/en/activity-types.html
Custom Activity Types in 9.0.x:
https://www.brimit.com/blog/sitecore-9-custom-marketing-automation-action
Automation deep dive:
https://www.linkedin.com/pulse/how-sitecore-9-marketing-automation-works-deep-dive-emmerzaal/
Marketing Automation Architecture:
https://www.youtube.com/watch?v=0HxUph2YLZc&list=PL1jJVFm_lGnyicywCcwcWa8RtsoiJEbC9&index=5

Troubleshooting custom activity types and Marketing Automation

Below are some tips for debugging and troubleshooting customisations in your Sitecore Marketing Automation plans. See also the slides from my recent Sitecore Symposium presentation “Automate Everything!” and the supporting code on GitHub.

Troubleshooting

  • Activity Properties must be PUBLIC to be populated
    • In your .NET class, ensure that the properties representing your Activity Type Parameters are public; otherwise the values cannot be set by the MA UI or read by the Automation Engine. (e.g. Message property in this file)
  • Ensure your activity ID is the same across:
    • Your config patch
    • Sitecore Activity Descriptor item
    • Angular package (lower case)
  • Watch out for DLL hell
    • Be careful which package versions you use in your .NET activity class or injected services. The dependencies that you use must be compatible with the Sitecore CM server and the Automation Engine.
  • If using custom facets, you have to include a config so that the Engine knows about them (contact loader XML)
  • Patch files MUST have the filename format sc.<name>.xml.
    • For example: sc.MarketingAutomation.ContactLoader.xml
  • MA engine won’t recycle on XML changes.
    • Unlike a typical Sitecore config, the Automation Engine will not recycle when you edit or add an XML config file. You need to restart the service or WebJob manually for the changes to be picked up.
  • Activity not showing in the MA UI?
    • Make sure you have an icon in your Activity Type Descriptor
    • Check the XHR request in your browser network tab – the error will be in the HTTP response.
    • Ensure the DLL and plugin JS have been deployed to the Content Management server and that your Activity Type Descriptor is using the correct DLL reference.
  • Activity Type Parameters – set the editor and ObjectType (e.g. System.String).
    • This avoids getting quotes around your parameter values inbound and outbound

Some other tips

Sitecore kernel

You don’t have access to the Sitecore kernel or context database in the Automation Engine. The engine runs independently of Sitecore XM, so you cannot retrieve Sitecore items and you should not try to do so. Besides, you don’t have the connection strings in the engine config. You can use the built-in APIs to access some kinds of content.

Logging

Logging in the MA engine is completely different to “traditional” Sitecore logging, which uses log4net. The Automation Engine uses the ILogger interface from Microsoft.Extensions.Logging.

Log files are separate to the Sitecore logs.

Debugging your code

Attach to the MA engine for debugging in your local instance. The service will probably be named maengine. Make sure that you pick the right one, since you might have more than one MA engine service running. Mouse over the “maengine” process name in the list and it should show the file path to the maengine.exe file.

In PaaS, consult the WebJobs dashboard (as per the previous section). Errors will be output to the console.

Miscellaneous

Sitecore automation plans are just Sitecore items. You can serialise MA plans and deploy them via Unicorn, TDS, and Sitecore packages.

Plans are stored in buckets under:
/sitecore/system/Marketing Control Panel/Automation Plans

Activity Type Descriptors are stored in:
/sitecore/system/Settings/Analytics/Marketing Automation/Activity Types

Predicate definitions are stored in a folder of your choosing under:
/sitecore/system/Settings/Rules/Definitions/Elements
e.g. in the “XConnect – Marketing Automation” folder