Friday 28 April 2017

SDL Web Deployer Extension for sending Purge requests to Akamai

Akamai provides a simple REST API for automating content purge requests. For detailed information regarding the API, please refer to the CCU REST API Developers Guide.

Akamai's REST API supports a number of calls. In this article I will only cover the calls to:
  • Make a Purge Request
  • Check a Purge Status

I've implemented an AkamaiRestFlusher in the form of a Deployer Extension. When a page in SDL Web (Tridion) is sent for publishing, this extension triggers the Akamai purge automatically.

A properties file named publications-domain.properties provides the list of public website base domains known to Akamai, keyed by publication id. For example:

91=http://www.firstexample.com
93=http://www.secondexample.com
96=http://www.thirdexample.com

The deployer extension uses the data available in the deployment package, such as the publication id and the page URL path, to construct the page URL known to Akamai.

In addition to constructing the page URL for purging, the extension also constructs a list of binary URLs referenced by the page. The purge request will include the page URL and all binary URLs in one call.
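
As an illustration, here is a minimal Java sketch of this URL construction. The class and method names are hypothetical, and the page/binary paths are assumed to be handed in by the extension (the real deployment-package accessors are not shown):

import java.io.FileInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

public class PurgeUrlBuilder {

    private final Properties publicationDomains = new Properties();

    public PurgeUrlBuilder(String propertiesPath) throws IOException {
        // Load the publication-id-to-domain mapping,
        // e.g. 91=http://www.firstexample.com
        try (FileInputStream in = new FileInputStream(propertiesPath)) {
            publicationDomains.load(in);
        }
    }

    // Builds the full list of URLs to purge: the page URL plus all binary
    // URLs referenced by the page. In the real extension, pageUrlPath and
    // binaryUrlPaths would come from the deployment package.
    public List<String> buildPurgeUrls(int publicationId, String pageUrlPath,
                                       List<String> binaryUrlPaths) {
        String baseDomain = publicationDomains.getProperty(String.valueOf(publicationId));
        if (baseDomain == null) {
            throw new IllegalStateException("No domain configured for publication " + publicationId);
        }
        List<String> urls = new ArrayList<>();
        urls.add(baseDomain + pageUrlPath);
        for (String binaryPath : binaryUrlPaths) {
            urls.add(baseDomain + binaryPath);
        }
        return urls;
    }
}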

The AwaitPurgeCompletion property of the deployer extension lets the administrator control whether the publishing task should wait until the entire purge completes before continuing. If set to true, the extension uses the pingAfterSeconds attribute, available in the response to the purge call, to pause before submitting a call to check the purge status. The extension keeps submitting status-check calls until purge completion is confirmed or the maximum waiting time of 30 minutes is reached.

Akamai purges usually take time to complete, so the general advice is to set the AwaitPurgeCompletion property to false.

All communication is performed using JSON.
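
Below is a minimal Java sketch of the await-completion logic described above. The JSON field names and endpoint shapes follow the CCU v2 style (purge responses carrying pingAfterSeconds and a progressUri), but verify them against the Developers Guide; the HTTP and JSON plumbing is reduced to placeholders:

import java.util.concurrent.TimeUnit;

public class PurgeStatusPoller {

    private static final long MAX_WAIT_MILLIS = TimeUnit.MINUTES.toMillis(30);

    // Waits for an Akamai purge to complete, honouring the pingAfterSeconds
    // hint from each response and giving up after 30 minutes.
    //
    // A CCU v2-style purge response looks roughly like:
    //   { "purgeId": "...", "progressUri": "/ccu/v2/purges/...",
    //     "estimatedSeconds": 420, "pingAfterSeconds": 420, "httpStatus": 201 }
    // and the status response like:
    //   { "purgeStatus": "In-Progress", "pingAfterSeconds": 60, ... }
    // (field names per the CCU v2 documentation; verify for your account).
    public boolean awaitPurgeCompletion(String progressUri, int initialPingAfterSeconds)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + MAX_WAIT_MILLIS;
        int pingAfterSeconds = initialPingAfterSeconds;
        while (System.currentTimeMillis() < deadline) {
            TimeUnit.SECONDS.sleep(pingAfterSeconds);
            PurgeStatus status = checkPurgeStatus(progressUri); // placeholder call
            if (status.isDone()) {
                return true;
            }
            pingAfterSeconds = status.getPingAfterSeconds();
        }
        return false; // maximum waiting time of 30 minutes reached
    }

    // Placeholders standing in for the real HTTP/JSON plumbing.
    interface PurgeStatus {
        boolean isDone();
        int getPingAfterSeconds();
    }

    private PurgeStatus checkPurgeStatus(String progressUri) {
        throw new UnsupportedOperationException("HTTP call omitted in this sketch");
    }
}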

Monday 10 April 2017

SDL Web Experience Optimization / Fredhopper Implementation Pitfalls

In this post I would like to address some issues experienced during a recent implementation of SDL Web Experience Optimization. Experience Optimization integrates SDL Web with Fredhopper and supports creating and managing targeted content for websites driven by SDL Web. For more information, please refer to the online documentation.

Let's set the scenario:

  • 200-300 market websites delivered from SDL Web
  • 400 promotions already created (at the time of the experienced issues)
  • 6 live attributes set in Fredhopper
  • A plan for 1000 promotions in the coming 12 months

Fredhopper response size


Fredhopper stores all its configuration in XML files. In our case, there are hundreds of promotions using the BluePrinting capability with “Include child publications” selected. This results in one XML node per promotion containing a vast list of publication ids. All of these get serialized into a single XML document, leading to a file of 25 MB.

With compression enabled, the 25 MB response compresses very well (to ca. 800 KB); however, parsing an XML document of this size is an expensive operation. This has an impact on the web service, which parses the XML from Fredhopper before returning it to the SDL Web Experience Optimization (SmartTarget) UI.

A hotfix (ST_2014.1.0.2236.zip) provided by SDL R&D brought a considerable improvement in the listing of promotions in the UI: the completion time of the GetPromotions call came down from 48 seconds to 15 seconds.

Automatic promotion publishing


Automatic publishing is enabled by default. Every time a promotion is created or updated in SmartTarget, the updated business.xml file (together with a few other files) is made available inside a directory so the syncserver/syncclient processes can push them to the query instances. In addition, the same files are copied to another directory so they appear in the history.

In other words, a new promotion is not only saved in Fredhopper but also automatically becomes available to the query instances, and therefore to the front-end client applications (i.e. the websites).

Turning off automatic publishing avoids these intensive publishing operations at save time, which also saves server resources. At save time, the business.xml file is still immediately saved on the indexer instance, but there is no longer any file copying for the syncserver/syncclient processes.

Disabling automatic publishing in Fredhopper requires adding an <ApproveEveryChange>false</ApproveEveryChange> node under the <IndexServer> node in the smarttarget_conf.xml file.
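
The resulting fragment in smarttarget_conf.xml would look like this (surrounding configuration omitted):

<IndexServer>
  ...
  <ApproveEveryChange>false</ApproveEveryChange>
</IndexServer>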

Nothing changes in the use of the SmartTarget UI. Newly created and updated promotions are shown instantly. The only difference is that these promotions do not become automatically available to the front-end apps. They would appear as “pending” inside the Fredhopper admin UI.

The change described above requires an additional promotion publishing process. There are two options regarding this:
  1. An administrator publishes the pending promotions from the Fredhopper Business Manager.
  2. Publishing via a scheduled job which uses the com.tridion.smarttarget.webservice.ApproveAllChangesWorkerThread class (available in smarttarget_core.jar) to perform the work (see the sketch below).
With automatic publishing disabled, the time taken to save a promotion in SmartTarget came down from 1.2 minutes to 10 seconds.
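
As a minimal sketch of option 2, a plain ScheduledExecutorService could drive the job. Note that the no-argument constructor and Thread-style start() shown here are assumptions; check the actual signature of ApproveAllChangesWorkerThread in smarttarget_core.jar:

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// From smarttarget_core.jar; constructor/launch semantics must be verified.
import com.tridion.smarttarget.webservice.ApproveAllChangesWorkerThread;

public class PromotionPublishJob {

    public static void main(String[] args) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        // Publish (approve) all pending promotions every 15 minutes; the
        // interval is arbitrary and should match your editorial workflow.
        scheduler.scheduleAtFixedRate(() -> {
            try {
                // ASSUMPTION: a no-arg constructor and Thread semantics, as
                // the class name suggests. Inspect the jar for the real API.
                new ApproveAllChangesWorkerThread().start();
            } catch (Exception e) {
                e.printStackTrace(); // replace with proper logging
            }
        }, 0, 15, TimeUnit.MINUTES);
    }
}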


Reload interval on query instances


The previous section relates to the synchronization of published configuration (XML files) from an indexer instance to the query instances, which is done by the syncserver/syncclient processes.

It is also possible to configure the interval at which a query instance reloads these configurations. The reload is done by the qserver process and involves reading the synchronized files and updating the rule engine and in-memory data structures. This qserver process also serves the end-user requests, so it is important to set the reload interval to a suitable value to avoid overload.

The reload interval can be configured via a property inside the query instance's config/system.xml file. The default value is only one minute.

Add the following into the system.xml (under the "root" node):

<node name="com"><map/>
  <node name="fredhopper"><map/>
    <node name="util"><map/>
      <node name="prefs"><map/>
        <node name="BusinessProperties">
          <map>
            <entry key="auto-check-for-updated-config-interval" value="600000" />
          </map>
        </node>
      </node>
    </node>
  </node>
</node>

The configuration above sets the update interval to 10 minutes (600,000 milliseconds).

Notes:
  • After changing the configuration, it is best to restart the qserver process.
  • Make sure to apply the same configuration on all the query instances.

After this change, no matter how frequently the configuration (XML) is published, the qserver process will only look for it every 10 minutes.

The change can be verified in the qserver.log file by checking the time interval in which the "Finished reloading business settings" message appears.

Promotion BluePrinting


This section is not meant to provide a recommendation, but only to share our experience in this particular project. From time to time you may find the need to re-index all content published from SDL Web (Tridion) to Fredhopper. In fact, according to Fredhopper's team, a full re-index is something that should happen fairly frequently.

Now imagine having 200 to 300 websites to re-publish from SDL Web in order to trigger the re-indexing against a wiped Fredhopper indexer. This is just not feasible.

As explained in the SmartTarget online documentation, it is possible to create generic promotions, applicable to all websites, at a higher level in the BluePrint so that the website publications lower down can use them.

This way you avoid indexing content from 200 to 300 lower-level publications by having all content indexed from a single publication. Re-indexing then only requires republishing content from that one higher-level publication in SDL Web.

There are two ways to achieve this:
  1. Localizing the component templates at the lower level publications and removing the "Add to SmartTarget" TBB.
  2. Adding conditional logic to the TBB in order to only process content originating from the higher level publication.
Of course this solution imposes limitations (e.g. if a promotion should only ever apply to a specific market), but it was a good fit in this particular project.

Tuning


After a trial and error period, the following memory settings were implemented to meet the ongoing server load:
  • 64 GB of RAM on Fredhopper servers running an indexer and query instances.
  • 32 GB of RAM on Fredhopper servers running a query instance only.

Friday 7 April 2017

Wiping Fredhopper indices

You can find the single instruction for wiping/deleting indices in the online learning center (https://www.fredhopper.com/learningcenter), but this post aims to detail all the steps involved in a real-world scenario.

Consider a Fredhopper installation with two instances: indexer and query.

1. Ensure that both instances are stopped

2. Execute the following to delete the indices

sudo su - fredhopper
cd /home/fredhopper/fredhopper

./bin/deployment-agent-client --location <HOST> delete --instance indexer data/indices

./bin/deployment-agent-client --location <HOST> delete --instance query data/indices

3. Backup the processed XML files

It is very likely that the /processed/batch/ directory contains hundreds of files, which could result in an "Argument list too long" error when executing the cp or rm commands with a wildcard. An alternative approach using "find" is presented below.

mkdir -p /tmp/fredhopper-xml-backup/batch
cd /tmp/fredhopper-xml-backup/batch

find /home/fredhopper/fredhopper/data/instances/indexer/data/xml/processed/batch/ -maxdepth 1 -name "*.xml" -exec cp -uf "{}" . \;

4. Remove the processed XML files (IMPORTANT)

find /home/fredhopper/fredhopper/data/instances/indexer/data/xml/processed/batch/ -maxdepth 1 -name "*.xml" -exec rm -f "{}" \;

5. Restart the instances (indexer and query)

cd /home/fredhopper/fredhopper
./bin/instance indexer start &

Now publish content from SDL Web (Tridion) to be indexed in Fredhopper. Check http://<HOST>:<INDEXER_PORT>/fredhopper/sysadmin/indexinfo.jsp to confirm the published content has been indexed.

./bin/instance query start &

Check http://<HOST>:<QUERY_PORT>/fredhopper/sysadmin/indexinfo.jsp to confirm the indexed item has reached the query instance.

Building a distributed SDL Web Experience Optimization / Fredhopper solution

This post describes a solution for creating a distributed Fredhopper setup for a real-world Production environment. This solution is composed of the following Fredhopper instances:

PRODUCTION "PROD-ORIGIN" INSTANCES
  • Fredhopper ProdOrigin Indexer (on "prod-origin-01" server)
  • Fredhopper ProdOrigin Query 1 (on "prod-origin-01" server)
  • Fredhopper ProdOrigin Query 2 (on "prod-origin-02" server)

PRODUCTION "PROD-REPLICATION" INSTANCES
  • Fredhopper ProdReplication Indexer (on "prod-replication-01" server)
  • Fredhopper ProdReplication Query 1 (on "prod-replication-01" server)
  • Fredhopper ProdReplication Query 2 (on "prod-replication-02" server)

Consider all these instances running on Linux servers with the ProdOrigin and ProdReplication instances available from separate data centers.

There is no "out of the box" mechanism for replicating the published configurations (XML files) between the two Production environments. A bespoke automated script is registered on both Production Indexer instances as follows:

cd /home/fredhopper/fredhopper
./bin/deployment-agent-client set-option indexer /com/fredhopper/config/VersionControl/@customer-publish-action=/fredhopper/indexer/bin/post-publish.sh

This script copies the published business.xml to the remote instance and reloads it by invoking the http://${TARGET}:${HTTP_PORT}/fredhopper/sysadmin/reload-config.jsp?select=business URL.

In order for the script to work, passwordless SSH authentication must be established between the ProdOrigin and ProdReplication servers.

Parameters such as "target", "user" and "httpport" are set in the post-publish.sh script, which then uses these values to invoke the replicate-smarttarget-promotions.sh script. Both scripts are available from:

post-publish script
replicate-smarttarget-promotions script

Validation


1. In the SDL Web/Tridion CME, navigate to Targeting

2. In the top left, the source drop-down should default to Staging

3. Validate that both the Staging and Production sources load their lists of promotions correctly

4. On source Staging, create a new Promotion called "TESTPROMO" and save it

5. Click the ... button of TESTPROMO and choose Copy To... > Production

6. Change the source drop-down to Production, and validate that TESTPROMO is now available in Production

7. Log in to the ProdReplication Fredhopper server, and validate that TESTPROMO has been copied by the replication script:

ssh root@prod-replication-01
grep "TESTPROMO" /fredhopper/data/instances/indexer/config/business.xml

If the promotion was replicated, the matching XML element is printed to the command line.

Conclusion


The standard Fredhopper processes continue working as usual on the Production servers. For example, the syncserver/syncclient processes continue sending the published configuration (XML files) from the indexer to the query instances. The replication itself is taken care of by the bespoke replication script as soon as the origin indexer instance is updated. As expected (and required), the replication happens between indexer instances, and Fredhopper takes care of the internal processing as usual.