Some notes about UCF

A really interesting article on UCF has been posted on the dm_notes blog:

Role of UCF in Documentum Clients « dm_notes: Documentum Notes

Here is some additional info I’ve discovered over the last couple of years:

  • UCF does not run in the context of the browser and it is not an applet. It is a standalone (headless) Java application that is launched by the browser. The launcher does some pretty tricky stuff (like installing a private copy of Java if you don’t already have one) in order to set up an execution environment for UCF.
  • With the advent of ACS/BOCS, the UCF client application is able to transfer files from/to (as of D6) the ACS servlet that runs in the Java Method Server servlet container on the Content Server. This can eliminate a network hop, but it also requires users to have at least one network path open to the Docbase, which could be considered a security vulnerability.
  • UCF provides services to the User Profile component in Webtop, allowing file types to be mapped to specific view/edit programs on a user’s PC.
  • DFS can utilize UCF for file transfers. This is desirable when you are dealing with Virtual Documents and/or XML content managed by XML Applications.
  • UCF is capable of transferring all nodes in a virtual document in a single operation.
  • UCF is capable of pulling down supporting documents (DTDs, Schemas, Entities, Modules, XML Editor customizations, etc.) for XML content that is managed by an XML Application. UCF will also perform DOCTYPE fixup so that the XML content points to the local filesystem location of the DTD/Schema.
  • Similar to DFC, the UCF runtime can interact with the client registry. This information can then be used by UCF to streamline certain transfer operations.
  • At least as of 5.3, UCF would attempt to use HTTP 1.1 chunked encoding to send data. This reduces the amount of HTTP header information that has to be sent in band with large transfers. Unfortunately, there is an issue where the HTTP header is split in such a way that certain firewalls (CheckPoint with SmartDefense enabled) will block UCF traffic. It is possible to turn off chunked encoding via the UCF config files.
  • UCF has issues traversing an HTTP proxy that requires authentication. This can be tweaked to some degree via UCF config, but there are some situations where UCF will simply fail to get through a proxy that requires auth.
  • Prior to D6, the UCF runtime would terminate when the launching browser closed. As of D6 this is no longer true. Instead, the UCF client times out after a configurable period of inactivity (15 minutes by default). I assume that Documentum intends to re-bind to an existing UCF client when a browser is closed and re-opened, in order to reduce the spin-up time associated with loading a JVM. However, this re-binding does not appear to work as of D6 SP1. This is especially noticeable when you use DFS as a stateless transport (i.e. you reconnect for each interaction); in this case, you can end up with many UCF runtime instances all running concurrently, consuming CPU and memory. It also seems that if the UCF runtime times out while you are in Webtop and you then attempt a UCF operation, there can be some failure in re-binding that results in your browser freezing up and/or a failed transfer.

Some of the above info is rather negative. I do think that UCF is a great concept for providing clients with a lightweight, enhanced transfer mechanism, but there are quite a lot of issues with it as well. I have been told that a get-well program is underway for UCF, so I am expecting big improvements sometime soon.

Documentum Session Pooling and DFS

I’ve spent the last few days debugging a strange issue with a service we built using the Documentum Foundation Services (DFS) framework. After a period of usage we were receiving sporadic authorization failure messages when folks attempted to use our custom service, and even, to a lesser degree, when folks tried to use DFS core services.

In this case our UI is an XML editor built on top of XMetaL. We have a COM component, written in C#, that consumes our custom service and also makes calls to DFS core services. The test team discovered that they could set up a search and click the search button repeatedly; sometimes the search would work and sometimes they would receive authorization errors.

It seems obvious now, but I proved beyond a shadow of a doubt that these requests were all valid coming from our COM component into the DFS framework via SOAP (i.e. the credentials were valid and were not getting munged by the network).

We have also been seeing a few other issues that appear now to be related: we have been running out of file handles in our WebLogic instance (that hosts DFS and our custom service). We have also been running out of database connections in the Oracle instance that hosts our docbases.

One of WebLogic’s startup scripts sets the file-handle ulimit to 1024 on Unix boxes. As an aside, this seems a tad low for Webtop/DAM/WebPublisher/etc. implementations, especially if you have several webapps deployed under a single managed container. We upped this to 2048 and it seemed to help with the file-handle issue.

Also we have two docbases sharing the same Oracle database, so we upped our process limit in Oracle (more on this in a bit.) Again, this helped some, but still didn’t fully resolve the issue.

While the error message being surfaced to the end user was invariably about an authorization failure, there were several different causes reported in the “caused by” sections of the stack traces in the server logs. When we could identify a root cause from the stack trace, it often had to do with running out of docbase sessions or with database connectivity problems.

Keep in mind that DFS is supposed to be essentially stateless, so you have to re-authenticate prior to any transaction you wish to perform. As such, any connectivity issue is likely to be reported as an authorization failure. For example, if you run out of docbase sessions, you get an auth failure. If you run out of database processes, you get an auth failure. If the network has a hiccup, you get an auth failure.

It turns out there was a bug in our code. We were (tsk-tsk) storing an IDfSession in a member variable for the duration of the service handler’s lifetime (several minutes, for an interaction with the docbase that typically lasts < 1 second). This was of course easy to resolve by requesting the session on demand from the session manager that DFS hands out, and then releasing said session when we are done with it.
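The fix looks like the following. This is a self-contained sketch, not DFC code: the Session/SessionManager types below are tiny stand-ins I wrote for IDfSession/IDfSessionManager (the real DFC calls are sessionManager.getSession(docbaseName) and sessionManager.release(session), and the real interfaces are much richer). The point is the shape: acquire the session inside the method, release it in a finally block, and never park it in a member variable.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for IDfSession.
interface Session {
    String getDocbaseName();
}

// Hypothetical stand-in for IDfSessionManager; it just counts checkouts
// so we can see that every getSession() is matched by a release().
class SessionManager {
    private final AtomicInteger open = new AtomicInteger();

    Session getSession(String docbase) {
        open.incrementAndGet();
        return () -> docbase;
    }

    void release(Session s) {
        open.decrementAndGet();
    }

    int openSessions() {
        return open.get();
    }
}

class OnDemandSession {
    // The pattern: acquire per operation, release in finally.
    static String describeDocbase(SessionManager mgr, String docbase) {
        Session session = mgr.getSession(docbase);
        try {
            return "connected to " + session.getDocbaseName();
        } finally {
            mgr.release(session); // always hand the session back to the pool
        }
    }

    public static void main(String[] args) {
        SessionManager mgr = new SessionManager();
        System.out.println(describeDocbase(mgr, "mydocbase"));
        System.out.println("open sessions after call: " + mgr.openSessions());
    }
}
```

With this shape, even if the handler object lives for minutes, each docbase session is held only for the sub-second duration of the actual call.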

The following paragraph is speculation, somewhat borne out by experimentation: it seems that WebLogic was killing the service handler after some period of inactivity, and that doing so resulted in the TCP connection backing the stored session going into a CLOSE_WAIT state indefinitely (i.e. we were leaking TCP file handles). Unfortunately, these were not getting re-used. This, plus the 1024 file-handle limit WebLogic imposes on itself by default, seems to explain why we were running out of file handles.

Not storing a session in a member variable went a long way towards resolving our issue. However, since I had built a test harness for pushing lots of requests through DFS and our custom service, I did some additional stress testing and discovered that even with this bug resolved, it was pretty easy to run out of docbase sessions, and that we were still running out of database processes.

First, let’s talk about the database:

Experimental evidence showed that a single docbase, just after startup, uses about 22 database connections (Oracle processes) before any user sessions are opened, and that each docbase session accounts for one additional database connection. [SQL used: select count(*) from v$session]
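If you want to watch this yourself, the standard Oracle dynamic performance views are enough (assuming you have SELECT privileges on them); v$resource_limit also shows how close you are to the configured limits:

```sql
-- total sessions the instance currently holds
SELECT COUNT(*) FROM v$session;

-- current and high-water utilization against the configured limits
SELECT resource_name, current_utilization, max_utilization, limit_value
  FROM v$resource_limit
 WHERE resource_name IN ('processes', 'sessions');
```

Running the first query before and after opening a docbase session is how I arrived at the roughly one-connection-per-session figure above.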

We had a vanilla setup, so the docbases were configured (via server.ini) to allow up to 100 concurrent sessions, and the Oracle database was configured to allow up to 150 processes.

Since we have two docbases pointing at the same database, we had to raise the database process limit to allow for both docbases to max out on concurrent sessions, plus the baseline connections introduced by the database itself and the docbase processes. For folks who are adjusting the maximum concurrent docbase sessions, it seems that you ought to set your number of processes to be greater than: (35 * number-of-docbases) + SUM(over docbases, max-concurrent-sessions). [Yes, there is some fudge factor in there.] With our two docbases at 100 sessions each, that works out to 35 * 2 + 200 = 270.
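Raising the limit is a change to Oracle’s static processes parameter, so (in a standard spfile setup) it takes an ALTER SYSTEM plus an instance restart. The 300 below is just the 270 estimate for our two docbases rounded up for headroom, not a recommendation:

```sql
-- 35 * 2 docbases + (100 + 100) max concurrent sessions = 270; round up
ALTER SYSTEM SET processes = 300 SCOPE = SPFILE;
-- restart the instance for the new value to take effect
```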

Now let’s talk about docbase connections:

I’m not really sure why, but it seems to be pretty easy to get DFS (or services built on top of DFS) to max out concurrent docbase connections unless you do some extra work. This is exceedingly bad, because once you run out of concurrent connections, you can’t get additional sessions for users or even run administrative tools like iapi and idql.

As far as I can tell, there is one IDfSessionManager associated with each service handler instance. I don’t understand the internals of the session manager, but I would expect sequential requests (where each request does a getSession() and a releaseSession()) to result in a session pool of roughly one session (in my test case, all requests used the same login credentials). Instead, I have seen these session pools grow pretty much without bound until you run out of concurrent docbase sessions.

After re-reading the Fundamentals doc and the dfcfull.properties file, I began to understand some more about how the Documentum session managers work.

The scheme is as follows. The session manager maintains a level 1 cache of sessions. Once a session has been released into the cache, a subsequent request for a session using the same credentials will result in the session being re-used. If session pooling is enabled (I turned it on explicitly in dfc.properties because I couldn’t tell whether it would be enabled by default; the docs seem to conflict) and a session has been in the level 1 cache for a timeout interval (also configured in dfc.properties), then the session migrates into the level 2 cache. Sessions in the level 2 cache will be rebound, so they can be used even if the session request uses different credentials than those stored in the cached session instance.
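For reference, here is the shape of the dfc.properties entries involved. The key names are the ones I found in dfcfull.properties, but treat the exact names, defaults, and units as assumptions to verify against your DFC version:

```properties
# Enable session pooling explicitly (the docs are ambiguous about the default)
dfc.session.pool.enable = true

# Roughly: how long a released session sits in the level 1 cache before it
# becomes eligible to migrate to the level 2 cache (the value here is a guess)
dfc.session.pool.expiration_interval = 5
```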

It appears that by default a single DFC instance will not limit the number of sessions it holds until the server’s max concurrent sessions have been reached. If the server max is reached, there does seem to be some re-jiggering, but it’s hard to say exactly what.

The moral of the story here appears to be that if you have an app that’s likely to build up a backlog of sessions in the session pool (DFS, and services created with the DFS framework, appear to fit this bill), you should use dfc.properties to limit the maximum number of sessions for each such application.
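Concretely, that per-application cap also lives in dfc.properties. Again, the key name below is the one I found in dfcfull.properties on my install, so verify it for your version before relying on it:

```properties
# Cap the sessions this one DFC instance (e.g. the DFS webapp) can hold
dfc.session.max_count = 20
```

Since each deployed application (DFS, a custom service, Webtop) carries its own dfc.properties, each can be capped independently.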

I’ve heard rumblings recently that MTS may also fit the category of an app that will backlog too many sessions (thus starving other clients.)

For now, I think we are going to try 20 sessions for DFS and 20 sessions for our custom service. I’m not sure whether it’s worth limiting Webtop and DA, or if we’re better off letting these apps manage their own session pools, since they seem to do a good job of this, and it’s hard to say how to balance sessions for these apps. We will of course have to raise our max concurrent docbase sessions so that we can have a good number of users on the system at once, and we will have to raise the Oracle process limit accordingly. We will also have to keep an eye on the number of file handles used by WebLogic, to make sure it doesn’t use more handles than it has allocated itself.