EPiServer CMS, Lucene.NET and a recycling application domain

This blog post is about troubleshooting seemingly random recycles of the application domain. We take a look at the symptoms, the troubleshooting and finally a resolution to the problem. This case involves EPiServer CMS and Lucene.NET, but the same troubleshooting approach can be applied wherever similar symptoms appear.

Introduction & background

The website in question implements EPiServer CMS and EasySearch.

EasySearch is an addition to EPiServer CMS which incorporates the search engine Lucene.NET. Lucene.NET provides an event-driven architecture to keep the search index up to date for pages in EPiServer CMS.

Before first use, the search index needs to be created. A scheduled job does this by iterating through the page tree, starting from the start page.

The website also implemented a custom page provider to serve content from back-end systems. The content served by the custom page provider was the primary data source that needed to be indexed, containing roughly 20 000 items.

Problem & behaviour

When scheduling and running the EasySearch indexing job, the application domain serving our website would recycle seemingly at random.

There were no entries in the event log, nor any exceptions caught by the scheduled indexing job. The application domain would simply recycle without giving any reason.

The indexing job could run for anything between 5 and 10 minutes before the application domain was recycled and the thread executing the indexing job was lost with it.

Troubleshooting

Debugging the process

An unhandled exception in a non-request thread forces the application domain to recycle. It also brings down the worker process, and every application domain that process hosts goes with it. This is bad.

How does this relate to EPiServer CMS and scheduled jobs? A non-request thread is a thread that has been started without an actual HTTP request. In our case this would be a thread started by the scheduler service.

Other “unhandled” exceptions are actually caught by the ASP.NET error handler, which usually displays the yellow screen of death.
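
To make the non-request-thread case concrete, here is a minimal sketch. It is not EPiServer's scheduler code: GuardedWork.Run is a hypothetical helper that runs some work on its own thread and catches whatever the work throws, so the exception never leaves that thread unhandled.

```csharp
using System;
using System.Threading;

// Hypothetical helper, for illustration only.
public static class GuardedWork
{
    // Runs 'work' on a separate (non-request) thread and returns the
    // exception it threw, or null if it completed normally.
    public static Exception Run(Action work)
    {
        Exception captured = null;
        var thread = new Thread(() =>
        {
            try
            {
                work();
            }
            catch (Exception ex)
            {
                // Without this catch the exception would be unhandled on a
                // non-request thread, which is the situation described above.
                captured = ex;
            }
        });
        thread.IsBackground = true;
        thread.Start();
        thread.Join(); // Wait for the work; also makes 'captured' visible here.
        return captured;
    }
}
```

A caller can then log the returned exception instead of losing the process, for example: Exception error = GuardedWork.Run(() => IndexAllPages()); where IndexAllPages stands in for whatever the job does.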

Troubleshooting thus began by simply attaching a debugging session to the process and instructing Visual Studio to break whenever an exception occurred.

This way we should be able to catch the exception before our worker process is mercilessly gunned down. No exception was caught, however, and the worker process died with a last cry of “-2”, taking its application domains down with it.

log4net and ELMAH

Enter log4net and ELMAH. These are both powerful tools to monitor the behaviour of your website.

ELMAH should catch any unhandled exceptions and log them to persistent storage. log4net can log information about how your code is being executed and the data processed.

I was hoping these tools would reveal bad data as we traversed the page structure served by the custom page provider.

But after numerous attempts there was no correlation to the data. It seemed to be very random, but it never really is, is it?

A breakthrough?

Health monitoring for the application pool was turned off. The website was also instructed never to time out any requests.

Instead of scheduling the indexing job, I triggered it manually. I expected it to fail as it always did, but instead it just steamed on, eventually finishing and leaving us with an indexed website. Huh?

The difference between manually triggering the job and having the scheduler service run it is basically access to the HttpContext.Current object, which only exists when there is an actual HTTP request.

Hoping the lack of an HttpContext object was the root of the problem, I implemented null-reference checks and try/catch blocks where needed. But still the same behaviour.
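
As an illustration of the kind of check involved, here is a minimal sketch. RequestAwareHelper and its fallback parameter are hypothetical names, not part of EPiServer or EasySearch; the only real API used is HttpContext.Current, which is null on a thread that is not serving a request.

```csharp
using System;
using System.Web;

// Hypothetical helper, for illustration only.
public static class RequestAwareHelper
{
    // Returns the host name of the current request, or 'fallbackHost' when
    // the code runs without an HTTP request (for example in a scheduled job).
    public static string CurrentHostOrDefault(string fallbackHost)
    {
        HttpContext context = HttpContext.Current; // null outside a request
        if (context == null)
        {
            return fallbackHost;
        }
        return context.Request.Url.Host;
    }
}
```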

HttpRuntime and reflection

Adam Najmanowicz had a similar problem with an application domain going down without leaving a clue as to why. Check out his blog post about it.

Using reflection, we can ask HttpRuntime in the Application_End event why the application domain is being recycled. The code is borrowed from Tess Ferrandez's blog post on the subject.
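
A minimal sketch of that technique for Global.asax.cs could look like the following. It assumes the .NET Framework's private HttpRuntime fields _theRuntime, _shutDownMessage and _shutDownStack; these are internal names, not a public API, and may differ between framework versions. It also assumes that the ".NET Runtime" event source already exists on the server. ShutdownReason is a hypothetical helper name.

```csharp
using System;
using System.Diagnostics;
using System.Reflection;
using System.Web;

// Hypothetical helper, for illustration only.
public static class ShutdownReason
{
    // Reads a private string field from the HttpRuntime instance.
    private static string ReadString(object runtime, string fieldName)
    {
        if (runtime == null) return "(unavailable)";
        FieldInfo field = typeof(HttpRuntime).GetField(
            fieldName, BindingFlags.NonPublic | BindingFlags.Instance);
        if (field == null) return "(unavailable)";
        return (field.GetValue(runtime) as string) ?? "(empty)";
    }

    public static string Describe()
    {
        // Assumption: _theRuntime is the private static singleton instance.
        FieldInfo runtimeField = typeof(HttpRuntime).GetField(
            "_theRuntime", BindingFlags.NonPublic | BindingFlags.Static);
        object runtime = runtimeField == null ? null : runtimeField.GetValue(null);

        return string.Format(
            "_shutDownMessage={0}\r\n_shutDownStack={1}",
            ReadString(runtime, "_shutDownMessage"),
            ReadString(runtime, "_shutDownStack"));
    }
}

public class Global : HttpApplication
{
    protected void Application_End(object sender, EventArgs e)
    {
        // Assumes the ".NET Runtime" event source already exists.
        EventLog.WriteEntry(".NET Runtime", ShutdownReason.Describe(),
            EventLogEntryType.Error);
    }
}
```

The helper falls back to a placeholder instead of throwing when a field is missing, so a framework update cannot turn the diagnostic itself into a new source of errors during shutdown.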

This revealed the following message in the event log: “Overwhelming Change Notification in <X:PhysicalPathToTheWebsite>”.

The sheer amount of writing to the index files, which were located directly beneath the website root, apparently caused ASP.NET to recycle the application domain due to too many change notifications.

By moving the index folder to a different location the problem was solved. No more recycling of the application domain.

Lesson

Do not keep files which are frequently written to directly beneath the website folder. This may cause ASP.NET to recycle the application domain.
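
One way to guard against this mistake is to check the configured index folder at startup. The sketch below is only an illustration: IndexLocationCheck is a hypothetical helper and not part of EasySearch or EPiServer. It compares two paths and reports whether the index folder sits inside the website folder.

```csharp
using System;
using System.IO;

// Hypothetical helper, for illustration only.
public static class IndexLocationCheck
{
    // Makes the path absolute and ensures exactly one trailing separator,
    // so that "/sites/web" does not match "/sites/web2".
    private static string Normalize(string path)
    {
        string full = Path.GetFullPath(path);
        return full.TrimEnd(Path.DirectorySeparatorChar, Path.AltDirectorySeparatorChar)
               + Path.DirectorySeparatorChar;
    }

    // True when 'indexPath' is the website folder itself or lies beneath it.
    public static bool IsUnderWebRoot(string webRoot, string indexPath)
    {
        return Normalize(indexPath).StartsWith(
            Normalize(webRoot), StringComparison.OrdinalIgnoreCase);
    }
}
```

In an ASP.NET application the website folder could be taken from HostingEnvironment.ApplicationPhysicalPath, and a true result logged as a warning before the indexing job starts.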

Thanks for reading!

daniel
Developer
  • Celia Black

    Love this blog post – I wish more people would blog about their methods for troubleshooting! great job 🙂
