ArcGIS Web Adaptor 11.1 App Pool freezes

05-21-2023 10:04 PM
Scott_Tansley
MVP Regular Contributor

Hi.   

I've recently upgraded a client to ArcGIS 11.1, and we're having random problems with the new web adaptors.  I'm looking to see if anyone has observed the same issues.

The client's ArcGIS Enterprise was originally deployed at 10.8.1.  There's an IIS web server in the DMZ.  A single host holds the rest of the base deployment, which is only used for Hosted Feature Services.  Web Adaptors (WAs) exist for portal and hosting.  A third machine runs a general-purpose ArcGIS Server, federated and primarily serving Map Image Layers.  Its Web Adaptor is called server.

There have never been any repeated outage issues.  The environment was upgraded to 10.9.1 last year.  Once again no issues.

They were upgraded to 11.1 two weeks ago.  Immediately, we found that the web server machine was running out of memory.  We noted the advice given in the new dependencies and increased the RAM from 4 to 8GB.  Memory use now sits at around 5-6GB with no issues, and we have not seen any spike above 6GB to date.

After adding RAM it all seemed to settle for a few days.  But now, every couple of days the IIS application pool for the 'server' web adaptor will just stall.  IIS logs show 200/304 responses for everything up to the freeze/stall and 500 for everything after it.  There is nothing untoward in the requests.
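For anyone wanting to pin down the exact moment of the stall from the IIS logs, here is a minimal Python sketch. It assumes the logs are in the standard W3C extended format with a `#Fields:` header that includes `date`, `time` and `sc-status`. The function names and the sample lines are made up for illustration and are not taken from the client's logs.

```python
def parse_iis_log(lines):
    """Yield (date, time, status) tuples from W3C extended log lines."""
    fields = []
    for line in lines:
        line = line.strip()
        if line.startswith("#Fields:"):
            # The header names the space-separated columns that follow.
            fields = line[len("#Fields:"):].split()
            continue
        if not line or line.startswith("#") or not fields:
            continue
        row = dict(zip(fields, line.split(" ")))
        try:
            status = int(row["sc-status"])
        except (KeyError, ValueError):
            continue  # skip rows without a usable status code
        yield row.get("date", ""), row.get("time", ""), status


def find_stall_start(entries):
    """Return (date, time) of the first 5xx in the trailing run of 5xx
    responses, or None if the log does not end in a run of errors."""
    stall = None
    for date, time, status in entries:
        if status >= 500:
            if stall is None:
                stall = (date, time)
        else:
            stall = None  # a good response resets the candidate
    return stall


# Made-up sample lines, for illustration only.
sample = """#Fields: date time cs-method cs-uri-stem sc-status
2023-01-01 09:58:01 GET /server/rest/services 200
2023-01-01 09:58:05 GET /server/rest/info 304
2023-01-01 09:59:12 GET /server/rest/services 500
2023-01-01 09:59:20 GET /server/rest/info 500
""".splitlines()

print(find_stall_start(parse_iis_log(sample)))
```

The timestamp it reports can then be lined up against the last good 200 request in the ArcGIS Server logs.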

ArcGIS Server is still available on 6443 and can be accessed.  It just isn't receiving requests from IIS.  With Info logging turned on, it shows the last good 200 request from the WA.  Then nothing, no errors, no issues.  It's just as if it's sat there waiting for a request and not receiving it.

There have been no firewall or environmental changes recently.  The only changes are the upgrade to 11.1 and the addition of memory.

On the web server there is nothing in event viewer, system/admin/security or IIS application logs.

I'm blind.  It's just as if the App Pool WA says I've had enough.

The only way to bring the application back online is to restart IIS.  You can stop the AppPool, but it will not start again unless IIS is restarted.

I'm currently blind.  We've external ping monitoring in place so we know when the healthCheck API fails, but there's nothing else we can do but monitor and restart at this point.
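As a rough illustration of that monitoring, the Python sketch below probes a health-check URL through the Web Adaptor and directly on 6443, then tells "web adaptor stalled" apart from "server unreachable". The host names, the `/rest/info/healthcheck?f=json` path and the `{"success": true}` response shape are assumptions to verify against your own deployment, and the function names are hypothetical.

```python
import json
import urllib.error
import urllib.request


def probe(url, timeout=10):
    """Return True if the URL answers 200 with a JSON body of {"success": true}.

    Assumption: the endpoint behaves like a health check that returns
    {"success": true}. Any HTTP error, timeout, TLS failure (for example a
    self-signed certificate on 6443) or non-JSON body counts as a failure.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = json.loads(resp.read().decode("utf-8"))
            return (
                resp.status == 200
                and isinstance(body, dict)
                and body.get("success") is True
            )
    except (urllib.error.URLError, ValueError, OSError):
        return False


def diagnose(wa_ok, direct_ok):
    """Classify the outage from the two probe results."""
    if wa_ok:
        return "healthy"
    if direct_ok:
        # Server answers on 6443 but not through IIS: the symptom described above.
        return "web adaptor stalled"
    return "server unreachable"


# Example usage with placeholder (hypothetical) URLs:
# wa = probe("https://gis.example.com/server/rest/info/healthcheck?f=json")
# direct = probe("https://gisserver.example.com:6443/arcgis/rest/info/healthcheck?f=json")
# print(diagnose(wa, direct))
```

Running the direct probe only makes sense from a machine that can reach port 6443, which an external monitor outside the DMZ usually cannot.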

Scott Tansley
https://www.linkedin.com/in/scotttansley/
128 Replies
JasonHarris2
New Contributor III

Sadly, the patch did not work for me either.  Almost immediately we saw drops with the WA, which went offline for a few minutes at a time.  We re-implemented the fixes suggested by @LukeSavage and look to be OK for now.

JonEmch
Esri Regular Contributor

Howdy folks,

   We appreciate you all keeping this thread alive with your testing results and experimentation. If you are running into issues that are not addressed by the patch, please log a case with technical support, or reach out to me for assistance with case creation.

Keep on keeping on!
MichaelSnook
Occasional Contributor III

Should the .NET CLR Version for the Web Adaptor pools be set to 'No Managed Code', since they depend on .NET Core?  After the upgrades (and the patch) I have just noticed that both (Portal and Server) are still set to v4.0.

Edgar_W_Iparraguirre
New Contributor III

The issue seems to be triggered whenever the system refreshes the Default Application Pool. Not always, but sometimes a Web Adaptor's AppPool goes astray. Recycling the Web Adaptor's AppPool is then enough to recover. For this purpose we have created a scheduled task that runs a recycling script.
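A recycling script of this kind could look roughly like the Python sketch below, which wraps the IIS `appcmd recycle apppool` command. This is not the script referred to above. The pool name is a hypothetical placeholder, the `appcmd.exe` location is assumed to be the default one, and the script must run elevated on the web server.

```python
import os
import subprocess


def build_recycle_command(pool_name, windir=None):
    """Build the appcmd invocation that recycles one IIS application pool."""
    # Assumption: appcmd.exe lives in the default inetsrv folder.
    windir = windir or os.environ.get("windir", r"C:\Windows")
    appcmd = windir + r"\System32\inetsrv\appcmd.exe"
    return [appcmd, "recycle", "apppool", f"/apppool.name:{pool_name}"]


def recycle(pool_name):
    """Run the recycle command and return the completed process."""
    return subprocess.run(
        build_recycle_command(pool_name),
        capture_output=True,
        text=True,
        check=False,  # inspect returncode rather than raising
    )


# Example usage (hypothetical pool name; check the real one in IIS Manager):
# result = recycle("ArcGISWebAdaptorAppPoolserver")
# print(result.returncode, result.stdout, result.stderr)
```

Note that the original post describes a case where the pool would not start again without a full IIS restart, so a recycle alone may not be sufficient in every environment.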

 

ThomasBuchmann
New Contributor

We encountered the same issue on several installations. The mentioned workaround from @LukeSavage worked on one environment, but not on others.

So, we contacted Esri support and they finally filed a bug:

ArcGIS Web Adaptor (IIS) 11.1 becomes unresponsive when under load, resulting in the inability to ac...

Scott_Tansley
MVP Regular Contributor

Thanks.  My first client initially responded well to the recent patch.  However, they use a town planning application that allows large documents to be uploaded, and that upload brought the instability back.  So far the Luke Savage solution has got them stable again.  A second client has recently gone to production and immediately had stability issues.  Luke's solution worked there as well.  So I'm in the fortunate position that all my 11.1 clients are stable.

 

IanIce1
New Contributor III

Okay, it's been 3 weeks since implementing the AppPool tweaks by @LukeSavage. What a difference it has made. After the second week, I did notice data in a map/app stop drawing. However, I was still able to get a popup, even for data that was partially drawn. The Server Manager page (via the web adaptor) actually showed a loading screen (it didn't show one before) and eventually loaded after a few seconds to a few minutes. I've noticed this 3x so far. Sometimes the missing/incomplete geometries will appear after a few seconds or will refresh after panning/zooming.

Another issue we've encountered is with a flood layer that has some complex geometries: 10MB in size, one record (dissolved). In Pro it's completely fine. When it is published as a copied feature and viewed in a map viewer, the CPU on the federated/hosting server maxes out immediately and nothing draws. While this is happening, all data in Enterprise is down, including the Server Manager page. I observed that it's the OpenJDK Platform binary (Java) application that spikes the CPU to 100%. When we publish it as a referenced feature from Pro, it does the same thing in the map viewers, except it's the ArcSOCs that cause the CPU spike. It did seem to work okay when published as a copied map image with cache. It takes about 10 - 30 minutes for the CPU to stabilize after closing the map viewers that had the data open. This layer works just fine in ArcGIS Online as a hosted feature layer. Esri support is looking into this issue.

LukeSavage
Occasional Contributor II

I'm totally curious.  What's your geometry service max pool set at?  At Cityworks, we use both forward and backward lookups for relationships, and by adding extra pools the time to grab related objects dropped from 30 seconds to instant.  Change Pool from 2 to 8.  I've increased it to 12 because we have 64GB of RAM available for our sales staff.  I know related objects are different, but I have not noticed any issues since implementing the workaround on 6 different environments.  A user posted that document uploads were an issue; we upload documents on demo sites and don't have an issue.  When memory leaks and garbage cleanup within code are not handled properly, throwing hardware resources at it can temporarily fix the issue.
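To make the pool change concrete, here is a small Python sketch that edits a service definition held as a dictionary. It assumes the definition is the JSON you get back for the geometry service from the ArcGIS Server Administrator API and that the pool size is held in `minInstancesPerNode` / `maxInstancesPerNode`. Verify those key names against your own service JSON. Fetching and posting the definition is not shown, and the helper name is hypothetical.

```python
import copy


def with_max_instances(service_def, max_instances):
    """Return a copy of the service definition with a new maximum pool size."""
    if max_instances < 1:
        raise ValueError("max_instances must be at least 1")
    updated = copy.deepcopy(service_def)  # leave the caller's dict untouched
    # Assumed key names; check them in your own service JSON.
    min_instances = int(updated.get("minInstancesPerNode", 0))
    if max_instances < min_instances:
        raise ValueError("max_instances cannot be below minInstancesPerNode")
    updated["maxInstancesPerNode"] = max_instances
    return updated


# Illustrative definition with the default maximum of 2 mentioned in this thread.
geometry = {
    "serviceName": "Geometry",
    "type": "GeometryServer",
    "minInstancesPerNode": 1,
    "maxInstancesPerNode": 2,
}
print(with_max_instances(geometry, 8)["maxInstancesPerNode"])
```

The same change can be made by hand on the service's Pooling settings in Server Manager, which is probably the simpler route for a one-off adjustment.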

IanIce1
New Contributor III

Interesting! It's currently at the default of 2 max instances, but it's something we'll have to consider in this new environment. We can throw more memory at the VM, but 24GB seems like plenty for our operations and is pretty stable under load. CPUs, however... we're limited to the 4 cores due to licensing costs. Esri was able to reproduce the issue to some degree, so we'll see what they say.

Edgar_W_Iparraguirre
New Contributor III
To be honest, I do not see a relation between what you are experiencing and the issue with the Web Adaptors.
When publishing a copied feature, it is my understanding that AGS has to copy the data from the fgdb that comes with the sd file to the data store (if it is a Feature Layer); otherwise the fgdb goes under arcgisinput and the service retrieves the data from there. Did the spike that you observed last long? Being Java, it could mean, among other things, that the servlet gets too busy, or that some kind of geoprocessing is taking place in the background.
When it is published as a reference and it concerns a complex service, I can imagine that a SOC will spike, as for every (complex?) request a SOC process has to work its way through the MSD in order to produce an answer (let's not forget also that SOCs are .NET built and not Java).
Also, I believe it is not fair to compare the AGOL service response with an on-premises AGE, as there is no way to observe what is happening on the AGOL service. ArcGIS Pro also does not necessarily run the same number of processes as an AGS, where all of them compete for the same resources.
Edgar.