ArcGIS Web Adaptor 11.1 App Pool freezes

Scott_Tansley · ‎05-21-2023

Hi.

I've recently upgraded a client to ArcGIS 11.1, and we're having random problems with the new web adaptors. I'm looking to see if anyone has observed the same issues.

The client's ArcGIS Enterprise was originally deployed at 10.8.1. There's an IIS web server in the DMZ. A single host holds the rest of the base deployment, which is only used for Hosted Feature Services. Web Adaptors (WAs) exist for portal and hosting. A third machine runs a general-purpose ArcGIS Server, federated and primarily serving Map Image Layers. Its Web Adaptor is called server.

There have never been any repeated outage issues. The environment was upgraded to 10.9.1 last year. Once again no issues.

They were upgraded to 11.1 two weeks ago. Immediately, we found that the machine was running out of memory. We noted the advice given in the new dependencies and increased the RAM from 4 to 8GB. It sits at around 5-6GB with no issues, and we have not seen any spike above 6GB to date.

After adding RAM it all seemed to settle for a few days. But now, every couple of days, the IIS application pool for the 'server' web adaptor will just stall. IIS logs show 200/304 responses for everything up to the freeze/stall and 500 for everything after it. There is nothing untoward in the requests.
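
For anyone wanting to pin down the exact moment of a stall from the IIS logs, here is a minimal Python sketch. It assumes the logs are in the W3C extended format with a `#Fields:` header that includes `date`, `time` and `sc-status`. The function name `find_stall_start` is made up for illustration and is not part of any Esri or Microsoft tooling.

```python
from typing import Iterable, Optional


def find_stall_start(lines: Iterable[str]) -> Optional[str]:
    """Return 'date time' of the first entry in the trailing run of 5xx
    responses, or None if the log does not end in a run of 5xx responses."""
    fields = []          # column names taken from the most recent #Fields: line
    stall_start = None   # timestamp where the current run of 5xx began
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#Fields:"):
            fields = line[len("#Fields:"):].split()
            continue
        if line.startswith("#") or not fields:
            continue  # other W3C directives, or data before any header
        row = dict(zip(fields, line.split()))
        status = row.get("sc-status", "")
        if status.startswith("5"):
            if stall_start is None:
                stall_start = f"{row.get('date', '?')} {row.get('time', '?')}"
        else:
            stall_start = None  # a healthy response resets the run
    return stall_start
```

Passing it an open log file, for example `find_stall_start(open(path))` with `path` pointing at the site's current log, gives a timestamp that can then be compared with the last good request in the ArcGIS Server logs.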

ArcGIS Server is still available on 6443 and can be accessed. It just isn't receiving requests from IIS. With Info logging turned on, it shows the last good 200 request from the WA. Then nothing, no errors, no issues. It's just as if it's sat there waiting for a request and not receiving it.

There have been no firewall or environmental changes recently. The only changes are the upgrade to 11.1 and the addition of memory.

On the web server there is nothing in event viewer, system/admin/security or IIS application logs.

I'm blind. It's just as if the App Pool WA says I've had enough.

The only way to bring the application back online is to restart IIS. On the AppPool you can stop it. But it will not start unless IIS is restarted.

I'm currently blind. We've external ping monitoring in place so we know when the healthCheck API fails, but there's nothing else we can do but monitor and restart at this point.
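
The monitor-and-restart routine could be automated along these lines. This is only a sketch under assumptions: the host name in `HEALTH_URL` is a placeholder, the thresholds are arbitrary, any HTTP 200 is treated as healthy, and `iisreset /restart` has to run from an elevated session on the web server.

```python
import subprocess
import time
import urllib.request
from typing import Callable, Optional

# Hypothetical URL: substitute the real host and web adaptor name.
HEALTH_URL = "https://gis.example.com/server/rest/info/healthCheck?f=json"


def probe(url: str = HEALTH_URL, timeout: float = 10.0) -> bool:
    """Return True if the endpoint answers with HTTP 200 (assumed healthy)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # covers URLError, HTTPError (e.g. 500) and timeouts
        return False


def restart_iis() -> None:
    """Restart IIS; requires administrator rights on the web server."""
    subprocess.run(["iisreset", "/restart"], check=True)


def watch(check: Callable[[], bool],
          restart: Callable[[], None],
          max_failures: int = 3,
          interval: float = 60.0,
          cycles: Optional[int] = None,
          sleep: Callable[[float], None] = time.sleep) -> int:
    """Poll `check`; call `restart` after `max_failures` consecutive failures.

    Runs forever when `cycles` is None. Returns the number of restarts."""
    failures = 0
    restarts = 0
    done = 0
    while cycles is None or done < cycles:
        done += 1
        if check():
            failures = 0
        else:
            failures += 1
            if failures >= max_failures:
                restart()
                restarts += 1
                failures = 0
        sleep(interval)
    return restarts
```

Calling `watch(probe, restart_iis)` would poll once a minute and restart IIS after three failed checks in a row. Requiring several consecutive failures avoids restarting on a single slow response.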

Scott Tansley
https://www.linkedin.com/in/scotttansley/

Scott_Tansley · ‎05-24-2023

Thanks for this. It's Win Server 2019.

DavidColey · ‎05-26-2023

We are not experiencing this on Windows Server 2016.

JonathanPollack · ‎05-24-2023

We are also having this issue. The Web Adaptor seemingly randomly craps out. We are able to just restart/recycle the AppPool to get things working again, but this is a huge pain/issue.

Scott_Tansley · ‎05-24-2023

Thanks for confirming. A support ticket has been raised with our local distributor. It's early days, but it's good (bad?) to know we're not alone with this.

JasonHarris2 · ‎05-30-2023

Same here. Win 2019. Only fix is an app pool recycle. Happening on 3 systems that were just upgraded. We've had a ticket open with Esri, but haven't gotten anywhere. Has anyone here made any progress?

Scott_Tansley · ‎05-30-2023

Thanks for sharing. See the workaround below. It seems a logical approach until we hear from Esri.

LukeSavage · ‎05-30-2023

Fixed it with a workaround. As for which config change solved it, I have no idea, but maybe this will help the community.

Scott_Tansley · ‎05-30-2023

That looks a good approach. I like it.

Out of interest, did you try increasing the instances without the queue length? Or did you just go straight to belt and braces?

I’m assuming it eats memory with more instances?

LukeSavage · ‎05-30-2023

Didn't see too much of a hit. Honestly, the server behaved better. And yes, I just updated both based on Microsoft articles about appPool instances for .net core and queue length for multiple requests.
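
For readers who want to script a similar change, here is a rough Python sketch that builds the `appcmd` call. It assumes "instances" refers to the app pool's Maximum Worker Processes setting (`processModel.maxProcesses`, IIS default 1) and that the queue length is the pool's `queueLength` (IIS default 1000). The pool name and the values 2 and 5000 are placeholders, not the settings actually used in this thread.

```python
import subprocess
from typing import List

# Default appcmd location on a standard Windows Server install.
APPCMD = r"C:\Windows\System32\inetsrv\appcmd.exe"


def build_apppool_command(pool_name: str,
                          max_processes: int,
                          queue_length: int) -> List[str]:
    """Build the appcmd argument list that sets both values on one app pool."""
    return [
        APPCMD, "set", "apppool", pool_name,
        f"/processModel.maxProcesses:{max_processes}",
        f"/queueLength:{queue_length}",
    ]


def apply_settings(pool_name: str, max_processes: int, queue_length: int,
                   dry_run: bool = True) -> List[str]:
    """Return the command; run it only when dry_run is False (needs admin)."""
    cmd = build_apppool_command(pool_name, max_processes, queue_length)
    if not dry_run:
        subprocess.run(cmd, check=True)
    return cmd


# Placeholder pool name and values, shown as a dry run only.
print(" ".join(apply_settings("server-wa-pool", 2, 5000)))
```

The same two settings are also available in IIS Manager under the app pool's Advanced Settings. More worker processes generally means more memory use, so it is worth watching RAM after the change.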

JasonHarris2 · ‎05-31-2023

So far, that fix is holding! Applied to 3 systems that were experiencing the issue and we've been up for 24 hours without incident.