ArcGIS (10.1 SP1) Site and Web Adapter randomly crash and stop responding

05-21-2013 02:59 PM
by Anonymous User
Not applicable
Original User: btelliot

We are struggling with two main problems since moving from ArcServer 10.0 to 10.1.

1. Poor Performance
2. Constant Server downtime / general site instability.

Our ideal server architecture would be a site spread across multiple virtual machines, with 2 clusters and a single web adapter running over SSL only.  See attached image for configuration.
[ATTACH=CONFIG]24566[/ATTACH]

We currently have about 350 services running on our site. 

  • ~300 of which are configured with a minimum instance of 0 (should turn themselves off) and a max instance of 2.

  • ~20-30 are cached

  • All running in High-Isolation

  • licensed for 4 cores per machine and additional staging license (12 cores total).

  • 16GB ram per machine.

  • web adapter is running with 1 core.

Performance

Our main issue with performance comes from administering / publishing services.  Since we have multiple machines, we need to reference the config store from a UNC path.  This is a known bug that should be fixed in SP2.  (Why they haven't released a hotfix for this is beyond me).  For more details see thread:  http://forums.arcgis.com/threads/66388-Slow-performance-administering-services-in-ArcCatalog-and-Arc...

However, we also have performance issues on the web client side of things.  These issues are intermittent and difficult to replicate.  We can measure this latency using the Network tab of the Developer Tools in Google Chrome.  It will sometimes take 3-5 minutes for the server to return the data to the web browser, even on cached services that are already running.

Depending on our configuration and the complexity of the MXD, publishing a service usually takes around 5 minutes at the best of times.  At the worst, republishing an existing map document can take up to 30 minutes.  If anyone else has experienced any of these issues please let me know!

We have monitored our system resources on the virtual machines, and we rarely hit upwards of 30% CPU usage, unless caching or restarting the machines.

Stability

Since moving to 10.1, the longest we have gone without a server outage or issue is about 1 week.  As we are growing as a company, more people are relying on our services in their workflows, and downtime becomes less and less bearable. In theory, a multiple machine site should be more stable: if one server goes offline, the web adapter should recognize this and redirect the traffic to a different server.

Main Issue:

  • We have noticed that ArcServer running on one of the machines will periodically crash and stop working.

  • We don't see a spike in system resources, or any other telltale signs on the vms, it just stops responding as it should.

  • We will experience this at least once a day.

Our normal fix is:

  • Check to see if the web adapter is responding; if not, restart the VM

  • Check to see if each individual machine is responding (try to log in to the ArcGIS Server Manager); if not, restart the VM

  • Reboot whichever server is crapping out; if that doesn't work, try the other one(s).

  • If it still doesn't work, try stopping the machine from https://[machinename]:6443/arcgis/admin, and then starting it again.
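
The first two checks in the list above could be scripted along these lines. This is only a rough sketch, not something we run in production: the host names and the web adapter URL are made-up placeholders, and treating any HTTP status below 500 as "responding" is an assumption.

```python
import urllib.error
import urllib.request


def is_responding(url, timeout=10, opener=urllib.request.urlopen):
    """Return True if the URL answers with an HTTP status below 500 in time."""
    try:
        with opener(url, timeout=timeout) as resp:
            return resp.status < 500
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (e.g. 401 or 503).
        return exc.code < 500
    except OSError:
        # Covers URLError, refused connections, timeouts and TLS failures.
        return False


def triage(web_adapter_url, machine_urls, check=is_responding):
    """Mirror the manual checklist: web adapter first, then each machine."""
    actions = []
    if not check(web_adapter_url):
        actions.append("restart web adapter VM")
    for name, url in machine_urls.items():
        if not check(url):
            actions.append(f"restart {name} VM")
    return actions


# Hypothetical usage (placeholder host names):
# triage("https://gis.example.com/arcgis/rest/services",
#        {"gis1": "https://gis1:6443/arcgis/admin",
#         "gis2": "https://gis2:6443/arcgis/admin"})
```

Note that a machine with a self-signed certificate on port 6443 would show up as not responding here, because certificate verification fails; an SSL context would need to be passed to `urlopen` in that case.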

If anyone has some insight into what may be causing this issue, please let us know.

Thank you for taking the time to read this!

Brett



TL;DR: ArcServer 10.1 SP1 is still very buggy.  ArcServer will randomly stop working, and we will need to reboot the virtual machine it is running on.  We have to do this A LOT.
1 Solution

Accepted Solutions
by Anonymous User
Not applicable
Original User: btelliot

Update #4

Our server is now running okay, although our performance is fairly sluggish right now.

Our problem was resolved by undoing the modifications to the heap sizes that ESRI Tech Support recommended. 

We are currently back to using the defaults for App server maximum heap size and SOC maximum heap size (256 and 64 MB respectively).


Lessons learned:


  • Our config-store was most likely corrupted

  • Rebuilding our site solved this problem

  • Modifying our maximum heap size did not increase server performance, and caused instability

  • We may need to do further testing to find an optimum heap size for maximum performance and stability

I will be on vacation for the next 3 weeks, so I won't be able to update this thread.  Thanks for the help Richard.

-Brett

25 Replies
RichardWatson
Frequent Contributor
When ArcGIS Server crashes the system generates a crash dump file.

I believe that the dumps are placed in the config-store.  On my machine they are here:

C:\arcgisserver\logs\MachineName\errorreports

Here is an example of one from my system:

ParcelEditing_MapServer_3664_0.dmp

Crash dump files can provide insights into what the underlying problem is.  ESRI has the best chance of reading these because they have the source code. 

If you post yours then I'll look at them and see if I see anything.  Post them to one of the cloud storage providers and post a link here.
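
If there are a lot of dump files, a quick first pass is to count them per service from the file names alone. The sketch below assumes the names follow the pattern `<service>_<type>_<number>_<index>.dmp`, as in the example above. The meaning of the two trailing numbers is a guess on my part (the first is probably the process ID), so they are kept as plain integers.

```python
from collections import Counter
from pathlib import Path


def parse_dump_name(filename):
    """Split '<service>_<type>_<number>_<index>.dmp', working from the right."""
    stem = Path(filename).stem
    # Service names may themselves contain underscores, so split from the right.
    service, service_type, number, index = stem.rsplit("_", 3)
    return {
        "service": service,
        "type": service_type,
        "number": int(number),  # assumed to be the process ID, unconfirmed
        "index": int(index),
    }


def count_dumps_by_service(folder):
    """Count .dmp files in a folder, keyed by the service name in the file name."""
    counts = Counter()
    for path in Path(folder).glob("*.dmp"):
        counts[parse_dump_name(path.name)["service"]] += 1
    return counts


# Hypothetical usage, pointing at the error reports folder:
# count_dumps_by_service(r"C:\arcgisserver\logs\MachineName\errorreports").most_common(10)
```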
by Anonymous User
Not applicable
Original User: btelliot

Here are the crash dumps from the last 24 hours or so.

https://www.dropbox.com/sh/k86kfx5710b43kb/L4XHgt3rxl

Let me know if you have any troubles accessing the files.
RichardWatson
Frequent Contributor
Analyzing crash dumps is a bottom-up debugging approach.  You look at the call stacks: they tell you whether or not 2 crashes are the same problem, and often the function names give you some insight into what the code was doing before things went wrong.

For example, one pattern below is a crash which occurs when trying to convert a symbol using the unique value renderer for a layer.  Perhaps you can see something wrong (or special) about the symbols for that layer in the map document?

6515_Internal_6515_EATL_Imagery_Cached_MapServer_25216_0.dmp

MapServerX!MapServerXConverter::PrepareImageDescriptionDef+0x530
MapServerX!MapServerX::GetLegendInfo+0x22c
MapServerX!MapServerX::HandleREST_LegendResource+0x8ee
MapServerX!MapServerX::HandleREST_RootResource+0x121e
AfCore!RESTSupport::RESTDispatcher::Dispatch+0xa7c
AfCore!RESTSupport::RESTDispatcher::HandleRESTRequest+0x135
AfCore!ServerObjectBase::HandleRESTRequest+0xdc
ntvinv!finvke+0x83

6515_Internal_t6515_Construct_MapServer_23020_0.dmp

MapServerX!MapServerXConverter::PrepareImageDescriptionDef+0x530
MapServerX!MapServerX::GetLegendInfo+0x22c
MapServerX!MapServerX::HandleREST_LegendResource+0x8ee
MapServerX!MapServerX::HandleREST_RootResource+0x121e
AfCore!RESTSupport::RESTDispatcher::Dispatch+0xa7c
AfCore!RESTSupport::RESTDispatcher::HandleRESTRequest+0x135
AfCore!ServerObjectBase::HandleRESTRequest+0xdc
ntvinv!finvke+0x83

6515_Internal_t6515_Wildlife_MapServer_22660_0.dmp

MapServerX!MapServerLayerDrawingDescUtility::ConvertSymbol+0x2aa1
MapServerX!JSONSerializerCartoX::QueryJSONSymbol+0x17b
MapServerX!JSONSerializerCartoX::QueryJSONUniqueValueRenderer+0x8af
MapServerX!JSONSerializerCartoX::QueryJSONFeatureRenderer+0xef
MapServerX!JSONSerializerCartoX::QueryJSONLayer+0x212e
MapServerX!JSONSerializerCartoX::QueryJSONLayer+0x1600
MapServerX!JSONSerializerCartoX::AddJSONLayers+0x174
MapServerX!JSONSerializerCartoX::QueryJSONLayers+0xae
MapServerX!MapServerX::HandleREST_LayersResource+0x1163
MapServerX!MapServerX::HandleREST_RootResource+0x121e
AfCore!RESTSupport::RESTDispatcher::Dispatch+0xa7c
AfCore!RESTSupport::RESTDispatcher::HandleRESTRequest+0x135
AfCore!ServerObjectBase::HandleRESTRequest+0xdc
ntvinv!finvke+0x83

6515_Internal_t6515_Wildlife_MapServer_29532_0.dmp

MapServerX!MapServerLayerDrawingDescUtility::ConvertSymbol+0x2aa1
MapServerX!JSONSerializerCartoX::QueryJSONSymbol+0x17b
MapServerX!JSONSerializerCartoX::QueryJSONUniqueValueRenderer+0x8af
MapServerX!JSONSerializerCartoX::QueryJSONFeatureRenderer+0xef
MapServerX!JSONSerializerCartoX::QueryJSONLayer+0x212e
MapServerX!JSONSerializerCartoX::QueryJSONLayer+0x1600
MapServerX!JSONSerializerCartoX::AddJSONLayers+0x174
MapServerX!JSONSerializerCartoX::QueryJSONLayers+0xae
MapServerX!MapServerX::HandleREST_LayersResource+0x1163
MapServerX!MapServerX::HandleREST_RootResource+0x121e
AfCore!RESTSupport::RESTDispatcher::Dispatch+0xa7c
AfCore!RESTSupport::RESTDispatcher::HandleRESTRequest+0x135
AfCore!ServerObjectBase::HandleRESTRequest+0xdc
ntvinv!finvke+0x83

7894_TMEP_FISHERIES_MapServer_11600_0.dmp

MappingCoreLib!PreviewHelper::CreateCanvas+0x32c
MappingCoreLib!PreviewHelper::ClearImage+0x62
MapServerX!JSONSerializerCartoX::QueryJSONSymbol+0x17b
MapServerX!JSONSerializerCartoX::QueryJSONSimpleRenderer+0x188
MapServerX!JSONSerializerCartoX::QueryJSONFeatureRenderer+0x114
MapServerX!JSONSerializerCartoX::QueryJSONLayer+0x212e
MapServerX!JSONSerializerCartoX::AddJSONLayers+0x174
MapServerX!JSONSerializerCartoX::QueryJSONLayers+0xae
MapServerX!MapServerX::HandleREST_LayersResource+0x1163
MapServerX!MapServerX::HandleREST_RootResource+0x121e
AfCore!RESTSupport::RESTDispatcher::Dispatch+0xa7c
AfCore!RESTSupport::RESTDispatcher::HandleRESTRequest+0x135
AfCore!ServerObjectBase::HandleRESTRequest+0xdc
ntvinv!finvke+0x83

7894_TMEP_t7894_Routing_MapServer_8196_0.dmp

BGLAPI!BGL::BGLRasterCanvas::CreateNewGroupImpl+0x271
BGLAPI!BGL::BGLCanvas::BracketOpenIsolatedGroup+0x1bc
MappingServicesLib!LayerBlendingModeMgr::LayerBlendingModeMgr+0x91
MappingServicesLib!Map2DRenderingService::DrawViewLayers+0x204
MappingServicesLib!Map2DRenderingService::DrawAnnotationPhase+0x1ce
MappingServicesLib!Map2DRenderingService::DrawMapView+0x59f
MappingServicesLib!BasicExportMapService::DrawLoop+0x19e
MappingServicesLib!BasicExportMapService::ExportMapViewPriv+0x240
MappingServicesLib!BasicExportMapService::ExportMapView+0x3e
MappingServicesLib!DynamicMapService::ExportMapImage+0x6f6
MapServerX!MapServerX::HandleREST_ExportOperation+0xd06
AfCore!RESTSupport::RESTDispatcher::Dispatch+0xa7c
AfCore!RESTSupport::RESTDispatcher::HandleRESTRequest+0x135
AfCore!ServerObjectBase::HandleRESTRequest+0xdc
ntvinv!finvke+0x83

7894_TMEP_t7894_Routing_MapServer_11760_0.dmp

BGLAPI!BGL::BGLRasterCanvas::CreateNewGroupImpl+0x271
BGLAPI!BGL::BGLCanvas::BracketOpenIsolatedGroup+0x1bc
MappingServicesLib!LayerBlendingModeMgr::LayerBlendingModeMgr+0x91
MappingServicesLib!Map2DRenderingService::DrawViewLayers+0x204
MappingServicesLib!Map2DRenderingService::DrawAnnotationPhase+0x1ce
MappingServicesLib!Map2DRenderingService::DrawMapView+0x59f
MappingServicesLib!BasicExportMapService::DrawLoop+0x19e
MappingServicesLib!BasicExportMapService::ExportMapViewPriv+0x240
MappingServicesLib!BasicExportMapService::ExportMapView+0x3e
MappingServicesLib!DynamicMapService::ExportMapImage+0x6f6
MapServerX!MapServerX::HandleREST_ExportOperation+0xd06
AfCore!RESTSupport::RESTDispatcher::Dispatch+0xa7c
AfCore!RESTSupport::RESTDispatcher::HandleRESTRequest+0x135
ntvinv!finvke+0x83

7894_TMEP_WILDLIFE_MapServer_13984_0.dmp

MappingServicesLib!BasicExportMapService::BasicExportMapService+0x14d
MappingServicesLib!DynamicMapService::DynamicMapService+0x97
MapServerX!MapServerX::LoadDynamicMapServices+0x10d6
MapServerX!MapServerX::Connect+0x9a3
MapServerX!MapServerX::Construct+0xc55
ServerContainer!CServerObject::ConstructOrValidate+0x19f
ServerContainer!CServerObject::Construct+0x22
ntvinv!finvke+0x83

8000_to_8099_t8018_Access_MapServer_17616_0.dmp

MapServerX!MapServerLayerDrawingDescUtility::ConvertSymbol+0x2aa1
MapServerX!JSONSerializerCartoX::QueryJSONSymbol+0x17b
MapServerX!JSONSerializerCartoX::QueryJSONSimpleRenderer+0x188
MapServerX!JSONSerializerCartoX::QueryJSONFeatureRenderer+0x114
MapServerX!JSONSerializerCartoX::QueryJSONLayer+0x212e
MapServerX!JSONSerializerCartoX::AddJSONLayers+0x174
MapServerX!JSONSerializerCartoX::QueryJSONLayers+0xae
MapServerX!MapServerX::HandleREST_LayersResource+0x1163
MapServerX!MapServerX::HandleREST_RootResource+0x121e
AfCore!RESTSupport::RESTDispatcher::Dispatch+0xa7c
AfCore!RESTSupport::RESTDispatcher::HandleRESTRequest+0x135
AfCore!ServerObjectBase::HandleRESTRequest+0xdc
ntvinv!finvke+0x83

8000_to_8099_t8018_TRIM_Watercourse_MapServer_21908_0.dmp

MapServerX!MapServerXConverter::PrepareImageDescriptionDef+0x530
MapServerX!MapServerX::GetLegendInfo+0x22c
MapServerX!MapServerX::HandleREST_LegendResource+0x8ee
MapServerX!MapServerX::HandleREST_RootResource+0x121e
AfCore!RESTSupport::RESTDispatcher::Dispatch+0xa7c
AfCore!RESTSupport::RESTDispatcher::HandleRESTRequest+0x135
AfCore!ServerObjectBase::HandleRESTRequest+0xdc
ntvinv!finvke+0x83

8200_to_8299_t8282_Arky_Site_MapServer_22972_0.dmp

MapServerX!MapServerXConverter::PrepareImageDescriptionDef+0x530
MapServerX!MapServerX::GetLegendInfo+0x22c
MapServerX!MapServerX::HandleREST_LegendResource+0x8ee
MapServerX!MapServerX::HandleREST_RootResource+0x121e
AfCore!RESTSupport::RESTDispatcher::Dispatch+0xa7c
AfCore!RESTSupport::RESTDispatcher::HandleRESTRequest+0x135
AfCore!ServerObjectBase::HandleRESTRequest+0xdc
ntvinv!finvke+0x83

10000_to_10099_t10075_Routing_MapServer_13208_0.dmp

AfCore!FixedBlockHeap::Alloc+0x11e
Geometry!ESRI::Line::operator new+0x11
Geometry!ESRI::Line::Densify+0x284
Geometry!ESRI::EditPolyBase::BasicDensify+0xa10
Geometry!ESRI::EditPolyline::Densify+0x4d
Geometry!ESRI::PolyBase::Densify+0x94
Geometry!ProjectedCoordinateSystem::UpdatePCSHorizon+0xcd0
Geometry!ProjectedCoordinateSystem::GetPCSHorizon+0x38
Geometry!ESRI::GeometryEnvironment::s_ClipToPCSHorizon+0x13f
Geometry!ESRI::Envelope::ProjectExImpl+0xb3e
Geometry!ESRI::Envelope::ProjectEx5+0x77
Geometry!ESRI::MultiPatch::Project+0x31
MappingCore!Map::QueryFullExtent+0x2a7
MapServerX!MapServerXConverter::CreateMapServerInfo+0x548
MapServerX!MapServerX::GetServerInfo+0xa9
MapServerX!MapServerX::GetServerInfoInternal+0x213
MapServerX!MapCookerX::ConnectX+0x36e
MapServerX!MapServerX::ConnectToCache+0x22d
MapServerX!MapServerX::InitCacheProperties+0xf75
MapServerX!MapServerX::Construct+0xd6b
ServerContainer!CServerObject::ConstructOrValidate+0x19f
ServerContainer!CServerObject::Construct+0x22
ntvinv!finvke+0x83
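
Deciding whether two crashes are the same problem can be automated by grouping the dumps on their top frames. This is a minimal sketch, assuming the call stacks have already been extracted as text with the crashing frame first. The stacks in the example are cut to two frames for brevity, and the `depth` parameter is my own choice, not anything the debugger provides.

```python
from collections import defaultdict


def group_by_top_frames(dumps, depth=1):
    """Group dump names by the first `depth` frames of their call stacks.

    dumps maps a dump file name to its list of frames, crashing frame first.
    """
    groups = defaultdict(list)
    for name, frames in dumps.items():
        groups[tuple(frames[:depth])].append(name)
    return dict(groups)


# Stacks copied from above, truncated to their top two frames.
dumps = {
    "6515_Internal_t6515_Wildlife_MapServer_22660_0.dmp": [
        "MapServerX!MapServerLayerDrawingDescUtility::ConvertSymbol+0x2aa1",
        "MapServerX!JSONSerializerCartoX::QueryJSONSymbol+0x17b",
    ],
    "8000_to_8099_t8018_Access_MapServer_17616_0.dmp": [
        "MapServerX!MapServerLayerDrawingDescUtility::ConvertSymbol+0x2aa1",
        "MapServerX!JSONSerializerCartoX::QueryJSONSymbol+0x17b",
    ],
    "6515_Internal_6515_EATL_Imagery_Cached_MapServer_25216_0.dmp": [
        "MapServerX!MapServerXConverter::PrepareImageDescriptionDef+0x530",
        "MapServerX!MapServerX::GetLegendInfo+0x22c",
    ],
}

for signature, names in group_by_top_frames(dumps).items():
    print(len(names), signature[0])
```

With `depth=1` the two ConvertSymbol crashes land in one group even though one came through the unique value renderer and the other through the simple renderer; a larger `depth` separates them.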
by Anonymous User
Not applicable
Original User: btelliot

The symbols used by these map documents are pretty standard as far as web mapping documents go:
Simple Fill Symbol
Cartographic Line Symbol
Character Marker Symbol
Simple Marker Symbol

I've also noticed that sometimes publishing map documents will cause our site to crash.  I've added a new .DMP file from when the publishing service crashed.
RichardWatson
Frequent Contributor
System_PublishingTools_GPServer_36288_0.dmp

GpServer!GPServer::HandleBinaryRequest2+0x1f
ntvinv!finvke+0x83

basemaps_Base_Hillshade_MapServer_16552_0.dmp

MappingCoreLib!PreviewHelper::CreateCanvas+0x32c
MappingServicesLib!DynamicMapService::CreateSymbolLegendPatch+0x589
MappingServicesLib!DynamicMapService::CreateLegendPatch+0x102
MapServerX!MapServerX::GetLegendInfo+0x102f
MapServerX!MapServerX::GetLegendInfo+0x46e
MapServerX!MapServerX::HandleREST_LegendResource+0x8ee
MapServerX!MapServerX::HandleREST_RootResource+0x121e
AfCore!RESTSupport::RESTDispatcher::Dispatch+0xa7c
AfCore!RESTSupport::RESTDispatcher::HandleRESTRequest+0x135
AfCore!ServerObjectBase::HandleRESTRequest+0xdc
ntvinv!finvke+0x83
by Anonymous User
Not applicable
Original User: btelliot

Thanks for looking into this Richard.

I think analyzing call stacks is a bit over my head.

Is each row of text in the call stacks ordered chronologically?  For example in the call stack for basemaps_Base_Hillshade_MapServer_16552_0.dmp:

'MappingCoreLib!PreviewHelper::CreateCanvas+0x32c' would have run successfully, while the service crashes on 'AfCore!ServerObjectBase::HandleRESTRequest+0xdc' or 'ntvinv!finvke+0x83'?

I don't really know what to make of this.
RichardWatson
Frequent Contributor
You read the call stacks from the bottom up.  The function listed first (at the top) is the one that was executing when the crash occurred; each line below it is the function that called the one above.

I am sorry that they were not more helpful to you.

Are you using custom styles?  This is a long shot but we found that some of the styles we had used transparency and that caused very bad behavior.
by Anonymous User
Not applicable
Original User: btelliot

Hi Richard,

I just got off the phone with Peter Kovalchuk from ESRI Tech Support Canada.  He suggested the following solution:

---
From the ArcGIS Server Administrator Directory, select "Machines > [Machine Name] > Edit".

Increase App server maximum heap size (in MB): from 256 to 2048
Increase SOC maximum heap size (in MB): from 64 to 512

---

Peter said that even though we have 16GB of memory allocated to our virtual machines, our ArcServer software can't fully use it because each instance of javaw.exe is limited to 256MB.  When javaw.exe reaches its memory limit, our services will crap out and crash, bringing the site down with it (paraphrasing).
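
As a back-of-the-envelope check on what these numbers mean for a 16GB machine, the sketch below works out how many SOC processes would fit if every one of them grew to its full heap ceiling. It assumes the SOC heap limit applies separately to each SOC process, and it ignores the operating system and any non-heap memory, so treat it as a worst-case bound rather than a measurement. The function name is made up for illustration.

```python
def max_soc_processes(total_ram_mb, app_heap_mb, soc_heap_mb):
    """SOC processes that fit if each one reached its full heap ceiling.

    Assumption: the SOC heap ceiling applies per SOC process, and nothing
    else on the machine (OS, non-heap memory) is counted.
    """
    return (total_ram_mb - app_heap_mb) // soc_heap_mb


ram_mb = 16 * 1024  # 16GB per machine

defaults = max_soc_processes(ram_mb, app_heap_mb=256, soc_heap_mb=64)
raised = max_soc_processes(ram_mb, app_heap_mb=2048, soc_heap_mb=512)

print(defaults)  # 252
print(raised)    # 28
```

The heap ceiling is a cap, not the amount actually used, so a machine can run more processes than this as long as most of them stay well under the limit.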

Peter recommended that we increase the maximum heap size to the settings above for all of the machines on our site.  If I have time, I will try adding the other machines back to the site and increasing these values.  He also mentioned that we are one of the only clients having this issue with stability, and that they have generally seen an increase in reliability and stability from 10.0 to 10.1.

I will make these changes to our server and monitor it for stability over the next few days to see if there is an improvement.

Cheers,

Brett
RichardWatson
Frequent Contributor
Good to hear!  I really hope that this resolves the instability problems that you are seeing. 

Please report your findings so that everyone can benefit from them.

Did your server logs, as seen in ArcGIS Server Manager, have any errors in them?  In particular, I am wondering about out of memory/heap type errors.