Datastore replication issues - Primary has no server logs, standby has logs

01-16-2024 07:22 AM
Thomas_Puthusserry
Occasional Contributor

Hi all,

I have a datastore associated with an HA deployment (10.9) that has been switched off for a while, and I am in the process of upgrading it to 11.1. During the regular checks, I found that the datastores are not synchronised. The primary datastore is not recording any logs in the usual /content/logs/servername/server folder, and describedatastore.bat shows an empty log location:

Thomas_Puthusserry_0-1705417100883.png
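One quick way to confirm whether a machine is still writing logs is to look at the newest file under its log folder. The sketch below is only an illustration: the function name newest_log_time is made up for this example, and the folder you pass in is whatever log location applies to your own install (the path is not taken from this thread).

```python
from datetime import datetime
from pathlib import Path
from typing import Optional, Union


def newest_log_time(log_dir: Union[str, Path]) -> Optional[datetime]:
    """Return the modification time of the newest file under log_dir.

    Returns None if the folder does not exist or contains no files,
    which matches the "no logs are being recorded" symptom.
    The folder is supplied by the caller; nothing here is specific
    to ArcGIS Data Store.
    """
    folder = Path(log_dir)
    if not folder.is_dir():
        return None
    # Walk the folder recursively and collect file modification times.
    mtimes = [p.stat().st_mtime for p in folder.rglob("*") if p.is_file()]
    if not mtimes:
        return None
    return datetime.fromtimestamp(max(mtimes))
```

Running this against the log folder on both the primary and the standby shows at a glance which machine has stopped writing.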

The standby datastore machine seems to be working properly, as shown below:

Thomas_Puthusserry_1-1705417277708.png

However, the datastore mode on the standby is READONLY.

I tried to make the standby data store VM the primary via the ArcGIS Server Admin directory (server/admin/data/items/enterpriseDatabases/AGSDataStore_ds_402mc3l6/machines), but got the following error:

'Failed to change role for data store machine 'NG-DATASTR-SBY.AOW4LVHRJSBULNMTLLXUXOUGKD.ZX.INTERNAL.CLOUDAPP.NET'.
Caused by: Validation checks on data store machine 'NG-DATASTR-SBY.AOW4LVHRJSBULNMTLLXUXOUGKD.ZX.INTERNAL.CLOUDAPP.NET' failed.''

The full logs are attached. 

@HenryLindemann anything you can point me to?

I would like to have the datastore sync working properly before I upgrade to 11.1.

There is no critical data in this instance, so I can even remove and reconfigure it. However, I am not sure how to disassociate the datastore from the GIS Server.

Thanks for any suggestions.

Thomas

 

hlindemann
New Contributor III

Hi @Thomas_Puthusserry, my first thought would be a firewall issue. The read-only mode is OK; it is the default state.

I would use PowerShell to check whether ports 2443, 9876 and 6443 are reachable from both machines, e.g.:

"Test-NetConnection datastore-1 -port 9876" on datastore 2 

"Test-NetConnection datastore-2 -port 9876"  on datastore 1

"Test-NetConnection datastore-1 -port 2443" on datastore 2 

"Test-NetConnection datastore-2 -port 2443"  on datastore 1

"Test-NetConnection arcgisserver -port 6443" on datastore 2 

"Test-NetConnection arcgisserver -port 6443"  on datastore 1

As for the logs that fail to write, I would first just reapply Windows security on the file structure; it might just be a permission problem.
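If you would rather script the port checks listed above than run them one by one, here is a minimal sketch in Python. It only attempts a plain TCP connection, similar in spirit to Test-NetConnection. The function names are made up for this example, and the host names you pass in (datastore-1, datastore-2, arcgisserver) are placeholders for your own machines.

```python
import socket
from typing import Dict, Iterable, Tuple


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def check_targets(
    targets: Iterable[Tuple[str, int]], timeout: float = 3.0
) -> Dict[Tuple[str, int], bool]:
    """Check each (host, port) pair and return a reachability map."""
    return {(host, port): port_is_open(host, port, timeout) for host, port in targets}
```

For example, on datastore 1 you could call check_targets([("datastore-2", 9876), ("datastore-2", 2443), ("arcgisserver", 6443)]) and look for any pair that maps to False, then repeat from datastore 2 against datastore-1.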

Hope it helps 
regards

Henry

Thomas_Puthusserry
Occasional Contributor

Thanks @hlindemann. I have now checked the ports: port 6443 is not open on these machines. See the attached screenshots from our test and dev machines (test is 11.1 and dev is 10.9).

"As for the logs that fail to write, I would first just reapply Windows security on the file structure; it might just be a permission problem."

I have updated the folder permissions for the arcgis service account, but nothing changed. No logs are being recorded in the server folder on the primary machine.

Regards Thomas

 

hlindemann
New Contributor III

Hi @Thomas_Puthusserry, port 6443 should be tested against your ArcGIS Server DNS name, i.e. the machine where you installed it. Can you also test from your standby to your primary? I did not see that in your test results.

Regards Henry

Thomas_Puthusserry
Occasional Contributor

 

Thanks @hlindemann, I missed that part. Please see above; that looks fine. I have also checked ports 2443 and 9876 on both data store VMs from the ArcGIS Server machine, and they are accessible.

Thanks Thomas

Thomas_Puthusserry
Occasional Contributor

I have been looking into the HA deployment of the data store and have done some additional testing. This is where I have got to:

1. After completely reinstalling the datastore on both the primary and standby machines, the logs are recorded properly.

2. I can demote the primary to standby and promote the standby to primary via the server administrative REST service.

3. I published a hosted service with DSVM1 as primary and DSVM2 as standby, and again in the reverse order (DSVM2 as primary, DSVM1 as standby). The services remain accessible when the VM roles are swapped.

What I am not clear about is what exactly happens when one of the VMs becomes unhealthy and goes down for a while. In the scenario I tested, DSVM1 was primary, and when it went down, DSVM2 just reported:

ArcGIS Data Store has detected a replication failure.

DSVM2 did not step up as primary. Is this the expected behavior?

@hlindemann any suggestions?

Thomas_Puthusserry
Occasional Contributor

I have now been able to resolve this:

The issue was failover_on_primary_stop=false (the default). After talking to @AndrewCord and colleagues from Esri UK, I learned that this has to be set to true on both the primary and standby VMs. Once that is set, the failover process works fine.
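For anyone who wants to double-check the setting on both machines, here is a rough read-only sketch. It assumes the setting is stored as a top-level boolean in a JSON document. This thread does not say which file holds it or in what format, so treat that as an assumption and confirm it against the Esri documentation for your version. The function names are made up for this example.

```python
import json
from typing import Dict, List

FLAG = "failover_on_primary_stop"


def failover_enabled(config_text: str) -> bool:
    """Return True only if the flag is present and is the JSON boolean true.

    Assumes the settings are a JSON object with the flag at the top level.
    """
    settings = json.loads(config_text)
    return settings.get(FLAG) is True


def machines_missing_failover(configs: Dict[str, str]) -> List[str]:
    """Given {machine name: settings text}, list machines where the flag is not true."""
    return [name for name, text in configs.items() if not failover_enabled(text)]
```

Passing in the settings text copied from the primary and the standby returns the machines that still need the flag changed. An empty list means both are set to true.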

 
