We’ve had instances where we’ve had to restart the appserver services because users can’t log in but when we check in PET, the status is Active. What could be causing the issue with not allowing users to log in when the status is available and active? Restarting the services resolves the issue but trying to determine what is causing the issue of this “false positive” status. The only error message in the log files we see around the time the issue occurs is below.
[17/07/23@16:34:31.086-0500] P-008776 T-007892 1 AS – as/activate.p activation procedure ended with an STOP condition. (8026)
[17/07/23@16:34:31.086-0500] P-008776 T-007892 1 AS – (Procedure: ‘as/deactivate.p’ Line:111) Invalid or inappropriate handle value given to RUN…IN statement. Procedure ‘as/deactivate.p’:111. (2128)
[17/07/23@16:34:48.157-0500] P-008776 T-007892 1 AS – (Procedure: ‘Activate manager/Session.p’ Line:3862) Tried to access a record without a valid id. (6167)
[17/07/23@16:34:48.159-0500] P-008776 T-007892 1 AS – as/activate.p activation procedure ended with an STOP condition. (8026)
[17/07/23@16:34:48.159-0500] P-008776 T-007892 1 AS – (Procedure: ‘as/deactivate.p’ Line:111) Invalid or inappropriate handle value given to RUN…IN statement. Procedure ‘as/deactivate.p’:111. (2128)
[17/07/23@16:38:47.985-0500] P-001764 T-008796 1 AS – (Procedure: ‘Activate manager/Session.p’ Line:3718) Lock wait timeout of 180 seconds expired (8812)
[17/07/23@16:38:47.985-0500] P-001764 T-008796 1 AS – as/activate.p activation procedure ended with an STOP condition. (8026)
[17/07/23@17:47:05.481-0500] P-008476 T-009620 1 AS – (Procedure: ‘DeleteSession manager/Session.p’ Line:4204) Lock wait timeout of 180 seconds expired (8812)
[17/07/23@17:47:05.482-0500] P-008476 T-009620 1 AS – as/disconnect.p disconnect procedure ended with an STOP condition. (8026)
1 - Enable deadlock tracing on your SQL server to help correlate some of the issues and processes involved
2 - Look at dataserv.lg in oe102a_WRK folder and see if there are some errors pointed to SQL.
The lock wait timeout is usually a pretty good indicator that you have some backend issues or some large processes consuming most of your resources. Also make sure you have no intermittent network disconnects between app server and SQL (even though this usually shows more in the log)
Any thoughts as to why the appservers would remain active but no users can log in? We have monitoring on when the services go down but this would not show up because it is active but no users can log in.
I do not have a detailed explanation of why this happens, maybe @aidacra can help out. This has been an issue for a long time and the app server does not gracefully shutdown or reconnect when there are SQL connectivity or other issues with SQL. There are other ways to monitor the app server using Powershell or log parsers that can provide ways to detect an issue like this since a service or process monitor will just let you down.
I agree with Dan, I suspect you are getting some sort of blocking on either your userfile or webuser table. It looks like those tables are the ones that store the date last logged in. I am not sure if they get updated on logout or login.
Judicious use of SQL profiler might help also
I’d also do a quick check of the tips or answer books in Epicweb using phrases such as login, userfile, webuser and see what comes up.
If this is an ongoing thing, I would be reviewing your SQL and Appserver configurations and review any networking if you are on a split server configuration. Have you done any Windows updates lately? Then perhaps you can nail it down to a time frame that the issues started occurring…
We have four load balancing appservers so that if one appserver goes down they can still log into one of the other appservers. We are looking at why our load balancer isn’t working as expected but we are trying to determine the cause why the appserver suddenly went down without noticing in PET. Even if our load balancer was working, there would be no way to determine that specific appserver is down until we do our weekly restarts. Would the blocking issue cause the appserver to go down so other users can’t log in also?
Your load balancer has health checks to determine if the app servers are available and some are more robust than others. I assume if you shut off an application server it works fine? Are you using a load balancing appliance (i.e. F5) or possibly NLB? Blocking is somewhat normal but when deadlocking occurs, that is not. The deadlocking and termination of the certain database processes is what will make it so no other users can access that application server.
Chia it is difficult to say unless you inspect SQL server at the time of the problem, you should be able to tell with an sp_who2 which will show what spids are blocking then a dbcc input buffer() to look at what the blocking spid is running at the time. This is just one method, tools like who is active are also helpful. Sp_who2 will also show the host that is connected on the blocking process. If anyone has not used it I encourage you to give it a go.
On top of trying to troubleshoot in real-time, like @Hally mentioned, I would encourage you to enable the 1204 and 1222 trace flags on your SQL server so you get the full output of the deadlocks. You can also do this using SQL profiler and enable the deadlock chain tracing. There are also some very simple ways to automate and monitor the blocking for SQL (Google is your best friend for borrowing what someone else created). Just keep in mind that some blocking is perfectly normal in the SQL world.