ARM ISSUE losing connectivity

We investigated the ARM database issue on the client's server today and observed the following behavior.

 

-          Reports are unable to run due to a query timeout

-          The ARM database is constantly disconnecting

-          The ARM queue reaches its max size of 1000 every 300 seconds, with many items still waiting in the queue.

-          SQL audit files are being created frequently, resulting in nearly 500 MB of audit files that need to be appended to the database.

 

A brief explanation of these SQL audit files:

When EFT loses its connection to the database, it writes all of the audit data to local .sql files. Once EFT can connect to the database again, it appends these .sql files to the database and clears them from the local drive. If there are many .sql files, EFT can experience performance issues, especially in the admin console.
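The spool-and-replay behavior described above can be illustrated with a small toy model. This is only a sketch of the general pattern, not EFT's actual implementation: the class name `SpoolingWriter`, the file name `audit_spool.sql`, and the in-memory list standing in for the database are all hypothetical.

```python
import os
import tempfile


class SpoolingWriter:
    """Toy writer: sends records to a database, or spools them locally while offline."""

    def __init__(self, spool_path):
        self.spool_path = spool_path
        self.database = []       # hypothetical stand-in for the real database
        self.connected = True

    def write(self, record):
        if self.connected:
            self.database.append(record)
        else:
            # Connection is down: append the record to the local spool file.
            with open(self.spool_path, "a", encoding="utf-8") as spool:
                spool.write(record + "\n")

    def reconnect(self):
        """Restore the connection, replay spooled records, then clear the spool."""
        self.connected = True
        if os.path.exists(self.spool_path):
            with open(self.spool_path, encoding="utf-8") as spool:
                self.database.extend(line.rstrip("\n") for line in spool)
            os.remove(self.spool_path)


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        writer = SpoolingWriter(os.path.join(tmp, "audit_spool.sql"))
        writer.write("event 1")
        writer.connected = False            # simulate a dropped connection
        writer.write("event 2")
        writer.write("event 3")
        spooled = os.path.exists(writer.spool_path)
        writer.reconnect()                  # replay the backlog
        cleared = not os.path.exists(writer.spool_path)
        return writer.database, spooled, cleared
```

In this model the replay happens on top of normal writes, which is why a large backlog of spooled files adds to the load after every reconnect.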

 

What we see occurring on this server is that EFT cannot keep up with the work it needs to do. The ARM queue is always full, so EFT cannot write to the database quickly enough. It then loses the connection, writes the data to the .sql files, and reconnects. At that point it must do double the work: it must write the usual audit data to the database AND append the .sql files to the database as well.

 

I strongly suggest increasing the ARM queue size so that EFT can keep up with the 100k+ records that are sent to the database every 5 minutes. I also suggest copying all the .sql files in the audit folder over to the database server and manually appending them to the database so that EFT does not have to.

 

Here is the documentation to change the ARM queue size:

https://kb.globalscape.com/KnowledgebaseArticle11099.aspx

I suggest increasing it to 50,000 or 100,000 to keep up with the traffic.
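As a rough sanity check on those numbers, the short calculation below estimates how many seconds of average traffic each queue size can hold. It is a back-of-the-envelope sketch only: it assumes the 100k records arrive evenly across the 5 minutes, which real traffic will not do, and the function name is made up for illustration.

```python
# Figures taken from the observations above; these are estimates, not measurements.
RECORDS_PER_WINDOW = 100_000   # "100k+ records ... every 5 minutes"
WINDOW_SECONDS = 300           # 5 minutes


def seconds_of_headroom(queue_size: int) -> float:
    """Seconds of average traffic that a queue of this size can buffer."""
    records_per_second = RECORDS_PER_WINDOW / WINDOW_SECONDS
    return queue_size / records_per_second


for size in (1_000, 50_000, 100_000):
    print(f"queue={size:>7,}  headroom={seconds_of_headroom(size):6.1f} s")
```

Under that assumption, the current queue of 1000 covers only about 3 seconds of average traffic, while 50,000 and 100,000 cover roughly 150 and 300 seconds.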

 

This documentation shows how to increase the timeout for reports:

https://kb.globalscape.com/KnowledgebaseArticle10627.aspx

This may not be necessary after we increase the ARM queue size, but it can’t hurt to increase this as well.

 

This issue may also be contributing to the issue in case 65150; however, I will troubleshoot that behavior as well.