slefain
slefain PowerDork
8/1/20 11:55 a.m.

I'm trying to help a friend with his site. Fresh WordPress build, runs perfectly on the test environment. But put it on the production server (with about 100 simultaneous web users) and it works great for about an hour. After an hour or so the open database connections start creeping upwards, response times start to climb, and the site grinds to a halt. No slow queries reported. No heavy DB server load. Just a growing mound of database connections that smothers it.


Caching is enabled, plenty of drive space on AWS, RAM usage is fine. We smoke tested it on the test server for days, no issues. Put it live to the public and it will run like a cheetah until it just collapses. Damnedest thing I've seen.

Keith Tanner
Keith Tanner GRM+ Member and MegaDork
8/1/20 12:36 p.m.

I don't know WP specifically, but it sounds as if it's not cleaning up after itself. I'd dial down the connection wait timeout. Default is a ridiculous number like 6 hours. Seems to me you could probably get away with something more like 5 minutes unless WP requires a constantly open connection. 
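For what it's worth, here is a rough sketch of how you could see what a shorter timeout would actually reap before changing anything. It assumes you have already pulled the rows from MySQL's `SHOW PROCESSLIST` into Python dicts with whatever client you use; the `idle_connections` helper and the sample rows are made up for illustration. MySQL's documented default for `wait_timeout` is 28800 seconds (8 hours).

```python
def idle_connections(rows, wait_timeout=300):
    """Return processlist rows that have been idle longer than wait_timeout seconds.

    In SHOW PROCESSLIST output an idle connection has Command == "Sleep",
    and Time is the number of seconds it has spent in that state.
    """
    return [
        row for row in rows
        if row["Command"] == "Sleep" and int(row["Time"]) > wait_timeout
    ]


# Hypothetical sample data, shaped like SHOW PROCESSLIST rows.
sample_rows = [
    {"Id": 11, "User": "wp", "Command": "Sleep", "Time": 4},
    {"Id": 12, "User": "wp", "Command": "Sleep", "Time": 1900},
    {"Id": 13, "User": "wp", "Command": "Query", "Time": 2},
    {"Id": 14, "User": "wp", "Command": "Sleep", "Time": 7200},
]

# Connections that a 5-minute wait_timeout would have closed already.
stale = idle_connections(sample_rows, wait_timeout=300)
print(f"{len(stale)} of {len(sample_rows)} connections idle > 5 min")
```

If most of the pile-up turns out to be long sleepers, lowering the timeout (for example `SET GLOBAL wait_timeout = 300;`, which only affects connections opened after the change) should thin it out. If the connections are busy rather than sleeping, the timeout won't help.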

slefain
slefain PowerDork
8/1/20 12:58 p.m.
Keith Tanner said:

I don't know WP specifically, but it sounds as if it's not cleaning up after itself. I'd dial down the connection wait timeout. Default is a ridiculous number like 6 hours. Seems to me you could probably get away with something more like 5 minutes unless WP requires a constantly open connection. 

Thanks, we'll add that to the list. We're down to throwing poop against the wall at this point and hoping something sticks.

trumant (Forum Supporter)
trumant (Forum Supporter) GRM+ Member and Reader
8/1/20 2:13 p.m.

Are test and production on the same operating system version, package versions, and kernel parameters?

trumant (Forum Supporter)
trumant (Forum Supporter) GRM+ Member and Reader
8/1/20 2:14 p.m.

Also, where is the MySQL DB running? On the same host as WP or on RDS? What EC2 instance size/type?

trumant (Forum Supporter)
trumant (Forum Supporter) GRM+ Member and Reader
8/1/20 2:19 p.m.

Asked about kernel parameters because it could be something as simple as hitting a maxconns limit. For example, see https://levelup.gitconnected.com/linux-kernel-tuning-for-high-performance-networking-high-volume-incoming-connections-196e863d458a?gi=b4ebf8a15766
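A quick way to compare the two boxes is to snapshot a few sysctls on each and diff them. The sketch below reads values straight from `/proc/sys` on Linux. The three keys are real sysctl names, but which ones matter for this site is a guess on my part, and the helper names are made up for illustration.

```python
from pathlib import Path

# Real Linux sysctl names; whether they are the relevant ones here is an assumption.
KEYS = ["net.core.somaxconn", "net.ipv4.tcp_max_syn_backlog", "fs.file-max"]


def read_sysctls(keys, root="/proc/sys"):
    """Read sysctl values from the /proc/sys tree; unreadable keys map to None."""
    values = {}
    for key in keys:
        # "net.core.somaxconn" lives at /proc/sys/net/core/somaxconn
        path = Path(root, *key.split("."))
        try:
            values[key] = path.read_text().strip()
        except OSError:
            values[key] = None
    return values


def diff_settings(test_env, prod_env):
    """Return {key: (test_value, prod_value)} for every key whose value differs."""
    keys = sorted(set(test_env) | set(prod_env))
    return {
        key: (test_env.get(key), prod_env.get(key))
        for key in keys
        if test_env.get(key) != prod_env.get(key)
    }
```

Run `read_sysctls(KEYS)` on the test server and on production, then pass the two dicts to `diff_settings` to see only what differs.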

slefain
slefain PowerDork
8/1/20 2:22 p.m.

All excellent questions, trumant. I will try to find out. My friend sent me a screenshot of the server dashboard when it croaked:

[screenshot of the server dashboard not included]

trumant (Forum Supporter)
trumant (Forum Supporter) GRM+ Member and Reader
8/1/20 2:27 p.m.

While you are checking settings you should also check if you are configured to run MySQL connection pooling or not on the WP/PHP side of things and if so what your connection pool limit is set to.
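Some back-of-the-envelope arithmetic goes along with that check. As far as I know, stock WordPress opens one non-persistent connection per PHP request, so the worst case is roughly one connection per PHP worker. The host and worker counts below are hypothetical; 151 is MySQL's documented default for `max_connections`.

```python
def peak_db_connections(web_hosts, workers_per_host, conns_per_worker=1):
    """Worst case: every PHP worker on every host holds its connections at once."""
    return web_hosts * workers_per_host * conns_per_worker


MYSQL_DEFAULT_MAX_CONNECTIONS = 151  # MySQL's documented default

# Hypothetical numbers; plug in the real worker limit (e.g. PHP-FPM's pm.max_children).
peak = peak_db_connections(web_hosts=2, workers_per_host=100)
headroom = MYSQL_DEFAULT_MAX_CONNECTIONS - peak
print(f"peak={peak}, headroom={headroom}")
```

A negative headroom means the web tier can ask for more connections than the database will hand out, which would look a lot like connections piling up until the site stalls.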

GameboyRMH
GameboyRMH GRM+ Member and MegaDork
8/1/20 7:08 p.m.

I'd be as confused as you are, but Keith's idea of decreasing wait_timeout is a good one. Even 5 minutes is plenty; I'd think the optimal setting would be just above your PHP execution time limit, although that could break any query browser tools that use one long-running connection. Of course, that's just treating the symptom of something not cleaning up after itself, but it could work.

ojannen
ojannen GRM+ Member and Reader
8/1/20 7:28 p.m.

Do you have an idea of how many requests time out, or what percentage of responses are errors? Does the increased DB load coincide with increased server load?
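If nobody has that number handy, it can be pulled from the web server's access log. Here is a minimal sketch, assuming the log is in the common/combined format where the status code follows the quoted request line. The sample lines and the `error_rate` helper are made up for illustration.

```python
import re
from collections import Counter

# In common/combined log format the status code follows the quoted request line.
STATUS_RE = re.compile(r'" (\d{3}) ')


def error_rate(lines):
    """Return (percentage of 5xx responses, Counter of all status codes seen)."""
    statuses = Counter()
    for line in lines:
        match = STATUS_RE.search(line)
        if match:
            statuses[match.group(1)] += 1
    total = sum(statuses.values())
    errors = sum(n for code, n in statuses.items() if code.startswith("5"))
    percent = 100.0 * errors / total if total else 0.0
    return percent, statuses


# Hypothetical log lines for illustration only.
sample_log = [
    '203.0.113.5 - - [01/Aug/2020:12:00:01 +0000] "GET / HTTP/1.1" 200 5123 "-" "curl/7.68.0"',
    '203.0.113.6 - - [01/Aug/2020:12:00:02 +0000] "GET /about HTTP/1.1" 200 4410 "-" "curl/7.68.0"',
    '203.0.113.7 - - [01/Aug/2020:13:05:10 +0000] "GET / HTTP/1.1" 504 167 "-" "curl/7.68.0"',
    '203.0.113.8 - - [01/Aug/2020:13:05:11 +0000] "GET /blog HTTP/1.1" 502 150 "-" "curl/7.68.0"',
]

percent, statuses = error_rate(sample_log)
print(f"{percent:.1f}% of responses were 5xx")
```

Bucketing the same count by hour would show whether the errors start before or after the connection count takes off.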

slefain
slefain PowerDork
8/1/20 8:12 p.m.

Changing the timeout to 15 seconds didn't work; connections still spiked.

He's going to rebuild the server again, but change server types.

I know jack-all about modern web servers, but my Google Fu is strong so I've been trying to help my friend noodle this one out. I'll report back when the new server build is tested.
