...and we're back

Ok, we've made a change to a problematic query, so can people let me know if things are performing better? It was specifically related to posting, so post away in here to your heart's content and let me know if you see any more errors.

Fingers crossed...

Fingers (and legs) crossed too!
 
Ok testing, and will edit this too.


EDIT: Helloooooo
 
It seems to have sped up to its usual speed in the last few seconds.

Speed seems to fluctuate from a few seconds to immediate.



EDIT (about 5-10 minutes later): Better but still not running smoothly.
 
Test bazillion and two

EDIT: bazillion and three
 
I notice that the Post Quick Reply dialogue box is still behaving temperamentally.

I tried to post using the quick reply, but the timer clock that sometimes appears to the left of the button just kept turning without actually posting.

I also noticed that sometimes when I use the quick reply button the posting goes ahead but without a post actually appearing.


Hope I'm not leading you up the wrong path but I have experienced that in the last 30mins or so.
 
Ok I'll keep an eye on it. We were tweaking things a bit over the last 30 minutes so that might have caused it, but we're done for the night now. If it keeps happening to you over the next few hours, let me know.
 
I'll miss the duplicate posts, especially the guy in the Sneijder thread who insisted on letting RHD post but only if carefully moderated.
 
Well done Niall, as someone who also works with damned databases and transfers, I'm glad to hear things are back up and running!

I actually got a lot of work done in the last few days. I feel dirty.
 
Yep a brief hiccup this morning but it wasn't the site - the data centre was having some networking issues. The site should be solid from here on in.
 
Had no problems so far this morning

Cheers Niall
 
Just got the "She canee take the strain capt'n!" message a few times.
 
What exactly were you trying to do when you got them?
 
I don't know what Solius was trying to do, but instead of 'Mark forums read' I clicked on 'Wiki' and it doesn't work (she canee take the strain capt'n!)

Not a big problem I guess, the most important thing is we've got the forum back.
 
kin' hell, I don't post a lot here, but I am on this site almost all the time... it was horrible at work without redcafe.
 
Oh yeah, the wiki hasn't been fully set up after the maintenance. Forgot to mention that :)
 
I'd say the "canee take the strain" message is genuinely from the influx of people making up for lost time.
 
I also just got it, after clicking on General Forum.
 
I just got it. You can only take a guess as to why though, as we don't know how the system architecture is set up. It could be that the PHP code is getting impatient for a reply from the DB server. At least it seems that way to me, as the message comes up pretty quickly, not after, say, a 10 second wait.
 
After a ridiculously long stretch of downtime, we have successfully resolved a hardware fault with the database server and rolled back the database to a backup from approx midnight on Saturday. Obviously this means everything posted since then has been lost, but given the amount of data corruption in the database, it was the safest option.

I can't apologise enough for this downtime. It has been a hugely frustrating couple of days. What should have been a relatively straightforward move of server hardware from one data centre to another turned into a bit of a nightmare.

Hopefully now everything is back to normal and we can forget this ever happened ;) As ever, you guys are my best eyes and ears so if you come across any odd behaviour - error pages, double posts etc - please post here and I'll investigate. The more detail you can provide about the problem the better.

Thanks again for your patience. Hopefully this is the last time we have such a lengthy outage.

Update:
So I spoke too soon. Obviously we're still having a lot of problems: error pages, double posts. We're actively looking into it and hopefully a solution is not far away.

Update 2:
I think we've tracked down the problem: a rogue database query that was using the wrong index, which led to long-running queries that would eventually block all the database threads. As ever, can people please let me know if things are working better now, specifically when it comes to posting and editing posts. Thanks.
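
For anyone curious what "using the wrong index" can look like, here is a minimal sketch. It is not the actual query or schema from the site: the `post` table, both index names and the sample data are made up for illustration, and SQLite (via Python's built-in `sqlite3` module) stands in for whatever database the forum really runs on. It asks the planner how it would run the same query when pinned to a poorly matching index and when pinned to a well matching one.

```python
import sqlite3


def plan_for(conn, sql, params=()):
    """Return the planner's human-readable steps for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each row holds the descriptive text.
    return [row[-1] for row in rows]


conn = sqlite3.connect(":memory:")

# Hypothetical schema: a posts table with two competing indexes.
conn.executescript("""
    CREATE TABLE post (
        id INTEGER PRIMARY KEY,
        thread_id INTEGER,
        user_id INTEGER,
        body TEXT
    );
    CREATE INDEX idx_post_user ON post (user_id);
    CREATE INDEX idx_post_thread_user ON post (thread_id, user_id);
""")

# Made-up data: many threads, only a handful of users.
conn.executemany(
    "INSERT INTO post (thread_id, user_id, body) VALUES (?, ?, ?)",
    [(i % 500, i % 5, "hello") for i in range(20000)],
)

where = "WHERE thread_id = ? AND user_id = ?"

# Pinned to the index on user_id only: every post by that user has to be
# visited and then filtered by thread.
bad_plan = plan_for(
    conn, f"SELECT id FROM post INDEXED BY idx_post_user {where}", (42, 2)
)

# Pinned to the composite index: both conditions are answered by the index.
good_plan = plan_for(
    conn, f"SELECT id FROM post INDEXED BY idx_post_thread_user {where}", (42, 2)
)

print(bad_plan)
print(good_plan)
```

The plan text names the index each variant uses and which conditions the index lookup covers. On a busy table, the first variant touches far more rows per query, which is the sort of thing that can pile up into long-running queries. Other databases have their own equivalents (MySQL, for example, has `EXPLAIN` and index hints such as `FORCE INDEX`), but the idea is the same.
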

Thanks for the great work you and your team do, Niall.
 
I actually thought I might have broken the caf. I was deleted, and then when the backup was done and the caf came back, I posted and it broke again!
 
Do you remember what you were doing when you got the error, Weaste?

The 'canee take the strain' error pages come from Varnish - will dig into the logs and see if they reveal anything.