The Second Order Effect

Twitter Weekly Updates for 2010-03-28

by on Mar.28, 2010, under Uncategorized

Tasks keeping me away from what I’d rather be doing this past few days :

Did some updates on tardis, hygiene our general purpose linux server (all the servers have names from Dr. Who) – and updated the Dell OpenManage tools. Reboot. Simple, allergy right?

The machine responded to pings, viagra 100mg and nothing else.

It was 2pm. Traffic in LA becomes a nightmare somewhere around 3:30pm. Jumped in the car to crash cart the server (because the tech on site didn’t know what a crash cart was and when I described it – said ‘that sounds complicated.’ Not happy with our colo right now.)

Arrived to find the server paused — not hung, just waiting, on a “Starting openwsmand…” message. Turns out, there’s a new service in the latest version of OpenManage that requires a new OpenSSL certificate for functioning. This, on it’s own, is okay – but two problems :

  1. The cert is generated by the boottime init.d script if it’s missing.
  2. The cert is generated using /dev/random as the entropy source

The latter is a security concern, but should never be paired with the top. I’ve had servers hang for hours waiting for /dev/random to generate enough entropy.

This was easily enough fixed, but the stress of driving like a madman, (and back) to make it before traffic shut down all routes to West LA was something I could do without. I’m going to patch against the project and see if they’ll fix this ridiculous behavior, and I’ve started looking into a remote KVM solution like kvm2ethernet – just call the colo and ask them to plug into a particular server. Thanks to this post for cutting the debugging time massively.


The other issue was that customers weren’t able to sign up for bits of today because there was a lock residing on one of our DB tables. We purge our Database (about 50G) monthly, but the cruft of leftover billing records take up huge amounts of space and deleting them can be a problem – long table scans, and Microsoft SQL Server does a table lock – and that’s the ball game. The credit card server can’t track that a valid charge was placed, so it terminates instead. We started with a query from our Marketer/Data Analysis guy looking like :


DELETE from billing where
and start_date_time >= '18-JUL-2009'
and start_date_time < '24-JUL-2009'
and node_type not in (3,4,5)

(He wanted to delete up until 30-AUG, but was slicing it up in the hopes of avoiding this problem.)

The above is about 300,000 rows.

A few issues, however :

  1. billing has a clustered index around account_id. This makes perfect sense, the data is almost always referenced with respect to a particular customer, and ensuring that those records are all adjacent to each other on storage is common sense. However, the above query would be running all around the 9Gb table removing rows.
  2. billing doesn't have an index on node_type, meaning each row has to be fetched before it can be selected for deletion.

The latter is less of a problem if the select and the delete were separated, but the table lock existed throughout the query.

In looking to solve it, one approach was to force MSSQL to use ROWLOCK (and disable escalation from ROWLOCK to TABLELOCK), but this was going to be a performance hit. I considered trying NOLOCK, but I wasn't sure of what the ramifications would be, and I really didn't want to spend hours fixing a crash database/corrupted data.

The final solution was to carve up the deletes into more manageable bits. SQL Server Interactive can be set to execute on a limited number of rows - so we could delete, say, 1000 rows, pause (let other things have access to the table), and then continue. And, now that we know about the clustering, why make the DELETE run across the entire table - why not allow it to trim one section at a time? And thusly we have...
(I'm not really familiar with SQL Server Syntax, so this is a first effort. It's lousy code, but a decent query)


DECLARE @ACCOUNTSTEP int
DECLARE @WINDOWMIN int
DECLARE @WINDOWMAX int
DECLARE @ACCOUNTMAX int
DECLARE @WINDOWCOUNT int
DECLARE @STEPS int

SET @STEPS = 20

SET @ACCOUNTMAX = (select max(account_id) from accounts)
SET @ACCOUNTSTEP = (ceiling(@ACCOUNTMAX * 1/@STEPS))
SET @WINDOWMAX=0
SET @WINDOWCOUNT=0
PRINT CAST(@STEPS AS CHAR(10)) + '+1 Steps of Size ' + CAST(@ACCOUNTSTEP AS CHAR(10)) + ' leading up to ' + CAST(@ACCOUNTMAX AS CHAR(10))
SET ROWCOUNT 2000

moreaccounts:
SET @WINDOWCOUNT = @WINDOWCOUNT + 1
SET @WINDOWMIN = @WINDOWMAX
SET @WINDOWMAX = @ACCOUNTSTEP * @WINDOWCOUNT
IF @WINDOWMIN > @ACCOUNTMAX GOTO done
PRINT 'Now processing accounts between ' + CAST(@WINDOWMIN as char(10)) + ' and ' + CAST(@WINDOWMAX as char(10))
deletemore:
PRINT 'Deleting 2000 rows.'
waitfor delay '0:0:01'
DELETE from billing where account_id >= @WINDOWMIN and account_id < @WINDOWMAX
and start_date_time >= '18-JUL-2009'
and start_date_time < '30-AUG-2009'
and node_type not in (3,4,5)
if @@ROWCOUNT > 0 GOTO deletemore
PRINT 'Done with this set. Sleeping.'
waitfor delay '0:0:03'
GOTO moreaccounts
done:

Which effectively breaks out to :

  • Get the total number of accounts
  • For every twentieth window of accounts
    • Delete 2000 rows. Wait a second. Repeat until all rows gone.
  • Wait three seconds.

Ran the query, no locking issues at all, Table purged in about an hour. Victory!


I wanted to get Zabbix monitoring working for our Asterisk boxen - I went a bit overboard, and resulted in zasterisk.

Powered by Twitter Tools

Powered by Twitter Tools

  • Somehow I'm missing my house keys, medical
    my MLS keyfob, my favorite scarf and a pair of jeans. Last seen on Thurs.In Boston. I think I'm going mad #
  • Feel a little bit like the Doctor. I have a fab coat, a companion, and am exploring a strange land. (Vegas) #
  • @egradman Skype uses wideband codecs past the 8khz range that the PSTN supports. It's not your carrier or iPhone sucks,it's that phones suck in reply to egradman #
  • I have yet to find an ailment that can't be mended by 80s clubbing in New Orleans. And absinthe. #
  • Frenchman St in NOLA suffering a power outage. Live music and bartenders just kept on going, barely skipping a beat. Love it here. #
  • @stevenswei was in Vegas as an overnight stay on my way to NOLA. We should hang out next week whenever I finally decide to come home. in reply to stevenswei #
  • @Colleenky yay! in reply to Colleenky #

Powered by Twitter Tools

:

Leave a Reply

Looking for something?

Use the form below to search the site:

Still not finding what you're looking for? Drop a comment on a post or contact us so we can take care of it!

Visit our friends!

A few highly recommended friends...