The Second Order Effect

Archive for March, 2010

Twitter Weekly Updates for 2010-03-28

by on Mar.28, 2010, under Uncategorized

Tasks keeping me away from what I’d rather be doing this past few days :

Did some updates on tardis, hygiene our general purpose linux server (all the servers have names from Dr. Who) – and updated the Dell OpenManage tools. Reboot. Simple, allergy right?

The machine responded to pings, viagra 100mg and nothing else.

It was 2pm. Traffic in LA becomes a nightmare somewhere around 3:30pm. Jumped in the car to crash cart the server (because the tech on site didn’t know what a crash cart was and when I described it – said ‘that sounds complicated.’ Not happy with our colo right now.)

Arrived to find the server paused — not hung, just waiting, on a “Starting openwsmand…” message. Turns out, there’s a new service in the latest version of OpenManage that requires a new OpenSSL certificate for functioning. This, on it’s own, is okay – but two problems :

  1. The cert is generated by the boottime init.d script if it’s missing.
  2. The cert is generated using /dev/random as the entropy source

The latter is a security concern, but should never be paired with the top. I’ve had servers hang for hours waiting for /dev/random to generate enough entropy.

This was easily enough fixed, but the stress of driving like a madman, (and back) to make it before traffic shut down all routes to West LA was something I could do without. I’m going to patch against the project and see if they’ll fix this ridiculous behavior, and I’ve started looking into a remote KVM solution like kvm2ethernet – just call the colo and ask them to plug into a particular server. Thanks to this post for cutting the debugging time massively.


The other issue was that customers weren’t able to sign up for bits of today because there was a lock residing on one of our DB tables. We purge our Database (about 50G) monthly, but the cruft of leftover billing records take up huge amounts of space and deleting them can be a problem – long table scans, and Microsoft SQL Server does a table lock – and that’s the ball game. The credit card server can’t track that a valid charge was placed, so it terminates instead. We started with a query from our Marketer/Data Analysis guy looking like :


DELETE from billing where
and start_date_time >= '18-JUL-2009'
and start_date_time < '24-JUL-2009'
and node_type not in (3,4,5)

(He wanted to delete up until 30-AUG, but was slicing it up in the hopes of avoiding this problem.)

The above is about 300,000 rows.

A few issues, however :

  1. billing has a clustered index around account_id. This makes perfect sense, the data is almost always referenced with respect to a particular customer, and ensuring that those records are all adjacent to each other on storage is common sense. However, the above query would be running all around the 9Gb table removing rows.
  2. billing doesn't have an index on node_type, meaning each row has to be fetched before it can be selected for deletion.

The latter is less of a problem if the select and the delete were separated, but the table lock existed throughout the query.

In looking to solve it, one approach was to force MSSQL to use ROWLOCK (and disable escalation from ROWLOCK to TABLELOCK), but this was going to be a performance hit. I considered trying NOLOCK, but I wasn't sure of what the ramifications would be, and I really didn't want to spend hours fixing a crash database/corrupted data.

The final solution was to carve up the deletes into more manageable bits. SQL Server Interactive can be set to execute on a limited number of rows - so we could delete, say, 1000 rows, pause (let other things have access to the table), and then continue. And, now that we know about the clustering, why make the DELETE run across the entire table - why not allow it to trim one section at a time? And thusly we have...
(I'm not really familiar with SQL Server Syntax, so this is a first effort. It's lousy code, but a decent query)


DECLARE @ACCOUNTSTEP int
DECLARE @WINDOWMIN int
DECLARE @WINDOWMAX int
DECLARE @ACCOUNTMAX int
DECLARE @WINDOWCOUNT int
DECLARE @STEPS int

SET @STEPS = 20

SET @ACCOUNTMAX = (select max(account_id) from accounts)
SET @ACCOUNTSTEP = (ceiling(@ACCOUNTMAX * 1/@STEPS))
SET @WINDOWMAX=0
SET @WINDOWCOUNT=0
PRINT CAST(@STEPS AS CHAR(10)) + '+1 Steps of Size ' + CAST(@ACCOUNTSTEP AS CHAR(10)) + ' leading up to ' + CAST(@ACCOUNTMAX AS CHAR(10))
SET ROWCOUNT 2000

moreaccounts:
SET @WINDOWCOUNT = @WINDOWCOUNT + 1
SET @WINDOWMIN = @WINDOWMAX
SET @WINDOWMAX = @ACCOUNTSTEP * @WINDOWCOUNT
IF @WINDOWMIN > @ACCOUNTMAX GOTO done
PRINT 'Now processing accounts between ' + CAST(@WINDOWMIN as char(10)) + ' and ' + CAST(@WINDOWMAX as char(10))
deletemore:
PRINT 'Deleting 2000 rows.'
waitfor delay '0:0:01'
DELETE from billing where account_id >= @WINDOWMIN and account_id < @WINDOWMAX
and start_date_time >= '18-JUL-2009'
and start_date_time < '30-AUG-2009'
and node_type not in (3,4,5)
if @@ROWCOUNT > 0 GOTO deletemore
PRINT 'Done with this set. Sleeping.'
waitfor delay '0:0:03'
GOTO moreaccounts
done:

Which effectively breaks out to :

  • Get the total number of accounts
  • For every twentieth window of accounts
    • Delete 2000 rows. Wait a second. Repeat until all rows gone.
  • Wait three seconds.

Ran the query, no locking issues at all, Table purged in about an hour. Victory!


I wanted to get Zabbix monitoring working for our Asterisk boxen - I went a bit overboard, and resulted in zasterisk.

Powered by Twitter Tools

Powered by Twitter Tools

  • Somehow I'm missing my house keys, medical
    my MLS keyfob, my favorite scarf and a pair of jeans. Last seen on Thurs.In Boston. I think I'm going mad #
  • Feel a little bit like the Doctor. I have a fab coat, a companion, and am exploring a strange land. (Vegas) #
  • @egradman Skype uses wideband codecs past the 8khz range that the PSTN supports. It's not your carrier or iPhone sucks,it's that phones suck in reply to egradman #
  • I have yet to find an ailment that can't be mended by 80s clubbing in New Orleans. And absinthe. #
  • Frenchman St in NOLA suffering a power outage. Live music and bartenders just kept on going, barely skipping a beat. Love it here. #
  • @stevenswei was in Vegas as an overnight stay on my way to NOLA. We should hang out next week whenever I finally decide to come home. in reply to stevenswei #
  • @Colleenky yay! in reply to Colleenky #

Powered by Twitter Tools

Leave a Comment : more...

Twitter Weekly Updates for 2010-03-21

by on Mar.21, 2010, under Uncategorized

Tasks keeping me away from what I’d rather be doing this past few days :

Did some updates on tardis, hygiene our general purpose linux server (all the servers have names from Dr. Who) – and updated the Dell OpenManage tools. Reboot. Simple, allergy right?

The machine responded to pings, viagra 100mg and nothing else.

It was 2pm. Traffic in LA becomes a nightmare somewhere around 3:30pm. Jumped in the car to crash cart the server (because the tech on site didn’t know what a crash cart was and when I described it – said ‘that sounds complicated.’ Not happy with our colo right now.)

Arrived to find the server paused — not hung, just waiting, on a “Starting openwsmand…” message. Turns out, there’s a new service in the latest version of OpenManage that requires a new OpenSSL certificate for functioning. This, on it’s own, is okay – but two problems :

  1. The cert is generated by the boottime init.d script if it’s missing.
  2. The cert is generated using /dev/random as the entropy source

The latter is a security concern, but should never be paired with the top. I’ve had servers hang for hours waiting for /dev/random to generate enough entropy.

This was easily enough fixed, but the stress of driving like a madman, (and back) to make it before traffic shut down all routes to West LA was something I could do without. I’m going to patch against the project and see if they’ll fix this ridiculous behavior, and I’ve started looking into a remote KVM solution like kvm2ethernet – just call the colo and ask them to plug into a particular server. Thanks to this post for cutting the debugging time massively.


The other issue was that customers weren’t able to sign up for bits of today because there was a lock residing on one of our DB tables. We purge our Database (about 50G) monthly, but the cruft of leftover billing records take up huge amounts of space and deleting them can be a problem – long table scans, and Microsoft SQL Server does a table lock – and that’s the ball game. The credit card server can’t track that a valid charge was placed, so it terminates instead. We started with a query from our Marketer/Data Analysis guy looking like :


DELETE from billing where
and start_date_time >= '18-JUL-2009'
and start_date_time < '24-JUL-2009'
and node_type not in (3,4,5)

(He wanted to delete up until 30-AUG, but was slicing it up in the hopes of avoiding this problem.)

The above is about 300,000 rows.

A few issues, however :

  1. billing has a clustered index around account_id. This makes perfect sense, the data is almost always referenced with respect to a particular customer, and ensuring that those records are all adjacent to each other on storage is common sense. However, the above query would be running all around the 9Gb table removing rows.
  2. billing doesn't have an index on node_type, meaning each row has to be fetched before it can be selected for deletion.

The latter is less of a problem if the select and the delete were separated, but the table lock existed throughout the query.

In looking to solve it, one approach was to force MSSQL to use ROWLOCK (and disable escalation from ROWLOCK to TABLELOCK), but this was going to be a performance hit. I considered trying NOLOCK, but I wasn't sure of what the ramifications would be, and I really didn't want to spend hours fixing a crash database/corrupted data.

The final solution was to carve up the deletes into more manageable bits. SQL Server Interactive can be set to execute on a limited number of rows - so we could delete, say, 1000 rows, pause (let other things have access to the table), and then continue. And, now that we know about the clustering, why make the DELETE run across the entire table - why not allow it to trim one section at a time? And thusly we have...
(I'm not really familiar with SQL Server Syntax, so this is a first effort. It's lousy code, but a decent query)


DECLARE @ACCOUNTSTEP int
DECLARE @WINDOWMIN int
DECLARE @WINDOWMAX int
DECLARE @ACCOUNTMAX int
DECLARE @WINDOWCOUNT int
DECLARE @STEPS int

SET @STEPS = 20

SET @ACCOUNTMAX = (select max(account_id) from accounts)
SET @ACCOUNTSTEP = (ceiling(@ACCOUNTMAX * 1/@STEPS))
SET @WINDOWMAX=0
SET @WINDOWCOUNT=0
PRINT CAST(@STEPS AS CHAR(10)) + '+1 Steps of Size ' + CAST(@ACCOUNTSTEP AS CHAR(10)) + ' leading up to ' + CAST(@ACCOUNTMAX AS CHAR(10))
SET ROWCOUNT 2000

moreaccounts:
SET @WINDOWCOUNT = @WINDOWCOUNT + 1
SET @WINDOWMIN = @WINDOWMAX
SET @WINDOWMAX = @ACCOUNTSTEP * @WINDOWCOUNT
IF @WINDOWMIN > @ACCOUNTMAX GOTO done
PRINT 'Now processing accounts between ' + CAST(@WINDOWMIN as char(10)) + ' and ' + CAST(@WINDOWMAX as char(10))
deletemore:
PRINT 'Deleting 2000 rows.'
waitfor delay '0:0:01'
DELETE from billing where account_id >= @WINDOWMIN and account_id < @WINDOWMAX
and start_date_time >= '18-JUL-2009'
and start_date_time < '30-AUG-2009'
and node_type not in (3,4,5)
if @@ROWCOUNT > 0 GOTO deletemore
PRINT 'Done with this set. Sleeping.'
waitfor delay '0:0:03'
GOTO moreaccounts
done:

Which effectively breaks out to :

  • Get the total number of accounts
  • For every twentieth window of accounts
    • Delete 2000 rows. Wait a second. Repeat until all rows gone.
  • Wait three seconds.

Ran the query, no locking issues at all, Table purged in about an hour. Victory!


I wanted to get Zabbix monitoring working for our Asterisk boxen - I went a bit overboard, and resulted in zasterisk.

Powered by Twitter Tools

Powered by Twitter Tools

Leave a Comment : more...

Twitter Weekly Updates for 2010-03-14

by on Mar.14, 2010, under Uncategorized

this is a repost from my old site.

I’ve created RPMs for RHEL 5.0, cure 5.1, shop 5.2 and 5.3, pills as well as CentOS 5.0, 5.1 and 5.2 for Python 2.5 and 2.6.

The RHEL/CentOS line only supports Python 2.4, and as it’s an integral part of yum and many other packages, any installation of new Pythons will require parallel installation.

(Note : Currently, the 2.5 and 2.6 RPMs cannot both be installed without an RPM –force)

TODO : Repository URL & GPG signing

As for the RPMs themselves :

Download

Caveats

Neither of these packages fully pass the Python test suite (import test.autotest).

That being said, we’ve been using the Python 2.5 package, released by Bashton, in production for a while now, and we haven’t seen any problems.

4 tests failed: test_distutils test_email test_email_renamed test_nis

I don’t know about email, but there are patches for CentOS/Fedora against distutils, so I expect that’s why it fails the test.

For Python 2.6, note the following :

  • I had to gimp autoconf’s version detection to force it to use 2.59. This may have unintended consequences
  • This RPM uses patches from the Fedora Core 11 Alpha release. (python-2.6-4.fc11 specifically). As Fedora updates this package, I’ll release updates. (Feel free to email to remind.)

Usage

Just install the RPMs for the version that you want. You’ll need at least the base python version package and the libs package.
To start Python 2.6, type python26 at your commandline rather than python. (Your original Python 2.4 is still installed.)
If you’re installing packages with setuptools, make sure to use the correct python version. (i.e. python26 setup.py install)
If you’d like to build an x86_64, call rpmbuild –rebuild with the appropriate arch setting.

Powered by Twitter Tools

Powered by Twitter Tools

Powered by Twitter Tools

  • Anyone *love* their 3G data plan? Turns out AT&T requires a 2yr commitment, viagra
    even if I use old equipment – not interested… #
  • Loving Intercon J – Setting up the set for Limbo now, then gaming all day! #
  • Man, I really want to sit down and write a game now. #
  • I'm thankful that the challenges in my life stem from an abundance of opportunities/ideas/directions rather than a lack of them. #
  • @randbot man, I thought you were referring to something that happened in game. in reply to randbot #

Powered by Twitter Tools

Leave a Comment : more...

Twitter Weekly Updates for 2010-03-07

by on Mar.07, 2010, under Uncategorized

this is a repost from my old site.

I’ve created RPMs for RHEL 5.0, cure 5.1, shop 5.2 and 5.3, pills as well as CentOS 5.0, 5.1 and 5.2 for Python 2.5 and 2.6.

The RHEL/CentOS line only supports Python 2.4, and as it’s an integral part of yum and many other packages, any installation of new Pythons will require parallel installation.

(Note : Currently, the 2.5 and 2.6 RPMs cannot both be installed without an RPM –force)

TODO : Repository URL & GPG signing

As for the RPMs themselves :

Download

Caveats

Neither of these packages fully pass the Python test suite (import test.autotest).

That being said, we’ve been using the Python 2.5 package, released by Bashton, in production for a while now, and we haven’t seen any problems.

4 tests failed: test_distutils test_email test_email_renamed test_nis

I don’t know about email, but there are patches for CentOS/Fedora against distutils, so I expect that’s why it fails the test.

For Python 2.6, note the following :

  • I had to gimp autoconf’s version detection to force it to use 2.59. This may have unintended consequences
  • This RPM uses patches from the Fedora Core 11 Alpha release. (python-2.6-4.fc11 specifically). As Fedora updates this package, I’ll release updates. (Feel free to email to remind.)

Usage

Just install the RPMs for the version that you want. You’ll need at least the base python version package and the libs package.
To start Python 2.6, type python26 at your commandline rather than python. (Your original Python 2.4 is still installed.)
If you’re installing packages with setuptools, make sure to use the correct python version. (i.e. python26 setup.py install)
If you’d like to build an x86_64, call rpmbuild –rebuild with the appropriate arch setting.

Powered by Twitter Tools

Powered by Twitter Tools

Powered by Twitter Tools

Leave a Comment : more...

Looking for something?

Use the form below to search the site:

Still not finding what you're looking for? Drop a comment on a post or contact us so we can take care of it!

Visit our friends!

A few highly recommended friends...