
Maintaining A Drop List Database

In Part 1: What You Need to Build A Drop List, we discussed the costs, memberships, and other requirements you would need to build a drop list from the zone file. In Part 2: How To Build A Drop List, we discussed how to actually populate the database.

Now, in Part 3: Maintaining A Drop List Database, we will look at how to update the database: removing old names, adding new ones, and keeping it accurate and up to date.

Drop List Maintenance

Now that the drop list is complete, you need to maintain the database. Roughly 15,000 to 17,000 domains expire on any given day. Around 5,000 to 7,000 are no longer required and are allowed to drop each day. On average, 5,000+ new domains are created per day, a mixture of all extensions and recently released names. Around 100 to 200 of these are drop-caught on the day, and around 150 are registered during the following week as various people pore over the drop lists, expired-name lists and other such lists.

All of these need scanning, updating and/or removing from the database on a rolling basis, as well as the month of old data accrued during the initial population.

You can see how quickly the data will become stale and unwieldy.

Remove The Old Names
There are two ways to remove the (up to 200,000) expired names from the previous month of dropped domains, accumulated while you were building the database and, of course, filling it up.

Option 1: create a small application which scans any dates older than today (NOW()) and runs them against the DAC. It then (1) removes any unregistered names, and (2) updates the dropdate (and tag, etc.) on the rest. This uses a wedge of your DAC quota, which isn’t good if you’re planning to drop catch as well.

A simple query like…

SELECT domain FROM `zonefile`.`droplist` WHERE `dropdate` < NOW();

You may need to include an additional WHERE clause to check the last-updated field: if it’s been updated in the last 2-4 weeks, don’t scan it.

SELECT domain FROM `zonefile`.`droplist` WHERE `dropdate` < NOW() AND `updated` < NOW() - INTERVAL 30 DAY;

You can change 30 DAY to any number you wish; it will stop you rescanning the same names prematurely. Another alternative is to include a “rolling” and a “maintained” flag, so you can run full database scans and daily cleans.

It may also be worth adding an “ignore” boolean flag for names like .sch.uk, .ltd.uk and other Nominet-owned domains, so you don’t keep scanning them. While we’re modifying the database, a “pastdue” flag may help, for domains which haven’t dropped in sequence.

I have updated the database to look like…

CREATE TABLE IF NOT EXISTS `nzf_demo` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `domain` varchar(128) NOT NULL,
  `tag` varchar(32) NOT NULL,
  `dropdate` date NOT NULL,
  `daily` tinyint(1) NOT NULL,
  `periodical` tinyint(1) NOT NULL,
  `pastdue` tinyint(1) NOT NULL,
  `ignored` tinyint(1) NOT NULL,
  `added` date NOT NULL,
  `updated` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  UNIQUE KEY `domain` (`domain`),
  KEY `id` (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=1 ;
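Putting Option 1 together, a scanner along these lines could work. This is only a sketch: the DAC host, port and comma-separated reply format shown here are placeholders rather than Nominet’s real values, and the 92-day dropdate bump is purely illustrative — check the DAC documentation and your own drop-date rules before using anything like it.

```php
<?php
// Parse a DAC-style reply of the assumed form "domain,Y,..." or "domain,N,...".
// Returns true if the second field flags the name as still registered.
function isRegistered(string $reply): bool {
    $fields = explode(',', trim($reply));
    return isset($fields[1]) && strtoupper($fields[1]) === 'Y';
}

function scanStaleNames(PDO $db): void {
    // Names past their drop date that we haven't touched in 30 days.
    $rows = $db->query(
        "SELECT domain FROM `zonefile`.`droplist`
         WHERE `dropdate` < NOW()
           AND `updated` < NOW() - INTERVAL 30 DAY"
    );
    // Hypothetical DAC endpoint -- substitute the real host and port.
    $dac = fsockopen('dac.example.invalid', 2043, $errno, $errstr, 10);
    foreach ($rows as $row) {
        fwrite($dac, $row['domain'] . "\r\n");
        $reply = fgets($dac);
        if (!isRegistered($reply)) {
            // (1) Remove names that are no longer registered.
            $db->prepare("DELETE FROM `zonefile`.`droplist` WHERE `domain` = ?")
               ->execute([$row['domain']]);
        } else {
            // (2) Still registered: push the drop date on (illustrative figure).
            $db->prepare("UPDATE `zonefile`.`droplist`
                          SET `dropdate` = `dropdate` + INTERVAL 92 DAY
                          WHERE `domain` = ?")
               ->execute([$row['domain']]);
        }
    }
    fclose($dac);
}
```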

Option 2: you could perform a “diff” between the original file you uploaded and the latest downloaded zone file. The result could then be run as a query to remove from the database those names present in the old file but not in the new one.

This would leave you with around 5,000-7,000 names which have been caught and/or registered but still carry old data; these would need manual scanning using option 1.

I would use Linux Shell Diff for this job, but I have linked to a PHP variant since that’s my chosen language for these articles.

Along with the 200,000 removed, there would be around 200,000 newly created names, which could be found with a reverse “diff” of the above, or by repeating the load of names into the database from the new file. Again, they need scanning; this would bring your database bang up to date.
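The diff and its reverse can be sketched as a small helper, assuming each zone file snapshot is a plain list with one domain per line (the filenames in the usage comment are hypothetical):

```php
<?php
// Compare two zone-file snapshots, given as arrays of domain names.
function zoneDiff(array $old, array $new): array {
    return [
        // In the old file but not the new: dropped names to purge.
        'removed' => array_values(array_diff($old, $new)),
        // The reverse diff: newly created names to load and scan.
        'created' => array_values(array_diff($new, $old)),
    ];
}

// Usage with hypothetical snapshot filenames:
// $diff = zoneDiff(
//     file('zone-old.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES),
//     file('zone-new.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES)
// );
```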

This is easy on the DAC, but it increases server load and complexity, so I personally would use option 1. That way, when you have a non-catch day you can pound the database and bring it up to date.

Rolling Maintenance

Every day, you would need to perform the above on a smaller scale to keep up to date. I would also suggest scanning a date approx 1 week, 1 month and 90 days ahead. On a rolling basis this makes your database easier to manage and control.

Today (when this article was written, not published), there were:

18,929 domains moving from Registered Until Renewal to Renewal Required (92 days from drop).
15,805 names moving from Renewal Required to Suspended (60 days from drop).
10,203 suspended domains 30 days from dropping.
7,142 domains 7 days away from dropping.
6,310 domains moving from Suspended to Dropping today.

If you don’t perform rolling maintenance, your daily drop lists will contain up to 13,000 renewed domains every day, making it harder for you to find gold as you look further ahead.

This rolling script can be made using my Query Script or the script made in part 2.

PHP Switch Script

As an upgrade to the above scans, you could take the script from part 2 and add a PHP switch to vary the query results and the MySQL updates. Instead of having 3-4 small apps, you could have one app with a switch performing the same job.

function actionQuery($actn) {
	switch ($actn) {
		case "history":
			$query = "SELECT domain FROM `zonefile` WHERE `dropdate` <= CURDATE() - INTERVAL 1 DAY ORDER BY `dropdate`;";
			break;
		case "yesterday":
			$query = "SELECT domain FROM `zonefile` WHERE `dropdate` = CURDATE() - INTERVAL 1 DAY ORDER BY `dropdate`;";
			break;
		case "month":
			$query = "SELECT domain FROM `zonefile` WHERE `dropdate` = CURDATE() + INTERVAL 30 DAY ORDER BY `dropdate`;";
			break;
		case "dropday":
			$query = "SELECT domain FROM `zonefile` WHERE `dropdate` = CURDATE() + INTERVAL 92 DAY ORDER BY `dropdate`;";
			break;
		default:
			$query = ""; // default query as you wish
	}
	return $query;
}

Stripping Out

Whatever method you use above, you’ll need to write more to the database. I would set a daily cron to run between 12:30am and 1:00am, and use, for example…

SELECT domain FROM `zonefile` WHERE `dropdate` <= CURDATE() - INTERVAL 1 DAY AND `updated` <= CURDATE() - INTERVAL 14 DAY ORDER BY `dropdate` LIMIT 5000;

This will select 5,000 domains with a drop date older than today but not updated in the last two weeks. You could add AND `ignored` != 1 to the query as well.

Run a few crons on various sub-selections (the +7, +30 and +90 scans); this will use about 15,000-20,000 queries per day. A top tip: the further into the future you go, the longer the rescan interval you use. So scanning at +90 days, you want the last scan to be 70 days old; at +30 days you want 40 days; and at +7 days you want 20 days. Using catch-free days means there will be lots of days where you’re scanning only a few thousand names.
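Those staggered rescan windows can be encoded in a small helper. A sketch using the figures above, with the table name and LIMIT matching the earlier query (thresholds and limit are my choices, adjust to taste):

```php
<?php
// Map a look-ahead distance to a rescan window: the further ahead you scan,
// the longer you wait before rescanning the same name (90 -> 70, 30 -> 40,
// 7 -> 20, per the figures in the text).
function rescanWindow(int $lookaheadDays): int {
    if ($lookaheadDays >= 90) return 70;
    if ($lookaheadDays >= 30) return 40;
    return 20;
}

// Build the matching maintenance query for one window.
function windowQuery(int $lookaheadDays): string {
    $window = rescanWindow($lookaheadDays);
    return "SELECT domain FROM `zonefile` "
         . "WHERE `dropdate` = CURDATE() + INTERVAL $lookaheadDays DAY "
         . "AND `updated` <= CURDATE() - INTERVAL $window DAY "
         . "AND `ignored` != 1 LIMIT 5000;";
}
```

Each cron then just runs `windowQuery(7)`, `windowQuery(30)` or `windowQuery(90)` and feeds the results to the DAC scanner.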

The script made in the previous part, or my DAC Query script, can be adapted to do this. I think I will post some code in the summary post if you’re totally lost.

This whole process, including coding and database design, would take approx 30-35 days depending on your efficiency, resources and planning.

This maintenance script can be made using my Query Script or the script made in part 2.

Accessing Your Drop Lists

I will go into more detail in the 4th part about “Accessing Your Drop Lists”.

I was going to include this in this article, but it ended up, yet again, longer than I expected. This started out as a single article; now it’s looking like a five-parter, possibly six if I put some code together.