Diff’ing Data Needles Across 2 MySQL Haystacks

Recently I needed to compare a specific set of tables in two MySQL databases on a remote system where I did not have Admin privileges (but DB user credentials). Both databases had the same structure, just some different data. I didn’t want to see all the differences in all the tables because there were probably hundreds of them. Not having access to the server logs, I was considering doing some brute-force regex-driven diff of dump files or perhaps a complex SQL query. I knew there had to be a better way.

After fishing around I found a relatively stand-alone tool (Perl dependency, not much else) that could be set up and used quickly on a non-Admin *nix account: mysql_coldiff. It’s easy-to-use, well-documented, and most-importantly, scriptable..

Since I had about a dozen tables that I wanted to compare I wrote a wrapper script to automate the process and produce a nice report of the differences.

#!/bin/bash
DIFF_SCRIPT="$HOME/mysql_coldiff-1_0/mysql_coldiff"

db4credentials="$1"
query_list="$2"
out_file="$3"

if [ ! -e "$query_list" -o -z "$out_file" ]; then
	echo 'usage: '`basename $0`' <db4credentials> <query_list_file> <out_file>'
	echo '   db: seems only 1 set of credentials can be used for both DBs'
	echo '   query_list_file: list of tables to diff. 1st 2 lines are DB names'
	echo '      remaining lines have 3 fields per record:'
	echo '      <table_name> <id_column_name>  [IGNORE]'
	exit 1
else
	db1=`sed -ne '1 p' "$query_list"`
	db2=`sed -ne '2 p' "$query_list"`
fi

read -p "username for $db4credentials: " us
read -p "password for $db4credentials: " ps

awk '
	($0!="" && $NF!="IGNORE" && NR>2){
		gsub(/[\047"]/,"",$0) # strip devious chars for now
		print "------------------------------\n" $1 "\n------------------------------"
		cmd="'$DIFF_SCRIPT' -h localhost -u '$us' -p '$ps' -i " $2 " -n '$db1'." $1 " '$db2'." $1
		system(cmd);
		close(cmd)
}' "$query_list" > "$out_file"

This strategy seems to work best in situations where differences are minimal (but important). Case-in-point (or bird in hand?), I quickly found the modified field values in a table I had been looking for and changed them with phpMyAdmin. Here is the report I used to find those diffs (output condensed where “…” appears):

...
------------------------------
eav_attribute
------------------------------
We're comparing the the_database_number_1.eav_attribute using the attribute_id column and the database_number2.eav_attribute using the attribute_id column.
...
------------------------------
eav_attribute_group
------------------------------
...
------------------------------
eav_attribute_option
------------------------------
...
------------------------------
eav_entity_text
------------------------------
We're comparing the the_database_number_1.eav_entity_text using the value_id column and the database_number2.eav_entity_text using the value_id column.
...
------------------------------
eav_entity_type
------------------------------
We're comparing the the_database_number_1.eav_entity_type using the entity_type_id column and the database_number2.eav_entity_type using the entity_type_id column.
...
+---------------------------------------+----------------+------------------------------+
|                                       | entity_type_id |              increment_model |
| the_database_number_1.eav_entity_type +----------------+------------------------------+
|_______________________________________|             11 | eav/entity_increment_numeric |
|                                       |             11 | eav/entity_increment_alphabetic
|      database_number2.eav_entity_type +----------------+------------------------------+
|                                       | entity_type_id |              increment_model |
+---------------------------------------+----------------+------------------------------+
...

Gotchas?

  • Had to use the same login credentials for both DBs. May be possible to use 2 sets?
  • Not sure if mysql_coldiff handles tables where the primary key is not a single identity column, i.e.; tables that use a composite primary key. From glancing over the mysql_coldiff documentation I think it may not matter too much what you choose as the “index column”, but I figured it probably did, that’s why I added the “IGNORE” param to the input file for my wrapper script when I found that one of the tables I was hoping to compare had a composite primary key.
  • There’s always ways to further investigate certain problems and solve them better. But if it ain’t broke neither are you.
This entry was posted in Uncategorized. Bookmark the permalink.

Leave a Reply

Your email address will not be published. Required fields are marked *

* Copy This Password *

* Type Or Paste Password Here *

14,781 Spam Comments Blocked so far by Spam Free Wordpress