(This entire process requires an internet connection.)
cd isbnsearch/devel/scripts
./gateway.sh
bash ./asyncz.sh
mysql devel -p < ../sql/tables.sql
mysql devel -p < ../sql/gatewaynew.sql
mysql devel -p < ../sql/asyncz.sql
mysql devel -p < ../sql/servers.sql
duplicates.pl
export-servers.pl
serverfail.pl
servers.pl
verify.pl
perl duplicates.pl
The following sections describe how each script operates.
All scripts are intended to be run from the
isbnsearch/devel/scripts
directory.
The bash file gateway.sh
automates the update from the
LoC page. The majority of the output and work is using wget
. The
gateway page itself is downloaded first. The HTML is processed and compared
against a version of the previous htdocs/gateway.html
page, reduced
to just compare new servers, not name changes. The major omission in the data at
this stage is the database table name. The scritps/gateway.sh
script now uses wget
to connect to each new server using the
information from the gateway.html
page and requests the HTML query
page for each server. This HTML contains a DBNAME hidden input field that
contains the database name in the VALUE attribute. The
databasename.pl
perl script then processes the wget output
to generate this SQL output:
insert into gateway set name = "alice.mainlib.brandeis.edu", port = 9929,
z3950 = "MBB01";
Finally, gateway.sh
moves gateway.html
into
htdocs/
, overwriting the previous release copy. This will be the
base of the next comparison.
Because of the process, the new servers should be usable as new listings - servers that fail to initialise Z39.50 via the LoC gateway will fail to generate a database name.
As with the gateway servers, diff is too simplistic to find the new or updated servers in the AsyncZ list because a lot of changes are just name changes to the description field. AsyncZ has historically had a lot of duplicates, including some lines where the canonical name in one line resolves to the IP address listed in another line. If the port and z3950 names are also identical, this duplicate needs to be ignored by isbnsearch.
http://search.cpan.org/src/TURNERMM/Net-Z3950-AsyncZ-0.10/scripts/DB.txt
This process is not completely automated - the DB.txt file is accessible
over the internet but the URL contains the version number of the release and
this may need to be scripted to look for the newer releases, so the automation
may be part of the next devel release.
For v0.5, isbnsearch-devel provides a single bash script to process the
content once the DB.txt file is downloaded to the htdocs/ directory.
bash ./asyncz.sh
The script strips out the variable names and leaves just the vital server
data.
There are also formatting errors in scripts/DB.txt of the AsyncZ module where
fields are not correctly delimited with ; e.g.:
St.;Louis,;MO (INNOPAC) catalog.wustl.edu 210 INNOPAC
should be:
St. Louis, MO (INNOPAC) ;catalog.wustl.edu;210;INNOPAC
These errors will cause problems for verification and need to be ignored.
asyncz.sh
calls the Perl script asynczcheck.pl
to
iterate through the asyncz.result file and check each canonical name for a valid
DNS entry. The port number is checked to be in the range 0-100000. All entries
are converted to lower case to catch capitalisation duplication. The IP address
is then combined with the port number and database name to form the key to a
hash. The canonical name is stored as the value. This prevents duplication of
the same server. For records that do not contain a canonical name, the IP
address is combined with the port number and database name and the hash is
checked to see if this key value already exists. If not, the key is added with
the IP address as value.
Layout for gateway and asyncz tables:
CREATE TABLE gateway ( id int(10) unsigned NOT NULL auto_increment, info text, name varchar(100) default NULL, port varchar(10) default NULL, z3950 varchar(100) default NULL, error varchar(50) default '?', htmlid varchar(50) default NULL, interface varchar(50) default NULL, PRIMARY KEY (id) ) TYPE=MyISAM; CREATE TABLE asyncz ( id int(10) unsigned NOT NULL auto_increment, info text, name varchar(100) default NULL, port varchar(10) default NULL, z3950 varchar(100) default NULL, error varchar(50) default '?', htmlid varchar(50) default NULL, interface varchar(50) default NULL, PRIMARY KEY (id) ) TYPE=MyISAM;
Create these two tables (a SQL file is provided as
sql/tables.sql
) in a TEMPORARY MySQL database that can be
dropped once the complete verification process is finished.
If these tables already exist from a previous verification, drop the tables and re-create them with the SQL in the package. The old data is neither required nor useful.
Layout for servers table - the devel.servers table is to contain the combined list of servers with all duplicates removed, including where the canonical name appears in the gateway and the corresponding IP in asyncz. This has the side-effect of eliminating most canonical names that do not resolve using DNS. Most servers that previously appeared as DEAD in isbnsearch-devel reports will now simply be ignored until the next run of duplicates.pl at the time of the next release. Previous error states are not transferred to the new table so that all servers are re-tested.
CREATE TABLE servers ( id int(10) unsigned NOT NULL auto_increment, name varchar(100) default NULL, port varchar(10) default NULL, z3950 varchar(100) default NULL, error varchar(50) default '?', PRIMARY KEY (id) ) TYPE=MyISAM;
Create the servers table (if you used sql/tables.sql
it will
have been created for you) in the same temporary database.
Import both sql/gatewaynew.sql
and sql/asyncz.sql
into the relevant tables, remembering to switch to the temporary database or to
specify the temporary database name on the command line.
This is an example only, using the temporary database name 'devel'
mysql devel -p < sql/gatewaynew.sql
mysql devel -p < sql/asyncz.sql
Now import the main isbnsearch servers from the isbnsearch package (copied
into the devel package as sql/servers.sql
- this will allow
existing servers to be re-verified.
mysql devel -p < sql/servers.sql
duplicates.pl
needs to be configured for your database
settings before it can run. Use your favourite text editor to add details for
your MySQL host, user, password and the name of the temporary database.
Unlike the main isbnsearch package, the table names are not configurable,
if you need to change them, be careful to change all references in the
duplicates.pl
script AND the SQL files.
duplicates.pl
combines the two tables into one SQL output of
unique records that is then added to the 'servers' table, ready for
verification:
perl duplicates.pl
Note: duplicates.pl will delete all entries from the asyncz and gateway tables in the temporary database.
The copyright licensing notice below applies to this text.
Copyright © 2003-2004 Neil Williams
Permission is granted to copy, distribute, and/or modify this document under the terms of the GNU Free Documentation License, Version 1.1 or any later version published by the Free Software Foundation; with no Invariant Sections, with no Front-Cover Texts, and with no Back-Cover Texts. A copy of this license is included in the file copying.txt