xmcd-textfiles) to a MySQL-database using
freedbimporter.jar- many steps will just be explained in a click-here-click-there-manner. Along this some questions may arise and some things might be worth mentioning - some of these are shifted into the appendix of this document (the internals-section).
If you want to set up a MySQL-port from FreeDB you have to be patient. A complete transfer (including a final creation of MySQL-indices) might take 16 to 20 hours.
C:\install\freedb. Your system has Java 1.5 (or higher) installed. You have downloaded
freedbimporter.jarfrom http://sourceforge.net/projects/freedbimporter/files/. You have a MySQL-server 5.0 running on
3306) and there is the superuser named
rootwithout any password required. To avoid
freedbimporter.jarvia commandline with something like this:
java -Xms256m -Xmx512m -jar freedbimporter.jar
If you face sever problems following the instructions here please take a look at the debug-section below. For recent (February 2012)
NullPointerExceptions consider advise to change charset-detector.
C:\install\freedbby melting or merging them into only few (about 1500) bigger textfiles in
C:\install\freedb.merged. This merge itself takes time (about four hours) but it speeds up subsequent building the MySQL-database and enforces
UTF-8-encoding for all merged files.
To start merging you pick first button Merge FreeDB-textfiles from the four-sliced MainGUI
and the MergerConfigurator opens.
This initial merging is crucial for a successful transfer from FreeDB's database (
xmcd-metadata stored in textfiles) to MySQL. The
xmcd-text might be encoded as
For merging you can either stick to
ISO-8859-1 or pick
UTF-8 to be standard-encoding and assume that all files provided by FreeDB are encoded this way. As this is obviously wrong you can choose an additional filter to check out different encodings for each and every textfile to merge. This package contains IBM's ICU and Mozilla's Chardet to detect files' encodings. You can use them both in an intersection mode that only picks a different encoding for a read file if ICU and Chardet spot it simultaneously. At the same time you can switch off any of these both and assume that your chosen standard-encoding fits all files - doing so is a bad idea.
Note: There seems to be a problem with ICU's charset-detection since the beginning of 2012. You are strongly advised to switch the charset-detector to Chardet if you perform merging on the update-sections of FreeDB-Data.
After you picked source- and target-directory the merger is ready to go
Now you Start merge. After the merging succeeded (which takes three to four hours) you receive a report and find about 600 textfiles in
C:\install\freedb.merged - this merged dataset is completely encoded as
For both scenarios let's assume that your MySQL-server is a freshly installed default-system running on
127.0.0.1), listening on port
3306 with the superuser named
root and without any password for this superuser.
Moreover let's assume that you already merged the original dataset from FreeDB so that all textfiles to transfer are already
and the DatabaseConfigurator opens.
You need to connect to your MySQL-server and click Connect. The MySQLConfigurator shows up:
For this connection that just issues the initial SQL-
CREATE-sequence the last three options Adapter, Genre-Blacklist and Genre-Whitelist are irrelevant so they will be discussed later.
The only interesting option is Tables. This setting defines the number of tables in the database and should be set either to 2 large tables or to 4 smaller tables. Take a look at the internals below to figure out whether to create two or four tables. If you do not want to get into details and just start investigating FreeDB you select the two-tabled version. The only important thing here: the server you are going to connect does not already have a database named
Connect to get back to DatabaseConfigurator.
To create the target-database
freedb just press Go!. If there has been no database named
localhost you instantly can confirm success:
By closing the DatabaseConfigurator's window in the upper right corner you retain the root-connection to MySQL-server opened. You can use it later after filling the freshly created database to validate it (an option) and to index it (a recommendation).
and the SpoolConfigurator shows up:
First you configure the MySQL-Connection again.
The settings for the MySQL-Connection are roughly the same they were when creating the database. As the database
freedb now exists you can stick to option Tables' default unset (autodetect).
In this situation there are the three options Adapter, Genre-Blacklist and Genre-Whitelist to consider.
Filling the MySQL-database is often referred to by spooling or transferring - these three terms all denote the same procedure: reading
xmcd-textfiles from FreeDB (handles different encodings, might fail and loose some data or mistake encoding and produce garbage), parsing those files (coping with
xmcd-violations, might loose some data) and generating some SQL-statements that use the parsed and partially postprocessed data. Just as the both first steps reading and parsing this second step (sending
INSERT-statements) looses data due to the content of the data or some of its characters that the MySQL-server might not be able to store in its database.
One consequence is that neither the count of tiny textfiles received from FreeDB nor the number of CDs parsed from merged data determines the number of CDs that will be stored in your MySQL-database.
An intermediate problem is that many CD-descriptions are either rubbish or they contain mistyped strings (names, titles or genres). Most of the postprocessing takes place as successful parsed CDs get spooled to MySQL so this processing is introduced and configured here.
A MySQL-connection takes a CD and adapts it before the connector tries to generate
INSERT-statements and sends them to MySQL in order to store the CD that connector just received. Any MySQL-connector can have one of the following adapters:
|does not change any data received and does not try to determine the CD's language.|
|does not change data but adds a language-code, if the CD's language is recognized.|
|tries to eliminate mistyped artists, genres or titles and filters out some rubbish. This adapter does not try to determine CD's language.|
|tries to eliminate mistyped artists, genres or titles and filters out some rubbish. This adapter tries to determine CD's language. It also can use a black- or a whitelist to reject or accept CDs depending on their genre.|
Determining a CD's language is a rather time-consuming task that takes almost two hours to check for the European languages German, English, French, Italian, Portuguese or Spanish.
Validating CD's data means either rejecting strings for being too short or a given genre is blacklisted or - quite the contrary - not whitelisted. You can use a black- or a whitelist to drop some data before it even gets spooled to MySQL. Here is a longish whitelist that seems to be a useful filter. If you like you can combine a black- and a whitelist for more sophisticated filtering.
Click Connect to return to SpoolConfigurator. After you picked your source-directory
C:\install\freedb.merged (containing only
UTF-8-encoded textfiles) you are ready to start.
Start spooling and be patient. Four to five hours will elapse before you will receive a report that does not give an accurate count of CDs in your database. While the CDs in the database themselves are valid there are some holes in the interior numbering-sequences of the database. These holes are harmless but they yield an incorrect count of CDs stored. To correct that you validate your database before creating indices.
By closing the SpoolConfigurator's window in the upper right corner you retain the
root-connection to MySQL-server opened. Think about closing this connection first.
Prepare to validate the database (an option, it takes about two hours again) and create indices (a recommendation, it takes four to six hours and causes massive harddisc-activity, your MySQL-server might be nearly gone meanwhile).
Instead of a detailed report you receive a brief notice when MySQL finishes.
C:\install\freedband skip the initial database-creation. You could even skip merging.
Merging is a very good idea if you expect to have more then just a single pass through the source-directory. Once you have found an appropriate configuration for your MySQL-database (two- or four-tabled) and you know how to set up the spooler's connection to the MySQL-server it might be sufficient to configure the spooler itself to look for the text-files' encodings while it traverses the unmerged source-directory
The only important difference to the initial setup of a FreeDB-port is the end of the transfer-procedure: If the indices for that database already have been created and existed before the latest spooling then you do not need to (re-) create them as they have been maintained while transferring new CDs to your MySQL-database. The only thing you are recommended to do is to validate the database even after you just spooled few new CDs.
The second thing is the encoding of the MySQL-databases. As there are two different schemata there also are two different encodings. The two-tabled-version of the database uses a
LATIN2-encoding whereas the four-tabled-version uses
UTF8-encoding. The later seems to be very piggy about the CDs delivered and does not store the same amount of CDs as its two-tabled stepbrother.
I have not investigated this and can not tell whether the higher count of CDs accepted by the
LATIN2-encoded database is due to its superior encoding or if it just accepts much more garbage than the
freedblargefor both tables (discs and tracks) are not too normalized and contain more textual information than the same tables within the second schema that uses four tables and is referred to as
freedbsmall. This small variant externalizes artists in a separate table and lists genres in a fourth table. The two-tabled (large) version of the database is
LATIN2-encoded whereas the four-tables (small) version is
The storage-space these different database-schemata require is just the opposite of what the names suggest: An unfiltered large FreeDB-port stores about 2 million CDs within two tables and uses about 3,1 GB on harddisc - that's a ratio harddisc to CDs of 1,55 for a large schema. An unfiltered small FreeDB-port stores about 1,6 million CDs within four tables and occupies 2,5 GB - that's a ratio harddisc to CDs of 1,56 for a small schema.
CREATEdescribes the two-tabled database-schema used by
freedbimporter.jarto store FreeDB's data. This database
freedblargehandels all chars
LANTIN2-encoded. The indices you can create within third pass are appended here.
CREATE DATABASE IF NOT EXISTS freedblarge DEFAULT CHARACTER SET LATIN2;
DROP TABLE IF EXISTS freedblarge.discs;
CREATE TABLE IF NOT EXISTS freedblarge.discs (
did BIGINT(10) UNSIGNED NOT NULL,
freedbdiscid BIGINT(10) UNSIGNED NOT NULL,
aname VARCHAR(203) NOT NULL,
dtitle VARCHAR(187) NOT NULL,
gtitle VARCHAR(106) NOT NULL,
dreleased YEAR(4) DEFAULT NULL,
dtracks INT(4) UNSIGNED NOT NULL,
dseconds INT(4) UNSIGNED NOT NULL,
dlang ENUM ('deu', 'fra', 'ita', 'por', 'spa', 'eng') DEFAULT NULL,
UNIQUE(aname, dtitle, dtracks),
) ENGINE = MyISAM DEFAULT CHARACTER SET LATIN2;
DROP TABLE IF EXISTS freedblarge.tracks;
CREATE TABLE IF NOT EXISTS freedblarge.tracks (
discs_did BIGINT(10) UNSIGNED NOT NULL,
tnumber INT(4) UNSIGNED NOT NULL,
ttitle VARCHAR(255) NOT NULL,
aname VARCHAR(203) NOT NULL,
tseconds INT(4) UNSIGNED NOT NULL,
PRIMARY KEY(discs_did, tnumber)
) ENGINE = MyISAM DEFAULT CHARACTER SET LATIN2;
CREATE INDEX discs_index ON freedblarge.discs (
freedbdiscid, aname, dtitle, gtitle, dreleased, dseconds
CREATE INDEX tracks_index ON freedblarge.tracks (
CREATEdescribes the four-tabled database-schema used by
freedbimporter.jarto store FreeDB's data. This database
freedbsmallhandels all chars
UTF8-encoded. The indices you can create within third pass are appended.
CREATE DATABASE IF NOT EXISTS freedbsmall DEFAULT CHARACTER SET UTF8; DROP TABLE IF EXISTS freedbsmall.artists; CREATE TABLE IF NOT EXISTS freedbsmall.artists ( aid BIGINT(10) UNSIGNED NOT NULL, aname VARCHAR(255) NOT NULL, PRIMARY KEY(aid), UNIQUE(aname) ) ENGINE = MyISAM DEFAULT CHARACTER SET UTF8; DROP TABLE IF EXISTS freedbsmall.genres; CREATE TABLE IF NOT EXISTS freedbsmall.genres ( gid BIGINT(10) UNSIGNED NOT NULL, gtitle VARCHAR(255) NOT NULL, PRIMARY KEY(gid), UNIQUE(gtitle) ) ENGINE = MyISAM DEFAULT CHARACTER SET UTF8; DROP TABLE IF EXISTS freedbsmall.discs; CREATE TABLE IF NOT EXISTS freedbsmall.discs ( did BIGINT(10) UNSIGNED NOT NULL, freedbdiscid BIGINT(10) UNSIGNED NOT NULL, artists_aid BIGINT(10) UNSIGNED NOT NULL, dtitle VARCHAR(241) NOT NULL, genres_gid BIGINT(10) UNSIGNED NOT NULL, dreleased YEAR(4) DEFAULT NULL, dtracks INT(4) UNSIGNED NOT NULL, dseconds INT(4) UNSIGNED NOT NULL, dlang ENUM ('deu', 'fra', 'ita', 'por', 'spa', 'eng') DEFAULT NULL, UNIQUE(artists_aid, dtitle, dtracks), PRIMARY KEY(did) ) ENGINE = MyISAM DEFAULT CHARACTER SET UTF8; DROP TABLE IF EXISTS freedbsmall.tracks; CREATE TABLE IF NOT EXISTS freedbsmall.tracks ( discs_did BIGINT(10) UNSIGNED NOT NULL, tnumber INT(4) UNSIGNED NOT NULL, artists_aid BIGINT(10) UNSIGNED NOT NULL, ttitle VARCHAR(255) NOT NULL, tseconds INT(4) UNSIGNED NOT NULL, PRIMARY KEY(discs_did, tnumber) ) ENGINE = MyISAM DEFAULT CHARACTER SET UTF8; CREATE INDEX artists_index ON freedbsmall.artists ( aname ); CREATE INDEX genres_index ON freedbsmall.genres ( gtitle ); CREATE INDEX discs_index ON freedbsmall.discs ( freedbdiscid, artists_aid, dtitle, genres_gid, dreleased, dseconds ); CREATE INDEX tracks_index ON freedbsmall.tracks ( ttitle );
freedbimporterhas been started eleven years ago. Development came to rest about two years ago. The few mails that arrive indicate that the most important problem using
freedbimporteris the very poor information one gets if interaction with the mysql-server fails.
First step to gain some more information is to download the logging-property-file
log4j.properties and put it into the same directory that contains
freedbimporter.jar. You open a commandline there and issue
java -Xms512m -Xmx768m -jar freedbimporter.jar --log4jProperties log4j.properties
From now on
freedbimporter logs its interaction with the mysql-server to a file named
freedbimporter.log. This log should reveal what is going wrong. Be careful repeating your steps: The log grows big very fast and soon becomes much too large to send it to me ;-)
|at least since||issue|
|July 2015||Dropped |
|February 2012||Check for |
|December 2010||take some screenshots from current GUI|
|December 2009||redesign and redo language-annotation (ease language annotation for languages not spotted yet)|
|September 2009||take disc's track-count into index (prepares quicker fuzzy retrieval form rudimentarily implemented cddb-substitute)|
|November 2008||implement prepared statements on mysql-server-side (should speed up spooling)|
|E-Mail: Christian Kruggel||Version 2.1 by 21.10.2015|