EsoErik

Sunday, July 26, 2009

retrieving original iPod music filenames from an iPodDB

I noticed that when iTunes writes a track to an iPod, it hashes the track filename. For example, an MP3 on an iPod might be called HQXA.mp3 and sit in one of many directories with short numeric names. Incidentally, this naming scheme resembles the one used by ccache, Squid, and other applications that store a large number of files.

I wanted to import a large number of such files from an iPod into iTunes. iTunes can automatically organize imported tracks using the information stored in each track (such as artist name, album name, track number, disc number). Thus, for tracks imported into iTunes, filenames are irrelevant, provided that those tracks include tag information. Some of the tracks that I wanted to import had no useful information in their tags. The original track filenames were descriptive and obviously stored somewhere on the iPod: when the iPod played these tracks, it showed their original filenames (minus extension). I browsed through the data files on the iPod and quickly found that iPod_Control/iTunes/iTunesDB appeared to contain track names in little-endian UCS-2 or UTF-16 format, in addition to a significant amount of other data.
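
To illustrate what little-endian UTF-16 text looks like at the byte level, here is a minimal sketch in Ruby. The sample bytes and the helper name decode_utf16le are made up for illustration; they are not taken from a real iTunesDB.

```ruby
# Hypothetical helper: interpret raw bytes as UTF-16LE and return UTF-8 text.
def decode_utf16le(bytes)
  # dup so the caller's string keeps its original encoding tag
  bytes.dup.force_encoding("UTF-16LE").encode("UTF-8")
end

# "Hi!" in UTF-16LE: two bytes per character, low byte first.
# These bytes are an invented sample, not data read from an iPod.
raw = [0x48, 0x00, 0x69, 0x00, 0x21, 0x00].pack("C*")

text = decode_utf16le(raw)
puts text  # prints "Hi!"
```

In a hex editor, text stored this way shows up as ASCII letters separated by zero bytes, which makes it easy to spot.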

On a flight from Boston to Reykjavik, I wore down my laptop battery exploring the iTunesDB file from an iPod containing the music I wished to recover. Looking at the file in a hex editor, I quickly recognized that the iTunesDB is a database built from hierarchical records. Each record begins with a plain ASCII tag starting with "mh" (mhbd, mhsd, mhlt, mhit, mhod); offsets and sizes are specified as 32-bit little-endian integers that are probably unsigned (I would need a 2 GiB+ iTunesDB to verify this; mine was just 500 KiB, and the DB does not use virtual addressing).
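
The following sketch shows how such a record header can be read in Ruby, and why a small database cannot settle the signed-versus-unsigned question. The header is synthetic: the tag is real, but the two integer values are invented so that the second one has its high bit set.

```ruby
require 'stringio'

# Synthetic 12-byte header: a 4-byte ASCII tag followed by two 32-bit
# little-endian integers. The numeric values are illustrative only.
header = ["mhbd", 104, 0xFFFFFFF0].pack("a4VV")

io = StringIO.new(header)
tag = io.read(4)                  # the ASCII record tag
len = io.read(4).unpack("V")[0]   # "V" = unsigned 32-bit little-endian

raw = io.read(4)
as_unsigned = raw.unpack("V")[0]  # 4294967280
as_signed   = raw.unpack("l<")[0] # -16

puts "#{tag} len=#{len} unsigned=#{as_unsigned} signed=#{as_signed}"
```

The two interpretations differ only when the high bit is set, that is, for values of 2 GiB and up. Every offset in a 500 KiB file reads the same either way.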

To keep myself occupied as I overcame jet lag upon reaching Norway, I wrote a Ruby module and front end application to retrieve original filenames from an iTunesDB and to give hashed filenames their original names. It worked well with my data; that said, I haven't tested it with any other database, and it moves files off the iPod rather than copying them, so be careful if you use my application.

fix_iPod_filenames.rb (the frontend application)


#! /usr/bin/env ruby19
#
# (c) Erik Hvatum, 2009
#

require 'pathname'
# The library file is fileutils.rb; the lowercase name also works on case-sensitive file systems
require 'fileutils'
require 'iTunesDB'

if ARGV.size != 2
    abort "Usage: fix_iPod_filenames.rb <directory containing iPod_Control> <directory in which to place renamed music files>"
end

iPodRoot = Pathname.new(File.expand_path(ARGV[0]))
dbPath = iPodRoot + "iPod_Control/iTunes/iTunesDB"
if !dbPath.exist?
    abort "iPod database \"#{dbPath}\" does not exist or is inaccessible."
end
destPath = Pathname.new(File.expand_path(ARGV[1]))
if !destPath.exist?
    abort "Destination path \"#{destPath}\" does not exist or is inaccessible."
end

db = MiTunesDB::CiTunesDB.new
db.open(dbPath)
tracks = db.tracks
db.close
db = nil

tracks.sort! {|l, r| l.location <=> r.location}
tracks.each do |track|
    needSep = false
    dstFn = ""
    doField = Proc.new do |field|
        # Struct#[] accepts a member name as a string, so no eval is needed
        v = track[field]
        if v != nil
            if needSep
                dstFn << "-"
            else
                needSep = true
            end
            dstFn << v.to_s
        end
    end
    doField.call("album")
    doField.call("discNumber")
    doField.call("trackNumber")
    doField.call("artist")
    doField.call("title")
    dstFn.gsub!(/[*!?\\\/:%]/, "_")
    src = iPodRoot.to_s + track.location.gsub(/:/, "/")
    dst = destPath.to_s + "/" + dstFn[0, 250] + "." + track.format.downcase.gsub(/ $/, "")
    FileUtils.mv(src, dst)
end
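
The naming rules used by the loop above can be tried in isolation. This is a minimal sketch, not part of the original script: the helper names build_name and location_to_path and the sample values are hypothetical, and the sample location string is only a guess at what a stored location looks like.

```ruby
# Hypothetical helper mirroring the frontend's rules: join the non-nil fields
# with "-", replace awkward characters with "_", cap the base name at 250
# characters, and append the lowercased format with a trailing space removed.
def build_name(fields, format)
  base = fields.compact.map(&:to_s).join("-")
  base = base.gsub(/[*!?\\\/:%]/, "_")
  base[0, 250] + "." + format.downcase.gsub(/ $/, "")
end

# Hypothetical helper: the script turns the colon-separated location stored
# in the database into a slash-separated path relative to the iPod root.
def location_to_path(location)
  location.gsub(/:/, "/")
end

# Order matches the script: album, disc number, track number, artist, title.
name = build_name(["Some Album", nil, 3, "Some Artist", "What? A Title"], "MP3 ")
path = location_to_path(":iPod_Control:Music:F00:HQXA.mp3")

puts name  # prints "Some Album-3-Some Artist-What_ A Title.mp3"
puts path  # prints "/iPod_Control/Music/F00/HQXA.mp3"
```

A nil disc number simply drops out of the name, so no doubled separator appears.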

iTunesDB.rb (the module used by the frontend application)


# (c) Erik Hvatum, 2009

module MiTunesDB

    # Reads the binary DB written by iTunes Windows to an iPod circa late 2008
    class CiTunesDB
        Track = Struct.new(:title,
                           :location,
                           :album,
                           :artist,
                           :genre,
                           :fileType,
                           :comment,
                           :composer,
                           :grouping,
                           :description,
                           :albumArtist,
                           :format,
                           :trackNumber,
                           :discNumber)
        attr :mhbd
        attr :file

        def initialize
            @mhbd = nil
            @file = nil
        end

        # Opens the iTunesDB specified by filename, loading its contents into memory
        def open(fileName)
            @file = File.open(fileName, "rb")
            @mhbd = Cmhbd.new(self)
        end

        # Closes the iTunesDB file and discards the parsed records
        def close()
            @file.close if @file
            @mhbd = nil
            @file = nil
        end

        # Returns all known tracks as an array of Track structures (defined above)
        def tracks()
            ts = Array.new
            if @mhbd
                @mhbd.children.each do |mhsd|
                    if mhsd.children
                        mhsd.children.each do |sdChild|
                            if sdChild.is_a? Cmhlt
                                sdChild.children.each do |mhit|
                                    t = Track.new
                                    ts << t
                                    mhit.children.each do |mhod|
                                        if Cmhod::Types.has_key? mhod.type
                                            sym = (Cmhod::Types[mhod.type].to_s + '=').to_sym
                                            t.method(sym).call(mhod.str)
                                        end
                                    end
                                    t.format = mhit.format
                                    t.trackNumber = mhit.trackNumber
                                    t.discNumber = mhit.discNumber
                                end
                            end
                        end
                    end
                end
            end
            return ts
        end

        # Base class for objects in the iTunes DB
        class Cmh_base
            attr :addr
            attr :depth
            attr :len
            attr :recordName
            attr :children

            def initialize(addr, db, depth)
                @addr = addr
                @depth = depth
                @len = nil
                @db = db
                @children = nil
                load()
            end

            def to_s()
                indent = "\t" * @depth
                return "#{indent}#{@recordName}:\n#{indent} len: #@len\n"
            end

            protected

            def load(loadLen = true)
                if @db.file.tell != @addr
                    @db.file.seek(@addr)
                end
                recordName = @db.file.read(@recordName.length())
                if recordName != @recordName
                    raise "Invalid record identifier in DB: expected \"#@recordName\" but read \"#{recordName}\"."
                end
                @len = readUInt32() if loadLen
            end

            def loadChildren()
                @children = []
            end

            def readUInt32()
                return @db.file.read(4).unpack("V")[0]
            end

            def children_to_s()
                ret = String.new
                if @children
                    @children.each { |child| ret << child.to_s() }
                end
                return ret
            end
        end

        # Record describing the database.  This is the first record in the database and is located at the beginning
        # of the database file.
        class Cmhbd < Cmh_base
            attr :dbVersion
            attr :dbLen

            # db is the owning CiTunesDB; records read from the file through db.file
            def initialize(db)
                @dbVersion = nil
                @dbLen = nil
                @recordName = "mhbd"
                super(0, db, 0)
            end

            def to_s()
                indent = "\t" * @depth
                return super() << "#{indent} dbVersion: #@dbVersion\n#{indent} dbLen: #@dbLen\n" << children_to_s()
            end

            protected

            def load()
                super
                seekTo = @addr + @recordName.length() + 4
                if @db.file.tell != seekTo
                    @db.file.seek(seekTo)
                end
                @dbLen = readUInt32()
                @dbVersion = Array.new(3) { readUInt32() }
                loadChildren()
            end

            def loadChildren()
                super
                childAddr = @addr + @len
                while childAddr < @dbLen
                    @children << Cmhsd.new(childAddr, @db, @depth + 1)
                    childAddr += @children[-1].childSize
                end
            end
        end

        # Record describing a dataset.
        class Cmhsd < Cmh_base
            # Adding childSize to @addr gives the addr of the next mhsd record
            attr :childSize
            attr :childType

            def initialize(addr, db, depth)
                @recordName = "mhsd"
                @childSize = nil
                @childType = nil
                super
            end

            def to_s()
                indent = "\t" * @depth
                return super() << "#{indent} childSize: #@childSize\n#{indent} childType: #@childType\n" << children_to_s()
            end

            protected

            def load()
                super
                seekTo = @addr + @recordName.length() + 4
                if @db.file.tell != seekTo
                    @db.file.seek(seekTo)
                end
                @childSize = readUInt32()
                @childType = readUInt32()
                loadChildren()
            end

            def loadChildren()
                super
                if @childType == 1
                    @children << Cmhlt.new(@addr + @len, @db, @depth + 1)
                end
            end
        end

        # Record describing a track list
        class Cmhlt < Cmh_base
            attr :numChildren

            def initialize(addr, db, depth)
                @numChildren = nil
                @recordName = "mhlt"
                super
            end

            def to_s()
                indent = "\t" * @depth
                return super() << "#{indent} numChildren: #@numChildren\n" << children_to_s()
            end

            protected

            def load()
                super
                seekTo = @addr + @recordName.length() + 4
                @db.file.seek(seekTo)
                @numChildren = readUInt32()
                loadChildren()
            end

            def loadChildren()
                super
                addr = @addr + @len
                @numChildren.times do
                    mhit = Cmhit.new(addr, @db, depth + 1)
                    @children << mhit
                    addr += mhit.totalLen
                end
            end
        end

        # Record describing a track item
        class Cmhit < Cmh_base
            attr :totalLen
            attr :numStrMhods
            attr :id
            attr :format
            attr :trackNumber
            attr :discNumber

            def initialize(addr, db, depth)
                @totalLen = nil
                @numStrMhods = nil
                @id = nil
                @format = nil
                @recordName = "mhit"
                @trackNumber = nil
                @discNumber = nil
                super
            end

            def to_s()
                indent = "\t" * @depth
                str = super
                str << "#{indent} totalLen: #@totalLen\n"
                str << "#{indent} numStrMhods: #@numStrMhods\n"
                str << "#{indent} id: #@id\n"
                str << "#{indent} format: #@format\n"
                str << "#{indent} trackNumber: #@trackNumber\n"
                str << "#{indent} discNumber: #@discNumber\n"
                str << children_to_s()
                return str
            end

            protected

            def load()
                super
                seekTo = @addr + @recordName.length() + 4
                @db.file.seek(seekTo)
                @totalLen = readUInt32()
                @numStrMhods = readUInt32()
                @id = readUInt32()
                @db.file.seek(4, IO::SEEK_CUR)
                @format = @db.file.read(4).reverse()
                @db.file.seek(@addr + 44)
                trackNumber = readUInt32()
                @trackNumber = trackNumber if trackNumber > 0
                discNumber = readUInt32()
                @discNumber = discNumber if discNumber > 0
                loadChildren()
            end

            def loadChildren()
                super
                addr = @addr + @len
                numStrMhods.times do
                    mhod = Cmhod.new(addr, @db, depth + 1)
                    children << mhod
                    addr += mhod.len
                end
            end
        end

        # Record describing a data object
        class Cmhod < Cmh_base
            Types = {1 => :title,
                     2 => :location,
                     3 => :album,
                     4 => :artist,
                     5 => :genre,
                     6 => :fileType,
                     8 => :comment,
                     12 => :composer,
                     13 => :grouping,
                     14 => :description,
                     22 => :albumArtist}.freeze
            attr :headerLen
            attr :type
            attr :strLen
            attr :str

            def initialize(addr, db, depth)
                @headerLen = nil
                @type = nil
                @str = nil
                @strLen = nil
                @recordName = "mhod"
                super
            end

            def to_s()
                indent = "\t" * @depth
                str = super
                str << "#{indent} headerLen: #@headerLen\n"
                str << "#{indent} type: #@type\n"
                str << "#{indent} strLen: #@strLen\n"
                str << "#{indent} str: #@str\n" if @str
                return str
            end

            protected

            def load()
                super(false)
                @db.file.seek(@addr + @recordName.length())
                @headerLen = readUInt32()
                @len = readUInt32()
                @type = readUInt32()
                @db.file.seek(12, IO::SEEK_CUR)
                @strLen = readUInt32()
                if @strLen > 0 && @strLen % 2 == 0
                    @db.file.seek(8, IO::SEEK_CUR)
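                    begin
                        # Transcoding to US-ASCII raises for any non-ASCII character; such
                        # strings are silently skipped and @str stays nil.
                        str = @db.file.read(@strLen).force_encoding("UTF-16LE").encode("US-ASCII")
                    rescue
                    else
                        @str = str
                    end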
                end
            end
        end
    end

end
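
The string record layout that Cmhod#load assumes can be checked against a synthetic record. This sketch builds one in memory and reads it back using the same offsets. The record contents are invented, and the header length value is an arbitrary placeholder, since the module reads it but never uses it. Unlike the module, the sketch transcodes to UTF-8 rather than US-ASCII.

```ruby
require 'stringio'

# Offsets used by Cmhod#load above:
#    0  "mhod"
#    4  header length (uint32 LE)
#    8  total record length (uint32 LE)
#   12  type (uint32 LE); 1 maps to :title in Cmhod::Types
#   16  12 bytes skipped
#   28  string length in bytes (uint32 LE)
#   32  8 bytes skipped
#   40  string data, UTF-16LE
title = "Example".encode("UTF-16LE").force_encoding("BINARY")
header_len = 24  # arbitrary placeholder, not used when parsing
record = ["mhod", header_len, 40 + title.bytesize, 1].pack("a4VVV") +
         ("\0" * 12) + [title.bytesize].pack("V") + ("\0" * 8) + title

# Read the record back with the same seeks as the module.
io = StringIO.new(record)
tag, _header_len, total_len, type = io.read(16).unpack("a4VVV")
io.seek(12, IO::SEEK_CUR)
str_len = io.read(4).unpack("V")[0]
io.seek(8, IO::SEEK_CUR)
str = io.read(str_len).force_encoding("UTF-16LE").encode("UTF-8")

puts "#{tag} type=#{type} len=#{total_len} str=#{str}"
```

Because the total length is stored in the record itself, a reader can step from one mhod to the next without understanding its payload, which is how Cmhit#loadChildren advances.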

Labels: coding, iPod, iTunesDB, ruby

posted by Erik Hvatum # 6:06 PM
