Migration

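When a DTD is updated and gains new required elements, the existing XML files have to be updated as well. Because an existing XML file can be loaded without validating it, the update is straightforward: load the file with validation disabled, and the missing required elements are added when the XML is generated again.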

Updating an XML file after a DTD change

We use the same DTD as before:

<!ELEMENT movies (movie*)>
<!ELEMENT movie (title, date?, realisator, characters, (good-comment|bad-comment)*, (bad|good|awesome)?)>
<!ATTLIST movie idmovie ID #IMPLIED>
<!ELEMENT title (#PCDATA)>
<!ELEMENT date (#PCDATA)>
<!ELEMENT realisator (#PCDATA)>
<!ELEMENT characters (character+)>
<!ELEMENT character (#PCDATA)>
<!ATTLIST character idcharacter ID #IMPLIED>
<!ELEMENT good-comment (#PCDATA)>
<!ELEMENT bad-comment (#PCDATA)>
<!ELEMENT bad (#PCDATA)>
<!ELEMENT good (#PCDATA)>
<!ELEMENT awesome (#PCDATA)>

The only change is that the date element becomes required (date? becomes date in the movie content model):

<!ELEMENT movies (movie*)>
<!ELEMENT movie (title, date, realisator, characters, (good-comment|bad-comment)*, (bad|good|awesome)?)>
<!ATTLIST movie idmovie ID #IMPLIED>
<!ELEMENT title (#PCDATA)>
<!ELEMENT date (#PCDATA)>
<!ELEMENT realisator (#PCDATA)>
<!ELEMENT characters (character+)>
<!ELEMENT character (#PCDATA)>
<!ATTLIST character idcharacter ID #IMPLIED>
<!ELEMENT good-comment (#PCDATA)>
<!ELEMENT bad-comment (#PCDATA)>
<!ELEMENT bad (#PCDATA)>
<!ELEMENT good (#PCDATA)>
<!ELEMENT awesome (#PCDATA)>
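To spot which children became required between two versions of a content model, the two declarations can be compared directly. The following is only a rough sketch using the standard library: the helper name top_level_items is hypothetical (it is not part of xmltool), and the comparison assumes both models list their children in the same order.

```python
def top_level_items(model):
    """Split a DTD content model such as '(a, b?, (c|d)*)' on top-level commas."""
    inner = model.strip()[1:-1]  # drop the outer parentheses
    items, depth, current = [], 0, ""
    for ch in inner:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            # A comma outside any nested group ends the current item.
            items.append(current.strip())
            current = ""
        else:
            current += ch
    items.append(current.strip())
    return items


# Content models of the movie element, copied from the two DTDs above.
old = "(title, date?, realisator, characters, (good-comment|bad-comment)*, (bad|good|awesome)?)"
new = "(title, date, realisator, characters, (good-comment|bad-comment)*, (bad|good|awesome)?)"

# An item is newly required when the old model had it with a trailing '?'
# and the new model has it without one (assumes identical ordering).
newly_required = [
    n for o, n in zip(top_level_items(old), top_level_items(new)) if o == n + "?"
]
print(newly_required)  # ['date']
```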

The XML file has no date element:

<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE movies SYSTEM "examples/movies-new.dtd">
<movies>
  <movie>
    <title>Titanic</title>
    <realisator>Cameron James</realisator>
    <characters>
      <character>Leonardo DiCaprio</character>
      <character>Kate Winslet</character>
    </characters>
    <good-comment>My comment 1</good-comment>
    <bad-comment>My comment 2</bad-comment>
  </movie>
</movies>

>>> import xmltool
>>> filename = 'examples/migration.xml'
>>> # Loading fails because, by default, the XML is validated against the DTD
>>> obj = xmltool.load(filename)
Traceback (most recent call last):
...
DocumentInvalid:
>>> obj = xmltool.load(filename, validate=False)
>>> # The missing date tag is added automatically when the XML is generated
>>> print(obj)
<movies>
  <movie>
    <title>Titanic</title>
    <date></date>
    <realisator>Cameron James</realisator>
    <characters>
      <character>Leonardo DiCaprio</character>
      <character>Kate Winslet</character>
    </characters>
    <good-comment>My comment 1</good-comment>
    <bad-comment>My comment 2</bad-comment>
  </movie>
</movies>
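
For comparison, the same migration can be sketched by hand with Python's standard library. This is not how xmltool works internally; it is only a minimal illustration of the idea, assuming that the new required element is date and that it must follow title, as in the DTD above. The function name add_missing_dates is hypothetical.

```python
import xml.etree.ElementTree as ET

# The example document, without the DOCTYPE line (no validation is done here).
XML = """<movies>
  <movie>
    <title>Titanic</title>
    <realisator>Cameron James</realisator>
    <characters>
      <character>Leonardo DiCaprio</character>
      <character>Kate Winslet</character>
    </characters>
    <good-comment>My comment 1</good-comment>
    <bad-comment>My comment 2</bad-comment>
  </movie>
</movies>"""


def add_missing_dates(root):
    """Insert an empty <date> right after <title> in every <movie> lacking one."""
    added = 0
    for movie in root.iter("movie"):
        if movie.find("date") is not None:
            continue  # already migrated
        title = movie.find("title")
        date = ET.Element("date")
        if title is not None:
            position = list(movie).index(title) + 1
            date.tail = title.tail  # reuse the title's trailing whitespace
        else:
            position = 0
        movie.insert(position, date)
        added += 1
    return added


root = ET.fromstring(XML)
added = add_missing_dates(root)
child_tags = [child.tag for child in root.find("movie")]
print(added, child_tags[:3])
```

Running the function a second time on the same tree adds nothing, since every movie already has a date. The advantage of the xmltool approach shown above is that the position of the new element comes from the DTD itself and does not have to be hard-coded.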