Score:5

How do I replace multiple fields in multiple XML files?

I have around 4000 XML files and I need to replace the value of both <filename> and <path> fields. I need to replace those fields dynamically. e.g. images0001.xml should have images0001 inside the two fields, images0002.xml should have images0002 inside the two fields, etc.

I've already used this command to rename the files sequentially:

rename 's/.+/our $i; sprintf("images%04d.jpg", 1+$i++)/e' *

And I also used this command to delete the .jpg extension that was in the two fields I'm trying to change:

sed -i 's/.jpg//g' Annotations/*

Here is the current state of the contents of the XML files:

<annotation>
    <folder></folder>
    <filename>1608644703_2.rf.fa179c1e6c47d72d668ac3d83c7f79d1</filename>
    <path>1608644703_2.rf.fa179c1e6c47d72d668ac3d83c7f79d1</path>
    <source>
        <database>roboflow.ai</database>
    </source>
    <size>
        <width>416</width>
        <height>416</height>
        <depth>3</depth>
    </size>
    <segmented>0</segmented>
    <object>
        <name>megot</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <occluded>0</occluded>
        <bndbox>
            <xmin>129</xmin>
            <xmax>292</xmax>
            <ymin>145</ymin>
            <ymax>351</ymax>
        </bndbox>
    </object>
</annotation>

And here is how I need the files to be changed:

<annotation>
    <folder></folder>
    <filename>images0001</filename>
    <path>images0001</path>
    <source>
        <database>roboflow.ai</database>
    </source>
    <size>
        <width>416</width>
        <height>416</height>
        <depth>3</depth>
    </size>
    <segmented>0</segmented>
    <object>
        <name>megot</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <occluded>0</occluded>
        <bndbox>
            <xmin>129</xmin>
            <xmax>292</xmax>
            <ymin>145</ymin>
            <ymax>351</ymax>
        </bndbox>
    </object>
</annotation>

I'm looking for a way to do this from the command line, but I haven't been able to find a solution after searching for a while.

Any help will be appreciated. Thanks in advance!

Score:5

You need an XML-aware tool such as xmlstarlet. Install it with:

sudo snap install xmlstarlet

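Then run it in a loop over the files:

for f in *.xml
do
    # The binary is named xmlstarlet on Ubuntu; some other builds install it as xml.
    name="${f%.xml}"    # file name without the .xml extension
    xmlstarlet ed -L -u "(//annotation/filename)" -v "$name" -u "(//annotation/path)" -v "$name" "$f"
done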
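
After the loop has run, a quick check along the following lines may help confirm that every file was updated. This is only a sketch: `check_fields` is a made-up helper name, not part of xmlstarlet, and it assumes that each `<filename>` and `<path>` element sits on a single line, as in the sample from the question.

```shell
#!/usr/bin/env bash
# check_fields is a hypothetical helper written for this example.
# It reports every .xml file in the given directory whose <filename>
# or <path> value differs from the file's own base name.
check_fields() {
    local dir=$1 f base tag value status=0
    for f in "$dir"/*.xml; do
        [ -e "$f" ] || continue              # the glob matched nothing
        base=$(basename "$f" .xml)           # images0001.xml -> images0001
        for tag in filename path; do
            # Print the text between <tag> and </tag> (single-line elements only).
            value=$(sed -n "s|.*<${tag}>\(.*\)</${tag}>.*|\1|p" "$f")
            if [ "$value" != "$base" ]; then
                printf '%s: <%s> is "%s", expected "%s"\n' "$f" "$tag" "$value" "$base"
                status=1
            fi
        done
    done
    return "$status"
}
```

Running `check_fields .` in the directory that holds the XML files prints nothing and returns 0 when all fields match, and lists the offending files otherwise.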

Axel Boissier
I've installed xmlstarlet and made a script containing the code above, but when I try to run it I keep getting the error `line 4: xml: command not found`. Am I missing something? Thanks for your answer, by the way!
Axel Boissier
Just found out that I only had to replace `xml` with `xmlstarlet`, and it worked perfectly!

Score:4
One way to do it with sed is by running the following command:

for f in *.xml; do sed "s|\(<filename>\).*\(</filename>\)|\1${f%.*}\2|; s|\(<path>\).*\(</path>\)|\1${f%.*}\2|" "$f"; done

  • for f in *.xml; do ... ; done is a basic for loop over the .xml files in your current directory. On each pass, the name of the current file is stored in the variable f.

  • sed "s|\(<filename>\).*\(</filename>\)|\1${f%.*}\2|; s|\(<path>\).*\(</path>\)|\1${f%.*}\2|" "$f" is the command that is run for each file found. The command does two similar replacements, one for the <filename> field and one for the <path> field:

    • s|\(<filename>\).*\(</filename>\)|\1${f%.*}\2| matches the text from <filename> through </filename> using a regular expression (.* matches any sequence of characters between the two tags). The \( and \) do not match any text themselves; they mark capture groups, here capturing <filename> and </filename>, which are reused in the replacement. The matched text is then replaced by the first capture \1 (<filename>), the file name without its extension ${f%.*}, and the second capture \2 (</filename>).

    • The <path> replacement works the same way, with path in place of filename.
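
The `${f%.*}` expansion does the renaming work in the command above: it removes the shortest trailing match of the pattern `.*` from the value of f. A small illustration follows; the file names are sample values chosen for the example.

```shell
f="images0001.xml"
echo "${f%.*}"     # images0001  (the shortest trailing ".something" is removed)

g="scan.v2.xml"
echo "${g%.*}"     # scan.v2     (only the last extension is removed)
echo "${g%%.*}"    # scan        (%% removes the longest match instead)
```

For names like images0001.xml, which contain a single dot, `${f%.*}` and `${f%.xml}` give the same result.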

Without the -i flag, sed only prints the result to standard output and leaves the files untouched. Once you have checked on a copy of a few of your files that the output is what you want, add the -i flag right after sed to change the files in place.
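
One way to rehearse the in-place edit is to wrap the loop in a function and point it at a scratch copy first. The sketch below makes a few assumptions: `update_fields` is a made-up name, GNU sed is in use (its -i takes no argument), and the file names contain no `|`, `&` or backslash characters, which would be special in the sed replacement. It uses basename rather than `${f%.*}` because f carries a directory prefix here.

```shell
#!/usr/bin/env bash
# update_fields is a hypothetical wrapper around the sed loop above.
# It rewrites <filename> and <path> in every .xml file of the given
# directory, in place. Assumes GNU sed, where -i takes no argument.
update_fields() {
    local dir=$1 f name
    for f in "$dir"/*.xml; do
        [ -e "$f" ] || continue              # the glob matched nothing
        name=$(basename "$f" .xml)           # dir/images0001.xml -> images0001
        sed -i \
            -e "s|\(<filename>\).*\(</filename>\)|\1${name}\2|" \
            -e "s|\(<path>\).*\(</path>\)|\1${name}\2|" \
            "$f"
    done
}
```

For example, `cp -r Annotations trial`, then `update_fields trial`, then `diff -r Annotations trial` shows exactly which lines changed before the real directory is touched. Here trial is an arbitrary scratch directory name, and Annotations is the directory from the question.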

Raffa
+1 This is a working solution as well. It is not XML-specific, but “all roads lead to Rome” :-)