OxCLIC MDIDimporting
Contents
- Importing Images and the associated Portfolio catalogue into MDID as a new collection
- Export from Portfolio the catalogue information as catalogue.txt
- Importing a catalogue from Portfolio to MDID ...more fun and games
- Checking in Excel the .csv file
- Convert the .csv to XML using helper tool
- Create in MDID an empty collection
- Import images into MDID using Imagemanager tool
- Customising the collection in MDID
- Draft early version of importing steps
Importing Images and the associated Portfolio catalogue into MDID as a new collection
Export from Portfolio the catalogue information as catalogue.txt
File > Export
Importing a catalogue from Portfolio to MDID ...more fun and games
In Portfolio you should select the records for which you would like to export the metadata. Choose File -> Export Field Values. You will then get to choose which field values you would like to export to a tab-separated text file.
You will need to open the file and convert it to CSV. This is most conveniently done using a spreadsheet such as Excel.
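If Excel is not to hand, the same tab-to-comma conversion can be scripted. The sketch below uses Python's standard csv module; the function names are hypothetical, and the default utf-8 encoding is an assumption (an export made on a Mac may need encoding="mac_roman" instead). It does not touch the headings, which still need the clean-up described next.

```python
import csv
import io


def tsv_to_csv_text(tsv_text: str) -> str:
    """Re-encode tab-separated text as comma-separated text.

    csv.writer quotes any value containing a comma, a double quote or a
    line break, so a title such as "View of Oxford, 1850" survives intact.
    """
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    out = io.StringIO()
    csv.writer(out).writerows(reader)
    return out.getvalue()


def tsv_file_to_csv_file(src_path: str, dst_path: str, encoding: str = "utf-8") -> None:
    """Convert a Portfolio export file on disk (encoding is an assumption)."""
    # newline="" lets the csv module handle line endings itself.
    with open(src_path, newline="", encoding=encoding) as src:
        text = src.read()
    with open(dst_path, "w", newline="", encoding=encoding) as dst:
        dst.write(tsv_to_csv_text(text))

# Usage (file names are placeholders):
# tsv_file_to_csv_file("catalogue.txt", "catalogue.csv")
```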
Then you will need to edit the first row of the file, which contains the headings from the Portfolio catalogue. You will need to replace certain characters (in the first row only) for the import filter to be able to proceed (see Section 5.2 above):
- spaces with an underscore _
- colons : with an underscore _
- semicolons ; with an underscore _
- parentheses ( ) with an underscore _
- ampersands & with an underscore _
If you do not remove these characters, any column that contains such a character will be discarded silently by the convert.exe program, and will not be imported into MDID. It is arduous to add the data in later, so you must get this operation correct right now.
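A minimal sketch of the heading clean-up in Python, assuming the file has already been saved as CSV. It replaces exactly the characters listed above and leaves the data rows alone; the function names are hypothetical and not part of the MDID2 tools. It also trims leading and trailing whitespace from each heading, which is a choice of this sketch rather than a documented requirement.

```python
import csv
import io
import re

# The characters listed above: space, colon, semicolon, parentheses, ampersand.
_BAD_HEADING_CHARS = re.compile(r"[ :;()&]")


def sanitize_heading(name: str) -> str:
    """Trim one heading, then replace each disallowed character with an underscore."""
    return _BAD_HEADING_CHARS.sub("_", name.strip())


def sanitize_header_row(csv_text: str) -> str:
    """Rewrite only the first row of a CSV document; data rows are untouched."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if rows:
        rows[0] = [sanitize_heading(h) for h in rows[0]]
    out = io.StringIO()
    csv.writer(out).writerows(rows)
    return out.getvalue()
```

After running it, check that no two headings have become identical: for example, "Photo Credit" and "Photo:Credit" would both become "Photo_Credit".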
The [MDID2 Curator's Workshop .pdf http://mdid.org/mdidwiki/images/b/bb/MDID2CuratorWorkshop.pdf] gives instructions from this point onwards, and is quite good.
Checking in Excel the .csv file
- The source file must follow strict guidelines; otherwise the conversion will fail or result in a meaningless XML file. The first row in a spreadsheet or first line in a CSV file must contain the column headings.
Identifier Field
- One of the columns must contain values that can be used as unique identifiers for the records. These values will be used to match input records to existing records in the collection. This column is referred to as the Identifier Field.
Resource Field
- One of the columns must contain values that represent the image resources, usually the file names of the associated image files. This column is referred to as the Resource Field.
All other columns must match exactly one field in the target collection. Not all collection fields need to have a column in the spreadsheet or CSV file, but all columns in the spreadsheet or CSV file must match a field in the collection. In general, each row in the spreadsheet or line in the CSV file represents one data record. The only exception is for fields that have multiple values, in which case the following rows or lines can contain additional values, as long as the Identifier Field and Resource Field for these additional rows are blank.
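The rules above can be checked before running convert.exe. This Python sketch assumes the identifier and resource columns are headed Identifier and Resource (as in the draft steps further down; pass other names if yours differ) and reports rows that break the rules rather than fixing them. It is a hypothetical helper, not part of the MDID2 tools, and it cannot check that the headings match the collection's field names.

```python
import csv
import io


def check_catalogue(csv_text: str, identifier: str = "Identifier",
                    resource: str = "Resource") -> list[str]:
    """Report rows that break the import rules; an empty list means none found."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ["file is empty"]
    header = rows[0]
    problems = [f"missing column {name!r}" for name in (identifier, resource)
                if name not in header]
    if problems:
        return problems
    id_col, res_col = header.index(identifier), header.index(resource)
    seen = set()
    for n, row in enumerate(rows[1:], start=2):  # header is row 1
        if not any(cell.strip() for cell in row):
            problems.append(f"row {n}: empty row")
            continue
        if len(row) != len(header):
            problems.append(f"row {n}: {len(row)} values, expected {len(header)}")
            continue
        ident, res = row[id_col].strip(), row[res_col].strip()
        if not ident and not res:
            # Continuation row: extra values for a multi-valued field of the record above.
            if not seen:
                problems.append(f"row {n}: continuation row before the first record")
            continue
        if not ident or not res:
            problems.append(f"row {n}: Identifier and Resource must both be filled or both blank")
        elif ident in seen:
            problems.append(f"row {n}: duplicate identifier {ident!r}")
        else:
            seen.add(ident)
    return problems
```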
Convert the .csv to XML using helper tool
Locate the program convert.exe, which comes with the MDID2 Tools download (available from SourceForge alongside the MDID2 server software). Run the program (it does not seem to run over Remote Desktop on Windows Server 2003) and point it at the CSV file that you have just created. Apparently it is possible to work from an Excel worksheet instead of a CSV file, but I have not tested how well this works.
There are a number of settings that you need to make before you generate the XML file.
Choose the Identifier field (this should be the accession number e.g. arth_aa1994) and Resource field (this should be the filename e.g. arth_aa1994.JPG) from the drop-down menus.
Under Options you should choose to split field values at newline characters rather than semicolons.
Click Start Conversion and you will be prompted for the name and destination of the XML file.
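To illustrate why the newline option matters, here is a small Python sketch. It is not convert.exe's actual code, and it assumes multiple values in a cell are separated by line breaks (as entered with Alt+Enter in Excel); it simply shows what happens to a value that itself contains a semicolon under each setting.

```python
def split_values(cell: str, separator: str = "\n") -> list[str]:
    """Split one multi-valued cell, trimming whitespace and dropping empty parts."""
    return [part.strip() for part in cell.split(separator) if part.strip()]


cell = "Oil on canvas; gilt frame\nPurchased 1994"
# Splitting at newlines keeps the semicolon inside the first value.
print(split_values(cell))       # ['Oil on canvas; gilt frame', 'Purchased 1994']
# Splitting at semicolons cuts that value in two and leaves the newline behind.
print(split_values(cell, ";"))  # ['Oil on canvas', 'gilt frame\nPurchased 1994']
```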
Create in MDID an empty collection
The next step is to log into the MDID2 application and create a new collection to hold the images and metadata. Fill out the name, description and image path (which you will presumably have to know by this stage). Then click on Field Definitions at the top of the page. Choose to get your field definitions from the XML file you have just created, and navigate to locate it.
Next, choose Import Data.
Choose your XML file again, and click the import data button.
The page will refresh, showing you the progress it has made in importing your data.
{{{
Import Status
The last data import for this collection successfully finished.
1107 of 1107 records processed:
1107 records added
0 records replaced
0 records already existed and were skipped
0 records did not have an identifier and were skipped
}}}
Assuming this completes satisfactorily, you now need to deal with the images in your collection.
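To judge whether the import completed satisfactorily without reading the numbers by eye, the status text can be checked mechanically. This Python sketch assumes the wording matches the example above (other MDID versions may phrase it differently, in which case the function raises ValueError); the helper names are hypothetical. For a brand-new collection, every record should be added and none skipped.

```python
import re

# Patterns keyed on the wording of the status message shown above.
_PATTERNS = {
    "processed": r"(\d+)\s+of\s+\d+\s+records\s+processed",
    "total": r"\d+\s+of\s+(\d+)\s+records\s+processed",
    "added": r"(\d+)\s+records\s+added",
    "replaced": r"(\d+)\s+records\s+replaced",
    "already_existed": r"(\d+)\s+records\s+already\s+existed",
    "no_identifier": r"(\d+)\s+records\s+did\s+not\s+have\s+an\s+identifier",
}


def parse_import_status(text: str) -> dict[str, int]:
    """Extract the record counts from an MDID2 import status message."""
    counts = {}
    for key, pattern in _PATTERNS.items():
        match = re.search(pattern, text)
        if match is None:
            raise ValueError(f"could not find the {key} count in the status text")
        counts[key] = int(match.group(1))
    return counts


def new_collection_import_ok(counts: dict[str, int]) -> bool:
    """True when every record was processed and added, and none were skipped."""
    return (counts["processed"] == counts["total"] == counts["added"]
            and counts["already_existed"] == 0
            and counts["no_identifier"] == 0)
```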
Import images into MDID using Imagemanager tool
On the MDID server, run Imagemanager.
Connect to the server on the non-webauthed port (8081 in our case) e.g.
http://oxclic.oucs.ox.ac.uk:8081/
- It will need the username and password of an admin user.
Select the collection you have just created (or possibly the collection for which you want to update images).
Missing images are shown in red in the image manager. Initially you should click 'Select Problems'.
Then you need to point the program at a single directory that contains the images associated with the collection. This is done by selecting 'Collection' -> Assign Local resources and navigating to the image directory. Next, press the Action Upload button, then Start. You can either repeat this process for individual folders of images, or bung all your images into a single folder.
The program will import the images to the MDID2 Application.
Check that the images are now available to be browsed in the online MDID application.
Check all images are there and all metadata fields are there. If fields are missing then it is likely that the field name contains XML unfriendly characters.
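If Imagemanager shows images in red, or images seem to be missing, a quick comparison of the Resource column against the image folder can locate the cause. This Python sketch is a hypothetical helper: it assumes the Resource values are plain file names, looks only at files directly inside the folder (matching the single-directory approach above), uses an assumed list of image extensions, and compares names case-insensitively on the assumption that the server runs Windows.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to match your collection.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".tif", ".tiff", ".png", ".gif"}


def image_files_in(folder):
    """Names of image files directly inside folder; subfolders are ignored."""
    return [p.name for p in Path(folder).iterdir()
            if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES]


def compare_resources(resource_names, file_names):
    """Return (missing, unused): Resource values with no file, and files no record uses.

    Names are compared case-insensitively and returned in lower case.
    """
    wanted = {n.strip().lower() for n in resource_names if n.strip()}
    present = {n.lower() for n in file_names}
    return sorted(wanted - present), sorted(present - wanted)

# Usage (the folder path is a placeholder):
# missing, unused = compare_resources(resources, image_files_in(r"D:\images\collection"))
```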
Customising the collection in MDID
- Set up the fields for presentation in the right order via
Collection > Field order
- Hide some obscure fields so they don't appear in search screens etc
Collection > fields > modify
Tip: Tidy the field order so that it is most suitable for presentation on screen, with the most important field first, and consider hiding the more obscure fields from browses and searches. As collection administrator in MDID, you can modify field definitions here:
Management > Collections > Manage Collection X > Field Definitions > Modify
- You can rearrange the field names so that the most important ones are listed first.
Management > Collections > Manage Collection X > Field Definitions > Move
For compatibility with MDID's default settings you should mark one of your fields to be a 'Title' field. You do this here:
Management > Collections > Manage Collection X > Field Definitions > Modify > Label > Title
To facilitate cross-searching, you also need to map a field to dc:title, i.e. the "Dublin Core Title" that appears in Lightbox views etc.
Management > Collections > Manage Collection X > Field Definitions > Modify > DC Elements > Title
Mapping fields to Dublin Core for cross-searching
It is also possible to map these fields to Dublin Core Elements.
From the Curator's Workshop manual:
"DC Element: The Dublin core element, which is used to map fields in different collections to each other. In order to perform cross collection searches, all involved collections must have Dublin core elements set for at least some of their fields. This property is optional. Multiple fields can have the same Dublin core element entry. "
Management > Collections > Manage Collection X > Field Definitions > Modify > DC Elements
Draft early version of importing steps
Open catalogue.txt in Excel
- WARNING!! You cannot import field names containing characters such as spaces or brackets. The workaround is to rename them at the Excel stage.
The Curator's Workshop manual takes you through this process very thoroughly:
"The instructions in this section assume that your cataloging data is available in a Microsoft Excel spreadsheet, with the first row containing column headers and each row below representing an image record.
Preparing a CSV file
The only format in which cataloging data can be imported into MDID2 is specially formatted XML. Using the data conversion tool, Microsoft Excel spreadsheets and CSV (Comma Separated Value) files can be converted to correctly formatted XML files.
- The source file must follow strict guidelines; otherwise the conversion will fail or result in a meaningless XML file:
- The first row in a spreadsheet or first line in a CSV file must contain the column headings. The headings themselves must match the field names in the target collection exactly.
- One of the columns must contain values that can be used as unique identifiers for the records. These values will be used to match input records to existing records in the collection. This column is referred to as the Identifier Field.
- One of the columns must contain values that represent the image resources, usually the file names of the associated image files. This column is referred to as the Resource Field.
- All other columns must match exactly one field in the target collection. Not all collection fields need to have a column in the spreadsheet or CSV file, but all columns in the spreadsheet or CSV file must match a field in the collection.
- In general, each row in the spreadsheet or line in the CSV file represents one data record. The only exception is for fields that have multiple values, in which case the following rows or lines can contain additional values, as long as the Identifier Field and Resource Field for these additional rows are blank.
- The spreadsheet or CSV file must not contain any additional data to the right or below the data records."
Dealing with special characters not supported in the field names
Portfolio has many default field names with spaces, and it is common to use a colon in DC and VRA field names. It is necessary to modify all field names that contain spaces or colons, e.g. change "Short Filename" to "Short_Filename". It is straightforward to open the catalogue.csv file in Excel and change any top-row field name that contains spaces to use underscores instead.
MDID needs a field that is a unique identifier, called Identifier, and a field that refers to the resource image name, called Resource. It is sensible to have these two as the first two fields in the .csv file. As we have a unique identifier from the OxCLIC naming conventions, in the form of the image name, this field can be duplicated to serve both purposes.
Steps to do this in Excel
- Move the image file name column (which should be the unique identifier) to be the first column. Duplicate this column.
- Name first column - "Identifier"
- Name second column - "Resource"
- Copy the Identifier column into a text editor, remove .jpg from all the rows by searching for ".jpg" and replacing it with nothing, then paste it back into the Excel column. (Note: Excel's own Find and Replace only acts on the selected cells when a range is selected, so selecting the column first should also work.)
- Save as a .csv file (comma-separated values)
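The Excel steps above can also be scripted, which avoids the detour through a text editor. This Python sketch is hypothetical: it assumes the file has already been saved as CSV with the image file name in a column whose heading you pass in (for example Short_Filename), and it strips any file extension, not only .jpg, when building the Identifier.

```python
import csv
import io
import os


def add_identifier_and_resource(csv_text: str, filename_column: str) -> str:
    """Return CSV text whose first two columns are Identifier and Resource.

    Both are derived from filename_column, which is removed from its original
    position; all other columns follow in their original order.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, records = rows[0], rows[1:]
    src = header.index(filename_column)  # ValueError if the heading is absent
    others = [i for i in range(len(header)) if i != src]

    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["Identifier", "Resource"] + [header[i] for i in others])
    for row in records:
        row = row + [""] * (len(header) - len(row))  # pad short rows
        filename = row[src].strip()
        # "arth_aa1994.JPG" -> "arth_aa1994"; a blank name stays blank (continuation rows)
        identifier = os.path.splitext(filename)[0]
        writer.writerow([identifier, filename] + [row[i] for i in others])
    return out.getvalue()
```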
Move the .csv file to the MDID server
Check that no odd characters have appeared if the file was moved from Mac to PC, by looking at the file in a text editor. Check again that the field names don't contain spaces or unusual characters. If they do, replace them with underscores using find and replace in a text editor.
- Warning: Field Names with unusual characters can't be supported by XML, so the helper application doesn't convert them and you'll find the field missing in MDID
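A small Python sketch of that check, for use alongside (not instead of) a look in a text editor. It is hypothetical: it assumes the file is meant to be UTF-8 or plain ASCII, flags classic Mac (CR-only) line endings, and lists any character outside printable ASCII, including a byte-order mark or accented letters that may be perfectly legitimate, so treat the output as things to review rather than errors.

```python
def report_odd_characters(raw: bytes, limit: int = 20) -> list[str]:
    """List CR-only line endings and characters outside printable ASCII."""
    findings = []
    if b"\r" in raw and b"\n" not in raw:
        findings.append("classic Mac line endings (CR only)")
    # Undecodable bytes (e.g. Mac Roman accents) become U+FFFD and are reported.
    text = raw.decode("utf-8", errors="replace")
    lines = text.replace("\r\n", "\n").replace("\r", "\n").split("\n")
    for line_no, line in enumerate(lines, start=1):
        for col, ch in enumerate(line, start=1):
            code = ord(ch)
            if code > 126 or (code < 32 and ch != "\t"):
                findings.append(f"line {line_no}, column {col}: U+{code:04X}")
    return findings[:limit]

# Usage (the path is a placeholder):
# with open("catalogue.csv", "rb") as f:
#     for finding in report_odd_characters(f.read()):
#         print(finding)
```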
Download the set of curator tools from the MDID SourceForge site: http://sourceforge.net/project/showfiles.php?group_id=115144
The steps below are outlined clearly in the essential document, the Curator's Workshop manual, which talks you through this process step by step. The notes below are brief comments relevant to the OxCLIC workflow. http://mdid.org/mdidwiki/images/b/bb/MDID2CuratorWorkshop.pdf
You could also get the tools from http://mdid.org by clicking on Downloads and then on "MDID2 package downloads at SourceForge.net" on the download page. Both tools require Windows XP with the .NET Framework installed.
Use the "Convert.exe" XML generator tool to convert the .csv into the required XML
In the application, open the catalogue.csv file
- Choose the Identifier column
- Choose the Resource Column
- Under options - choose "Remove Resource Column from Output" - To save having an unneeded column in the XML file
Click - "Start Conversion"
You should now have an xml version called catalogue.xml
Set up the collection in MDID
Log into MDID and go to
Management > Collections > Create New Collection
Set up a new empty collection in MDID, then go to
Field Definitions > Import Data > catalogue.xml
This takes around 30 seconds for 350 records. Please wait while it imports the records.
Check that the field names are the correct ones and that none are missing. If any are missing, check that the original .csv file hasn't got spaces, brackets or commas in any field titles. A simple batch find and replace of these characters in a text editor can help tidy up the column field names. Once the catalogue information is imported, you can search and browse it in the online application. It is not necessary to import the images immediately, as MDID recognises that the images are not available.
Loading the images into MDID
Run the Imagemanager.exe application
It should say that all images are missing!
Select All - Browse to the folder where the images are located
Select the records you want to assign images to, and import. Repeat with folders of images until all are imported.
- Warning: MDID cannot see into multi-level folders so each folder has to be imported one at a time.
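Since Imagemanager works one folder at a time, an alternative to importing folder by folder is to copy every image into a single folder first, as suggested earlier. This Python sketch is hypothetical and assumes an image extension list; it copies rather than moves, skips (and reports) any file whose name clashes case-insensitively with one already copied, and should be pointed at a destination outside the source tree.

```python
import shutil
from pathlib import Path


def flatten_images(src_root, dest_dir,
                   suffixes=(".jpg", ".jpeg", ".tif", ".tiff")):
    """Copy image files from src_root and all its subfolders into dest_dir.

    Returns (copied_names, clashes), where clashes lists (kept, skipped) path
    pairs for files whose names differ only by folder or letter case.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied, clashes, first_seen = [], [], {}
    # sorted() fixes the order and finishes the directory walk before copying.
    for path in sorted(Path(src_root).rglob("*")):
        if not path.is_file() or path.suffix.lower() not in suffixes:
            continue
        key = path.name.lower()  # Windows file names are case-insensitive
        if key in first_seen:
            clashes.append((first_seen[key], path))
            continue
        first_seen[key] = path
        shutil.copy2(path, dest / path.name)  # copy2 also keeps timestamps
        copied.append(path.name)
    return copied, clashes
```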