OxCLIC MDIDimporting
Importing Images and the associated Portfolio catalogue into MDID as a new collection
Export the catalogue information from Portfolio as catalogue.txt
File > Export
Open catalogue.txt in Excel
- WARNING: You cannot import field names that contain characters such as spaces or brackets. The workaround is to rename them at the Excel stage.
The Curators Workshop manual takes you through this process very thoroughly:
"The instructions in this section assume that your cataloging data is available in a Microsoft Excel spreadsheet, with the first row containing column headers and each row below representing an image record.
Preparing a CSV file
The only format in which cataloging data can be imported into MDID2 is specially formatted XML. Using the data conversion tool, Microsoft Excel spreadsheets and CSV (Comma Separated Value) files can be converted to correctly formatted XML files.
The source file must follow strict guidelines; otherwise the conversion will fail or result in a meaningless XML file:
- The first row in a spreadsheet or first line in a CSV file must contain the column headings. The headings themselves must match the field names in the target collection exactly.
- One of the columns must contain values that can be used as unique identifiers for the records. These values will be used to match input records to existing records in the collection. This column is referred to as the Identifier Field.
- One of the columns must contain values that represent the image resources, usually the file names of the associated image files. This column is referred to as the Resource Field.
- All other columns must match exactly one field in the target collection. Not all collection fields need to have a column in the spreadsheet or CSV file, but all columns in the spreadsheet or CSV file must match a field in the collection.
- In general, each row in the spreadsheet or line in the CSV file represents one data record. The only exception is for fields that have multiple values, in which case the following rows or lines can contain additional values, as long as the Identifier Field and Resource Field for these additional rows are blank.
- The spreadsheet or CSV file must not contain any additional data to the right or below the data records."
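Some of the guidelines above can be checked automatically before running the conversion. The following is a minimal Python sketch, not part of the MDID tools: the function name `check_catalogue` is hypothetical, and the column names "Identifier" and "Resource" are assumed to match the headings used in this workflow. It checks the header row, unique identifiers, data to the right of the last column, and treats rows with a blank identifier and resource as extra values for a multi-valued field.

```python
import csv
import io


def check_catalogue(csv_text, identifier="Identifier", resource="Resource"):
    """Return a list of problems found in the CSV text (empty list if none)."""
    problems = []
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ["file is empty"]

    header = rows[0]
    for name in (identifier, resource):
        if name not in header:
            problems.append(f"missing column: {name}")
    if any(not h.strip() for h in header):
        problems.append("blank column heading")
    if len(set(header)) != len(header):
        problems.append("duplicate column heading")
    if problems:
        return problems

    id_col = header.index(identifier)
    res_col = header.index(resource)
    seen = set()
    for line_no, row in enumerate(rows[1:], start=2):
        if len(row) > len(header):
            problems.append(f"line {line_no}: data to the right of the last column")
            continue
        # Pad short rows so that every column index is valid.
        row = row + [""] * (len(header) - len(row))
        ident, res = row[id_col].strip(), row[res_col].strip()
        if not ident and not res:
            # Continuation row carrying extra values for a multi-valued field.
            continue
        if not ident:
            problems.append(f"line {line_no}: empty identifier")
        elif ident in seen:
            problems.append(f"line {line_no}: duplicate identifier {ident!r}")
        else:
            seen.add(ident)
    return problems
```

To check a real file, read it with `open("catalogue.csv", newline="").read()` and pass the text to `check_catalogue`. The sketch does not check that headings match the field names in the target collection, since those are only known inside MDID.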
Dealing with special characters not supported in the field names
Portfolio has many default field names with spaces, and it is common to use a colon in DC and VRA field names. Every field name that contains spaces or other unsupported characters must be modified, e.g. replace "Short Filename" with "Short_Filename". It is straightforward to open the exported catalogue file in Excel and change any top-row field name that contains spaces to use underscores instead.
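For catalogues with many columns, the renaming can also be scripted. The following is a minimal Python sketch; the function name `sanitise_field_name` is hypothetical, and the rule it applies (keep only letters, digits and underscores) is an assumption that is stricter than the spaces, brackets and colons mentioned here.

```python
import re


def sanitise_field_name(name):
    """Replace each run of unsupported characters with a single underscore."""
    # Assumption: only ASCII letters, digits and underscores are kept.
    cleaned = re.sub(r"[^A-Za-z0-9_]+", "_", name.strip())
    # Drop underscores left at either end, e.g. by a closing bracket.
    return cleaned.strip("_")
```

For example, `sanitise_field_name("Short Filename")` gives `"Short_Filename"`. Two different headings can collapse to the same cleaned name, so it is worth checking the renamed header row for duplicates.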
MDID needs a field that is a unique identifier, called Identifier, and a field that refers to the resource image name, called Resource. It is sensible to have these two fields as the first two fields in the .csv file. As the OxCLIC naming conventions already give us a unique identifier in the form of the image name, this field can be duplicated to serve both purposes.
Steps to do this in Excel
- Move the image file name column (which should be the unique identifier) to be the first column, then duplicate it
- Name the first column "Identifier"
- Name the second column "Resource"
- Paste the Identifier column into a text editor, remove ".jpg" from every row by searching for ".jpg" and replacing it with an empty string, then paste the result back into the Excel column. (Note: Excel's Find and Replace acts only on the selected cells when more than one cell is selected, so selecting the Identifier column first should achieve the same result without leaving Excel.)
- Save as a .csv file (comma-separated values)
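The Excel steps above can also be done with a short script. The following is a minimal Python sketch, not the method used in this workflow: the function name `add_identifier_and_resource` and the source column name "Filename" are hypothetical, and `os.path.splitext` is used in place of the ".jpg" search and replace, so it strips any file extension.

```python
import csv
import io
import os


def add_identifier_and_resource(csv_text, filename_column="Filename"):
    """Return CSV text with Identifier and Resource as the first two columns."""
    reader = csv.DictReader(io.StringIO(csv_text))
    other_fields = [f for f in reader.fieldnames if f != filename_column]
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=["Identifier", "Resource"] + other_fields,
        lineterminator="\n",
    )
    writer.writeheader()
    for row in reader:
        filename = row.pop(filename_column)
        # Identifier is the file name without its extension.
        row["Identifier"] = os.path.splitext(filename)[0]
        # Resource keeps the full file name.
        row["Resource"] = filename
        writer.writerow(row)
    return out.getvalue()
```

Given the rows `Title,Filename` and `One,img001.jpg`, the output is `Identifier,Resource,Title` followed by `img001,img001.jpg,One`.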
Move the .csv file to the MDID server
If the file was moved from Mac to PC, open it in a text editor and check that no odd characters have appeared. Check again that the field names don't contain spaces or unusual characters. If they do, replace them with underscores using find and replace in a text editor.
- Warning: Field names with unusual characters are not supported by XML, so the helper application doesn't convert them and you'll find the field missing in MDID
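The check for odd characters can be partly automated. The following is a minimal Python sketch under an assumption: that the "odd characters" come from a different line-ending style or from non-ASCII bytes written on the Mac. The function name `inspect_bytes` is hypothetical.

```python
def inspect_bytes(data):
    """Summarise line endings and non-ASCII bytes in raw file contents."""
    crlf = data.count(b"\r\n")
    return {
        "crlf": crlf,                           # Windows-style line endings
        "lone_cr": data.count(b"\r") - crlf,    # carriage return on its own
        "lone_lf": data.count(b"\n") - crlf,    # line feed on its own
        "non_ascii": sorted({b for b in data if b > 0x7F}),
    }
```

Call it with the raw contents of the file, e.g. `inspect_bytes(open("catalogue.csv", "rb").read())`. A mix of line-ending styles, or any entries in `non_ascii`, suggests the file is worth a closer look in a text editor.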
Download the set of Curators tools from the MDID SourceForge site http://sourceforge.net/project/showfiles.php?group_id=115144
The steps below are set out step by step in the Curator's Workshop Handbook, which is the key document for this process. The notes below are brief comments that are relevant to the OxCLIC workflow. http://mdid.org/mdidwiki/images/b/bb/MDID2CuratorWorkshop.pdf
You can also get the tools from http://mdid.org by clicking on downloads and then on "MDID2 package downloads at SourceForge.net" on the download page. Both tools require Windows XP with .NET installed.
Use the "Convert.exe" XML generator tool to convert the .csv into the required XML
In the application, open the catalogue.csv file
- Choose the Identifier column
- Choose the Resource Column
- Under options - choose "Remove Resource Column from Output" - To save having an unneeded column in the XML file
Click - "Start Conversion"
You should now have an xml version called catalogue.xml
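Before importing, it can be worth confirming that the converted file is well-formed XML. The following is a minimal Python sketch using the standard library; the function name `summarise_xml` is hypothetical, and it makes no assumption about the element names the conversion tool writes.

```python
import xml.etree.ElementTree as ET


def summarise_xml(xml_text):
    """Parse XML text and return (root tag, number of direct child elements)."""
    # ET.fromstring raises xml.etree.ElementTree.ParseError if the text
    # is not well-formed XML.
    root = ET.fromstring(xml_text)
    return root.tag, len(root)
```

For a file on disk, `ET.parse("catalogue.xml").getroot()` gives the same root element. Whether the child count corresponds to the number of records depends on the layout the tool produces, which is not described here.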
Setup the collection in MDID
Log into MDID
Set up a new, empty collection in MDID:
Management > Collections > Create New Collection
Then import the catalogue data:
Field Definitions > Import Data > catalogue.xml
This takes around 30 seconds for 350 records. Please wait while it imports the records.
Check that the field names are the correct ones and that none are missing. If any are missing, check that the original .csv file has no spaces, brackets or commas in any field titles. A simple batch "find and replace" of these characters in a text editor can help tidy up the column field names. Once the catalogue information is imported you can search and browse it in the online application; it is not necessary to import the images immediately, as MDID recognises that the images are not available.
Tip: Tidy the field name order to be the most suitable for presentation on screen, the most important field first, perhaps hiding the more obscure field names from browses and searches. It is also possible to map these fields to Dublin Core Elements.
From the Curators Workshop Manual:
"DC Element: The Dublin core element, which is used to map fields in different collections to each other. In order to perform cross collection searches, all involved collections must have Dublin core elements set for at least some of their fields. This property is optional. Multiple fields can have the same Dublin core element entry. "
In MDID, as collection administrator, you can modify the field definitions:
Management > Collections > Manage Collection X > Field Definitions > Modify
- You can rearrange the field names so that the most important ones are listed first.
Management > Collections > Manage Collection X > Field Definitions > Move
Loading the images into MDID
Run the Imagemanager.exe application
It should say that all images are missing.
Select All - Browse to the folder where the images are located
Select the records you want to assign to images and import. Repeat with folders of images until all are imported.
- Warning: MDID cannot see into multi-level folders so each folder has to be imported one at a time.
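When the images are spread across nested folders, a list of every folder that directly contains images shows which folders need to be imported one at a time. The following is a minimal Python sketch; the function name `folders_with_images` and the default extensions are assumptions.

```python
import os


def folders_with_images(root, extensions=(".jpg", ".jpeg")):
    """Return every folder under root (root included) that directly holds images."""
    found = []
    for folder, _subfolders, files in os.walk(root):
        # The comparison is case-insensitive, so "x.JPG" also counts.
        if any(name.lower().endswith(extensions) for name in files):
            found.append(folder)
    return sorted(found)
```

Each path in the returned list is a folder to browse to in turn.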
Check that the images are now available to be browsed in the online MDID application.
Customising the collection
Set up the fields in the right order via collection > field order
Hide some obscure fields so they don't appear in search screens etc. via collection > fields > modify
- The Title field will not be filled. It is not yet clear why, or where to specify this.