OxCLIC MDIDimporting

<<TableOfContents(2)>>

= Importing Images and the associated Portfolio catalogue into MDID as a new collection =

== Export from Portfolio the catalogue information as catalogue.txt ==

File > Export



== Importing a catalogue from Portfolio to MDID ...more fun and games ==
In Portfolio you should select the records for which you would like to export the metadata. Choose File -> Export Field Values.
You will then get to choose which field values you would like to export to a tab-separated text file.

You will need to open the file and convert it to CSV. This is most conveniently done using a spreadsheet such as Excel.
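If a spreadsheet is not to hand, the same conversion can be scripted. The following is a minimal Python sketch and not part of the MDID tools: the function name `tsv_to_csv`, the file names and the UTF-8 encoding are all assumptions, and a real Portfolio export may use a different encoding.

```python
import csv


def tsv_to_csv(tsv_path, csv_path, encoding="utf-8"):
    """Copy a tab-separated text file to a comma-separated file, row by row.

    Returns the number of rows written (including the heading row).
    The encoding is an assumption; adjust it to match the exported file.
    """
    rows = 0
    with open(tsv_path, newline="", encoding=encoding) as src, \
         open(csv_path, "w", newline="", encoding=encoding) as dst:
        reader = csv.reader(src, delimiter="\t")
        writer = csv.writer(dst)  # quotes values containing commas automatically
        for row in reader:
            writer.writerow(row)
            rows += 1
    return rows
```

For example, `tsv_to_csv("catalogue.txt", "catalogue.csv")` would write the CSV next to the exported text file. Values that contain commas are quoted by the `csv` module, so they should survive the conversion intact.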

Then you will need to manipulate the first row of the file, which contains the headings from the Portfolio catalogue.
You will need to replace certain characters (in the first line only) for the import filter to be able to proceed (see Section 5.2 above).
 * spaces with an underscore _
 * colons : with an underscore _
 * semicolons ; with an underscore _
 * parentheses ( ) with an underscore _
 * ampersands & with an underscore _

If you do not remove these characters, any column that contains such a character will be discarded silently by the convert.exe program, and will not be imported into MDID. It is arduous to add the data in later, so you must get this operation correct right now.
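The replacements listed above can be sketched in Python as follows. This is an illustration only: the names `sanitise_heading` and `sanitise_header_row` are hypothetical, and the character set is taken from the list on this page rather than from the convert.exe documentation. The duplicate check is an added precaution, since two different headings can collapse to the same name once their special characters are replaced.

```python
import re

# Characters listed on this page as unsafe in column headings:
# space, colon, semicolon, parentheses and ampersand.
UNSAFE = re.compile(r"[ :;()&]")


def sanitise_heading(heading):
    """Replace each unsafe character in one heading with an underscore."""
    return UNSAFE.sub("_", heading)


def sanitise_header_row(row):
    """Sanitise every heading in the first row of the catalogue.

    Raises ValueError if two headings become identical after cleaning.
    """
    cleaned = [sanitise_heading(h) for h in row]
    duplicates = sorted({h for h in cleaned if cleaned.count(h) > 1})
    if duplicates:
        raise ValueError("headings clash after cleaning: " + ", ".join(duplicates))
    return cleaned
```

Only the first row should be passed through `sanitise_header_row`; the data rows are left untouched.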

The file [[http://mdid.org/mdidwiki/images/b/bb/MDID2CuratorWorkshop.pdf|MDID2 Curator's Workshop PDF]] gives instructions from this point onwards, and is quite good.

== Checking in Excel the .csv file ==

The source file must follow strict guidelines; otherwise the conversion will fail or result in a meaningless XML file. The first row in a spreadsheet or first line in a CSV file must contain the column headings.

Identifier Field
 * One of the columns must contain values that can be used as unique identifiers for the records. These values will be used to match input records to existing records in the collection. This column is referred to as the Identifier Field.

Resource Field
 * One of the columns must contain values that represent the image resources, usually the file names of the associated image files. This column is referred to as the Resource Field.

All other columns must match exactly one field in the target collection. Not all collection fields need to have a column in the spreadsheet or CSV file, but all columns in the spreadsheet or CSV file must match a field in the collection. In general, each row in the spreadsheet or line in the CSV file represents one data record. The only exception is for fields that have multiple values, in which case the following rows or lines can contain additional values, as long as the Identifier Field and Resource Field for these additional rows are blank.
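Some of these rules can be checked before running the conversion. The sketch below is a hypothetical helper, not part of the MDID2 Tools: `check_catalogue` and its parameters are invented for illustration, and it covers only the identifier and continuation-row rules described above. Row numbers assume that no value contains an embedded line break.

```python
import csv


def check_catalogue(csv_path, identifier_col, resource_col, encoding="utf-8"):
    """Return a list of human-readable problems found in the CSV file."""
    problems = []
    seen = {}  # identifier -> row number where it first appeared
    with open(csv_path, newline="", encoding=encoding) as f:
        reader = csv.DictReader(f)
        headings = reader.fieldnames or []
        for col in (identifier_col, resource_col):
            if col not in headings:
                problems.append(f"missing column: {col}")
        if problems:
            return problems
        # Row 1 is the heading row, so data rows start at 2.
        for row_no, row in enumerate(reader, start=2):
            ident = (row[identifier_col] or "").strip()
            resource = (row[resource_col] or "").strip()
            if not ident and not resource:
                # Continuation row holding extra values for the previous record.
                if not seen:
                    problems.append(f"row {row_no}: continuation row before any record")
                continue
            if not ident:
                problems.append(f"row {row_no}: resource without an identifier")
            elif ident in seen:
                problems.append(
                    f"row {row_no}: duplicate identifier {ident!r} "
                    f"(first seen on row {seen[ident]})"
                )
            else:
                seen[ident] = row_no
    return problems
```

An empty list means that none of the checked rules were broken; it does not guarantee that the import will succeed.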

== Convert the .csv to XML using helper tool ==

Locate the program convert.exe, which comes with the MDID2 Tools download (available from SourceForge with the MDID2 server software).
Run the program (it does not seem to run over remote desktop on Windows Server 2003) and point it at the CSV file that you have just created. Apparently it is possible to work from an Excel worksheet instead of a CSV file, but I have not tested how well this works.

There are a number of settings that you need to make before you generate the XML file.

Choose the Identifier field (this should be the accession number e.g. arth_aa1994) and Resource field (this should be the filename e.g. arth_aa1994.JPG ) from the drop down menus.

Under Options you should choose to split field values at newline characters rather than semicolons.

Click Start Conversion and you will be prompted for the name and destination of the XML file.

== Create in MDID an empty collection ==

The next step is to log into the MDID2 application and create a new collection to hold the images and metadata.
Fill out the name, description and image path (which you will presumably know by this stage).
Then click on Field Definitions at the top of the page.
Choose to get your field definitions from the XML file you have just created, and navigate to locate it.


Next you choose Import Data

Choose your XML file again, and click the import data button.



The page will refresh, showing you the progress it has made in importing your data.


{{{Import Status

The last data import for this collection successfully finished.
1107 of 1107 records processed:
1107 records added
0 records replaced
0 records already existed and were skipped
0 records did not have an identifier and were skipped
}}}
Assuming this completes satisfactorily, you now need to deal with the images in your collection.

== Import images into MDID using Imagemanager tool ==

On the MDID server, run Imagemanager. 

Connect to the server on the non-webauthed port (8081 in our case) e.g. 
{{{
http://oxclic.oucs.ox.ac.uk:8081/
}}}

Imagemanager will need the username and password of an admin user.

Select the collection you have just created (or possibly the collection for which you want to update images).

Missing images are shown in red in the image manager. Initially you should click 'Select Problems'.

Then you need to point the program at a single directory that contains the images associated with the collection. This is done by selecting 'Collection' -> Assign Local resources and navigating to the image directory. Next, press the Action Upload button, then Start.
You can either repeat this process for individual folders of images, or bung all your images into a single folder.
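If the images are spread across nested folders, gathering them into one folder can be scripted. The following Python sketch is an assumption-laden illustration rather than an MDID feature: the function name `flatten_images` and the list of image suffixes are hypothetical. Files are copied rather than moved, and a file whose name already exists in the target folder is reported instead of overwritten.

```python
import shutil
from pathlib import Path

# Assumed set of image file suffixes; extend it to match the collection.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".tif", ".tiff", ".png"}


def flatten_images(source_root, target_dir):
    """Copy every image under source_root (at any depth) into target_dir.

    Returns (copied, clashes): the destination paths that were written and
    the source paths skipped because the file name was already taken.
    """
    source_root = Path(source_root).resolve()
    target_dir = Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    target_dir = target_dir.resolve()
    copied, clashes = [], []
    for path in sorted(source_root.rglob("*")):
        if not path.is_file() or path.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        if target_dir in path.parents:
            continue  # the target folder lives inside the source tree
        destination = target_dir / path.name
        if destination.exists():
            clashes.append(path)
            continue
        shutil.copy2(path, destination)
        copied.append(destination)
    return copied, clashes
```

Any entries in `clashes` are worth investigating before the upload, since two different images sharing one file name cannot both be matched to a record by file name.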

The program will import the images to the MDID2 Application.


Check that the images are now available to be browsed in the online MDID application. 

Check that all images and all metadata fields are there. If fields are missing, it is likely that the field name contains XML-unfriendly characters.

== Customising the collection in MDID ==

 * Set up the fields for presentation in the right order via

Collection > Field order

 * Hide some obscure fields so they don't appear in search screens etc.

Collection > fields > modify

Tip: Tidy the field order to suit on-screen presentation, with the most important field first, perhaps hiding the more obscure fields from browses and searches.
As collection administrator in MDID, you can modify the field definitions via:

Management > Collections > Manage Collection X  >  Field Definitions > Modify 

You can rearrange the field names so that the most important ones are listed first.

Management > Collections > Manage Collection X  >  Field Definitions > Move

For compatibility with MDID's default settings you should mark one of your fields to be a 'Title' field.
You do this here:

Management > Collections > Manage Collection X  >  Field Definitions > Modify > Label > Title


To facilitate cross-searching you also need to nominate one field as the dc:title field, i.e. the "Dublin Core Title" that appears in Lightbox views etc.

Management > Collections > Manage Collection X  >  Field Definitions > Modify > DC Elements > Title

=== Mapping fields to Dublin Core for cross-searching ===

It is also possible to map these fields to Dublin Core Elements. 

From the Curators Workshop Manual:

"DC Element:  The Dublin core element, which is used to map fields in different 
collections to each other.  In order to perform cross collection searches, all 
involved collections must have Dublin core elements set for at least some of their 
fields.  This property is optional.  Multiple fields can have the same Dublin core 
element entry. "

Management > Collections > Manage Collection X  >  Field Definitions > Modify > DC Elements


= Draft early version of importing steps =

== Open catalogue.txt  in Excel ==

 WARNING: You cannot import field names that contain characters such as spaces or brackets. The workaround is to rename them at the Excel stage.

The Curators Workshop manual takes you through this process very thoroughly:

"The instructions in this section assume that your cataloging data is available in a 
Microsoft Excel spreadsheet, with the first row containing column headers and each row 
below representing an image record. 
 
=== Preparing a CSV file ===

The only format in which cataloging data can be imported into MDID2 is specially formatted XML. Using the data conversion tool, Microsoft Excel spreadsheets and CSV (Comma Separated Value) files can be converted to correctly formatted XML files. The source file must follow strict guidelines; otherwise the conversion will fail or result in a meaningless XML file:
 * The first row in a spreadsheet or first line in a CSV file must contain the column headings. The headings themselves must match the field names in the target collection exactly.
 * One of the columns must contain values that can be used as unique identifiers for the records. These values will be used to match input records to existing records in the collection. This column is referred to as the Identifier Field.
 * One of the columns must contain values that represent the image resources, usually the file names of the associated image files. This column is referred to as the Resource Field.
 * All other columns must match exactly one field in the target collection. Not all collection fields need to have a column in the spreadsheet or CSV file, but all columns in the spreadsheet or CSV file must match a field in the collection.
 * In general, each row in the spreadsheet or line in the CSV file represents one data record. The only exception is for fields that have multiple values, in which case the following rows or lines can contain additional values, as long as the Identifier Field and Resource Field for these additional rows are blank.
 * The spreadsheet or CSV file must not contain any additional data to the right or below the data records."

== Dealing with special characters not supported in the field names ==


Portfolio has many default field names with spaces, and it is common to use a colon in DC and VRA field names. It is necessary to modify all field names that contain spaces, e.g. replace "Short Filename" with "Short_Filename". It is straightforward to open the catalogue.csv file in Excel and change any top-row field name that contains spaces to have underscores instead.

MDID needs a field that is a unique identifier, called Identifier, and a field that refers to the resource image name, called Resource. It is sensible to have these two fields as the first two fields in the .csv file. As the OxCLIC naming conventions already give us a unique identifier in the form of the image name, this field can be duplicated to serve both purposes.

Steps to do this in Excel

 * Move the image file name column (which should be the unique identifier) to be the first column, then duplicate it.
 * Name the first column "Identifier".
 * Name the second column "Resource".
 * Paste the Identifier column into a text editor, remove the .jpg from all the rows by searching for ".jpg" and replacing it with nothing, then paste it back into the Excel column. (Note: is there an Excel solution for search and replace per column?)
 * Save as a .csv file (comma-separated values file).
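As an alternative to the text-editor round trip, the Excel steps above can be sketched in Python. This is a hedged illustration only: `add_identifier_and_resource` and its `filename_col` parameter are hypothetical names, the input is assumed to be a well-formed CSV with no existing Identifier or Resource column, and `os.path.splitext` strips whatever extension is present (.jpg, .JPG and so on) rather than just ".jpg".

```python
import csv
import os


def add_identifier_and_resource(in_path, out_path, filename_col, encoding="utf-8"):
    """Write a copy of the CSV whose first two columns are Identifier and Resource.

    Identifier is the image file name without its extension; Resource is the
    file name unchanged. The original file name column is dropped.
    """
    with open(in_path, newline="", encoding=encoding) as src, \
         open(out_path, "w", newline="", encoding=encoding) as dst:
        reader = csv.DictReader(src)
        others = [h for h in reader.fieldnames if h != filename_col]
        writer = csv.DictWriter(dst, fieldnames=["Identifier", "Resource"] + others)
        writer.writeheader()
        for row in reader:
            filename = row.pop(filename_col) or ""
            stem, _extension = os.path.splitext(filename)
            row["Identifier"] = stem
            row["Resource"] = filename
            writer.writerow(row)
```

Rows with a blank file name (for example, rows carrying additional values for a multi-valued field) keep blank Identifier and Resource cells.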

== Move the .csv file to the MDID server ==

If the file has been moved from Mac to PC, look at it in a text editor to check that no odd characters have appeared. Check again that the field names don't contain spaces or unusual characters. If they do, replace them with underscores using find and replace in a text editor.
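A quick way to spot the usual Mac-to-PC problems is to count non-ASCII bytes and the different kinds of line ending in the file. The sketch below is a hypothetical helper (`inspect_text_file` is not part of any MDID tool) and only reports what it finds; deciding whether a given character is actually wrong is still a job for a text editor.

```python
def inspect_text_file(path):
    """Report byte-level features that often change between Mac and PC."""
    with open(path, "rb") as f:
        data = f.read()
    crlf = data.count(b"\r\n")
    return {
        # Bytes above 0x7F: accented letters, smart quotes, a BOM, etc.
        "non_ascii_bytes": sum(1 for byte in data if byte > 0x7F),
        "crlf_line_endings": crlf,                             # Windows style
        "cr_only_line_endings": data.count(b"\r") - crlf,      # classic Mac style
        "lf_only_line_endings": data.count(b"\n") - crlf,      # Unix / Mac OS X style
        "has_utf8_bom": data.startswith(b"\xef\xbb\xbf"),
    }
```

A file that mixes several line-ending styles, or that shows an unexpected number of non-ASCII bytes, deserves a closer look before it is handed to the conversion tool.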

 Warning: Field names with unusual characters are not supported in XML, so the helper application doesn't convert them and you'll find the field missing in MDID.

Download the set of Curator's tools from the MDID SourceForge site:
http://sourceforge.net/project/showfiles.php?group_id=115144

The steps below are clearly outlined in the Curator's Workshop manual, the key document that talks you through this process step by step. The notes below are brief comments that are relevant to the OxCLIC workflow.
http://mdid.org/mdidwiki/images/b/bb/MDID2CuratorWorkshop.pdf

''You could also get the tools from http://mdid.org by clicking on Downloads and then on "MDID2 package downloads at SourceForge.net" on the download page. Both tools require Windows XP with .NET installed.''

Use the "Convert.exe" XML generator tool to convert the .csv into the required XML

In the application, open the catalogue.csv file 

 *Choose the Identifier column

 *Choose the Resource Column

 *Under options - choose "Remove Resource Column from Output" - To save having an unneeded column in the XML file

Click - "Start Conversion"

You should now have an XML version called catalogue.xml.

== Setup the collection in MDID ==

Log into MDID and go to:

Management > Collections > Create New Collection

Set up a new empty collection in MDID, then go to:

Field Definitions > Import Data > catalogue.xml

This takes around 30 seconds for 350 records. Please wait while it imports the records.

Check that the field names are the correct ones and that none are missing. If any are missing, check that the original .csv file hasn't got spaces, brackets or commas in any field titles. A simple batch "find and replace" of these characters in a text editor can help tidy up the column field names. Once the catalogue information is imported you can search and browse it in the online application; it is not necessary to import the images immediately, as MDID recognises that the images are not available.

== Loading the images into MDID ==

Run the Imagemanager.exe application

It should say that all images are missing!

Select All - Browse to the folder where the images are located

Select the records you want to assign to images and import. Repeat with folders of images until all are imported. 

 Warning: MDID cannot see into multi-level folders, so each folder has to be imported one at a time.