
Metadata Management

This guide is about managing external data in Primo/Alma and other platforms.

Piping ETD Records into Primo

As different library departments tend to choose specialized software and platforms, information becomes siloed. Instead of finding everything in one place, users are forced to visit multiple websites to look for library resources because the systems are disparate. Electronic theses and dissertations (ETDs), an important part of online scholarship in higher education, usually reside in institutional repositories (IRs) and cannot be accessed through a library's discovery portal unless special settings are configured.

To achieve a unified platform for ETDs and other library collections, metadata librarians face several challenges. While the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) provides a low-barrier harvesting method, some repositories do not support harvesting a specified collection by collection name. For academic libraries with large digital collections, it is challenging to customize the settings so that only ETD metadata is collected. In addition, since the discovery system and the IR may use different metadata schemas, librarians may face metadata crosswalk challenges for the ETD collection, including loss of metadata granularity and inconsistency between the metadata for print copies of theses and dissertations and the metadata for ETDs.
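For repositories that do expose collections as OAI-PMH sets, selective harvesting is straightforward. Below is a minimal sketch, assuming a hypothetical repository base URL and set name; if the ETD collection is not exposed as its own set, this approach is not available and a different harvesting method (such as the one described in this guide) is needed.

```python
# Sketch: selective OAI-PMH harvest of one collection (set), if the repository supports it.
import requests

OAI_BASE = "https://repository.example.edu/oai2"   # hypothetical OAI-PMH endpoint
params = {
    "verb": "ListRecords",
    "metadataPrefix": "mods",        # or "oai_dc" if MODS is not offered
    "set": "etd_collection",         # hypothetical set name for the ETD collection
}

response = requests.get(OAI_BASE, params=params, timeout=30)
response.raise_for_status()
print(response.text[:500])           # first part of the returned XML
```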

SDSU has identified a method to harvest only the ETD collection and configured a Primo external resources import profile to import the ETD XML records. We chose XML rather than DC because the XML records provide more granular fields.

This guide includes the following sub-sections:

  • User Search Difficulties for ETD Collection
  • Harvest Data & Get Ready for Import
  • Harvest Selected Records
  • Local Fields
  • Norm Rules
  • External Resources Import Profile Configuration

User Search Difficulties for ETD Collection

From 9/30/2020 to 9/23/2021, the Library received 68 patron requests about the SDSU Thesis and Dissertation Collection via LibChat.

  • Find (a) specific thesis/theses/dissertation(s): 35
  • Ask about the publication timeline of a thesis or dissertation: 10
  • Ask where or how to find a thesis/dissertation: 9
  • Acquire a digital/print copy of a thesis/dissertation: 10
  • Remove a thesis/dissertation from the Library website: 2
  • Others: 2

Table 1. Reasons for Requesting ETD Collection Materials

 

  • Request made by the author: 49
  • Request not made by the author: 19

Table 2. Patron's Relation with the Author

 

Metadata elements that patrons used to search (count; notes):

  • Title: 8
  • Author name: 17
  • Year: 11
  • Department name: 12 (Geology; Art; Music; Physics; History; Philosophy; Political Science; Chemistry; Biology)
  • Program name: 5 (MPH (2); MBA (1); MPA (1); ECE (1))
  • Course name: 1 (BA765)
  • Degree level: 8 (doctoral; master; graduate)
  • Topic: 5

Table 3. Counts and Details of Metadata Elements that Patrons Use to Search

Based on the above data, SDSU decided to add department name, advisor, and program information to the ETD metadata.

Harvest Data & Get Ready for Import

Piping ETD records from Islandora to Primo needs to be performed monthly and does not require much manual work. All Python scripts developed for this project can be found in the GitHub Repository.

Before running any scripts, one needs to make sure there are four folders in the same directory as the Python scripts: idfiles, single_xml, merged_pre_upload, and final_output (a small setup sketch follows the list below).

  • idfiles: Holds two TXT files, id_bf.txt (holds already-downloaded IDs) and id_new_date.txt (holds IDs to be downloaded)
  • single_xml: Holds the individual ETD XML records to be downloaded
  • merged_pre_upload: Includes one merged XML file that needs a manual check
  • final_output: Final output XML ready for Alma import
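If the folders do not exist yet, a small helper like the one below can create them. This is a sketch, not part of the project scripts; the base directory is hypothetical, and the date suffix on id_new_date.txt is assumed to match the date argument passed to HarvestFromIslandora.py.

```python
# Sketch: create the four folders the scripts expect, plus the two ID files.
from pathlib import Path

base_dir = Path("F:/123/456/789")          # hypothetical path; use your own

for folder in ("idfiles", "single_xml", "merged_pre_upload", "final_output"):
    (base_dir / folder).mkdir(parents=True, exist_ok=True)

# id_bf.txt holds already-downloaded IDs; id_new_<date>.txt holds IDs to be downloaded
(base_dir / "idfiles" / "id_bf.txt").touch(exist_ok=True)
(base_dir / "idfiles" / "id_new_01032023.txt").touch(exist_ok=True)
```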

Two scripts are included in this folder.

  • HarvestFromIslandora.py: This script checks for new ETD records in Islandora, downloads the new records, and creates one merged XML file in the merged_pre_upload folder. To run this script, enter the following on the command line: "python (path of this script) last_page_number_of_ETD_collection_in_Islandora full_path_of_the_folder_that_holds_the_four_folders_above date"

    • example: python "F:/123/456/789/HarvestFromIslandora.py" "535" "F:/123/456/789/" "01032023"

  • ChangeURI.py: After manually validating the merged XML file, this script updates the Identifier[@type=url] element for each record, which will be the access method for users (see the sketch after the example below). To run this script, enter the following on the command line: "python (path of this script) full_path_of_the_merged_xml_in_the_merged_pre_upload_folder full_path_of_the_new_XML_(should be in the final_output folder)"

    • example: python "F:/123/456/789/ChangeURI.py" "F:/123/456/789/merged_pre_upload/output01032023.xml" "F:/123/456/789/final_output/final01042023.xml"

Manual validation needed:

After running HarvestFromIslandora.py, one needs to manually check the script-generated file and remove or replace any incorrect elements (a quick-check sketch follows the list). Usually, one may encounter the following errors:

  • <mods> element includes unnecessary content: there should not be any attributes or namespaces in this element
  • Extra spaces: "< " and " />" have an extra space after the opening angle bracket or before the self-closing tag
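A small script can flag these issues before the manual fix. The sketch below only reports suspect lines; the file path is hypothetical and the checks mirror the two errors listed above.

```python
# Sketch: flag <mods> tags with attributes/namespaces and tags with stray spaces.
import re

merged_file = "F:/123/456/789/merged_pre_upload/output01032023.xml"  # hypothetical path

with open(merged_file, encoding="utf-8") as fh:
    for line_no, line in enumerate(fh, start=1):
        # <mods> should appear bare, with no attributes or xmlns declarations
        if re.search(r"<mods\s+[^>]", line):
            print(f"line {line_no}: <mods> element carries attributes or namespaces")
        # stray space right after "<" or right before "/>"
        if "< " in line or " />" in line:
            print(f"line {line_no}: extra space inside a tag")
```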

An example of the final output file can be viewed via this link.

Harvest Selected Records

If already-uploaded records have changed, one may need to re-harvest those records by ID.

To harvest a single record or a list of records, create a TXT file with one ID per row, like this file, and run the HarvestSelectedRecords.py script in this Github Repository with the following command line: "python (path of this script) full_path_of_the_TXT_file_for_IDs full_path_of_the_new_XML_(should be in the final_output folder) date" (a simplified sketch follows the example below).

  • For example: python [path to HarvestSelectedRecords.py] "F:/xxx/xxx/xxx/id.txt" "F:/xxx/xxx/xxx" "20230105"
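Conceptually, selective harvesting reads one ID per row and downloads each record's MODS XML. The sketch below illustrates that idea only; the working implementation is HarvestSelectedRecords.py, and the Islandora datastream URL pattern shown here is an assumption that should be adjusted to your repository.

```python
# Sketch: download the MODS record for each ID listed in the TXT file.
from pathlib import Path
import requests

id_file = Path("F:/xxx/xxx/xxx/id.txt")        # one record ID per row
out_dir = Path("F:/xxx/xxx/xxx/single_xml")    # downloaded records land here
out_dir.mkdir(parents=True, exist_ok=True)

for record_id in id_file.read_text(encoding="utf-8").split():
    # hypothetical Islandora 7 datastream URL pattern
    url = f"https://digitallibrary.example.edu/islandora/object/{record_id}/datastream/MODS/view"
    resp = requests.get(url, timeout=30)
    resp.raise_for_status()
    (out_dir / f"{record_id.replace(':', '_')}.xml").write_text(resp.text, encoding="utf-8")
```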

Local Fields

Two local fields are created: department name and program information.

To create a new local field:

  • Go to Alma Discovery > Manage Display and Local Fields > Add Field > Add Local Field
  • Pick a local field number between 1 and 50 and add a field name. Check Enable field for search if you want the field to be searchable. Check Use the parallel Local Field 01/50 from the Dublin Core record

  • Go to Discovery > Configure Views > Edit the default view
  • Click the Full record services tab > click the ellipsis of the Details row and add the new local field

If you want to add an indexed field:

  • Discovery > Display Configuration: Manage Display and Local Fields > Add Field > Add local field
  • Full Record Services tab > details ... > Configure  
  • Config > Discovery > Display Configuration: Hypertext Linking Definitions > Add Row > Display Field
  • Request a re-index if you cannot wait until the next indexing time. (Every six months, Ex Libris reindexes all inventory data in order to enhance Alma's search mechanism. The semi-annual reindexing generally starts in May and November and finishes in July and January; completion times can vary from institution to institution.)

After creating new local fields, don't forget to add the fields to the test view or the production discovery view. To do that:

  • Go to Discovery > Configure Views > find the view you want to use > click the ellipsis on the right side > click Edit
  • Click the Full record services tab > click the ellipsis of the Details row > click Configure
  • Click Add Field to add any fields you need or created. Adjust the order of the fields if needed.

Norm Rules

Norm rules are used to map XML fields to DC fields and local fields. An example of a discovery norm rule can be viewed in this Google Doc. After creating a new norm rule for discovery, one needs to create a new process task for the norm rule.

  • Go to Configuration > Discovery > Loading External Data Sources > Normalization Process Task
  • Add a process: the default setting for step 1 should be correct; then add a name and description for the new process and add the new norm rule

External Resources Import Profile Configuration

Before creating a new import profile for Primo VE, please follow the instructions in the link to create new search profiles.

After a new search profile is created, create a new import profile.

After creating the import profile, one can run a job to import new records by clicking the ellipsis and selecting Run. One can also reload the records; reload will re-run all the jobs in history, so be cautious when using the reload option. To delete records, please see this guide.