Research by Subject: E-book & Streaming Media Management: Metadata & File Requirements

Data & File Requirements

Bibliographic records in a data sync collection must meet the following requirements.

1. Leader and directory

Leader offset	Leader element in MARC bibliographic format	Valid values in MARC bibliographic format
00-04	Record length	Computer-generated, five-character number equal to the length of the entire record, including itself and the record terminator. The number is right justified and each unused position contains a zero.
5	Record status	a, c, d, n, p
6	Type of record	a, c, d, e, f, g, i, j, k, m, o, p, r, t
7	Bibliographic level	a, b, c, d, i, m, s
8	Type of control	blank space, a
9	Character coding scheme	blank space, a
10	Indicator count	2
11	Subfield code length	2
12-16	Base address of data	Computer-generated, five-character numeric string that indicates the first character position of the first variable control field in a record. The number is right justified and each unused position contains a zero.
17	Encoding level	blank space, 1, 2, 3, 4, 5, 7, 8, u, z
18	Descriptive cataloging form	blank space, a, c, i, n, u
19	Multipart resource record level	blank space, a, b, c
20	Length of the length-of-field portion	4
21	Length of the starting-character-position portion	5
22	Length of the implementation-defined portion	0
23	Undefined	0
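
The leader table above can be turned into a simple position-by-position check. The following is a minimal sketch, not part of any official tool: the function name `check_leader` and the `LEADER_RULES` dictionary are illustrative, and the allowed values are copied from the table (a space stands for "blank space").

```python
# Allowed values per leader position, copied from the table above.
# A space character stands for "blank space".
LEADER_RULES = {
    5: "acdnp",
    6: "acdefgijkmoprt",
    7: "abcdims",
    8: " a",
    9: " a",
    10: "2",
    11: "2",
    17: " 1234578uz",
    18: " acinu",
    19: " abc",
    20: "4",
    21: "5",
    22: "0",
    23: "0",
}


def check_leader(leader: str) -> list[str]:
    """Return a list of problems found in a 24-character MARC leader."""
    if len(leader) != 24:
        return [f"leader must be 24 characters, got {len(leader)}"]
    problems = []
    # Positions 00-04 and 12-16 are computer-generated, zero-padded numbers.
    if not leader[0:5].isdigit():
        problems.append("00-04: record length is not a five-digit number")
    if not leader[12:17].isdigit():
        problems.append("12-16: base address is not a five-digit number")
    # Every other position must hold one of the values listed in the table.
    for pos, allowed in LEADER_RULES.items():
        if leader[pos] not in allowed:
            problems.append(f"{pos:02d}: {leader[pos]!r} is not an allowed value")
    return problems
```

For example, `check_leader("00714cam a2200205 a 4500")` returns an empty list, while a leader with an unlisted type-of-record value in position 06 produces one problem entry.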

2. Fields and subfields

Field/Subfield	Name	Requirement
008/15-17	Country codes	Ensure Country code (008/15-17) is not blank.
008/35-37	Language codes	Codes should all be in lowercase.
035	System Control Number	Required if available. If available, include an OCLC control number, with valid prefix, in every record.
040$b	Cataloging Source: Language of cataloging	Include a language code if any cataloging data is in a language other than English. If this is not coded, our system will assume the item is cataloged in English.
040$e	Cataloging Source: Description conventions	Include a cataloging description MARC code for rare and archival materials only.
066	Character Sets Present	Where this field exists, include 880 fields.
245	Title Statement	This tag is mandatory. Include the title proper.
5xx	Note Fields	Use UTF-8 Unicode or MARC-8 character encoding.
6XX	Subject Fields	common error: 6xx 2nd indicator 4 Source not specified — the formulation of the subject added entry conforms to a controlled list, but the source cannot be specified by one of the thesaurus or subject heading systems covered by the other 2nd indicator values or by a code for a specific subject heading list in $2.
6XX	Subject Fields	common error: 6xx 2nd indicator 7 plus $2 Source is specified in $2 — Subject headings or terms are based on other subject authorities (i.e. on authorities other than those listed here). Identify the source $2.
880	Alternate Graphic Representation	Where this field exists, include field 066.
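
Several rows of the table above are mechanical enough to check in code. The sketch below assumes a deliberately simple, hypothetical record representation (a dictionary mapping each tag to a list of field strings) and covers only the 008, 245, 066, and 880 rows. The function name `check_fields` is illustrative.

```python
def check_fields(record: dict[str, list[str]]) -> list[str]:
    """Check a few of the field requirements for one record.

    `record` maps a MARC tag to the list of field values for that tag,
    for example {"008": ["..."], "245": ["10$aTitle"]}. This layout is an
    assumption made for the sketch, not a standard format.
    """
    problems = []

    # 008 is a fixed-length field, so codes are read by character position.
    field_008 = (record.get("008") or [""])[0]
    country = field_008[15:18]
    language = field_008[35:38]
    if not country.strip():
        problems.append("008/15-17: country code is blank")
    if language != language.lower():
        problems.append("008/35-37: language code is not lowercase")

    # 245 is mandatory and must carry some content.
    if not any(value.strip() for value in record.get("245", [])):
        problems.append("245: title statement is missing")

    # 066 and 880 must appear together.
    has_066 = bool(record.get("066"))
    has_880 = bool(record.get("880"))
    if has_066 and not has_880:
        problems.append("066 is present but no 880 field was found")
    if has_880 and not has_066:
        problems.append("880 is present but field 066 was not found")

    return problems
```

The 6XX indicator rows and the 035 and 040 rows depend on cataloging judgment or on data outside the record, so they are left out of this sketch.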

3. Local system number

The 001 field of each record should contain the MMS ID.

4. File name

The file name should start with the 7-digit collection ID (required), followed by 'CDS' (required), then the date the file was created or a short description, with the parts separated by periods, and end with the file type (e.g. '.mrc' or 'marcxml'). (The collection ID is only available after the data sync collection is created.)
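
As an illustration of the naming rule, the sketch below builds and checks a file name. It assumes that every part is joined by a period (collection ID, 'CDS', date or description, file type) and that the date or description contains no periods or spaces. Both the pattern and the function name `build_file_name` are assumptions made for this example.

```python
import re

# Assumed layout: <7-digit collection ID>.CDS.<date or description>.<file type>
FILE_NAME_PATTERN = re.compile(r"^\d{7}\.CDS\.[^.\s]+\.(mrc|marcxml)$")


def build_file_name(collection_id: str, label: str, extension: str = "mrc") -> str:
    """Join the parts with periods and verify the result against the pattern."""
    name = f"{collection_id}.CDS.{label}.{extension}"
    if not FILE_NAME_PATTERN.match(name):
        raise ValueError(f"file name does not follow the expected pattern: {name}")
    return name
```

With a placeholder collection ID, `build_file_name("1234567", "20240115")` gives `1234567.CDS.20240115.mrc`. A collection ID that is not exactly seven digits raises a `ValueError`.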

Export Data from Alma

Since MMS ID is needed for data sync collection, we need to export the MARC records from Alma.

Create a set of ebook records (e titles):
- Using advanced search, search e-titles using the collection name or other criteria
- Select the records you want to export or click on Select All, then click on Save and Filter Query. Save it to a set.
Export the set:
- Go to Admin > Run a Job, find Export Bibliographic Records, select Next on the top right
- Find the set you created, select Next on the top right
- Change format from XML to Binary
- Click on Next and then Submit
- Go to Admin Jobs > Monitor Jobs, click on the job you just ran
- The job report includes a link to the MARC records of the set. Click the link to start downloading the MARC file
Rename the MARC file if the collection ID is available

Evaluate MARC Data

Records selected from WCM usually already meet the requirements above. For any other collection, make sure it meets those requirements before uploading it to WCM. You may follow these three steps to evaluate the data:

1. Identify non-UTF-8 field(s)

Use MarcEdit's MARCValidator to identify non-UTF-8 fields.

If non-UTF-8 fields are found, you can use the editing tools in MarcEdit to delete the affected fields, subfields, or records
- Tutorials can be found in Terry's Worklog
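
For readers who want to see what such a check involves, here is a rough Python sketch that looks for fields whose bytes are not valid UTF-8 in a binary MARC file (the Binary export format). It is not a replacement for MARCValidator. It reads the directory of each record directly and assumes well-formed records. The function names are illustrative. It only tests byte validity, so MARC-8 data that happens to be plain ASCII will pass.

```python
FIELD_TERMINATOR = b"\x1e"
RECORD_TERMINATOR = b"\x1d"


def split_records(raw: bytes) -> list[bytes]:
    """Split the contents of a binary MARC file into individual records."""
    return [chunk + RECORD_TERMINATOR
            for chunk in raw.split(RECORD_TERMINATOR) if chunk.strip()]


def non_utf8_fields(record: bytes) -> list[str]:
    """Return the tags of fields in one record that fail UTF-8 decoding."""
    base = int(record[12:17])        # leader 12-16: base address of data
    directory = record[24:base - 1]  # the directory ends with a field terminator
    bad_tags = []
    # Each directory entry is 12 bytes: tag (3), field length (4), start (5).
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12]
        tag = entry[0:3].decode("ascii")
        length = int(entry[3:7])
        start = int(entry[7:12])
        data = record[base + start: base + start + length]
        try:
            data.decode("utf-8")
        except UnicodeDecodeError:
            bad_tags.append(tag)
    return bad_tags
```

A typical use would be to read the exported file with `open(path, "rb").read()`, pass the bytes to `split_records`, and call `non_utf8_fields` on each record to list the tags that need attention.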

2. Use MarcEdit to evaluate some of the field and subfield requirements

Open MarcEdit, click on Tools > Export > Export Tab Delimited Records on the top

Select MARC File and Save File, select 'Tab' as delimiter, uncheck Normalized Data (because we want to keep indicators), add the following fields and subfields:
- LDR; 008; 035$a; 040$b; 040$e; 066; 245; 5xx; 6xx; 880

Click Process. MarcEdit will generate a TXT file with the name and location you defined in the previous step.
If you want to review the result, we suggest opening it in Google Sheets, as Excel might not be able to read it correctly
- Click Import > Upload, select the TXT file, and select Tab as the separator.

3. Run a Python script

The Python script uses the TXT file exported from MarcEdit as its input. Download the Python script into the same folder as the TXT file, open a terminal, and type 'python data_sync_validator.py <filename of the TXT file>' (e.g. python data_sync_validator.py report.txt). If there are no problematic records in the TXT file, the script will print 'All records are valid!' in the terminal. If there are, it will generate a report and a CSV file of all problematic records in the same folder.
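
The script itself is not reproduced here. The sketch below only illustrates the general flow described above (read the tab-delimited export, test each row, then either print a message or write a CSV of problem rows). It is not the actual data_sync_validator.py. It assumes the export has a header row with a column named '245', applies a single placeholder rule (empty 245), and uses hypothetical function and file names.

```python
import csv
import sys


def find_problem_rows(txt_path: str) -> list[dict]:
    """Return the rows of a tab-delimited export whose 245 column is empty."""
    with open(txt_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return [row for row in reader if not (row.get("245") or "").strip()]


def main(txt_path: str, report_path: str = "problem_records.csv") -> None:
    problems = find_problem_rows(txt_path)
    if not problems:
        print("All records are valid!")
        return
    # Write the problem rows to a CSV file, keeping the original column names.
    columns = [key for key in problems[0] if key is not None]
    with open(report_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=columns, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(problems)
    print(f"{len(problems)} problematic record(s) written to {report_path}")


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

A fuller validator would add one rule per requirement in the tables above, but the read, test, and report structure would stay the same.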
