Data in data sync bibliographic collections must meet the following requirements.
Leader offset | Leader element in MARC bibliographic format | Valid values in MARC bibliographic format |
---|---|---|
00-04 | Record length | Computer-generated, five-character number equal to the length of the entire record, including itself and the record terminator. The number is right justified and each unused position contains a zero. |
5 | Record status | a, c, d, n, p |
6 | Type of record | a, c, d, e, f, g, i, j, k, m, o, p, r, t |
7 | Bibliographic level | a, b, c, d, i, m, s |
8 | Type of control | blank space, a |
9 | Character coding scheme | blank space, a |
10 | Indicator count | 2 |
11 | Subfield code count | 2 |
12-16 | Base address of data | Computer-generated, five-character numeric string that indicates the first character position of the first variable control field in a record. The number is right justified and each unused position contains a zero. |
17 | Encoding level | blank space, 1, 2, 3, 4, 5, 7, 8, u, z |
18 | Descriptive cataloging form | blank space, a, c, i, n, u |
19 | Multipart resource record level | blank space, a, b, c |
20 | Length of the length-of-field portion | 4 |
21 | Length of the starting-character-position portion | 5 |
22 | Length of the implementation-defined portion | 0 |
23 | Undefined | 0 |
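The leader rules in the table above can be checked mechanically. The following is a minimal sketch, not part of any official tool: `LEADER_RULES` and `check_leader` are hypothetical names, the allowed values are transcribed from the table, and a blank position is assumed to be a real space character. MarcEdit displays blanks as backslashes, so those would need converting first.

```python
# Valid values per leader position, transcribed from the table above.
# A space character stands for "blank space".
LEADER_RULES: dict[int, str] = {
    5: "acdnp",
    6: "acdefgijkmoprt",
    7: "abcdims",
    8: " a",
    9: " a",
    10: "2",
    11: "2",
    17: " 1234578uz",
    18: " acinu",
    19: " abc",
    20: "4",
    21: "5",
    22: "0",
    23: "0",
}


def check_leader(leader: str) -> list[str]:
    """Return a list of problems found in a 24-character MARC leader."""
    if len(leader) != 24:
        return [f"leader is {len(leader)} characters, expected 24"]
    problems = []
    # Positions 00-04 and 12-16 are computer-generated five-digit numbers.
    for start, name in ((0, "record length"), (12, "base address of data")):
        if not leader[start:start + 5].isdigit():
            problems.append(f"{name} ({start:02d}-{start + 4:02d}) is not numeric")
    # Every other listed position must hold one of its allowed characters.
    for pos, allowed in LEADER_RULES.items():
        if leader[pos] not in allowed:
            problems.append(f"leader/{pos:02d} has invalid value {leader[pos]!r}")
    return problems
```

For example, `check_leader("00714cam a2200205 a 4500")` returns an empty list, while changing position 06 to `x` yields a single problem for `leader/06`.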
Field/Subfield | Name | Requirement |
---|---|---|
008/15-17 | Country codes | Ensure that the country code (008/15-17) is not blank. |
008/35-37 | Language codes | Codes should all be in lowercase. |
035 | System Control Number | Required if available: include an OCLC control number with a valid prefix (e.g. (OCoLC)) in every record. |
040$b | Cataloging Source: Language of cataloging | Include a language code if any cataloging data is in a language other than English. If this is not coded, our system will assume the item is cataloged in English. |
040$e | Cataloging Source: Description conventions | Include a cataloging description MARC code for rare and archival materials only. |
066 | Character Sets Present | Where this field exists, include 880 fields. |
245 | Title Statement | This tag is mandatory. Include the title proper. |
5xx | Note Fields | Use UTF-8 Unicode or MARC-8 character encoding. |
6XX | Subject Fields | Common errors: (1) 2nd indicator 4 (Source not specified): the subject added entry conforms to a controlled list, but its source cannot be identified by one of the thesauri or subject heading systems covered by the other 2nd indicator values, or by a code for a specific subject heading list in $2. (2) 2nd indicator 7 with $2 (Source specified in $2): the subject heading or term is based on a subject authority other than those covered by the other 2nd indicator values; identify the source in $2. |
880 | Alternate Graphic Representation | Where this field exists, include field 066. |
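Several of the field requirements above can also be checked automatically. This is a hedged sketch covering only a subset of the table (008 country and language codes, 245, the 066/880 pairing, and 6XX 2nd indicator 7 without $2). The function name `check_fields` and the record representation, a dictionary mapping each tag to a list of values where data fields look like `10$aTitle` (two indicators followed by `$`-delimited subfields), are assumptions made for illustration.

```python
def check_fields(fields: dict[str, list[str]]) -> list[str]:
    """Check one record against part of the field requirements table.

    `fields` maps a tag such as "008" or "650" to a list of that tag's values.
    Data field values are assumed to start with two indicator characters.
    """
    problems = []
    # The bibliographic 008 is a 40-character fixed field (positions 00-39).
    f008 = fields.get("008", [""])[0]
    if len(f008) < 40:
        problems.append("008 is missing or shorter than 40 characters")
    else:
        if not f008[15:18].strip():
            problems.append("008/15-17 country code is blank")
        if f008[35:38] != f008[35:38].lower():
            problems.append("008/35-37 language code is not lowercase")
    if "245" not in fields:
        problems.append("245 title statement is missing")
    # Each of 066 and 880 requires the other.
    if ("066" in fields) != ("880" in fields):
        problems.append("066 and 880 must appear together")
    # 6XX: a 2nd indicator of 7 means the source must be named in $2.
    for tag, values in fields.items():
        if tag.startswith("6"):
            for value in values:
                if len(value) > 1 and value[1] == "7" and "$2" not in value:
                    problems.append(f"{tag}: 2nd indicator 7 without $2")
    return problems
```

Blank positions are assumed to be spaces; requirements that depend on judgment, such as whether 040$b is needed, are left out on purpose.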
The 001 field of each record should contain the MMS ID.
The file name should start with the 7-digit collection ID (required), followed by 'CDS' (required), then the date the file was created or a description, and finally the file type (e.g. '.mrc' or 'marcxml'), with the parts separated by periods. The collection ID is only available after the data sync collection has been created.
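The naming convention can be expressed as a regular expression. This is a minimal sketch assuming a name such as `1234567.CDS.20240115.mrc`. The function names are hypothetical, the description segment is assumed to contain only letters, digits, underscores, and hyphens, and the two accepted file types are taken from the examples in the text rather than from an official list.

```python
import re

# Hypothetical pattern: <7-digit collection ID>.CDS.<date or description>.<file type>
FILE_NAME_PATTERN = re.compile(r"(\d{7})\.CDS\.([A-Za-z0-9_-]+)\.(mrc|marcxml)")


def is_valid_file_name(name: str) -> bool:
    """Return True if the whole name follows the assumed convention."""
    return FILE_NAME_PATTERN.fullmatch(name) is not None


def build_file_name(collection_id: str, description: str, file_type: str = "mrc") -> str:
    """Assemble a file name from its parts and reject anything malformed."""
    name = f"{collection_id}.CDS.{description}.{file_type}"
    if not is_valid_file_name(name):
        raise ValueError(f"invalid data sync file name: {name}")
    return name
```

`build_file_name("1234567", "20240115")` produces `1234567.CDS.20240115.mrc`; a 6-digit collection ID or a missing `CDS` part makes `is_valid_file_name` return `False`.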
Because the data sync collection requires the MMS ID, we need to export the MARC records from Alma.
The records selected for WCM usually meet these requirements already. If they do not, we need to make sure the collection meets the requirements above before uploading it to WCM. There are three steps for data evaluation that you may follow:
The Python script uses the TXT file exported from MarcEdit as its input. Download the Python script into the same folder as the TXT file, open a terminal, and type 'python data_sync_validator.py <filename of the TXT file>' (e.g. python data_sync_validator.py report.txt). If there are no problematic records in the TXT file, the script prints 'All records are valid!' in the terminal. Otherwise, it generates a report and a CSV file listing all problematic records in the same folder.
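The data_sync_validator.py script itself is not reproduced here. As a rough illustration of how such a validator could work, the sketch below assumes that the TXT file is in MarcEdit's mnemonic format: lines like `=245  10$aTitle`, a blank line between records, and backslashes standing for blank positions. The function names, the small subset of checks, and the output file name `problem_records.csv` are all hypothetical, and the sketch writes only a CSV, not a separate report.

```python
import csv
import sys


def parse_mnemonic(text: str) -> list[dict[str, list[str]]]:
    """Split MarcEdit mnemonic text into records of {tag: [values]}."""
    records, current = [], {}
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current record.
            if current:
                records.append(current)
                current = {}
            continue
        if line.startswith("=") and len(line) >= 6:
            # "=TAG  value": the tag is 3 characters, followed by two spaces.
            # Backslashes (MarcEdit blanks) become spaces; note this also
            # affects any literal backslash in the field data.
            tag, value = line[1:4], line[6:].replace("\\", " ")
            current.setdefault(tag, []).append(value)
    if current:
        records.append(current)
    return records


def find_problems(record: dict[str, list[str]]) -> list[str]:
    """Apply a few of the requirements listed above (not an exhaustive check)."""
    problems = []
    if not record.get("001", [""])[0].strip():
        problems.append("001 (MMS ID) is missing")
    if "245" not in record:
        problems.append("245 is missing")
    f008 = record.get("008", [""])[0]
    if len(f008) < 18 or not f008[15:18].strip():
        problems.append("008/15-17 country code is blank")
    return problems


def main(path: str) -> None:
    with open(path, encoding="utf-8") as handle:
        records = parse_mnemonic(handle.read())
    rows = []
    for number, record in enumerate(records, start=1):
        mms_id = record.get("001", [""])[0]
        for problem in find_problems(record):
            rows.append([number, mms_id, problem])
    if not rows:
        print("All records are valid!")
        return
    # Written to the current working directory.
    with open("problem_records.csv", "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        writer.writerow(["record_number", "mms_id", "problem"])
        writer.writerows(rows)
    print(f"{len(rows)} problems written to problem_records.csv")


if __name__ == "__main__":
    if len(sys.argv) != 2:
        sys.exit("usage: python validator_sketch.py <filename of the TXT file>")
    main(sys.argv[1])
```

Keeping parsing and checking in separate functions makes it easy to add further rules from the tables above without touching the file handling.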