Eliminate MODS in Access Systems / Abstract out description
- Status: drafted without decision
- Decider(s):
  - Infrastructure Team
  - Access Team
  - Andrew Berger
  - Arcadia Falcone
- Date(s):
  - Proposed: 2023-11-22
Context and Problem Statement
Currently, SDR descriptive metadata is represented as Cocina, from which MODS is generated during publishing based on a mapping from Cocina to MODS (contained in the cocina gem). Description is extracted from MODS and rendered by a number of access systems (including PURL, Exhibits), indexers (Searchworks and Earthworks) and gems (mods-display, stanford-mods). Further, the MODS is used to generate Dublin Core, which is used by IIIF, and descriptive metadata is extracted directly from Cocina in other systems (including dor-indexing-app).
Note that MODS fills an additional function within SDR that is outside the scope of this ADR: for many items, the Cocina descriptive metadata is generated by mapping from MARC to MODS to Cocina.
The shortcomings of the current approach are:
- Description extraction and munging code is duplicated across multiple codebases.
- Semantically similar description extraction is performed for both Cocina and MODS.
- The mapping between Cocina and MODS requires maintaining a hefty amount of complicated code.
- The combination of the mapping from Cocina to MODS and the dispersed MODS code makes it difficult to determine how particular descriptive metadata is rendered and to troubleshoot any problems (i.e., descriptive traceability).
- There may be some description use cases in which description cannot be mapped from Cocina to MODS. (At this point, this should be considered speculative, as there may be acceptable workarounds.)
It is therefore proposed that:
- MODS should be deprecated as a descriptive metadata format used by SDR for rendering and indexing. This means that it should not be generated, used for description extraction, or displayed to users. (It will continue to be used for mapping from MARC and to Dublin Core.)
- A description abstraction gem should be created. Applications will use this gem instead of directly extracting description from MODS or Cocina (or using a gem that does). Initially the gem will extract description from both MODS and Cocina, but over time all extraction from MODS will be replaced with extraction from Cocina. The gem will not expose any of the underlying Cocina or MODS to its callers.
- A new mapping should be created from Cocina to Dublin Core.
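To make the proposed abstraction more concrete, the following is a minimal sketch of how such a gem might be shaped. It is not a design decision: every class and method name (`Description`, `CocinaSource`, `ModsSource`, `DescriptionExtractor`) is hypothetical, the Cocina description is reduced to a simplified hash, and the MODS-backed source is a stand-in that receives already-extracted values rather than parsing MODS XML.

```ruby
# Hypothetical sketch: every class and method name here is illustrative only.

# Plain value object returned to applications. It carries only strings and
# arrays, so neither Cocina nor MODS structures reach the caller.
Description = Struct.new(:title, :contributors, :abstract, keyword_init: true)

# Extracts fields from a Cocina description supplied as a plain Hash.
# The hash shape is a simplified assumption, not the full Cocina model.
class CocinaSource
  def initialize(description)
    @description = description
  end

  def title
    @description.fetch(:title, []).map { |t| t[:value] }.compact.first
  end

  def contributors
    @description.fetch(:contributor, [])
                .flat_map { |c| c.fetch(:name, []) }
                .map { |n| n[:value] }
                .compact
  end

  def abstract
    note = @description.fetch(:note, []).find { |n| n[:type] == 'abstract' }
    note && note[:value]
  end
end

# Stand-in for a MODS-backed source. A real one would parse MODS XML; here the
# already-extracted values are passed in so the sketch stays self-contained.
class ModsSource
  def initialize(values)
    @values = values
  end

  def title
    @values[:title]
  end

  def contributors
    @values[:contributors]
  end

  def abstract
    @values[:abstract]
  end
end

# Facade used by applications. Sources are tried in order, so listing the
# Cocina source first prefers Cocina and falls back to MODS only for gaps.
class DescriptionExtractor
  FIELDS = %i[title contributors abstract].freeze

  def initialize(*sources)
    @sources = sources
  end

  def description
    Description.new(**FIELDS.to_h { |field| [field, first_present(field)] })
  end

  private

  # Returns the first non-nil, non-empty value any source has for the field.
  def first_present(field)
    @sources.each do |source|
      value = source.public_send(field)
      return value unless value.nil? || value.empty?
    end
    nil
  end
end

cocina = CocinaSource.new({
  title: [{ value: 'Example title' }],
  contributor: [{ name: [{ value: 'Doe, Jane' }] }]
})
mods = ModsSource.new({ title: 'Example title (MODS)', contributors: [],
                        abstract: 'An abstract present only in MODS.' })

description = DescriptionExtractor.new(cocina, mods).description
puts description.title     # prints "Example title"
puts description.abstract  # prints "An abstract present only in MODS."
```

In a shape like this, retiring MODS extraction would amount to dropping the MODS-backed source from the list, with no change to the calling applications.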
Decision Drivers
- Maintainable code.
- Traceability of descriptive rendering.
- Ability to handle all descriptive use cases (speculative; see above).
Considered Options
- The above proposed approach.
- Within access systems, use MODS exclusively for description extraction.
- Within access systems, use a mixture of Cocina and MODS for description extraction.
Decision Outcome
Chosen option: TBD
Pros and Cons of the Options
The above proposed approach
- Pro: Eliminates Cocina to MODS mapping code.
- Pro: Less indirection between descriptive metadata and rendering of that metadata.
- Pro: Possibly reduces redundant description extraction and munging.
- Pro: If changes are made to the modeling of description in Cocina or description more generally, the description abstraction gem might make the transition easier.
- Pro: Description abstraction gem will make it easier to determine which systems are using what descriptive metadata, reducing risk when making metadata changes.
- Pro: Eliminating the Cocina to MODS mapping may also allow simplification of Cocina description. Per Arcadia:
Some of descriptive Cocina’s complexity was due to the requirement for full MODS roundtripping in order to make sure no data was being lost. If the SDR metadata of record is Cocina and MODS roundtripping/full mapping is no longer necessary for our systems to use that metadata, that’s an opportunity for streamlining the model and reducing the diversity of data shapes.
- Con: Requires substantial work to implement and use description abstraction gem.
- Con: Eliminates a seam between infrastructure and access systems; changes to the Cocina data model will now affect access systems.
- Con: Risk that uses of description across systems are too dissimilar, so that the description abstraction gem adds complexity without commensurate benefit.
- Con: Does not eliminate MODS to Cocina mapping code.
Additional considerations:
- How will it be determined if cocina model changes break access systems?
- Should access systems use the cocina gem? Does this complicate the cocina gem versioning problem?
- How will cocina items cached in the access system be updated for cocina model updates?
- Is it feasible to create a new mapping to Dublin Core?
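As a rough illustration of what a direct Cocina to Dublin Core mapping could look like, here is a minimal sketch. It does not settle the feasibility question: the method name `cocina_to_dublin_core` is hypothetical, the Cocina description is reduced to a simplified hash, and only four Dublin Core elements (`title`, `creator`, `subject`, `description`) are covered. The choice of which Cocina fields feed which element is an assumption made for the example.

```ruby
# Hypothetical sketch: the method name and the simplified hash shape are
# assumptions for illustration, not the cocina gem's API.
def cocina_to_dublin_core(description)
  # Collects the non-nil :value entries from a list of hashes (or nil).
  values = lambda do |items|
    (items || []).map { |item| item[:value] }.compact
  end

  abstracts = (description[:note] || []).select { |n| n[:type] == 'abstract' }

  dc = {
    'title' => values.call(description[:title]),
    'creator' => (description[:contributor] || []).flat_map { |c| values.call(c[:name]) },
    'subject' => values.call(description[:subject]),
    'description' => values.call(abstracts)
  }

  # Drop elements with no values so only populated elements remain.
  dc.reject { |_element, vals| vals.empty? }
end

dc = cocina_to_dublin_core({
  title: [{ value: 'Example title' }],
  contributor: [{ name: [{ value: 'Doe, Jane' }] }],
  note: [{ type: 'abstract', value: 'A short abstract.' }]
})
puts dc.keys.join(', ') # prints "title, creator, description"
```

A real mapping would also need to handle structured and parallel values, roles, dates, and the other Dublin Core elements, which is where most of the feasibility risk is likely to sit.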
Use MODS exclusively for description extraction
- Pro: Maintains seam between infrastructure and access systems.
- Pro: Avoids semantically similar description extraction being performed for both Cocina and MODS.
- Pro: Does not require additional work.
- Pro: MODS is a documented standard that will be recognizable by other parts of SUL and the larger digital library community.
- Con: Maintains current shortcomings.
- Con: MODS is a standard that will be difficult to change and may need to be extended, thus defeating the purpose of using a standard.
Use a mixture of Cocina and MODS for description extraction
- Pro: May support (speculative) description use cases in which description cannot be mapped from Cocina to MODS.
- Con: Inherits every shortcoming and con listed above.
Links
- Spike for abstracting description extraction and munging in PURL: Note that in this spike, the abstraction layer remains in the PURL codebase instead of a separate gem.