Extract Technical Metadata on a Per-File Basis
- Status: drafted
- Decider(s):
  - Andrew Berger
  - Vivian Wong
  - Infrastructure Team
  - Justin Coyne
  - Mike Giarlo
  - Peter Mangiafico
  - Jeremy Nelson
  - Justin Littman
  - Naomi Dushay
  - John Martin
  - Aaron Collier
- Date(s):
  - drafted: 2019-10-29
  - …
Context and Problem Statement
Currently, we extract technical metadata per object, with a single job that runs the extraction serially, one file at a time. This takes a problematically long time for objects with many files, blocks other objects from accessioning, and complicates restarts, which must begin again and reprocess the entire object.
NOTE: Needs discussion: Fedora 3 does not support concurrent writes on the same datastream, so we can either split out filesets as first-class objects in the F3 data model or use temporary caching to generate a consolidated techMD datastream.
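To make the temporary-caching alternative concrete for discussion, here is a minimal Python sketch under stated assumptions. The function names (`cache_file_metadata`, `consolidate`), the JSON cache layout, and the shape of the metadata are all hypothetical and do not describe any existing code. The idea is that each per-file job writes only its own cache entry, and a single caller assembles the consolidated document once every expected file has reported, so only one write to the techMD datastream would be needed.

```python
import hashlib
import json
from pathlib import Path
from typing import Dict, List, Optional


def cache_file_metadata(cache_dir: Path, filename: str, metadata: Dict) -> None:
    """Store one file's technical metadata in its own cache entry.

    The entry name is a hash of the filename, so each job writes a distinct
    path and concurrent jobs never write to the same resource.
    """
    digest = hashlib.sha256(filename.encode("utf-8")).hexdigest()
    entry = cache_dir / (digest + ".json")
    entry.write_text(json.dumps({"filename": filename, "metadata": metadata}))


def consolidate(cache_dir: Path, expected_files: List[str]) -> Optional[Dict]:
    """Merge cached entries into one document once every file has reported.

    Returns None while any expected file is still missing, so the caller
    knows not to write the consolidated datastream yet.
    """
    entries = {}
    for path in cache_dir.glob("*.json"):
        record = json.loads(path.read_text())
        entries[record["filename"]] = record["metadata"]
    if any(name not in entries for name in expected_files):
        return None
    # Preserve the order of the expected file list in the output.
    return {
        "files": [
            {"filename": name, "metadata": entries[name]}
            for name in expected_files
        ]
    }
```

Under this sketch, whichever step calls `consolidate` last would be the only writer to the datastream. How the cache is cleaned up and where it lives are left open.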
Decision Drivers
- Blocker for Google Books project
- Slows down accessioning process
Considered Options
- Do nothing
- Extract metadata on a per-file basis rather than on a per-object basis to benefit from parallelism
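The per-file option can be illustrated with a short Python sketch. It is an assumption-laden illustration, not the proposed implementation: `extract_file_metadata` is a hypothetical stand-in that only reports file size, and a thread pool stands in for whatever job or worker mechanism would actually be used. It shows the two properties the option relies on: files are processed independently, and a failure is recorded against a single file rather than the whole object.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path


def extract_file_metadata(path: Path) -> dict:
    """Hypothetical stand-in for a real extraction tool; reports only size."""
    return {"filename": path.name, "size": path.stat().st_size}


def extract_object_metadata(paths, max_workers=4):
    """Run one extraction job per file and return (results, failures).

    A failure is recorded against its own file, so a restart only needs to
    resubmit the files listed in `failures`, not the entire object.
    """
    results, failures = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(extract_file_metadata, p): p for p in paths}
        for future in as_completed(futures):
            path = futures[future]
            try:
                results[path.name] = future.result()
            except OSError as exc:
                # Missing or unreadable files fail individually.
                failures[path.name] = str(exc)
    return results, failures
```

Because each job returns its result instead of writing it, this sketch sidesteps the concurrent-write limitation only if a single step persists the collected results afterwards.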
Decision Outcome
TBD!
Positive Consequences
- [e.g., improvement of quality attribute satisfaction, follow-up decisions required, …]
- …
Negative Consequences
- [e.g., compromising quality attribute, follow-up decisions required, …]
- …
Pros and Cons of the Options
[option 1]
[example | description | pointer to more information | …] |
- Good, because [argument a]
- Good, because [argument b]
- Bad, because [argument c]
- …
[option 2]
[example | description | pointer to more information | …] |
- Good, because [argument a]
- Good, because [argument b]
- Bad, because [argument c]
- …
[option 3]
[example | description | pointer to more information | …] |
- Good, because [argument a]
- Good, because [argument b]
- Bad, because [argument c]
- …