Parser and validator library for BioImage.IO specifications
Specifications for BioImage.IO
This repository contains specifications defined by the BioImage.IO community. These specifications define the fields used in YAML files that we call Resource Description Files (RDFs). RDFs can be downloaded from or uploaded to the bioimage.io website, and produced or consumed by BioImage.IO-compatible consumers (e.g. image analysis software or other websites). Currently we define two types of RDFs: a dedicated RDF specification for AI models (the model RDF) and a general RDF specification. The model RDF is an RDF with additional fields that are specifically designed for describing AI models.
All BioImage.IO-compatible RDFs must fulfill the following rules:
- The RDF must be a YAML file encoded as UTF-8. If the YAML syntax version is not specified to be 1.1 in the first line by %YAML 1.1, the file must be equivalent in YAML 1.1 and YAML 1.2. For differences see https://yaml.readthedocs.io/en/latest/pyyaml.html#differences-with-pyyaml.
- The RDF file extension must be .yaml (not .yml).
- The RDF file can be saved in a folder (or virtual folder) or in a zip package. The following additional rules apply:
  - When stored in a local file system folder, GitHub repo, Zenodo deposition, blob storage virtual folder or similar, the RDF file name should match the pattern *.rdf.yaml, for example my-model.rdf.yaml.
  - When the RDF file and other files are zipped into an RDF package, the RDF file must be named rdf.yaml.
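The file naming rules above can be sketched as a small check. This is only an illustration of the rules as stated here, not the validator's implementation; the function name check_rdf_file_name and its in_zip_package parameter are hypothetical.

```python
from pathlib import PurePosixPath


def check_rdf_file_name(file_name: str, in_zip_package: bool = False) -> list:
    """Return a list of naming-rule violations (empty if the name is fine).

    Hypothetical helper: it only encodes the naming rules listed above.
    """
    problems = []
    name = PurePosixPath(file_name).name  # ignore any leading folders
    if not name.endswith(".yaml"):
        problems.append("extension must be .yaml (not .yml)")
    if in_zip_package:
        if name != "rdf.yaml":
            problems.append("inside a zip package the file must be named rdf.yaml")
    elif not name.endswith(".rdf.yaml"):
        problems.append("in a folder the name should match *.rdf.yaml")
    return problems


print(check_rdf_file_name("my-model.rdf.yaml"))
print(check_rdf_file_name("model.yml"))
print(check_rdf_file_name("rdf.yaml", in_zip_package=True))
```

The check only looks at the file name. Whether the content is valid YAML and a valid RDF is a separate question that the validator answers.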
As a general guideline, please follow the model RDF spec to describe AI models and use the general RDF spec for other resource types, including dataset and application. You will find more details about these two specifications in the following sections. Please also note that the best way to check whether your RDF file is BioImage.IO-compliant is to run the BioImage.IO Validator against it.
Resource Description File Specification
A BioImage.IO-compatible Resource Description File (RDF) is a YAML file with a set of specifically defined fields.
You can find detailed field definitions here:
The specifications are also available as json schemas:
Here you can find some examples for using RDF to describe applications, notebooks, datasets etc.
Model Resource Description File Specification
Besides the general RDF spec, the Model Resource Description File Specification (model RDF) defines a file format for representing pretrained AI models in YAML format. This format is used to describe models hosted on the BioImage.IO model repository site.
Here is a list of model RDF Examples:
Collection Resource Description File Specification
Another specialized RDF spec, the Collection Resource Description File Specification (collection RDF), defines a file format for representing collections of resources for the BioImage.IO website.
Linking resource items
You can create links to connect resource items by adding the id of another resource item to the links field. For example, if you want to associate an application with a model, you can set the links field of the model like this:
application:
  - id: HPA-Classification
    source: https://raw.githubusercontent.com/bioimage-io/tfjs-bioimage-io/master/apps/HPA-Classification.imjoy.html
model:
  - id: HPAShuffleNetV2
    source: https://raw.githubusercontent.com/bioimage-io/tfjs-bioimage-io/master/models/HPAShuffleNetV2/HPAShuffleNetV2.model.yaml
    links:
      - HPA-Classification
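To illustrate how such links relate items to each other, here is a minimal sketch that looks for links pointing to unknown ids. It assumes the YAML above has already been loaded into plain Python dicts and lists; find_unresolved_links is a hypothetical helper, not part of bioimageio.spec, and the source values are shortened placeholders.

```python
def find_unresolved_links(collection: dict) -> dict:
    """Map each item id to those of its links that match no known item id."""
    known_ids = {item["id"] for items in collection.values() for item in items}
    unresolved = {}
    for items in collection.values():
        for item in items:
            missing = [link for link in item.get("links", []) if link not in known_ids]
            if missing:
                unresolved[item["id"]] = missing
    return unresolved


# Same structure as the YAML example; "<url>" stands in for the real source URLs.
collection = {
    "application": [{"id": "HPA-Classification", "source": "<url>"}],
    "model": [
        {"id": "HPAShuffleNetV2", "source": "<url>", "links": ["HPA-Classification"]}
    ],
}
print(find_unresolved_links(collection))
```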
Hosting RDFs
You can host the resource description file on a public repository website such as Zenodo, GitHub, GitLab, Bitbucket, or Gist. To make it available on https://bioimage.io, you can submit the RDF package via the uploader.
Recommendations
- For AI models, consider using the model-specific spec (i.e. model RDF) instead of the general RDF. Only fall back to the general RDF if writing a model-specific RDF is not possible for some reason.
- The RDF or package file name should not contain spaces or special characters. It should be concise and descriptive, in kebab case or camel case.
- Because some storage services such as Zenodo do not support subfolders, it is recommended to place other files at the same directory level as the RDF file and to avoid subdirectories.
- Use the bioimage.io spec validator to verify your YAML file.
- Store the YAML file in a version-controlled Git repository (e.g. GitHub or GitLab).
- Use or upgrade to the latest format version.
bioimageio command-line interface (CLI)
The BioImage.IO command line tool makes it easy to work with BioImage.IO RDFs. A basic version of it, documented here, is provided by the bioimageio.spec package, which is extended by the bioimageio.core package.
validate
It is recommended to use this validator to verify your models when you write them manually or develop tools for generating RDF files.
Use the validate command to check for formatting errors like missing or invalid values:
bioimageio validate <MY-MODEL-SOURCE>
<MY-MODEL-SOURCE> may be a local RDF YAML file "<MY-MODEL>/rdf.yaml", a DOI or URL of a Zenodo record, or a URL to an rdf.yaml file.
To see whether your model is compatible with the latest bioimage.io model format, use the spec validator with the --update-format flag:
bioimageio validate --update-format <MY-MODEL-SOURCE>
The output of the validate command will indicate missing or invalid fields in the model file. For example, if the field timestamp was missing it would print the following:
{'timestamp': ['Missing data for required field.']}
or if the field test_inputs does not contain a list, it would print:
{'test_inputs': ['Not a valid list.']}
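When such an error mapping is processed in a script, it can help to flatten it into one line per problem. The sketch below works on plain dicts shaped like the two examples above. The flatten_errors helper is hypothetical, and the handling of nested mappings is an assumption about how errors for nested fields might be reported.

```python
def flatten_errors(errors, prefix=""):
    """Turn a (possibly nested) error mapping into 'field: message' lines."""
    lines = []
    if isinstance(errors, dict):
        for key, value in errors.items():
            path = f"{prefix}.{key}" if prefix else str(key)
            lines.extend(flatten_errors(value, path))
    elif isinstance(errors, (list, tuple)):
        for message in errors:
            lines.extend(flatten_errors(message, prefix))
    else:
        lines.append(f"{prefix}: {errors}")
    return lines


# The two example outputs from above, combined into one mapping.
errors = {
    "timestamp": ["Missing data for required field."],
    "test_inputs": ["Not a valid list."],
}
for line in flatten_errors(errors):
    print(line)
```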
update-format
Similar to the validate command with the --update-format flag, the update-format command attempts to convert an RDF to the latest applicable format version, but saves the result in a file for further manual editing:
bioimageio update-format <MY-MODEL-SOURCE> <OUTPUT-PATH>
bioimageio.spec Python package
The bioimageio.spec package lets you work with BioImage.IO RDFs within Python. The commands on which the bioimageio CLI is based can be used as functions. Additionally, IO functions are provided to work with BioImage.IO RDFs as 'raw node' Python objects, e.g. the raw representation of a model RDF 0.4 at bioimageio.spec.model.v0_4.raw_nodes. The bioimageio.core package extends this 'raw' representation for more convenience.
installation
bioimageio.spec can be installed with either pip or conda:
# pip
pip install -U bioimageio.spec
# conda
conda install -c conda-forge bioimageio.spec
bioimageio.spec is also installed as a dependency of the bioimageio.core library, which extends bioimageio.spec with more powerful commands like 'predict'.
Environment variables
Name | Default | Description |
---|---|---|
BIOIMAGEIO_USE_CACHE | "true" | Enables a simple URL-to-file cache. Accepted positive values (case-insensitive) are "true", "yes" and "1". Any other value is interpreted as "false". |
BIOIMAGEIO_CACHE_PATH | generated tmp folder | File path for simple URL to file cache; changes of URL source are not detected. |
BIOIMAGEIO_CACHE_WARNINGS_LIMIT | "3" | Maximum number of warnings generated for simple cache hits. |
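
The rules in the table can be sketched in a few lines of Python. This only mirrors the documented behaviour and is not taken from the library's code; cache_settings is a hypothetical helper, and returning None for an unset cache path is a stand-in for the generated tmp folder.

```python
import os


def cache_settings(env=None):
    """Interpret the cache-related variables as described in the table above."""
    env = os.environ if env is None else env
    # Only "true", "yes" and "1" (case-insensitive) enable the cache.
    use_cache = env.get("BIOIMAGEIO_USE_CACHE", "true").lower() in ("true", "yes", "1")
    warnings_limit = int(env.get("BIOIMAGEIO_CACHE_WARNINGS_LIMIT", "3"))
    cache_path = env.get("BIOIMAGEIO_CACHE_PATH")  # None: a tmp folder would be generated
    return {
        "use_cache": use_cache,
        "cache_path": cache_path,
        "warnings_limit": warnings_limit,
    }


print(cache_settings({"BIOIMAGEIO_USE_CACHE": "Yes"}))
print(cache_settings({"BIOIMAGEIO_USE_CACHE": "on"}))
```

Note that a value such as "on" counts as "false" under the stated rule, which is easy to overlook.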
Changelog
bioimageio.spec 0.4.5.post16
- fix rdf_update of entries in resolve_collection_entries()
bioimageio.spec 0.4.5.post15
- pass root to enrich_partial_rdf arg of resolve_collection_entries()
bioimageio.spec 0.4.5.post14
- keep ResourceDescription.root_path as URI for remote resources. This fixes the collection RDF as the collection entries are resolved after the collection RDF has been loaded.
bioimageio.spec 0.4.5.post13
- new bioimageio.spec.partner module adding validate-partner-collection command if optional 'lxml' dependency is available
bioimageio.spec 0.4.5.post12
- new env var BIOIMAGEIO_CACHE_WARNINGS_LIMIT (default: 3) to avoid spam from cache hit warnings
- more robust conversion of ImportableSourceFile for absolute paths to relative paths (don't fail on non-path source file)
bioimageio.spec 0.4.5.post11
- resolve symlinks when transforming absolute to relative paths during serialization; see #438
bioimageio.spec 0.4.5.post10
- fix loading of collection RDF with id (id used to be ignored)
bioimageio.spec 0.4.5.post9
- support loading bioimageio resources by their animal nickname (currently only models have nicknames).
bioimageio.spec 0.4.5.post8
- any field previously expecting a local relative path now also accepts an absolute path
- load_raw_resource_description returns a raw resource description which has no relative paths (any relative paths are converted to absolute paths).
bioimageio.spec 0.4.4post7
- add command commands.update_rdf() / update-rdf (cli)
bioimageio.spec 0.4.4post2
- fix unresolved ImportableSourceFile
bioimageio.spec 0.4.4post1
- fix collection RDF conversion for type field
bioimageio.spec 0.4.3post1
- fix to shape validation for model RDF 0.4: output shape now needs to be bigger than halo
- moved objects from bioimageio.spec.shared.utils to bioimageio.spec.shared[.node_transformer]
- additional keys to validation summary: bioimageio_spec_version, status
bioimageio.spec 0.4.2post4
- fixes to general RDF:
  - ignore value of field root_path if present in yaml. This field is used internally and always present in RDF nodes.
bioimageio.spec 0.4.1.post5
- fixes to collection RDF:
  - RDFs specified directly in collection RDF are validated correctly even if their source field does not point to an RDF.
  - nesting of collection RDF allowed
bioimageio.spec 0.4.1.post4
- fixed missing field icon in general RDF's raw node
- fixes to collection RDF:
  - RDFs specified directly in collection RDF are validated correctly
  - no nesting of collection RDF allowed for now
  - links is no longer an explicit collection entry field ("moved" to unknown)
bioimageio.spec 0.4.1.post0
- new model spec 0.3.5 and 0.4.1
bioimageio.spec 0.4.0.post3
- load_raw_resource_description no longer accepts update_to_current_format kwarg (use update_to_format instead)
bioimageio.spec 0.4.0.post2
- load_raw_resource_description accepts update_to_format kwarg
RDF Format Versions
model RDF 0.4.5
- Breaking changes that are fully auto-convertible
  - parent field changed to hold a string that is a BioImage.IO ID, a URL or a local relative path (and not subfields uri and sha256)
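As an illustration of what "fully auto-convertible" can mean for this change, the sketch below rewrites an old-style parent entry on a plain dict. It is not the converter shipped with bioimageio.spec: convert_parent is a hypothetical helper, and keeping the uri while dropping sha256 is an assumption made for this example.

```python
def convert_parent(model_rdf: dict) -> dict:
    """Return a copy of the RDF data with an old-style parent mapping replaced by a string."""
    converted = dict(model_rdf)
    parent = converted.get("parent")
    if isinstance(parent, dict) and "uri" in parent:
        # Assumption for this sketch: keep the uri, drop the sha256 subfield.
        converted["parent"] = parent["uri"]
    return converted


old = {"name": "m", "parent": {"uri": "https://example.com/rdf.yaml", "sha256": "abc"}}
print(convert_parent(old)["parent"])
```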
model RDF 0.4.4
- Non-breaking changes
  - new optional field training_data
dataset RDF 0.2.2
- Non-breaking changes
  - explicitly define and document dataset RDF (for now, clone of general RDF with type="dataset")
model RDF 0.4.3
- Non-breaking changes
  - add optional field download_url
  - add optional field dependencies to all weight formats (not only pytorch_state_dict)
  - add optional pytorch_version to the pytorch_state_dict and torchscript weight formats
model RDF 0.4.2
- Bug fixes:
  - in a pytorch_state_dict weight entry architecture is no longer optional.
collection RDF 0.2.2
- Non-breaking changes
  - make authors, cite, documentation and tags optional
- Breaking changes that are fully auto-convertible
  - Simplifies collection RDF 0.2.1 by merging resource type fields together to a collection field, holding a list of all resources in the specified collection.
(general) RDF 0.2.2 / model RDF 0.3.6 / model RDF 0.4.2
- Non-breaking changes
  - rdf_source new optional field
  - id new optional field
collection RDF 0.2.1
- First official release, extends general RDF with fields application, model, dataset, notebook and (nested) collection, which hold lists linking to respective resources.
(general) RDF 0.2.1
- Non-breaking changes
  - add optional email and github_user fields to entries in authors
  - add optional maintainers field (entries like in authors but github_user is required (and name is not))
model RDF 0.4.1
- Breaking changes that are fully auto-convertible
  - moved field dependencies to weights:pytorch_state_dict:dependencies
- Non-breaking changes
  - documentation field accepts URLs as well
model RDF 0.3.5
- Non-breaking changes
  - documentation field accepts URLs as well
model RDF 0.4.0
- Breaking changes
  - model inputs and outputs may not use duplicated names.
  - model field sha256 is required if pytorch_state_dict weights are defined, and it is now moved to the pytorch_state_dict entry as architecture_sha256.
- Breaking changes that are fully auto-convertible
  - model fields language and framework are removed.
  - model field source is renamed architecture and is moved together with kwargs to the pytorch_state_dict weights entry (if it exists, otherwise they are removed).
  - the weight format pytorch_script was renamed to torchscript.
- Other changes
  - model inputs (like outputs) may be defined by scaling and offsetting a reference_tensor (via scale and offset)
  - a maintainers field was added to the model RDF.
  - the entries in the authors field may now additionally contain email or github_user.
  - the summary returned by the validate command now also contains a list of warnings.
  - an update_format command was added to aid with updating older RDFs by applying auto-conversion.
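The fully auto-convertible changes listed for model RDF 0.4.0 can be sketched on a plain dict as follows. This is a rough illustration under stated assumptions, not the update_format implementation: convert_to_0_4_0 is a hypothetical name, the example field values are made up, and neither the format_version field nor the sha256 move is touched.

```python
import copy


def convert_to_0_4_0(model_rdf: dict) -> dict:
    """Apply the auto-convertible 0.4.0 changes to a copy of the RDF data."""
    data = copy.deepcopy(model_rdf)

    # The model fields language and framework are removed.
    data.pop("language", None)
    data.pop("framework", None)

    weights = data.get("weights", {})
    # The weight format pytorch_script was renamed to torchscript.
    if "pytorch_script" in weights:
        weights["torchscript"] = weights.pop("pytorch_script")

    # source becomes architecture and moves, together with kwargs, into the
    # pytorch_state_dict weights entry; without that entry both are dropped.
    source = data.pop("source", None)
    kwargs = data.pop("kwargs", None)
    state_dict_entry = weights.get("pytorch_state_dict")
    if state_dict_entry is not None:
        if source is not None:
            state_dict_entry["architecture"] = source
        if kwargs is not None:
            state_dict_entry["kwargs"] = kwargs
    return data


# Made-up example data for illustration only.
old = {
    "language": "python",
    "framework": "pytorch",
    "source": "my_module.py:UNet",
    "kwargs": {"depth": 3},
    "weights": {
        "pytorch_state_dict": {"source": "weights.pt"},
        "pytorch_script": {"source": "weights_ts.pt"},
    },
}
print(convert_to_0_4_0(old))
```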
model RDF 0.3.4
- Non-breaking changes
  - Add optional parameter eps to scale_range postprocessing.
model RDF 0.3.3
- Breaking changes that are fully auto-convertible
  - reference_input for implicit output tensor shape was renamed to reference_tensor
model RDF 0.3.2
- Breaking changes
  - The RDF file name in a package should be rdf.yaml for all the RDF (not model.yaml);
  - Change authors and packaged_by fields from List[str] to List[Author] with Author consisting of a dictionary {name: '<Full name>', affiliation: '<Affiliation>', orcid: 'optional orcid id'};
  - Add a mandatory type field to comply with the general RDF. Only valid value is 'model' for model RDF;
  - Only allow license identifier from the SPDX license list;
- Other changes
  - Add optional version field (default 0.1.0) to keep track of model changes;
  - Allow the values in the attachments list to be any values besides URI;