CADC Storage Inventory Architecture and Deployment

Introduction

Storage Inventory Terms and Concepts

artifact
A representation of a file and its metadata in the SI database, specifically the inventory.Artifact table (see the SI data model). The term artifact is often used to refer to the database record and the file itself as one thing, but when an SI application acts on an artifact, it refers specifically to the database record.

bucket
SI assigns a bucket label to files on storage and to artifacts in the inventory database.
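As a sketch of the idea (an assumption for illustration: the bucket is treated here as a short hex prefix derived from the MD5 of the artifact URI; the SI data model defines the authoritative rule):

```shell
# Illustration only: derive a 3-character hex "bucket" from an artifact URI.
# The real uriBucket computation is defined by the SI data model.
uri='cadc:TEST/example.fits'
printf '%s' "$uri" | md5sum | cut -c1-3
```

Partitioning files and artifacts into buckets this way lets validation and sync applications (tantar, ratik, critwall) each work on an independent range of buckets.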

namespace
A namespace is an SI identifier for a collection of artifacts, and can be used to define the logical structure of data within SI. For example, cadc:CFHT/ might be used to identify all CFHT files held at the CADC; cadc:CFHT/raw/ might be used to identify all of the raw CFHT files held at the CADC -- both cadc:CFHT/ and cadc:CFHT/raw/ are namespaces but they identify different scopes of CFHT artifacts. SI services and applications often act on namespaces defined using regex patterns -- such as during file replication and determining access permissions -- so some thought should go into how namespaces are chosen. See the SI data model page for more detail on the concepts of URI and namespace.

resourceID
This is a unique ID for a deployed service. A registry service provides a look-up to translate these IDs into service URLs. Example: ivo://opencadc.org/minoc, which might resolve to https://www.opencadc.org/minoc.
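For example, a client can resolve a resourceID by fetching the registry's resource-caps listing and looking up the ID. The entries below are hypothetical and written to a local file so the example is self-contained:

```shell
# Stand-in for the listing served at https://<registry>/reg/resource-caps
cat > resource-caps.properties <<'EOF'
ivo://opencadc.org/minoc = https://www.opencadc.org/minoc
ivo://opencadc.org/luskan = https://www.opencadc.org/luskan
EOF

# Resolve a resourceID to its service URL
grep -F 'ivo://opencadc.org/minoc ' resource-caps.properties | cut -d= -f2- | tr -d ' '
# -> https://www.opencadc.org/minoc
```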

Storage Inventory Resources

Container Image Repository

curl -s https://images.opencadc.org/api/v2.0/projects | jq '[.[].name] | sort'
curl -s https://images.opencadc.org/api/v2.0/projects/storage-inventory/repositories | jq '[.[].name] | sort'
curl -s https://images.opencadc.org/api/v2.0/projects/storage-inventory/repositories/minoc/artifacts | jq '[.[].tags | select (. != null) | .[].name] | sort'

Other Documents

Overview of Storage Inventory

System Architecture

Storage Inventory consists of the components that make up one or more Storage sites and a Global site. A Storage site can exist on its own, as a mechanism for maintaining a structured inventory of files. A Global site is required when there are two or more Storage sites that need to be synchronized; it also provides a single place where users can find all available copies of a file. A detailed description of the data model, features, and limitations can be found here.

In general:

Standalone Storage Site

A Storage site maintains an inventory of the files stored at a particular location, and provides mechanisms to access (minoc) those files and query (luskan) the local inventory. Below is an outline of a stand-alone (no Global site) Storage Inventory Storage site, with one storage system, one database, etc, in one data centre. If you have files in multiple data centres, or more than one storage platform in one data centre (e.g. some files on a posix file-system and some on Ceph object storage), you would have more than one Storage site, and each site would run its own services, database, storage, and applications.

A standalone Storage Inventory Storage site will consist of the following:

Global Site with Multiple Storage Sites

If you need to replicate files among multiple Storage Sites, you will need a Global Site. The Global site maintains a view of all Storage sites, allowing individual Storage sites to discover files that they need to copy. This also provides a single site which users can query to find files, rather than having to know about and search individual Storage sites.

A Global site requires different services than a Storage site, and both Storage sites and Global sites will need to run additional applications to synchronize metadata and files.

Metadata synchronization

File synchronization

  1. User PUTs a file to the site1.minoc service, either directly or via negotiation with a global raven service.
  2. global.fenwick.site1 discovers the new inventory metadata for the file by querying site1.luskan.
  3. site2.fenwick.global discovers the new inventory metadata for the file by querying global.luskan.
  4. site2.critwall finds the locations of the new file via global.raven -- this returns a list of URLs from which the file can be downloaded.
  5. site2.critwall downloads the file from site1.minoc.
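The user-facing parts of the flow above might look like the following illustrative curl session (hostnames and the artifact URI are placeholders for this sketch; steps 2-5 happen automatically between the sites):

```shell
# 1. upload to the storage site (x509 client cert authentication)
curl -E ~/.ssl/cadcproxy.pem -T ./example.fits \
    https://site1.example.org/minoc/files/cadc:TEST/example.fits

# 2-3. fenwick instances propagate the inventory metadata (no user action)

# 4-5. any client can then locate copies via the global raven service,
#      which redirects to a minoc instance holding the file
curl -I -L https://global.example.org/raven/files/cadc:TEST/example.fits
```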

Client Software

Generic HTTP client tools such as curl or wget can be used to interact with the SI; however, multi-step operations such as transfer negotiation or transfer of large files with SI transactions may require dedicated scripts.
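As a starting point, basic per-file operations against a minoc service can be done with curl alone (the host and artifact URI below are placeholders, and the certificate option assumes x509 client authentication; anonymous access may work for public files):

```shell
# upload (PUT) a file
curl -E ~/.ssl/cadcproxy.pem -T ./local.fits \
    https://www.example.org/minoc/files/cadc:TEST/local.fits

# download (GET) it again
curl -E ~/.ssl/cadcproxy.pem -O \
    https://www.example.org/minoc/files/cadc:TEST/local.fits

# fetch only the HTTP headers (size, checksum, etc.)
curl -E ~/.ssl/cadcproxy.pem -I \
    https://www.example.org/minoc/files/cadc:TEST/local.fits
```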

Alternatively, the CADC maintains Python client applications/libraries that can be used with the SI:

The CADC Direct Data Service presents a variety of scenarios for accessing the CADC SI using generic and specific client tools.

Deployment Prerequisites

Hardware Requirements

Database:

Storage platform:

Worker nodes:

Software Requirements

Deployment

Note on logging: Storage inventory services and application containers all log to stdout by default -- for a production deployment, these should be captured and preserved by whatever mechanism is available on your system.

  1. Storage Adapters

    • Required for: Storage site only
    • Required by: critwall, minoc, and tantar
    • How to configure your storage is dependent upon your local hardware and data centre details. However --
      • Currently, there are two types of storage supported, each of which requires a storage adapter and associated configuration files:
        • POSIX file-system
          • for POSIX storage, the storage file-system will need to be mounted directly into the containers (e.g. a 'volume' path in Docker or a PVC in kubernetes). Since the storage will be mounted by several containers, it will need to be a shared file-system which supports writes from multiple hosts.

          • NOTE: the services in the containers run as a user with a UID:GID of 8675309:8675309. This user must be allowed to read and write files on the configured file-system. This is usually done by ensuring a non-privileged (or even 'nologin') user is configured on your system with this UID:GID.

          • in the cadc-storage-adapter-fs.properties configuration file for POSIX storage:

            • the org.opencadc.inventory.storage.fs.baseDir parameter must point to the location that the storage is mounted inside the container. For example,
              docker run --user tomcat:tomcat -v /path/on/host:/mountpoint/in/container minoc:0.9.2
              
              or
              apiVersion: apps/v1
              kind: Deployment
              <...snip....>
              volumeMounts:
                - mountPath: "/mountpoint/in/container"
                  name: lustre-volume
              securityContext:
                runAsUser: 8675309
                runAsGroup: 8675309
              volumes:
                - name: lustre-volume
                  hostPath:
                    path: /path/on/host
                    type: Directory
              
            • the org.opencadc.inventory.storage.fs.OpaqueFileSystemStorageAdapter.bucketLength parameter sets the depth of the directory tree created to store files. At each node in the tree, 16 hex (0-f) directories are created -- a bucketLength of 2 will create 16 directories (0-f), each with 16 subdirectories (0-f) -- and only the 256 (16x16) subdirectories at the bottom of the tree are used to store files. For efficient validation, choose a bucketLength that results in only a few thousand files in each directory. E.g., for bucketLength=3 and baseDir = /mount/in/container:
              [container]$ ls -F /mount/in/container/
              0/ 1/ 2/ 3/ 4/ 5/ 6/ 7/ 8/ 9/ a/ b/ c/ d/ e/ f/         # Depth=1/3
              [container]$ ls -F /mount/in/container/a/
              0/ 1/ 2/ 3/ 4/ 5/ 6/ 7/ 8/ 9/ a/ b/ c/ d/ e/ f/         # Depth=2/3
              [container]$ ls -F /mount/in/container/a/7/
              0/ 1/ 2/ 3/ 4/ 5/ 6/ 7/ 8/ 9/ a/ b/ c/ d/ e/ f/         # Depth=3/3
              [container]$ ls -F /mount/in/container/a/7/4/
              test0001.fits test0002.fits test0003.fits test0004.fits test0005.fits
              
            • in the above example the 'bucket' is the directory path: a74, and there will be a total of 4096 (16x16x16) buckets.
            • Note: as you can see, the currently implemented version of this storage adapter is the 'OpaqueFileSystem' adapter -- the structure of subdirectories is not something which can easily be mounted and used elsewhere. It would be possible to develop a filesystem adapter which provides a human readable directory structure.
        • Swift Object Store API (e.g. CEPH Object Store)
          • for Swift storage, the Ceph gateway URL must be reachable from inside the containers.
          • in the cadc-storage-adapter-swift.properties configuration file for Swift storage:
            • org.opencadc.inventory.storage.swift.SwiftStorageAdapter.bucketLength sets the number of hex characters in the configured buckets (e.g. a74), and the total number of buckets (i.e. a bucketLength of 3 will create 16^3 (4096) buckets). Configure the bucketLength so the expected number of files per bucket is no more than a few thousand.
    • Whichever storage you use, it must be directly available to certain Storage Inventory services and applications. These are:
      • minoc -- this service will write files to and retrieve files from storage.
      • critwall -- only required in a Storage Inventory deployment with more than one Storage site and a Global site. This application will scan the site inventory database for artifacts with metadata but no associated local file copy in storage, then negotiate the transfer of those files with the Global site raven service.
      • tantar -- this application verifies the content of the storage system against the inventory database, so will need to read files from the storage.
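    • A rough way to choose a bucketLength is to divide the expected file count by the number of buckets, 16^bucketLength (the 50 million figure below is an assumption; substitute your own estimate):

```shell
#!/bin/bash
# Files-per-bucket estimate for each candidate bucketLength.
total_files=50000000   # assumed number of files at this site
for n in 1 2 3 4 5; do
  buckets=$((16 ** n))
  printf 'bucketLength=%d -> %d buckets, ~%d files/bucket\n' \
      "$n" "$buckets" "$((total_files / buckets))"
done
# e.g. bucketLength=3 -> 4096 buckets, ~12207 files/bucket
```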
  2. Database

    • Required for: Storage site and Global site
    • Required by: critwall, fenwick, luskan, minoc, ratik, raven, ringhold, tantar
    • The database will be accessed by all local SI site services and applications, so must be reachable from wherever you deploy your containers for that site.
    • Storage Inventory services have been tested with postgres 12.3. Newer versions will likely work as well.
    • As the content in the database grows, you'll need to think about its storage requirements. For the postgres data and indices, this is roughly 1KB/artifact (storage site) or 1.5KB/artifact (global site).
    1. In the following, the database being created is called si_db, but you can change that name as you see fit. Whatever you choose, it will need to be referenced in the service and application configuration.

      • Initialize the database: initdb -D /var/lib/postgresql/data --encoding=UTF8 --lc-collate=C --lc-ctype=C
      • You might need to change the data location (-D), depending on your postgres installation and hardware layout.
    2. As the postgres user, create a file named si.ddl with the linked content, edit as appropriate, and run psql -f si.ddl -a

      • This will create three users:
        • A TAP admin user (e.g. tapadm) - privileged user. Manages the tap schema with permissions to create, alter, and drop tables. Used by:
          • luskan
        • A TAP query user (e.g. tapuser) - unprivileged user, used to query the inventory database. Used by:
          • luskan
          • raven
        • An Inventory admin user (e.g. invadm) - privileged user. Manages the inventory schema with privileges to create, alter, and drop tables, and is also used to insert, update, and delete rows in the inventory tables. Used by:
          • critwall
          • fenwick
          • minoc
          • ratik
          • ringhold
          • tantar
    • NOTE: The first service or application configured with the Inventory admin user to connect to the database will create and initialize the tables and indices using the above privileged user roles.
    • a basic example of a developer deployment of a compatible database can be found here. (Note: pgsphere is not required.)
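    • As a loose sketch of what such a DDL file sets up (role and database names are examples only; the linked file is authoritative):

```sql
-- Example role names; use whatever matches your service configuration.
CREATE ROLE tapadm  LOGIN PASSWORD 'change-me';  -- TAP admin (luskan)
CREATE ROLE tapuser LOGIN PASSWORD 'change-me';  -- TAP query (luskan, raven)
CREATE ROLE invadm  LOGIN PASSWORD 'change-me';  -- Inventory admin (minoc, tantar, ...)

CREATE DATABASE si_db OWNER invadm;
-- then, connected to si_db, create the schemas the services expect,
-- e.g. a TAP schema owned by tapadm and an inventory schema owned by invadm
```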
  3. Proxy

    • Required for: Storage site and Global site
    • ❗x509 proxy certificates -- the longer certificate chain for these might not be supported by all balancers/proxies. These proxy certificates are required for some A&A mechanisms.
      • haproxy
        • haproxy will need to be compiled against openssl 1.0.2k or a compatible version. Newer versions of openssl do not support proxy certificates.
        • the environment variable OPENSSL_ALLOW_PROXY_CERTS=1 needs to be set in the proxy environment.
        • a basic example of a developer deployment of a compatible instance of haproxy can be found here and here.
      • nginx
        • untested, but likely has the same requirements and restrictions as haproxy for proxy certificates.
    • SSL termination -- although you will need to support https connections to your proxy, the SI containers do not accept https connections. Because of this, your proxy must terminate the SSL connection and pass only non-SSL http connections to the containers.
    • To make a service available under a different name or path than the default, complex proxy rules are not required: see war-rename.conf in the FAQ.
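    • A minimal, untested haproxy sketch of the above (certificate paths and the backend address are placeholders):

```
global
    # allow RFC 3820 proxy certificates (requires an openssl 1.0.2x build)
    setenv OPENSSL_ALLOW_PROXY_CERTS 1

frontend https_in
    # SSL terminates here; client certificates are accepted if presented
    bind *:443 ssl crt /etc/haproxy/server.pem ca-file /etc/haproxy/ca-bundle.crt verify optional
    default_backend si_services

backend si_services
    # plain http to the service containers
    server tomcat1 10.0.0.10:8080 check
```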
  4. Registry service

    • Required for: Storage site and Global site (Note: could be the same registry instance for both)
    • Container image: Use the latest core/reg image from images.opencadc.org or a different IVOA-compatible registry service.
    • See the opencadc registry server documentation for configuration details.
    • resourceIDs
      • you will need to choose resourceIDs for services and resources that you deploy, and which need to be referenced by other services and applications. For example, if your minoc service is available at the URL https://www.example.org/minoc and you choose a resourceID of ivo://example.org/minoc, the registry config for that resource (in the reg-resource-caps.properties file for the registry service) would look like:
              ivo://example.org/minoc = https://www.example.org/minoc
      
      This resourceID will appear in, for example, the minoc.properties file in the minoc service config:
              org.opencadc.minoc.resourceID = ivo://example.org/minoc
      
    • test with, e.g., curl https://www.example.org/reg/resource-caps
  5. baldur - Permission service

    • Required for: Storage site and Global site (Note: would likely be the same baldur instance for both)
    • Container image: Use the latest storage-inventory/baldur image from images.opencadc.org
    • See the opencadc storage inventory baldur documentation for more configuration details.
    • Uses an IVOA compatible GMS service and configured namespaces to determine file access permissions.
    • Configuration notes:
      • baldur.properties:
        • The org.opencadc.baldur.allowedUser x509 DN specified here is generally a 'service' user -- the services that call baldur need to be configured with this user's certificate.
          • in minoc and raven configuration, this is the cadcproxy.pem file for these services.
        • The org.opencadc.baldur.allowedGroup is an IVOA GMS group resourceID.
          • The GMS service must be registered in the available Registry service.
          • the configured readOnlyGroup and readWriteGroup entries are also IVOA GMS group resourceIDs.
    • test with, e.g., curl https://www.example.org/baldur/availability
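    • An illustrative baldur.properties fragment using the keys mentioned above (the DN and group ID are placeholders; see the baldur documentation for the full grammar, including the readOnlyGroup/readWriteGroup namespace entries):

```
org.opencadc.baldur.allowedUser = CN=si-service,OU=example.org,O=Example,C=XX
org.opencadc.baldur.allowedGroup = ivo://example.org/gms?SI-operators
```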
  6. GMS - Group Membership service

  7. minoc - File service

    • Required for: Storage site only
    • Container image: Use the latest storage-inventory/minoc image from images.opencadc.org
    • See the opencadc storage inventory minoc documentation for details.
    • Configuration notes:
      • minoc.properties:
        • org.opencadc.minoc.resourceID:
          • this is the resourceID of this instance of minoc, and will need to be configured in your registry. It is used by the Global inventory as the location for artifacts at a site.
        • if you are using a baldur service to manage file access permissions, you would put its resourceID in org.opencadc.minoc.readGrantProvider and org.opencadc.minoc.writeGrantProvider. It is possible to have multiple instances of these providers by specifying the GrantProvider option once for each provider (each use of the option is additive to the previous ones).
        • org.opencadc.minoc.publicKeyFile:
          • (optional) this is the public key specified in the raven configuration key org.opencadc.raven.publicKeyFile.
      • catalina.properties (from cadc-tomcat config):
        • the org.opencadc.minoc.inventory.username database account is the 'Inventory admin user' configured when creating the database
    • when configuring the storage adapter for minoc to use (see Storage above) be sure to test that the containers deployed on your system can access the provided storage.
    • test with, e.g., curl https://www.example.org/minoc/availability
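    • Putting the notes above together, a minoc.properties fragment might look like this (the resourceIDs and file name are placeholders):

```
org.opencadc.minoc.resourceID = ivo://example.org/minoc
# grant providers: repeat a key to register more than one provider
org.opencadc.minoc.readGrantProvider = ivo://example.org/baldur
org.opencadc.minoc.writeGrantProvider = ivo://example.org/baldur
# optional: public key matching the raven privateKeyFile, for pre-authorized URLs
org.opencadc.minoc.publicKeyFile = raven-public.key
```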
  8. luskan - Query service

    • Required for: Storage site and Global site
    • Container image: Use the latest storage-inventory/luskan image from images.opencadc.org
    • See the opencadc storage inventory luskan documentation for details.
    • Configuration notes:
      • luskan.properties:
        • org.opencadc.luskan.isStorageSite - for a storage site, this should be set to true. The content of the inventory database is different between a storage site and a global site.
        • org.opencadc.luskan.allowedGroup is an IVOA GMS group resourceID.
          • the GMS service must be configured in the available Registry service.
      • catalina.properties:
        • the org.opencadc.luskan.uws.username database account is generally the same as the 'TAP admin user' configured when creating the database.
        • the org.opencadc.luskan.tapadm.username database account is the same 'TAP admin user'.
        • the org.opencadc.luskan.query.username database account is the 'TAP query user' account.
      • cadc-tap-tmp.properties:
        • see the cadc-tap-tmp library documentation for more information.
        • org.opencadc.tap.tmp.TempStorageManager.baseURL is the URL for this luskan service, plus a path where query results can be retrieved from.
          • e.g. if your luskan service is at https://www.example.org/luskan, then this baseURL could be https://www.example.org/luskan/results
        • the above /results path will be mapped to the path in the container specified by org.opencadc.tap.tmp.TempStorageManager.baseStorageDir. Ideally, this path will be a file-system that is shared among all luskan instances for your site.
          • e.g. if baseStorageDir = /tmpdata in your configuration, luskan will store query results there (e.g. /tmpdata/xyz.xml) and that result will be retrievable as https://www.example.org/luskan/results/xyz.xml.
    • test with, e.g., curl https://www.example.org/luskan/availability
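    • For illustration, the result-storage settings above combine like this (host and path are placeholders):

```
# luskan.properties
org.opencadc.luskan.isStorageSite = true

# cadc-tap-tmp.properties
org.opencadc.tap.tmp.TempStorageManager.baseURL = https://www.example.org/luskan/results
org.opencadc.tap.tmp.TempStorageManager.baseStorageDir = /tmpdata
```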
  9. raven - File location service

    • Required for: Global site only
    • Container image: Use the latest storage-inventory/raven image from images.opencadc.org
    • See the opencadc storage inventory raven documentation for details.
    • Configuration notes:
      • raven.properties
        • (optional) org.opencadc.raven.publicKeyFile and org.opencadc.raven.privateKeyFile:
          • These are an optional optimization that allows raven to generate 'pre-authorized' URLs for files, letting the minoc that serves the file skip the permission check before delivering it. The authentication information is embedded in a specially encoded URL.

          • these are RSA public and private key files which can be generated using cadc-keygen or the commands below:

              ssh-keygen -b 2048 -t rsa -m pkcs8 -f temp_rsa
              ssh-keygen -e -m pkcs8 -f temp_rsa.pub > raven-public.key
              mv temp_rsa raven-private.key
              rm temp_rsa.pub
            
          • the publicKeyFile will be required by services which need to verify the pre-authorized URLs (minoc).

    • See the opencadc storage inventory raven documentation for more configuration details.
    • test with, e.g., curl https://www.example.org/raven/availability
  10. fenwick - Metadata sync application

    • Required for: Storage site and Global site
    • Container image: Use the latest storage-inventory/fenwick image from images.opencadc.org
    • See the opencadc storage inventory fenwick documentation for details.
    • Configuration notes:
      • fenwick.properties:
        • org.opencadc.fenwick.queryService:
          • fenwick is used to synchronize artifact metadata between a Storage site and a Global site. The queryService is the resourceID of the remote luskan service -- i.e., if fenwick is running at a Storage site, queryService should refer to the remote Global site luskan; if fenwick is running at the Global site, queryService should refer to the remote Storage site luskan. A Global site will need to run a fenwick instance for each Storage site.
  11. tantar - File validation application

    • Required for: Storage site only
    • Container image: Use the latest storage-inventory/tantar image from images.opencadc.org
    • See the opencadc Storage Inventory tantar documentation for details.
    • Configuration notes:
      • tantar.properties
        • org.opencadc.tantar.buckets:
          • See the description of buckets. Tantar operates on storage buckets.
          • If you're only running one instance of tantar it should be configured to operate on all buckets (0-f); for multiple instances of tantar, you would want to configure these to operate on non-overlapping subsets of buckets (e.g. 0-7, 8-f).
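      • For example, two tantar instances splitting the bucket space (each instance gets its own tantar.properties):

```
# instance A
org.opencadc.tantar.buckets = 0-7

# instance B
org.opencadc.tantar.buckets = 8-f
```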
  12. critwall - File sync application

    • Required for: Storage site only
    • Container image: Use the latest storage-inventory/critwall image from images.opencadc.org
    • See the opencadc Storage Inventory critwall documentation for details.
    • Configuration notes:
      • critwall.properties
        • org.opencadc.critwall.locatorService:
          • This should be configured to point to the resourceID of your Global site instance of raven.
        • org.opencadc.critwall.buckets:
          • See the description of buckets. Critwall operates on URI buckets.
          • As with tantar, you can run one or more instances of critwall, specifying a single bucket or a range of buckets for each instance.
  13. ratik - Metadata validation

    • Required for: Storage site and Global site
    • Container image: Use the latest storage-inventory/ratik image from images.opencadc.org
    • See the opencadc Storage Inventory ratik documentation for details.
    • Configuration notes:
      • ratik.properties
        • org.opencadc.ratik.queryService:
          • ratik is used to validate the artifact metadata at one site against another site, usually a Storage site vs a Global site or vice versa. The queryService is the resourceID of the remote luskan service -- i.e., if ratik is running at a Storage site, queryService should refer to the remote Global site luskan; if ratik is running at the Global site, queryService should refer to the remote Storage site luskan. A Global site will need to run a ratik instance for each Storage site.
        • org.opencadc.ratik.buckets:
          • See the description of buckets. Ratik operates on URI buckets.
          • As with tantar, you can run one or more instances of ratik, specifying a single bucket or a range of buckets for each instance.

Healthchecks and Monitoring

FAQ

Additional FAQ can be found here