Registry Metadata Stores

Overview

By default, an OpenComponents registry keeps its metadata index — the list of which components and versions exist — in two storage files: components.json and components-details.json. This index is derived: the registry rebuilds it by scanning the whole storage directory tree (componentsDir/<component>/<version>/package.json) on startup and again after every publish.

That scan is O(registry size). It also isn't transactional: concurrent publishes on different nodes each scan and then overwrite components.json (last-writer-wins), which only "works" because the next scan re-derives the truth from the immutable directory tree. Under heavy, multi-node publishing this drives CPU and GC pressure and makes publish cost grow with the size of the registry.

A metadata store is an opt-in feature that moves this index into a pluggable database that becomes the source of truth for what exists. With it:

  • Publish becomes an O(1) atomic row append instead of a full scan + blob rewrite.
  • Startup becomes a single O(registry size) query instead of a directory walk (and steady-state hydration is just one query).
  • Cross-node correctness is enforced by a PRIMARY KEY (component_name, version) constraint instead of relying on self-healing rescans. Concurrent publishes of the same version → one wins, the other gets a clear "already exists" error; different components never contend.

What it changes — and what it doesn't

The metadata store only owns the index of which name@version pairs exist. It does not change:

  • Static files — component bundles still live in the configured storage adapter (S3, GS, Azure Blob, …). Storage is still required.
  • package.json files — still fetched from storage by getComponentInfo.
  • The hot read path — component reads are still served from OC's in-memory cache and never hit the database.

The metadata store contains only four fields per version: component name, version, publish date, and template size — exactly what the in-memory caches need.
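As a rough illustration of why those four fields are sufficient, the sketch below folds a flat list of rows into a list cache and a details cache. The cache shapes (`ListCache`, `DetailsCache`) and the `hydrate` function are hypothetical names chosen for this example; the registry's real in-memory structures may differ.

```typescript
// The four fields stored per version, as listed above.
type ComponentRow = {
  name: string;
  version: string;
  publishDate: number;
  templateSize?: number;
};

// Hypothetical cache shapes, assumed for illustration only.
type ListCache = Record<string, string[]>;
type DetailsCache = Record<
  string,
  Record<string, { publishDate: number; templateSize?: number }>
>;

// Fold the flat row set into the two lookup structures.
// Versions are kept in the order the rows arrive; no sorting is applied.
function hydrate(rows: ComponentRow[]): { list: ListCache; details: DetailsCache } {
  const list: ListCache = {};
  const details: DetailsCache = {};
  for (const row of rows) {
    if (!list[row.name]) list[row.name] = [];
    if (!details[row.name]) details[row.name] = {};
    list[row.name].push(row.version);
    details[row.name][row.version] = {
      publishDate: row.publishDate,
      templateSize: row.templateSize,
    };
  }
  return { list, details };
}
```

Both structures come out of a single pass over the rows, so nothing beyond the row set is needed to build them.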

Metadata mode is opt-in and non-breaking. With no metadata block in the configuration, the registry behaves byte-for-byte as before (storage-only mode).

How it works

The store is a pluggable adapter, injected the same way as the storage.adapter. The registry core takes zero database dependencies. The contract each adapter implements is exported by the oc-metadata-adapters-utils package:

type ComponentRow = {
  name: string;
  version: string;
  publishDate: number; // unix seconds
  templateSize?: number;
};

interface MetadataStore {
  adapterType: string;
  isValid(): boolean; // synchronous config sanity check
  initialise(): Promise<void>; // open pool; ensure/verify schema
  getAllComponents(): Promise<ComponentRow[]>; // hydration: feeds the caches
  addVersion(row: ComponentRow): Promise<void>; // commit point
  close?(): Promise<void>; // optional: release pool on shutdown
}

Runtime behavior in metadata mode:

  • Startup — the registry initialises the store, then hydrates its in-memory list and details caches from a single getAllComponents() call. If the store cannot be initialised or queried, startup fails (fail-closed).
  • Reads — served entirely from the in-memory cache; hot component reads never touch the database.
  • Polling — the cache is re-hydrated from getAllComponents() on the polling interval. If a poll fails, the registry keeps serving the last good in-memory cache and retries on the next interval.
  • Publish uses a reservation state machine so a failed/concurrent publish can never clobber another:
    1. validate the publish
    2. write the package.json
    3. reserve the metadata row
    4. upload the statics to storage
    5. commit the metadata row

    A duplicate or in-progress reservation surfaces as the usual "component version already exists" publish error. If the upload or commit fails, the reservation is best-effort aborted; any orphaned statics are harmless unreferenced bytes and a re-publish is idempotent.
  • Shutdown — registry.close(callback) closes the HTTP server and then calls the adapter's optional close() hook so it can release its connection pool.
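The five publish steps listed above can be sketched as a small orchestration function. The contract shown earlier only exposes `addVersion()`, so the `reserve` / `commit` / `abort` methods and the `PublishSteps` interface here are hypothetical names invented for this example, not the registry's actual internals.

```typescript
type Row = { name: string; version: string; publishDate: number; templateSize?: number };

// Hypothetical reservation API (assumption: not part of the documented contract).
interface ReservingStore {
  reserve(row: Row): Promise<string>; // resolves to a publish token; rejects on duplicate / in-progress
  commit(row: Row, token: string): Promise<void>;
  abort(row: Row, token: string): Promise<void>;
}

// Hypothetical hooks standing in for the registry's validation and storage calls.
interface PublishSteps {
  validate(row: Row): void;
  writePackageJson(row: Row): Promise<void>;
  uploadStatics(row: Row): Promise<void>;
}

async function publish(store: ReservingStore, steps: PublishSteps, row: Row): Promise<void> {
  steps.validate(row); // 1. validate the publish
  await steps.writePackageJson(row); // 2. write the package.json
  const token = await store.reserve(row); // 3. reserve the metadata row
  try {
    await steps.uploadStatics(row); // 4. upload the statics to storage
    await store.commit(row, token); // 5. commit the metadata row
  } catch (err) {
    // Best-effort abort: a failure to abort must not mask the original error.
    await store.abort(row, token).catch(() => undefined);
    throw err;
  }
}
```

Because the reservation happens before the upload, a duplicate version is rejected before any statics are written; a failure after the reservation leaves only unreferenced bytes behind.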

Configuration

Add a metadata block as a sibling to storage. Storage remains required.

const azureSqlMetadataAdapter = require("oc-azure-sql-metadata-adapter").default;
const s3StorageAdapter = require("oc-s3-storage-adapter");

registry.configure({
  storage: {
    adapter: s3StorageAdapter,
    options: {
      bucket: process.env.OC_STORAGE_BUCKET,
      region: process.env.OC_STORAGE_REGION,
      componentsDir: "components",
      path: process.env.OC_STORAGE_BASE_URL,
    },
  },
  metadata: {
    adapter: azureSqlMetadataAdapter,
    options: {
      connectionString: process.env.OC_METADATA_SQL_CONNECTION_STRING,
    },
    manageSchema: true,
    reconcileFromStorage: false,
    exportLegacyFiles: false,
  },
});

metadata options

| Parameter | Type | Mandatory | Default | Description |
|---|---|---|---|---|
| metadata | object | no | - | Presence enables metadata mode. Absent → storage-only mode (default). |
| metadata.adapter | function | yes* | - | Metadata adapter factory returning a MetadataStore (*required if using metadata). |
| metadata.options | object | yes* | - | Connection / pool options passed to the adapter. |
| metadata.manageSchema | boolean | no | true | When true, the adapter auto-creates its table/index if missing. Set false for locked-down databases where operators manage DDL; the adapter then only verifies the schema. |
| metadata.reconcileFromStorage | boolean | no | false | Bake-in flag. On startup, scan storage and idempotently insert any name@version present in the directory tree but missing from the database (existing rows skipped). |
| metadata.exportLegacyFiles | boolean | no | false | Bake-in / DR flag. On startup, write database-derived components.json and components-details.json projections back to storage. |
| metadata.exportLegacyFilesInterval | number (seconds) | no | - | When set (and exportLegacyFiles is true), also refresh those projections on a non-overlapping background timer. Omit to export on startup only. The timer is cleared on shutdown. |

The bake-in flags (reconcileFromStorage, exportLegacyFiles, exportLegacyFilesInterval) are only needed while migrating; see Migrating an existing registry.

The legacy file export is decoupled from the publish path — a publish never triggers a full-registry export, so publishing stays an O(1) append. The exported files are a one-directional projection of the database (DB → files, never read back to mutate the DB). They exist for rollback and as a cold-start / DR snapshot; they do not replace the storage adapter, which still holds all component statics.
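One way to get the "non-overlapping background timer" behavior described for exportLegacyFilesInterval is to schedule the next run only after the current one has finished, rather than using a fixed-rate interval. The following is a minimal sketch under that assumption; `startNonOverlappingTimer` is a hypothetical helper, not a function exported by the registry.

```typescript
// Runs `task` repeatedly, waiting `intervalSeconds` between the END of one run
// and the START of the next, so two runs can never overlap.
// Returns a stop function, analogous to clearing the timer on shutdown.
function startNonOverlappingTimer(
  task: () => Promise<void>,
  intervalSeconds: number
): () => void {
  let stopped = false;
  let handle: ReturnType<typeof setTimeout> | undefined;

  const schedule = (): void => {
    if (stopped) return;
    handle = setTimeout(async () => {
      try {
        await task();
      } catch (err) {
        // An export failure is logged and retried on the next tick.
        console.error("legacy export failed", err);
      }
      schedule(); // re-arm only after the current run has completed
    }, intervalSeconds * 1000);
  };

  schedule();
  return () => {
    stopped = true;
    if (handle !== undefined) clearTimeout(handle);
  };
}
```

With this shape, a slow export simply delays the next one instead of stacking up concurrent full-registry exports.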

Available adapters

Azure SQL / SQL Server — oc-azure-sql-metadata-adapter

A connection-pool-based (mssql) adapter.

npm install oc-azure-sql-metadata-adapter

metadata: {
  adapter: require("oc-azure-sql-metadata-adapter").default,
  options: {
    connectionString: process.env.OC_METADATA_SQL_CONNECTION_STRING,
  },
}

You can also pass object connection settings instead of a connection string (server, database, user, password, nested options, etc. — passed through to mssql). When no connectionString, password, or explicit authentication is provided, the adapter defaults to Microsoft Entra ID (azure-active-directory-default), so it can connect using an ambient managed identity with no secret in config (pass clientId for a user-assigned identity).

Adapter-specific options: manageSchema (default true), schemaName (default dbo), and tableName (default oc_components). Identifiers must match /^[A-Za-z_][A-Za-z0-9_]*$/.
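The identifier rule matters because schema and table names are interpolated into DDL and cannot be passed as query parameters. A minimal sketch of such a check, using the pattern and defaults quoted above, might look like this; `qualifiedTableName` is a hypothetical helper name, and the adapter's real validation code may be structured differently.

```typescript
// Pattern quoted in the adapter options above.
const IDENTIFIER = /^[A-Za-z_][A-Za-z0-9_]*$/;

// Validates both identifiers and returns the bracket-quoted, schema-qualified name.
function qualifiedTableName(schemaName = "dbo", tableName = "oc_components"): string {
  for (const id of [schemaName, tableName]) {
    if (!IDENTIFIER.test(id)) {
      throw new Error(`Invalid SQL identifier: ${JSON.stringify(id)}`);
    }
  }
  return `[${schemaName}].[${tableName}]`;
}
```

Rejecting anything outside the pattern up front means a name such as `oc-components` fails fast at configuration time instead of producing broken or unsafe SQL.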

With manageSchema: true the adapter creates (roughly):

CREATE TABLE [dbo].[oc_components] (
  component_name NVARCHAR(255) NOT NULL,
  version        NVARCHAR(64)  NOT NULL,
  publish_date   BIGINT        NOT NULL,
  template_size  BIGINT        NULL,
  status         NVARCHAR(16)  NOT NULL DEFAULT N'committed',
  publish_token  NVARCHAR(64)  NULL,
  created_at     DATETIME2     NOT NULL DEFAULT SYSUTCDATETIME(),
  updated_at     DATETIME2     NOT NULL DEFAULT SYSUTCDATETIME(),
  PRIMARY KEY (component_name, version)
);
CREATE INDEX ix_oc_components_name ON [dbo].[oc_components] (component_name);

The primary key is the concurrency guard: same-version unique violations (2627 / 2601) are mapped to the shared duplicate / in-progress error codes before any storage upload happens.
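The mapping from driver error to shared code could be sketched roughly as below. This assumes the driver error carries the SQL Server error number in a `number` property, and that the shared code is surfaced as a `code` property on the thrown error; both the `CodedError` shape and the local `VERSION_ALREADY_EXISTS` constant are stand-ins for whatever oc-metadata-adapters-utils actually exports. Distinguishing a committed duplicate from an in-progress reservation would additionally require reading the existing row's status, which is omitted here.

```typescript
// Local stand-in for the shared constant (assumption: the real value may differ).
const VERSION_ALREADY_EXISTS = "VERSION_ALREADY_EXISTS";

// Assumed error shape: a plain Error with an optional machine-readable code.
interface CodedError extends Error {
  code?: string;
}

function codedError(code: string, message: string): CodedError {
  const err: CodedError = new Error(message);
  err.code = code;
  return err;
}

// 2627 = primary key / unique constraint violation, 2601 = duplicate key in a unique index.
function toPublishError(err: unknown): Error {
  const n =
    typeof err === "object" && err !== null
      ? (err as { number?: unknown }).number
      : undefined;
  if (n === 2627 || n === 2601) {
    return codedError(VERSION_ALREADY_EXISTS, "component version already exists");
  }
  // Anything else is passed through unchanged.
  return err instanceof Error ? err : new Error(String(err));
}
```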

Azure Table Storage — oc-azure-table-metadata-adapter

A schemaless, HTTP-based adapter (@azure/data-tables). If you already use Azure Blob Storage for statics, you can reuse the same storage account for the metadata table — no second database to provision. Its PartitionKey + RowKey uniqueness is exactly the concurrency model the metadata store needs.

npm install oc-azure-table-metadata-adapter

metadata: {
  adapter: require("oc-azure-table-metadata-adapter").default,
  options: {
    connectionString: process.env.OC_METADATA_TABLE_CONNECTION_STRING,
  },
}

Authentication precedence when connectionString is absent: accountName + accountKey → sasToken → explicit credential → DefaultAzureCredential (managed identity / workload identity / az login), so the registry can run with no secret. Other options include endpoint, tableName (default occomponents), manageSchema (default true), and allowInsecureConnection (for Azurite / local development). The adapter maps a 409 Conflict to the shared VERSION_ALREADY_EXISTS code.
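The precedence chain can be read as a simple first-match decision. The sketch below only returns a label for the chosen mechanism and does not construct any Azure client; the option names follow the ones mentioned above, while `credential` as the option key for an explicit credential, the `AuthKind` labels, and `resolveAuthKind` are assumptions made for illustration.

```typescript
type TableAuthOptions = {
  connectionString?: string;
  accountName?: string;
  accountKey?: string;
  sasToken?: string;
  credential?: unknown; // assumed option name for an explicit credential object
};

type AuthKind =
  | "connectionString"
  | "sharedKey"
  | "sasToken"
  | "explicitCredential"
  | "defaultAzureCredential";

// First match wins, in the documented order.
function resolveAuthKind(o: TableAuthOptions): AuthKind {
  if (o.connectionString) return "connectionString";
  if (o.accountName && o.accountKey) return "sharedKey";
  if (o.sasToken) return "sasToken";
  if (o.credential !== undefined) return "explicitCredential";
  return "defaultAzureCredential"; // ambient identity, no secret in config
}
```

An empty options object therefore falls through to the ambient-identity path, which is what allows a secret-free configuration.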

Writing a custom adapter

Implement the contract from oc-metadata-adapters-utils. Each adapter maps its driver's unique-violation to the shared error codes (e.g. SQL Server 2627/2601, Postgres 23505, MySQL 1062):

import type { ComponentRow, MetadataStore } from "oc-metadata-adapters-utils";
import {
  VERSION_ALREADY_EXISTS,
  VERSION_PUBLISH_IN_PROGRESS,
} from "oc-metadata-adapters-utils";
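To make the contract concrete, here is a minimal in-memory adapter sketch, suitable at most for tests or local experiments. To keep the example self-contained, the types and the error constant are redeclared locally instead of imported; the `CodedError` shape (an Error with a `code` property) and the factory name `createInMemoryMetadataStore` are assumptions, and a real adapter should use the package's own exports and error helpers.

```typescript
// Local copies of the contract, so the sketch runs without the package installed.
type ComponentRow = {
  name: string;
  version: string;
  publishDate: number;
  templateSize?: number;
};

interface MetadataStore {
  adapterType: string;
  isValid(): boolean;
  initialise(): Promise<void>;
  getAllComponents(): Promise<ComponentRow[]>;
  addVersion(row: ComponentRow): Promise<void>;
  close?(): Promise<void>;
}

// Stand-in for the shared constant (assumption: the real value may differ).
const VERSION_ALREADY_EXISTS = "VERSION_ALREADY_EXISTS";

interface CodedError extends Error {
  code?: string;
}

// Factory in the same spirit as metadata.adapter: takes options, returns a store.
function createInMemoryMetadataStore(_options: Record<string, unknown> = {}): MetadataStore {
  // Keyed by name@version, playing the role of the composite primary key.
  const rows = new Map<string, ComponentRow>();
  return {
    adapterType: "in-memory",
    isValid: () => true,
    initialise: async () => undefined,
    getAllComponents: async () => Array.from(rows.values()),
    addVersion: async (row) => {
      const key = `${row.name}@${row.version}`;
      if (rows.has(key)) {
        const err: CodedError = new Error(`${key} already exists`);
        err.code = VERSION_ALREADY_EXISTS;
        throw err;
      }
      rows.set(key, { ...row });
    },
    close: async () => {
      rows.clear();
    },
  };
}
```

The uniqueness check inside `addVersion` is the piece a database-backed adapter would delegate to its unique constraint and then translate from the driver's error.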

Migrating an existing registry

Migration is gradual and lossless — storage stays authoritative-enough throughout the window so you can roll back at any point.

1. Backfill the database

Before serving traffic in metadata mode, populate the store from your existing index using the CLI command:

oc registry migrate-metadata ./registry.config.js

The argument is a path to a module exporting the same config object you pass to registry.configure(). It must include both storage and metadata and pass registry config validation. The module may be CommonJS, an ES module default export, or an async function returning the config (useful for resolving secrets first):

// registry.config.js (CommonJS)
const azureSqlMetadataAdapter = require("oc-azure-sql-metadata-adapter").default;
const s3StorageAdapter = require("oc-s3-storage-adapter");

module.exports = {
  baseUrl: "http://my-registry.example.com/",
  storage: {
    adapter: s3StorageAdapter,
    options: {
      bucket: "my-bucket",
      region: "us-east-1",
      componentsDir: "components",
      path: "https://cdn.example.com/",
    },
  },
  metadata: {
    adapter: azureSqlMetadataAdapter,
    options: {
      connectionString: process.env.OC_METADATA_SQL_CONNECTION_STRING,
    },
  },
};

The command initialises the adapter and backfills rows from ${componentsDir}/components-details.json (which already is the ComponentRow set). If that file is missing, it falls back to scanning ${componentsDir}/<component>/<version>/package.json. Existing rows are skipped, so the command is idempotent and safe to re-run across nodes. It logs { scanned, inserted, skipped } and closes the adapter pool on exit (even on failure).
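The idempotent backfill can be pictured as "flatten the details file into rows, then insert only what is missing". The sketch below illustrates that logic only and is not the CLI's actual implementation; the `DetailsFile` shape (components keyed by name, then by version) and the `backfill` function name are assumptions.

```typescript
type ComponentRow = {
  name: string;
  version: string;
  publishDate: number;
  templateSize?: number;
};

// Assumed shape of components-details.json for this example.
type DetailsFile = {
  components: Record<string, Record<string, { publishDate: number; templateSize?: number }>>;
};

// The two contract methods the backfill needs.
type BackfillTarget = {
  getAllComponents(): Promise<ComponentRow[]>;
  addVersion(row: ComponentRow): Promise<void>;
};

async function backfill(
  details: DetailsFile,
  store: BackfillTarget
): Promise<{ scanned: number; inserted: number; skipped: number }> {
  const keyOf = (r: { name: string; version: string }) => `${r.name}@${r.version}`;
  const existing = new Set((await store.getAllComponents()).map(keyOf));
  const stats = { scanned: 0, inserted: 0, skipped: 0 };

  for (const name of Object.keys(details.components)) {
    for (const version of Object.keys(details.components[name])) {
      stats.scanned++;
      if (existing.has(keyOf({ name, version }))) {
        stats.skipped++; // already present: leave the row untouched
        continue;
      }
      const { publishDate, templateSize } = details.components[name][version];
      await store.addVersion({ name, version, publishDate, templateSize });
      stats.inserted++;
    }
  }
  return stats;
}
```

Running it a second time against the same store reports every row as skipped, which is the property that makes re-runs safe. A concurrent run on another node could still race between the read and the insert, so a production version would also treat an "already exists" error from `addVersion` as a skip.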

2. Cut over

Deploy with the metadata block configured. During the migration window enable the bake-in flags:

metadata: {
  adapter: azureSqlMetadataAdapter,
  options: { /* ... */ },
  reconcileFromStorage: true, // heal anything published by still-storage-mode nodes
  exportLegacyFiles: true, // keep components.json fresh for rollback / external consumers
}

Nodes now hydrate from the database. A safe rollout order:

  1. Deploy the metadata config to a non-serving environment.
  2. Run oc registry migrate-metadata ./registry.config.js.
  3. Start one registry instance in metadata mode and verify reads.
  4. Roll out metadata mode to the remaining instances.

3. Bake-in

Run in mixed mode and observe. While the bake-in flags are on:

  • reconcileFromStorage inserts, on each boot, any name@version that exists in the directory tree but is missing from the database (existing rows are skipped, never overwritten). This heals anything that a node still running in storage mode published during the cutover.
  • exportLegacyFiles (optionally on exportLegacyFilesInterval) keeps components.json / components-details.json fresh in storage, so external consumers keep working and rollback to storage mode loses at most one export interval.

The directory tree remains authoritative-enough that the reconcile can heal any miss — the same self-healing principle storage mode relies on, applied deliberately once at the boundary.

4. Steady state

Once you're confident, drop the bake-in scaffolding:

  • Set reconcileFromStorage: false (the startup directory scan no longer runs).
  • Optionally keep exportLegacyFiles: true permanently as a cheap, one-way DR snapshot / cold-start fallback so the database is never an absolute single point of failure for booting.

components.json is now a non-authoritative projection of the database.

Rolling back

Because the legacy files are kept fresh during bake-in, rolling back is simply removing the metadata block (or reverting the deploy): the registry returns to storage mode and reads the projected components.json, losing at most one export interval of updates.

Failure model

  • Startup, DB down — readiness fails and the node retries with backoff; already-running nodes keep serving from cache and the load balancer skips the not-ready node. The registry never starts silently empty.
  • Poll, DB blip — fully resilient: keep serving the in-memory cache, log, and retry next interval. The only effect is that new publishes propagate slightly later.
  • Publish, DB unreachable — the publish fails with a clear error; any statics uploaded are harmless orphans and the client can retry (idempotent). There is no buffering.

Net: reads survive any DB blip, and publishes correctly refuse during one.
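The "keep serving the last good cache" behavior on a failed poll can be sketched as a refresh function that only replaces the cache on success. `pollOnce` and the `cache` holder object are hypothetical names for this example; the registry's real polling loop is not shown in this document.

```typescript
// Attempts one refresh. On success the cache is replaced; on failure the
// previous value is left in place and the error is only logged.
async function pollOnce<T>(
  fetchAll: () => Promise<T>,
  cache: { value: T },
  log: (message: string) => void = console.warn
): Promise<boolean> {
  try {
    cache.value = await fetchAll();
    return true;
  } catch (err) {
    log(`metadata poll failed; serving last good cache: ${String(err)}`);
    return false; // caller simply tries again on the next interval
  }
}
```

Since readers only ever see `cache.value`, a database blip delays the visibility of new publishes but never empties or corrupts what is being served.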