Split Apache OAK Stores

Evgeniy Fitsner Software Engineer
3 min read

Introduction

As an AEM repository grows due to accumulated assets and binary files, regular compaction alone may not be sufficient to maintain performance. Separating the NodeStore and BlobStore components can significantly improve repository speed and reduce memory consumption.

When to Split

Consider splitting your Apache OAK repository if:

  • Repository size grows unexpectedly fast due to numerous assets
  • You need to improve local repository speed
  • You want to decrease memory consumption
  • Compaction takes increasingly long or provides diminishing returns

Implementation Steps

These steps are demonstrated for AEM 6.1 on Windows but apply to other versions and platforms with minor adjustments.

Prerequisites

  • Stop the Adobe AEM instance
  • Back up the existing repository
  • Run offline compaction on the repository
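The backup prerequisite can be scripted. The following is a minimal sketch, assuming the instance is already stopped and that a zip archive is an acceptable backup format. The function name `backup_repository` and the archive naming scheme are illustrative, not part of AEM or crx2oak.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_repository(repo: Path, dest_dir: Path) -> Path:
    """Zip the whole repository directory into dest_dir and return the archive path.

    Hypothetical helper: dest_dir must live outside repo, otherwise the
    archive would try to include itself.
    """
    if not repo.is_dir():
        raise FileNotFoundError(f"repository not found: {repo}")
    dest_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    base_name = dest_dir / f"repository-backup-{stamp}"
    # make_archive appends the ".zip" extension and returns the final file name.
    archive = shutil.make_archive(str(base_name), "zip", root_dir=repo)
    return Path(archive)
```

A call such as `backup_repository(Path("crx-quickstart/repository"), Path("C:/dev/aem/backups"))` would be run from the AEM home directory; the backup location shown here is an assumption.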

1. Create Directories

Create separate directories for the new SegmentStore and BlobStore:

crx-quickstart\repository_new
C:\dev\aem\author\blobstore
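If you prefer to script this step, the two directories listed above can be created as follows. This is a small sketch; `create_store_dirs` is a hypothetical helper, and both paths are passed in by the caller rather than hard-coded.

```python
from pathlib import Path
from typing import Tuple


def create_store_dirs(aem_home: Path, blobstore: Path) -> Tuple[Path, Path]:
    """Create the target repository and blob store directories if they are missing."""
    repo_new = aem_home / "crx-quickstart" / "repository_new"
    # exist_ok=True makes the call safe to repeat after a partial run.
    repo_new.mkdir(parents=True, exist_ok=True)
    blobstore.mkdir(parents=True, exist_ok=True)
    return repo_new, blobstore
```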

2. Clone the Repository with crx2oak

Use the crx2oak migration tool to clone the repository with binary separation:

java -jar crx2oak-1.4.6-standalone.jar --copy-binaries ^
  --src-datastore=crx-quickstart/repository/segmentstore ^
  --datastore=c:\dev\aem\author\blobstore ^
  crx-quickstart/repository crx-quickstart/repository_new -mmap

The --copy-binaries flag instructs crx2oak to copy binary files into the external datastore specified by --datastore.

3. Reorganize Directories

  • Create the crx-quickstart\install directory if it does not exist
  • Delete the old crx-quickstart\repository directory
  • Rename crx-quickstart\repository_new to crx-quickstart\repository
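The three reorganization steps above can be sketched in Python. `swap_repository` is a hypothetical helper; it deletes the old repository permanently, so it assumes the backup from the prerequisites exists.

```python
import shutil
from pathlib import Path


def swap_repository(quickstart: Path) -> None:
    """Replace crx-quickstart/repository with the migrated repository_new."""
    old_repo = quickstart / "repository"
    new_repo = quickstart / "repository_new"
    # Refuse to delete anything unless the migrated copy is really there.
    if not new_repo.is_dir():
        raise FileNotFoundError(f"migrated repository not found: {new_repo}")
    (quickstart / "install").mkdir(exist_ok=True)
    if old_repo.exists():
        shutil.rmtree(old_repo)
    new_repo.rename(old_repo)
```

Checking for repository_new before removing anything keeps a failed migration from costing you the original repository.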

4. Create Configuration Files

Create org.apache.jackrabbit.oak.plugins.segment.SegmentNodeStoreService.cfg in crx-quickstart\install:

customBlobStore=true

Create org.apache.jackrabbit.oak.plugins.blob.datastore.FileDataStore.cfg in crx-quickstart\install:

path=C:\\dev\\aem\\author\\blobstore
cacheSize=...
minRecordLength=...

Adjust cacheSize and minRecordLength based on your needs. The cacheSize controls how many blob entries are cached in memory, and minRecordLength sets the minimum size, in bytes, that a binary must reach to be stored in the datastore; smaller binaries stay inline in the segment store.
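Both files can also be generated by a script. The sketch below writes only the settings whose values are given above; cacheSize and minRecordLength are deliberately left out because their values are site-specific. The helper names `cfg_escape` and `write_store_configs` are illustrative.

```python
from pathlib import Path

SEGMENT_CFG = "org.apache.jackrabbit.oak.plugins.segment.SegmentNodeStoreService.cfg"
DATASTORE_CFG = "org.apache.jackrabbit.oak.plugins.blob.datastore.FileDataStore.cfg"


def cfg_escape(windows_path: str) -> str:
    """Double each backslash, matching the escaped path form shown in the .cfg file."""
    return windows_path.replace("\\", "\\\\")


def write_store_configs(install_dir: Path, blobstore_path: str) -> None:
    """Write the two OSGi .cfg files into the install directory."""
    install_dir.mkdir(parents=True, exist_ok=True)
    (install_dir / SEGMENT_CFG).write_text("customBlobStore=true\n", encoding="utf-8")
    # Add cacheSize and minRecordLength lines by hand once you have chosen values.
    (install_dir / DATASTORE_CFG).write_text(
        f"path={cfg_escape(blobstore_path)}\n", encoding="utf-8"
    )
```

For example, `write_store_configs(Path("crx-quickstart/install"), r"C:\dev\aem\author\blobstore")` produces a path line with doubled backslashes.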

5. Restart and Verify

Restart the AEM instance and verify that “Custom BlobStore” is enabled in the OSGi configuration manager at /system/console/configMgr.

Benefits

  • Reduced repository size: Binary files are stored separately, making the segment store more compact
  • Faster compaction: Smaller segment store means faster online and offline compaction
  • Better memory usage: Fewer binaries in memory during repository operations
  • Easier backups: Segment store and blob store can be backed up independently
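To check the size reduction on your own instance, you can compare the on-disk size of the segment store and the blob store before and after the split. This is a rough sketch; `dir_size_bytes` is a hypothetical helper that simply sums file sizes.

```python
import os
from pathlib import Path


def dir_size_bytes(root: Path) -> int:
    """Return the total size in bytes of all files under root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            total += (Path(dirpath) / name).stat().st_size
    return total
```

Calling it on crx-quickstart/repository/segmentstore and on the blob store directory gives the two figures to compare.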
