Aleph

How To Configure a Storage Provider

Whenever you upload a file to Aleph, that file is stored in the so-called file archive. Aleph supports multiple storage backends for the file archive. By default, Aleph stores files on the local file system, but you can configure Aleph to use an object storage provider such as Google Cloud Storage or AWS S3.

File system

By default, Aleph stores files on the local file system. You do not have to change any configuration options to store the file archive on the local file system.

If necessary, you can configure the path of the file archive by setting the ARCHIVE_PATH configuration option.
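For example, the option can be set in the environment configuration like this (the path used here is only an illustrative placeholder; use whatever host or container path suits your deployment):

```shell
# Store the file archive at a custom path (placeholder path)
ARCHIVE_PATH=/aleph/archive
```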

Files written inside a Docker container are not persisted by default, i.e. they are lost when the container is deleted or recreated. When you run Aleph using the default storage driver, make sure that a volume is configured to persist the data even when the container is deleted or recreated.
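As a sketch, a named volume in a Docker Compose file could look like the following (the service name, image name, and mount point are assumptions; the mount point should match your ARCHIVE_PATH setting):

```yaml
services:
  worker:
    image: alephdata/aleph   # assumed image name
    volumes:
      # Persist the file archive across container restarts and recreations
      - archive-data:/aleph/archive

volumes:
  archive-data:
```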

When storing files on the file system, you have to set up backups yourself. Please also refer to our guide on how to create a backup.

Amazon AWS S3

You can use Amazon AWS S3 or any other object storage provider that is compatible with S3 to store files. Many other providers, such as DigitalOcean, Backblaze, and OVH, offer S3-compatible object storage.

  1. Create and configure a storage bucket with your provider. The exact steps vary depending on whether you use Amazon AWS S3 or a different provider, so we cannot provide step-by-step instructions in this guide. Please refer to the documentation of your provider for details.

    Make sure to configure your bucket to be private.

  2. After you have created your storage bucket, you need the following information:

    • Bucket URL (sometimes also referred to as the bucket endpoint)
    • Access key ID
    • Secret access key

    Your provider may use slightly different terms instead of “Access key ID” and “Secret access key”.

  3. Set the following configuration options:

    Configuration option     Value
    ARCHIVE_TYPE             s3
    ARCHIVE_BUCKET           Bucket URL
    AWS_ACCESS_KEY_ID        Access key ID
    AWS_SECRET_ACCESS_KEY    Secret access key
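Putting the table above together, the environment configuration might look like this (the bucket URL and credential values are placeholders; your provider's bucket URL format may differ):

```shell
ARCHIVE_TYPE=s3
ARCHIVE_BUCKET=https://aleph-archive.s3.eu-central-1.amazonaws.com  # placeholder bucket URL
AWS_ACCESS_KEY_ID=AKIAXXXXXXXXXXXXXXXX                              # placeholder access key ID
AWS_SECRET_ACCESS_KEY=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx     # placeholder secret access key
```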

IAM roles

If you use Amazon AWS S3 as your object storage provider and host Aleph on Amazon AWS EC2, you can use IAM roles instead of explicitly specifying an access key ID and secret access key. If possible, this is the preferred way to give Aleph access to a storage bucket, as it doesn’t require you to manage the credentials manually.

Please refer to the Amazon AWS documentation for details. Once you’ve configured the necessary IAM role, configure Aleph as outlined in the previous section, but leave the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY configuration options empty.
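With an IAM role in place, the environment configuration reduces to a sketch like this (the bucket URL is a placeholder):

```shell
ARCHIVE_TYPE=s3
ARCHIVE_BUCKET=https://aleph-archive.s3.eu-central-1.amazonaws.com  # placeholder bucket URL
# AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are intentionally left unset;
# credentials are picked up from the instance's IAM role instead.
```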

Google Cloud Storage

  1. Create and configure a storage bucket. For details, please refer to the Google Cloud Storage documentation.

    Make sure to configure your bucket to be private.

  2. Create a service account and assign it the Storage Object User role.

  3. Create a service account key and download the credentials file in JSON format.

    The credentials file grants access to the contents of the storage bucket. Store it securely to avoid accidentally exposing it.

  4. Make the credentials file available inside the Aleph containers:

    • If you deployed Aleph using Docker Compose, create a volume to mount the credentials file into the api, ingest-file, worker, and shell services. Set the GOOGLE_APPLICATION_CREDENTIALS environment variable to the path where the credentials file is mounted. For example, if you have mounted the credentials file at /var/secrets/google/service-account.json, set GOOGLE_APPLICATION_CREDENTIALS=/var/secrets/google/service-account.json.

    • If you deployed Aleph on Kubernetes using the Aleph Helm chart, create a secret named service-account-aleph with the contents of the credentials file, then set the chart value global.google to true.

  5. Set the following configuration options:

    Configuration option    Value
    ARCHIVE_TYPE            gs
    ARCHIVE_BUCKET          Name of the bucket created in the first step
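Combining the steps above, a Docker Compose deployment might end up with environment settings like the following (the bucket name is a placeholder; the credentials path matches the mount path used as an example in step 4):

```shell
ARCHIVE_TYPE=gs
ARCHIVE_BUCKET=aleph-archive   # placeholder bucket name
GOOGLE_APPLICATION_CREDENTIALS=/var/secrets/google/service-account.json
```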

Attached service accounts

If you use Google Cloud Storage as your object storage provider and host Aleph on the Google Cloud Platform, you can use so-called attached service accounts instead of explicitly specifying a credentials file. If possible, this is the preferred way to give Aleph access to a storage bucket, as it doesn’t require you to manage the credentials manually.

Please refer to the Google Cloud Platform documentation for details. Once you’ve configured an attached service account, you can configure Aleph to use your storage bucket as outlined above without mounting a credentials file or setting the GOOGLE_APPLICATION_CREDENTIALS environment variable.
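With an attached service account, the configuration reduces to a sketch like this (the bucket name is a placeholder):

```shell
ARCHIVE_TYPE=gs
ARCHIVE_BUCKET=aleph-archive   # placeholder bucket name
# GOOGLE_APPLICATION_CREDENTIALS is intentionally not set;
# credentials are provided by the attached service account.
```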