How To Configure a Storage Provider
Whenever you upload a file to Aleph, it is stored in the so-called file archive. Aleph supports multiple storage backends for the file archive. By default, Aleph stores files on the local file system, but you can configure Aleph to use an object storage provider such as Google Cloud Storage or AWS S3.
File system
By default, Aleph stores files on the local file system. You do not have to change any configuration options to store the file archive on the local file system.
If necessary, you can configure the file archive path by setting the `ARCHIVE_PATH` configuration option.
Files written inside a Docker container are not persisted by default, i.e. they are lost when the container is deleted or recreated. When you run Aleph with the default storage backend, make sure a volume is configured so that the file archive persists even when the container is deleted or recreated.
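For a Docker Compose deployment, persisting the archive might look like the following sketch. The service name, volume name, and mount path are illustrative, not Aleph's exact defaults; adjust them to your deployment:

```yaml
# docker-compose.yml (excerpt) -- illustrative example only;
# adjust service names, volume names, and paths to your deployment
services:
  api:
    environment:
      ARCHIVE_PATH: /data   # where Aleph writes the file archive inside the container
    volumes:
      - archive-data:/data  # named volume persists the archive across recreation

volumes:
  archive-data: {}
```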
When storing files on the file system, you have to set up backups yourself. Please also refer to our guide on how to create a backup.
Amazon AWS S3
You can use Amazon AWS S3 or any other object storage provider that is compatible with S3 to store files. Many other providers such as Digital Ocean, Backblaze, OVH provide object storage compatible with S3.
1. Create and configure a storage bucket with your provider. The exact steps vary depending on whether you use Amazon AWS S3 or a different provider, so we cannot provide step-by-step instructions in this guide. Please refer to the documentation of your provider for details. Make sure to configure your bucket to be private.

2. After you have created your storage bucket, you need the following information:

   - Bucket URL (sometimes also referred to as the bucket endpoint)
   - Access key ID
   - Secret access key

   Your provider may use slightly different terms instead of “Access key ID” and “Secret access key”.
3. Set the following configuration options:

   | Configuration option | Value |
   | --- | --- |
   | `ARCHIVE_TYPE` | `s3` |
   | `ARCHIVE_BUCKET` | Bucket URL |
   | `AWS_ACCESS_KEY_ID` | Access key ID |
   | `AWS_SECRET_ACCESS_KEY` | Secret access key |
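Taken together, the configuration might look like this in an environment file. All values below are placeholders; substitute your own bucket URL and credentials:

```shell
# Example Aleph environment settings for an S3-compatible archive.
# All values are placeholders, not real endpoints or credentials.
ARCHIVE_TYPE=s3
ARCHIVE_BUCKET=https://example-aleph-archive.s3.example.com
AWS_ACCESS_KEY_ID=your-access-key-id
AWS_SECRET_ACCESS_KEY=your-secret-access-key
```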
IAM roles
If you use Amazon AWS S3 as your object storage provider and host Aleph on Amazon AWS EC2, you can use IAM roles instead of explicitly specifying an access key ID and secret access key. If possible, this is the preferred way to give Aleph access to a storage bucket, as it doesn’t require you to manage credentials manually.
Please refer to the Amazon AWS documentation for details. Once you’ve configured the necessary IAM role, configure Aleph as outlined in the previous section, but leave the `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` configuration options empty.
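With an IAM role in place, the archive settings reduce to the following sketch (the bucket URL is a placeholder):

```shell
# With an IAM role attached to the EC2 instance,
# no static credentials are needed.
ARCHIVE_TYPE=s3
ARCHIVE_BUCKET=https://example-aleph-archive.s3.example.com  # placeholder
AWS_ACCESS_KEY_ID=
AWS_SECRET_ACCESS_KEY=
```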
Google Cloud Storage
1. Create and configure a storage bucket. For details, please refer to the Google Cloud Storage documentation. Make sure to configure your bucket to be private.

2. Create a service account and assign the Storage Object User role to it.

3. Create a service account key and download the credentials file in JSON format. The credentials file gives access to the contents of the storage bucket. Store it safely to prevent accidentally exposing the credentials.

4. Make the credentials file available inside the Aleph containers:

   - If you deployed Aleph using Docker Compose, create a volume to mount the credentials file in the `api`, `ingest-file`, `worker`, and `shell` services. Set the environment variable `GOOGLE_APPLICATION_CREDENTIALS` to the path where the credentials file is mounted. For example, if you have mounted the credentials file at `/var/secrets/google/service-account.json`, set `GOOGLE_APPLICATION_CREDENTIALS=/var/secrets/google/service-account.json`.
   - If you deployed Aleph on Kubernetes using the Aleph Helm chart, create a secret named `service-account-aleph` with the contents of the credentials file. Then set the chart value `global.google` to `true`.
5. Set the following configuration options:

   | Configuration option | Value |
   | --- | --- |
   | `ARCHIVE_TYPE` | `gs` |
   | `ARCHIVE_BUCKET` | Name of the bucket created in the first step |
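For a Docker Compose deployment, the credentials mount and the configuration options above might be combined as in the following sketch. The bucket name and mount path are placeholders, and only one service is shown:

```yaml
# docker-compose.yml (excerpt) -- illustrative sketch, adjust to your deployment
services:
  api:
    environment:
      ARCHIVE_TYPE: gs
      ARCHIVE_BUCKET: example-aleph-archive  # placeholder bucket name
      GOOGLE_APPLICATION_CREDENTIALS: /var/secrets/google/service-account.json
    volumes:
      - ./service-account.json:/var/secrets/google/service-account.json:ro
  # Repeat the same environment variables and volume mount for the
  # ingest-file, worker, and shell services.
```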
Attached service accounts
If you use Google Cloud Storage as your object storage provider and host Aleph on the Google Cloud Platform, you can use so-called attached service accounts instead of explicitly specifying a credentials file. If possible, this is the preferred way to give Aleph access to a storage bucket, as it doesn’t require you to manage credentials manually.
Please refer to the Google Cloud Platform documentation for details. Once you’ve configured an attached service account, you can configure Aleph to use your storage bucket as outlined above without mounting a credentials file or setting the `GOOGLE_APPLICATION_CREDENTIALS` environment variable.
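With an attached service account, the configuration reduces to the two archive options (the bucket name is a placeholder):

```shell
# No credentials file or GOOGLE_APPLICATION_CREDENTIALS is needed when an
# attached service account grants access to the bucket.
ARCHIVE_TYPE=gs
ARCHIVE_BUCKET=example-aleph-archive  # placeholder
```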