Aleph

Technical Documentation

Welcome to the Aleph Technical Documentation. The tech docs contain resources for developers, administrators, and data engineers.

Aleph is an open source toolkit for investigative data analysis. It allows generating, searching and analysing large graphs of heterogeneous data, including public records, structured databases and leaked evidence. The system can integrate data from both unstructured formats (such as PDF, email, and other file types) and structured sources such as CSV files or SQL databases. Data that’s been loaded can be securely searched, cross-referenced with other datasets and exported to other systems.

At the core of Aleph’s capabilities is Follow the Money (FtM), a shared data model that encapsulates core concepts such as People, Companies, Documents or Contracts. Such data can be generated from tabular inputs, or via the ingest-file system that extracts data from dozens of input formats (including Word, PowerPoint, PDF, Access, email, ZIP archives and so on).
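To make the idea of generating entities from tabular inputs more concrete, here is a minimal sketch in plain Python. It does not use the followthemoney library or Aleph's own mapping tooling. The helper names (`make_entity`, `entities_from_csv`), the sample CSV columns and the ID hashing scheme are all assumptions made for illustration. The output shape (an `id`, a `schema` name, and `properties` holding lists of values) is meant to resemble how FtM entities are commonly serialised.

```python
import csv
import hashlib
import io


def make_entity(schema, key, properties):
    """Build an FtM-style entity dict (hypothetical helper, not the FtM API).

    The id is derived from the schema and a source key so that the same
    input row always yields the same entity id.
    """
    entity_id = hashlib.sha1(f"{schema}:{key}".encode("utf-8")).hexdigest()
    return {
        "id": entity_id,
        "schema": schema,
        # Properties are multi-valued, so each value is wrapped in a list.
        # Empty values are skipped.
        "properties": {name: [value] for name, value in properties.items() if value},
    }


# Made-up sample data standing in for a tabular source such as a CSV export.
SAMPLE_CSV = """reg_no,company_name,country
A-100,Example Holdings Ltd,gb
B-200,Sample Trading GmbH,de
"""


def entities_from_csv(text):
    """Map each CSV row to a Company-like entity (assumed column mapping)."""
    rows = csv.DictReader(io.StringIO(text))
    return [
        make_entity(
            "Company",
            row["reg_no"],
            {
                "name": row["company_name"],
                "jurisdiction": row["country"],
                "registrationNumber": row["reg_no"],
            },
        )
        for row in rows
    ]


entities = entities_from_csv(SAMPLE_CSV)
for entity in entities:
    print(entity["schema"], entity["id"][:8], entity["properties"]["name"])
```

Deriving the id from a stable source key means re-running the import produces the same entities instead of duplicates. A real deployment would rely on the FtM tooling for schema validation and ID generation.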

The basics

Getting data in and out

The Aleph system also includes Memorious, a crawler framework that lets you write, manage and control a fleet of scrapers to maintain up-to-date copies of public records from the web.

Architecture overview

Contributing

We’re keen to consider pull requests for extensions or bug fixes in all components of the platform. An ideal submission would already follow common coding standards, such as PEP 8, and, when significantly changing functionality, include a test case.

Please also consider dropping by the Slack instance beforehand to discuss your idea.