Technical introduction
Aleph is a toolkit of powerful components for processing knowledge graphs, focussed around the Aleph API server and document processing framework.
Aleph is an open source toolkit for investigative data analysis. It allows generating, searching and analysing large graphs of heterogeneous data, including public records, structured databases and leaked evidence. The system can integrate data from both unstructured formats (such as PDF, email and other file types) and structured sources (such as CSV files or SQL databases). Once loaded, data can be securely searched, cross-referenced with other datasets and exported to other systems.
At the core of Aleph’s capabilities is Follow the Money (FtM), a shared data model that encapsulates core concepts such as People, Companies, Documents or Contracts. Such data can be generated from tabular inputs, or via the ingest-file system that extracts data from dozens of input formats (including Word, PowerPoint, PDF, Access, email, ZIP archives and so on).
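To make the idea of generating entities from tabular inputs more concrete, here is a minimal sketch using only the Python standard library. It does not use the followthemoney package or Aleph's own mapping tools. The entity layout (an `id`, a `schema` name, and multi-valued `properties`) is meant to resemble how FtM entities are commonly serialised, but the helper names (`make_entity_id`, `row_to_entity`, `load_entities`), the sample columns and the choice of the `name` and `nationality` properties are assumptions made purely for illustration.

```python
import csv
import hashlib
import io
import json

# Hypothetical sample input; real sources would be CSV files or SQL tables.
SAMPLE_CSV = "id,name,country\n1,Jane Doe,de\n2,John Smith,\n"


def make_entity_id(*parts):
    """Derive a stable ID by hashing key parts (illustrative helper, not the FtM API)."""
    digest = hashlib.sha1()
    for part in parts:
        digest.update(str(part).encode("utf-8"))
    return digest.hexdigest()


def row_to_entity(row, dataset="demo"):
    """Map one table row to a Person-like entity dict with multi-valued properties."""
    properties = {}
    # Assumed column-to-property mapping; empty cells are skipped.
    for column, prop in (("name", "name"), ("country", "nationality")):
        value = (row.get(column) or "").strip()
        if value:
            properties[prop] = [value]
    return {
        "id": make_entity_id(dataset, row["id"]),
        "schema": "Person",
        "properties": properties,
    }


def load_entities(text):
    """Parse CSV text and return one entity dict per row."""
    return [row_to_entity(row) for row in csv.DictReader(io.StringIO(text))]


if __name__ == "__main__":
    for entity in load_entities(SAMPLE_CSV):
        print(json.dumps(entity, sort_keys=True))
```

Hashing the dataset name together with the source key gives each row the same ID every time the table is re-imported, which is what allows repeated loads to update entities rather than duplicate them.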
The basics
Getting data in and out
The Aleph system also includes Memorious, a crawler framework that lets you write, manage and control a fleet of scrapers to maintain up-to-date copies of public records from the web.
Architecture overview
Contributing
We’re keen to consider pull requests for extensions or bug fixes in all components of the platform. An ideal submission would already follow common coding standards, such as PEP8, and, when significantly changing functionality, include a test case.
Please also consider dropping by the Slack instance beforehand to discuss your idea.