Developer tools
The Aleph toolkit includes a number of Python libraries for data parsing and normalisation that can be used independently of the core tool.
All of the tools below are released regularly and can be installed from the Python package registry using pip:
fingerprints is a Python library that heavily normalises the names of companies and people before comparison. This includes transliteration, word order, and the normalisation of company type suffixes such as Limited (Ltd) or Aktiengesellschaft (AG). fingerprints depends on normality and works best when pyicu is installed.
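As a rough illustration of comparing names by their normalised form, the sketch below wraps fingerprints.generate, the function shown in the library's README. The helpers name_key and same_entity are hypothetical names introduced here, and the import is guarded so the snippet still runs when the package is not installed.

```python
# Sketch: compare two spellings of a company name via their fingerprints.
try:
    import fingerprints  # third-party: pip install fingerprints
except ImportError:
    fingerprints = None  # library missing; helpers below return None/False


def name_key(name):
    """Return a normalised comparison key (hypothetical helper)."""
    if fingerprints is None:
        return None
    return fingerprints.generate(name)


def same_entity(a, b):
    """True when both names reduce to the same non-empty fingerprint."""
    key_a, key_b = name_key(a), name_key(b)
    return key_a is not None and key_a == key_b


if __name__ == "__main__":
    print(name_key("Siemens Aktiengesellschaft"))
    print(same_entity("Siemens Aktiengesellschaft", "Siemens AG"))
```

Matching on an exact fingerprint is a deliberately strict choice; fuzzier comparison would need an additional string-distance step.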
msglite is a fork of msg-extractor, a parser for Microsoft Outlook MSG files. These binary email files are OLE containers (like old-style Word or Excel documents) and require some tickling before they will confess details about the contained email message.
countrynames helps to turn country names into two-letter ISO codes representing that country. For example, United States or Delaware become us, and England becomes gb. Due to the work area of the OCCRP, this includes some exotic country designations, such as Yugoslavia, Transnistria and the Soviet Union (now deceased).
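A minimal sketch of the lookup described above, assuming the countrynames.to_code function from the library's README. The wrapper country_code is a hypothetical helper; it lower-cases the result because the letter case of the returned code may differ between releases.

```python
# Sketch: map free-text country names to two-letter codes.
try:
    import countrynames  # third-party: pip install countrynames
except ImportError:
    countrynames = None  # library missing; country_code returns None


def country_code(name):
    """Return a lower-case two-letter code, or None if unknown."""
    if countrynames is None:
        return None
    code = countrynames.to_code(name)
    # Normalise case ourselves rather than rely on the library's choice.
    return code.lower() if code else None


if __name__ == "__main__":
    for name in ("United States", "England", "Soviet Union"):
        print(name, "->", country_code(name))
```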
pantomime is a simple tool for dealing with MIME type names (such as text/plain). It contains both a parser and a normaliser for MIME declarations, and defines many common types as constants.
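The normaliser can be sketched as follows, assuming the normalize_mimetype function listed in pantomime's README. The wrapper clean_mime is a hypothetical name used only for this illustration.

```python
# Sketch: tidy up inconsistently written MIME type strings.
try:
    from pantomime import normalize_mimetype  # pip install pantomime
except ImportError:
    normalize_mimetype = None  # library missing; clean_mime returns None


def clean_mime(value):
    """Return a normalised MIME type string (hypothetical helper)."""
    if normalize_mimetype is None:
        return None
    return normalize_mimetype(value)


if __name__ == "__main__":
    for raw in ("TEXT/PLAIN", "text/plain; charset=utf-8"):
        print(repr(raw), "->", clean_mime(raw))
```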
languagecodes is a Python library that handles the normalisation of language identifiers into ISO 3-letter codes. For example, en becomes eng, de becomes deu, etc.
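The conversion above might look like this in practice, assuming the iso_639_alpha3 function exported by languagecodes. The wrapper to_alpha3 is a hypothetical helper, and the guarded import keeps the snippet runnable without the package.

```python
# Sketch: convert two-letter language identifiers to three-letter codes.
try:
    from languagecodes import iso_639_alpha3  # pip install languagecodes
except ImportError:
    iso_639_alpha3 = None  # library missing; to_alpha3 returns None


def to_alpha3(code):
    """Return the three-letter code for a language identifier."""
    if iso_639_alpha3 is None:
        return None
    return iso_639_alpha3(code)


if __name__ == "__main__":
    for code in ("en", "de"):
        print(code, "->", to_alpha3(code))
```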