How To Export Network Graphs
In some cases you may want to process data stored in Aleph as a graph, for example to run advanced network analysis. This guide describes how to export data as Cypher commands for Neo4j or in the GEXF format for Gephi/Sigma.js.
Transformation strategy
Data in Aleph is stored using FollowTheMoney. FollowTheMoney sees every unit of information as an entity with a set of properties. To analyze this information as a network with nodes and edges, we need to decide how entities are transformed into nodes and edges. Different strategies are available:
- Some entity schemata, such as Directorship, Ownership, Family or Payment, contain annotations that define how they can be transformed into an edge with a source and target.
- Entities also naturally reference others. For example, an Email has an emitters property that refers to a LegalEntity, the sender. The emitters property connects the two entities and can also be turned into an edge.
- Finally, some types of properties (e.g. email, iban, names) can be formed into nodes, with edges formed towards each node that derives from an entity with that property value. For example, an address node for “40 Wall Street” would show links to all the companies registered there, or a node representing the name “Frank Smith” would connect all the documents mentioning that name.
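The first strategy can be illustrated with a minimal Python sketch. It works on plain dictionaries shaped like FollowTheMoney entities (id, schema, properties) and does not use the followthemoney library. The EDGE_SCHEMATA mapping and the to_graph helper are hypothetical names introduced for illustration; the real source and target annotations are defined in the FollowTheMoney schemata.

```python
# Hypothetical mapping from edge-like schemata to (source, target) property
# names. Assumed for illustration; the real annotations live in the
# FollowTheMoney schema definitions.
EDGE_SCHEMATA = {
    "Ownership": ("owner", "asset"),
    "Directorship": ("director", "organization"),
}


def to_graph(entities):
    """Split entity dicts into node IDs and (source, target, label) edges."""
    nodes, edges = set(), []
    for entity in entities:
        props = entity.get("properties", {})
        ends = EDGE_SCHEMATA.get(entity["schema"])
        if ends is None:
            # Any schema without an edge annotation becomes a node.
            nodes.add(entity["id"])
            continue
        source_prop, target_prop = ends
        # Edge-like entities connect every source to every target.
        for source in props.get(source_prop, []):
            for target in props.get(target_prop, []):
                edges.append((source, target, entity["schema"]))
    return nodes, edges


entities = [
    {"id": "p1", "schema": "Person", "properties": {"name": ["Frank Smith"]}},
    {"id": "c1", "schema": "Company", "properties": {"name": ["ACME Ltd."]}},
    {"id": "o1", "schema": "Ownership",
     "properties": {"owner": ["p1"], "asset": ["c1"]}},
]
nodes, edges = to_graph(entities)
print(sorted(nodes))  # ['c1', 'p1']
print(edges)          # [('p1', 'c1', 'Ownership')]
```

In this sketch the Ownership entity disappears as a node and survives only as an edge between the person and the company.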
Prerequisites
This guide uses the alephclient CLI to export FollowTheMoney data from Aleph and the ftm CLI to convert FollowTheMoney data to Cypher commands or GEXF. If you don’t have these two CLIs installed, please refer to How to install the ftm CLI and How to install the alephclient CLI for installation instructions.
Using the alephclient CLI, you can stream entities from an Aleph collection and write them to a file:
alephclient stream-entities --foreign-id 0bdf... --outfile entities.json
Replace 0bdf... with the foreign ID of your collection. You can find a collection’s foreign ID in the Aleph UI. Navigate to the collection homepage. The foreign ID is listed in the sidebar on the right.
Streaming very large collections from an Aleph instance is resource-intensive on the server side. Please only stream collections with more than 100,000 entities after making sure that the server administrators are OK with it.
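Before converting the export, it can help to check what was actually streamed. The following sketch counts entities per schema using only the Python standard library. It assumes that entities.json contains one JSON object per line, each with a schema key; count_schemata is a hypothetical helper name, not part of alephclient or ftm.

```python
import json
from collections import Counter


def count_schemata(lines):
    """Count entities per schema from an iterable of JSON lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        counts[json.loads(line)["schema"]] += 1
    return counts


# Hand-written sample lines standing in for the contents of entities.json.
sample = [
    '{"id": "a", "schema": "Person", "properties": {}}',
    '{"id": "b", "schema": "Company", "properties": {}}',
    '{"id": "c", "schema": "Person", "properties": {}}',
]
print(count_schemata(sample).most_common())
# [('Person', 2), ('Company', 1)]
```

To run it on a real export, pass an open file handle instead of the sample list, for example count_schemata(open("entities.json")).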
Exporting Cypher commands for Neo4j
Neo4j is a popular open-source graph database that can be queried and edited using the Cypher language. It can be used as a database backend or queried directly to perform advanced analysis, e.g. to find all paths between two entities.
Run the following command to convert the FollowTheMoney entities streamed from Aleph to Cypher commands:
ftm export-cypher --infile entities.json --outfile entities.cypher
Alternatively, you can stream the commands directly into Neo4j’s Cypher shell to load the data into a Neo4j database:
ftm export-cypher --infile entities.json | cypher-shell -u USER -p PASSWORD
The commands above only create edges from explicit entity references. To reify specific property types as edges, use the --edge-types/-e option:
# Creates an edge between two entities that have address properties with the same value
ftm export-cypher --infile entities.json --edge-types address
# Creates an edge between two entities that have address or IBAN properties with the same value
ftm export-cypher --infile entities.json -e address -e iban
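To make the idea of reifying property values more concrete, here is a small Python sketch that turns shared values into value nodes with edges to every entity holding them. It illustrates the concept only and is not how ftm implements the option. As a simplification it matches on property names rather than property types, and reify is a hypothetical helper name.

```python
from collections import defaultdict


def reify(entities, prop_names):
    """Return (value_node, entity_id) edges for values shared by 2+ entities."""
    holders = defaultdict(set)
    for entity in entities:
        for prop in prop_names:
            for value in entity.get("properties", {}).get(prop, []):
                # The (property, value) pair acts as the value node's identity.
                holders[(prop, value)].add(entity["id"])
    edges = []
    for node, ids in sorted(holders.items()):
        if len(ids) > 1:  # a value held by one entity connects nothing
            edges.extend((node, entity_id) for entity_id in sorted(ids))
    return edges


entities = [
    {"id": "c1", "schema": "Company",
     "properties": {"address": ["40 Wall Street"]}},
    {"id": "c2", "schema": "Company",
     "properties": {"address": ["40 Wall Street"]}},
    {"id": "c3", "schema": "Company",
     "properties": {"address": ["1 Main Street"]}},
]
for edge in reify(entities, ["address"]):
    print(edge)
# (('address', '40 Wall Street'), 'c1')
# (('address', '40 Wall Street'), 'c2')
```

Values are compared exactly here, so differently spelled addresses would not be linked.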
GEXF for Gephi/Sigma.js
The Graph Exchange XML Format (GEXF) is a file format used by the network analysis software Gephi and other tools developed in the periphery of the Media Lab at Sciences Po. Gephi is particularly well suited to quantitative analysis of graphs with tens of thousands of nodes. It can calculate network metrics like centrality or PageRank, or generate complex visual layouts.
Run the following command to convert the FollowTheMoney entities streamed from Aleph to GEXF:
ftm export-gexf --infile entities.json --outfile entities.gexf
The command above only creates edges from explicit entity references. To reify specific property types as edges, use the --edge-types/-e option:
# Creates an edge between two entities that have address properties with the same value
ftm export-gexf --infile entities.json --edge-types address
# Creates an edge between two entities that have address or IBAN properties with the same value
ftm export-gexf --infile entities.json -e address -e iban
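As a quick sanity check before opening a large file in Gephi, the node and edge counts of a GEXF document can be read with the Python standard library. The sketch below ignores the XML namespace so that it does not depend on a particular GEXF version. The sample document is hand-written for illustration and is not actual ftm output; gexf_summary is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def gexf_summary(xml_text):
    """Count <node> and <edge> elements in a GEXF document string."""
    root = ET.fromstring(xml_text)
    counts = {"node": 0, "edge": 0}
    for element in root.iter():
        # Strip the "{namespace}" prefix that ElementTree adds to tag names.
        local = element.tag.rsplit("}", 1)[-1]
        if local in counts:
            counts[local] += 1
    return counts


# Hand-written sample; a real export would be much larger.
sample = """<gexf xmlns="http://www.gexf.net/1.2draft" version="1.2">
  <graph defaultedgetype="directed">
    <nodes>
      <node id="p1" label="Frank Smith"/>
      <node id="c1" label="ACME Ltd."/>
    </nodes>
    <edges>
      <edge id="0" source="p1" target="c1"/>
    </edges>
  </graph>
</gexf>"""
print(gexf_summary(sample))  # {'node': 2, 'edge': 1}
```

For a file on disk, ET.parse("entities.gexf").getroot() can replace the ET.fromstring call.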