metaknowledge is a Python 3 package for computational research in bibliometrics, scientometrics, and network analysis. It can also simplify systematic reviews in any discipline.

metaknowledge reads a directory of plain text files containing metadata on publications and citations. It writes this data to a variety of data structures suited to longitudinal research, computational text analysis (e.g. topic models and burst analysis), Reference Publication Year Spectroscopy (RPYS), and network analysis (including multi-modal, multi-level, and dynamic networks). It handles large datasets (e.g. several million records) efficiently.
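To show what these plain text metadata files look like, the sketch below parses records in the tagged layout used by Web of Science plain-text exports. In that layout each line starts with a two-letter field tag (e.g. `AU` for authors), continuation lines are indented, and `ER` ends a record. This is a minimal stdlib-only sketch and not metaknowledge's own parser. The function names `parse_tagged_records` and `parse_directory` are hypothetical, and the code assumes UTF-8 `.txt` files. It skips file-level tags such as `FN`, `VR`, and `EF` and ignores other format details.

```python
from pathlib import Path
from typing import Dict, Iterable, List, Optional

# Hypothetical helpers: a simplified reader for Web of Science-style tagged
# plain text, NOT metaknowledge's implementation.
FILE_LEVEL_TAGS = {"FN", "VR", "EF"}  # header/footer tags, not part of a record


def parse_tagged_records(lines: Iterable[str]) -> List[Dict[str, List[str]]]:
    """Split tagged lines into records that map each tag to a list of values."""
    records: List[Dict[str, List[str]]] = []
    current: Dict[str, List[str]] = {}
    last_tag: Optional[str] = None
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # blank lines separate records; nothing to store
        if line.startswith("   "):
            # Indented continuation line: another value for the previous tag
            if last_tag is not None:
                current[last_tag].append(line.strip())
            continue
        tag, value = line[:2], line[3:].strip()
        if tag == "ER":
            # End of record: save it and start collecting the next one
            records.append(current)
            current, last_tag = {}, None
        elif tag in FILE_LEVEL_TAGS:
            last_tag = None
        else:
            current.setdefault(tag, []).append(value)
            last_tag = tag
    return records


def parse_directory(directory: str) -> List[Dict[str, List[str]]]:
    """Parse every .txt file in a directory (assumes UTF-8, with or without BOM)."""
    records: List[Dict[str, List[str]]] = []
    for path in sorted(Path(directory).glob("*.txt")):
        with path.open(encoding="utf-8-sig") as handle:
            records.extend(parse_tagged_records(handle))
    return records
```

Keeping every field as a list means multi-valued tags such as authors or cited references share one representation. A single-valued field like `PY` (publication year) is then simply a one-element list.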

metaknowledge currently handles data from the Web of Science, PubMed, Scopus, and ProQuest Dissertations & Theses. It also handles administrative data from the National Science Foundation and the Canadian tri-council granting agencies: SSHRC, CIHR, and NSERC.

Datasets created with metaknowledge can be analyzed with NetworkX and the standard Python data analysis libraries. They can also be written to CSV or GraphML files for analysis and visualization in R, Stata, Visone, Gephi, or any other data analysis tool.
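The stdlib-only sketch below illustrates the kind of CSV export described above. It turns a list of records into a weighted co-authorship edge list and writes it with `Source`, `Target`, and `Weight` columns, a layout that Gephi's spreadsheet importer can read as an edge table and that R can load with `read.csv`. Each record is assumed to be a dict that maps field tags to lists of values, with authors under a hypothetical `"AU"` key. The function names are also hypothetical, and this is not metaknowledge's own export code. A graph already held in NetworkX could instead be written to GraphML with `networkx.write_graphml`.

```python
import csv
from collections import Counter
from itertools import combinations
from typing import Dict, List


def coauthor_edges(records: List[Dict[str, List[str]]]) -> Counter:
    """Count how often each unordered pair of authors appears on the same record."""
    edges: Counter = Counter()
    for record in records:
        # Deduplicate and sort so (A, B) and (B, A) share one key
        authors = sorted(set(record.get("AU", [])))
        for pair in combinations(authors, 2):
            edges[pair] += 1
    return edges


def write_edge_csv(edges: Counter, path: str) -> None:
    """Write a weighted, undirected edge list as CSV with Source/Target/Weight headers."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["Source", "Target", "Weight"])
        for (source, target), weight in sorted(edges.items()):
            writer.writerow([source, target, weight])
```

Records without an `"AU"` entry contribute no edges. Records with a single author contribute no edges either, because a pair needs two distinct names.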

metaknowledge also has a simple command-line tool for extracting quantitative datasets and network files from Web of Science files. This makes the library accessible to researchers who do not know Python and makes it easier to explore new datasets quickly.


Reid McIlroy-Young,
University of Chicago, Chicago, IL, USA
John McLevey,
University of Waterloo, Waterloo, ON, Canada
Jillian Anderson,
University of Waterloo, Waterloo, ON, Canada


If you are using metaknowledge for research that will be published or publicly distributed, please acknowledge us with the following citation:

Reid McIlroy-Young, John McLevey, and Jillian Anderson. 2015. metaknowledge: open source software for social networks, bibliometrics, and sociology of knowledge research. URL:
