Introduction

This is the documentation for tszip, a command line interface and Python API for compressing the tskit tree sequence files produced by msprime, SLiM, fwdpy11 and tsinfer. By building on the zarr and numcodecs packages, tszip achieves much better compression than generic compression utilities can.

The command line interface closely follows the design of gzip, so it should be immediately familiar. Here we compress a large tree sequence representing 1000 Genomes chromosome 20 using tszip, which reduces it from 297M to 46M, and then decompress it using tsunzip:

$ ls -lh
total 297M
-rw-r--r-- 1 jk jk 297M May 10 14:49 1kg_chr20.trees
$ tszip 1kg_chr20.trees
$ ls -lh
total 46M
-rw-r--r-- 1 jk jk 46M May 10 14:51 1kg_chr20.trees.tsz
$ tsunzip 1kg_chr20.trees.tsz
$ ls -lh
total 297M
-rw-r--r-- 1 jk jk 297M May 10 14:52 1kg_chr20.trees