OSX compress, zip64 and large file madness

(since we are talking about zipping and unzipping… image credit: 119)

Just a quick note to save you 3 hours of troubleshooting:

Do NOT use the OS X built-in “compress” function unless you are dealing with fewer than 65,535 files, all of them smaller than 4GB. Otherwise you risk not being able to decompress (unzip) the archive in other applications (or on other operating systems).

To fully understand this issue, we need to go through a few basics first.

  • ZIP, when it was originally introduced by Phil Katz in 1989, had a 4GB limit on both total archive size and original file size, as well as a 65,535 limit on the number of files in an archive. These limits exist because the header fields for file sizes are only 4 bytes wide (at most 2^32 – 1 = 4,294,967,295), while the field for the total file count is only 2 bytes wide (at most 2^16 – 1 = 65,535).
  • The ZIP64 extension was later added to raise both limits to 2^64 – 1 (about 16 exbibytes), more than enough for most use cases today. To stay backward compatible with the original ZIP standard, it adds new headers with 8-byte fields instead of widening the old ones.
  • An archiving tool with a proper ZIP implementation should automatically switch to the ZIP64 extension (i.e. add ZIP64 headers) when any of these limits is reached.
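The switch-over rule above boils down to comparing a few numbers against the classic field widths. Here is a minimal Python sketch, assuming the caller already knows the entry count, the largest original file size and the total archive size. The function name needs_zip64 is hypothetical, and total archive size is used as a rough stand-in for the 4-byte offset fields. It conservatively treats reaching the all-ones value as overflow, because the ZIP specification reserves 0xFFFF / 0xFFFFFFFF as the "look in the ZIP64 record" marker.

```python
# Largest values the classic (pre-ZIP64) header fields can hold.
MAX_U16 = 2**16 - 1  # 2-byte entry count: 65,535
MAX_U32 = 2**32 - 1  # 4-byte sizes and offsets: 4,294,967,295 (~4GB)


def needs_zip64(num_files: int, largest_file: int, archive_size: int) -> bool:
    """Hypothetical helper: would any classic ZIP field overflow?

    The all-ones value is reserved as the ZIP64 marker, so reaching it
    counts as overflow. Total archive size stands in for the offset fields.
    """
    return (
        num_files >= MAX_U16
        or largest_file >= MAX_U32
        or archive_size >= MAX_U32
    )


# One 4.5GB file needs ZIP64; 34 small files do not.
print(needs_zip64(1, 4_500_000_000, 1_900_000_000))  # True
print(needs_zip64(34, 10_000_000, 50_000_000))       # False
```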

But OS X, being unique as it is, has a different idea in mind.

  • As of OS X Mavericks (10.9.1), the default tool for compressing and decompressing is Archive Utility, or ditto in Terminal. Mavericks also ships with the zip and unzip commands.
  • However, ditto (dated December 2008) doesn’t follow the ZIP64 spec and keeps writing sizes into the old 4-byte ZIP header fields. The result? Your ZIP archive is corrupted on creation. Unless an extracting tool is written specifically to ignore CRC checksums and enumerate all files, you will end up losing files or getting partially extracted files.
  • How about zip and unzip, then? Well, the zip binary (Info-ZIP) shipped with Mavericks is v3.0 (dated 2008) and has proper ZIP64 support. Unfortunately, unzip is still at v5.52 (dated 2005), which lacks ZIP64 support. So zipping any files or directories that require the ZIP64 extension produces an archive the built-in unzip cannot read.
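The built-in unzip 5.52 only understands the classic end-of-central-directory (EOCD) record at the end of the file, while a ZIP64 archive also writes a ZIP64 EOCD record and a 20-byte ZIP64 EOCD locator right in front of it. To check which kind of archive a tool produced, here is a minimal Python sketch using only the standard library. The function name inspect_eocd is hypothetical, and the signature search is naive: an archive comment that happens to contain the EOCD signature could fool it.

```python
import struct

EOCD_SIG = b"PK\x05\x06"           # classic end-of-central-directory record
ZIP64_LOCATOR_SIG = b"PK\x06\x07"  # ZIP64 EOCD locator, sits just before the EOCD
EOCD_FORMAT = "<4sHHHHIIH"          # fixed 22-byte EOCD layout, little-endian
EOCD_SIZE = struct.calcsize(EOCD_FORMAT)
LOCATOR_SIZE = 20
MAX_COMMENT = 0xFFFF


def inspect_eocd(path: str) -> dict:
    """Hypothetical helper: read the EOCD and report whether a ZIP64 locator precedes it."""
    with open(path, "rb") as f:
        f.seek(0, 2)  # whence=2: seek to the end to learn the file size
        file_size = f.tell()
        # The EOCD is the last 22 bytes plus an optional comment of up to 64KiB;
        # read a little extra so the locator in front of it is included too.
        tail_len = min(file_size, EOCD_SIZE + MAX_COMMENT + LOCATOR_SIZE)
        f.seek(file_size - tail_len)
        tail = f.read()

    pos = tail.rfind(EOCD_SIG)
    if pos < 0 or pos + EOCD_SIZE > len(tail):
        raise ValueError("no end-of-central-directory record found")

    (_, _, _, _, total_entries, cd_size, cd_offset, _) = struct.unpack_from(
        EOCD_FORMAT, tail, pos
    )
    loc = pos - LOCATOR_SIZE
    zip64 = loc >= 0 and tail[loc : loc + 4] == ZIP64_LOCATOR_SIG
    return {
        "zip64": zip64,
        "total_entries": total_entries,  # 0xFFFF means "see ZIP64 record"
        "cd_size": cd_size,              # 0xFFFFFFFF means "see ZIP64 record"
        "cd_offset": cd_offset,
    }
```

For an archive written by zip 3.0 that needed ZIP64, this should report "zip64": True; for one written by ditto it should report False, since ditto does not write ZIP64 records.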

Just to prove these points, I compressed a directory containing a file over 4GB with both ditto and zip, then checked each archive on OS X with this command: unzip -l -v zipfile

    df$ unzip -l -v ditto-pack.zip
    Archive: ditto-pack.zip
    Length      Method  Size        Ratio  Date      Time   CRC-32    Name
    --------    ------  -------     -----  ----      ----   ------    ----
    389792011   Defl:N  1855397934  -376%  01-19-14  16:08  5c4394fa  large.file
    --------            -------     ---                               -------
    390195922           1855483050  -376%                             34 files

    df$ unzip -l -v zip-pack.zip
    Archive: zip-pack.zip
    warning [zip-pack.zip]: 76 extra bytes at beginning or within zipfile
      (attempting to process anyway)
    error [zip-pack.zip]: reported length of central directory is
      -76 bytes too long (Atari STZip zipfile? J.H.Holm ZIPSPLIT 1.1
      zipfile?). Compensating...

    Length      Method  Size        Ratio  Date      Time   CRC-32    Name
    --------    ------  -------     -----  ----      ----   ------    ----
    4294967295  Defl:N  1858047072  57%    01-19-14  16:08  5c4394fa  large.file
    --------            -------     ---                               -------
    4295370966          1858132732  57%                               29 files

    note: didn't find end-of-central-dir signature at end of central dir.

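The Archive Utility zipfile records an incorrect file size and therefore an impossible compression ratio: -376% means the recorded compressed size is about 4.8 times the recorded original size, because the original size has wrapped around past 4GB (the CRC-32 value itself is the same as in the zip-created archive). This means neither Windows Explorer nor 7-Zip will handle a zipfile from OS X Archive Utility if the ZIP64 extension is required, as they both enforce CRC checks (even if the correct data exists).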
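The -376% can be reproduced with plain arithmetic. Here is a minimal Python sketch, assuming Archive Utility simply keeps the low 32 bits of the original size (the listing's numbers are consistent with that) and that the ratio is 1 − compressed/original rounded to a whole percent. The real size of large.file is not given in the post, so 4,684,759,307 bytes, the smallest size over 4GB that wraps to 389,792,011, is a hypothetical value; the helper names are hypothetical too.

```python
def as_u32_field(value: int) -> int:
    """Keep only the low 32 bits, as if a size were written to a 4-byte field without ZIP64."""
    return value & 0xFFFFFFFF


def ratio(original: int, compressed: int) -> str:
    """Space saved as a whole percentage: 1 - compressed/original."""
    return f"{round(100 * (1 - compressed / original))}%"


true_size = 4_684_759_307         # hypothetical real size of large.file
stored = as_u32_field(true_size)  # what a 4-byte size field ends up holding
print(stored)                               # 389792011, the ditto listing's Length
print(ratio(stored, 1_855_397_934))         # -376%, the ditto listing's Ratio
print(ratio(4_294_967_295, 1_858_047_072))  # 57%, the zip listing's Ratio
print(ratio(true_size, 1_855_397_934))      # 60%, the ratio with the untruncated size
```

In other words, the compressed data itself looks sane; it is the recorded original size that has wrapped, which makes the compressed size look almost five times larger than the "original".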

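The proper zip archive is not supported by the built-in unzip. Look at the warnings: the 76 extra bytes are exactly the 56-byte ZIP64 end-of-central-directory (EOCD) record plus the 20-byte ZIP64 EOCD locator that a ZIP64 archive places between the central directory and the classic EOCD record, which is also why unzip 5.52 doesn’t find the EOCD signature where it expects it. Another hint is the 4GB limit on the individual file: 4294967295 is 2^32 – 1, the placeholder that tells a ZIP64-aware tool to read the real size from the ZIP64 extra field. However, these archives do work with Windows (Vista and above) as well as 7-Zip.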
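A ZIP64-aware reader resolves the 4294967295 placeholder by looking up the ZIP64 extended information extra field (header ID 0x0001). That field stores the real 8-byte values in a fixed order (original size, compressed size, local header offset), including only those whose 32-bit field holds the placeholder. An unzip without ZIP64 support cannot do this lookup and shows the placeholder as the length. Here is a minimal Python sketch; the function name resolve_sizes is hypothetical, and the example size 4,684,759,307 is a hypothetical value.

```python
import struct

ZIP64_PLACEHOLDER = 0xFFFFFFFF  # 4294967295, i.e. 2**32 - 1
ZIP64_EXTRA_ID = 0x0001         # "ZIP64 extended information" extra field


def resolve_sizes(uncompressed: int, compressed: int, extra: bytes) -> tuple[int, int]:
    """Hypothetical helper: swap 32-bit placeholders for the real ZIP64 values.

    `extra` is the raw extra-field block of a central directory header:
    a sequence of (2-byte ID, 2-byte length, data) records.
    A malformed ZIP64 field that is too short raises IndexError.
    """
    pos = 0
    while pos + 4 <= len(extra):
        header_id, data_len = struct.unpack_from("<HH", extra, pos)
        data = extra[pos + 4 : pos + 4 + data_len]
        if header_id == ZIP64_EXTRA_ID:
            # Only values whose 32-bit field holds the placeholder are present,
            # in order: original size, compressed size, header offset.
            values = list(struct.unpack_from(f"<{len(data) // 8}Q", data))
            if uncompressed == ZIP64_PLACEHOLDER:
                uncompressed = values.pop(0)
            if compressed == ZIP64_PLACEHOLDER:
                compressed = values.pop(0)
            break
        pos += 4 + data_len  # skip other extra fields (timestamps, Unix IDs, ...)
    return uncompressed, compressed


# Only the original size overflowed, so only it is stored in the ZIP64 field.
extra = struct.pack("<HHQ", ZIP64_EXTRA_ID, 8, 4_684_759_307)
print(resolve_sizes(0xFFFFFFFF, 1_858_047_072, extra))  # (4684759307, 1858047072)
```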

So, in conclusion: Do NOT use the OS X built-in “compress” function unless you use OS X exclusively and don’t care about corrupted ZIP files.


Author: 店长

The Master of BitInn

6 thoughts on “OSX compress, zip64 and large file madness”

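  1. When I first opened this and saw the whole post was in English, it gave me a bit of a scare. After reading it: so, just as the title says, OS X has nearly driven you mad? lol
    Also, I couldn’t help laughing when I got to ”But OS X, being unique as it is, has a different idea in mind.” (manual emoticon: http://staticbbs.acfun.tv//Images/Upload2/Images/2013-10-15/51a0355b-e52c-419d-a070-1986bcde051e.jpg)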

  2. Such an annoying legacy problem….


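  3. They extended the size field once and it is still fixed-length (64 bits tops out at 16EB; don’t tell me that’s more than enough, Gates once said 640K of memory would be all anyone would ever need). Bad design. Why not use VLQ?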
