Stuff I know about ZIP

January 13, 2024 | Updated January 14, 2024

Inspecting ZIP files

Check whether zipdetails is already installed on your system, and install it if not. It prints a ZIP file’s structure field by field, and if the ZIP is corrupted, zipdetails will hint at that.

Another great tool for inspecting ZIP files is a trusty hex editor.

Details

I used the zip command to create a simple ZIP containing a file named file.txt with the text “Hello!” inside.

Here’s the zipdetails output for that file:

0000 LOCAL HEADER #1       04034B50
0004 Extract Zip Spec      0A '1.0'
0005 Extract OS            00 'MS-DOS'
0006 General Purpose Flag  0000
0008 Compression Method    0000 'Stored'
000A Last Mod Time         582DA5FB 'Sat Jan 13 20:47:54 2024'
000E CRC                   B042D89E
0012 Compressed Length     00000007
0016 Uncompressed Length   00000007
001A Filename Length       0008
001C Extra Length          001C
001E Filename              'file.txt'
0026 Extra ID #0001        5455 'UT: Extended Timestamp'
0028   Length              0009
002A   Flags               '03 mod access'
002B   Mod Time            65A3677A 'Sat Jan 13 20:47:54 2024'
002F   Access Time         65A372EF 'Sat Jan 13 21:36:47 2024'
0033 Extra ID #0002        7875 'ux: Unix Extra Type 3'
0035   Length              000B
0037   Version             01
0038   UID Size            04
0039   UID                 000003E8
003D   GID Size            04
003E   GID                 000003E8
0042 PAYLOAD               Hello!.

0049 CENTRAL HEADER #1     02014B50
004D Created Zip Spec      1E '3.0'
004E Created OS            03 'Unix'
004F Extract Zip Spec      0A '1.0'
0050 Extract OS            00 'MS-DOS'
0051 General Purpose Flag  0000
0053 Compression Method    0000 'Stored'
0055 Last Mod Time         582DA5FB 'Sat Jan 13 20:47:54 2024'
0059 CRC                   B042D89E
005D Compressed Length     00000007
0061 Uncompressed Length   00000007
0065 Filename Length       0008
0067 Extra Length          0018
0069 Comment Length        0000
006B Disk Start            0000
006D Int File Attributes   0001
     [Bit 0]               1 Text Data
006F Ext File Attributes   81B40000
0073 Local Header Offset   00000000
0077 Filename              'file.txt'
007F Extra ID #0001        5455 'UT: Extended Timestamp'
0081   Length              0005
0083   Flags               '03 mod access'
0084   Mod Time            65A3677A 'Sat Jan 13 20:47:54 2024'
0088 Extra ID #0002        7875 'ux: Unix Extra Type 3'
008A   Length              000B
008C   Version             01
008D   UID Size            04
008E   UID                 000003E8
0092   GID Size            04
0093   GID                 000003E8
ADD 0 7 CENTRAL HEADER ref Local #1: file.txt

0097 END CENTRAL HEADER    06054B50
009B Number of this disk   0000
009D Central Dir Disk no   0000
009F Entries in this disk  0001
00A1 Total Entries         0001
00A3 Size of Central Dir   0000004E
00A7 Offset to Central Dir 00000049
00AB Comment Length        0000
Done
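The Last Mod Time value 582DA5FB in the output above packs an MS-DOS date in the upper 16 bits and an MS-DOS time in the lower 16 bits. As a rough sketch, it can be decoded by hand like this (the function name dos_to_datetime is made up for this example):

```python
from datetime import datetime

def dos_to_datetime(dos_date: int, dos_time: int) -> datetime:
    """Decode the 16-bit MS-DOS date and time fields stored in ZIP headers."""
    year = 1980 + (dos_date >> 9)       # bits 15-9: years since 1980
    month = (dos_date >> 5) & 0x0F      # bits 8-5
    day = dos_date & 0x1F               # bits 4-0
    hour = dos_time >> 11               # bits 15-11
    minute = (dos_time >> 5) & 0x3F     # bits 10-5
    second = (dos_time & 0x1F) * 2      # bits 4-0, stored in 2-second steps
    return datetime(year, month, day, hour, minute, second)

# zipdetails prints both fields as a single 32-bit value.
combined = 0x582DA5FB
print(dos_to_datetime(combined >> 16, combined & 0xFFFF))  # 2024-01-13 20:47:54
```

The 2-second resolution is why the extended timestamp extra field exists alongside it.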

Contents

Here’s what the ZIP looks like in a hex editor (xxd in this case):

00000000: 504b 0304 0a00 0000 0000 fba5 2d58 9ed8  PK..........-X..
00000010: 42b0 0700 0000 0700 0000 0800 1c00 6669  B.............fi
00000020: 6c65 2e74 7874 5554 0900 037a 67a3 65ef  le.txtUT...zg.e.
00000030: 72a3 6575 780b 0001 04e8 0300 0004 e803  r.eux...........
00000040: 0000 4865 6c6c 6f21 0a50 4b01 021e 030a  ..Hello!.PK.....
00000050: 0000 0000 00fb a52d 589e d842 b007 0000  .......-X..B....
00000060: 0007 0000 0008 0018 0000 0000 0001 0000  ................
00000070: 00b4 8100 0000 0066 696c 652e 7478 7455  .......file.txtU
00000080: 5405 0003 7a67 a365 7578 0b00 0104 e803  T...zg.eux......
00000090: 0000 04e8 0300 0050 4b05 0600 0000 0001  .......PK.......
000000a0: 0001 004e 0000 0049 0000 0000 00         ...N...I.....

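You probably noticed that the first 4 bytes contain the local file header signature: the ASCII characters PK (50 4b) followed by the bytes 03 04. The central directory header and the end of central directory record start with PK as well, followed by 01 02 and 05 06 respectively.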

Structure

Local File Header

Contains information specific to the file. After the file is located, the local file header is used in decoding and extracting.

Field Bytes
Local file header signature 4
Version needed to extract 2
General purpose bit flag 2
Compression method 2
Last mod file time 2
Last mod file date 2
CRC-32 4
Compressed size 4
Uncompressed size 4
File name length 2
Extra field length 2
File name (variable)
Extra field (variable)
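As an illustration, the fixed 30-byte part of the table above maps directly onto a struct format string. This sketch parses the first 38 bytes of the sample archive, copied from the xxd dump; the variable names are my own, and the file name is assumed to be plain ASCII, which holds for this sample.

```python
import struct

# First 38 bytes of the sample archive: 30-byte fixed header + 8-byte file name.
raw = bytes.fromhex(
    "504b03040a0000000000fba52d589ed8"
    "42b00700000007000000" "08001c00" "6669"
    "6c652e747874"
)

LOCAL_HEADER = struct.Struct("<4sHHHHHIIIHH")  # little-endian, 30 bytes

(signature, version_needed, flags, method, mod_time, mod_date,
 crc32, compressed_size, uncompressed_size,
 name_len, extra_len) = LOCAL_HEADER.unpack_from(raw, 0)

# The file name follows the fixed fields, then the extra field, then the data.
file_name = raw[LOCAL_HEADER.size:LOCAL_HEADER.size + name_len].decode("ascii")
data_offset = LOCAL_HEADER.size + name_len + extra_len

print(signature, file_name, hex(crc32), compressed_size, hex(data_offset))
```

The computed data offset is 0x42, which matches the PAYLOAD line in the zipdetails output.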

Data

The file’s contents, either compressed or stored as-is. Variable size.

Data Descriptor

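Used when the CRC-32 and the file sizes (compressed and uncompressed) aren’t known until the data has been processed, for example when the archive is written to a stream that can’t seek back. In that case bit 3 of the general purpose bit flag is set, those three fields are zero in the local file header, and the real values follow the file data.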

Field Bytes
Data descriptor signature (optional) 4
CRC-32 4
Compressed size 4
Uncompressed size 4
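A minimal sketch of handling the optional signature when reading a data descriptor. The helper names are hypothetical, and the bit 3 check reflects my reading of the ZIP specification rather than anything shown in the sample archive (whose flag field is 0000). ZIP64 descriptors, which use 8-byte sizes, are not handled here.

```python
import struct

DESCRIPTOR_SIG = b"PK\x07\x08"   # 0x08074b50, optional
FLAG_DATA_DESCRIPTOR = 0x0008    # bit 3 of the general purpose bit flag

def has_data_descriptor(flags: int) -> bool:
    """True if the entry's sizes and CRC live in a trailing data descriptor."""
    return bool(flags & FLAG_DATA_DESCRIPTOR)

def parse_data_descriptor(buf: bytes, offset: int = 0):
    """Return (crc32, compressed_size, uncompressed_size, bytes_consumed)."""
    start = offset
    if buf[offset:offset + 4] == DESCRIPTOR_SIG:
        offset += 4  # skip the optional signature
    crc32, csize, usize = struct.unpack_from("<III", buf, offset)
    return crc32, csize, usize, offset + 12 - start

with_sig = struct.pack("<4sIII", DESCRIPTOR_SIG, 0xB042D89E, 7, 7)
without_sig = with_sig[4:]
print(parse_data_descriptor(with_sig))     # consumes 16 bytes
print(parse_data_descriptor(without_sig))  # consumes 12 bytes
```

One caveat: a CRC-32 that happens to equal the signature bytes would be misread by this check, so robust readers usually cross-check against the central directory.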

Central Directory File Header

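Indexes the archive’s contents, holds per-file metadata, and allows quick access to individual files. Because each header records the offset of the file’s local file header, a reader can jump straight to a file instead of scanning the archive sequentially, which could become very slow.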

Field Bytes
Central file header signature 4
Version made by 2
Version needed to extract 2
General purpose bit flag 2
Compression method 2
Last mod file time 2
Last mod file date 2
CRC-32 4
Compressed size 4
Uncompressed size 4
File name length 2
Extra field length 2
File comment length 2
Disk number start 2
Internal file attributes 2
External file attributes 4
Relative offset of local header 4
File name (variable)
Extra field (variable)
File comment (variable)
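To make a couple of these fields more concrete, here is a sketch that parses the sample’s central directory header (bytes copied from the xxd dump, starting at offset 0x49). It splits “version made by” into its two bytes and reads the Unix permissions from the upper 16 bits of the external file attributes. That last part is a convention used by Unix zip tools, so treat it as an assumption that only holds when the creating OS is Unix.

```python
import stat
import struct

# 46-byte fixed central directory header + 8-byte file name from the sample.
raw = bytes.fromhex(
    "504b0102" "1e03" "0a00" "0000" "0000" "fba5" "2d58"
    "9ed842b0" "07000000" "07000000"
    "0800" "1800" "0000" "0000" "0100" "0000b481" "00000000"
    "66696c652e747874"
)

# "Version made by" is read as two single bytes: spec version, then host OS.
CENTRAL_HEADER = struct.Struct("<4sBBHHHHHIIIHHHHHII")  # 46 bytes

(signature, made_by_spec, made_by_os, version_needed, flags, method,
 mod_time, mod_date, crc32, compressed_size, uncompressed_size,
 name_len, extra_len, comment_len, disk_start,
 internal_attrs, external_attrs,
 local_header_offset) = CENTRAL_HEADER.unpack_from(raw, 0)

file_name = raw[CENTRAL_HEADER.size:CENTRAL_HEADER.size + name_len].decode("ascii")
unix_mode = external_attrs >> 16  # assumption: creator OS is Unix (made_by_os == 3)

print(file_name, f"{made_by_spec // 10}.{made_by_spec % 10}",
      made_by_os, stat.filemode(unix_mode))
# file.txt 3.0 3 -rw-rw-r--
```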

End of Central Directory Record

Marks the end of the ZIP. Without it, the archive just wouldn’t work: it tells the program being used for extraction where the central directory starts and how many central directory records there are.

Field Bytes
End of central directory signature 4
Number of this disk 2
Disk where central directory starts 2
Number of central directory records on this disk 2
Total number of central directory records 2
Size of central directory 4
Offset of start of central directory 4
ZIP file comment length 2
ZIP file comment (variable)
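A short sketch that unpacks the sample’s end of central directory record (the last 22 bytes of the xxd dump). The variable names are my own. It also shows a sanity check that works for a simple archive like this one: the central directory offset plus its size should land exactly on the record itself.

```python
import struct

# Last 22 bytes of the sample archive, starting at offset 0x97.
eocd = bytes.fromhex(
    "504b0506" "0000" "0000" "0100" "0100" "4e000000" "49000000" "0000"
)

EOCD = struct.Struct("<4sHHHHIIH")  # 22 bytes when there is no comment

(signature, disk_number, cd_start_disk, entries_on_disk, entries_total,
 cd_size, cd_offset, comment_len) = EOCD.unpack(eocd)

# For a single-disk archive with nothing prepended, the central directory
# ends right where this record begins.
expected_eocd_position = cd_offset + cd_size

print(entries_total, hex(cd_offset), hex(cd_size), hex(expected_eocd_position))
# 1 0x49 0x4e 0x97
```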

Read sequence

The utility used to open the ZIP seeks out the end of central directory; this, as you might imagine, is located at the end of the ZIP.

The end of central directory tells the utility where the central directory starts and how many records it holds. Each central directory header, in turn, contains the offset of that file’s local file header in the archive.

Once the local file header for a file is located, its data can be read.
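The read sequence above can be sketched roughly as follows. This is a simplified illustration, not how any particular utility does it: read_first_entry is a made-up name, only the first entry is read, only the stored and deflate methods are handled, and the backwards search would be fooled by an archive comment that happens to contain the signature bytes. The test archive is built with Python’s zipfile module.

```python
import io
import struct
import zipfile
import zlib

def read_first_entry(archive: bytes):
    # 1. Locate the end of central directory record by searching backwards.
    eocd_pos = archive.rfind(b"PK\x05\x06")
    if eocd_pos < 0:
        raise ValueError("end of central directory record not found")
    entries, cd_size, cd_offset = struct.unpack_from("<HII", archive, eocd_pos + 10)
    if entries == 0:
        raise ValueError("archive is empty")

    # 2. Read the first central directory header.
    if archive[cd_offset:cd_offset + 4] != b"PK\x01\x02":
        raise ValueError("bad central directory header signature")
    (method,) = struct.unpack_from("<H", archive, cd_offset + 10)
    (compressed_size,) = struct.unpack_from("<I", archive, cd_offset + 20)
    (name_len,) = struct.unpack_from("<H", archive, cd_offset + 28)
    (local_offset,) = struct.unpack_from("<I", archive, cd_offset + 42)
    name = archive[cd_offset + 46:cd_offset + 46 + name_len].decode("utf-8")

    # 3. Jump to the local file header and skip its variable-length fields.
    if archive[local_offset:local_offset + 4] != b"PK\x03\x04":
        raise ValueError("bad local file header signature")
    l_name_len, l_extra_len = struct.unpack_from("<HH", archive, local_offset + 26)
    data_start = local_offset + 30 + l_name_len + l_extra_len

    # 4. Read the data, decompressing if needed.
    data = archive[data_start:data_start + compressed_size]
    if method == 8:                       # deflate, stored as a raw stream
        data = zlib.decompress(data, -15)
    elif method != 0:                     # 0 = stored
        raise ValueError(f"unsupported compression method {method}")
    return name, data

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
    zf.writestr("file.txt", "Hello!\n")
print(read_first_entry(buf.getvalue()))  # ('file.txt', b'Hello!\n')
```

The sizes are taken from the central directory rather than the local header, since the local header’s size fields may be zero when a data descriptor is in use.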

Write sequence

For each file, a local file header is written, followed by the file’s data.

An optional data descriptor may be written after the file’s data. See the data descriptor section for more information.

After all the files are written, the central directory is added. There is one central directory header for each file in the archive.

Lastly, the end of central directory is written.
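As a rough sketch of that write sequence, the following builds a minimal archive by hand and then checks it with Python’s zipfile module. It makes several simplifying assumptions that are mine, not the zip command’s: files are stored without compression, names are ASCII, no extra fields or data descriptors are written, and the timestamp is fixed at 1980-01-01. The function name write_stored_zip is hypothetical.

```python
import io
import struct
import zipfile
import zlib

def write_stored_zip(files: dict) -> bytes:
    out = bytearray()
    central = bytearray()
    dos_time, dos_date = 0, (1 << 5) | 1   # 1980-01-01 00:00:00

    for name, data in files.items():
        raw_name = name.encode("ascii")
        crc = zlib.crc32(data)
        offset = len(out)                  # where this local header starts

        # Local file header, then the file name, then the data.
        out += struct.pack("<4sHHHHHIIIHH", b"PK\x03\x04", 10, 0, 0,
                           dos_time, dos_date, crc, len(data), len(data),
                           len(raw_name), 0)
        out += raw_name
        out += data

        # Matching central directory header, collected for later.
        central += struct.pack("<4sHHHHHHIIIHHHHHII", b"PK\x01\x02", 10, 10,
                               0, 0, dos_time, dos_date, crc,
                               len(data), len(data), len(raw_name),
                               0, 0, 0, 0, 0, offset)
        central += raw_name

    # Central directory, then the end of central directory record.
    cd_offset = len(out)
    out += central
    out += struct.pack("<4sHHHHIIH", b"PK\x05\x06", 0, 0,
                       len(files), len(files), len(central), cd_offset, 0)
    return bytes(out)

archive = write_stored_zip({"file.txt": b"Hello!\n"})
with zipfile.ZipFile(io.BytesIO(archive)) as zf:
    print(zf.namelist(), zf.read("file.txt"), zf.testzip())
```

testzip() returns None when every CRC checks out, so a None in the output means zipfile accepted the hand-built archive.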

Encryption

This section only covers the very basics of WinZip AES encryption, specifically AE-2.

Key differences from an unencrypted ZIP

Encryption method

AES-CTR is used for encryption.

Salt

This is typically a random sequence of bytes. Its size depends on the chosen key size: 8 bytes for a 128-bit key, 12 bytes for a 192-bit key, and 16 bytes for a 256-bit key.

Key material

The key material is derived from the password and the salt using PBKDF2-HMAC-SHA1 with 1000 iterations. The requested output length is the AES key size in bytes, plus the authentication key length (same size as the AES key size), plus the length of the password verification value (2 bytes).

So for a ZIP with a key size of 256 bits, the key material looks like this:

  1. 32 bytes for the encryption key.
  2. 32 bytes for the authentication key.
  3. 2 bytes for the password verification value.
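A minimal sketch of that derivation using Python’s hashlib. The function name, the example password, and the salt size table are my own additions; the table reflects the salt sizes I understand the WinZip AES format to use for each key size.

```python
import hashlib
import os

SALT_SIZES = {128: 8, 192: 12, 256: 16}  # key size in bits -> salt size in bytes

def derive_key_material(password: bytes, salt: bytes, key_bits: int = 256):
    """Split PBKDF2 output into (encryption key, authentication key, verifier)."""
    key_len = key_bits // 8
    material = hashlib.pbkdf2_hmac("sha1", password, salt, 1000,
                                   dklen=2 * key_len + 2)
    encryption_key = material[:key_len]
    authentication_key = material[key_len:2 * key_len]
    verification_value = material[2 * key_len:]
    return encryption_key, authentication_key, verification_value

salt = os.urandom(SALT_SIZES[256])
enc_key, auth_key, verifier = derive_key_material(b"example password", salt)
print(len(enc_key), len(auth_key), len(verifier))  # 32 32 2
```

The 2-byte verification value lets a reader reject most wrong passwords quickly, before decrypting anything.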

Authentication code

The authentication code is an HMAC-SHA1 computed over the encrypted file data, keyed with a portion of the key material.

The second block of 32 bytes (referred to above as the authentication key) is used as the HMAC key.

The first 10 bytes of the resulting hash are used as the authentication code.
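Sketched with Python’s hmac module, assuming the HMAC is computed over the already-encrypted file data and keyed with the authentication key. The function name and the placeholder inputs below are made up for illustration; real inputs would come from the key derivation step and the archive.

```python
import hashlib
import hmac

def authentication_code(auth_key: bytes, encrypted_data: bytes) -> bytes:
    """HMAC-SHA1 over the encrypted data, truncated to the first 10 bytes."""
    return hmac.new(auth_key, encrypted_data, hashlib.sha1).digest()[:10]

# Placeholder values, not taken from a real archive.
auth_key = bytes(range(32))
encrypted_data = b"\x8f\x12\x00\xa7\x3c\x55\x9e"

code = authentication_code(auth_key, encrypted_data)
print(len(code), code.hex())

# When reading, compare against the stored code in constant time.
is_valid = hmac.compare_digest(code, authentication_code(auth_key, encrypted_data))
```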

File data

The encryption occurs after compression. Only the file data is encrypted.

The data is encrypted in blocks of 16 bytes at a time. The last block may be smaller than 16 bytes.
