Stuff I know about ZIP
Inspecting ZIP files
You might want to install zipdetails or check if it’s already installed on your system. It’s useful for better visualizing a ZIP file’s structure. If you have a corrupted ZIP, zipdetails will hint at that.
Another great tool for inspecting ZIP files is a trusty hex editor.
Details
I used the zip command to create a simple ZIP containing a file named file.txt with the text “Hello!” inside.
Here’s the zipdetails output for that file:
0000 LOCAL HEADER #1 04034B50
0004 Extract Zip Spec 0A '1.0'
0005 Extract OS 00 'MS-DOS'
0006 General Purpose Flag 0000
0008 Compression Method 0000 'Stored'
000A Last Mod Time 582DA5FB 'Sat Jan 13 20:47:54 2024'
000E CRC B042D89E
0012 Compressed Length 00000007
0016 Uncompressed Length 00000007
001A Filename Length 0008
001C Extra Length 001C
001E Filename 'file.txt'
0026 Extra ID #0001 5455 'UT: Extended Timestamp'
0028 Length 0009
002A Flags '03 mod access'
002B Mod Time 65A3677A 'Sat Jan 13 20:47:54 2024'
002F Access Time 65A372EF 'Sat Jan 13 21:36:47 2024'
0033 Extra ID #0002 7875 'ux: Unix Extra Type 3'
0035 Length 000B
0037 Version 01
0038 UID Size 04
0039 UID 000003E8
003D GID Size 04
003E GID 000003E8
0042 PAYLOAD Hello!.
0049 CENTRAL HEADER #1 02014B50
004D Created Zip Spec 1E '3.0'
004E Created OS 03 'Unix'
004F Extract Zip Spec 0A '1.0'
0050 Extract OS 00 'MS-DOS'
0051 General Purpose Flag 0000
0053 Compression Method 0000 'Stored'
0055 Last Mod Time 582DA5FB 'Sat Jan 13 20:47:54 2024'
0059 CRC B042D89E
005D Compressed Length 00000007
0061 Uncompressed Length 00000007
0065 Filename Length 0008
0067 Extra Length 0018
0069 Comment Length 0000
006B Disk Start 0000
006D Int File Attributes 0001
[Bit 0] 1 Text Data
006F Ext File Attributes 81B40000
0073 Local Header Offset 00000000
0077 Filename 'file.txt'
007F Extra ID #0001 5455 'UT: Extended Timestamp'
0081 Length 0005
0083 Flags '03 mod access'
0084 Mod Time 65A3677A 'Sat Jan 13 20:47:54 2024'
0088 Extra ID #0002 7875 'ux: Unix Extra Type 3'
008A Length 000B
008C Version 01
008D UID Size 04
008E UID 000003E8
0092 GID Size 04
0093 GID 000003E8
ADD 0 7 CENTRAL HEADER ref Local #1: file.txt
0097 END CENTRAL HEADER 06054B50
009B Number of this disk 0000
009D Central Dir Disk no 0000
009F Entries in this disk 0001
00A1 Total Entries 0001
00A3 Size of Central Dir 0000004E
00A7 Offset to Central Dir 00000049
00AB Comment Length 0000
Done
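The Last Mod Time value above (582DA5FB) packs an MS-DOS date and time into two 16-bit fields. As a rough illustration, here is a Python sketch of how that value unpacks into the timestamp zipdetails prints. The function name decode_dos_datetime is my own and not part of any library.

```python
from datetime import datetime

def decode_dos_datetime(dos_time: int, dos_date: int) -> datetime:
    """Decode the 16-bit MS-DOS time and date fields used in ZIP headers."""
    seconds = (dos_time & 0x1F) * 2          # bits 0-4, stored in 2-second units
    minutes = (dos_time >> 5) & 0x3F         # bits 5-10
    hours = (dos_time >> 11) & 0x1F          # bits 11-15
    day = dos_date & 0x1F                    # bits 0-4
    month = (dos_date >> 5) & 0x0F           # bits 5-8
    year = ((dos_date >> 9) & 0x7F) + 1980   # bits 9-15, years since 1980
    return datetime(year, month, day, hours, minutes, seconds)

# zipdetails shows both fields as one 32-bit value:
# the high half is the date, the low half is the time.
combined = 0x582DA5FB
stamp = decode_dos_datetime(combined & 0xFFFF, combined >> 16)
print(stamp)  # 2024-01-13 20:47:54
```

Note the 2-second resolution, which is one reason the extended timestamp extra field exists alongside it.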
Contents
Here’s what the zip looks like in a hex editor (xxd in this case):
00000000: 504b 0304 0a00 0000 0000 fba5 2d58 9ed8 PK..........-X..
00000010: 42b0 0700 0000 0700 0000 0800 1c00 6669 B.............fi
00000020: 6c65 2e74 7874 5554 0900 037a 67a3 65ef le.txtUT...zg.e.
00000030: 72a3 6575 780b 0001 04e8 0300 0004 e803 r.eux...........
00000040: 0000 4865 6c6c 6f21 0a50 4b01 021e 030a ..Hello!.PK.....
00000050: 0000 0000 00fb a52d 589e d842 b007 0000 .......-X..B....
00000060: 0007 0000 0008 0018 0000 0000 0001 0000 ................
00000070: 00b4 8100 0000 0066 696c 652e 7478 7455 .......file.txtU
00000080: 5405 0003 7a67 a365 7578 0b00 0104 e803 T...zg.eux......
00000090: 0000 04e8 0300 0050 4b05 0600 0000 0001 .......PK.......
000000a0: 0001 004e 0000 0049 0000 0000 00 ...N...I.....
You probably noticed that the first 4 bytes there contain the local file header signature: the leading PK (50 4b) plus the 2 bytes that follow it (03 04).
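If you are wondering why zipdetails prints the signature as 04034B50 while the hex dump starts with 50 4b 03 04: multi-byte integers in a ZIP are stored little-endian. A quick Python check, using only the four bytes from the dump above:

```python
import struct

# The first four bytes of the archive, as shown in the xxd dump.
first_bytes = bytes.fromhex("504b0304")

# ZIP fields are little-endian, so read the signature with "<I".
(signature,) = struct.unpack("<I", first_bytes)

print(first_bytes[:2])     # b'PK'
print(f"{signature:08X}")  # 04034B50
```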
Structure
Local File Header
Contains information specific to the file. After the file is located, the local file header is used in decoding and extracting.
Field | Bytes |
---|---|
Local file header signature | 4 |
Version needed to extract | 2 |
General purpose bit flag | 2 |
Compression method | 2 |
Last mod file time | 2 |
Last mod file date | 2 |
CRC-32 | 4 |
Compressed size | 4 |
Uncompressed size | 4 |
File name length | 2 |
Extra field length | 2 |
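The table maps directly onto a struct format string. Below is a small Python sketch that unpacks the first 30 bytes of the example archive (copied from the xxd dump above). The names LOCAL_HEADER and the variable names are mine, picked for illustration.

```python
import struct

# One format character per row of the table, little-endian ("<"):
# 4s = 4-byte signature, H = 2-byte field, I = 4-byte field.
LOCAL_HEADER = struct.Struct("<4sHHHHHIIIHH")

# The fixed-size part of the local file header from the example archive.
raw = bytes.fromhex(
    "504b0304" "0a00" "0000" "0000" "fba5" "2d58"
    "9ed842b0" "07000000" "07000000" "0800" "1c00"
)

(signature, version_needed, flags, method, mod_time, mod_date,
 crc32, compressed_size, uncompressed_size,
 name_length, extra_length) = LOCAL_HEADER.unpack(raw)

print(signature)                         # b'PK\x03\x04'
print(f"{crc32:08X}")                    # B042D89E
print(compressed_size, name_length, extra_length)  # 7 8 28

# The file data starts right after the file name and extra field.
data_offset = LOCAL_HEADER.size + name_length + extra_length
print(hex(data_offset))                  # 0x42
```

The computed data offset matches the PAYLOAD line (0042) in the zipdetails output.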
Data
Compressed or uncompressed. Variable size.
Data Descriptor
Used in cases where the CRC-32 and file sizes (compressed and uncompressed) aren’t known until the data is processed.
Field | Bytes |
---|---|
Data descriptor signature (optional) | 4 |
CRC-32 | 4 |
Compressed size | 4 |
Uncompressed size | 4 |
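Because the signature is optional, a reader has to handle both layouts. Here is a minimal Python sketch of that, assuming the commonly used descriptor signature bytes PK\x07\x08. The function name is hypothetical, and a careful reader would cross-check the values against the central directory, since a CRC could in principle start with the same four bytes.

```python
import struct

DESCRIPTOR_SIGNATURE = b"PK\x07\x08"  # optional leading signature

def parse_data_descriptor(buf: bytes):
    """Return (crc32, compressed_size, uncompressed_size) from a data descriptor."""
    if buf[:4] == DESCRIPTOR_SIGNATURE:
        buf = buf[4:]  # skip the optional signature
    return struct.unpack("<III", buf[:12])

body = struct.pack("<III", 0xB042D89E, 7, 7)
print(parse_data_descriptor(body) == parse_data_descriptor(DESCRIPTOR_SIGNATURE + body))  # True
```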
Central Directory File Header
Indexes the contents, includes metadata, and allows quick access to the files. Because of this index, the files don’t have to be read sequentially, which could become very slow.
Field | Bytes |
---|---|
Central file header signature | 4 |
Version made by | 2 |
Version needed to extract | 2 |
General purpose bit flag | 2 |
Compression method | 2 |
Last mod file time | 2 |
Last mod file date | 2 |
CRC-32 | 4 |
Compressed size | 4 |
Uncompressed size | 4 |
File name length | 2 |
Extra field length | 2 |
File comment length | 2 |
Disk number start | 2 |
Internal file attributes | 2 |
External file attributes | 4 |
Relative offset of local header | 4 |
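The same approach works for the central directory file header. This Python sketch unpacks the 46 fixed bytes found at offset 0x49 in the example archive. The interpretation of the upper 16 bits of the external attributes as a Unix mode is my own addition here, and it only applies when the "made by" OS is Unix (as it is in this archive).

```python
import stat
import struct

CENTRAL_HEADER = struct.Struct("<4sHHHHHHIIIHHHHHII")

# Fixed-size part of the central directory header (offsets 0x49-0x76 in the dump).
raw = bytes.fromhex(
    "504b0102" "1e03" "0a00" "0000" "0000" "fba5" "2d58"
    "9ed842b0" "07000000" "07000000" "0800" "1800"
    "0000" "0000" "0100" "0000b481" "00000000"
)

(signature, made_by, version_needed, flags, method, mod_time, mod_date,
 crc32, compressed_size, uncompressed_size, name_length, extra_length,
 comment_length, disk_start, internal_attrs, external_attrs,
 local_header_offset) = CENTRAL_HEADER.unpack(raw)

# "Version made by" holds the spec version (low byte) and the OS (high byte).
print(made_by & 0xFF, made_by >> 8)        # 30 3  -> spec 3.0, Unix
print(f"{external_attrs:08X}")             # 81B40000
print(stat.filemode(external_attrs >> 16)) # -rw-rw-r--
print(local_header_offset)                 # 0
```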
End of Central Directory Record
Marks the end of the ZIP. A reader can’t open the archive without this record: it tells the program being used for extraction where the central directory starts, and how many central directory records there are.
Field | Bytes |
---|---|
End of central directory signature | 4 |
Number of this disk | 2 |
Disk where central directory starts | 2 |
Number of central directory records on this disk | 2 |
Total number of central directory records | 2 |
Size of central directory | 4 |
Offset of start of central directory | 4 |
ZIP file comment length | 2 |
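As a sanity check, here is a Python sketch that unpacks the 22-byte end of central directory record from the example archive (offset 0x97 in the dump). The variable names are mine.

```python
import struct

END_OF_CENTRAL_DIR = struct.Struct("<4sHHHHIIH")

# The last 22 bytes of the example archive.
raw = bytes.fromhex(
    "504b0506" "0000" "0000" "0100" "0100" "4e000000" "49000000" "0000"
)

(signature, this_disk, cd_disk, entries_on_disk, total_entries,
 cd_size, cd_offset, comment_length) = END_OF_CENTRAL_DIR.unpack(raw)

print(total_entries)               # 1
print(hex(cd_offset), hex(cd_size))  # 0x49 0x4e

# With no comment and a single disk, the central directory ends
# exactly where this record begins.
print(hex(cd_offset + cd_size))    # 0x97
```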
Read sequence
The utility used to open the ZIP seeks out the end of central directory; this, as you might imagine, is located at the end of the ZIP.
The end of central directory contains useful metadata that allows the utility to locate the files in the ZIP.
The end of central directory tells the utility where the central directory starts. The central directory contains the offset of the files’ local file headers in the archive.
Once the local file header for a file is located, its data can be read.
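The steps above can be sketched in Python. This is a simplified reader, not how any particular tool does it: it only handles stored (uncompressed) members, assumes a single-disk archive, and finds the end of central directory with a naive backwards search for its signature (real tools validate the match, since those bytes could also appear in a comment). The function name read_stored_members is hypothetical. The standard zipfile module is used only to produce a test archive.

```python
import io
import struct
import zipfile

def read_stored_members(archive: bytes) -> dict:
    """Follow the read sequence by hand and return {name: data}."""
    # 1. Seek out the end of central directory, starting from the end.
    eocd_at = archive.rfind(b"PK\x05\x06")
    if eocd_at < 0:
        raise ValueError("no end of central directory record found")
    eocd = struct.unpack_from("<4sHHHHIIH", archive, eocd_at)
    total_entries, cd_offset = eocd[4], eocd[6]

    members = {}
    pos = cd_offset
    for _ in range(total_entries):
        # 2. Each central directory header points at a local file header.
        central = struct.unpack_from("<4sHHHHHHIIIHHHHHII", archive, pos)
        method, compressed_size = central[4], central[8]
        name_len, extra_len, comment_len = central[10], central[11], central[12]
        local_offset = central[16]
        name = archive[pos + 46:pos + 46 + name_len].decode("utf-8")
        pos += 46 + name_len + extra_len + comment_len

        if method != 0:
            raise ValueError(f"{name}: only stored members are handled here")

        # 3. The local header's own name/extra lengths locate the data.
        local = struct.unpack_from("<4sHHHHHIIIHH", archive, local_offset)
        data_at = local_offset + 30 + local[9] + local[10]
        members[name] = archive[data_at:data_at + compressed_size]
    return members

# Build a small stored archive in memory and walk it.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_STORED) as zf:
    zf.writestr("file.txt", "Hello!\n")

members = read_stored_members(buffer.getvalue())
print(members)  # {'file.txt': b'Hello!\n'}
```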
Write sequence
For each file, a local file header is written. After that the data is added.
There could be an optional data descriptor written after the file is added. See the data descriptor section for more information.
After the files are written, the central directory is added. There is 1 central directory header for each file in the archive.
Lastly, the end of central directory is written.
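To make the write sequence concrete, here is a minimal Python sketch that writes a stored-only archive by hand and then lets the standard zipfile module read it back. It is an illustration under several assumptions: no compression, no data descriptors, no extra fields, ASCII file names, and a fixed timestamp of 1980-01-01. The function name write_stored_zip is hypothetical.

```python
import io
import struct
import zipfile
import zlib

DOS_TIME, DOS_DATE = 0, 0x21  # 00:00:00 on 1980-01-01 (placeholder timestamp)

def write_stored_zip(files: dict) -> bytes:
    """Write {name: data} as an uncompressed, single-disk ZIP archive."""
    out = bytearray()
    central = bytearray()
    for name, data in files.items():
        raw_name = name.encode("ascii")  # non-ASCII names need more care
        crc = zlib.crc32(data)
        local_offset = len(out)

        # Local file header, then the file name, then the data.
        out += struct.pack("<4sHHHHHIIIHH", b"PK\x03\x04", 10, 0, 0,
                           DOS_TIME, DOS_DATE, crc, len(data), len(data),
                           len(raw_name), 0)
        out += raw_name + data

        # Matching central directory header, kept aside until all files are written.
        central += struct.pack("<4sHHHHHHIIIHHHHHII", b"PK\x01\x02", 10, 10, 0, 0,
                               DOS_TIME, DOS_DATE, crc, len(data), len(data),
                               len(raw_name), 0, 0, 0, 0, 0, local_offset)
        central += raw_name

    # Central directory, then the end of central directory record.
    cd_offset = len(out)
    out += central
    out += struct.pack("<4sHHHHIIH", b"PK\x05\x06", 0, 0,
                       len(files), len(files), len(central), cd_offset, 0)
    return bytes(out)

blob = write_stored_zip({"file.txt": b"Hello!\n"})
with zipfile.ZipFile(io.BytesIO(blob)) as zf:
    print(zf.namelist())        # ['file.txt']
    print(zf.read("file.txt"))  # b'Hello!\n'
```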
Encryption
This section only covers the very basics of WinZip AES encryption, specifically AE-2.
Key differences from an unencrypted ZIP
- An extra data field is added which contains metadata (includes the actual compression method).
- The general purpose bit flag is changed according to the standard.
- The compression method code is set to 99 in the local header and central directory file header.
- The CRC-32 is set to 0.
- The file data will be encrypted.
- Before the file data, there is a variable salt value and a 2 byte password verification value.
- After the file data, there is a 10 byte authentication code.
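The last two points describe how the stored bytes of an encrypted member are laid out. Here is a Python sketch that splits them apart. The function name is hypothetical, and the salt length rule (half the key size, so 16 bytes for a 256-bit key) is my understanding of the WinZip AES layout rather than something shown in this post's example.

```python
def split_encrypted_payload(stored: bytes, key_bits: int = 256):
    """Split an AE-2 member's stored bytes into its four parts."""
    salt_len = key_bits // 16  # assumption: 8, 12 or 16 bytes for 128/192/256-bit keys
    if len(stored) < salt_len + 2 + 10:
        raise ValueError("too short to hold salt, verifier and authentication code")
    salt = stored[:salt_len]
    verifier = stored[salt_len:salt_len + 2]   # 2 byte password verification value
    ciphertext = stored[salt_len + 2:-10]      # the encrypted file data
    auth_code = stored[-10:]                   # 10 byte authentication code
    return salt, verifier, ciphertext, auth_code

# Dummy bytes, only to show the layout.
stored = bytes(16) + b"PV" + b"<ciphertext>" + b"A" * 10
salt, verifier, ciphertext, auth_code = split_encrypted_payload(stored)
print(len(salt), verifier, ciphertext, len(auth_code))  # 16 b'PV' b'<ciphertext>' 10
```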
Encryption method
AES-CTR is used for encryption.
Salt
This is typically a random sequence of bytes. The size depends on the chosen key size. For example, a key size of 256 bits will require a 16 byte salt.
Key material
The key material is derived from the password and the salt using PBKDF2-HMAC-SHA1 with 1000 iterations. The desired key length is the AES key size in bytes, plus the authentication key length (the same as the AES key size), plus the length of the password verification value (2 bytes).
So for a ZIP with a key size of 256 bits, the key material looks like this:
- 32 bytes for the encryption key.
- 32 bytes for the authentication key.
- 2 bytes for the password verification value.
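Python's hashlib can do this derivation directly. A minimal sketch follows. The function name derive_key_material is my own, and the all-zero salt is only there to keep the example reproducible (a real salt should be random).

```python
import hashlib

def derive_key_material(password: bytes, salt: bytes, key_bits: int = 256):
    """Derive (encryption key, authentication key, password verifier)."""
    key_len = key_bits // 8
    material = hashlib.pbkdf2_hmac(
        "sha1", password, salt, 1000, dklen=2 * key_len + 2
    )
    encryption_key = material[:key_len]
    authentication_key = material[key_len:2 * key_len]
    verifier = material[2 * key_len:]
    return encryption_key, authentication_key, verifier

# Placeholder inputs for illustration only.
enc_key, auth_key, verifier = derive_key_material(b"password", bytes(16))
print(len(enc_key), len(auth_key), len(verifier))  # 32 32 2
```

A reader can compare the derived 2-byte verifier with the one stored before the file data to reject most wrong passwords early, without decrypting anything.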
Authentication code
The authentication code is an HMAC-SHA1 computed over the encrypted file data.
The second block of 32 bytes of the key material (referred to above as the authentication key) is used as the HMAC key.
The first 10 bytes of the resulting hash are used as the authentication code.
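In Python this is a few lines with the hmac module. This sketch assumes the HMAC is computed over the encrypted bytes (my reading of the WinZip AES documentation), keyed with the authentication key. The function names and the placeholder inputs are mine.

```python
import hashlib
import hmac

def authentication_code(auth_key: bytes, encrypted_data: bytes) -> bytes:
    """HMAC-SHA1 over the encrypted data, truncated to the first 10 bytes."""
    return hmac.new(auth_key, encrypted_data, hashlib.sha1).digest()[:10]

def verify_authentication_code(auth_key: bytes, encrypted_data: bytes,
                               stored_code: bytes) -> bool:
    """Compare in constant time against the code stored after the file data."""
    expected = authentication_code(auth_key, encrypted_data)
    return hmac.compare_digest(expected, stored_code)

# Placeholder key and data, only to show the shapes involved.
code = authentication_code(bytes(32), b"encrypted bytes")
print(len(code))  # 10
print(verify_authentication_code(bytes(32), b"encrypted bytes", code))  # True
```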
File data
The encryption occurs after compression. Only the file data is encrypted.
The data is encrypted in blocks of 16 bytes at a time. The last block may be smaller than 16 bytes.
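The block handling can be sketched without an AES implementation (the Python standard library doesn't include one). The generator below only pairs each 16-byte chunk with the counter block it would be encrypted under. It assumes the counter is a 16-byte little-endian integer starting at 1, which is how I understand WinZip's CTR mode to work, so treat that detail as an assumption. The name ctr_chunks is hypothetical.

```python
def ctr_chunks(data: bytes, block_size: int = 16):
    """Yield (counter_block, chunk) pairs; the last chunk may be short."""
    for index in range(0, len(data), block_size):
        counter = index // block_size + 1  # assumption: counting starts at 1
        yield counter.to_bytes(block_size, "little"), data[index:index + block_size]

chunks = list(ctr_chunks(b"x" * 40))
print([len(chunk) for _, chunk in chunks])  # [16, 16, 8]
print(chunks[0][0].hex())                   # 01000000000000000000000000000000
```

In CTR mode each counter block is encrypted with AES and the result is XORed with the chunk, which is why a short final chunk needs no padding and the encrypted data has the same length as the compressed data.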