QCow2 File format (data recovery part 2)

26 Nov 2020

QCow2 image files are used by Qemu for virtual hard drive images. I will not go into every detail of the file format; I will only scratch the surface enough to be able to read data from my broken QCow2 image file.

  1. Failed at backup…

If you want to read about the QCow2 file format in more depth, there is a great text file in the Qemu repo over at GitHub: https://github.com/qemu/qemu/blob/master/docs/interop/qcow2.txt

QCow2 format

A QCow2 file is a collection of clusters. These clusters are used to store the guest virtual machine's data, but some clusters also store metadata for the file.

There are a few different types of clusters in the QCow2 file format. I will only focus on the ones needed to read a basic QCow2 file (no support for encryption, compression, reference counting, snapshots or other features). Every cluster in a file has the same size, but that size is not hard coded: it is chosen when the QCow2 file is created. So you have to read the header before you know the size of a cluster.
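As a rough sketch of that first step, the code below pulls the cluster size and the L1 table location out of the first 48 bytes of the file. The field positions (cluster_bits at byte 20, size at 24, l1_size at 36, l1_table_offset at 40, all big-endian) are taken from the qcow2.txt document linked above. The struct and function names are my own and only hold the fields used in this article.

```c
#include <stddef.h>
#include <stdint.h>

/* QCow2 stores every header field big-endian, so decode byte by byte. */
uint32_t be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

uint64_t be64(const unsigned char *p)
{
    return ((uint64_t)be32(p) << 32) | (uint64_t)be32(p + 4);
}

/* Hypothetical struct: only the header fields this article needs. */
struct qcow2_info {
    uint32_t version;
    uint32_t cluster_bits;
    uint64_t cluster_size;     /* 1 << cluster_bits */
    uint64_t disk_size;        /* size of the virtual drive in bytes */
    uint32_t l1_size;          /* number of entries in the L1 table */
    uint64_t l1_table_offset;  /* file offset of the L1 table */
};

/* Returns 0 on success, -1 if the buffer does not look like a QCow2 header. */
int qcow2_parse_header(const unsigned char *buf, size_t len,
                       struct qcow2_info *out)
{
    if (len < 48)
        return -1;
    if (be32(buf) != 0x514649fbU)   /* magic: 'Q' 'F' 'I' 0xfb */
        return -1;

    out->version = be32(buf + 4);
    out->cluster_bits = be32(buf + 20);
    /* The spec requires at least 9 bits (512 bytes); 21 bits (2 MiB)
     * is the limit Qemu itself implements, used here as a sanity check. */
    if (out->cluster_bits < 9 || out->cluster_bits > 21)
        return -1;

    out->cluster_size = 1ULL << out->cluster_bits;
    out->disk_size = be64(buf + 24);
    out->l1_size = be32(buf + 36);
    out->l1_table_offset = be64(buf + 40);
    return 0;
}
```

To use it, read the first 48 bytes of the image file into a buffer and pass that buffer to qcow2_parse_header. On a broken image a failed magic or cluster_bits check is a useful early warning that the header itself is damaged.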

Figure 1 - Cluster blocks

  • Header - QCow2 header information. Always the first cluster, even though the header itself is smaller than a cluster.
  • L1 Table - First table used to map Guest virtual drive offset to QCow2 offset.
  • L2 Table - The second table used to map Guest virtual drive offset to QCow2 offset.
  • Data cluster - The actual data, stored exactly as it appears on the guest's virtual drive.
  • Other cluster - There are other clusters in the file for other purposes, like reference counting and snapshot tracking. I will not cover them here.

Virtual drive offset mapping

When a virtual machine reads from its virtual disk, an offset is used. When Qemu uses a QCow2 file as storage, that offset is divided into different parts.

Figure 2 - Guest offset bit mapping
The offset is 64 bits. The top part is the index into the L1 table and the middle part is the index into the L2 table. Those two indices, together with the L1 and L2 tables, are used to find the offset of the cluster storing the data. The lowest part, the offset in cluster, is the location of the data inside that cluster.
Figure 3 - Offset mapping
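The split can be sketched with a few shifts and masks. This assumes, per qcow2.txt, that an L2 table fills exactly one cluster and that each entry is 8 bytes, so an L2 index needs cluster_bits - 3 bits (images with extended L2 entries are not covered). The struct and function names are made up for illustration.

```c
#include <stdint.h>

/* Hypothetical container for the three parts of a guest offset. */
struct guest_offset_parts {
    uint64_t l1_index;
    uint64_t l2_index;
    uint64_t offset_in_cluster;
};

struct guest_offset_parts split_guest_offset(uint64_t guest_offset,
                                             uint32_t cluster_bits)
{
    /* One L2 table is one cluster of 8-byte entries:
     * entries = cluster_size / 8, so the index is cluster_bits - 3 wide. */
    uint32_t l2_bits = cluster_bits - 3;
    struct guest_offset_parts p;

    /* Lowest cluster_bits bits: position inside the data cluster. */
    p.offset_in_cluster = guest_offset & ((1ULL << cluster_bits) - 1);
    /* Next l2_bits bits: index into the L2 table. */
    p.l2_index = (guest_offset >> cluster_bits) & ((1ULL << l2_bits) - 1);
    /* Everything above that: index into the L1 table. */
    p.l1_index = guest_offset >> (cluster_bits + l2_bits);
    return p;
}
```

With 64 KiB clusters (cluster_bits = 16) the lowest 16 bits are the offset in the cluster, the next 13 bits are the L2 index, and the remaining top bits are the L1 index.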

L1 Table

The QCow2 header holds the offset of the L1 table and the number of entries stored in it. Each entry is one 64 bit number used as an offset to locate the needed L2 table.

But only bits 9-55 are used for the actual file offset; the other bits are either reserved or used for flags.
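A minimal sketch of an L1 lookup, assuming the whole L1 table has already been read into memory as raw bytes. The entries are stored big-endian in the file, so each one is assembled byte by byte before the mask for bits 9-55 is applied. The macro and function names are my own.

```c
#include <stdint.h>

/* Bits 9-55 of an L1 entry hold the file offset of the L2 table. */
#define QCOW2_L1_OFFSET_MASK 0x00fffffffffffe00ULL

/* l1_table points at the raw L1 table bytes as read from the file.
 * Returns the file offset of the L2 table, or 0 if none is allocated. */
uint64_t l1_entry_to_l2_offset(const unsigned char *l1_table,
                               uint64_t l1_index)
{
    const unsigned char *p = l1_table + l1_index * 8;
    uint64_t entry = 0;

    /* Entries are big-endian: the most significant byte comes first. */
    for (int i = 0; i < 8; i++)
        entry = (entry << 8) | p[i];

    return entry & QCOW2_L1_OFFSET_MASK;
}
```

The caller should check that l1_index is smaller than the l1_size value from the header before calling this. A result of 0 means that part of the virtual drive has no L2 table, so there is nothing to read for it in this file.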

L2 Table

The offset of the needed L2 table is found in the L1 table. Like the L1 table, the entries in the L2 table are 64 bit numbers used as an offset to locate the actual cluster where the data is stored.

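The L2 table entries only use bits 0-61 to describe the cluster; the remaining two bits are flags. For a normal (uncompressed) cluster the file offset itself sits in bits 9-55, just like in the L1 table.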
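Since I only care about plain, uncompressed clusters, it helps to sort L2 entries into a few kinds before trusting the offset. The sketch below follows the bit layout in qcow2.txt: bit 62 marks a compressed cluster, bit 0 marks a cluster that reads as all zeros, and bits 9-55 hold the file offset of a normal cluster. The entry is assumed to be already converted from big-endian, and the enum and function names are hypothetical.

```c
#include <stdint.h>

#define QCOW2_L2_OFFSET_MASK     0x00fffffffffffe00ULL /* bits 9-55 */
#define QCOW2_L2_FLAG_ZERO       (1ULL << 0)   /* reads as all zeros */
#define QCOW2_L2_FLAG_COMPRESSED (1ULL << 62)  /* compressed cluster */

/* Hypothetical classification of an L2 entry. */
enum cluster_kind {
    CLUSTER_UNALLOCATED,  /* nothing stored in this file */
    CLUSTER_ZERO,         /* guest sees zeros */
    CLUSTER_COMPRESSED,   /* not handled in this article */
    CLUSTER_NORMAL        /* *cluster_offset is valid */
};

/* entry must already be in host byte order.
 * *cluster_offset is set to 0 unless the result is CLUSTER_NORMAL. */
enum cluster_kind classify_l2_entry(uint64_t entry, uint64_t *cluster_offset)
{
    *cluster_offset = 0;

    if (entry & QCOW2_L2_FLAG_COMPRESSED)
        return CLUSTER_COMPRESSED;
    if (entry & QCOW2_L2_FLAG_ZERO)
        return CLUSTER_ZERO;
    if ((entry & QCOW2_L2_OFFSET_MASK) == 0)
        return CLUSTER_UNALLOCATED;

    *cluster_offset = entry & QCOW2_L2_OFFSET_MASK;
    return CLUSTER_NORMAL;
}
```

For data recovery, unallocated and zero clusters can simply be returned as zeros (assuming there is no backing file), and compressed clusters can be reported as unsupported.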

Reading the data in the Cluster

Finally we have the location of the cluster we are looking for. To find the exact starting byte in the file, add the cluster_offset from the L2 table to the offset_in_cluster taken from the lowest bits of the guest offset.

Figure 4 - Cluster offset and offset in cluster
Almost always you want to read more than 1 byte. But because of the structure of the QCow2 file format you can’t keep reading past the end of the current cluster: the next guest cluster may be stored somewhere else in the file. The largest number of bytes you can read in one chunk is cluster_size - offset_in_cluster.
Figure 5 - Cluster chunk, end of cluster and Unknown data
So if cluster_size is 64k and the offset_in_cluster is at 63k you can only read 1k of data (the rest of the cluster). If you need more data you need to restart the process of finding a data cluster.
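The chunking rule can be sketched as a small loop. The code below does not touch a file; it only shows how a longer read is cut into pieces that each stay inside one cluster. The function names are invented for this example, and cluster_size is assumed to be a power of two, as it always is in QCow2.

```c
#include <stdint.h>

/* Largest number of bytes that can be read in one go without
 * leaving the current cluster. */
uint64_t chunk_length(uint64_t cluster_size, uint64_t offset_in_cluster,
                      uint64_t remaining)
{
    uint64_t left_in_cluster = cluster_size - offset_in_cluster;
    return remaining < left_in_cluster ? remaining : left_in_cluster;
}

/* Counts how many separate cluster lookups a read of `length` bytes
 * starting at `guest_offset` needs. */
uint64_t count_chunks(uint64_t guest_offset, uint64_t length,
                      uint64_t cluster_size)
{
    uint64_t chunks = 0;

    while (length > 0) {
        uint64_t offset_in_cluster = guest_offset & (cluster_size - 1);
        uint64_t n = chunk_length(cluster_size, offset_in_cluster, length);

        /* A real reader would map guest_offset through the L1 and L2
         * tables here and read n bytes from the data cluster. */
        guest_offset += n;
        length -= n;
        chunks++;
    }
    return chunks;
}
```

Reading 4k starting at guest offset 63k with 64k clusters takes two chunks: 1k up to the end of the first cluster, then 3k from the start of the next one.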

In the next part

In the next part I will write some C code to actually read something from the QCow2 image file.

Read QCow2 file using C (data recovery part 3)