The logical place to use this damage data is in a video encoder (since even the damaged areas of a video frame are probably similar to the previous frame, especially in the case of animations).
Are there any video encoder libraries which allow a damage map to be input, to save encoding time when only a small non-rectangular part of the frame has been changed?
Couldn't you use a hardware H.264 encoder to do it for you for free, in neat 16x16 macroblocks?
Most of your time is likely spent inside glReadPixels, waiting for the GPU to finish rendering. To avoid this, use glFenceSync to know when rendering has completed, and optionally use PBOs to set up the transfer ahead of time. Obviously if you have no important work to do on the CPU in the meantime then it won't help, but if you do have other work, you won't be stalled waiting on the GPU. This is especially true if you're using it to get DMA-BUF contents to send across the wire.
There are other tricks you could use as well (especially if the client already gives you a damage rect), like hashing tiles and doing the vectorization manually on the CPU, but the VNC protocol makes this difficult to work with. Good luck!
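To make the tile-hash idea concrete, here is a rough sketch of how it could look, not anything taken from an existing VNC server. It assumes a tightly packed framebuffer of 32-bit pixels, an arbitrary 64-pixel tile size, and FNV-1a as the hash (picked only because it is short; any fast hash would do). The names `TILE`, `hash_tile` and `find_damaged_tiles` are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TILE 64 /* tile edge in pixels; an arbitrary choice for this sketch */

/* Continue an FNV-1a hash from state h over n bytes. */
static uint64_t fnv1a(uint64_t h, const uint8_t *p, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 0x100000001b3ULL; /* FNV 64-bit prime */
    }
    return h;
}

/* Hash one tile, row by row, clipping at the right and bottom edges. */
static uint64_t hash_tile(const uint32_t *frame, int width, int height,
                          int tx, int ty)
{
    int x0 = tx * TILE, y0 = ty * TILE;
    int w = (width - x0 < TILE) ? width - x0 : TILE;
    int h = (height - y0 < TILE) ? height - y0 : TILE;
    uint64_t hash = 0xcbf29ce484222325ULL; /* FNV 64-bit offset basis */

    for (int y = 0; y < h; y++) {
        const uint32_t *row = frame + (size_t)(y0 + y) * width + x0;
        hash = fnv1a(hash, (const uint8_t *)row, (size_t)w * sizeof *row);
    }
    return hash;
}

/*
 * Compare each tile's hash with the one stored from the previous frame.
 * hashes[] and damaged[] each need one entry per tile, in row-major order.
 * damaged[i] is set to 1 for changed tiles, hashes[] is updated in place,
 * and the number of changed tiles is returned.
 */
int find_damaged_tiles(const uint32_t *frame, int width, int height,
                       uint64_t *hashes, uint8_t *damaged)
{
    assert(frame && hashes && damaged && width > 0 && height > 0);

    int tiles_x = (width + TILE - 1) / TILE;
    int tiles_y = (height + TILE - 1) / TILE;
    int count = 0;

    for (int ty = 0; ty < tiles_y; ty++) {
        for (int tx = 0; tx < tiles_x; tx++) {
            size_t i = (size_t)ty * tiles_x + tx;
            uint64_t h = hash_tile(frame, width, height, tx, ty);
            damaged[i] = (h != hashes[i]);
            hashes[i] = h;
            count += damaged[i];
        }
    }
    return count;
}
```

With `hashes` zero-initialised, the first call reports every tile as damaged; after that only tiles whose pixels changed are flagged. If the client already supplies a damage rect, you would presumably only walk the tiles that rect overlaps instead of the whole frame.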