Blog

svndiff / deltav algorithm

Posted by:

(This post refers to svndiffs V0 compressed strings).

The article at http://svn.collab.net/repos/svn/trunk/notes/svndiff explains how the svndiffalgorithm works. This post tries to give a more concrete idea of what happens, refering to the example that can be found at the end of the above linked article.

SVN delivers the text deltas base64 encoded, so you first have to decode the string to work with it. Next step ist to check the string for it’s first four bytes, which must represent the following four characters: ‘S’, ‘V’, ‘N’, followed by the byte that indicates the format’s version number (for example, the SVN servers over at tigris.org use still Version 0, so the last byte would represent a 0 in this case).

Next thing to do is to ‘parse’ the windows. The given example uses only one window:

  • Byte 1 holds the source offset
  • Byte 2 holds the view length
  • Byte 3 holds the target length
  • Byte 4 holds the length of the instructions of this window
  • Byte 5 holds the length of the new data (which cannot be copied from the source itself)
  • Byte 6-12 hold the instructions
  • Byte 13 holds the new data for this window which any of the instructions in Byte 6-12 may access

Our source is aaaabbbbcccc and the computed svndiff is 01010011 01010110 01001110 00000000 00000000 00001100 00010000 00000111 00000001 00000100 00000000 00000100 00001000 10000001 01000111 00001000 01100100.

Ignoring the first four bytes (which represent the header portion of the svndiff), we start with Byte 6 (10, if you count the header bytes with, but as mentioned we’ll skip those).

Our first instruction is 00000100 00000000. The first two higher order bits of the first byte translate to 00, the instruction to copy from the source. The next 6 bits hold the number of times you have to copy the bytes – which translates to ‘four’. The following byte represents the offset, from where the bytes in the source view have to be taken, which translates to ‘zero’ So we have:

00000100 00000000

Copy from source: Copy byte at offset 0, 4 times

-> target: aaaa

The next instruction is 00000100 00001000; this tells us again to copy from the source:

00000100 00001000

Copy from source: Copy byte at offset 8, 4 times.

-> target: aaaacccc

The third instruction translates to ‘copy from new’. The new data is represented by the last byte in our window and holds the ASCII-code for the character ‘d’. 10000001 simply tells us to copy one byte from the new data:

10000001

Copy from new data: Copy 1 byte, append it 1 time.

-> target: aaaaccccd

The last instruction 01000111 00001000 tells us to copy the byte at offset 8 in the target data (which is the data inserted in the previous instruction) and to append it 7 times:

01000111 00001000

Copy from target: Copy byte at offset 8, 7 times.

-> target: aaaaccccdddddddd

0


About the Author

Thorsten is the author of the conjoon open source project and the Ext.ux.Livegrid component. In this blog he writes more or less frequently about his current projects and web development in general.

Add a Comment


Refresh