Mailing List Archive
tlug.jp Mailing List tlug archive tlug Mailing List Archive
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][tlug] How to read / modify unified diffs
- Date: Mon, 28 Nov 2005 21:10:58 +0900
- From: Josh Glover <jmglov@example.com>
- Subject: [tlug] How to read / modify unified diffs
- References: <20050104201323.GH8879@example.com>
I mentioned at the last tech meeting that I had once written a document on how to modify unified diffs. I found it whilst doing an unrelated Gmail search, and decided (probably against my better judgement) to provide it to TLUG. It is probably of no actual use, but it might be an interesting case study on how to reverse-engineer file formats... or not. :) Here goes (don't worry about all the copyright notices scrolling by--they are just from example code): The unified diff format is the most common type of patch used in the Open Source world, and also happens to be the output of the 'cvs diff' and 'svn diff' commands. The easiest way to understand the unified diff format is an example. Let's say that your nephew, who idolises his cool coder aunt / uncle (i.e. you) tries his hand at a Perl version of Hello World. Trying to impress you, he makes it a bit more complex than is truly necessary and thus his final script (hello1.pl) contains a few bugs: #!/usr/bin/perl use strict; use warnings; use subs qw(helloWorld); hlloWorld; exit sub helloWorld { prnt "hell ,werl\\n"; } # helloWerld() You quickly correct his script: #!/usr/bin/perl use strict; use warnings; use subs qw(helloWorld); helloWorld; exit; sub helloWorld { print "Hello, world!\n"; } # helloWorld() As you are about to attach it to an email, you have a bright idea: why not teach him how to apply a patch! So, you save your corrected version of the script as hello2.pl and create a patch, using the diff(1) command: diff -u hello1.pl hello2.pl >hello1.pl.patch You attach the patch to an email, along with instructions on how to apply it: 1. Save the patch to your /tmp directory. 2. Change into the directory containing your script. 3. Run the patch(1) command: patch -p 0 </tmp/hello1.pl.patch Let's take a look at the unified diff-format patch that you created: --- hello1.pl 2005-01-04 11:26:22.855403968 -0500 +++ hello2.pl 2005-01-04 11:14:46.669240352 -0500 @@ -6,10 +6,11 @@ use subs qw(helloWorld); -hlloWorld; -exit +helloWorld; +exit; sub helloWorld { - prnt "hell ,werl\\n"; -} # helloWerld() + print "Hello, world!\n"; + +} # helloWorld() The first line (or rather, the first line that patch(1) cares about, more on this later) of the patch contains a description of the original file (i.e. the one to which your patch will be applied). It must start with '--- ', and may contain any descriptive text after that. You can see that the diff(1) command lists the filename and its ctime. The next line is a description of the new file (i.e. the file that will be produced by applying your patch to the original file), and must start with '+++ ", followed by any descriptive text. The remaining body of the patch contains one or more "hunks". These are described by a header: @@ -<old_start>,<old_lines> +<new_start>,<new_lines> @@ where: <old start> is the starting line of the context in the original file <old_lines> is the number of lines in the context in the original file <new start> is the starting line of the context in the new file <new_lines> is the number of lines in the context in the new file A context is what sets a unified diff apart from a regular one. (A regular 'diff hello1.pl hello2.pl' would look like this: 9,10c9,10 < hlloWorld; < exit --- > helloWorld; > exit; 14,15c14,16 < prnt "hell ,werl\\n"; < } # helloWerld() --- > print "Hello, world!\n"; > > } # helloWorld() But forget about regular diffs. They are nowhere near as useful (and thus, common) as unified diffs.) So, what *is* a context? It is the differences, surrounded by a few unchanged lines. By default, diff(1) uses three lines of context. This is why the first line after the hunk header in our example patch is the sixth line of the original file: 'use subs qw(helloWorld);'; because the first line in the new file that differs is line 9: 'hlloWorld;' in the original becomes: 'helloWorld;' in the new file. The last line in the hunk is the last line in the file. Had there been three un- changed lines after line 10, they would have appeared in the hunk as well. You can control the context size by using the -U argument to diff(1). For example, to set the context size to five lines: diff -u -U 5 hello1.pl hello2.pl Changing the context size should have no effect on the patch(1) programme, but it is probably not wise to set the context size lower than three lines, because patch(1) might not have enough context if applying the patch to a newer version of the original file. In fact, version 2.8.4 of GNU diff(1) ignores the -U argument entirely if you try to set it lower than three. OK, back to our example patch. The hunk header: @@ -6,10 +6,11 @@ means that this hunk's context starts at line 6 of the original file and goes on for 10 lines. The context in the new file also starts at line 6 but goes on for 11 lines (since we added a line in our patch). Finally, the hunk itself: use subs qw(helloWorld); -hlloWorld; -exit +helloWorld; +exit; sub helloWorld { - prnt "hell ,werl\\n"; -} # helloWerld() + print "Hello, world!\n"; + +} # helloWorld() The first character of each line tells the patch(1) programme how to process the line: a space means that the line remains unchanged in the new file; a minus means that the line is removed from the new file; a plus means that the line is added to the new file. So, this hunk means that patch(1) will: 1. Leave lines 6, 7, and 8 unchanged (since they start with a space) 2. Remove lines 9 and 10 from the new file (they start with a minus) 3. Add new lines 9 and 10 to the new file (they start with a plus) 4. Leave lines 11, 12, and 13 unchanged (space) 5. Remove lines 14 and 15 from the new file (minus) 6. Add new lines 14, 15, *and 16* to the new file (plus) The reason that the context in the new file (the second set of num- bers in the hunk header, remember) is 11 lines instead of the 10-line context in the original file is that you added a new line 15. Where the original file had the closing curly bracket of the helloWorld() subroutine immediately after the prnt[sic] statement, you added an empty line in your patch, because you are a professional, and pay attention to matters of style, unlike your clueless nephew, who was just aping a real coder's style. Now that a simple example has been presented, here is a more complex one, with multiple hunks: Index: fastq.c =================================================================== --- fastq.c (revision 20) +++ fastq.c (working copy) @@ -1,40 +1,39 @@ /* ======================================================================== - * Copyright 2003 Twenty First Century Communications <swdev@example.com> + * File: fastq.c + * * Copyright 2003 Josh Glover <josh.glover@example.com> * - * LICENCE: + * Licence: * * This file is distributed under the terms of the BSD License (version * 2). See the COPYING file, which should have been distributed with * this file, for details. If you did not receive the COPYING file, see: * - * http://jmglov.homeunix.net/opensource/licenses/bsd.html + * http://www.jmglov.net/opensource/licenses/bsd.txt * - * fastq.c + * Description: * - * DESCRIPTION: - * * Implements the FastQ object and functions. * - * USAGE: + * Usage: * * #include <fastq.h> * - * EXAMPLES: + * Examples: * - * See fastq.h + * See <fastq.h> * - * DEPENDENCIES: + * Dependencies: * * none * - * TODO: + * Todo: * * - Add thread-safe synchronisation stuff (protect with * #ifdef THREADSAFE) * - Nothing, this code is perfect * - * MODIFICATIONS: + * Modifications: * * Josh Glover <jmglov@example.com> (2003/11/06): Initial revision * Josh Glover <jmglov@example.com> (2003/12/05): added FastQGet() @@ -47,7 +46,11 @@ // Standard library headers #include <stdlib.h> -//! human-readable error codes +/** Structure: err + * + * Human-readable error codes + */ + static char *const err[FASTQ_ERR_BOGUS + 1] = { "FASTQ_SUCCESS", // 00 @@ -158,11 +161,17 @@ static inline _qnode *_fastqGetNode( FastQ *q, ULONG i ); -/** Flushes the queue, removing all nodes. +/** Method: _fastq_flush() * - * @example.com *q pointer to the FastQ object to flush + * Flushes the queue, removing all nodes. * - * @example.com FASTQ_SUCCESS on success, FastQ error code on error + * Parameters: + * + * *q - pointer to the FastQ object to flush + * + * Returns: + * + * FASTQ_SUCCESS on success, FastQ error code on error */ static inline UCHAR _fastq_flush( FastQ *q ) { @@ -450,6 +459,17 @@ } // FastQDequeueFrom() +/** Method: doNothing() + * + * Silly method that I added to demonstrate a slightly more complex + * unified diff. + */ + +void doNothing( FastQ *q ) { + +} // doNothing() + + /** Returns the object enqueued in the specified queue at the specified * index (as an offset from the front of the queue). * This patch was generated by the svn(1) command: svn diff fastq.c >/tmp/fastq.c.patch svn(1) is the command-line Subversion client, and its diff sub-programme does exactly what you would expect: it generates a diff (and a unified) one at that. The command as shown simply computes the differences be- tween the file as it appears in your working copy and the latest re- vision that exists in the repository. (If you do not know how revision control systems work, do not worry; this is not crucial to understanding this example. For a quick introduction to revision control, check out http://svnbook.red-bean.com/en/1.1/ch01.html if you are interested.) Here, the version of the file in our working copy is newer than the one in the repository. This is why Subversion uses the repository version as the original file and the working copy version as the new file, as the first few lines of the patch show: Index: fastq.c =================================================================== --- fastq.c (revision 20) +++ fastq.c (working copy) Notice here that Subversion puts two extra lines at the beginning of the file. This is fine; you may put anything you like at the top of your patches (e.g. copyright notices, 5h0u7z 70 j3r 1337 h0m13z, etc.), and patch(1) does not care. It just sits up and takes notice when it finds a line starting with '--- '. Also notice that your description of the original and new files is freeform: diff(1) put filenames and timestamps there, while svn(1) uses filenames and revision numbers instead. The first hunk is relatively straightforward: @@ -1,40 +1,39 @@ It informs patch(1) that the context of the hunk: a. starts at line 1 of the original file, b. continues for 40 lines in the original file, c. also starts at line 1 of the new file, d. but continues for only 39 lines in the new file. This is fairly mundane, as we just reorganised the header comment a bit. Hunk 2 is slightly more interesting: @@ -47,7 +46,11 @@ It starts at line 47 of the original file, but line 46 of the new file! Why is this? Simple: the first hunk *removed a line from the new file*. In fact, when we look at hunk 3, we should see an even bigger discrepancy between original and new files, because hunk 2's context is 7 lines in the original file, but 11 in the new file! Sure enough, hunk 3: @@ -158,11 +161,17 @@ shows us that the context starts at line 158 of the original file, but line 161 of the new one. Some simple arithmetic should assure us that this is correct: hunk 1 removed one line from the new file, while hunk 2 added four. So hunk 3 should start -1 + 4 == 3 lines later in the new file than the original. And sure enough, 161 (the starting line of hunk 3 in the new file) - 158 (the starting line of the hunk in the original) yields 3! The rest of the patch is trivial. Hunk 4: @@ -450,6 +459,17 @@ simply starts at line 450 of the original file and continues for six lines, but starts at line 459 (-1 + 4 + 6 == 9) of the new file and continues for 17 lines. To add a new twist to this example, let's decide that do not have commit access to this particular Subversion repository and so you email the patch to the author of the project (a very common occur- rence in the Open Source world). The author looks over the patch and decides that while he likes your elegant implementation of the new doNothing() method, he likes his Doxygen-style comments better than your NaturalDocs-style ones. So, he decides to apply *only* hunk 4 from your patch to his source tree. Being an uber-hacker, he decides that it will take him less time to edit your patch file than to read the manpage for patch(1) to see if Larry Wall has anticipated his needs and provided some switch to patch(1) that will apply only certain hunks. He leaves the first four lines of your patch alone, and then deletes all the way down to line 91 of the patch file, where hunk 4 begins. So, the patch file has been reduced to: Index: fastq.c =================================================================== --- fastq.c (revision 20) +++ fastq.c (working copy) @@ -450,6 +459,17 @@ } // FastQDequeueFrom() +/** Method: doNothing() + * + * Silly method that I added to demonstrate a slightly more complex + * unified diff. + */ + +void doNothing( FastQ *q ) { + +} // doNothing() + + /** Returns the object enqueued in the specified queue at the specified * index (as an offset from the front of the queue). * In a momentary fit of incompetence, he runs patch(1), thinking he is done: patch -p 0 </tmp/fastq.c.patch and miracle of miracles! patch(1) reports: "patching file fastq.c" and exits successfully. The author realises that he forgot one thing: the hunk header reads: @@ -450,6 +459,17 @@ which claims that the context begins at line 450 in the original file but line 459 in the new file, *even though he discarded the previous hunks that added nine lines to the new file*. This should not have worked, but the beauty of unified diffs is that they provide patch(1) with enough context to recover from momentary lapses. What the author should have done was changed the hunk header to: @@ -450,6 +450,17 @@ In even more complex situations than this, the author's failure to correct a header could result in patch(1) not applying the patch at all. And there you have it, in a nutshell: the mystery of unified diffs explained. -- Josh Glover <josh.glover@example.com> Software Engineer Twenty First Century Communications, Inc. http://www.tfcci.com/ GPG keyID 0x22111305 (E210 61C6 14DF B480 C211 AE2A 0F12 1D7B 2211 1305) gpg --keyserver pgp.mit.edu --recv-keys 22111305Attachment: pgpMMz0wgmUWI.pgp
Description: PGP signature
- Follow-Ups:
- Re: [tlug] How to read / modify unified diffs
- From: Stephen J. Turnbull
Home | Main Index | Thread Index
- Prev by Date: Re: [tlug] Subversion (was: More Tech Meeting Torrents)
- Next by Date: Re: [tlug] Subversion (was: More Tech Meeting Torrents)
- Previous by thread: Re: [tlug] Heading to Akihabara
- Next by thread: Re: [tlug] How to read / modify unified diffs
- Index(es):
Home Page Mailing List Linux and Japan TLUG Members Links