
PHP Fine Diff

This page demonstrates the FineDiff class (as in “fine granularity diff”), which I wrote from scratch to generate a lossless (it won't eat your line breaks), compact opcodes string listing the sequence of atomic actions (copy/delete/insert) necessary to transform one string into another (hereafter referred to as the “From” and “To” strings). The “To” string can be rebuilt by running the opcodes string on the “From” string. The FineDiff class lets you specify the granularity, down to character level, in order to generate the smallest diff possible (at the potential cost of increased CPU cycles).

Typical usage:

include 'finediff.php';
$opcodes = FineDiff::getDiffOpcodes($from_text, $to_text /* , default granularity is set to character */);
// store opcodes for later use...

Later, $to_text can be re-created from $from_text using $opcodes as follows:

include 'finediff.php';
$to_text = FineDiff::renderToTextFromOpcodes($from_text, $opcodes);
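
To make "running the opcodes string on the From string" more concrete, here is a minimal sketch of an opcode interpreter. The function name applyOpcodes and the opcode syntax it accepts (a letter c, d or i, an optional length, and a colon before inserted text) are assumptions made for illustration; check finediff.php for the format the class actually emits.

```php
<?php
// Hypothetical helper, not part of FineDiff: replays an opcode string
// on $from to rebuild the target text.
// Assumed syntax: "c<n>" copies n bytes, "d<n>" skips n bytes,
// "i<n>:<text>" inserts the next n bytes; <n> defaults to 1 when omitted.
function applyOpcodes(string $from, string $opcodes): string
{
    $out = '';
    $fromPos = 0;               // read offset into $from
    $pos = 0;                   // read offset into $opcodes
    $len = strlen($opcodes);
    while ($pos < $len) {
        $op = $opcodes[$pos++];
        // Read the optional decimal length that follows the opcode letter.
        $digits = strspn($opcodes, '0123456789', $pos);
        $n = $digits > 0 ? (int) substr($opcodes, $pos, $digits) : 1;
        $pos += $digits;
        switch ($op) {
            case 'c':           // copy n bytes from $from to the output
                $out .= substr($from, $fromPos, $n);
                $fromPos += $n;
                break;
            case 'd':           // delete: skip n bytes of $from
                $fromPos += $n;
                break;
            case 'i':           // insert: the n bytes after ':' are literal text
                $pos++;         // skip the ':' separator
                $out .= substr($opcodes, $pos, $n);
                $pos += $n;
                break;
            default:
                throw new InvalidArgumentException("Unknown opcode '$op'");
        }
    }
    return $out;
}

// "hello world" -> "hello brave world": copy 6, insert "brave ", copy 5.
echo applyOpcodes('hello world', 'c6i6:brave c5'), "\n";
```

Because copied and deleted text is referenced only by length, the opcodes string stays small when the two texts are similar: only inserted text is stored verbatim.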

Try it by entering your own text, using the sample text, or starting from scratch, or just use the plain online diff viewer:

Granularity: Paragraph/lines, Sentence, Word, Character, or the Text_Diff library (for comparison purposes; see Notes)
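
The granularity determines the units the diff compares. As a rough illustration only (this is not FineDiff's actual tokenizer), the hypothetical function splitByGranularity below cuts a text into pieces at each level while keeping every delimiter attached to its piece, so that concatenating the pieces always gives back the original text. The regular expressions are simplified assumptions about where paragraphs, sentences and words end.

```php
<?php
// Hypothetical helper, not part of FineDiff: splits $text into units of the
// requested granularity without dropping any byte (delimiters stay attached).
function splitByGranularity(string $text, string $granularity): array
{
    if ($granularity === 'character') {
        return str_split($text);            // byte-level, not UTF-8 aware
    }
    $patterns = [
        // after each line break
        'paragraph' => '/(?<=\n)/',
        // after a run of sentence terminators or line breaks
        'sentence'  => '/(?<=[.!?\n])(?![.!?\n])/',
        // where whitespace is followed by non-whitespace
        'word'      => '/(?<=\s)(?=\S)/',
    ];
    if (!isset($patterns[$granularity])) {
        throw new InvalidArgumentException("Unknown granularity '$granularity'");
    }
    $parts = preg_split($patterns[$granularity], $text, -1, PREG_SPLIT_NO_EMPTY);
    return $parts === false ? [$text] : $parts;
}

print_r(splitByGranularity("One two. Three!\nFour", 'word'));
```

Finer units give the diff more places to match, which tends to shrink the result but increases the number of pieces to compare.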

Notes

The PHP-based engine of Text_Diff is forced, so that results can be meaningfully compared with the PHP-based FineDiff. Text_Diff is naturally geared toward line-level granularity. To compute a diff at a finer granularity (sentences, words, characters), line break characters (\n, \r) are replaced to prevent Text_Diff from eating our line breaks, so extra steps are required.

FineDiff is natively better equipped to generate diffs at granularities finer than line level. For example, with the built-in sample text above, at word-level and character-level granularity FineDiff executes in roughly 25 ms and 30 ms, respectively, while Text_Diff takes roughly 75 ms and 6.5 seconds, respectively (on my development computer, a run-of-the-mill Intel Core i5 desktop).
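
Timings like these depend heavily on the machine and the input, so it is worth measuring on your own texts. The sketch below shows one possible way to do that; it is not the code used for the figures above. The helper averageMillis is hypothetical, and the similar_text() call is only a stand-in workload so the example runs without finediff.php.

```php
<?php
// Hypothetical helper: runs $diffFn several times and returns the
// average wall-clock time per run, in milliseconds.
function averageMillis(callable $diffFn, int $runs = 5): float
{
    $start = hrtime(true);              // monotonic clock, in nanoseconds
    for ($i = 0; $i < $runs; $i++) {
        $diffFn();
    }
    return (hrtime(true) - $start) / 1e6 / $runs;
}

// Stand-in workload so the sketch runs on its own; to time FineDiff,
// pass a closure that calls FineDiff::getDiffOpcodes($from, $to) instead.
$from = str_repeat('the quick brown fox ', 50);
$to   = str_repeat('the quick red fox ', 50);
$ms = averageMillis(function () use ($from, $to) {
    similar_text($from, $to);
});
printf("%.3f ms per run\n", $ms);
```

Averaging over several runs smooths out one-off delays such as a cold opcode cache.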

If you wish to comment on this page, head to the associated blog entry: FineDiff, a character-level diff algorithm in PHP