Sunday, May 04, 2008

Inside a Text Editor

Ever since my college days, after dabbling with vi and a few other editors, I always had an yearning to create my own. Now, I am still stuck with XEmacs and jEdit and had a chance to compile / study the sources and documentation of EMACS and a free editor component called Scintilla.

Until now, I was under the the belief that text editors used a doubly linked list to represent the text in memory. The advantages of this approach being insertions and deletions are much more easier which is just a matter of just un-linking a node off the list. But the shortcomming is that they tend to fragment memory with each node or line take a bit of memory.

The other alternative approach is to have a dynamic array which is a contiguous space of memory and can sometimes be directly written off to a file. The disadvantages are that insertion and deletion are costly and you need to reallocate quite frequently.

While goint throug the source and documentation of text editors, I chanced upon the Buffer Gap algorithm that provides the benefits of a contiguous space and yet minimizes the impact of insertions or deletions.

The algorithm is simple where the text is represented as an array of characters. There is a gap in the array that can be moved or repositioned. Every time a few new characters are inserted, the gap is moved to the location where the characters are to be inserted and it is just a matter of using the gap as a buffer to insert the new set of characters. The end result is, the gap size has reduced. Deletion is a similar affair where the gap is simply extended to include the area that needs to be deleted.

Wikipedia has a short description of the same algorithm.

1 comment:

Shoban Jayaraj said...

I had created my one generic version of the buffer gap algorithm. But also chanced upon a similar implementation at http://www.codeproject.com/KB/recipes/GenericGapBuffer.aspx just now...