Inside a Text Editor
Ever since my college days, after dabbling with vi and a few other editors, I always had an yearning to create my own. Now, I am still stuck with XEmacs and jEdit and had a chance to compile / study the sources and documentation of EMACS and a free editor component called Scintilla.
Until now, I was under the the belief that text editors used a doubly linked list to represent the text in memory. The advantages of this approach being insertions and deletions are much more easier which is just a matter of just un-linking a node off the list. But the shortcomming is that they tend to fragment memory with each node or line take a bit of memory.
The other alternative approach is to have a dynamic array which is a contiguous space of memory and can sometimes be directly written off to a file. The disadvantages are that insertion and deletion are costly and you need to reallocate quite frequently.
While goint throug the source and documentation of text editors, I chanced upon the Buffer Gap algorithm that provides the benefits of a contiguous space and yet minimizes the impact of insertions or deletions.
The algorithm is simple where the text is represented as an array of characters. There is a gap in the array that can be moved or repositioned. Every time a few new characters are inserted, the gap is moved to the location where the characters are to be inserted and it is just a matter of using the gap as a buffer to insert the new set of characters. The end result is, the gap size has reduced. Deletion is a similar affair where the gap is simply extended to include the area that needs to be deleted.
Wikipedia has a short description of the same algorithm.
Ever since my college days, after dabbling with vi and a few other editors, I always had an yearning to create my own. Now, I am still stuck with XEmacs and jEdit and had a chance to compile / study the sources and documentation of EMACS and a free editor component called Scintilla.
Until now, I was under the the belief that text editors used a doubly linked list to represent the text in memory. The advantages of this approach being insertions and deletions are much more easier which is just a matter of just un-linking a node off the list. But the shortcomming is that they tend to fragment memory with each node or line take a bit of memory.
The other alternative approach is to have a dynamic array which is a contiguous space of memory and can sometimes be directly written off to a file. The disadvantages are that insertion and deletion are costly and you need to reallocate quite frequently.
While goint throug the source and documentation of text editors, I chanced upon the Buffer Gap algorithm that provides the benefits of a contiguous space and yet minimizes the impact of insertions or deletions.
The algorithm is simple where the text is represented as an array of characters. There is a gap in the array that can be moved or repositioned. Every time a few new characters are inserted, the gap is moved to the location where the characters are to be inserted and it is just a matter of using the gap as a buffer to insert the new set of characters. The end result is, the gap size has reduced. Deletion is a similar affair where the gap is simply extended to include the area that needs to be deleted.
Wikipedia has a short description of the same algorithm.
Comments