<antirez>

antirez 16 minutes ago. 358 views.
Right now I'm working on an agent for my DS4 project. Local inference is token-poor: it's a battlefield where optimizations count. I was quite surprised by the fact that the EDIT tool everybody is using right now forces the LLM to emit the old version of the text verbatim. This CAS (check and set) mode of operation, where I say EDIT old="foo" new="bar", is needed because edits often collide (the user is editing as well, or checked out a different branch, and so forth) and because the LLM can just hallucinate that a given line had a given content.

This means, basically, that using line numbers alone is very fragile: saying "change line 22 to new="foobar"" is not good enough. Yet I don't want my local LLM to throw away tokens rewriting the old text each time, also because sometimes the old text has a lot of special chars and spaces that the model may get wrong; in that case the tool would fail, forcing the LLM to do the same edit again. So I designed a tag-based EDIT tool that is still CAS style, but more token-efficient.

The READ and SEARCH tools return something like this:

  10:Q8fA int count = 10;
  11:rA3_ if (count > limit) {
  12:Kq9z     count = limit;
  13:PX0b }

So there are line numbers and tags. The tag is 4 chars, on average 2.5 LLM tokens, representing a checksum of the line. Now the LLM can edit like this:

  {
    "tool": "edit",
    "path": "/tmp/example.c",
    "line": 10,
    "tag": "Q8fA",
    "new": "int count = 11;"
  }
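
To make the CAS check concrete, here is a minimal Python sketch of how such a tool could tag lines and apply a single-line edit. It is not the actual DS4 code: the hash (SHA-256 truncated to 3 bytes, then base64url-encoded to 4 chars) and the names `line_tag`, `render`, and `edit_line` are assumptions made for illustration only.

```python
import base64
import hashlib


def line_tag(text: str) -> str:
    """Hypothetical 4-char tag: 3 bytes of SHA-256, base64url-encoded (24 bits)."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return base64.urlsafe_b64encode(digest[:3]).decode("ascii")


def render(lines: list[str]) -> str:
    """Format lines in the READ/SEARCH style shown above: N:TAG content."""
    return "\n".join(
        f"{number}:{line_tag(text)} {text}"
        for number, text in enumerate(lines, start=1)
    )


def edit_line(lines: list[str], line: int, tag: str, new: str) -> list[str]:
    """Replace one 1-based line only if its current tag still matches (CAS)."""
    if not 1 <= line <= len(lines):
        raise ValueError(f"line {line} out of range")
    if line_tag(lines[line - 1]) != tag:
        # The file changed, or the model hallucinated the tag: refuse the edit.
        raise ValueError(f"stale tag for line {line}: re-read the file")
    # "new" may contain newlines, so one line can become several.
    return lines[: line - 1] + new.split("\n") + lines[line:]
```

If the line changed after the model read it, the tag no longer matches and the edit is refused, so the model has to read the file again instead of overwriting somebody else's change.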

Or, multi line, like this:

  {
    "tool": "edit",
    "path": "/tmp/example.c",
    "lines": "11:rA3_\n12:Kq9z\n13:PX0b",
    "new": "if (count > limit)\n    return limit;"
  }
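
The multi-line form can be sketched in the same way. Again this is only an illustration under stated assumptions: `line_tag` is the same hypothetical checksum as before, `edit_range` is an invented name, and the rules that the listed lines must be contiguous and that an empty "new" deletes the range are my guesses, not something the tool is known to enforce.

```python
import base64
import hashlib


def line_tag(text: str) -> str:
    # Hypothetical tag scheme (an assumption, not the tool's real checksum).
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return base64.urlsafe_b64encode(digest[:3]).decode("ascii")


def edit_range(lines: list[str], spec: str, new: str) -> list[str]:
    """Apply a multi-line edit; spec holds 'N:TAG' entries separated by newlines."""
    refs = []
    for entry in spec.split("\n"):
        number, _, tag = entry.partition(":")
        refs.append((int(number), tag))
    numbers = [n for n, _ in refs]
    first, last = numbers[0], numbers[-1]
    # Assumption: the referenced lines form one contiguous ascending range.
    if numbers != list(range(first, last + 1)):
        raise ValueError("lines must be contiguous and ascending")
    if first < 1 or last > len(lines):
        raise ValueError("range outside the file")
    # Check every tag before touching anything, so a stale edit changes nothing.
    for n, tag in refs:
        if line_tag(lines[n - 1]) != tag:
            raise ValueError(f"stale tag for line {n}: re-read the file")
    # Assumption: an empty "new" means the whole range is deleted.
    replacement = new.split("\n") if new else []
    return lines[: first - 1] + replacement + lines[last:]
```

Note that the request carries one short tag per old line, regardless of how long those lines are, which is where the saving over emitting the old text verbatim comes from.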

The saving is significant, especially when the agent is deleting large amounts of text, but also in the general case. However, there is some overhead because we have both line numbers and tags. There are potential tradeoffs: maybe the tag should be 8 chars and include the line number in the hash. I still need to check the exact collision probabilities and the tokenization to see how much of a win this is. But I like the line:tag format, as later the LLM is often able to exploit the line information in many ways, for instance to get ranges in successive tool calls. Maybe there are other ways to exploit the tag, too, like asking: is this line still dj4_?
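
A rough way to reason about the 4 vs 8 chars tradeoff is the sketch below. Everything in it is an assumption: a 64-symbol tag alphabet (the sample tags use letters, digits, and `_`), a hash that behaves uniformly, and the hypothetical helpers `tag`, `false_accept_probability`, and `still`. It says nothing about tokenization, which still has to be measured with the real tokenizer.

```python
import base64
import hashlib
from typing import Optional


def tag(text: str, line_no: Optional[int] = None, chars: int = 4) -> str:
    """Hypothetical tag: base64url of SHA-256, truncated to `chars` characters.

    When line_no is given it is hashed together with the content, as in the
    8-char variant considered in the text.
    """
    data = text if line_no is None else f"{line_no}:{text}"
    digest = hashlib.sha256(data.encode("utf-8")).digest()
    return base64.urlsafe_b64encode(digest).decode("ascii")[:chars]


def false_accept_probability(chars: int, alphabet: int = 64) -> float:
    """Chance that a changed line keeps the same tag, assuming a uniform hash."""
    return 1 / alphabet ** chars


def still(current_line: str, expected_tag: str) -> bool:
    """Answer "is this line still <tag>?" without sending the content."""
    return tag(current_line, chars=len(expected_tag)) == expected_tag


print(64 ** 4)                      # number of distinct 4-char tags: 16777216
print(false_accept_probability(4))  # about 6e-08
print(false_accept_probability(8))  # about 3.6e-15
```

Under these assumptions a 4-char tag carries 24 bits, so a modified line slips through a single check roughly once in 16.7 million cases. Whether that is enough depends on how many edits are attempted against stale reads.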

The interesting thing is that DeepSeek v4 Flash is able to use this tool very effectively, so apparently it is natural for it. And while I didn't measure the exact savings, I saw in the field that edits are much faster and even more reliable.