

As a general-purpose hex editor, having an option to interpret an entire file in some particular text encoding seems like a good idea, since there are lots of text files that use only one encoding. If automatic detection can't work, then my personal opinion is that letting the user select a range of bytes and then say "display these bytes according to text encoding X" is the next best alternative, both in general and specifically for ROM hacking.

If you wanted to, you could even provide an option for saving the per-file encoding info to disk somewhere, thus making the user's encoding selection persistent between editing sessions.
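To make the "select a range and pick an encoding" idea a bit more concrete, here is a minimal Python sketch. It is only an illustration under my own assumptions, not how HxD or any other editor does it: the function names, the list-of-dicts layout for the ranges, and the JSON sidecar file are all made up for the example.

```python
import json
from pathlib import Path


def decode_range(data: bytes, start: int, end: int, encoding: str) -> str:
    """Decode data[start:end] with the chosen encoding.

    Invalid sequences become U+FFFD instead of raising, so binary
    junk inside the range cannot crash the text column.
    """
    return data[start:end].decode(encoding, errors="replace")


def render_ranges(data: bytes, ranges: list) -> list:
    """Return (start, end, text) for every user-defined range."""
    return [
        (r["start"], r["end"],
         decode_range(data, r["start"], r["end"], r["encoding"]))
        for r in ranges
    ]


def save_ranges(path, ranges: list) -> None:
    """Persist the user's selections (hypothetical sidecar file)."""
    Path(path).write_text(json.dumps(ranges, indent=2), encoding="utf-8")


def load_ranges(path) -> list:
    """Load selections saved by save_ranges()."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


# Example: binary junk, an ASCII string, a separator, a Shift-JIS string.
sample = b"\x00\x01Hello\xff" + "テスト".encode("shift_jis")
ranges = [
    {"start": 2, "end": 7, "encoding": "ascii"},
    {"start": 8, "end": len(sample), "encoding": "shift_jis"},
]

for start, end, text in render_ranges(sample, ranges):
    print(f"{start:#06x}-{end:#06x}: {text}")
```

Because the user supplies both the start offset and the encoding, the editor never has to guess where a string begins. Everything outside the marked ranges can simply stay in the default single-byte view.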


In the ROM hacking world, generally speaking, it is not safe to assume that any ROM file is composed entirely of text (and even if it is, there's no guarantee that the text will be in any standard encoding). So even for fairly identifiable encodings like UTF-8, automatic text detection is pretty much doomed to failure from the start, which appears to be the conclusion you've also reached.

I am the author of HxD, and am currently looking into implementing UTF-8 support. While doing that I also had another look at supporting Shift-JIS, which a couple of members of the ROM hacking community have asked me about.

So far, however, I have found no way to represent Shift-JIS text in a hex editor, since there is no way to know where a string starts. And as that information is necessary to distinguish the lead bytes from the trail bytes, this is essentially a deal breaker.

Furthermore, even if you find the start byte, you can get out of sync again because of bad bytes (such as random binary data separating actual Shift-JIS strings), and you cannot recover because most lead and trail bytes share the same values (i.e., are not distinguishable). So even if you added some kind of correcting offset to find the right start for a string, it would only be valid for a portion of the visible file. In other words, which text (that the file actually contains) you can see and which you cannot is really pretty unpredictable, and so I wonder if there is even much use in showing Shift-JIS in the text column of a hex editor. Not to mention that handling text entry is not trivial either.

Where does a string start? Where do you start to parse? You can't always do it at the start of the file for performance reasons (and even if you did, you could get out of sync because of bad bytes, or because various strings are aligned differently). And you can't just start at the beginning (or a little before) of the visible hex dump, since that would cause all kinds of oddities while scrolling (since you keep changing the point of reference).

Quite a few hex editors claim to support Shift-JIS, but I wonder how that is possible (without basically displaying randomly changing results as you scroll or do other things like selecting).

If anybody has some insights into these issues, please comment, or if I am in the wrong sub-forum please move it, or refer me to the right sites/people to ask.
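The lead/trail ambiguity in Shift-JIS can be shown with a few lines of Python. This is only a sketch: the byte ranges below are the commonly cited ones (lead bytes 0x81-0x9F and 0xE0-0xFC, which includes the vendor extension area; trail bytes 0x40-0x7E and 0x80-0xFC), and the helper names are made up for the example.

```python
def is_lead_byte(b: int) -> bool:
    """First byte of a two-byte Shift-JIS character (assumed ranges)."""
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC


def is_trail_byte(b: int) -> bool:
    """Second byte of a two-byte Shift-JIS character (assumed ranges)."""
    return 0x40 <= b <= 0x7E or 0x80 <= b <= 0xFC


def char_starts(data: bytes, start: int) -> list:
    """Offsets at which a parser beginning at `start` sees characters."""
    starts = []
    i = start
    while i < len(data):
        starts.append(i)
        two_bytes = (
            is_lead_byte(data[i])
            and i + 1 < len(data)
            and is_trail_byte(data[i + 1])
        )
        i += 2 if two_bytes else 1
    return starts


# U+4E9C encodes as 0x88 0x9F, and 0x9F is itself a valid lead byte.
data = "亜亜亜".encode("shift_jis")

print(char_starts(data, 0))  # parsing from the true start of the string
print(char_starts(data, 1))  # parsing from one byte later
print(data[1:].decode("shift_jis", errors="replace"))
```

The two parses never agree on a single character boundary, and nothing in the bytes themselves says which one is right. With the ranges above, every lead byte value is also a valid trail byte value, so a parser that starts in the wrong place has no reliable way to resynchronize on its own.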
