[Tutorial] Fast Identification - Printable Version +- Support Forums (https://www.supportforums.net) +-- Forum: Categories (https://www.supportforums.net/forumdisplay.php?fid=87) +--- Forum: Coding Support Forums (https://www.supportforums.net/forumdisplay.php?fid=18) +---- Forum: Programming with C++ (https://www.supportforums.net/forumdisplay.php?fid=20) +---- Thread: [Tutorial] Fast Identification (/showthread.php?tid=4163) |
[Tutorial] Fast Identification - MrD. - 01-04-2010 This is an updated version of an article I originally posted on my website. Introduction Often you want to be able to refer to something in an application by name, say you had a scene-graph of nodes and wanted to be able to find a node by it's name for simplicity. The problem with this is that a string compare is a relatively slow operation, and when performance is key, should be avoided. However there is a way to be able to refer to something by name, but still do so in a timely manner; this tutorial will guide you through creating a class which uses a hashed string to make comparing a string almost as fast as comparing an integer. Hashing a String Hashing a string involves converting it from it's lexical form, into an integral form which should hopefully uniquely represent the string within the context that you need it to. There are a load of ways to hash a string, CRC and MD5 are just to name two; if you have a compiler with TR1 support, there is even a hash function in that. My preferred method of hashing a string in C++ is to use the MurmurHash2 algorithm by Austin Appleby (the code for which can be found here) and is the method that I will use in this tutorial to hash the string. The Identification Class The identification class itself is really simple, all it needs is two member variables (one for the string, and one for the hash), a function to get/set the string and another to get the hash, and finally some overloaded comparison operators to make it easier to compare the identification objects. The header for the class is shown below. Code: #ifndef CIDENT_H See, I told you it was simple (don't be intimated by all the comparison operators, they are all doing pretty much the same thing). All that's left to do is define the functions to set the identifier, and to compare the hashes. Code: // ------------------------------------------------------------------ And as you can see, aside from the constructors, there are only two functions, both of which are really small and simple. The setIdent function just sets the string to the string passed in, and then hashes the string with MurmurHash2 and stores the returned hash. The overloaded comparison operator just compares the hash of one string against the hash of the string passed in (the rest of the comparison operators are defined inline in the header file). Conclusion If you use that class where you would have used a string to identify something, then you will have a pre-computed hash available for comparison using an unsigned integer compare; which is much faster than a string compare even taking into consideration the overhead of hashing the string (which is why it is better to store the hash in the class than calculate it everytime*). Obviously, the more times you compare the hash without having to hash another string (say, iterating through 100 nodes looking for one by name), the greater the overall benefit. *If you are generating the hash every time for everything then this method will be even slower than a string compare due to the hashing overhead. You must store an instance of the identifier somewhere (usually on the object it relates to) to get the maximum efficiency. RE: [Tutorial] Fast Identification - Gaijin - 01-04-2010 Amazing, really good work MrD.! This will come in handy soon!... But what's with the TAB size.... I needed a bit to understand what where was, but it's nothing big RE: [Tutorial] Fast Identification - MrD. - 01-04-2010 Heh The tab spacing is something I acquired from the coding standards at work. One column for the return types, one column for the members (with parameters each on their own line) and then a final column for doxygen comments. It looks a bit weird when you first see it, but now I'm used to it I can read it just like English and use it all the time. RE: [Tutorial] Fast Identification - Gaijin - 01-04-2010 Ahhh, I get it... I does look bit weird but as you said, I'm now able to read it normal..... Lines like those was a small pain... but yeah it's history ;) Code: inline bool operator==( |