Yesterday while playing around with pythons regex library re, i came across the expand function of the Match-Object class. It uses a handy template syntax supporting named backreferences to capturing groups.
Example:
$ python >>> import re >>> re.match("(?P<fst>.*?) (?P<snd>.*)", "a b").expand( "\g<snd> \g<fst>" ) 'b a'
... "\g<fst>" expands to the first capturing group ...
... "\g<snd>" expands to the second capturing group ...
Using this kind of string expansion i my script, i encountered that the performance was rather poor. So i was searching for a better/faster solution and started to measure execution times:
inital version - re expand with named backreferences:
$ python -m timeit "import re" 're.match("(?P<fst>.)(?P<snd>.)", "ab").expand("\g<snd>\g<fst>")' 10000 loops, best of 3: 34.1 usec per loop
improvement 1 - re expand with numeric backreferences #1:
$ python -m timeit "import re" 're.match("(?P<fst>.)(?P<snd>.)", "ab").expand("\g<2>\g<1>")' 10000 loops, best of 3: 25.2 usec per loop
... this version shows a slight performance improvement. But "\g<2>" can be written as "\2".
improvement 2 - re expand with numeric backreferences #2:
$ python -m timeit "import re" 're.match("(?P<fst>.)(?P<snd>.)", "ab").expand("\2\1")' 100000 loops, best of 3: 12.8 usec per loop
... this version is performing approximately 2.5 times better than the first. Anyway the handy named-backreference syntax is gone. Therefore i tried a complete different approach using the str.format function and the groupdict() function.
improvement 3 - string format:
$ python -m timeit "import re" '"{b}{a}".format( **re.match("(?P<a>.)(?P<b>.)", "ab").groupdict())' 100000 loops, best of 3: 3.78 usec per loop