Warning: some of the topics in this article are quite detailed and beyond the level I normally blog at. If you don’t know some of the terms or meanings, just download and explore the examples — keep reading and playing!
The title of this change is “Speedup method calls 1.2x”, which is a bit misleading…
There are many ways to change CPython: either by modifying the execution of an Opcode, or by adding new Opcodes. Adding new Opcodes requires a lot of discussion and testing, and this change introduces new Opcodes. Opcodes are selected by the compilation process in CPython: once your code is converted into an Abstract Syntax Tree, the compiler explores each branch and converts it to Opcodes. The execution of your code then steps through the Opcodes in a massive switch statement inside a loop, calling a different C function for each Opcode.
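The pipeline described above can be seen from Python itself using the standard `ast` and `dis` modules. This is a minimal sketch (the source string and filename are made up for illustration):

```python
import ast
import dis

# source -> Abstract Syntax Tree -> code object full of Opcodes
source = "x = 1 + 2"
tree = ast.parse(source)
print(type(tree).__name__)  # Module

code = compile(tree, "<example>", "exec")
opnames = [ins.opname for ins in dis.get_instructions(code)]
print("STORE_NAME" in opnames)  # the assignment becomes a STORE_NAME Opcode
```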
For reference, Python 3.6 has 3 Opcodes for calling functions. All of these were either added or modified in Python 3.6.
Python 3.7 adds 2 new Opcodes, `LOAD_METHOD` and [CALL_METHOD](https://github.com/python/cpython/blob/master/Python/ceval.c#L3021-L3105): when the compiler sees `x.method(...)`, it emits these new Opcodes.
As an example calling 3 functions with different signatures:
```python
from dis import dis

def function():
    return 1

def function_args(arg1):
    return arg1

def function_kwargs(arg1=None):
    return arg1

def test():
    function()
    function_args(1)
    function_kwargs(arg1=1)

dis(test)
```
Running this on Python 3.6 and Python 3.7, we see no change in the resulting bytecode or its performance.
Another example with bound methods (ie those belonging to an instance of a class),
```python
from dis import dis

class TestClass(object):
    def function():
        return 1

    def function_args(arg1):
        return arg1

    def function_kwargs(arg1=None):
        return arg1

def test():
    i = TestClass()
    i.function()
    i.function_args(1)
    i.function_kwargs(arg1=1)

dis(test)
```
The results of this show:
The `LOAD_METHOD` opcode replaces loading bound methods as attributes and then calling them as normal functions. Remember, `CALL_METHOD` is faster than `CALL_FUNCTION` for instance methods. The old path used `LOAD_ATTR`, which essentially fetches the bound-method instance from the object instance on every call.
[LOAD_METHOD](https://github.com/python/cpython/blob/master/Objects/object.c#L1074-L1155) is a copy of the logic in `LOAD_ATTR`, but better optimised for the case where the method hasn't been overridden and the call uses only positional arguments.
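You can see the overhead the new Opcodes avoid from plain Python: every attribute access on an instance method builds a fresh bound-method object. A small sketch (class and method names are made up):

```python
class TestClass:
    def function(self):
        return 1

i = TestClass()

# Each attribute access creates a new bound-method object wrapping
# the underlying function; LOAD_METHOD/CALL_METHOD skip that allocation
# for the common x.method(...) pattern.
bound = i.function
print(type(bound).__name__)                  # method
print(bound.__func__ is TestClass.function)  # True
print(i.function is i.function)              # False: a fresh object per access
```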
Coming out of this, you might have some questions.
No, because this speed boost removes object-related slow-downs.
Keyword arguments require special treatment in the execution loop because there is no equivalent in C (the language CPython is written in); extra code has to build two tuples to pass to the method.
Variable arguments, whether positional or keyword also require special treatment.
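One visible trace of that special treatment: on current CPython versions, the keyword names for a call like `f(arg1=1)` are stored as a constant tuple on the calling code object, which the interpreter uses to build the keyword mapping at call time (the exact Opcodes involved vary by version). A small sketch, with a made-up function name:

```python
# dict(arg1=1) is a call with a keyword argument; the name 'arg1'
# ends up in the code object's constants as the tuple ('arg1',).
def call_with_kwargs():
    return dict(arg1=1)

print(('arg1',) in call_with_kwargs.__code__.co_consts)
print(call_with_kwargs())  # {'arg1': 1}
```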
This change should encourage you, in class design, to follow the DRY (don't repeat yourself) principle and add private methods that reduce duplication of logic across multiple public methods. Prior to 3.7, the performance hit would have been a strong consideration, and copying and pasting code was an accepted practice where speed was required.
In future, we might see more scenarios undergo similar treatment.
Some Unicode characters have an unfortunate issue when scanning a string for occurrences using `str.find(x)`, seeing up to a 25x slowdown.
```
$ ./python -m perf timeit -s 's = "一丁丂七丄丅丆万丈三上下丌不与丏丐丑丒专且丕世丗丘丙业丛东丝丞丟丠両丢丣两严並丧丨丩个丫丬中丮丯丰丱串丳临丵丶丷丸丹为主丼丽举丿乀乁乂乃乄久乆乇么义乊之乌乍乐乑乒乓乔乕乖乗乘乙乚乛乜九乞也习乡乢乣乤乥书乧乨乩乪乫乬乭乮乯买乱乲乳乴乵乶乷乸乹乺乻乼乽乾乿亀亁亂亃亄亅了亇予争 亊事二亍于亏亐云互亓五井亖亗亘亙亚些亜亝亞亟亠亡亢亣交亥亦产亨亩亪享京亭亮亯亰亱亲亳亴亵亶亷亸亹人亻亼亽亾亿什仁仂仃仄仅仆仇仈仉今介仌仍从仏仐仑仒仓仔仕他仗付仙仚仛仜 仝仞仟仠仡仢代令以仦仧仨仩仪仫们仭仮仯仰仱仲仳仴仵件价仸仹仺任仼份仾仿"*100' -- 's.find("乎")'

Unpatched: Median +- std dev: 761 us +- 108 us
Patched:   Median +- std dev: 117 us +- 9 us
```
In Python 3.7, the expected Unicode code-point size is no longer hard-coded, and the string-search methods are optimised for wide (mostly unusual) characters.
These are still slower, but now 3x slower than ASCII characters instead of 25x!
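You can reproduce a smaller-scale version of the benchmark above with the standard `timeit` module (timings will vary by machine; the shorter string here is just for illustration):

```python
import timeit

# A repeated run of wide (non-Latin) characters, as in the perf benchmark
s = "一丁丂七丄丅丆万丈三" * 100

# "乎" is absent, so find() scans the whole string and returns -1
elapsed = timeit.timeit(lambda: s.find("乎"), number=1000)
print(s.find("乎"))  # -1
print(s.find("三"))  # 9: first occurrence in the repeated block
```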
The `fwalk` function in the `os` module (only in Python 3) is a directory-tree generator.
It behaves exactly like `walk()`, except that it yields a 4-tuple `(dirpath, dirnames, filenames, dirfd)`.
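A quick sketch of the 4-tuple in action, over a throwaway directory (note `os.fwalk` is only available on Unix-like systems):

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "sub"))
    open(os.path.join(root, "file.txt"), "w").close()

    results = []
    # Unlike os.walk, each yield includes dirfd, a file descriptor
    # for the directory being visited.
    for dirpath, dirnames, filenames, dirfd in os.fwalk(root):
        results.append((sorted(dirnames), sorted(filenames)))

print(results[0])  # (['sub'], ['file.txt'])
```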
The change modified the implementation to use the `scandir` function instead of `listdir`; `scandir` is optimised at the operating-system level and much faster.
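The reason `scandir` is faster: it yields `DirEntry` objects whose `is_dir()` usually comes from the data the operating system already returned while listing, avoiding an extra `stat()` call per entry. An illustrative sketch over a throwaway directory:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "sub"))
    open(os.path.join(root, "a.txt"), "w").close()

    # Each DirEntry carries cached type information from the OS listing
    with os.scandir(root) as it:
        entries = {entry.name: entry.is_dir() for entry in it}

print(entries == {"sub": True, "a.txt": False})  # True
```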
In the regular-expression module (`re`) there is a function, `compile`, which compiles a regular-expression string with an optional set of flags. These flags are RegEx flags, passed through to the RegEx library.
A change made in Python 3.6 slowed down this call when the flags passed were integers. Python 3.7 "fixes" the slowdown, but it is still not as fast as Python 3.5.
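For context, RegEx flags are integers (an `IntFlag` enum since Python 3.6) and are combined with `|`, which is why `compile` takes this path at all. A minimal sketch:

```python
import re

# Flags are integer-valued and combine bitwise
flags = re.IGNORECASE | re.MULTILINE
print(isinstance(flags, int))  # True

pattern = re.compile(r"^spam", flags)
print(bool(pattern.search("eggs\nSPAM")))  # True: MULTILINE + IGNORECASE
```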
Per the change log:
Matching and searching case-insensitive regular expressions is much slower than matching and searching case-sensitive regular expressions. Case-insensitivity requires converting every character in input string to lower case and disables some optimizations. But there are only 2669 cased characters (52 in ASCII mode). For all other characters in the pattern we can use case-sensitive matching.
The speed improvement is significant: if you're matching ASCII characters, you can see up to a 20x improvement in matching time, since it now does a lookup instead of running `lower()` over each character.
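A small example of the case this optimises: a case-insensitive pattern matched over mostly-ASCII input, where only the 52 cased ASCII characters need the case-insensitive treatment.

```python
import re

# Case-insensitive matching over ASCII input
pattern = re.compile(r"[a-z]+", re.IGNORECASE)
print(pattern.findall("Hello WORLD 123"))  # ['Hello', 'WORLD']
```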