I'm looking to make a fast streaming download -> upload to move large files via HTTP from one server to another.
During this, I've noticed that httplib (http.client in Python 3), which is used by urllib3 and therefore also by requests, seems to hard-code the amount it fetches from a stream at a time to 8192 bytes.
Why is this? What is the benefit of 8192 over other sizes?
From what I found, the block size should ideally be the system's memory page size, but since the page size is only available through the UNIX-only resource module, the value was hardcoded to 8192 so that all other systems, especially Windows, are not blocked from using it. Beyond that, there is no deeper reason for the specific value.
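You can see the same 8192 constant in CPython's own I/O layer, and check it against the page size where the resource module exists. A minimal sketch (io.DEFAULT_BUFFER_SIZE and resource.getpagesize are real stdlib names; the printed page size will vary by platform):

```python
import io

# CPython's default buffered I/O size is the same hardcoded 8192 bytes
print(io.DEFAULT_BUFFER_SIZE)  # 8192

try:
    # resource is UNIX-only, which is why a portable constant was chosen instead
    import resource
    print(resource.getpagesize())  # commonly 4096 on x86/x86-64
except ImportError:
    print("no resource module here (e.g. Windows)")
```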
This is from the nginx documentation:

Syntax:  client_body_buffer_size size;
Default: client_body_buffer_size 8k|16k;

Sets buffer size for reading client request body. In case the request body is larger than the buffer, the whole body or only its part is written to a temporary file. By default, buffer size is equal to two memory pages. This is 8K on x86, other 32-bit platforms, and x86-64. It is usually 16K on other 64-bit platforms.
The Apache mod_proxy documentation gives the same default for the ProxyIOBufferSize directive:

Description: Determine size of internal data throughput buffer
Syntax:      ProxyIOBufferSize bytes
Default:     ProxyIOBufferSize 8192
Context:     server config, virtual host
Status:      Extension
Module:      mod_proxy

So Apache also uses 8192 bytes by default as the proxy buffer size.
The Apache HttpClient (Java) documentation likewise recommends 8192-byte socket buffers.
In Ruby, Net::HTTP likewise uses a fixed default read size. Then there are the discussion threads below.
If you look through many of these, the consensus settles on 8K/16K as the buffer size. It's not that the value should be fixed at that; it should be configurable, but 8K/16K is good enough for most situations. So I don't see a problem with Python also using 8K by default, though yes, it should have been configurable from the start.
Python 3.7 makes it configurable (via the blocksize parameter of http.client.HTTPConnection), but that may not help your cause if you can't upgrade to that version.
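On 3.7+, raising the block size is a one-liner; the host name here is just a placeholder and no connection is opened until a request is made:

```python
from http.client import HTTPConnection

# Since Python 3.7, HTTPConnection takes a blocksize argument (default 8192)
conn = HTTPConnection("example.com", blocksize=64 * 1024)
print(conn.blocksize)  # 65536
```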