PERFORMANCE COMPARISON USING ORM VERSUS RAW SQL

Recently used the so-called python web framework in the most powerful framework sanic, built the project’s basic environment and project framework, but whether to use ORM when committing to the difficulty of choice syndrome, for a framework that pursues performance, after using ORM, there must be some impact on performance, but how big is the impact? ORM exists for a reason, so can its advantages eliminate its performance loss?

I mainly use two ORMs, not so much two as one, but the sqlalchemy here is two different libraries, one is the sqlalchemy that comes with aiomysql, which I simply say, is the well-known sqlalchemy library.sqlalchemyaiomysql.sa

data preparation is a simple user table with id, username.

I was looking at the sanic tutorial, which mentioned how to use ORM, which is used as middleware to create a session to hook into the context of the request before processing each request, and then turn off the session when the request is closedsessionmaker

@app.middleware("request")async def inject_session(request):    request.ctx.session = sessionmaker(bind, AsyncSession, expire_on_commit=False)()    request.ctx.session_ctx_token = _base_model_session_ctx.set(request.ctx.session)

@app.middleware("response")async def close_session(request, response): if hasattr(request.ctx, "session_ctx_token"): _base_model_session_ctx.reset(request.ctx.session_ctx_token) await request.ctx.session.close()

copy the code

i have a vague feeling that this approach may bring some performance loss, if the request volume is large, each request has to recreate a session to process the database, and then release it, then if the request does not perform database operations, is it not wasted? and when a large number of requests come, the system also costs a certain amount of resources for this creation and closing, and these sessions can not be shared, in fact, can not say that can not be shared, i tried to hang the session to app.ctx, can also be used, but there will be a problem, if a request takes a long time, such as 5 seconds, then the next request if you want to use the session, then you have to wait until the end of the previous request can be reused.

so the following stress test is divided into two stages, the first is that there is no request and response middleware, then only the use of aiomysql library for stress testing.

the second stage is to add middleware before and after the request and response, where there will be three tests.

below are the results obtained from a stress test using wrk, 10 threads, 2000 connections, and a stress test of 10 seconds

the server opens 4 workers and uses uvloop

use a pure sql statement query

10 threads and 2000 connections  Thread Stats   Avg      Stdev     Max   +/- Stdev    Latency   558.19ms  200.95ms   1.32s    59.15%    Req/Sec   344.02    230.72     1.76k    68.29%  Latency Distribution     50%  583.50ms     75%  707.32ms     90%  819.57ms     99%    1.00s  33488 requests in 10.03s, 3.74MB readRequests/sec:   3337.65Transfer/sec:    381.35KB

copy the code

use aiomysql.sa stress test results

10 threads and 2000 connections  Thread Stats   Avg      Stdev     Max   +/- Stdev    Latency   905.52ms  206.57ms   1.40s    65.57%    Req/Sec   231.70    202.40     1.12k    72.10%  Latency Distribution     50%  909.34ms     75%    1.05s     90%    1.17s     99%    1.33s  20031 requests in 10.03s, 2.24MB readRequests/sec:   1997.86Transfer/sec:    228.27KB

copy the code

the following is the results of the stress test of adding a session hook before and after the request and response

stress test results when adding hooks before and after requests using sqlalchemy

10 threads and 2000 connections  Thread Stats   Avg      Stdev     Max   +/- Stdev    Latency   638.07ms  632.25ms   2.00s    82.41%    Req/Sec   148.81     61.53   380.00     71.04%  Latency Distribution     50%  126.46ms     75%    1.15s     90%    1.54s     99%    1.96s  14094 requests in 10.02s, 1.57MB read  Socket errors: connect 0, read 0, write 0, timeout 2510Requests/sec:   1406.67Transfer/sec:    160.72KB

copy the code

this time there is also a timeout error for timeout.

the results are not shown in detail, and i summarized a table

the following conclusions can be simply drawn:

  1. Measured purely in terms of throughput, pure SQL is 2.37 times the sqlalchemy that adds sessions
  2. adding a session before and after a request has some impact on performance, with a performance penalty of about 20%.

THE ABOVE IS JUST A SIMPLE COMPARISON OF THE PERFORMANCE DIFFERENCE BETWEEN PURE SQL AND ORM FROM THE THROUGHPUT, IN FACT, THERE IS SUCH A PERFORMANCE DIFFERENCE IS ALSO VERY UNDERSTANDABLE, ORM WILL DO SOME OBJECT PARSING WORK.

PERSONALLY, I STILL PREFER TO USE SQL DIRECTLY TO QUERY, I ALWAYS FEEL THAT ORM HAS TO BE FAMILIAR WITH ITS VARIOUS QUERY CONDITIONS, BUT ORM IS VERY CONVENIENT FOR SWITCHING DATABASES IN THE LATER STAGES, BUT HOW MANY PROJECTS WILL SWITCH DATABASES?

ORM is more difficult to control for some of the more complex SQL statements.

BUT IF YOU’RE JUST DOING A VERY SMALL SYSTEM AND THE REQUEST VOLUME ISN’T THAT LARGE, I THINK IT SHOULD BE OKAY TO USE ORM OR SQL.