
Pulled today's latest code and ran an HTTPS client stress test; it crashed after running for a while. Reproduced twice. #1835

Open
yuanfengyun opened this issue Dec 9, 2023 · 17 comments

Comments

@yuanfengyun
Contributor

(gdb) bt
#0  0x00007f3952e317a5 in sha1_block_data_order_shaext () from /lib64/libcrypto.so.10
#1  0x00007f3952e2f50f in SHA1_Update () from /lib64/libcrypto.so.10
#2  0x00007f3952ee0537 in ssleay_rand_add.part.0 () from /lib64/libcrypto.so.10
#3  0x00007f3952ec90b4 in ECDSA_sign_ex () from /lib64/libcrypto.so.10
#4  0x00007f3952ec9139 in ECDSA_sign () from /lib64/libcrypto.so.10
#5  0x00007f3952ea6b70 in pkey_ec_sign () from /lib64/libcrypto.so.10
#6  0x00007f3952eeccbe in EVP_SignFinal () from /lib64/libcrypto.so.10
#7  0x00007f3952f5afbb in fips_pkey_signature_test () from /lib64/libcrypto.so.10
#8  0x00007f3952ea4055 in EC_KEY_generate_key () from /lib64/libcrypto.so.10
#9  0x00007f3953baba85 in ssl3_send_client_key_exchange () from /lib64/libssl.so.10
#10 0x00007f3953baf460 in ssl3_connect () from /lib64/libssl.so.10
#11 0x00007f3953dfd393 in _ltls_context_handshake (L=0x7f39261f8248) at lualib-src/ltls.c:180
#12 0x000000000042ce00 in precallC (f=0x7f3953dfd320 <_ltls_context_handshake>, nresults=1, func=<optimized out>, L=0x7f39261f8248) at ldo.c:529
#13 luaD_precall (L=L@entry=0x7f39261f8248, func=<optimized out>, nresults=1) at ldo.c:595
#14 0x000000000043b866 in luaV_execute (L=L@entry=0x7f39261f8248, ci=<optimized out>, ci@entry=0x7f3919618080) at lvm.c:1686
#15 0x000000000042c953 in unroll (L=0x7f39261f8248, ud=<optimized out>) at ldo.c:744
#16 0x000000000042c0aa in luaD_rawrunprotected (L=L@entry=0x7f39261f8248, f=0x42cf00 <resume>, ud=0x7f39581f9d7c) at ldo.c:144
#17 0x000000000042d1c0 in lua_resume (L=L@entry=0x7f39261f8248, from=from@entry=0x7f3933737ba8, nargs=nargs@entry=4, nresults=nresults@entry=0x7f39581f9dbc) at ldo.c:849
#18 0x00007f395edfcf8d in lua_resumeX (nresults=0x7f39581f9dbc, nargs=4, from=0x7f3933737ba8, L=0x7f39261f8248) at service-src/service_snlua.c:90
#19 auxresume (narg=4, co=0x7f39261f8248, L=0x7f3933737ba8) at service-src/service_snlua.c:146
#20 timing_resume (L=L@entry=0x7f3933737ba8, co_index=co_index@entry=1, n=4) at service-src/service_snlua.c:198
#21 0x00007f395edfd440 in luaB_coresume (L=0x7f3933737ba8) at service-src/service_snlua.c:217
#22 0x000000000042cb94 in precallC (f=0x7f395edfd410 <luaB_coresume>, nresults=-1, func=<optimized out>, L=0x7f3933737ba8) at ldo.c:529
#23 luaD_pretailcall (L=L@entry=0x7f3933737ba8, ci=ci@entry=0x7f3926160440, func=<optimized out>, narg1=<optimized out>, delta=delta@entry=6) at ldo.c:550
#24 0x000000000043b8e4 in luaV_execute (L=L@entry=0x7f3933737ba8, ci=<optimized out>) at lvm.c:1711
#25 0x000000000042d0eb in ccall (inc=65537, nResults=-1, func=0x7f392c64a8c0, L=0x7f3933737ba8) at ldo.c:637
#26 luaD_callnoyield (L=0x7f3933737ba8, func=<optimized out>, nResults=-1) at ldo.c:655
#27 0x000000000042c0aa in luaD_rawrunprotected (L=L@entry=0x7f3933737ba8, f=f@entry=0x4285b0 <f_call>, ud=ud@entry=0x7f39581fa0b0) at ldo.c:144
#28 0x000000000042d3de in luaD_pcall (L=L@entry=0x7f3933737ba8, func=func@entry=0x4285b0 <f_call>, u=u@entry=0x7f39581fa0b0, old_top=192, ef=<optimized out>) at ldo.c:953
#29 0x0000000000429d69 in lua_pcallk (L=L@entry=0x7f3933737ba8, nargs=<optimized out>, nresults=nresults@entry=-1, errfunc=errfunc@entry=0, ctx=ctx@entry=0, k=k@entry=0x444460 <finishpcall>) at lapi.c:1066
#30 0x00000000004444f0 in luaB_pcall (L=0x7f3933737ba8) at lbaselib.c:477
#31 0x000000000042ce00 in precallC (f=0x4444a0 <luaB_pcall>, nresults=2, func=<optimized out>, L=0x7f3933737ba8) at ldo.c:529
#32 luaD_precall (L=L@entry=0x7f3933737ba8, func=<optimized out>, nresults=2) at ldo.c:595
#33 0x000000000043b866 in luaV_execute (L=L@entry=0x7f3933737ba8, ci=<optimized out>) at lvm.c:1686
#34 0x000000000042d0eb in ccall (inc=65537, nResults=0, func=0x7f392c64a830, L=0x7f3933737ba8) at ldo.c:637
#35 luaD_callnoyield (L=0x7f3933737ba8, func=<optimized out>, nResults=0) at ldo.c:655
#36 0x000000000042c0aa in luaD_rawrunprotected (L=L@entry=0x7f3933737ba8, f=f@entry=0x4285b0 <f_call>, ud=ud@entry=0x7f39581fa370) at ldo.c:144
#37 0x000000000042d3de in luaD_pcall (L=L@entry=0x7f3933737ba8, func=func@entry=0x4285b0 <f_call>, u=u@entry=0x7f39581fa370, old_top=48, ef=<optimized out>) at ldo.c:953
#38 0x0000000000429d69 in lua_pcallk (L=L@entry=0x7f3933737ba8, nargs=nargs@entry=5, nresults=nresults@entry=0, errfunc=errfunc@entry=1, ctx=ctx@entry=0, k=k@entry=0x0) at lapi.c:1066
#39 0x00007f39547ddf9f in _cb (context=0x7f3959bd9480, ud=<optimized out>, type=6, session=0, source=0, msg=0x7f3905fc1340, sz=24) at lualib-src/lua-skynet.c:67
#40 0x00000000004209d7 in dispatch_message (ctx=ctx@entry=0x7f3959bd9480, msg=msg@entry=0x7f39581fa440) at skynet-src/skynet_server.c:275
#41 0x00000000004215d4 in skynet_context_message_dispatch (sm=sm@entry=0x7f3960008200, q=q@entry=0x7f39336c6800, weight=weight@entry=1) at skynet-src/skynet_server.c:335
#42 0x0000000000421d8d in thread_worker (p=<optimized out>) at skynet-src/skynet_start.c:163
#43 0x00007f39611e3ea5 in start_thread () from /lib64/libpthread.so.0
#44 0x00007f39605e8b0d in clone () from /lib64/libc.so.6
@cloudwu
Owner

cloudwu commented Dec 9, 2023

There isn't enough information to locate the problem. I suggest debugging it further yourself.

@Marskey

Marskey commented Jun 2, 2024

@yuanfengyun Did you find the cause? I'm currently running into the same problem.

@hanxi
Contributor

hanxi commented Jun 2, 2024

> @yuanfengyun Did you find the cause? I'm currently running into the same problem.

Could you provide test code that reproduces it?

@Marskey

Marskey commented Jun 3, 2024

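> > @yuanfengyun Did you find the cause? I'm currently running into the same problem.
>
> Could you provide test code that reproduces it?

It happened in our public-facing production environment; I haven't reproduced it on the internal network yet.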

@firedtoad

This failed during the handshake. It looks like a multithreading issue; try running it under TSan/ASan.
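A small, self-contained C model of the kind of race suspected here. Frame #2 of the first trace is ssleay_rand_add, which in OpenSSL 1.0.x updates the PRNG's shared static state. The sketch below is not OpenSSL code: the pool size, thread count, and all names are hypothetical. Several threads advance a static cursor into a fixed-size pool and wrap it. With the mutex, the cursor always stays in range. With `use_lock` set to 0, the read/advance/wrap sequence races, and ThreadSanitizer reports that kind of access when the file is built with `-fsanitize=thread`. One caveat: TSan only sees memory accesses in code compiled with that flag, so races inside a prebuilt system libcrypto.so will probably go unreported unless OpenSSL itself is rebuilt with it.

```c
#include <pthread.h>

/* Hypothetical sizes chosen for illustration only. */
#define POOL_SIZE 1023
#define CHUNK     20
#define NTHREADS  8
#define ITERS     100000

static int pool_index = 0;   /* shared static cursor into a POOL_SIZE pool */
static int use_lock = 1;     /* 0 reproduces the unsynchronized pattern */
static pthread_mutex_t pool_lock = PTHREAD_MUTEX_INITIALIZER;

/* Return the current cursor, then advance it by CHUNK and wrap at POOL_SIZE.
 * Read, advance, and wrap must happen atomically: without the lock another
 * thread can read pool_index after "+= CHUNK" but before the wrap, i.e. a
 * value >= POOL_SIZE, and index past the end of the pool. */
static int pool_reserve(void) {
    if (use_lock) pthread_mutex_lock(&pool_lock);
    int start = pool_index;
    pool_index += CHUNK;
    if (pool_index >= POOL_SIZE)
        pool_index %= POOL_SIZE;
    if (use_lock) pthread_mutex_unlock(&pool_lock);
    return start;
}

/* Each worker records the largest cursor value it was handed. */
static void *worker(void *arg) {
    int *max_seen = arg;
    for (int i = 0; i < ITERS; i++) {
        int start = pool_reserve();
        if (start > *max_seen)
            *max_seen = start;
    }
    return NULL;
}

/* Run NTHREADS workers and return the largest cursor any of them saw.
 * With locked != 0 the result is always below POOL_SIZE. To see a TSan
 * report, build with -fsanitize=thread -g -pthread and call it with 0. */
int run_cursor_demo(int locked) {
    pthread_t tids[NTHREADS];
    int max_seen[NTHREADS] = {0};
    int overall = 0;

    use_lock = locked;
    pool_index = 0;
    for (int i = 0; i < NTHREADS; i++)
        pthread_create(&tids[i], NULL, worker, &max_seen[i]);
    for (int i = 0; i < NTHREADS; i++) {
        pthread_join(tids[i], NULL);
        if (max_seen[i] > overall)
            overall = max_seen[i];
    }
    return overall;
}
```

With the lock held, the cursor runs through the values 20k mod 1023 in order. Because 20 and 1023 are coprime, every slot is visited, and the locked run returns POOL_SIZE - 1.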

@hanxi
Contributor

hanxi commented Jun 3, 2024

@Marskey What version of libssl are you using?

@Marskey

Marskey commented Jun 3, 2024

> @Marskey What version of libssl are you using?

@hanxi OpenSSL 1.0.1e-fips

@Marskey

Marskey commented Jun 3, 2024

#0  0x00007fb3410aba5c in ?? () from /usr/lib64/libcrypto.so.10
#1  0xca62c1d6ca62c1d6 in ?? ()
#2  0xca62c1d6ca62c1d6 in ?? ()
#3  0xca62c1d6ca62c1d6 in ?? ()
#4  0xca62c1d6ca62c1d6 in ?? ()
#5  0xca62c1d6ca62c1d6 in ?? ()
#6  0xca62c1d6ca62c1d6 in ?? ()
#7  0xca62c1d6ca62c1d6 in ?? ()
#8  0xca62c1d6ca62c1d6 in ?? ()
#9  0x00007fb34141680f in ?? () from /usr/lib64/libcrypto.so.10
#10 0x00007fb519a9ce40 in ?? ()
#11 0x0000000000000010 in ?? ()
#12 0x00007fb3410a8107 in SHA1_Update () from /usr/lib64/libcrypto.so.10
#13 0x00007fb34111c095 in ?? () from /usr/lib64/libcrypto.so.10
#14 0x00007fb3410e0538 in ?? () from /usr/lib64/libcrypto.so.10
#15 0x00007fb3410e01d8 in ?? () from /usr/lib64/libcrypto.so.10
#16 0x00007fb3410f5933 in EC_KEY_generate_key () from /usr/lib64/libcrypto.so.10
#17 0x00007fb341d9ba94 in ssl3_send_client_key_exchange () from /usr/lib64/libssl.so.10
#18 0x00007fb341d9c7a0 in ssl3_connect () from /usr/lib64/libssl.so.10
#19 0x00007fb341fe31d1 in _ltls_context_handshake (L=0x7fb4b0468288) at lualib-src/ltls.c:180
#20 0x00000000004165a1 in luaD_precall (L=L@entry=0x7fb4b0468288, func=func@entry=0x7fb3a6ffcda0, nresults=nresults@entry=1) at ldo.c:434
#21 0x00000000004218f7 in luaV_execute (L=L@entry=0x7fb4b0468288) at lvm.c:1125
#22 0x0000000000416330 in unroll (L=0x7fb4b0468288, ud=<optimized out>) at ldo.c:556
#23 0x0000000000415d0c in luaD_rawrunprotected (L=L@entry=0x7fb4b0468288, f=f@entry=0x416770 <resume>, ud=ud@entry=0x7fb57c3f5fbc) at ldo.c:142
#24 0x000000000041695f in lua_resume (L=L@entry=0x7fb4b0468288, from=from@entry=0x7fb4c053a808, nargs=nargs@entry=2) at ldo.c:664
#25 0x0000000000429aa7 in auxresume (L=L@entry=0x7fb4c053a808, co=co@entry=0x7fb4b0468288, narg=2) at lcorolib.c:39
#26 0x0000000000429dd7 in luaB_coresume (L=0x7fb4c053a808) at lcorolib.c:60
#27 0x00000000004165a1 in luaD_precall (L=L@entry=0x7fb4c053a808, func=func@entry=0x7fb530a4cc50, nresults=nresults@entry=-1) at ldo.c:434
#28 0x0000000000421656 in luaV_execute (L=L@entry=0x7fb4c053a808) at lvm.c:1141
#29 0x000000000041686f in luaD_call (L=L@entry=0x7fb4c053a808, func=<optimized out>, nResults=<optimized out>) at ldo.c:499
#30 0x00000000004168c1 in luaD_callnoyield (L=0x7fb4c053a808, func=<optimized out>, nResults=<optimized out>) at ldo.c:509
#31 0x0000000000415d0c in luaD_rawrunprotected (L=L@entry=0x7fb4c053a808, f=f@entry=0x412af0 <f_call>, ud=ud@entry=0x7fb57c3f62a0) at ldo.c:142
#32 0x0000000000416b8d in luaD_pcall (L=L@entry=0x7fb4c053a808, func=func@entry=0x412af0 <f_call>, u=u@entry=0x7fb57c3f62a0, old_top=176, ef=<optimized out>) at ldo.c:729
#33 0x0000000000413fdc in lua_pcallk (L=L@entry=0x7fb4c053a808, nargs=5, nresults=nresults@entry=-1, errfunc=errfunc@entry=0, ctx=ctx@entry=0, k=k@entry=0x428b30 <finishpcall>) at lapi.c:972
#34 0x0000000000428cc0 in luaB_pcall (L=0x7fb4c053a808) at lbaselib.c:424
#35 0x00000000004165a1 in luaD_precall (L=L@entry=0x7fb4c053a808, func=func@entry=0x7fb530a4ca90, nresults=nresults@entry=2) at ldo.c:434
#36 0x00000000004218f7 in luaV_execute (L=L@entry=0x7fb4c053a808) at lvm.c:1125
#37 0x000000000041686f in luaD_call (L=L@entry=0x7fb4c053a808, func=<optimized out>, nResults=<optimized out>) at ldo.c:499
#38 0x00000000004168c1 in luaD_callnoyield (L=0x7fb4c053a808, func=<optimized out>, nResults=<optimized out>) at ldo.c:509
#39 0x0000000000415d0c in luaD_rawrunprotected (L=L@entry=0x7fb4c053a808, f=f@entry=0x412af0 <f_call>, ud=ud@entry=0x7fb57c3f6540) at ldo.c:142
#40 0x0000000000416b8d in luaD_pcall (L=L@entry=0x7fb4c053a808, func=func@entry=0x412af0 <f_call>, u=u@entry=0x7fb57c3f6540, old_top=48, ef=<optimized out>) at ldo.c:729
#41 0x0000000000413fdc in lua_pcallk (L=L@entry=0x7fb4c053a808, nargs=nargs@entry=5, nresults=nresults@entry=0, errfunc=errfunc@entry=1, ctx=ctx@entry=0, k=k@entry=0x0) at lapi.c:972
#42 0x00007fb57a9dcdf9 in _cb (context=0x7fb490812780, ud=0x7fb4c053a808, type=6, session=0, source=0, msg=0x7faea2110060, sz=24) at lualib-src/lua-skynet.c:75
#43 0x000000000040bb25 in dispatch_message (ctx=ctx@entry=0x7fb490812780, msg=msg@entry=0x7fb57c3f6610) at skynet-src/skynet_server.c:274
#44 0x000000000040c6c0 in skynet_context_message_dispatch (sm=sm@entry=0x7fb57dc090e0, q=q@entry=0x7fb49081bd00, weight=weight@entry=-1) at skynet-src/skynet_server.c:334
#45 0x000000000040ce8d in thread_worker (p=<optimized out>) at skynet-src/skynet_start.c:163
#46 0x00007fb57edabd14 in start_thread (arg=0x7fb57c3f8700) at pthread_create.c:308
#47 0x00007fb57e1c0c4d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

This is my stack trace; it's much the same as the original poster's.

@hanxi
Contributor

hanxi commented Jun 3, 2024

In OpenSSL 1.0.2 (and earlier), applications had to provide their own integration with locking and threads, as documented in the threads.pod file. This page starts with the following unfortunate text:

https://www.openssl.org/blog/blog/2017/02/21/threads/index.html

@Marskey Try updating OpenSSL?

@Marskey

Marskey commented Jun 3, 2024

> In OpenSSL 1.0.2 (and earlier), applications had to provide their own integration with locking and threads, as documented in the threads.pod file. This page starts with the following unfortunate text:
>
> https://www.openssl.org/blog/blog/2017/02/21/threads/index.html
>
> @Marskey Try updating OpenSSL?

Sorry, my mistake: the production environment is actually on 1.1.1. I also found a related OpenSSL issue, openssl/openssl#12898, though I'm not sure whether it's connected.

@huahua132
Contributor

Try upgrading to 3.0 or later?

@Marskey

Marskey commented Jun 4, 2024

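Found the problem: it's caused by calling OpenSSL from multiple threads. For anyone investigating similar issues later: don't make HTTPS requests from multiple services.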

@huahua132
Contributor

How do you reproduce it? I tried starting several services that make HTTPS requests in a local VM, and the crash didn't happen.

@sniper00
Copy link
Contributor

sniper00 commented Jun 4, 2024

I'd suggest putting nginx in front as a forward/reverse proxy; that keeps things simple, stable, and fast.

@Marskey

Marskey commented Jun 4, 2024

@huahua132 For the receiving end, use test/simpleweb.lua with http changed to https. It reproduces with both OpenSSL 1.0.1f and 1.1.1.

local skynet = require "skynet"
local httpc = require "http.httpc"

local mode = ...
local host = "https://127.0.0.1:8001"

if mode == "sub" then
    skynet.start(function()
        skynet.dispatch("lua", function(_, _, cmd)
            if cmd == "init" then
                skynet.ret(skynet.pack(true))
            elseif cmd == "https" then
                -- form fields for the POST body (httpc.post's third argument)
                local form = {
                    time = os.time(),
                }
                -- loop forever; every sub service does this concurrently,
                -- so TLS handshakes run on several skynet worker threads
                while true do
                    local status, body = httpc.post(host, "/", form)
                    print(status)
                end
            end
        end)
    end)
else
    skynet.start(function()
        if not pcall(require, "ltls.c") then
            print "No ltls module, https is not supported"
            return
        end

        -- spawn 1000 client services and wait for each to report ready
        local subs = {}
        for _ = 1, 1000 do
            local sub = skynet.newservice(SERVICE_NAME, "sub")
            subs[sub] = skynet.call(sub, "lua", "init")
        end

        -- start the stress loop in every ready service
        for sub, ready in pairs(subs) do
            if ready then
                skynet.send(sub, "lua", "https")
            end
        end
    end)
end

@cloudwu
Owner

cloudwu commented Jun 4, 2024

@lvzixun Do we need to implement our own locking?

@davidhenrygao

davidhenrygao commented Jun 5, 2024

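I was going to open an issue about this, but since it's being discussed here, I'll report it here instead.
Our project hit it too: with OpenSSL 1.0.2n, sending HTTPS requests with httpc from multiple services (i.e. multiple threads) crashes.
The cause is that OpenSSL versions before 1.1 (the 1.0.x series) require the application to call these two APIs to install its own locking: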

CRYPTO_set_id_callback
CRYPTO_set_locking_callback

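ltls currently does not call them, so for projects on OpenSSL 1.0 the corresponding lock operations are no-ops. Under concurrent use from multiple threads, a static index into a static array can then exceed its bound, which eventually leads to an invalid memory access and a crash. See "OpenSSL and Threads".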
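In our production testing, neither OpenSSL 1.1 nor OpenSSL 1.0 with the locks added has crashed so far. Also, on some machines OpenSSL 1.0 without locks doesn't crash either; I haven't looked into why. I won't paste the core-dump stack trace here, but I can provide it if needed.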
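Personally, I think it would be better to state explicitly in a code comment or the wiki that OpenSSL versions below 1.1 are not supported. The deprecated compatibility APIs in the code, such as SSLv23_method(), could then be replaced with TLS_method().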


8 participants