libreoffice-online/net/SslSocket.hpp

473 lines
15 KiB
C++
Raw Normal View History

/* -*- Mode: C++; tab-width: 4; indent-tabs-mode: nil; c-basic-offset: 4; fill-column: 100 -*- */
/*
* This Source Code Form is subject to the terms of the Mozilla Public
* License, v. 2.0. If a copy of the MPL was not distributed with this
* file, You can obtain one at http://mozilla.org/MPL/2.0/.
*/
#pragma once
#include <cerrno>
#include <memory>
wsd: reduce debug and generally redundant logs In 'debug' log-level we expect a detailed, but still readable output. Having one area with disproportionately large number of logs reduces the overall utility of the log output. This patch reduces a number of redundant log entries, including errors that are already logged. It also reduces the level of some others from 'information' to 'debug' and from 'debug' to 'trace'. The goal is to make 'debug' level as useful as possible to read the progress and be able to understand what was going on, such that one is able to decide which area to dig deeper into. Then, trace level could be used to get more insight into that area, if necessary. For example, when investigating a test failure, one first enables 'debug' logs and reads through. Once a section between two debug entries is identified as being of interest, enabling 'trace' level logs becomes more productive as it's now possible to easily reach the first DBG entry and read through until the second one. It's unfortunate that we don't have per-area control for enabling/disabling logs, so it is common to see more and more 'debug' log entries added all around, making logs less and less readable. It is also a limitation of the levels we have that we really only have 3 usable levels: one, two, many. That is, 'information' for the most important events, 'debug' for technical details needed to investigate issues, and 'trace' for everything else. ('warning' and 'error' aren't really 'levels'; they have semantics that makes them special-cases.) So we have to avoid degrading one into the other, or have differences without distinction. If any of these entries are needed to be displayed more frequently, changing them back to 'debug' or even 'information' should be done. Though for me they seem special cases that don't benefit most log readings. Change-Id: Id2c6a9dc027483b81a066b0b4b50a298c5eff449 Signed-off-by: Ashod Nakashian <ashod.nakashian@collabora.co.uk>
2022-02-19 07:30:26 -06:00
#include <sstream>
#include <string>
wsd: reduce polling when blocked on ssl handshake While SSL is handshaking, there can be no general application data communication. During that early stage of connecting we have data to send (the request, headers, etc.) and so we poll on POLLOUT. Naturally, we also always want to poll on POLLIN, because we can never know when there is data to read (especially true for web-sockets). The problem is when SSL will not send data just yet because it is handshaking. It is typically waiting for handshake negotiation data to read, so when we POLLOUT, poll immediately returns, but writing (via SSL_write) fails with WANTS_READ error. This goes on in a busy-loop until the negotiation data is available for read and the handshake is completed. Very inefficient. The solution is to poll on whatever SSL needs during the handshake, exclusively. Once the handshake is complete, we poll on whatever we need. However, SSL can renegotiate at any time, so we also merge with what it needs. In addition, we avoid the unnecessary read when poll doesn't give us POLLIN in revents, since the read will more likely than not fail (except in the rare case when data becomes available in the interim). Notice that SSL_read will return SSL_WANTS_READ when there is no data, which is misleading (since SSL isn't in need of data to read at all, nor are we, for that matter). Best not to do noisy reads unnecessarily. These changes are disabled by default and can be enabled via the experimental_features option. Change-Id: I6a7ed7d871ed257b30062cc720a8b8c7acbab3b7 Signed-off-by: Ashod Nakashian <ashod.nakashian@collabora.co.uk>
2022-03-16 06:13:02 -05:00
#include "Common.hpp"
#include "Ssl.hpp"
#include "Socket.hpp"
/// An SSL/TSL, non-blocking, data streaming socket.
class SslStreamSocket final : public StreamSocket
{
public:
SslStreamSocket(const std::string& host, const int fd, bool isClient,
std::shared_ptr<ProtocolHandlerInterface> responseClient,
ReadType readType = NormalRead)
: StreamSocket(host, fd, isClient, std::move(responseClient), readType)
, _bio(nullptr)
, _ssl(nullptr)
, _sslWantsTo(SslWantsTo::Neither)
, _doHandshake(true)
{
wsd: reduce debug and generally redundant logs In 'debug' log-level we expect a detailed, but still readable output. Having one area with disproportionately large number of logs reduces the overall utility of the log output. This patch reduces a number of redundant log entries, including errors that are already logged. It also reduces the level of some others from 'information' to 'debug' and from 'debug' to 'trace'. The goal is to make 'debug' level as useful as possible to read the progress and be able to understand what was going on, such that one is able to decide which area to dig deeper into. Then, trace level could be used to get more insight into that area, if necessary. For example, when investigating a test failure, one first enables 'debug' logs and reads through. Once a section between two debug entries is identified as being of interest, enabling 'trace' level logs becomes more productive as it's now possible to easily reach the first DBG entry and read through until the second one. It's unfortunate that we don't have per-area control for enabling/disabling logs, so it is common to see more and more 'debug' log entries added all around, making logs less and less readable. It is also a limitation of the levels we have that we really only have 3 usable levels: one, two, many. That is, 'information' for the most important events, 'debug' for technical details needed to investigate issues, and 'trace' for everything else. ('warning' and 'error' aren't really 'levels'; they have semantics that makes them special-cases.) So we have to avoid degrading one into the other, or have differences without distinction. If any of these entries are needed to be displayed more frequently, changing them back to 'debug' or even 'information' should be done. Though for me they seem special cases that don't benefit most log readings. Change-Id: Id2c6a9dc027483b81a066b0b4b50a298c5eff449 Signed-off-by: Ashod Nakashian <ashod.nakashian@collabora.co.uk>
2022-02-19 07:30:26 -06:00
LOG_TRC("SslStreamSocket ctor #" << fd);
_bio = BIO_new(BIO_s_socket());
if (_bio == nullptr)
{
throw std::runtime_error("Failed to create SSL BIO.");
}
BIO_set_fd(_bio, fd, BIO_NOCLOSE);
_ssl = isClient ? ssl::Manager::newClientSsl(_verification)
: ssl::Manager::newServerSsl(_verification);
if (!_ssl)
{
BIO_free(_bio);
_bio = nullptr;
throw std::runtime_error("Failed to create SSL.");
}
if (!hostname().empty())
{
if (!SSL_set_tlsext_host_name(_ssl, hostname().c_str()))
LOG_WRN("Failed to set hostname for Server Name Indication [" << hostname() << ']');
else
LOG_TRC("Set [" << hostname() << "] as TLS hostname.");
}
SSL_set_bio(_ssl, _bio, _bio);
if (isClient)
{
LOG_TRC("Setting SSL into connect state");
SSL_set_connect_state(_ssl);
if (SSL_connect(_ssl) == 0)
LOG_DBG("SslStreamSocket connect #" << getFD() << " failed ");
// else -1 is quite possibly SSL_ERROR_WANT_READ
}
else // We are a server-side socket.
{
LOG_TRC("Setting SSL into accept state");
SSL_set_accept_state(_ssl);
}
}
~SslStreamSocket()
{
wsd: reduce debug and generally redundant logs In 'debug' log-level we expect a detailed, but still readable output. Having one area with disproportionately large number of logs reduces the overall utility of the log output. This patch reduces a number of redundant log entries, including errors that are already logged. It also reduces the level of some others from 'information' to 'debug' and from 'debug' to 'trace'. The goal is to make 'debug' level as useful as possible to read the progress and be able to understand what was going on, such that one is able to decide which area to dig deeper into. Then, trace level could be used to get more insight into that area, if necessary. For example, when investigating a test failure, one first enables 'debug' logs and reads through. Once a section between two debug entries is identified as being of interest, enabling 'trace' level logs becomes more productive as it's now possible to easily reach the first DBG entry and read through until the second one. It's unfortunate that we don't have per-area control for enabling/disabling logs, so it is common to see more and more 'debug' log entries added all around, making logs less and less readable. It is also a limitation of the levels we have that we really only have 3 usable levels: one, two, many. That is, 'information' for the most important events, 'debug' for technical details needed to investigate issues, and 'trace' for everything else. ('warning' and 'error' aren't really 'levels'; they have semantics that makes them special-cases.) So we have to avoid degrading one into the other, or have differences without distinction. If any of these entries are needed to be displayed more frequently, changing them back to 'debug' or even 'information' should be done. Though for me they seem special cases that don't benefit most log readings. Change-Id: Id2c6a9dc027483b81a066b0b4b50a298c5eff449 Signed-off-by: Ashod Nakashian <ashod.nakashian@collabora.co.uk>
2022-02-19 07:30:26 -06:00
LOG_TRC("SslStreamSocket dtor #" << getFD());
if (!isShutdownSignalled())
{
setShutdownSignalled();
SslStreamSocket::closeConnection();
}
SSL_free(_ssl);
}
/// Returns the servername per the SSL, as set by
/// SSL_set_tlsext_host_name, if called. For testing.
std::string getSslServername() const
{
const int type = SSL_get_servername_type(_ssl);
if (type != -1)
{
const char* name = SSL_get_servername(_ssl, type);
if (name)
{
return std::string(name);
}
}
return std::string();
}
/// Shutdown the TLS/SSL connection properly.
void closeConnection() override
{
LOG_DBG("SslStreamSocket::closeConnection() #" << getFD());
if (SSL_shutdown(_ssl) == 0)
{
// Complete the bidirectional shutdown.
SSL_shutdown(_ssl);
}
// Close the TCP Socket.
Socket::shutdown();
}
int readIncomingData() override
{
ASSERT_CORRECT_SOCKET_THREAD(this);
const int rc = doHandshake();
if (rc <= 0)
wsd: return SSL_do_handshake error-codes unmodified For some reason we only checked for the closed socket case as failure and returned 0 in that case, for error cases, we returned 1. Likely the API had been modified in the early days, but this return code was left lopsided. This meant that even when the handshake failed, we still called readIncomingData or writeOutgoindData, depending on whether we wanted to read or write, causing a rare race-condition. When a client (HTTP request) connects to a server, it needs to send the request, right after the SSL handshake. SSL_do_handshake could need data from the socket to complete the handshake. In such a case it returns WANT_READ. Unfortunately, because we always called SSL_read, the missing data could have arrived between the SSL_do_handshake call and the SSL_read call (a rather short duration, to be sure, but an open window all the same). SSL_read would of course read said data from the socket and, since it still needs to finish the handshake, will buffer it. It then returns the very same error that the SSL_do_handshake returned: WANT_READ. Of course we will oblige by polling with POLLIN, which will time out (there is no more data to come, and the server is waiting for *our* request and has nothing to send us). The only way this deadlock could break if SSL_do_handshake was called (which will consume the buffered data, return 1 to indicate handshake has completed). Since we wouldn't call it unless and until we get POLLIN, per WANT_READ, which won't happen in this case. And since SSL_read doesn't call SSL_do_handshake either, the request times out and that's the end of it. The fix is to not call SSL_read when the handshake isn't complete and needs more data, which we do now. Fortunately, we have very few SSL clients, outside of unit-tests. Most notably the WOPI client. But even then it's not a heavily used connection and might not even be SSL-enabled (for LAN servers). Change-Id: I04fd3dae151904194f3d7579dbf8c671b2580ffb Signed-off-by: Ashod Nakashian <ashod.nakashian@collabora.co.uk>
2023-04-15 09:27:11 -05:00
{
// 0 means shutdown.
return rc;
wsd: return SSL_do_handshake error-codes unmodified For some reason we only checked for the closed socket case as failure and returned 0 in that case, for error cases, we returned 1. Likely the API had been modified in the early days, but this return code was left lopsided. This meant that even when the handshake failed, we still called readIncomingData or writeOutgoindData, depending on whether we wanted to read or write, causing a rare race-condition. When a client (HTTP request) connects to a server, it needs to send the request, right after the SSL handshake. SSL_do_handshake could need data from the socket to complete the handshake. In such a case it returns WANT_READ. Unfortunately, because we always called SSL_read, the missing data could have arrived between the SSL_do_handshake call and the SSL_read call (a rather short duration, to be sure, but an open window all the same). SSL_read would of course read said data from the socket and, since it still needs to finish the handshake, will buffer it. It then returns the very same error that the SSL_do_handshake returned: WANT_READ. Of course we will oblige by polling with POLLIN, which will time out (there is no more data to come, and the server is waiting for *our* request and has nothing to send us). The only way this deadlock could break if SSL_do_handshake was called (which will consume the buffered data, return 1 to indicate handshake has completed). Since we wouldn't call it unless and until we get POLLIN, per WANT_READ, which won't happen in this case. And since SSL_read doesn't call SSL_do_handshake either, the request times out and that's the end of it. The fix is to not call SSL_read when the handshake isn't complete and needs more data, which we do now. Fortunately, we have very few SSL clients, outside of unit-tests. Most notably the WOPI client. But even then it's not a heavily used connection and might not even be SSL-enabled (for LAN servers). Change-Id: I04fd3dae151904194f3d7579dbf8c671b2580ffb Signed-off-by: Ashod Nakashian <ashod.nakashian@collabora.co.uk>
2023-04-15 09:27:11 -05:00
}
// Default implementation.
return StreamSocket::readIncomingData();
}
int writeOutgoingData() override
{
ASSERT_CORRECT_SOCKET_THREAD(this);
const int rc = doHandshake();
if (rc <= 0)
{
wsd: return SSL_do_handshake error-codes unmodified For some reason we only checked for the closed socket case as failure and returned 0 in that case, for error cases, we returned 1. Likely the API had been modified in the early days, but this return code was left lopsided. This meant that even when the handshake failed, we still called readIncomingData or writeOutgoindData, depending on whether we wanted to read or write, causing a rare race-condition. When a client (HTTP request) connects to a server, it needs to send the request, right after the SSL handshake. SSL_do_handshake could need data from the socket to complete the handshake. In such a case it returns WANT_READ. Unfortunately, because we always called SSL_read, the missing data could have arrived between the SSL_do_handshake call and the SSL_read call (a rather short duration, to be sure, but an open window all the same). SSL_read would of course read said data from the socket and, since it still needs to finish the handshake, will buffer it. It then returns the very same error that the SSL_do_handshake returned: WANT_READ. Of course we will oblige by polling with POLLIN, which will time out (there is no more data to come, and the server is waiting for *our* request and has nothing to send us). The only way this deadlock could break if SSL_do_handshake was called (which will consume the buffered data, return 1 to indicate handshake has completed). Since we wouldn't call it unless and until we get POLLIN, per WANT_READ, which won't happen in this case. And since SSL_read doesn't call SSL_do_handshake either, the request times out and that's the end of it. The fix is to not call SSL_read when the handshake isn't complete and needs more data, which we do now. Fortunately, we have very few SSL clients, outside of unit-tests. Most notably the WOPI client. But even then it's not a heavily used connection and might not even be SSL-enabled (for LAN servers). Change-Id: I04fd3dae151904194f3d7579dbf8c671b2580ffb Signed-off-by: Ashod Nakashian <ashod.nakashian@collabora.co.uk>
2023-04-15 09:27:11 -05:00
// 0 means shutdown.
return rc;
}
// Default implementation.
return StreamSocket::writeOutgoingData();
}
virtual int readData(char* buf, int len) override
{
ASSERT_CORRECT_SOCKET_THREAD(this);
// avoided in readIncomingData
if (ignoringInput())
return -1;
#if ENABLE_DEBUG
if (simulateSocketError(true))
return -1;
#endif
return handleSslState(SSL_read(_ssl, buf, len), "read");
}
virtual int writeData(const char* buf, const int len) override
{
ASSERT_CORRECT_SOCKET_THREAD(this);
assert (len > 0); // Never write 0 bytes.
#if ENABLE_DEBUG
if (simulateSocketError(false))
return -1;
#endif
return handleSslState(SSL_write(_ssl, buf, len), "write");
}
int getPollEvents(std::chrono::steady_clock::time_point now,
int64_t & timeoutMaxMicroS) override
{
ASSERT_CORRECT_SOCKET_THREAD(this);
wsd: reduce polling when blocked on ssl handshake While SSL is handshaking, there can be no general application data communication. During that early stage of connecting we have data to send (the request, headers, etc.) and so we poll on POLLOUT. Naturally, we also always want to poll on POLLIN, because we can never know when there is data to read (especially true for web-sockets). The problem is when SSL will not send data just yet because it is handshaking. It is typically waiting for handshake negotiation data to read, so when we POLLOUT, poll immediately returns, but writing (via SSL_write) fails with WANTS_READ error. This goes on in a busy-loop until the negotiation data is available for read and the handshake is completed. Very inefficient. The solution is to poll on whatever SSL needs during the handshake, exclusively. Once the handshake is complete, we poll on whatever we need. However, SSL can renegotiate at any time, so we also merge with what it needs. In addition, we avoid the unnecessary read when poll doesn't give us POLLIN in revents, since the read will more likely than not fail (except in the rare case when data becomes available in the interim). Notice that SSL_read will return SSL_WANTS_READ when there is no data, which is misleading (since SSL isn't in need of data to read at all, nor are we, for that matter). Best not to do noisy reads unnecessarily. These changes are disabled by default and can be enabled via the experimental_features option. Change-Id: I6a7ed7d871ed257b30062cc720a8b8c7acbab3b7 Signed-off-by: Ashod Nakashian <ashod.nakashian@collabora.co.uk>
2022-03-16 06:13:02 -05:00
// We cannot communicate with a client when SSL wants
// to negotiate the handshake. Here, we give priority to
// SSL's needs. Only when SSL is done negotiating
// (i.e. wants neither to read, nor to write) do we see
// what data activity we have.
if (_doHandshake)
wsd: reduce polling when blocked on ssl handshake While SSL is handshaking, there can be no general application data communication. During that early stage of connecting we have data to send (the request, headers, etc.) and so we poll on POLLOUT. Naturally, we also always want to poll on POLLIN, because we can never know when there is data to read (especially true for web-sockets). The problem is when SSL will not send data just yet because it is handshaking. It is typically waiting for handshake negotiation data to read, so when we POLLOUT, poll immediately returns, but writing (via SSL_write) fails with WANTS_READ error. This goes on in a busy-loop until the negotiation data is available for read and the handshake is completed. Very inefficient. The solution is to poll on whatever SSL needs during the handshake, exclusively. Once the handshake is complete, we poll on whatever we need. However, SSL can renegotiate at any time, so we also merge with what it needs. In addition, we avoid the unnecessary read when poll doesn't give us POLLIN in revents, since the read will more likely than not fail (except in the rare case when data becomes available in the interim). Notice that SSL_read will return SSL_WANTS_READ when there is no data, which is misleading (since SSL isn't in need of data to read at all, nor are we, for that matter). Best not to do noisy reads unnecessarily. These changes are disabled by default and can be enabled via the experimental_features option. Change-Id: I6a7ed7d871ed257b30062cc720a8b8c7acbab3b7 Signed-off-by: Ashod Nakashian <ashod.nakashian@collabora.co.uk>
2022-03-16 06:13:02 -05:00
{
if (_sslWantsTo == SslWantsTo::Write)
{
return POLLOUT;
}
if (_sslWantsTo == SslWantsTo::Read)
{
return POLLIN;
}
// The first call will have SslWantsTo::Neither and
// we will fall-through and call the base.
}
int events = StreamSocket::getPollEvents(now, timeoutMaxMicroS); // Default to base.
if (_sslWantsTo == SslWantsTo::Write) // If OpenSSL wants to write (and we don't).
events |= POLLOUT;
return events;
}
private:
/// The possible next I/O operation that SSL want to do.
enum class SslWantsTo
{
Neither,
Read,
Write
};
int doHandshake()
{
ASSERT_CORRECT_SOCKET_THREAD(this);
if (_doHandshake)
{
int rc;
do
{
rc = SSL_do_handshake(_ssl);
}
while (rc < 0 && errno == EINTR);
if (rc <= 0)
{
rc = handleSslState(rc, "handshake");
if (rc <= 0)
wsd: return SSL_do_handshake error-codes unmodified For some reason we only checked for the closed socket case as failure and returned 0 in that case, for error cases, we returned 1. Likely the API had been modified in the early days, but this return code was left lopsided. This meant that even when the handshake failed, we still called readIncomingData or writeOutgoindData, depending on whether we wanted to read or write, causing a rare race-condition. When a client (HTTP request) connects to a server, it needs to send the request, right after the SSL handshake. SSL_do_handshake could need data from the socket to complete the handshake. In such a case it returns WANT_READ. Unfortunately, because we always called SSL_read, the missing data could have arrived between the SSL_do_handshake call and the SSL_read call (a rather short duration, to be sure, but an open window all the same). SSL_read would of course read said data from the socket and, since it still needs to finish the handshake, will buffer it. It then returns the very same error that the SSL_do_handshake returned: WANT_READ. Of course we will oblige by polling with POLLIN, which will time out (there is no more data to come, and the server is waiting for *our* request and has nothing to send us). The only way this deadlock could break if SSL_do_handshake was called (which will consume the buffered data, return 1 to indicate handshake has completed). Since we wouldn't call it unless and until we get POLLIN, per WANT_READ, which won't happen in this case. And since SSL_read doesn't call SSL_do_handshake either, the request times out and that's the end of it. The fix is to not call SSL_read when the handshake isn't complete and needs more data, which we do now. Fortunately, we have very few SSL clients, outside of unit-tests. Most notably the WOPI client. But even then it's not a heavily used connection and might not even be SSL-enabled (for LAN servers). Change-Id: I04fd3dae151904194f3d7579dbf8c671b2580ffb Signed-off-by: Ashod Nakashian <ashod.nakashian@collabora.co.uk>
2023-04-15 09:27:11 -05:00
{
// 0 means shutdown.
return rc;
}
}
if (rc == 1)
{
// Successful handshake; TLS/SSL connection established.
wsd: reduce polling when blocked on ssl handshake While SSL is handshaking, there can be no general application data communication. During that early stage of connecting we have data to send (the request, headers, etc.) and so we poll on POLLOUT. Naturally, we also always want to poll on POLLIN, because we can never know when there is data to read (especially true for web-sockets). The problem is when SSL will not send data just yet because it is handshaking. It is typically waiting for handshake negotiation data to read, so when we POLLOUT, poll immediately returns, but writing (via SSL_write) fails with WANTS_READ error. This goes on in a busy-loop until the negotiation data is available for read and the handshake is completed. Very inefficient. The solution is to poll on whatever SSL needs during the handshake, exclusively. Once the handshake is complete, we poll on whatever we need. However, SSL can renegotiate at any time, so we also merge with what it needs. In addition, we avoid the unnecessary read when poll doesn't give us POLLIN in revents, since the read will more likely than not fail (except in the rare case when data becomes available in the interim). Notice that SSL_read will return SSL_WANTS_READ when there is no data, which is misleading (since SSL isn't in need of data to read at all, nor are we, for that matter). Best not to do noisy reads unnecessarily. These changes are disabled by default and can be enabled via the experimental_features option. Change-Id: I6a7ed7d871ed257b30062cc720a8b8c7acbab3b7 Signed-off-by: Ashod Nakashian <ashod.nakashian@collabora.co.uk>
2022-03-16 06:13:02 -05:00
LOG_TRC("SSL handshake completed successfully");
_doHandshake = false;
_sslWantsTo = SslWantsTo::Neither; // Reset until we are told otherwise.
wsd: reduce polling when blocked on ssl handshake While SSL is handshaking, there can be no general application data communication. During that early stage of connecting we have data to send (the request, headers, etc.) and so we poll on POLLOUT. Naturally, we also always want to poll on POLLIN, because we can never know when there is data to read (especially true for web-sockets). The problem is when SSL will not send data just yet because it is handshaking. It is typically waiting for handshake negotiation data to read, so when we POLLOUT, poll immediately returns, but writing (via SSL_write) fails with WANTS_READ error. This goes on in a busy-loop until the negotiation data is available for read and the handshake is completed. Very inefficient. The solution is to poll on whatever SSL needs during the handshake, exclusively. Once the handshake is complete, we poll on whatever we need. However, SSL can renegotiate at any time, so we also merge with what it needs. In addition, we avoid the unnecessary read when poll doesn't give us POLLIN in revents, since the read will more likely than not fail (except in the rare case when data becomes available in the interim). Notice that SSL_read will return SSL_WANTS_READ when there is no data, which is misleading (since SSL isn't in need of data to read at all, nor are we, for that matter). Best not to do noisy reads unnecessarily. These changes are disabled by default and can be enabled via the experimental_features option. Change-Id: I6a7ed7d871ed257b30062cc720a8b8c7acbab3b7 Signed-off-by: Ashod Nakashian <ashod.nakashian@collabora.co.uk>
2022-03-16 06:13:02 -05:00
if (!verifyCertificate())
{
LOG_WRN("Failed to verify the certificate of [" << hostname() << ']');
closeConnection();
wsd: return SSL_do_handshake error-codes unmodified For some reason we only checked for the closed socket case as failure and returned 0 in that case, for error cases, we returned 1. Likely the API had been modified in the early days, but this return code was left lopsided. This meant that even when the handshake failed, we still called readIncomingData or writeOutgoindData, depending on whether we wanted to read or write, causing a rare race-condition. When a client (HTTP request) connects to a server, it needs to send the request, right after the SSL handshake. SSL_do_handshake could need data from the socket to complete the handshake. In such a case it returns WANT_READ. Unfortunately, because we always called SSL_read, the missing data could have arrived between the SSL_do_handshake call and the SSL_read call (a rather short duration, to be sure, but an open window all the same). SSL_read would of course read said data from the socket and, since it still needs to finish the handshake, will buffer it. It then returns the very same error that the SSL_do_handshake returned: WANT_READ. Of course we will oblige by polling with POLLIN, which will time out (there is no more data to come, and the server is waiting for *our* request and has nothing to send us). The only way this deadlock could break if SSL_do_handshake was called (which will consume the buffered data, return 1 to indicate handshake has completed). Since we wouldn't call it unless and until we get POLLIN, per WANT_READ, which won't happen in this case. And since SSL_read doesn't call SSL_do_handshake either, the request times out and that's the end of it. The fix is to not call SSL_read when the handshake isn't complete and needs more data, which we do now. Fortunately, we have very few SSL clients, outside of unit-tests. Most notably the WOPI client. But even then it's not a heavily used connection and might not even be SSL-enabled (for LAN servers). Change-Id: I04fd3dae151904194f3d7579dbf8c671b2580ffb Signed-off-by: Ashod Nakashian <ashod.nakashian@collabora.co.uk>
2023-04-15 09:27:11 -05:00
return 0; // Connection is closed.
}
}
wsd: return SSL_do_handshake error-codes unmodified For some reason we only checked for the closed socket case as failure and returned 0 in that case, for error cases, we returned 1. Likely the API had been modified in the early days, but this return code was left lopsided. This meant that even when the handshake failed, we still called readIncomingData or writeOutgoindData, depending on whether we wanted to read or write, causing a rare race-condition. When a client (HTTP request) connects to a server, it needs to send the request, right after the SSL handshake. SSL_do_handshake could need data from the socket to complete the handshake. In such a case it returns WANT_READ. Unfortunately, because we always called SSL_read, the missing data could have arrived between the SSL_do_handshake call and the SSL_read call (a rather short duration, to be sure, but an open window all the same). SSL_read would of course read said data from the socket and, since it still needs to finish the handshake, will buffer it. It then returns the very same error that the SSL_do_handshake returned: WANT_READ. Of course we will oblige by polling with POLLIN, which will time out (there is no more data to come, and the server is waiting for *our* request and has nothing to send us). The only way this deadlock could break if SSL_do_handshake was called (which will consume the buffered data, return 1 to indicate handshake has completed). Since we wouldn't call it unless and until we get POLLIN, per WANT_READ, which won't happen in this case. And since SSL_read doesn't call SSL_do_handshake either, the request times out and that's the end of it. The fix is to not call SSL_read when the handshake isn't complete and needs more data, which we do now. Fortunately, we have very few SSL clients, outside of unit-tests. Most notably the WOPI client. But even then it's not a heavily used connection and might not even be SSL-enabled (for LAN servers). Change-Id: I04fd3dae151904194f3d7579dbf8c671b2580ffb Signed-off-by: Ashod Nakashian <ashod.nakashian@collabora.co.uk>
2023-04-15 09:27:11 -05:00
else
{
LOG_ERR("Unexpected return code from SSL_do_handshake: " << rc);
return rc;
}
}
// Handshake complete.
return 1;
}
/// Verify the peer's certificate.
/// Return true iff the certificate matches the hostname.
bool verifyCertificate();
/// Handles the state of SSL after read or write.
int handleSslState(const int rc, const char* context)
{
const auto last_errno = errno;
ASSERT_CORRECT_SOCKET_THREAD(this);
if (rc > 0)
{
// Success: Reset so we can do either.
_sslWantsTo = SslWantsTo::Neither;
return rc;
}
// Last operation failed. Find out if SSL was trying
// to do something different that failed, or not.
const int sslError = SSL_get_error(_ssl, rc);
switch (sslError)
{
case SSL_ERROR_ZERO_RETURN:
// Shutdown complete, we're disconnected.
LOG_TRC("SSL error (" << context << "): ZERO_RETURN (" << sslError
<< "): " << getBioError(rc));
errno = last_errno; // Restore errno.
return 0;
case SSL_ERROR_WANT_READ:
#if OPENSSL_VERSION_NUMBER > 0x10100000L
LOG_TRC("SSL error (" << context << "): WANT_READ (" << sslError << ") has "
<< (SSL_has_pending(_ssl) ? "" : "no")
<< " pending data to read: " << SSL_pending(_ssl) << ". "
<< getBioError(rc));
#else
LOG_TRC("SSL error (" << context << "): WANT_READ (" << sslError << ").");
#endif
_sslWantsTo = SslWantsTo::Read;
errno = last_errno; // Restore errno.
return rc;
case SSL_ERROR_WANT_WRITE:
#if OPENSSL_VERSION_NUMBER > 0x10100000L
LOG_TRC("SSL error (" << context << "): WANT_WRITE (" << sslError << ") has "
<< (SSL_has_pending(_ssl) ? "" : "no")
<< " pending data to read: " << SSL_pending(_ssl) << ". "
<< getBioError(rc));
#else
LOG_TRC("SSL error: WANT_WRITE (" << sslError << ").");
#endif
_sslWantsTo = SslWantsTo::Write;
errno = last_errno; // Restore errno.
return rc;
case SSL_ERROR_WANT_CONNECT:
LOG_TRC("SSL error (" << context << "): WANT_CONNECT (" << sslError
<< "): " << getBioError(rc));
errno = last_errno; // Restore errno.
return rc;
case SSL_ERROR_WANT_ACCEPT:
LOG_TRC("SSL error (" << context << "): WANT_ACCEPT (" << sslError
<< "): " << getBioError(rc));
errno = last_errno; // Restore errno.
return rc;
case SSL_ERROR_WANT_X509_LOOKUP:
// Unexpected.
LOG_TRC("SSL error (" << context << "): WANT_X509_LOOKUP (" << sslError
<< "): " << getBioError(rc));
errno = last_errno; // Restore errno.
return rc;
case SSL_ERROR_SYSCALL:
if (last_errno != 0)
{
// Posix API error, let the caller handle.
LOG_TRC("SSL error (" << context << "): SYSCALL error " << sslError << " ("
<< Util::symbolicErrno(last_errno) << ": "
<< std::strerror(last_errno) << "): " << getBioError(rc));
errno = last_errno; // Restore errno.
return rc;
}
// Fallthrough...
default:
{
// Effectively an EAGAIN error at the BIO layer
if (BIO_should_retry(_bio))
{
#if OPENSSL_VERSION_NUMBER > 0x10100000L
LOG_TRC("BIO asks for retry - underlying EAGAIN? ("
<< context << "): " << SSL_get_error(_ssl, rc) << " has_pending "
<< SSL_has_pending(_ssl) << " bytes: " << SSL_pending(_ssl) << ". "
<< getBioError(rc));
#else
LOG_TRC("BIO asks for retry - underlying EAGAIN? " << SSL_get_error(_ssl, rc)
<< ". " << getBioError(rc));
#endif
errno = last_errno ? last_errno : EAGAIN; // Restore errno.
return -1; // poll is used to detect real errors.
}
if (sslError == SSL_ERROR_SSL)
LOG_TRC("SSL error (" << context << "): SSL (" << sslError << ") "
<< getBioError(rc));
else if (sslError == SSL_ERROR_SYSCALL)
LOG_TRC("SSL error (" << context << "): SYSCALL (" << sslError << ") "
<< getBioError(rc));
#if OPENSSL_VERSION_NUMBER > 0x10100000L
else if (sslError == SSL_ERROR_WANT_ASYNC)
LOG_TRC("SSL error (" << context << "): WANT_ASYNC (" << sslError << ") "
<< getBioError(rc));
else if (sslError == SSL_ERROR_WANT_ASYNC_JOB)
LOG_TRC("SSL error (" << context << "): WANT_ASYNC_JOB (" << sslError << ") "
<< getBioError(rc));
#endif
else
LOG_TRC("SSL error (" << context << "): UNKNOWN (" << sslError << ") "
<< getBioError(rc));
// The error is coming from BIO. Find out what happened.
const long bioError = ERR_peek_error();
std::ostringstream oss;
oss << '#' << getFD();
if (bioError == 0)
{
if (rc == 0)
{
// Socket closed. Not an error.
oss << " (" << context << "): closed. " << getBioError(rc);
LOG_INF(oss.str());
errno = last_errno; // Restore errno.
return 0;
}
else if (rc == -1)
{
oss << " (" << context << "): socket closed unexpectedly. ";
}
else
{
oss << " (" << context << "): unknown. ";
}
}
else
{
oss << " (" << context << "): unknown. ";
}
oss << getBioError(rc);
const std::string msg = oss.str();
LOG_TRC("Throwing SSL Error ("
<< context << "): " << msg); // Locate the source of the exception.
errno = last_errno; // Restore errno.
throw std::runtime_error(msg);
}
break;
}
errno = last_errno; // Restore errno.
return rc;
}
std::string getBioError(const int rc) const
{
// The error is coming from BIO. Find out what happened.
const long bioError = ERR_peek_error();
std::ostringstream oss;
oss << "BIO error: " << bioError << ", rc: " << rc;
char buf[512];
ERR_error_string_n(bioError, buf, sizeof(buf));
oss << ": " << buf << ':';
auto cb = [](const char* str, size_t len, void* u) -> int
{
std::ostringstream& os = *reinterpret_cast<std::ostringstream*>(u);
os << '\n' << std::string(str, len);
return 1; // Apparently 0 means failure here.
};
ERR_print_errors_cb(cb, &oss);
return oss.str();
}
private:
BIO* _bio;
SSL* _ssl;
ssl::CertificateVerification _verification; //< The certificate verification requirement.
/// During handshake SSL might want to read
/// on write, or write on read.
SslWantsTo _sslWantsTo;
/// We must do the handshake during the first
/// read or write in non-blocking.
bool _doHandshake;
};
/* vim:set shiftwidth=4 softtabstop=4 expandtab: */