-
Notifications
You must be signed in to change notification settings - Fork 5.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add improved handling for TLS certificates for static builds. #17605
base: master
Are you sure you want to change the base?
Conversation
Flagging @netdata/agent as a reviewer for help with testing. |
select_internal_certs() { | ||
echo "Using bundled TLS configuration and certificates" | ||
ln -sf /opt/netdata/share/ssl /opt/netdata/etc/ssl |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't work because if a directory or symlink to a directory with the target name already exists, the symlink will be created inside it.
There is -n
option (fixes the problem)
-n, --no-dereference
treat LINK_NAME as a normal file if it is a symbolic link to a directory
but POSIX ln doesn't have it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# ln -sfv /opt/netdata/share/ssl /opt/netdata/etc/ssl
'/opt/netdata/etc/ssl/ssl' -> '/opt/netdata/share/ssl
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be fixed by the latest commit.
test_certs() { | ||
/opt/netdata/bin/curl --fail --silent --output /dev/null "${NETDATA_CERT_TEST_URL}" || return 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we check for 60 exit code only?
60 Peer certificate cannot be authenticated with known CA
certificates.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Failing to check at all means we run in to the same potential issue we have now, so I would prefer to just act as if the check failed in that case instead of risking claiming not working.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I understand, we only need to use the bundled version when the error is "The peer certificate cannot be authenticated using known CA certificates." Why doing it on any other error? If someone set wrong "test claim URL" or any other not "cant auth using known CA certs" error. Can you provide an example of other error you think we need to use bundle?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are literally hundreds of reasons that the cURL command could fail that have nothing to do with certificates. Any of those cases mean that we have no idea if the system certificates work or not.
Unless it’s an offline install though, it’s safer in that case to use our bundled certificates instead of the system certificates, because we know that will work in the common case of trying to connect to the Cloud, while we do not know if the system certificates will.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are literally hundreds of reasons that the cURL command could fail that have nothing to do with certificates.
That was my point. I don't understand why we need to use bundled for any cURL error if we check for certificate validation.
because we know that will work in the common case of trying to connect to the Cloud
Then why not do it by default?
Anyway, I understand your reasoning, but I don't find it convincing. That is not a blocker from my side.
else | ||
replace_symlink() { | ||
target="${1}" | ||
name="${2}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Ferroin ,
please, take a look on this warning.
Summary
Our current static builds include a bundled copy of the CA certificates available in the build environment, and will automatically fall back to using those if they can’t find a usable set of CA certificates on the target system.
This works in a majority of cases, but it has a couple of issues:
This PR works to remedy those issues. It adds additional logic to the installation code embedded in the static builds that handles the TLS certificates, adding support both for checking the usability of the selected certificates and for skipping the system certificates and just using the bundled ones. This functionality is controlled by two new options and associated environment variables:
--certificates
andNETDATA_CERT_MODE
allow specifying the certificate handling mode.auto
andsystem
have the current behavior.check
expands the current behavior by testing the detected system certificate store for usability by connecting to a well-known URL.bundled
ignores any system certificates and unconditionally uses the bundled certificates. Defaults toauto
if unspecified (thus preserving the existing behavior).--certificate-test-url
andNETDATA_CERT_TEST_URL
allow overriding the URL used bycheck
mode for testing whether the system certificates are usable. Defaults tohttps://app.netdata.cloud
if unspecified.The options override any value specified by the environment variables.
Additionally, the kickstart script is updated to pass the claiming URL as the certificate test URL when installing a static build, and to make the certificate handling mode default to
check
for installs other than offline installs andauto
mode for offline installs. The values passed by the kickstart script can still be overridden by using the--static-install-options
option to add the--certificates
or--certificate-test-url
options as needed.Test Plan
Testing requires manually installing static builds produced from this PR on a variety of systems with various values for the
--certificates
and--certificate-test-url
options and observing how the symlink at/opt/netdata/etc/ssl
is updated.