Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

caddytls: fix permission requirement with AutomationPolicy #6328

Merged
merged 1 commit into from
May 20, 2024

Conversation

willnorris
Copy link
Contributor

Certificate automation has permission modules that are designed to prevent inappropriate issuance of unbounded or wildcard certificates. When an explicit cert manager is used, no additional permission should be necessary. For example, this should be a valid caddyfile:

https:// {
  tls {
    get_certificate tailscale
  }
  respond OK
}

This is accomplished when provisioning an AutomationPolicy by tracking whether there were explicit managers configured directly on the policy (in the ManagersRaw field). Only when a number of potentially unsafe conditions are present AND no explicit cert managers are configured is an error returned.

The problem arises from the fact that ctx.LoadModule deletes the raw bytes after loading in order to save memory. The first time an AutomationPolicy is provisioned, the ManagersRaw field is populated, and everything is fine.

An AutomationPolicy with no subjects is treated as a special "catch-all" policy. App.createAutomationPolicies ensures that this catch-all policy has an ACME issuer, and then calls its Provision method again because it may have changed. This second time Provision is called, ManagesRaw is no longer populated, and the permission check fails because it appears as though the policy has no explicit managers.

Address this by storing a new boolean on AutomationPolicy recording whether it had explicit cert managers configured on it.

Also fix an inverted boolean check on this value when setting failClosed.

Updates #6060
Updates #6229
Updates #6327

Certificate automation has permission modules that are designed to
prevent inappropriate issuance of unbounded or wildcard certificates.
When an explicit cert manager is used, no additional permission should
be necessary. For example, this should be a valid caddyfile:

    https:// {
      tls {
        get_certificate tailscale
      }
      respond OK
    }

This is accomplished when provisioning an AutomationPolicy by tracking
whether there were explicit managers configured directly on the policy
(in the ManagersRaw field). Only when a number of potentially unsafe
conditions are present AND no explicit cert managers are configured is
an error returned.

The problem arises from the fact that ctx.LoadModule deletes the raw
bytes after loading in order to save memory. The first time an
AutomationPolicy is provisioned, the ManagersRaw field is populated, and
everything is fine.

An AutomationPolicy with no subjects is treated as a special "catch-all"
policy. App.createAutomationPolicies ensures that this catch-all policy
has an ACME issuer, and then calls its Provision method again because it
may have changed. This second time Provision is called, ManagesRaw is no
longer populated, and the permission check fails because it appears as
though the policy has no explicit managers.

Address this by storing a new boolean on AutomationPolicy recording
whether it had explicit cert managers configured on it.

Also fix an inverted boolean check on this value when setting
failClosed.

Updates caddyserver#6060
Updates caddyserver#6229
Updates caddyserver#6327

Signed-off-by: Will Norris <will@tailscale.com>
@francislavoie
Copy link
Member

francislavoie commented May 19, 2024

Oh wow, that's super subtle and tricky. Nice find.

I wonder if we should be resolving the underlying problem here though, i.e. it being Provision'd twice. That really should only happen one time. I'm not sure if we have a good way of doing that though. Maybe the catch-all one would need to be a totally fresh instance instead of re-provisioning? Maybe too complex.

But this workaround is probably sufficient for now. Thanks!

@francislavoie francislavoie added the bug 🐞 Something isn't working label May 19, 2024
@francislavoie francislavoie added this to the v2.8.0 milestone May 19, 2024
@@ -262,9 +265,9 @@ func (ap *AutomationPolicy) Provision(tlsApp *TLS) error {
// prevent issuance from Issuers (when Managers don't provide a certificate) if there's no
// permission module configured
noProtections := ap.isWildcardOrDefault() && !ap.onlyInternalIssuer() && (tlsApp.Automation == nil || tlsApp.Automation.OnDemand == nil || tlsApp.Automation.OnDemand.permission == nil)
failClosed := noProtections && hadExplicitManagers // don't allow on-demand issuance (other than implicit managers) if no managers have been explicitly configured
failClosed := noProtections && !ap.hadExplicitManagers // don't allow on-demand issuance (other than implicit managers) if no managers have been explicitly configured
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made this change based on the inline comment, and it seems to make sense. However, now looking at it again, this is a noop since it is the same check that results in the returned error directly below. So either the inline comment here and/or I am a little confused, or we can just remove this failClosed var entirely. Admittedly, this is my first time digging into this section of code, so I may just be misunderstanding the intent.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I probably need more testing here 😅

@willnorris
Copy link
Contributor Author

I wonder if we should be resolving the underlying problem here though, i.e. it being Provision'd twice.

Yeah, I had that thought as well. I wasn't sure if it's generally best practice to avoid provisioning a module multiple times. Though for what it's worth, this particular logic regarding re-provisioning the catch-all automation policy is from 2020, so it's pretty well established.

@mholt
Copy link
Member

mholt commented May 20, 2024

Ahh okay, so yeah... we do reprovision APs at times because we need to update the underlying CertMagic configs.

Let's give this a try... not really sure a cleaner/simpler way to do it right now, but would be interested with revisiting it if we find one more elegant :)

@mholt mholt merged commit db3e19b into caddyserver:master May 20, 2024
23 checks passed
@willnorris willnorris deleted the will/automation branch June 2, 2024 05:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug 🐞 Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants