[cf-dev] cf-deployment 3.0

[cf-dev] cf-deployment 3.0

Josh Collins
Hey Y'all,

Cf-deployment 3.0 is around the corner. 
We're going to go 3.0 in 2-3 weeks.

We released cf-deployment 2.0 on June 18th and included 'breaking' changes.

Breaking changes in the context of cf-d are changes which would require special attention from operators for the deployment to succeed. Running the same bosh deploy command and arguments used for the previous deployment may fail, depending on which ops files and features operators deployed with in the past.

Going forward, we'd like to introduce a more regular (~monthly) cadence to major point releases of cf-deployment.

The goal is two-fold, in order of importance:
  1. provide a reliable mechanism for cf component teams to integrate and release major changes
  2. mitigate fear of major point releases in the minds of operators/cf-consumers

As of today, we've got one PR that includes breaking changes and I'm putting out a call to y'all.
If you've got what you'd consider to be a breaking change that warrants going out in a major point release of cf-deployment, please submit your PRs and reach out to the RelInt team as soon as you're able, so we can get up to speed and support you!

Cheers,

Josh





_._,_._,_

Links:

You receive all messages sent to this group.

View/Reply Online (#8122) | [hidden email] | [hidden email] | Mute This Topic | New Topic

Your Subscription | [hidden email] | Unsubscribe [[hidden email]]

_._,_._,_

Re: [cf-dev] cf-deployment 3.0

Franks, Geoff

How long will the 1.x and 2.x cf-deployments be maintained with security patches? Without that, it sounds like a lot of organizations could face breaking changes and instability every time they upgrade (if internal upgrade cycles take a month or two, and major versions come out as often or more often), not to mention the difficulty of jumping multiple major versions at once.

 

From: <[hidden email]> on behalf of Josh Collins <[hidden email]>
Reply-To: "[hidden email]" <[hidden email]>
Date: Tuesday, July 3, 2018 at 6:16 PM
To: cf-eng <[hidden email]>, cf-pm <[hidden email]>, "[hidden email]" <[hidden email]>, CF Dev <[hidden email]>
Subject: [External] [cf-dev] cf-deployment 3.0

 


Re: [cf-dev] cf-deployment 3.0

Josh Collins
The Release Integration team hasn't provided security releases in the past -- for either cf-release or cf-deployment -- and doing so would be burdensome and impede the evolution of cf-deployment. Therefore, we're not currently planning to start providing security patches. But we appreciate the feedback and will keep an eye on the problem.

Because the RelInt team's primary goal is to support the CF Foundation engineering teams and their ability to validate their commits in CI, we need to focus more on keeping up-to-date with their changes. We want to set a release cadence that's aligned with, and ideally increases, the velocity of the teams. Take a look at what happened with container networking when they wanted to ship 2.0...

Thanks for reaching out, Geoff!

Re: [cf-dev] cf-deployment 3.0

Marco Voelz

Dear Josh,

 

You are correct, in the past the RelInt team hasn't provided security releases. Instead, the credo was to go forward with the regular releases to also get the newest security fixes. This, however, was only easily possible because *the newer version did not introduce breaking changes with potentially big impact at the same time*.

 

I understand your mission of helping other teams increase their velocity. Maintaining multiple branches with fixes is certainly not fun, and I agree that it makes sense to try to avoid this if possible. I'm not sure I get the container networking 2.0 reference, though. Could you elaborate a bit more on this and how it is related to the current discussion?

 

Thanks and warm regards

Marco

 

From: <[hidden email]> on behalf of Josh Collins <[hidden email]>
Reply-To: "[hidden email]" <[hidden email]>
Date: Wednesday, 11. July 2018 at 20:43
To: "[hidden email]" <[hidden email]>
Subject: Re: [cf-dev] cf-deployment 3.0

 


Re: [cf-dev] cf-deployment 3.0

Josh Collins
Hi Marco,

I'm happy to provide more context on the container networking 2.0 reference.
The container networking team submitted a PR to cf-deployment with changes required for them to ship v2.0. 
RelInt deferred the container networking team's PR for a few weeks due to competing priorities, including multiple CVE fixes.
During the deferral time, a few other PRs were submitted which included breaking changes.
These additional changes took much more time to integrate and validate than anticipated, and in the end the container networking team's 2.0 release was published in cf-d about 5 weeks after it was ready to go.
The introduction of a regular cadence aims to mitigate this type of delay in the future. Had we had one at the time, the networking team would have timed its PR to align, and we would have been poised to accept and publish it quickly.
We believe this will help teams confidently plan for, communicate about, and negotiate integrating their releases into cf-deployment, and hopefully enable the RelInt team to integrate and ship major releases more seamlessly.

This is an evolving process so we'll see how things roll in the coming months and make adjustments where it makes sense to do so. 
I appreciate and welcome any and all feedback along the way.

Thanks very much,

Josh


Re: [cf-dev] cf-deployment 3.0

Marco Voelz

Dear Josh,


Thanks for the context, I wasn't aware of what happened before the release of networking 2.0. To stick with your example, though: From what you are saying I have understood that you would rather have done it this way – please correct me here if I'm wrong:

  • integrate networking release 2.0 into cf-deployment
  • integrate other PRs with breaking changes
  • bump cf-deployment to a new major version, given the above changes
  • merge the CVE fixes only into the new major version of cf-deployment

With this process, you would have achieved the following:
  • the development teams are happy, because they shipped as soon as they were ready to
  • operators are grumpy, because they have to bump networking to a new major version and adapt to other breaking changes in order to fix CVEs

I'm not saying you have to turn this tradeoff the other way around, but in my opinion this doesn't seem very consumer friendly. 

In your team's mission, you have clearly stated that your goal is to enable development teams to maintain a high velocity. I'd like to stress that we shouldn't leave the operators and users out of the picture here. In the end, you're developing for them, not for yourself. 

I'm not sure if the consumer/operator persona is a thing for RelInt, but if that's the case, here's something I'd like to hold true for whatever change RelInt makes to their process:
"As an operator of CF, I'd like to consume CVE fixes with as few changes to my existing installation as possible, such that I close known vulnerabilities as soon as possible"

Does that sound reasonable?

Warm regards
Marco


From: [hidden email] <[hidden email]> on behalf of Josh Collins <[hidden email]>
Sent: Friday, July 13, 2018 11:39:30 PM
To: [hidden email]
Subject: Re: [cf-dev] cf-deployment 3.0
 

Re: [cf-dev] cf-deployment 3.0

Franks, Geoff
In reply to this post by Josh Collins

I’m going to agree with Marco’s concerns here. Making life harder and less stable for the end users of CF has a real potential to alienate and push away the CF userbase altogether, even if it’s just in appearance (seeing monthly major releases of a product may cause new organizations to hesitate to migrate until the release process appears more stable).

 

 

From: <[hidden email]> on behalf of Marco Voelz <[hidden email]>
Reply-To: "[hidden email]" <[hidden email]>
Date: Monday, July 16, 2018 at 1:34 AM
To: "[hidden email]" <[hidden email]>
Subject: [External] Re: [cf-dev] cf-deployment 3.0

 


Re: [cf-dev] cf-deployment 3.0

Chip Childers
Food for thought: One of the challenges here is that maintaining patches for past coordinated releases is expensive (both in time and CI costs). In the CF ecosystem, this has traditionally been the responsibility of the downstream commercial distributions.

This isn't to say that there isn't a solution that can help all downstream users (including non-commercial users AND the distros), yet not burden the RelInt team too much. I'm not sure what that solution is though...

On Mon, Jul 16, 2018 at 9:47 AM Franks, Geoff <[hidden email]> wrote:


--
Chip Childers
CTO, Cloud Foundry Foundation
1.267.250.0815

Re: [cf-dev] cf-deployment 3.0

Jesse T. Alford
I don't agree with the claim that we didn't introduce major breaking changes in the past - we did. Routinely.

`cf-release` was sem-ver only insofar as every version was a major version. Changes just as dramatic as this were made on some but not all arbitrary major releases.

The major thing cf-d brings here is real semver, so it's _clear_ that some versions are major changes.
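
To illustrate the semver point above, here is a minimal sketch of how an operator's script could flag an upgrade that crosses a major version and so may need changed ops files or deploy arguments. It assumes release tags shaped like `v2.7.0`; the tag format and both function names are hypothetical and are not part of cf-deployment or BOSH tooling.

```python
def parse_version(tag):
    """Split a tag such as 'v2.7.0' into (major, minor, patch) integers.

    Assumes a plain three-part version with an optional leading 'v';
    pre-release suffixes are not handled in this sketch.
    """
    major, minor, patch = tag.lstrip("v").split(".")
    return int(major), int(minor), int(patch)


def is_major_bump(current, candidate):
    """Return True when candidate has a higher major version than current."""
    return parse_version(candidate)[0] > parse_version(current)[0]


if __name__ == "__main__":
    # Hypothetical tags, used only to exercise the check.
    for target in ("v2.8.0", "v3.0.0"):
        if is_major_bump("v2.7.0", target):
            print(target, "is a major bump: review the release notes before deploying")
        else:
            print(target, "is not a major bump")
```

A check like this only says that a release is declared breaking; which ops files or arguments are affected still has to come from the release notes.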

The credo remains the same - forward, always.

Chip's point about long-term support/backported fixes is exactly on-point. It's a major support burden, and is one of the principal pieces of work done by commercial distributors.

Jesse Alford
_Formerly of_ CF Release Integration


On Wed, Jul 18, 2018 at 11:38 AM Chip Childers <[hidden email]> wrote:

Re: [cf-dev] cf-deployment 3.0

Krannich, Bernd
In reply to this post by Chip Childers

I was about to mention that I indeed enjoyed the existing CF model of releases which roughly translated to “you better run fast” for consumers.

 

The thing I found needed some tweaking in the existing model was the approach to including fixes for very-high-priority CVEs. Oftentimes, in our quest to run fast and secure systems as quickly as possible, we ended up pulling in a bunch of features which required additional validation and essentially slowed down our effort to roll things out to production.

 

I felt that the better approach to support people who can keep up the speed would have been to always provide fixes for very-high-priority CVEs as cherry-picks based on the latest released version (and then, of course, also include those fixes in the next “regular” release).

 

Based on the comments so far, it sounds like for consumers “you better run fast” will actually be harder with the newly proposed approach. But maybe I’m not fully understanding the concepts, so it would be great to get some more details on the plans.

 

Regards,

Bernd

 

From: <[hidden email]> on behalf of Chip Childers <[hidden email]>
Reply-To: "[hidden email]" <[hidden email]>
Date: Wednesday, 18. July 2018 at 19:38
To: "[hidden email]" <[hidden email]>
Subject: Re: [cf-dev] cf-deployment 3.0

 

Food for thought: One of the challenges here is that maintaining patches for past coordinated releases is expensive (both in time and CI costs). In the CF ecosystem, this has traditionally been the responsibility of the downstream commercial distributions.

 

This isn't to say that there isn't a solution that can help all downstream users (including non-commercial users AND the distros), yet not burden the Rel Int team too much. I'm not sure what that solution is though...

 

On Mon, Jul 16, 2018 at 9:47 AM Franks, Geoff <[hidden email]> wrote:

I’m going to agree with Marco’s concerns here. Making life harder and less stable for the end users of CF has a real potential to alienate and push away the CF userbase altogether, even if it’s just in appearance (seeing monthly major releases of a product may cause new organizations to hesitate to migrate, until the release process appears more stable.

 

 

From: <[hidden email]> on behalf of Marco Voelz <[hidden email]>
Reply-To: "[hidden email]" <[hidden email]>
Date: Monday, July 16, 2018 at 1:34 AM
To: "[hidden email]" <[hidden email]>
Subject: [External] Re: [cf-dev] cf-deployment 3.0

 

Dear Josh,

 

Thanks for the context, I wasn't aware of what happened before the release of networking 2.0. To stick with your example, though: From what you are saying I have understood that you would rather have done it this way – please correct me here if I'm wrong:

  • integrate networking release 2.0 into cf-deployment, 
  • integrate other PRs with breaking changes
  • bumping cf-deployment to a new major version, given above changes
  • merging the CVE fixes only into the new major version of cf-deployment

 

With this process, you would have achieved the following:

  • the development teams are happy, because they shipped as soon as they were ready to
  • operators are grumpy, because they have to bump networking to a new major version and adopt to other breaking changes in order to fix CVEs

 

I'm not saying you have to turn this tradeoff the other way around, but in my opinion this doesn't seem very consumer friendly. 

 

In your team's mission, you have clearly stated that your goal is to enable development teams to maintain a high velocity. I'd like to stress that we shouldn't leave the operators and users out of the picture here. In the end, you're developing for them, not for yourself. 

 

I'm not sure if the consumer/operator persona is a thing for RelInt, but if that's the case, here's something I'd like to hold true for whatever change RelInt makes to their process:

"As an operator of CF, I'd like to consume CVE fixes with as little changes to my existing installation as possible, such that I close known vulnerabilities as soon as possible"

 

Does that sound reasonable?

 

Warm regards

Marco


From: [hidden email] <[hidden email]> on behalf of Josh Collins <[hidden email]>
Sent: Friday, July 13, 2018 11:39:30 PM
To: [hidden email]
Subject: Re: [cf-dev] cf-deployment 3.0

 

Hi Marco,

I'm happy to provide more context on the container networking 2.0 reference.
The container networking team submitted a PR to cf-deployment with changes required for them to ship v2.0. 
RelInt deferred the container networking team's PR for a few weeks due to competing priorities including multiple CVE's fixes.
During the deferral time, a few other PRs were submitted which included breaking changes.
These additional changes took much more time to integrate and validate than anticipated and in the end, the container networking team's 2.0 release was published in cf-d about 5 weeks after it was ready to go.
The introduction of a regular cadence aims to mitigate this type of delay in the future. Had we had one at the time, the networking team would have timed it's PR to align and we would have been poised to accept and publish it quickly.
We believe this will help teams confidently plan for, communicate about, and negotiate integrating their releases into cf-deployment.
And hopefully enable the RelInt team to integrate and ship major releases more seamlessly.

This is an evolving process so we'll see how things roll in the coming months and make adjustments where it makes sense to do so. 
I appreciate and welcome any and all feedback along the way.

Thanks very much,

Josh

--

Chip Childers
CTO, Cloud Foundry Foundation
1.267.250.0815


Re: [cf-dev] cf-deployment 3.0

David Sabeti
As the previous project lead for RelInt, I want to speak to Marco's concerns directly. We _definitely_ considered the operator as an important persona during any decision-making; if anything, we were overcommitted to that persona, as evidenced by the fact that we at times became an obstacle to CFF dev teams out of fear of making breaking changes for operators.

There's clearly some concern that operators won't be able to keep up with breaking changes. However, one impact of making breaking changes more frequently -- and, even better, on a schedule -- is to reduce the difficulty of adapting to them. To build a bit on what Josh said earlier in his example about cf-networking 2.0, as we pushed off releasing a major version of cf-deployment, more backwards-incompatible updates were stockpiled in the backlog. In the end, cf-deployment 2.0 included **seven** breaking changes instead of merely one or two.

To link this back to Marco's story -- "As an operator of CF, I'd like to consume CVE fixes with as few changes to my existing installation as possible, such that I close known vulnerabilities as soon as possible" -- this is already a problem with cf-deployment. As others have mentioned, there's no back-porting of cf-deployment after major version bumps, so operators already have to accommodate breaking changes in order to get CVE fixes. I understand that the proposal means that this happens more often, but it also means that major version bumps will be more predictable and less risky.[0]

I wasn't sure if it was worth rehashing the days of cf-release or not, but since Jesse broached the subject, I'd give his comments a +1 all around. One of the ways I understood Josh's proposal was as an important course correction. If cf-release was too free-wheeling in making breaking changes, cf-deployment has been too conservative. The proposal for a regular cadence of breaking changes seems like a balance between those two. Similarly, this is a re-balancing with regard to the personas as well: based on experience, the RelInt team has learned that it should be more willing to release breaking changes for operators in order to empower the CFF dev teams.

Sabeti
Also _formerly_ of the RelInt team


[0] Bernd has an interesting point about providing patch updates only to the latest release of cf-deployment, as a way to provide operators with a CVE-fix-only release. Providing such releases is also non-trivial work that I'm not sure the RelInt team would prioritize. Also, RelInt ships minor releases twice per week, so the changesets are typically small. Still, it seems a bit more palatable than any kind of LTS because it assists operators in living up to "you better run fast."



On Wed, Jul 18, 2018 at 10:59 AM Krannich, Bernd <[hidden email]> wrote:

I was about to mention that I indeed enjoyed the existing CF model of releases which roughly translated to “you better run fast” for consumers.

 

The thing I found needed some tweaking in the existing model was the approach to including fixes for very-high-priority CVEs. Oftentimes, in our quest to run fast and secure systems as quickly as possible, we ended up pulling in a bunch of features which required additional validation and essentially slowed down our effort to roll things out to production.

 

I felt that the better approach to support people who can keep up the speed would have been to always provide fixes for very-high-priority CVEs as cherry-picks based on the latest released version (and then of course also include those fixes in the next “regular” release, too).

 

Based on the comments so far, it sounds like for consumers “you better run fast” will actually be harder with the newly proposed approach. But maybe I’m not fully understanding the concepts, so it would be great to get some more details on the plans.

 

Regards,

Bernd

 

From: <[hidden email]> on behalf of Chip Childers <[hidden email]>
Reply-To: "[hidden email]" <[hidden email]>
Date: Wednesday, 18. July 2018 at 19:38
To: "[hidden email]" <[hidden email]>


Subject: Re: [cf-dev] cf-deployment 3.0

 

Food for thought: One of the challenges here is that maintaining patches for past coordinated releases is expensive (both in time and CI costs). In the CF ecosystem, this has traditionally been the responsibility of the downstream commercial distributions.

 

This isn't to say that there isn't a solution that can help all downstream users (including non-commercial users AND the distros), yet not burden the RelInt team too much. I'm not sure what that solution is though...

 

On Mon, Jul 16, 2018 at 9:47 AM Franks, Geoff <[hidden email]> wrote:

I’m going to agree with Marco’s concerns here. Making life harder and less stable for the end users of CF has a real potential to alienate and push away the CF userbase altogether, even if it’s just in appearance (seeing monthly major releases of a product may cause new organizations to hesitate to migrate, until the release process appears more stable).

 

 

From: <[hidden email]> on behalf of Marco Voelz <[hidden email]>
Reply-To: "[hidden email]" <[hidden email]>
Date: Monday, July 16, 2018 at 1:34 AM
To: "[hidden email]" <[hidden email]>
Subject: [External] Re: [cf-dev] cf-deployment 3.0

 

Dear Josh,

 

Thanks for the context, I wasn't aware of what happened before the release of networking 2.0. To stick with your example, though: From what you are saying I have understood that you would rather have done it this way – please correct me here if I'm wrong:

  • integrate networking release 2.0 into cf-deployment
  • integrate other PRs with breaking changes
  • bump cf-deployment to a new major version, given the above changes
  • merge the CVE fixes only into the new major version of cf-deployment

 

With this process, you would have achieved the following:

  • the development teams are happy, because they shipped as soon as they were ready to
  • operators are grumpy, because they have to bump networking to a new major version and adapt to other breaking changes in order to fix CVEs

 

I'm not saying you have to turn this tradeoff the other way around, but in my opinion this doesn't seem very consumer friendly. 

 



Re: [cf-dev] cf-deployment 3.0

Jesse T. Alford
Another point: most (certainly not all, but most) CVEs are stemcell, buildpack, or rootfs bumps that can be consumed safely and have minimal integration concerns. Even those that are in more substantive releases, such as routing and UAA, can be bumped ahead fairly easily with a manual edit or an ops file, and are often fairly safe iterations over the releases that came before them, though admittedly it can be hard to tell.
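
As a rough illustration of the ops-file route mentioned above, here is a minimal sketch. It assumes the release being bumped is `uaa` and that it is listed by name under `/releases` in the base manifest. The file name, version, URL, and checksum are placeholders, not values from this thread.

```yaml
# bump-uaa.yml (hypothetical file name)
# BOSH ops file that replaces the uaa entry in the manifest's releases list.
- type: replace
  path: /releases/name=uaa
  value:
    name: uaa
    version: "<fixed-version>"      # placeholder: the version carrying the CVE fix
    url: "<release-tarball-url>"    # placeholder
    sha1: "<sha1-of-tarball>"       # placeholder
```

It would be applied by adding `-o bump-uaa.yml` to the usual `bosh deploy` invocation, after any other ops files that touch the same release.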

If, somewhere along the spectrum of risk and difficulty I just described, the risk becomes too great for you to feel safe going straight to prod without RelInt's blessing, I recommend you test them first. If testing these adjustments to your particular environment is too burdensome, well, yes, it does become so, doesn't it?

Maintaining a whole passel of integration environments is a significant engineering and infrastructure burden, but happily it is one you can pay commercial integrators to shoulder for you.



Re: [cf-dev] cf-deployment 3.0

Josh Collins

Thanks Geoff, Marco, Chip, Jesse, Bernd, and David for sharing your feedback and thoughts. You’ve expressed valid concerns and provided valuable context that I take to heart. I really appreciate the time and effort required for meaningful dialogue about the impacts of the proposed release cadence.


While the RelInt team's primary goal remains supporting the CF Foundation engineering teams and their ability to validate their commits in CI, your points underscore a tension we’re acutely aware of.


We’re trying to meet the needs of both the CFF Contributor and Operator and the ‘trick’ is to find a sustainable balance between the two. However, on occasions where we must prioritize one over the other, we’re going to favor the CFF Contributor.


I mentioned this earlier, but it’s worth restating that the RelInt team doesn’t have any plans to provide LTS support and, as Chip and Jesse pointed out, that has traditionally been a value-added service provided by commercial vendors.


In the spirit of iteration, I’d like to propose we proceed with the release cadence I originally outlined and see how it goes.


Again, thank you for providing such valuable feedback.


Cheers,


Josh Collins

_Current_ PM of CF Release Integration

Re: [cf-dev] cf-deployment 3.0

Marco Voelz

Dear Josh, dear David,


Thanks David for sharing your past experiences in the RelInt team. I can sympathize with the stories you shared and understand the motivation for the planned changes better.


Now that cf-deployment 3.0 is out, let me tell you "how it went": you now have to switch to bosh-dns to receive security updates.


There are a number of reasons why we haven't introduced bosh-dns in our production system yet:

  • The ~200 lines of .yml needed just for aliasing DNS names [1], as the story making this obsolete isn't done yet [2]
  • This needs to be replicated, e.g. in the ops-file to rename the network [3], which makes it even more terrible to maintain
  • There were open issues [4] that are important for larger-scale deployments. I grant you that this is fixed now with dns-release 1.8.0, but that came after you released cf-deployment 3.0
  • Part of the fix for the above issue is an experimental flag introduced to get feedback from teams. Given this actually *is* an issue, I'd want to wait and see what comes out of this.
  • Other teams are still surprised from time to time by bosh-dns behavior and are looking into whether this might have implications they need to deal with [5]
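
For readers who haven't seen the aliasing in question, the entries look roughly like the sketch below. This is an illustration written from memory of the `bosh-dns-aliases` job, not an excerpt from [1]; the alias domain, instance group, and network name are assumptions and should be checked against the actual manifest.

```yaml
# One alias entry; the manifest repeats this shape for each service name.
addons:
- name: bosh-dns-aliases
  jobs:
  - name: bosh-dns-aliases
    release: bosh-dns-aliases
    properties:
      aliases:
      - domain: nats.service.cf.internal   # assumed example alias
        targets:
        - query: '*'
          instance_group: nats             # assumed instance group
          deployment: cf
          network: default                 # the value a network-rename ops file has to patch
          domain: bosh
```

Because every target names its network explicitly, renaming the network means patching each entry, which is the duplication described in the second bullet.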

Moreover, although everyone knew that bosh-dns was going to be a requirement at some point, I think the fact that this would happen in 3.0 was poorly communicated (I might have missed a mail there, but I cannot remember one; I would have raised my concerns earlier if that had been the case).

For our production environment, we now have a few choices, all of which give me headaches:
  • adopt bosh-dns *right now* although we don't feel good about it
  • try to bring back consul for a while (not even sure that's possible) and otherwise follow cf-deployment 3.0
  • backport security fixes only to a cf-deployment 2.x based production env

All of this brings me back to: As someone responsible for a cf-deployment production environment, I find it incredibly difficult having to deal with breaking changes like this on a regular basis to get security fixes. 

Warm regards
Marco



