|
Bjoern Michaelsen |
|
|
Hi all,
this is a proposal to get as much focused tinderboxing as possible with limited resources and unreliable build clients. This also assumes patches under consideration to be independant (that is: without conflicts) and not dependant on a specific order. gerrit allow us to have patches independant of order as they are not yet merged/picked to the branch. We can also easily exclude any patch on gerrit that conflicts against the existing patches under consideration: To do so we cherry-pick the new patch on the tip of the branch and then cherry-pick all patches under consideration on top of that. If it conflicts, we can report that back.(*) == Testcase == Tinderboxes will be grouped in "testcases": All tinderboxes in belonging to one "testcase" are assumed to provide the same results for the same input. The simplest testcase is: "completing a build on platform foo" while more complex testcases include "completed a build on platform foo and run checks". A testcase is assumed to have a rather stable long term empiric value that one patch does not breaks it: p-innocent(testcase). Whenever a new patch is commited for consideration we assume its inital "innocence" to be average: p-innocent(patch) = p-innocent(testcase) == Giving a tinderbox a subset to test == There are two possibly useful results out of a tinderbox run: - If the tinderbox run is successful, we can mark all patches as good and remove them from the patches under consideration - If the tinderbox run is a failure and only contains one patch, we can mark that patch as bad and remove it from the patches under consideration - All other cases have no directly deductable action === optimistic tinderbox run === A completed tinderbox run provides us with the most information, if we assume it to fail half of the time (biggest insecurity). So, this is the first thing to give a tinderbox to test: - Pick the most innocent looking patches (highest p-innocent(patch)) - for patches with the same innocence prefer the oldest ones first - Add patches as long as the innocence of the patchset: p-innocent(patchset) = product(p-innocent(patchn) ...) is above 0.5 After sending off the tinderbox with this patchset, do: p-innocent(patch) *= p-innocent(testcase) reducing the innocence of the patches without waiting for results. So we assume the patches of the set to be partially guilty until proven innocent. If the tinderbox succeeds, mark all patches as good and remove them from the patches under consideration. === pessimistic tinderbox run === A failed tinderbox run provides us with most information, if it contains only one patch. So if the innocence of one or more patches drops below 0.5: p-innocent(dirtypatch)<= 0.5 or if there is currently only one patch under consideration instead of doing the optimistic approach above, do a pessimistic tinderbox run with just one patch. Select the patch with the lowest p-innocent(patch) and, if there are multiple patches with equal p-innocent(patch), select the oldest candidate. After sending off the tinderbox with just this one patch, do: p-innocent(patch)/=p-innocent(testcase) for all patches in each tinderbox run with this patch (once for each run) without waiting for the tinderbox to return with a result. So we assume most of those patches to be more innocent now (hoping the blame is on the dirtypatch). The suspicous patch should now be back to the initial value: p-innocent(dirtypatch)=p-innocent(testcase) The tinderbox will hopefully mark that patch as good or bad soon anyway and remove it from the patches under consideration. == advantages == This approach: - Does not rely on the tinderboxes to ever come back with a result -- it should be robust against lost results. - does not need to test each patch individually, if the average patch quality is not too bad - requires only little bookkeeping on the side of the dispatcher - should allow us to confidently mark all patches as 'doesnt break master for this testcase' Opinions/Additions/Corrections? Objections? If not, we only need somebody to script this ;) Best, Bjoern P.S.: Dont show my careless use of 'propabilities' and such to the physics department at the university where I got my degree. It will just result in snorty laughs and cruel teasing. (*) Conflicts might be rare though, so we might even get away with the optimistic approach: just hoping that considered patches never conflict.Hi all, this is a proposal to get as much focused tinderboxing as possible with limited resources and unreliable build clients. This also assumes patches under consideration to be independant (that is: without conflicts) and not dependant on a specific order. gerrit allow us to have patches independant of order as they are not yet merged/picked to the branch. We can also easily exclude any patch on gerrit that conflicts against the existing patches under consideration: To do so we cherry-pick the new patch on the tip of the branch and then cherry-pick all patches under consideration on top of that. If it conflicts, we can report that back.(*) == Testcase == Tinderboxes will be grouped in "testcases": All tinderboxes in belonging to one "testcase" are assumed to provide the same results for the same input. The simplest testcase is: "completing a build on platform foo" while more complex testcases include "completed a build on platform foo and run checks". A testcase is assumed to have a rather stable long term empiric value that one patch does not breaks it: p-innocent(testcase). Whenever a new patch is commited for consideration we assume its inital "innocence" to be average: p-innocent(patch) = p-innocent(testcase) == Giving a tinderbox a subset to test == There are two possibly useful results out of a tinderbox run: - If the tinderbox run is successful, we can mark all patches as good and remove them from the patches under consideration - If the tinderbox run is a failure and only contains one patch, we can mark that patch as bad and remove it from the patches under consideration - All other cases have no directly deductable action === optimistic tinderbox run === A completed tinderbox run provides us with the most information, if we assume it to fail half of the time (biggest insecurity). So, this is the first thing to give a tinderbox to test: - Pick the most innocent looking patches (highest p-innocent(patch)) - for patches with the same innocence prefer the oldest ones first - Add patches as long as the innocence of the patchset: p-innocent(patchset) = product(p-innocent(patchn) ...) is above 0.5 After sending of the tinderbox with this patchset, do: p-innocent(patch) *= p-innocent(testcase) reducing the innocence of the patches without waiting for results. So we assume the patches of the set to be partially guilty until proven innocent. If the tinderbox succeeds, mark all patches as good and remove them from the patches under consideration. === pessimistic tinderbox run === A failed tinderbox run provides us with most information, if it contains only one patch. So if the innocence of one or more patches drops below 0.5: p-innocent(dirtypatch)<= 0.5 instead of doing the optimistic approach above, do a pessimistic tinderbox run with just one patch. Select the patch with the lowest p-innocent(patch) and, if there are multiple patches with equal p-innocent(patch), select the oldest candidate. After sending of the tinderbox with just this one patch, do: p-innocent(patch)/=p-innocent(testcase) for all patches in each tinderbox run with this patch (once for each run) without waiting for the tinderbox to return with a result. So we assume most of those patches to be more innocent now (hoping the blame is on the dirtypatch). The suspicous patch should now be back to the initial value: p-innocent(dirtypatch)=p-innocent(testcase) The tinderbox will hopefully mark that patch as good or bad soon anyway and remove it from the patches under consideration. == advantages == This approach: - Does not rely on the tinderboxes to ever come back with a result -- it should be robust against lost results. - does not need to test each patch individually, if the average patch quality is not too bad - requires only little bookkeeping on the side of the dispatcher - should allow us to confidently mark all patches as 'doesnt break master for this testcase' Opinions/Additions/Corrections? Objections? If not, we only need somebody to script this ;) Best, Bjoern P.S.: Dont show my careless use of 'propabilities' and such to the physics department at the university where I got my degree. It will just result in snorty laughs and cruel teasing. (*) Conflicts might be rare though, so we might even get away with the optimistic approach: just hoping that considered patches never conflict. _______________________________________________ LibreOffice mailing list [hidden email] http://lists.freedesktop.org/mailman/listinfo/libreoffice |
|
Bjoern Michaelsen |
|
|
On Tue, Jun 12, 2012 at 04:45:06PM +0200, Bjoern Michaelsen wrote:
> After sending off the tinderbox with this patchset, do: > p-innocent(patch) *= p-innocent(testcase) > reducing the innocence of the patches without waiting for results. So we assume > the patches of the set to be partially guilty until proven innocent. On a second thought that should actually be: p-innocent(patch) *= (number of patches in patchset-1)/(number of patches in patchset) Best, Bjoern _______________________________________________ LibreOffice mailing list [hidden email] http://lists.freedesktop.org/mailman/listinfo/libreoffice |
|
Bjoern Michaelsen |
|
|
In reply to this post by Bjoern Michaelsen
On Tue, Jun 12, 2012 at 04:45:06PM +0200, Bjoern Michaelsen wrote:
> After sending off the tinderbox with just this one patch, do: > > p-innocent(patch)/=p-innocent(testcase) > > for all patches in each tinderbox run with this patch (once for each run) ... and this should probably then be: > p-innocent(patch)/=0.5*(number of patches in patchset-1)/(numbers of patches in patchset) as we assume p-innocent(dirtypatch) ~ 0.5 and this reverts half of the 'guilt' from tinderbox runs with both patches in it. Best, Bjoern _______________________________________________ LibreOffice mailing list [hidden email] http://lists.freedesktop.org/mailman/listinfo/libreoffice |
|
Eike Rathke-2 |
|
|
In reply to this post by Bjoern Michaelsen
Hi Bjoern,
On Tuesday, 2012-06-12 16:45:06 +0200, Bjoern Michaelsen wrote: > This also assumes patches under consideration to be independant (that > is: without conflicts) and not dependant on a specific order. Maybe I misunderstood, but the independency and arbitrary order I see as a problem. People do work in iterative steps.. so if we could at least assure that patches by the same author are ordered correctly that might work. Eike -- LibreOffice Calc developer. Number formatter stricken i18n transpositionizer. GnuPG key 0x293C05FD : 997A 4C60 CE41 0149 0DB3 9E96 2F1A D073 293C 05FD _______________________________________________ LibreOffice mailing list [hidden email] http://lists.freedesktop.org/mailman/listinfo/libreoffice |
|
Bjoern Michaelsen |
|
|
Hi,
On Wed, Jun 13, 2012 at 02:03:31PM +0200, Eike Rathke wrote: > Maybe I misunderstood, but the independency and arbitrary order I see as > a problem. People do work in iterative steps.. so if we could at least > assure that patches by the same author are ordered correctly that might > work. Well, I simplified a bit here. When using gerrit you would push one patch separate if it is independant and multiple patches as a series if they are dependant. gerrit handles that correctly as a 'patch series'. I would not want to testbuild in the middle of one such series of patches -- so whenever there are multiple dependant patches in gerrit they would be viewed as 'one' by the tinderboxes. If one has dependant patches _and_ wants to have them tinderboxed separately one could push them in multiple sets. However, for now I consider that scenario a cornercase (its usually good enough to test the final state of a series of dependant patches). Best, Bjoern _______________________________________________ LibreOffice mailing list [hidden email] http://lists.freedesktop.org/mailman/listinfo/libreoffice |
|
Norbert Thiebaud |
|
|
On Wed, Jun 13, 2012 at 7:20 AM, Bjoern Michaelsen
<[hidden email]> wrote: > > Well, I simplified a bit here. When using gerrit you would push one patch > separate if it is independant and multiple patches as a series if they are > dependant. gerrit handles that correctly as a 'patch series'. I would not want > to testbuild in the middle of one such series of patches Well... it _should_ be buildable at each step of the patch series... if not, clean-it up, it is not yet 'published' in the git sens of the term so you still have a chance to git rebase -i to fix things up. Building at each commit is very important to have useful bisections. Norbert _______________________________________________ LibreOffice mailing list [hidden email] http://lists.freedesktop.org/mailman/listinfo/libreoffice |
|
Bjoern Michaelsen |
|
|
On Wed, Jun 13, 2012 at 08:00:12AM -0500, Norbert Thiebaud wrote:
> Well... it _should_ be buildable at each step of the patch series... > if not, clean-it up, it is not yet 'published' in the git sens of the > term so you still have a chance to git rebase -i to fix things up. > > Building at each commit is very important to have useful bisections. But we are doing that not now as pushes are atomic (or more precise: ref-updates are), so tinderboxes wouldnt see a half-push. Also all patches in a series have the same owner, so blaming is easy. The first goal of the tinderboxes is to have the master tip buildable all times -- having it buildable at every random commit is secondary. Also: bibisect etc. could be taught to only look for commits where the next commit date is at least 5 minutes off. Best, Bjoern _______________________________________________ LibreOffice mailing list [hidden email] http://lists.freedesktop.org/mailman/listinfo/libreoffice |
|
Norbert Thiebaud |
|
|
On Wed, Jun 13, 2012 at 8:53 AM, Bjoern Michaelsen
<[hidden email]> wrote: > But we are doing that not now as pushes are atomic (or more precise: > ref-updates are), so tinderboxes wouldnt see a half-push. true. but just because we don't, doesn't mean we shouldn't > Also all patches in a > series have the same owner, so blaming is easy. blaming is one thing, quality is another. > The first goal of the > tinderboxes is to have the master tip buildable all times -- having it > buildable at every random commit is secondary. secondary but important nonetheless. > Also: bibisect etc. could be > taught to only look for commits where the next commit date is at least 5 minutes > off. No, that would not help. two commit cane within 5 minutes and yet independant and 2 commit can be 5 minutes apart and yet belong to the same patch-serie iow, the timestamp of a commit is not very helpful in general wrt to tinderbox Norbert _______________________________________________ LibreOffice mailing list [hidden email] http://lists.freedesktop.org/mailman/listinfo/libreoffice |
|
Stephan Bergmann-2 |
|
|
In reply to this post by Norbert Thiebaud
On 06/13/2012 03:00 PM, Norbert Thiebaud wrote:
> On Wed, Jun 13, 2012 at 7:20 AM, Bjoern Michaelsen > <[hidden email]> wrote: >> >> Well, I simplified a bit here. When using gerrit you would push one patch >> separate if it is independant and multiple patches as a series if they are >> dependant. gerrit handles that correctly as a 'patch series'. I would not want >> to testbuild in the middle of one such series of patches > > Well... it _should_ be buildable at each step of the patch series... I think you are confusing buildability after each patch of a series has been applied in sequence (which is a desirable property indeed) vs. buildability with each patch of a series applied individually ("out of context," what Eike's concern was). Stephan _______________________________________________ LibreOffice mailing list [hidden email] http://lists.freedesktop.org/mailman/listinfo/libreoffice |
|
Lubos Lunak |
|
|
In reply to this post by Bjoern Michaelsen
On Tuesday 12 of June 2012, Bjoern Michaelsen wrote:
> Hi all, > > this is a proposal to get as much focused tinderboxing as possible with > limited resources and unreliable build clients. This also assumes patches > under consideration to be independant (that is: without conflicts) and not > dependant on a specific order. gerrit allow us to have patches independant > of order as they are not yet merged/picked to the branch. We can also > easily exclude any patch on gerrit that conflicts against the existing > patches under > consideration: To do so we cherry-pick the new patch on the tip of the > branch and then cherry-pick all patches under consideration on top of that. > If it conflicts, we can report that back.(*) > > == Testcase == > > Tinderboxes will be grouped in "testcases": All tinderboxes in belonging to > one "testcase" are assumed to provide the same results for the same input. > The simplest testcase is: "completing a build on platform foo" while more > complex testcases include "completed a build on platform foo and run > checks". Where are we going to get these tinderboxes? The page for master lists 17 tinderboxes, out of which only about 11 build regularly. Out of which there are 4 linux-gcc tinderboxes, one linux-clang, 1(2?) macosx, 3 windows and 1 mingw. That, not counting any further division like running or not running checks, gives 5 testcases if I understand it correctly, and building successfully in any of them does not guarantee that the rest builds as well. So, the thing I do not understand, how is this (complex, as far as I can say) setup supposed to improve anything? There does not seem to be any big difference between dedicating X tinderboxes to it or just 1-2 tinderboxes. If we put a number of tinderboxes on this, they still won't test the patches enough, while there won't be that many left for checking that the actual repository builds fine, which will be needed anyway. If patches in gerrit[*] need to be tested, it should work just as well to use one of the really fast tinderboxes for checking that it builds, which will cover a majority of possible problems, then maybe one more can build including checks, and if any problem gets through, it'll be caught by the normal tinderboxes, which are needed anyway (for direct commits, some other seemingly unrelated commit breaking the patch meanwhile, etc.). [*] Speaking of which, when will we get told about this gerrit thing, besides random remarks, or have I missed it? -- Lubos Lunak [hidden email] _______________________________________________ LibreOffice mailing list [hidden email] http://lists.freedesktop.org/mailman/listinfo/libreoffice |
|
Bjoern Michaelsen |
|
|
Hi,
On Thu, Jun 14, 2012 at 12:36:12PM +0200, Lubos Lunak wrote: > Where are we going to get these tinderboxes? The page for master lists 17 > tinderboxes, out of which only about 11 build regularly. Out of which there > are 4 linux-gcc tinderboxes, one linux-clang, 1(2?) macosx, 3 windows and 1 > mingw. That, not counting any further division like running or not running > checks, gives 5 testcases if I understand it correctly, and building > successfully in any of them does not guarantee that the rest builds as well. Right -- although I would count the linux-gcc-with-subsequentcheck tinderbox as a single test case. > So, the thing I do not understand, how is this (complex, as far as I can say) > setup supposed to improve anything? There does not seem to be any big > difference between dedicating X tinderboxes to it or just 1-2 tinderboxes. If > we put a number of tinderboxes on this, they still won't test the patches > enough, while there won't be that many left for checking that the actual > repository builds fine, which will be needed anyway. I dont think we should split up tinderboxes between this and repository builds (I asssume you mean testbuilding the head of master with that). As the patches are cherrypicked on the head of master with this, master is already tested along with this. However, while people are in the beginning still pushing even risky patches directly to master we cant assume master to be flawless all of the time -- so the proposed algo should be extended with "build the master you cherry-picked upon once after each failed build". What is this setup improving? Simple: - find buildbreakers _before_ they hit master - with gerrit, you can push your patch, wait for the tinderbox result you need and _then_ put it on master (so this gives everyone easy access to windows test builders, which as we all know are a pain to set up, and also subsequentcheck runs) - reduce tinderbox spammage as the list of possible offenders is smaller -- and esp. is not growing with every build - allows greenlighting bigger sets of patches with one compile (essential on slower platforms, f.e. Windows) > If patches in gerrit[*] need to be tested, it should work just as well to use > one of the really fast tinderboxes for checking that it builds, which will > cover a majority of possible problems, then maybe one more can build > including checks, and if any problem gets through, it'll be caught by the > normal tinderboxes, which are needed anyway (for direct commits, some other > seemingly unrelated commit breaking the patch meanwhile, etc.). That doesnt scale that well. Also note that the tinderboxes building plain master should end up a scenario that is very, very rarely needed (it will still be done of course) Best, Bjoern > [*] Speaking of which, when will we get told about this gerrit thing, besides > random remarks, or have I missed it? Look for a mail with "ACTION REQUIRED" in the subject in the libreoffice mailing list archive. _______________________________________________ LibreOffice mailing list [hidden email] http://lists.freedesktop.org/mailman/listinfo/libreoffice |
|
Norbert Thiebaud |
|
|
In reply to this post by Stephan Bergmann-2
On Thu, Jun 14, 2012 at 1:59 AM, Stephan Bergmann <[hidden email]> wrote:
> On 06/13/2012 03:00 PM, Norbert Thiebaud wrote: > > > I think you are confusing buildability after each patch of a series has been > applied in sequence (which is a desirable property indeed) vs. buildable > with each patch of a series applied individually ("out of context," what > Eike's concern was). When you apply a series in sequence, at any point the result must be buildable... 1/ it is good practice 2/ if that is not the case bisection is much harder If you pushed directly to master and happened to push something that is not buildable, then of course you can't go back and fix it... but with gerrit, as long as your patches series is not 'merged' then you can git rebase interactive to fix the offending commit. Yes, you cannot pick a random patch in a series, apply just that and expect it to be buildable but if my series has 3 patches 1,2,3 the If I apply (1) (1,2) or (1,2,3) I expect the result to be buildable Norbert _______________________________________________ LibreOffice mailing list [hidden email] http://lists.freedesktop.org/mailman/listinfo/libreoffice |
|
Stephan Bergmann-2 |
|
|
On 06/14/2012 04:57 PM, Norbert Thiebaud wrote:
> On Thu, Jun 14, 2012 at 1:59 AM, Stephan Bergmann<[hidden email]> wrote: >> On 06/13/2012 03:00 PM, Norbert Thiebaud wrote: >> >> >> I think you are confusing buildability after each patch of a series has been >> applied in sequence (which is a desirable property indeed) vs. buildable >> with each patch of a series applied individually ("out of context," what >> Eike's concern was). > > When you apply a series in sequence, at any point the result must be > buildable... > 1/ it is good practice > 2/ if that is not the case bisection is much harder > > If you pushed directly to master and happened to push something that > is not buildable, then of course you can't go back and fix it... > but with gerrit, as long as your patches series is not 'merged' then > you can git rebase interactive to fix the offending commit. > > Yes, you cannot pick a random patch in a series, apply just that and > expect it to be buildable > but if my series has 3 patches 1,2,3 > > the If I apply (1) (1,2) or (1,2,3) I expect the result to be buildable Yes sure (and I think nobody is arguing with that). Its just that I felt you guys where talking past each other, one talking about an "out of context buildability" problem while one discussing "sequential buildability." So, best forget that I said anything, anyway---seems to produce more confusion than clarification... ;) Stephan _______________________________________________ LibreOffice mailing list [hidden email] http://lists.freedesktop.org/mailman/listinfo/libreoffice |
|
Bjoern Michaelsen |
|
|
In reply to this post by Norbert Thiebaud
On Thu, Jun 14, 2012 at 09:57:48AM -0500, Norbert Thiebaud wrote:
> When you apply a series in sequence, at any point the result must be > buildable... > 1/ it is good practice > 2/ if that is not the case bisection is much harder ... but it has a lower importance than having the tip of master buildable all the time. Which is why a tinderbox should leave testing intermediate states aside as long as there are still things to test that ensure the tip of master to stay buildable. Best, Bjoern _______________________________________________ LibreOffice mailing list [hidden email] http://lists.freedesktop.org/mailman/listinfo/libreoffice |
| Powered by Nabble | Edit this page |