|
Michael Meeks-2 |
|
|
So,
I was wondering, as I tried to work out where my disk space had gone ... ;-) why we have such an excessive size of Dep files. I appreciate that dependencies were totally busted in the past, and that life should now be a lot better; but ...

$ du -m workdir/unxlngi6.pro/Dep | sort -n
...
90128	Dep/CxxObject/sd
109700	Dep/UnoApiPartTarget/offapi/com/sun/star
109704	Dep/UnoApiPartTarget/offapi/com/sun
109708	Dep/UnoApiPartTarget/offapi/com
109712	Dep/UnoApiPartTarget/offapi
135020	Dep/UnoApiPartTarget
142516	Dep/CxxObject/sc/source
143320	Dep/CxxObject/sc
162596	Dep/CxxObject/sw/source
163652	Dep/CxxObject/sw
452440	Dep/LinkTarget/Library
462560	Dep/LinkTarget
904564	Dep/CxxObject
1575964	Dep

Seems like quite a lot - interestingly it's larger than the cxx objects, and the shared libraries and ... > 50% of workdir/

It is true I use an /opt/libreoffice/master - prefix instead of /a/b - but even so ...

Sadly, catting all those .d files into a pipeline, sorting them etc. appears to consume my 4Gb of available disk space and fail ;-) there are after all 21 million lines there.

Ho hum,

Michael.

--
[hidden email] <><, Pseudo Engineer, itinerant idiot
_______________________________________________
LibreOffice mailing list
[hidden email]
http://lists.freedesktop.org/mailman/listinfo/libreoffice
|
Michael Meeks-2 |
|
|
And some thoughts on improving this; currently we add a ton of deps for packages that are really internal and typically change en-masse or not at all.

eg. 'boost' - it is installed, and then ~never changes again - people don't edit a single boost header and expect a dependency clean re-compile.

Ergo:

find -type f | xargs grep 'solver.*boost/' | wc -l

4.5 million hits (of 21.5million - worth having).

IMHO we could - without significant loss of functionality - reduce all those deps to a single stamp file (which we prolly install anyway) in the solver. Should be ~trivial to elide in our dep-re-writing anyway, save > 300Mb of space etc.

Should just be a few lines in solenv/bin/concat-deps.c

My question would be: do we want to continue to make 'over precise deps' possible, at this large computational, space and build performance cost ?

ATB,

Michael.
|
Michael Stahl-2 |
|
|
On 01/06/12 11:16, Michael Meeks wrote:
> And some thoughts on improving this; currently we add a ton of deps for
> packages that are really internal and typically change en-masse or not
> at all.
>
> eg. 'boost' - it is installed, and then ~never changes again - people
> don't edit a single boost header and expect a dependency clean
> re-compile.
>
> Ergo:
>
> find -type f | xargs grep 'solver.*boost/' | wc -l
>
> 4.5 million hits (of 21.5million - worth having).

wow, that's a lot!

> IMHO we could - without significant loss of functionality reduce all
> those deps to a single stamp file (which we prolly install anyway) in
> the solver. Should be ~trivial to elide in our dep-re-writing anyway,
> save > 300Mb of space etc.
>
> Should just be a few lines in solenv/bin/concat-deps.c

yes. just need to find the proper regex that doesn't have false positives :)

> My question would be: do we want to continue to make 'over precise
> deps' at this large computational, space and build performance cost
> possible ?

this sounds like a good idea to me, the external stuff tends not to change very often, and when it does just re-building everything that depends on it in any way is an acceptable cost; ccache will hopefully take care of the cases where the change really doesn't affect the cxx file.

in a case like boost, which makes up a significant share of includes, this would of course also make the dependency parsing of make faster, which is, as you have found out, the bottleneck in make startup performance.

also, there are some external modules that don't deliver headers properly or something, i remember the recent ICU upgrade breaking all kinds of stuff in an incremental build because the header timestamps were somehow wrong, so please do it for icu headers as well :)
|
Norbert Thiebaud |
|
|
On Fri, Jun 1, 2012 at 5:10 AM, Michael Stahl <[hidden email]> wrote:
> On 01/06/12 11:16, Michael Meeks wrote:
>>
>> And some thoughts on improving this; currently we add a ton of deps for
>> packages that are really internal and typically change en-masse or not
>> at all.
>>
>> eg. 'boost' - it is installed, and then ~never changes again - people
>> don't edit a single boost header and expect a dependency clean
>> re-compile.
>>
>> Ergo:
>>
>> find -type f | xargs grep 'solver.*boost/' | wc -l
>>
>> 4.5 million hits (of 21.5million - worth having).
>
> wow, that's a lot!
>
>> IMHO we could - without significant loss of functionality reduce all
>> those deps to a single stamp file (which we prolly install anyway) in
>> the solver. Should be ~trivial to elide in our dep-re-writing anyway,
>> save > 300Mb of space etc.
>>
>> Should just be a few lines in solenv/bin/concat-deps.c
>
> yes. just need to find the proper regex that doesn't have false positives :)

getting OUTDIR from the env and substituting

$OUTDIR/inc/<module>/.* with $WORKDIR/Package/<module>_inc

should work for gbuild modules.

for dmake... maybe patching deliver.pl to touch $WORKDIR/Package/<module>_inc when we deliver ?

Of course that adds the formal obligation that the 'package' that delivers the includes of a module always be named <module>_inc, which is only a soft requirement today I think.

from a concat-deps.c pov, that means teaching it to manage a per-target list of substitutions (the goal is to have only one <module>_inc dep instead of many *.h for that module...)

that means, among other things, a hash creation/destruction per target with deps, i.e. we need to put hash object creation/destruction under memory pool management, and be a bit creative to detect the above-mentioned pattern as we parse the input (we want to avoid parsing it twice)

Norbert
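The substitution described above can be sketched as follows. This is only an illustration in Python, not the concat-deps.c implementation: the function name `collapse_module_headers`, the sample paths, and the assumption that the dependencies of one target are already available as a list of plain paths are all hypothetical.

```python
import re


def collapse_module_headers(deps, outdir, workdir):
    """Collapse delivered headers of each module into one stamp dependency.

    Every path of the form <outdir>/inc/<module>/... is replaced by
    <workdir>/Package/<module>_inc. A per-target set makes sure each
    stamp (and every other dependency) is emitted only once.
    """
    pattern = re.compile(re.escape(outdir) + r"/inc/([^/]+)/.+")
    seen = set()      # plays the role of the per-target hash
    result = []
    for dep in deps:
        match = pattern.fullmatch(dep)
        if match:
            dep = f"{workdir}/Package/{match.group(1)}_inc"
        if dep not in seen:
            seen.add(dep)
            result.append(dep)
    return result
```

Because the set is consulted while the list is walked, the input is only traversed once, which matches the wish to avoid parsing the dependency file twice. Headers that sit directly in `$OUTDIR/inc` without a module directory are left untouched by this sketch.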
|
Caolán McNamara |
|
|
In reply to this post by Michael Stahl-2
On Fri, 2012-06-01 at 12:10 +0200, Michael Stahl wrote:
> i remember the recent ICU upgrade breaking all kinds of stuff in an
> incremental build because the header timestamps were somehow wrong

I fixed that one since FWIW.

Though I think we don't have any dependencies from the output of the icu build-time tools on the build-tools themselves, so that output doesn't get regenerated on an icu version bump in an incremental build (I think).

C.
|
Lubos Lunak |
|
|
In reply to this post by Michael Meeks-2
On Friday 01 of June 2012, Michael Meeks wrote:
> And some thoughts on improving this; currently we add a ton of deps for
> packages that are really internal and typically change en-masse or not
> at all.
>
> eg. 'boost' - it is installed, and then ~never changes again - people
> don't edit a single boost header and expect a dependency clean
> re-compile.
...
> IMHO we could - without significant loss of functionality reduce all
> those deps to a single stamp file (which we prolly install anyway) in
> the solver. Should be ~trivial to elide in our dep-re-writing anyway,
> save > 300Mb of space etc.
>
> Should just be a few lines in solenv/bin/concat-deps.c
>
> My question would be: do we want to continue to make 'over precise
> deps' at this large computational, space and build performance cost
> possible ?

I think there are several ways of reducing the size of .d files that are safer:

- a significant part of .d content is the LO build directory - defining that one and some other common paths (solver, workdir) and sed /path/define/ should save quite some space (50% or possibly even more)
- a significant part of .d content is the depend-on-nothing deps created by -MP , if those would be merged into one dedicated .d file that'd save a lot of space as well; not sure if this is easily doable though

--
Lubos Lunak
[hidden email]
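The first suggestion in the list above, replacing common absolute path prefixes with make variables, can be sketched like this. It is a rough Python illustration rather than the sed invocation hinted at in the message; the function name `abbreviate_paths` and the variable names in the usage note are assumptions for the example.

```python
def abbreviate_paths(line, prefixes):
    """Replace known absolute path prefixes in one .d line with $(VAR).

    prefixes is a list of (absolute_path, make_variable_name) pairs.
    Longer paths are tried first, so a workdir that lives inside the
    source tree is abbreviated as the workdir, not as the source tree.
    """
    ordered = sorted(prefixes, key=lambda pair: len(pair[0]), reverse=True)
    tokens = []
    for token in line.split(" "):
        for path, var in ordered:
            if token.startswith(path + "/"):
                token = f"$({var})" + token[len(path):]
                break
        tokens.append(token)
    return " ".join(tokens)
```

With `("/opt/libreoffice/master", "SRCDIR")` and `("/opt/libreoffice/master/workdir/unxlngi6.pro", "WORKDIR")` as prefixes, a rule for an object file under workdir would start with `$(WORKDIR)/` and its source prerequisite with `$(SRCDIR)/`. Whether make then parses such files faster or slower is not something this sketch can answer.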
|
Bjoern Michaelsen |
|
|
On Mon, Jun 04, 2012 at 03:56:41PM +0200, Lubos Lunak wrote:
> On Friday 01 of June 2012, Michael Meeks wrote:
> > And some thoughts on improving this; currently we add a ton of deps for
> > packages that are really internal and typically change en-masse or not
> > at all.
> >
> > eg. 'boost' - it is installed, and then ~never changes again - people
> > don't edit a single boost header and expect a dependency clean
> > re-compile.
> ...
> > IMHO we could - without significant loss of functionality reduce all
> > those deps to a single stamp file (which we prolly install anyway) in
> > the solver. Should be ~trivial to elide in our dep-re-writing anyway,
> > save > 300Mb of space etc.
> >
> > Should just be a few lines in solenv/bin/concat-deps.c

But concat-deps runs _after_ the creation of the dep files of the objects (which will then still be huge).

> > My question would be: do we want to continue to make 'over precise
> > deps' at this large computational, space and build performance cost
> > possible ?
>
> I think there are several ways of reducing the size of .d files that are
> safer:
>
> - a significant part of .d content is the LO build directory - defining that
> one and some other common paths (solver, workdir) and sed /path/define/
> should save quite some space (50% or possibly even more)

We even did that once IIRC using $(OUTDIR)/$(WORKDIR) vars which are known to gbuild already.

> - a significant part of .d content is the depend-on-nothing deps created
> by -MP , if those would be merged into one dedicated .d file that'd save a
> lot of space as well; not sure if this is easily doable though

Aren't we doing that already when merging the .d files for one library?

Note however, that every bit of added complexity to the build system will bite you back one day. Currently the Deps are ~10% of the working directory -- even if you reduce them by 90%(*), nobody will jump with joy because his build now fits on his drive by being 9% smaller.

So right now, I consider the topic premature optimization until proven otherwise.

Best,

Bjoern

(*) which you won't unless you gzip them (which is doable and shouldn't have too big of a performance impact)
|
Noel Grandin |
|
|
On 2012-06-04 17:19, Bjoern Michaelsen wrote:
> Note however, that every bit of added complexity to the build system
> will bite you back one day. Currently the Deps are ~10% of the working
> directory -- even if you reduce them by 90%(*),
> (*) which you wont unless you gzip them (which is doable and shouldnt
> have too big of an performance impact)

When dealing with simple stuff like ascii text, the compress command
- is practically free on modern CPUs,
- uses minimal memory,
- and gives you 80% of the benefit of the more sophisticated compression tools.

Disclaimer: http://www.peralex.com/disclaimer.html
|
Michael Stahl-2 |
|
|
In reply to this post by Bjoern Michaelsen
On 04/06/12 17:19, Bjoern Michaelsen wrote:
> On Mon, Jun 04, 2012 at 03:56:41PM +0200, Lubos Lunak wrote:
>> On Friday 01 of June 2012, Michael Meeks wrote:
>>> IMHO we could - without significant loss of functionality reduce all
>>> those deps to a single stamp file (which we prolly install anyway) in
>>> the solver. Should be ~trivial to elide in our dep-re-writing anyway,
>>> save > 300Mb of space etc.
>>>
>>> Should just be a few lines in solenv/bin/concat-deps.c
>
> But concat-deps is _after_ the creation dep files of the objects (which will
> then still be huge).

who cares how big the files are (disk is cheap), the relevant metric is: how many seconds does make need to parse them?

>>> My question would be: do we want to continue to make 'over precise
>>> deps' at this large computational, space and build performance cost
>>> possible ?
>>
>> I think there are several ways of reducing the size of .d files that are
>> safer:
>>
>> - a significant part of .d content is the LO build directory - defining that
>> one and some other common paths (solver, workdir) and sed /path/define/
>> should save quite some space (50% or possibly even more)
>
> We even did that once IIRC using $(OUTDIR)/$(WORKDIR) vars which are known to
> gbuild already.

yes, but keep in mind that variables in the dep files will need to be expanded by make, which will likely result in memory allocations, so it's an open question whether that will actually improve performance or slow it down.

>> - a significant part of .d content is the depend-on-nothing deps created
>> by -MP , if those would be merged into one dedicated .d file that'd save a
>> lot of space as well; not sure if this is easily doable though
>
> Arent we doing that already when merging the .d files for one library?

to some extent yes, but of course the same headers are included in many libraries, so there is still some amount of duplication there; however i don't know how to improve this without breaking separate building of modules, which requires the LinkTarget .d files to be self-contained.

> So right now, I consider the topic premature optimization until proven otherwise.

> (*) which you wont unless you gzip them (which is doable and shouldnt have too
> big of an performance impact)

sadly AFAIK make cannot include compressed files...
|
Noel Grandin |
|
|
On 2012-06-04 18:38, Michael Stahl wrote:
> sadly AFAIK make cannot include compressed files...

True. But it should be possible to either
(a) teach it that trick, or
(b) check if the filesystem supports it, and if so, activate per-file compression.

NTFS: http://www.microsoft.com/resources/documentation/windows/xp/all/proddocs/en-us/compact.mspx
EXTFS: http://e2compr.sourceforge.net/attic/manual-0.3/e2compr_45.html
BTRFS: https://blogs.oracle.com/wim/entry/btrfs_compression
|
Bjoern Michaelsen |
|
|
In reply to this post by Michael Stahl-2
On Mon, Jun 04, 2012 at 06:38:46PM +0200, Michael Stahl wrote:
> who cares how big the files are (disk is cheap), the relevant metric is:
> how many seconds does make need to parse them?
> ...
> yes, but keep in mind that variables in the dep files will need to be
> expanded by make, which will likely result in memory allocations, so
> it's an open question whether that will actually improve performance or
> slow it down.

IIRC I made some measurements back then. Though not very scientific, they suggested it makes no difference at all.

> >> - a significant part of .d content is the depend-on-nothing deps created
> >> by -MP , if those would be merged into one dedicated .d file that'd save a
> >> lot of space as well; not sure if this is easily doable though
> >
> > Arent we doing that already when merging the .d files for one library?
>
> to some extent yes, but of course the same headers are included in many
> libraries, so there is still some amount of duplication there; however i
> don't know to improve this without breaking separate building of
> modules, which requires the LinkTarget .d files to be self-contained.
>
> > So right now, I consider the topic premature optimization until proven otherwise.
>
> > (*) which you wont unless you gzip them (which is doable and shouldnt have too
> > big of an performance impact)
>
> sadly AFAIK make cannot include compressed files...

We never include the dep-files of the objects ( $(WORKDIR)/Dep/CxxObject ), only the concatenated per-library output ( $(WORKDIR)/Dep/LinkTarget ), right?

Best,

Bjoern
|
Michael Meeks-2 |
|
|
In reply to this post by Bjoern Michaelsen
Hi Bjoern,
On Mon, 2012-06-04 at 17:19 +0200, Bjoern Michaelsen wrote:
> But concat-deps is _after_ the creation dep files of the objects (which will
> then still be huge).

Sigh; true - it'll mostly speedup build time then; then again we could filter them during / after generation I suppose.

> > - a significant part of .d content is the depend-on-nothing deps created
> > by -MP , if those would be merged into one dedicated .d file that'd save a
> > lot of space as well; not sure if this is easily doable though
>
> Arent we doing that already when merging the .d files for one library?

Right; that is done already as we merge to a library.

> Note however, that every bit of added complexity to the build system will bite
> you back one day. Currently the Deps are ~10% of the working directory -- even
> if you reduce them by 90%(*)

Why do you think that Deps are 10% of the working directory ?

$ du -m workdir/unxlngi6.pro | tail -n 1
2906	unxlngi6.pro
$ du -m workdir/unxlngi6.pro/Dep | tail -n 1
1546	unxlngi6.pro/Dep

For me that reads > 50% of the size.

> So right now, I consider the topic premature optimization until proven otherwise.

Is my workdir abnormally different to yours ? it's normal to have > 1.5Gb of 'stuff' in there I think - at least someone I sanity checked with saw that too - IMHO it's too big.

ATB,

Michael.
|
Rob Snelders-2 |
|
|
Hi All,
I want to add that build speed and disk size do matter. Not everybody who wants to help can afford the newest machines.

--
Greetings,
Rob Snelders
|
Bjoern Michaelsen |
|
|
In reply to this post by Michael Meeks-2
On Mon, Jun 04, 2012 at 06:53:56PM +0100, Michael Meeks wrote:
> Sigh; true - it'll mostly speedup build time then; then again we could
> filter them during / after generation I suppose.

But that's not the big part - as a:

du -h $(WORKDIR)/Dep/*

will confirm -- CxxObjects is rather large ... I'm beginning to like the idea of gzipping those.

> Why do you think that Deps are 10% of the working directory ?
>
> $ du -m workdir/unxlngi6.pro | tail -n 1
> 2906 unxlngi6.pro
> $ du -m workdir/unxlngi6.pro/Dep | tail -n 1
> 1546 unxlngi6.pro/Dep
>
> For me that reads > 50% of the size.

Indeed. I looked at a -3-5 build, which is much different. I couldn't get a master build as it has been broken all day today.

> Is my workdir abnormally different to yours ? it's normal to have >
> 1.5Gb of 'stuff' in there I think - at least someone I sanity checked
> with saw that too - IMHO it's too big.

see above. ;)

Best,

Bjoern
|
Bjoern Michaelsen |
|
|
On Mon, Jun 04, 2012 at 11:35:44PM +0200, Bjoern Michaelsen wrote:
> On Mon, Jun 04, 2012 at 06:53:56PM +0100, Michael Meeks wrote:
> > Sigh; true - it'll mostly speedup build time then; then again we could
> > filter them during / after generation I suppose.
>
> But thats not the big part - as a:
>
> du -h $(WORKDIR)/Dep/*
>
> will confirm -- CxxObjects is rather large ... Im beginning to like the idea of
> gzipping those.

Finished a master build:

1.2G workdir/unxlngx6.pro/Dep/

of a:

5.4G .

total. Yeah, that's a bit much. There's:

644M workdir/unxlngx6.pro/Dep/CxxObject

which can likely be compressed without much performance impact, giving us back some ~500MB with gzip. Might be worth a try.

Best,

Bjoern
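To get a feeling for how well dependency text compresses, a small sketch like the following could be run over a few real .d files. It is written in Python with the standard gzip module; the helper name `gzip_ratio` is made up for this example, and the ratio on real files may differ considerably from what highly repetitive sample text suggests.

```python
import gzip


def gzip_ratio(text):
    """Return compressed size divided by original size for some text.

    A value of 0.2 would mean the gzipped data takes 20% of the space.
    """
    raw = text.encode("utf-8")
    packed = gzip.compress(raw)
    # Sanity check: decompressing must give back the original bytes.
    assert gzip.decompress(packed) == raw
    return len(packed) / len(raw)
```

This only measures the space side of the tradeoff. As noted elsewhere in the thread, make cannot include compressed files, so this would only apply to files that make never reads directly.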
|
Bjoern Michaelsen |
|
|
In reply to this post by Rob Snelders-2
On Mon, Jun 04, 2012 at 08:15:19PM +0200, Rob Snelders wrote:
> I want to add that the buildspeed and disksize does matter. Not
> everybody who wants to help can afford the newest machines.

Yes, but in this case buildspeed and disksize are likely a tradeoff. Making the build smaller will likely make it slower.

Best,

Bjoern
|
Noel Grandin |
|
|
On 2012-06-05 11:14, Bjoern Michaelsen wrote:
> Yes, but in this case buildspeed and disksize are likely a tradeoff.
> Making the build smaller will likely make it slower.

Not necessarily. On rotating disks, it can be cheaper to compress because it results in less IO.
|
Norbert Thiebaud |
|
|
In reply to this post by Bjoern Michaelsen
On Tue, Jun 5, 2012 at 4:13 AM, Bjoern Michaelsen <[hidden email]> wrote:
> Finished a master build:
>
> 1.2G workdir/unxlngx6.pro/Dep/
>
> of a:
>
> 5.4G .
>
> total. Yeah, thats a bit much. Theres:
>
> 644M workdir/unxlngx6.pro/Dep/CxxObject
>
> which can likely be compressed without much performance impact, giving us back
> some ~500MB with gzip. Might be worth a try.

talking about that, I've just noticed that --disable-dependency-tracking was still building .d files for c/cxx files.

I pushed a fix...

Norbert
|
Michael Meeks-2 |
|
|
In reply to this post by Bjoern Michaelsen
On Tue, 2012-06-05 at 11:13 +0200, Bjoern Michaelsen wrote:
> On Mon, Jun 04, 2012 at 11:35:44PM +0200, Bjoern Michaelsen wrote:
> > will confirm -- CxxObjects is rather large ... Im beginning to like the idea of
> > gzipping those.

:-)

> Finished a master build:
> 1.2G workdir/unxlngx6.pro/Dep/

Modulo compressing the CxxObject deps, I'm still trying to prototype some more speedups for the concatenated / big-library dependencies with a few local patches to concat-deps.pl [ the hyper-optimised C version is not as easy to casually hack ].

Anyhow - doing some more analysis; I see:

4823735 - lines of LinkTarget/ deps
1598180 - lines containing /boost/
1178594 - offapi
608585 - udkapi

So it seems the next big low-hanging fruit after boost is the IDL compilation. Having got down to 3.2 million lines - having another 1.7million (over 50%) of the dependency lines being compiled IDL files is slightly amazing (at least to me). I wonder where they all come from and why.

Anyhow - one obvious oddness reading the code is the .hdl and .hpp duplication; my question is:

$ grep -R 'api/.*\.hpp' * | wc -l
892769
$ grep -R 'api/.*\.hdl' * | wc -l
899179

So a few questions: if a .idl file is changed is there any circumstance where the .hpp and .hdl will not both be updated in lock-step.

Also - do all IDL generated .hpp deps ultimately include all the .hdl files as well ? ie. we could simply elide \.hdl$ from all dependency files ? - which might knock 25% off our deps at a very trivial stroke.

Then again a small number of files include .hdl files directly; is that a bug ? :-)

Thanks,

Michael.
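A breakdown like the one in the message above can be approximated with a few lines of Python. This is only a sketch: the function name `count_by_category` and the way categories are passed in are assumptions for illustration, and it counts each line under the first matching category, which is not necessarily how the original figures were produced.

```python
from collections import Counter


def count_by_category(lines, categories):
    """Count dependency lines per category.

    categories maps a label to a substring; a line is counted under the
    first label whose substring it contains, otherwise under "other".
    """
    counts = Counter()
    for line in lines:
        for label, needle in categories.items():
            if needle in line:
                counts[label] += 1
                break
        else:
            counts["other"] += 1
    return counts
```

Feeding it the lines of the concatenated LinkTarget dependency files with categories such as `{"boost": "/boost/", "offapi": "/offapi/", "udkapi": "/udkapi/"}` would show at a glance which groups of headers dominate.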
|
Stephan Bergmann-2 |
|
|
On 06/20/2012 05:15 PM, Michael Meeks wrote:
> if a .idl file is changed is there any circumstance where the .hpp
> and .hdl will not both be updated in lock-step.

IIRC, the logic of cppumaker is to first generate temporary .hpp and .hdl files and only copy them over existing ones in solver if they are actually different. That way, it should easily happen that a .hdl file is updated while the corresponding .hpp file is not. (For example, when a method is added to an interface and cppumaker is not running in "comprehensive type information" mode; or when a bug in cppumaker is fixed affecting .hdl but not .hpp output).

> Also - do all IDL generated .hpp deps ultimately include all the .hdl

Each .hpp file includes the corresponding .hdl file (if any), yes.

> files as well ? ie. we could simply elide \.hdl$ from all dependency
> files ? - which might knock 25% off our deps at a very trivial stroke.

That would be unsound (see above).

> Then again a small number of files include .hdl files directly; is that
> a bug ? :-)

Looks like a bug, yes. Care to share the list?

Stephan
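The "generate to a temporary file, copy only if different" behaviour recalled above can be illustrated with a short Python sketch. It is not cppumaker's actual code; the function name `install_if_changed` and the idea of passing two paths are assumptions made for the example.

```python
import shutil
from pathlib import Path


def install_if_changed(tmp_path, dest_path):
    """Copy tmp_path over dest_path only when the contents differ.

    Returns True if the destination was (re)written. When the contents
    are identical the destination is left alone, so its timestamp does
    not change and make sees no reason to rebuild its dependents.
    """
    tmp, dest = Path(tmp_path), Path(dest_path)
    if dest.exists() and dest.read_bytes() == tmp.read_bytes():
        return False
    shutil.copyfile(tmp, dest)
    return True
```

Applied separately to a generated .hdl and .hpp pair, such a step updates one file while leaving the other untouched whenever only one of the two outputs changes, which is why the two cannot be assumed to move in lock-step.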