Johannes Sixt

I want to place a software manual under source control. It seems most
feasible to use a flat XML format, in particular, .fodt. But I have some
difficulties because when LO 3.5.4 opens a .fodt and saves it again
without making any changes, the resulting file changes nevertheless.

I'm writing a small tool that transforms the XML into a canonical format
so that only substantial changes remain. The question is: Which
transformations are allowed?

- I bring the styles under <office:automatic-styles> into a canonical
  order. Do styles in this section only reference styles from the
  <office:styles> section (e.g. via style:parent-style-name), which occurs
  earlier in the file?

- I give the automatic styles canonical names because due to the
  re-ordering they are re-numbered, which leads to a wealth of unwanted
  changes in <text:span style-name=...> attributes.

(This seems to work so far.)

But there are other changes:

- <office:meta> changes. It's not a problem, I don't care about this.

- <office:settings> changes. I don't know, yet, whether I mind or not.

- The <draw:frame draw:z-index="251"> attribute changes. Can I just
  replace the z-index with 1 or 2? What will happen?

- The <text:list xml:id="list533178598"> changes. That xml:id does not
  seem to be used anywhere. Can I just remove it? What will I lose?

- Measurements change. E.g. (just to pick one case), in
  <style:graphic-properties> the draw:visible-area-width changes from
  6.088cm to 6.089cm. Is there a remedy to avoid changes of this kind?

Any insights are welcome!

Thanks,
-- Hannes

_______________________________________________
LibreOffice mailing list
[hidden email]
http://lists.freedesktop.org/mailman/listinfo/libreoffice
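The two style transformations described in this message (canonical order, canonical names) could look roughly like the following. This is only a minimal sketch, not Johannes's actual tool: the function names, the `auto1, auto2, ...` naming scheme and the rule "rewrite every attribute whose name ends in style-name" are assumptions made for illustration. It also assumes that automatic style names never collide with names from <office:styles>, and it leaves the open question about cross-references between sections unanswered.

```python
import xml.etree.ElementTree as ET

OFFICE = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
STYLE = "urn:oasis:names:tc:opendocument:xmlns:style:1.0"
STYLE_TAG = f"{{{STYLE}}}style"
NAME_ATTR = f"{{{STYLE}}}name"


def content_key(elem, skip=()):
    """Describe an element by tag, sorted attributes and children.

    Attributes listed in `skip` are ignored on the top-level element only.
    Text content is ignored, which is enough for style definitions.
    """
    attrs = tuple(sorted((k, v) for k, v in elem.attrib.items() if k not in skip))
    return (elem.tag, attrs, tuple(content_key(child) for child in elem))


def canonicalize_automatic_styles(root):
    """Sort the <style:style> children of <office:automatic-styles> by
    content and rename them; returns the old-name -> new-name mapping."""
    auto = root.find(f"{{{OFFICE}}}automatic-styles")
    if auto is None:
        return {}
    styles = sorted(
        (e for e in auto if e.tag == STYLE_TAG),
        key=lambda s: content_key(s, skip=(NAME_ATTR,)),
    )
    others = [e for e in auto if e.tag != STYLE_TAG]

    # Hypothetical naming scheme: auto1, auto2, ... in sorted order.
    renames = {}
    for index, style in enumerate(styles, start=1):
        new_name = f"auto{index}"
        renames[style.get(NAME_ATTR)] = new_name
        style.set(NAME_ATTR, new_name)
    auto[:] = styles + others  # sorted styles first, everything else after

    # Assumption: any attribute ending in "style-name" whose value matches
    # an old automatic style name is a reference to that style.
    for elem in root.iter():
        for key, value in list(elem.attrib.items()):
            if key.endswith("style-name") and value in renames:
                elem.set(key, renames[value])
    return renames
```

Because the sort key ignores the style name, two files that differ only in the order and numbering of their automatic styles end up with the same names and the same references. Styles with identical content keep their original relative order, so a real tool would probably want to merge or otherwise disambiguate them.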
Michael Meeks-2

Hi Johannes,

On Sun, 2012-06-17 at 22:10 +0200, Johannes Sixt wrote:
> I want to place a software manual under source control. It seems most
> feasible to use a flat XML format, in particular, .fodt.

Yes - that's a good plan :-)

> But I have some difficulties because when LO 3.5.4 opens a .fodt and
> saves it again without making any changes, the resulting file changes
> nevertheless.

Right - this is a regular annoyance ! :-)

> I'm writing a small tool that transforms the XML into a canonical format
> so that only substantial changes remain. The question is: Which
> transformations are allowed?

Oh - so ... why write an external tool to do this, and not just fix it
in LibreOffice ! ? :-) We'd be -very- interested in some patches that we
can apply that will sort the automatic styles, and generate them with
consistent naming in a sensible order :-)

> (This seems to work so far.)

The style rendering sounds sensible.

> But there are other changes:
>
> - <office:meta> changes. It's not a problem, I don't care about this.

Some level of sorting here might help too.

> - <office:settings> changes. I don't know, yet, whether I mind or not.
>
> - The <draw:frame draw:z-index="251"> attribute changes. Can I just
> replace the z-index with 1 or 2? What will happen?

Odd :-) perhaps when we have smaller changes we can chase these
oddnesses down better.

> - The <text:list xml:id="list533178598"> changes. That xml:id does not
> seem to be used anywhere. Can I just remove it? What will I lose?

No idea; if it's unused just try removing it and see what happens.

> - Measurements change. E.g. (just to pick one case), in
> <style:graphic-properties> the draw:visible-area-width changes from
> 6.088cm to 6.089cm. Is there a remedy to avoid changes of this kind?

Ah; nasty, some rounding problem / internal representation issue -
possibly again looking at the code we could do better here to make it
more predictable; possibly using more precision we could do better
(doubles instead of floats) ?

> Any insights are welcome!

So - the best place to fix this stuff is inside LibreOffice itself :-)
then it is permanently fixed for everyone: you are not the only person
with this pain - soon we'll be using flat odf for our templates and will
suffer the same way :-)

The code to poke at is in:

	xmloff/
and
	sw/source/filter/xml/

It's not too hard to build libreoffice, check out:

	http://www.libreoffice.org/developers-2/

Patches are very much more than welcome ! :-)

Thanks !

Michael.

-- 
[hidden email] <><, Pseudo Engineer, itinerant idiot
Johannes Sixt

Michael,

thanks for your feedback!

On 19.06.2012 10:48, Michael Meeks wrote:
> On Sun, 2012-06-17 at 22:10 +0200, Johannes Sixt wrote:
>> I'm writing a small tool that transforms the XML into a canonical format
>> so that only substantial changes remain. The question is: Which
>> transformations are allowed?
>
> Oh - so ... why write an external tool to do this, and not just fix it
> in LibreOffice ! ? :-)

Because I'm using git, and then it's just a matter of a "simple" 'clean
filter'. :-)

>> - <office:meta> changes. It's not a problem, I don't care about this.
>
> Some level of sorting here might help too.

Not only that. Most of the stuff is irrelevant (various counts, editing
duration, time of last edit). That should just be removed if the document
is placed under source control. Such stuff leads to merge conflicts almost
by definition. (And, BTW, being able to keep different modifications of
the manual in different branches and *merge* them again is the whole point
of this exercise.)

>> - <office:settings> changes. I don't know, yet, whether I mind or not.

I'll try removing this entire section and hope that LO does something
sensible.

>> - The <text:list xml:id="list533178598"> changes. That xml:id does not
>> seem to be used anywhere. Can I just remove it? What will I lose?
>
> No idea; if it's unused just try removing it and see what happens.

The ids are sometimes used in a text:continue-list attribute. Hence, they
can't be stripped out blindly.

>> - Measurements change. E.g. (just to pick one case), in
>> <style:graphic-properties> the draw:visible-area-width changes from
>> 6.088cm to 6.089cm. Is there a remedy to avoid changes of this kind?
>
> Ah; nasty, some rounding problem / internal representation issue -
> possibly again looking at the code we could do better here to make it
> more predictable; possibly using more precision we could do better
> (doubles instead of floats) ?

Probably. Looking at this again, these changes seem to happen only for
draw:visible-area-*. Hence, it may also be a matter of conversion
between screen dimensions (pixels?) and cm/mm/in/etc.

> So - the best place to fix this stuff is inside LibreOffice itself :-)
> then it is permanently fixed for everyone: you are not the only problem
> with this pain - soon we'll be using flat odf for our templates and will
> suffer the same way :-)
>
> The code to poke at is in:
>
> xmloff/
> and
> sw/source/filter/xml/

Been there, done that. But it's way over my head (and time budget). See

http://thread.gmane.org/gmane.comp.documentfoundation.libreoffice.devel/23528/focus=23543

-- Hannes
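A clean filter along the lines described in this message might start out like the sketch below. It drops the volatile <office:meta> entries mentioned above (counts, editing duration, time of last edit) and the whole <office:settings> section. The function names and the exact set of elements to drop are assumptions for illustration, and whether LibreOffice copes with a missing <office:settings> is exactly the open question from the message. Note also that ElementTree re-serialises namespace prefixes and drops declarations it considers unused, so a real filter would need more care than this.

```python
import xml.etree.ElementTree as ET

OFFICE = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
META = "urn:oasis:names:tc:opendocument:xmlns:meta:1.0"
DC = "http://purl.org/dc/elements/1.1/"

# Keep readable prefixes for the namespaces this sketch knows about;
# any other namespace would be written with a generated prefix.
ET.register_namespace("office", OFFICE)
ET.register_namespace("meta", META)
ET.register_namespace("dc", DC)

# Assumed list of entries that change on every save.
VOLATILE_META = {
    f"{{{META}}}document-statistic",  # page, word and character counts
    f"{{{META}}}editing-duration",
    f"{{{META}}}editing-cycles",
    f"{{{DC}}}date",                  # time of last modification
}


def strip_volatile(root):
    """Remove volatile metadata and the settings section in place."""
    meta = root.find(f"{{{OFFICE}}}meta")
    if meta is not None:
        for child in list(meta):
            if child.tag in VOLATILE_META:
                meta.remove(child)
    settings = root.find(f"{{{OFFICE}}}settings")
    if settings is not None:
        root.remove(settings)
    return root


def clean(xml_text):
    """Return the cleaned document as a string."""
    root = strip_volatile(ET.fromstring(xml_text))
    return ET.tostring(root, encoding="unicode")
```

To use this from git, a small wrapper would pass standard input through `clean()` and write the result to standard output. Git would then be told to run it with a `.gitattributes` entry such as `*.fodt filter=fodt` and a matching `filter.fodt.clean` configuration value; the filter name `fodt` is made up here.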
Miklos Vajna-2

On Tue, Jun 19, 2012 at 07:56:08PM +0200, Johannes Sixt <[hidden email]> wrote:
> > The code to poke at is in:
> >
> > xmloff/
> > and
> > sw/source/filter/xml/
>
> Been there, done that. But it's way over my head (and time budget). See
>
> http://thread.gmane.org/gmane.comp.documentfoundation.libreoffice.devel/23528/focus=23543

Still, once you have such a "clean" script it would be nice to see what
tricks it does, so we could (step by step) fix LO itself; in the long
term you would then not need such a filter. ;-)
Thorsten Behrens

In reply to this post by Johannes Sixt

Johannes Sixt wrote:
> >> - Measurements change. E.g. (just to pick one case), in
> >> <style:graphic-properties> the draw:visible-area-width changes from
> >> 6.088cm to 6.089cm. Is there a remedy to avoid changes of this kind?
> >
> > Ah; nasty, some rounding problem / internal representation issue -
> > possibly again looking at the code we could do better here to make it
> > more predictable; possibly using more precision we could do better
> > (doubles instead of floats) ?
>
> Probably. Looking at this again, these changes seem to happen only for
> draw:visible-area-*. Hence, it may also be a matter of conversion
> between screen dimensions (pixels?) and cm/mm/in/etc.
>
Hrm, yeah - and we *really* don't want this slow drift - any chance
you can file a bug with a preferably small sample doc?

Thanks,

-- Thorsten
Dennis E. Hamilton

I think it is necessary to look at round-trip out-in conversion
preservation.

For out-in (which this is, presumably), you want to record a decimal
expression of the internal value that will convert back to the exact
internal value on re-input. (The in-out case is that the input conversion
provides whatever internal representation will convert back to the value
that was read on re-output. Without additional information, it is
generally very difficult to have these be the same.)

It is also desirable, of course, that any other ODF consumer use the same
technique so that its in-out conversion satisfies the out-in condition of
the original source of the decimal expression of the value.

There are old technical papers on how to have this work. The name David
Matula comes to mind.

There might be solutions in the conversions that exist in the basic Java
classes for float data types. I think this was addressed in Common Lisp
also.

-----Original Message-----
From: libreoffice-bounces+dennis.hamilton=[hidden email] [mailto:libreoffice-bounces+dennis.hamilton=[hidden email]] On Behalf Of Thorsten Behrens
Sent: Wednesday, June 20, 2012 05:49
To: Johannes Sixt
Cc: libreoffice-dev
Subject: Re: Difficulties with Flat XML under source control

Johannes Sixt wrote:
> >> - Measurements change. E.g. (just to pick one case), in
> >> <style:graphic-properties> the draw:visible-area-width changes from
> >> 6.088cm to 6.089cm. Is there a remedy to avoid changes of this kind?
> >
> > Ah; nasty, some rounding problem / internal representation issue -
> > possibly again looking at the code we could do better here to make it
> > more predictable; possibly using more precision we could do better
> > (doubles instead of floats) ?
>
> Probably. Looking at this again, these changes seem to happen only for
> draw:visible-area-*. Hence, it may also be a matter of conversion
> between screen dimensions (pixels?) and cm/mm/in/etc.
>
Hrm, yeah - and we *really* don't want this slow drift - any chance
you can file a bug with a preferably small sample doc?

Thanks,

-- Thorsten
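The out-in condition described in this message can be illustrated in a few lines. The sketch below is not LibreOffice code; the function names and the sample value are made up. It contrasts a writer that always emits three decimals with one that emits the shortest decimal string that reads back to the same double, which is what Python's `repr()` produces for floats.

```python
def write_fixed(value, digits=3):
    """Write a double with a fixed number of decimals (may lose information)."""
    return f"{value:.{digits}f}"


def write_shortest(value):
    """Write the shortest decimal string that reads back to the same double."""
    return repr(float(value))


def survives_out_in(value, writer):
    """True if writing and re-reading gives back exactly the same double."""
    return float(writer(value)) == value


internal = 6.08849  # hypothetical internal width in cm

print(write_fixed(internal), survives_out_in(internal, write_fixed))
print(write_shortest(internal), survives_out_in(internal, write_shortest))
```

With the fixed-precision writer the internal value is replaced by a different one on the first round trip; the shortest-representation writer satisfies the out-in condition for every finite double, at the cost of sometimes longer strings.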
Thorsten Behrens

Dennis E. Hamilton wrote:
> For out-in (which this is, presumably), you want to record a
> decimal expression of the internal value that will convert back to
> the exact internal value on re-input. (The in-out case is that
> the input conversion provide whatever internal representation that
> will convert to the read value on re-output. Without additional
> information, it is generally very difficult to have these be the
> same.)
>
> It is also desirable, of course, that any other ODF consumer use
> the same technique so that its in-out conversion satisfies the
> out-in condition of the original source of the decimal expression
> of the value.
>
yes - but in a first approximation, one can probably relax this a
bit (for the use case at hand): only _after_ the first save
operation this needs to hold. Also, most people would probably be
content with this to work for *one* ODF editing application.

> It is also desirable, of course, that any other ODF consumer use
> the same technique so that its in-out conversion satisfies the
> out-in condition of the original source of the decimal expression
> of the value.
>
Note that there's a difference between spreadsheet values (for which
I think de facto the above holds true - likely everyone stores those
in IEEE doubles), and other content: consumers might employ rather
complex transformations to arrive at internal values, given e.g. a
gradient center coordinate - asking for common behaviour is very
close to asking for a common ODF application model.

Cheers,

-- Thorsten
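The relaxed requirement, stability only after the first save, is easy to state as a check. The following is a minimal sketch with a made-up writer and reader, assuming the internal value is a plain double and that no unit conversion happens in between.

```python
def save(value_cm):
    """Hypothetical writer: three decimals plus the unit."""
    return f"{value_cm:.3f}cm"


def load(text):
    """Hypothetical reader: strip the unit and parse the number."""
    return float(text[:-2])


def round_trip(text):
    """One open-and-save cycle without any edits."""
    return save(load(text))


first_save = save(6.08849)  # information may be lost here, once
text = first_save
for _ in range(5):
    text = round_trip(text)  # later cycles should not change anything

print(first_save, text)
```

Under these assumptions every cycle after the first save reproduces the same string, so the relaxed condition holds. That a real file goes from 6.088cm to 6.089cm on an unedited save would therefore point to an additional lossy step, which fits the guess in this thread that a conversion between units is involved, though the sketch cannot confirm that.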
Dennis E. Hamilton

It occurs to me that PostScript and PDF have dealt with this for imaging models that work consistently. Here, the "in" is to a renderer, but the model for representation of decimal expressions of find-sensitivity values seems to have been handled (for years). Those specifications may be some help too.

- Dennis

-----Original Message-----
From: Thorsten [mailto:[hidden email]] On Behalf Of Thorsten Behrens
Sent: Wednesday, June 20, 2012 06:32
To: Dennis E. Hamilton
Cc: 'libreoffice-dev'
Subject: Re: Difficulties with Flat XML under source control

Dennis E. Hamilton wrote:
> For out-in (which this is, presumably), you want to record a
> decimal expression of the internal value that will convert back to
> the exact internal value on re-input. (The in-out case is that
> the input conversion provide whatever internal representation that
> will convert to the read value on re-output. Without additional
> information, it is generally very difficult to have these be the
> same.)
>
> It is also desirable, of course, that any other ODF consumer use
> the same technique so that its in-out conversion satisfies the
> out-in condition of the original source of the decimal expression
> of the value.
>
yes - but in a first approximation, one can probably relax this a
bit (for the use case at hand): only _after_ the first save
operation this needs to hold. Also, most people would probably be
content with this to work for *one* ODF editing application.

> It is also desirable, of course, that any other ODF consumer use
> the same technique so that its in-out conversion satisfies the
> out-in condition of the original source of the decimal expression
> of the value.
>
Note that there's a difference between spreadsheet values (for which
I think de facto the above holds true - likely everyone stores those
in IEEE doubles), and other content: consumers might employ rather
complex transformations to arrive at internal values, given e.g. a
gradient center coordinate - asking for common behaviour is very
close to asking for a common ODF application model.

Cheers,

-- Thorsten
Stephan Bergmann-2

In reply to this post by Dennis E. Hamilton

On 06/20/2012 03:07 PM, Dennis E. Hamilton wrote:
> I think it is necessary to look at round-trip out-in conversion preservation.
>
> For out-in (which this is, presumably), you want to record a decimal expression of the internal value that will convert back to the exact internal value on re-input. (The in-out case is that the input conversion provide whatever internal representation that will convert to the read value on re-output. Without additional information, it is generally very difficult to have these be the same.)
>
> It is also desirable, of course, that any other ODF consumer use the same technique so that its in-out conversion satisfies the out-in condition of the original source of the decimal expression of the value.
>
> There are old technical papers on how to have this work. The name David Matula comes to mind.
>
> There might be solutions in the conversions that exist in the basic Java classes for float data types. I think this was addressed in Common Lisp also.

Hasn't there been progress in that field recently? Wait, yes,
<http://dl.acm.org/citation.cfm?id=1806623> "Printing floating-point
numbers quickly and accurately with integers" by Florian Loitsch.

Stephan
Thorsten Behrens

Stephan Bergmann wrote:
> Hasn't there been progress in that field recently? Wait, yes,
> <http://dl.acm.org/citation.cfm?id=1806623> "Printing floating-point
> numbers quickly and accurately with integers" by Florian Loitsch.
>
Nice catch - and some code is here:

http://code.google.com/p/double-conversion/

Cheers,

-- Thorsten
Michael Stahl-2

In reply to this post by Stephan Bergmann-2

On 21/06/12 14:07, Stephan Bergmann wrote:
> On 06/20/2012 03:07 PM, Dennis E. Hamilton wrote:
>> I think it is necessary to look at round-trip out-in conversion preservation.
>>
>> For out-in (which this is, presumably), you want to record a decimal expression of the internal value that will convert back to the exact internal value on re-input. (The in-out case is that the input conversion provide whatever internal representation that will convert to the read value on re-output. Without additional information, it is generally very difficult to have these be the same.)
>>
>> It is also desirable, of course, that any other ODF consumer use the same technique so that its in-out conversion satisfies the out-in condition of the original source of the decimal expression of the value.
>>
>> There are old technical papers on how to have this work. The name David Matula comes to mind.
>>
>> There might be solutions in the conversions that exist in the basic Java classes for float data types. I think this was addressed in Common Lisp also.
>
> Hasn't there been progress in that field recently? Wait, yes,
> <http://dl.acm.org/citation.cfm?id=1806623> "Printing floating-point
> numbers quickly and accurately with integers" by Florian Loitsch.

i am in awe that it's possible to get a paper on this topic published in
this day and age; one would think this kind of problem would have been
solved 30 years ago, and the developers of popular office suites were
just ignorant of the solutions :)
Michael Stahl-2

In reply to this post by Johannes Sixt

On 17/06/12 22:10, Johannes Sixt wrote:
> - The <text:list xml:id="list533178598"> changes. That xml:id does not
> seem to be used anywhere. Can I just remove it? What will I lose?

these are sadly auto-generated, which is a bug in itself;

they are used in ODF itself for continuations, i.e. there can be another
list that continues an existing list by referring to its text:id/xml:id;

then there is another use in ODF 1.2 where RDF metadata can refer to the
element by its xml:id, but that only works if the xml:id is actually
persistent, i.e. the same value that is imported is then exported again;

making the ids persistent requires extending the Writer core, which is a
bit of work...
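Given the continuation use described in this message, a canonicalising tool could strip only those list ids that nothing refers to. The sketch below does that for text:continue-list references. The function name is made up, and the RDF metadata use mentioned above is deliberately not handled, so this is only safe for documents that carry no such metadata.

```python
import xml.etree.ElementTree as ET

TEXT = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
LIST_TAG = f"{{{TEXT}}}list"
CONTINUE_ATTR = f"{{{TEXT}}}continue-list"


def strip_unreferenced_list_ids(root):
    """Remove xml:id from every <text:list> that no other list continues.

    Returns the number of ids removed. References from RDF metadata are
    NOT considered here (an assumption of this sketch).
    """
    referenced = {
        lst.get(CONTINUE_ATTR)
        for lst in root.iter(LIST_TAG)
        if lst.get(CONTINUE_ATTR) is not None
    }
    removed = 0
    for lst in root.iter(LIST_TAG):
        list_id = lst.get(XML_ID)
        if list_id is not None and list_id not in referenced:
            del lst.attrib[XML_ID]
            removed += 1
    return removed
```

Ids that are still referenced stay in the file and keep changing on every save, so for those a tool would have to rename both the id and the text:continue-list value consistently.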
Johannes Sixt

In reply to this post by Thorsten Behrens

On 20.06.2012 14:48, Thorsten Behrens wrote:
> Johannes Sixt wrote:
>>>> - Measurements change. E.g. (just to pick one case), in
>>>> <style:graphic-properties> the draw:visible-area-width changes from
>>>> 6.088cm to 6.089cm. Is there a remedy to avoid changes of this kind?
>>>
>>> Ah; nasty, some rounding problem / internal representation issue -
>>> possibly again looking at the code we could do better here to make it
>>> more predictable; possibly using more precision we could do better
>>> (doubles instead of floats) ?
>>
>> Probably. Looking at this again, these changes seem to happen only for
>> draw:visible-area-*. Hence, it may also be a matter of conversion
>> between screen dimensions (pixels?) and cm/mm/in/etc.
>>
> Hrm, yeah - and we *really* don't want this slow drift - any chance
> you can file a bug with a preferably small sample doc?

Here we go:

https://bugs.freedesktop.org/show_bug.cgi?id=51334

draw:visible-area-width and -height are properties that pertain only to
OLE objects, IIUC.

-- Hannes