Difficulties with Flat XML under source control

Johannes Sixt

Difficulties with Flat XML under source control

I want to place a software manual under source control. It seems most
feasible to use a flat XML format, in particular, .fodt.

But I have some difficulties because when LO 3.5.4 opens a .fodt and
saves it again without making any changes, the resulting file changes
nevertheless.

I'm writing a small tool that transforms the XML into a canonical format
so that only substantial changes remain. The question is: Which
transformations are allowed?

- I bring the styles under <office:automatic-styles> into a canonical
order. Do styles in this section only reference styles from the
<office:styles> section (e.g. via style:parent-style-name), which occurs
earlier in the file?

- I give the automatic styles canonical names because, due to the
re-ordering, they are re-numbered, which leads to a wealth of unwanted
changes in <text:span style-name=...> attributes.

(This seems to work so far.)
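
The two style transformations described above can be sketched as follows. This is a minimal sketch, not the tool from this post: the function names and the "T_" prefix are made up, it only handles automatic styles with style:family="text", and it assumes that no other style family shares a name with a text style. It names each style after a hash of its content, so the name no longer depends on the order in which the styles were written.

```python
import hashlib
import xml.etree.ElementTree as ET

OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
STYLE_NS = "urn:oasis:names:tc:opendocument:xmlns:style:1.0"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
STYLE_NAME = f"{{{STYLE_NS}}}name"
STYLE_FAMILY = f"{{{STYLE_NS}}}family"
TEXT_STYLE_NAME = f"{{{TEXT_NS}}}style-name"


def fingerprint(elem, skip=()):
    """Describe an element by its content, ignoring the attributes in `skip`."""
    attrs = sorted((k, v) for k, v in elem.attrib.items() if k not in skip)
    return repr((elem.tag, attrs, [fingerprint(child) for child in elem]))


def canonicalize_text_styles(root):
    """Rename and sort automatic text styles; return the old -> new name map."""
    auto = root.find(f"{{{OFFICE_NS}}}automatic-styles")
    if auto is None:
        return {}
    renames, seen = {}, set()
    for style in auto.findall(f"{{{STYLE_NS}}}style"):
        old_name = style.get(STYLE_NAME)
        if old_name is None or style.get(STYLE_FAMILY) != "text":
            continue
        # The new name depends only on what the style contains.
        content = fingerprint(style, skip={STYLE_NAME})
        new_name = "T_" + hashlib.sha1(content.encode()).hexdigest()[:10]
        renames[old_name] = new_name
        if new_name in seen:
            auto.remove(style)  # identical duplicate of an earlier style
        else:
            seen.add(new_name)
            style.set(STYLE_NAME, new_name)
    # Canonical order: by element type, then family, then name.
    auto[:] = sorted(
        auto,
        key=lambda e: (e.tag, e.get(STYLE_FAMILY, ""), e.get(STYLE_NAME, "")),
    )
    # Point every text:style-name reference at the new names.
    for elem in root.iter():
        ref = elem.get(TEXT_STYLE_NAME)
        if ref is not None and ref in renames:
            elem.set(TEXT_STYLE_NAME, renames[ref])
    return renames
```

Two files that differ only in the numbering and order of their automatic text styles should come out identical after this pass. Whether re-ordering the other children of <office:automatic-styles> is always safe is exactly the open question above.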

But there are other changes:

- <office:meta> changes. It's not a problem, I don't care about this.

- <office:settings> changes. I don't know, yet, whether I mind or not.

- The <draw:frame draw:z-index="251"> attribute changes. Can I just
replace the z-index with 1 or 2? What will happen?

- The <text:list xml:id="list533178598"> changes. That xml:id does not
seem to be used anywhere. Can I just remove it? What will I lose?

- Measurements change. E.g. (just to pick one case), in
<style:graphic-properties> the draw:visible-area-width changes from
6.088cm to 6.089cm. Is there a remedy to avoid changes of this kind?

Any insights are welcome!

Thanks,
-- Hannes
_______________________________________________
LibreOffice mailing list
[hidden email]
http://lists.freedesktop.org/mailman/listinfo/libreoffice

Michael Meeks-2

Re: Difficulties with Flat XML under source control

Hi Johannes,

On Sun, 2012-06-17 at 22:10 +0200, Johannes Sixt wrote:
> I want to place a software manual under source control. It seems most
> feasible to use a flat XML format, in particular, .fodt.

        Yes - that's a good plan :-)

> But I have some difficulties because when LO 3.5.4 opens a .fodt and
> saves it again without making any changes, the resulting file changes
> nevertheless.

        Right - this is a regular annoyance ! :-)

> I'm writing a small tool that transforms the XML into a canonical format
> so that only substantial changes remain. The question is: Which
> transformations are allowed?

        Oh - so ... why write an external tool to do this, and not just fix it
in LibreOffice ! ? :-)

        We'd be -very- interested in some patches that we can apply that will
sort the automatic styles, and generate them with consistent naming in a
sensible order :-)

> (This seems to work so far.)

        The style rendering sounds sensible.

> But there are other changes:
>
> - <office:meta> changes. It's not a problem, I don't care about this.

        Some level of sorting here might help too.

> - <office:settings> changes. I don't know, yet, whether I mind or not.
>
> - The <draw:frame draw:z-index="251"> attribute changes. Can I just
> replace the z-index with 1 or 2? What will happen?

        Odd :-) perhaps when we have smaller changes we can chase these
oddnesses down better.

> - The <text:list xml:id="list533178598"> changes. That xml:id does not
> seem to be used anywhere. Can I just remove it? What will I lose?

        No idea; if it's unused just try removing it and see what happens.

> - Measurements change. E.g. (just to pick one case), in
> <style:graphic-properties> the draw:visible-area-width changes from
> 6.088cm to 6.089cm. Is there a remedy to avoid changes of this kind?

        Ah; nasty, some rounding problem / internal representation issue -
possibly again looking at the code we could do better here to make it
more predictable; possibly using more precision we could do better
(doubles instead of floats) ?

> Any insights are welcome!

        So - the best place to fix this stuff is inside LibreOffice itself :-)
then it is permanently fixed for everyone: you are not the only person
with this pain - soon we'll be using flat odf for our templates and will
suffer the same way :-)

        The code to poke at is in:

        xmloff/
and
        sw/source/filter/xml/

        It's not too hard to build libreoffice, check out:

        http://www.libreoffice.org/developers-2/

        Patches are very much more than welcome ! :-)

        Thanks !

                Michael.

--
[hidden email]  <><, Pseudo Engineer, itinerant idiot

Johannes Sixt

Re: Difficulties with Flat XML under source control

Michael,

thanks for your feedback!

Am 19.06.2012 10:48, schrieb Michael Meeks:
> On Sun, 2012-06-17 at 22:10 +0200, Johannes Sixt wrote:
>> I'm writing a small tool that transforms the XML into a canonical format
>> so that only substantial changes remain. The question is: Which
>> transformations are allowed?
>
> Oh - so ... why write an external tool to do this, and not just fix it
> in LibreOffice ! ? :-)

Because I'm using git, and then it's just a matter of a "simple" 'clean
filter'. :-)
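
For readers who have not used one: a git clean filter is a program that reads the working-tree file on stdin and writes the version to be stored on stdout. A minimal skeleton might look like the sketch below. The filter name "fodt", the script name and the empty STEPS list are hypothetical; only the `filter=` attribute and the `filter.<name>.clean` setting are git's own. Note that ElementTree drops XML comments by default, which may or may not matter for a given document.

```python
# Assumed setup (names are placeholders):
#   .gitattributes:   *.fodt filter=fodt
#   git config filter.fodt.clean "python3 fodt_clean.py"
import sys
import xml.etree.ElementTree as ET

# Keep readable prefixes on output; ElementTree would otherwise emit ns0, ns1, ...
PREFIXES = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "style": "urn:oasis:names:tc:opendocument:xmlns:style:1.0",
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
}
for prefix, uri in PREFIXES.items():
    ET.register_namespace(prefix, uri)

# Hypothetical list of canonicalization steps, each called with the root element.
STEPS = []


def clean(data: bytes) -> bytes:
    """Parse a flat ODF document, apply every step, and serialize it again."""
    if not data.strip():
        return data  # pass empty files through unchanged
    root = ET.fromstring(data)
    for step in STEPS:
        step(root)
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


def run_filter(stdin=None, stdout=None):
    """Entry point for git: read the file from stdin, write the result to stdout."""
    stdin = stdin or sys.stdin.buffer
    stdout = stdout or sys.stdout.buffer
    stdout.write(clean(stdin.read()))

# A script used as the filter would end with a call to run_filter().
```

Running the output through the filter a second time should not change it; otherwise git would keep reporting the file as modified.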

>> - <office:meta> changes. It's not a problem, I don't care about this.
>
> Some level of sorting here might help too.

Not only that. Most of the stuff is irrelevant (diverse counts, editing
duration, time of last edit). That should just be removed if the
document is placed under source control. Such stuff leads to merge
conflicts almost by definition.
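
A minimal sketch of such a removal step follows. The choice of elements is an assumption about what "diverse counts, editing duration, time of last edit" maps to in ODF: meta:document-statistic, meta:editing-cycles, meta:editing-duration and dc:date. The function name is made up.

```python
import xml.etree.ElementTree as ET

OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
META_NS = "urn:oasis:names:tc:opendocument:xmlns:meta:1.0"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Metadata that changes on practically every save (assumed selection).
VOLATILE_TAGS = {
    f"{{{META_NS}}}document-statistic",  # page, word and character counts
    f"{{{META_NS}}}editing-cycles",      # number of saves
    f"{{{META_NS}}}editing-duration",    # total editing time
    f"{{{DC_NS}}}date",                  # time of last modification
}


def strip_volatile_meta(root):
    """Remove volatile children of <office:meta>; return how many were removed."""
    meta = root.find(f"{{{OFFICE_NS}}}meta")
    if meta is None:
        return 0
    removed = 0
    for child in list(meta):  # copy, because the loop removes children
        if child.tag in VOLATILE_TAGS:
            meta.remove(child)
            removed += 1
    return removed
```

Stable entries such as dc:title are left alone.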

(And, BTW, being able to keep different modifications of the manual in
different branches and *merge* them again is the whole point of this
exercise.)

>> - <office:settings> changes. I don't know, yet, whether I mind or not.

I'll try removing this entire section and hope that LO does something
sensible.

>> - The <text:list xml:id="list533178598"> changes. That xml:id does not
>> seem to be used anywhere. Can I just remove it? What will I lose?
>
> No idea; if it's unused just try removing it and see what happens.

The ids are sometimes used in a text:continue-list attribute. Hence,
they can't be stripped out blindly.
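
One way to handle this, sketched under assumptions: drop the xml:id of every <text:list> that no text:continue-list refers to, and renumber the remaining ones in document order so they stop changing from save to save. The function name and the "listN" naming are made up, and the sketch ignores any other kind of reference to an xml:id (RDF metadata, for instance).

```python
import xml.etree.ElementTree as ET

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
TEXT_LIST = f"{{{TEXT_NS}}}list"
CONTINUE_LIST = f"{{{TEXT_NS}}}continue-list"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def canonicalize_list_ids(root):
    """Strip unreferenced list ids, renumber referenced ones; return the rename map."""
    lists = list(root.iter(TEXT_LIST))
    referenced = {lst.get(CONTINUE_LIST) for lst in lists} - {None}
    renames = {}
    for lst in lists:
        old_id = lst.get(XML_ID)
        if old_id is None:
            continue
        if old_id in referenced:
            # Referenced ids get stable names in document order.
            renames[old_id] = f"list{len(renames) + 1}"
            lst.set(XML_ID, renames[old_id])
        else:
            del lst.attrib[XML_ID]
    # Update the references to match the new ids.
    for lst in lists:
        target = lst.get(CONTINUE_LIST)
        if target in renames:
            lst.set(CONTINUE_LIST, renames[target])
    return renames
```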

>> - Measurements change. E.g. (just to pick one case), in
>> <style:graphic-properties> the draw:visible-area-width changes from
>> 6.088cm to 6.089cm. Is there a remedy to avoid changes of this kind?
>
> Ah; nasty, some rounding problem / internal representation issue -
> possibly again looking at the code we could do better here to make it
> more predictable; possibly using more precision we could do better
> (doubles instead of floats) ?

Probably. Looking at this again, these changes seem to happen only for
draw:visible-area-*. Hence, it may also be a matter of conversion
between screen dimensions (pixels?) and cm/mm/in/etc.
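
To illustrate how such a conversion could alter a value, here is a toy round trip through whole pixels. Everything in it is an assumption: the 96 units per inch, the rounding to an integer, and the three output decimals are picked for illustration and are not taken from the LibreOffice code.

```python
CM_PER_INCH = 2.54


def roundtrip_cm(length: str, units_per_inch: int = 96) -> str:
    """Convert an 'N.NNNcm' string to whole device units and back again."""
    cm = float(length[:-2])  # strip the trailing "cm"
    units = round(cm / CM_PER_INCH * units_per_inch)  # integer device units
    return f"{units * CM_PER_INCH / units_per_inch:.3f}cm"
```

With these assumptions, roundtrip_cm("6.088cm") yields "6.085cm", because 6.088cm is about 230.1 pixels and the fraction is lost. The real conversion evidently behaves differently, since the observed change was to 6.089cm.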

> So - the best place to fix this stuff is inside LibreOffice itself :-)
> then it is permanently fixed for everyone: you are not the only person
> with this pain - soon we'll be using flat odf for our templates and will
> suffer the same way :-)
>
> The code to poke at is in:
>
> xmloff/
> and
> sw/source/filter/xml/

Been there, done that. But it's way over my head (and time budget). See

http://thread.gmane.org/gmane.comp.documentfoundation.libreoffice.devel/23528/focus=23543

-- Hannes
Miklos Vajna-2

Re: Difficulties with Flat XML under source control

On Tue, Jun 19, 2012 at 07:56:08PM +0200, Johannes Sixt <[hidden email]> wrote:
> > The code to poke at is in:
> >
> > xmloff/
> > and
> > sw/source/filter/xml/
>
> Been there, done that. But it's way over my head (and time budget). See
>
> http://thread.gmane.org/gmane.comp.documentfoundation.libreoffice.devel/23528/focus=23543

Still, once you have such a "clean" script, it would be nice to see what
tricks it does, so we could (step by step) fix LO itself; in the long
term you would then not need such a filter. ;-)
Thorsten Behrens

Re: Difficulties with Flat XML under source control

In reply to this post by Johannes Sixt
Johannes Sixt wrote:

> >> - Measurements change. E.g. (just to pick one case), in
> >> <style:graphic-properties> the draw:visible-area-width changes from
> >> 6.088cm to 6.089cm. Is there a remedy to avoid changes of this kind?
> >
> > Ah; nasty, some rounding problem / internal representation issue -
> > possibly again looking at the code we could do better here to make it
> > more predictable; possibly using more precision we could do better
> > (doubles instead of floats) ?
>
> Probably. Looking at this again, these changes seem to happen only for
> draw:visible-area-*. Hence, it may also be a matter of conversion
> between screen dimensions (pixels?) and cm/mm/in/etc.
>
Hrm, yeah - and we *really* don't want this slow drift - any chance
you can file a bug with a preferably small sample doc?

Thanks,

-- Thorsten

Dennis E. Hamilton

RE: Difficulties with Flat XML under source control

I think it is necessary to look at round-trip out-in conversion preservation.

For out-in (which this is, presumably), you want to record a decimal expression of the internal value that will convert back to the exact internal value on re-input.  (The in-out case is that the input conversion provide whatever internal representation that will convert to the read value on re-output.  Without additional information, it is generally very difficult to have these be the same.)

It is also desirable, of course, that any other ODF consumer use the same technique so that its in-out conversion satisfies the out-in condition of the original source of the decimal expression of the value.  

There are old technical papers on how to have this work.  The name David Matula comes to mind.

There might be solutions in the conversions that exist in the basic Java classes for float data types.  I think this was addressed in Common Lisp also.  
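
As a concrete instance of the out-in property: Python's repr() of a float produces the shortest decimal string that converts back to exactly the same double. The sketch below contrasts it with a fixed 15-digit format. The helper names are made up, and none of this says anything about what LibreOffice does internally.

```python
def shortest_roundtrip(value: float) -> str:
    """Shortest decimal string that parses back to exactly `value`."""
    return repr(value)


def fixed_precision(value: float, digits: int = 15) -> str:
    """Decimal string with a fixed number of significant digits."""
    return f"{value:.{digits}g}"


x = 0.1 + 0.2  # a double that is not exactly 0.3

# Out-in holds for the shortest form ...
assert float(shortest_roundtrip(x)) == x
# ... but not for 15 significant digits, which prints "0.3".
assert float(fixed_precision(x)) != x
```

The in-out direction is the harder one: a decimal string read from a file only survives a load and save if the writer uses the same rule that produced it.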


Thorsten Behrens

Re: Difficulties with Flat XML under source control

Dennis E. Hamilton wrote:

> For out-in (which this is, presumably), you want to record a
> decimal expression of the internal value that will convert back to
> the exact internal value on re-input.  (The in-out case is that
> the input conversion provide whatever internal representation that
> will convert to the read value on re-output.  Without additional
> information, it is generally very difficult to have these be the
> same.)
>
> It is also desirable, of course, that any other ODF consumer use
> the same technique so that its in-out conversion satisfies the
> out-in condition of the original source of the decimal expression
> of the value.  
>
Hi Dennis,

yes - but as a first approximation, one can probably relax this a
bit (for the use case at hand): this only needs to hold _after_ the
first save operation. Also, most people would probably be content if
this works for *one* ODF editing application.

> It is also desirable, of course, that any other ODF consumer use
> the same technique so that its in-out conversion satisfies the
> out-in condition of the original source of the decimal expression
> of the value.  
>
Note that there's a difference between spreadsheet values (for which
I think de facto the above holds true - likely everyone stores those
in IEEE doubles), and other content: consumers might employ rather
complex transformations to arrive at internal values, given e.g. a
gradient center coordinate - asking for common behaviour is very
close to asking for a common ODF application model.
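
The relaxed requirement (stable from the first save onwards) can be stated as a fixed-point check. In the sketch below, toy_save is a hypothetical stand-in for one load-and-save cycle that snaps a length onto a 0.001cm grid; the grid size is an assumption.

```python
from decimal import Decimal, ROUND_HALF_EVEN

GRID = Decimal("0.001")  # assumed resolution, in cm


def toy_save(length_cm: str) -> str:
    """Hypothetical load+save cycle: snap an 'N.NNNcm' string onto GRID."""
    number = Decimal(length_cm[:-2])  # strip the trailing "cm"
    return f"{number.quantize(GRID, rounding=ROUND_HALF_EVEN)}cm"


def stable_after_first_save(save, value: str) -> bool:
    """True if a second save reproduces the output of the first one."""
    first = save(value)
    return save(first) == first
```

Here toy_save("6.08849cm") gives "6.088cm" and stays there, so the check passes; a save function that adds 0.001cm every time would fail it.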

Cheers,

-- Thorsten

Dennis E. Hamilton

RE: Difficulties with Flat XML under source control

It occurs to me that PostScript and PDF have dealt with this for imaging models that work consistently.  Here, the "in" is to a renderer, but the model for representation of decimal expressions of fine-sensitivity values seems to have been handled (for years).  Those specifications may be some help too.

 - Dennis


Stephan Bergmann-2

Re: Difficulties with Flat XML under source control

In reply to this post by Dennis E. Hamilton
On 06/20/2012 03:07 PM, Dennis E. Hamilton wrote:
> I think it is necessary to look at round-trip out-in conversion preservation.
>
> For out-in (which this is, presumably), you want to record a decimal expression of the internal value that will convert back to the exact internal value on re-input.  (The in-out case is that the input conversion provide whatever internal representation that will convert to the read value on re-output.  Without additional information, it is generally very difficult to have these be the same.)
>
> It is also desirable, of course, that any other ODF consumer use the same technique so that its in-out conversion satisfies the out-in condition of the original source of the decimal expression of the value.
>
> There are old technical papers on how to have this work.  The name David Matula comes to mind.
>
> There might be solutions in the conversions that exist in the basic Java classes for float data types.  I think this was addressed in Common Lisp also.

Hasn't there been progress in that field recently?  Wait, yes,
<http://dl.acm.org/citation.cfm?id=1806623> "Printing floating-point
numbers quickly and accurately with integers" by Florian Loitsch.

Stephan
Thorsten Behrens

Re: Difficulties with Flat XML under source control

Stephan Bergmann wrote:
> Hasn't there been progress in that field recently?  Wait, yes,
> <http://dl.acm.org/citation.cfm?id=1806623> "Printing floating-point
> numbers quickly and accurately with integers" by Florian Loitsch.
>
Nice catch - and some code is here: http://code.google.com/p/double-conversion/

Cheers,

-- Thorsten

Michael Stahl-2

Re: Difficulties with Flat XML under source control

In reply to this post by Stephan Bergmann-2
On 21/06/12 14:07, Stephan Bergmann wrote:

> On 06/20/2012 03:07 PM, Dennis E. Hamilton wrote:
>> I think it is necessary to look at round-trip out-in conversion preservation.
>>
>> For out-in (which this is, presumably), you want to record a decimal expression of the internal value that will convert back to the exact internal value on re-input.  (The in-out case is that the input conversion provide whatever internal representation that will convert to the read value on re-output.  Without additional information, it is generally very difficult to have these be the same.)
>>
>> It is also desirable, of course, that any other ODF consumer use the same technique so that its in-out conversion satisfies the out-in condition of the original source of the decimal expression of the value.
>>
>> There are old technical papers on how to have this work.  The name David Matula comes to mind.
>>
>> There might be solutions in the conversions that exist in the basic Java classes for float data types.  I think this was addressed in Common Lisp also.
>
> Hasn't there been progress in that field recently?  Wait, yes,
> <http://dl.acm.org/citation.cfm?id=1806623> "Printing floating-point
> numbers quickly and accurately with integers" by Florian Loitsch.

i am in awe that it's possible to get a paper on this topic published in
this day and age; one would think this kind of problem would have been
solved 30 years ago, and the developers of popular office suites were
just ignorant of the solutions :)


Michael Stahl-2

Re: Difficulties with Flat XML under source control

In reply to this post by Johannes Sixt
On 17/06/12 22:10, Johannes Sixt wrote:
> - The <text:list xml:id="list533178598"> changes. That xml:id does not
> seem to be used anywhere. Can I just remove it? What will I lose?

These are sadly auto-generated, which is a bug in itself. They are used
in ODF itself for continuations, i.e. there can be another list that
continues an existing list by referring to its text:id/xml:id. There is
another use in ODF 1.2, where RDF metadata can refer to the element by
its xml:id, but that only works if the xml:id is actually persistent,
i.e. the same value that is imported is then exported again. Making the
ids persistent requires extending the Writer core, which is a bit of
work...

Johannes Sixt

Re: Difficulties with Flat XML under source control

In reply to this post by Thorsten Behrens
Am 20.06.2012 14:48, schrieb Thorsten Behrens:

> Johannes Sixt wrote:
>>>> - Measurements change. E.g. (just to pick one case), in
>>>> <style:graphic-properties> the draw:visible-area-width changes from
>>>> 6.088cm to 6.089cm. Is there a remedy to avoid changes of this kind?
>>>
>>> Ah; nasty, some rounding problem / internal representation issue -
>>> possibly again looking at the code we could do better here to make it
>>> more predictable; possibly using more precision we could do better
>>> (doubles instead of floats) ?
>>
>> Probably. Looking at this again, these changes seem to happen only for
>> draw:visible-area-*. Hence, it may also be a matter of conversion
>> between screen dimensions (pixels?) and cm/mm/in/etc.
>>
> Hrm, yeah - and we *really* don't want this slow drift - any chance
> you can file a bug with a preferably small sample doc?

Here we go:

https://bugs.freedesktop.org/show_bug.cgi?id=51334

draw:visible-area-width and -height are properties that pertain only to
OLE objects, IIUC.

-- Hannes