|
Lionel Elie Mamane |
|
|
I'd appreciate some help on fdo#51239 which is of the form "doing this
is horribly slow"; how can I profile LibO, i.e. find out where it spends all its time; if it spends a lot of time in function FOO, can I also get information on what the call stack looks like "most of the time"? Either someone teaches me how to do it (keep in mind the "time spent" might be in C++ or Java code), or someone just does it and reports findings in the bug? Thanks in advance, -- Lionel _______________________________________________ LibreOffice mailing list [hidden email] http://lists.freedesktop.org/mailman/listinfo/libreoffice |
|
Lubos Lunak |
|
|
On Wednesday 11 of July 2012, Lionel Elie Mamane wrote:
> I'd appreciate some help on fdo#51239 which is of the form "doing this > is horribly slow"; how can I profile LibO, i.e. find out where it > spends all its time; if it spends a lot of time in function FOO, can I > also get information on what the call stack looks like "most of the > time"? Have a look at Callgrind (http://valgrind.org/docs/manual/cl-manual.html). I find it rather straightforward to use (in short: valgrind --tool=callgrind <app>; kcachegrind callgrind.out.<number> and keep clicking on functions in the 'Callees' tab), especially with the manual explaining things, but if you run into problems or advanced topics, feel free to ask. > Either someone teaches me how to do it (keep in mind the "time spent" > might be in C++ or Java code), or someone just does it and reports > findings in the bug? I have no idea how Callgrind handles Java. -- Lubos Lunak [hidden email] _______________________________________________ LibreOffice mailing list [hidden email] http://lists.freedesktop.org/mailman/listinfo/libreoffice |
|
Noel Grandin |
|
|
In reply to this post by Lionel Elie Mamane
try googling for "profiling C++" or "profiling on <insert your operating
system here>" On 2012-07-11 11:48, Lionel Elie Mamane wrote: > I'd appreciate some help on fdo#51239 which is of the form "doing this > is horribly slow"; how can I profile LibO, i.e. find out where it > spends all its time; if it spends a lot of time in function FOO, can I > also get information on what the call stack looks like "most of the > time"? > > Either someone teaches me how to do it (keep in mind the "time spent" > might be in C++ or Java code), or someone just does it and reports > findings in the bug? > > > Thanks in advance, > Disclaimer: http://www.peralex.com/disclaimer.html _______________________________________________ LibreOffice mailing list [hidden email] http://lists.freedesktop.org/mailman/listinfo/libreoffice |
|
Lionel Elie Mamane |
|
|
In reply to this post by Lubos Lunak
On Wed, Jul 11, 2012 at 12:10:09PM +0200, Lubos Lunak wrote:
> On Wednesday 11 of July 2012, Lionel Elie Mamane wrote: >> I'd appreciate some help on fdo#51239 which is of the form "doing this >> is horribly slow"; how can I profile LibO, i.e. find out where it >> spends all its time; if it spends a lot of time in function FOO, can I >> also get information on what the call stack looks like "most of the >> time"? > Have a look at Callgrind (http://valgrind.org/docs/manual/cl-manual.html). Thanks for the pointer; it did not handle Java code in a meaningful way and it did not follow costs across threads, but it pointed me to the C++ code calling the long-running Java code, which was started in a fresh thread, but I successfully "guessed" what created this fresh thread, and so I got started. So the issue is resolved in 3.6 & master; 3.5 will take more effort to backport. -- Lionel _______________________________________________ LibreOffice mailing list [hidden email] http://lists.freedesktop.org/mailman/listinfo/libreoffice |
|
Stephan Bergmann-2 |
|
|
On 07/18/2012 10:06 AM, Lionel Elie Mamane wrote:
> The issue *originally* reported in fdo#51239, which has been bumped to > fdo#52170: in LibreOffice 3.5.2, much slower than "before", on > GNU/Linux, but NOT on Windows. Unknown on Mac. Not entirely clear what > "before" it is. *This* issue *could* be JVM-thread-attach/detach > related, or some other "interaction between Java and C++" problem. My > "hunch" in this direction is because it does not happen on MS Windows. > So if you could look into it, even if only to rule it out, that would > be useful. Looked into it on x64 Linux master, against both an 1.7.0 openjdk (as included in Fedora 17) and a jre-6u33-linux-x64 downloaded from Oracle. In either case, it does take significantly to open the Sales Invoices table (from <https://bugs.freedesktop.org/attachment.cgi?id=63257>), and then it takes ages to jump to the end. However, I could not find anything obviously suspicious at the JVM level. I locally reverted <http://cgit.freedesktop.org/libreoffice/core/commit/?id=bb59742bcf4883af5876a2ffadcc4a689e414b60> "Confine JDBC driver to thread-affine apartment for Java 6 performance" and then everything became even more slow, so it apparently is not the case that the effect of that fix has meanwhile regressed again. I noticed that there are truckloads of calls to m_pVm->AttachCurrentThread from jvmaccess::VirtualMachine::attachThread (jvmaccess/source/virtualmachine.cxx). Such "outermost" calls into JNI (i.e., not recursively from a thread that is already in a JNI call, where jvmaccess::VirtualMachine::attachThread would be cheap) are always expensive (and the changes to the JVM that we tried to address with bb59742bcf4883af5876a2ffadcc4a689e414b60 only made them extra-extra expensive, when they already had been expensive). The LO database code is notorious for making truckloads of such expensive outermost JNI calls, but I do not know if this has increased substantially recently and so alone could explain any recent performance regressions. Stephan _______________________________________________ LibreOffice mailing list [hidden email] http://lists.freedesktop.org/mailman/listinfo/libreoffice |
|
Lionel Elie Mamane |
|
|
On Thu, Jul 19, 2012 at 02:12:30PM +0200, Stephan Bergmann wrote:
> On 07/18/2012 10:06 AM, Lionel Elie Mamane wrote: >> The issue *originally* reported in fdo#51239, which has been bumped to >> fdo#52170: in LibreOffice 3.5.2, much slower than "before", on >> GNU/Linux, but NOT on Windows. Unknown on Mac. Not entirely clear what >> "before" it is. *This* issue *could* be JVM-thread-attach/detach >> related, or some other "interaction between Java and C++" problem. My >> "hunch" in this direction is because it does not happen on MS Windows. >> So if you could look into it, even if only to rule it out, that would >> be useful. > Looked into it on x64 Linux master (...) Thanks. > it does take significantly to open the Sales Invoices table (from > <https://bugs.freedesktop.org/attachment.cgi?id=63257>), and then it > takes ages to jump to the end. However, I could not find anything > obviously suspicious at the JVM level. > I noticed that there are truckloads of calls to > m_pVm->AttachCurrentThread from > jvmaccess::VirtualMachine::attachThread > (jvmaccess/source/virtualmachine.cxx). Such "outermost" calls into > JNI (i.e., not recursively from a thread that is already in a JNI > call, where jvmaccess::VirtualMachine::attachThread would be cheap) > are always expensive (...). The LO database code is notorious for > making truckloads of such expensive outermost JNI calls, (...) Try to give me an idea what would cause such calls. Is it Java code calling C(++) code (presumably through UNO) or C(++) code calling Java code (again presumably through UNO)? For example, to minimise the number of Java calls from C++ , maybe we could, when we retrieve a whole row through JDBC, doing this from Java code instead of from C++ code: for (i=0; i < row->getNumberOfColumns; ++i) cacheEntry[i]=row->getColumnValue(i); This would bring us down from one call per column to one call per row, hopefully without duplicating *too* *much* logic between a C++ version and a Java version. Or maybe if we know that we are about to make a truckload of Java calls from C++, wrap it in a Java trampoline that just calls the C++ function that will make many Java calls? So that we are in the "recursive" case? Hmm... that could be hackish, since it would involve generic code looking at whether it is connecting to the database engine through JDBC (Java) or some other way. A breakpoint at jvmaccess::VirtualMachine::attachThread in gdb does not seem to be immediately useful; the backtrace does not show me where that "comes from". Is there a way to see that? -- Lionel _______________________________________________ LibreOffice mailing list [hidden email] http://lists.freedesktop.org/mailman/listinfo/libreoffice |
|
Stephan Bergmann-2 |
|
|
On 07/19/2012 02:38 PM, Lionel Elie Mamane wrote:
> On Thu, Jul 19, 2012 at 02:12:30PM +0200, Stephan Bergmann wrote: >> On 07/18/2012 10:06 AM, Lionel Elie Mamane wrote: >>> The issue *originally* reported in fdo#51239, which has been bumped to >>> fdo#52170: in LibreOffice 3.5.2, much slower than "before", on >>> GNU/Linux, but NOT on Windows. Unknown on Mac. Not entirely clear what >>> "before" it is. *This* issue *could* be JVM-thread-attach/detach >>> related, or some other "interaction between Java and C++" problem. My >>> "hunch" in this direction is because it does not happen on MS Windows. >>> So if you could look into it, even if only to rule it out, that would >>> be useful. > >> I noticed that there are truckloads of calls to >> m_pVm->AttachCurrentThread from >> jvmaccess::VirtualMachine::attachThread >> (jvmaccess/source/virtualmachine.cxx). Such "outermost" calls into >> JNI (i.e., not recursively from a thread that is already in a JNI >> call, where jvmaccess::VirtualMachine::attachThread would be cheap) >> are always expensive (...). The LO database code is notorious for >> making truckloads of such expensive outermost JNI calls, (...) > > Try to give me an idea what would cause such calls. Is it Java code > calling C(++) code (presumably through UNO) or C(++) code calling Java > code (again presumably through UNO)? C++ code in connectivity and/or dbacess calling Java, not through UNO but directly talking JNI (though going through UNO would not actually make a difference to the performance implications). Grepping for jvmaccess::VirtualMachine::AttachGuard in connectivity and dbaccess should show the places where this is done. > For example, to minimise the number of Java calls from C++ , maybe we > could, when we retrieve a whole row through JDBC, doing this from Java > code instead of from C++ code: > > for (i=0; i < row->getNumberOfColumns; ++i) > cacheEntry[i]=row->getColumnValue(i); > > This would bring us down from one call per column to one call per row, > hopefully without duplicating *too* *much* logic between a C++ version > and a Java version. Yes, this could be helpful. (When I discussed this with Ocke years ago, he always claimed there it was not possible to redesign the connectivity/dbaccess code to reduce the number of such "outermost" JNI calls, though.) > A breakpoint at jvmaccess::VirtualMachine::attachThread in gdb does > not seem to be immediately useful; the backtrace does not show me > where that "comes from". Is there a way to see that? That might be due to <http://cgit.freedesktop.org/libreoffice/core/commit/?id=bb59742bcf4883af5876a2ffadcc4a689e414b60> "Confine JDBC driver to thread-affine apartment for Java 6 performance" offloading part of the code off the main thread into a dedicated thread. You can try reverting that locally (at the expense of even slower performance). Stephan _______________________________________________ LibreOffice mailing list [hidden email] http://lists.freedesktop.org/mailman/listinfo/libreoffice |
|
Lionel Elie Mamane |
|
|
On Thu, Jul 19, 2012 at 02:53:35PM +0200, Stephan Bergmann wrote:
> On 07/19/2012 02:38 PM, Lionel Elie Mamane wrote: >> On Thu, Jul 19, 2012 at 02:12:30PM +0200, Stephan Bergmann wrote: >>> On 07/18/2012 10:06 AM, Lionel Elie Mamane wrote: >>> I noticed that there are truckloads of calls to >>> m_pVm->AttachCurrentThread from >>> jvmaccess::VirtualMachine::attachThread >>> (jvmaccess/source/virtualmachine.cxx). Such "outermost" calls into >>> JNI (i.e., not recursively from a thread that is already in a JNI >>> call, where jvmaccess::VirtualMachine::attachThread would be cheap) >>> are always expensive (...). The LO database code is notorious for >>> making truckloads of such expensive outermost JNI calls, (...) >> Try to give me an idea what would cause such calls. Is it Java code >> calling C(++) code (presumably through UNO) or C(++) code calling Java >> code (again presumably through UNO)? > Grepping for jvmaccess::VirtualMachine::AttachGuard in connectivity > and dbaccess should show the places where this is done. It is wrapped in a SDBThreadAttach class, and this grep yields the expected places. Good, I see now. >> For example, to minimise the number of Java calls from C++ , maybe we >> could, when we retrieve a whole row through JDBC, doing this from Java >> code instead of from C++ code: >> for (i=0; i < row->getNumberOfColumns; ++i) >> cacheEntry[i]=row->getColumnValue(i); >> This would bring us down from one call per column to one call per row, >> hopefully without duplicating *too* *much* logic between a C++ version >> and a Java version. > Yes, this could be helpful. (When I discussed this with Ocke years > ago, he always claimed there it was not possible to redesign the > connectivity/dbaccess code to reduce the number of such "outermost" > JNI calls, though.) Well, I haven't thought of a clean way to do it yet; several approaches exist. I'll have to sleep on it. Listing them here mostly not to forget them. One idea is to extend our SDBC API (which is basically a C++ version of JDBC, which is an extension of ODBC, which is an extension of X/Open SQL CLI, ...); it currently has only "retrieve one column value"; we'd add an *optional* "get a whole row". If the optional extension is not implemented, the code would fallback to a C++ implementation of "get a whole row" that makes multiple calls to "retrieve one column value". Replace source/drivers/jdbc/JStatement.cxx (and friends), which *are* called through UNO, by a full-Java implementation; ah no, that wouldn't help, since each UNO call would pass the expensive C++/Java barrier :-| Change source/drivers/jdbc/JStatement.cxx (and friends) to retrieve the whole row (through an ad-hoc Java function) and cache it? Which would make us a *third* level of caching, but well... Maybe it can cooperate with KeySet so that KeySet does not cache again in this case. Hmmm... the caching nature of KeySet was introduced because of ODBC limitations; maybe we could push that to drivers (through a helper wrapper class they can use to provide this): - the SDBC-ODBC layer would use the helper class "as is"; since it is to blame for the limitations that introduce the need for caching... Only it would suffer from the consequences. - the SDBC-JDBC layer would have a special implementation that fetches the whole row in one ad-hoc Java function call. I'm fond of my trampoline idea, but 't would have to be done in a clean way; not sure yet how. -- Lionel _______________________________________________ LibreOffice mailing list [hidden email] http://lists.freedesktop.org/mailman/listinfo/libreoffice |
|
Michael Meeks-2 |
|
|
On Thu, 2012-07-19 at 15:47 +0200, Lionel Elie Mamane wrote: > I'm fond of my trampoline idea, but 't would have to be done in a > clean way; not sure yet how. Heh ;-) I'm still holding my breath for the end-of-problem Firebird solution myself ... I wonder if there are some sillies that we could tighten up there though - eg. there could be random ref/unref thrash in the Java / bridge traffic that might be easy to reduce. In theory there is this beautiful built-in UNO tracing logic that I've never used - that can show us all the calls that happen between various pieces - Stephan: could that get us a trace of the back-to-back cross-language thrash ? Presumably just looking at that would give us some immediate inspiration :-) ATB, Michael. -- [hidden email] <><, Pseudo Engineer, itinerant idiot _______________________________________________ LibreOffice mailing list [hidden email] http://lists.freedesktop.org/mailman/listinfo/libreoffice |
|
Stephan Bergmann-2 |
|
|
On 07/19/2012 06:03 PM, Michael Meeks wrote:
> On Thu, 2012-07-19 at 15:47 +0200, Lionel Elie Mamane wrote: >> I'm fond of my trampoline idea, but 't would have to be done in a >> clean way; not sure yet how. > > Heh ;-) I'm still holding my breath for the end-of-problem Firebird > solution myself ... > > I wonder if there are some sillies that we could tighten up there > though - eg. there could be random ref/unref thrash in the Java / bridge > traffic that might be easy to reduce. In theory there is this beautiful > built-in UNO tracing logic that I've never used - that can show us all > the calls that happen between various pieces - Stephan: could that get > us a trace of the back-to-back cross-language thrash ? This is about direct use of JNI, not via UNO. Stephan _______________________________________________ LibreOffice mailing list [hidden email] http://lists.freedesktop.org/mailman/listinfo/libreoffice |
|
Lionel Elie Mamane |
|
|
In reply to this post by Michael Meeks-2
On Thu, Jul 19, 2012 at 05:03:32PM +0100, Michael Meeks wrote:
> On Thu, 2012-07-19 at 15:47 +0200, Lionel Elie Mamane wrote: >> I'm fond of my trampoline idea, but 't would have to be done in a >> clean way; not sure yet how. > Heh ;-) I'm still holding my breath for the end-of-problem Firebird > solution myself ... Even if we switch to Firebird for our embedded database, we still want to support connecting to any database through any JDBC driver, so this question of heavy C++ <-> Java cost will still be on the table. -- Lionel _______________________________________________ LibreOffice mailing list [hidden email] http://lists.freedesktop.org/mailman/listinfo/libreoffice |
|
Noel Grandin |
|
|
In reply to this post by Lionel Elie Mamane
On 2012-07-19 14:38, Lionel Elie Mamane wrote: > For example, to minimise the number of Java calls from C++ , maybe we > could, when we retrieve a whole row through JDBC, doing this from Java > code instead of from C++ code: > > for (i=0; i < row->getNumberOfColumns; ++i) > cacheEntry[i]=row->getColumnValue(i); > > This would bring us down from one call per column to one call per row, > hopefully without duplicating *too* *much* logic between a C++ version > and a Java version. > > For bonus points, retrieve a whole chunk of rows at a time, say around 1000. Disclaimer: http://www.peralex.com/disclaimer.html _______________________________________________ LibreOffice mailing list [hidden email] http://lists.freedesktop.org/mailman/listinfo/libreoffice |
|
Michael Meeks-2 |
|
|
In reply to this post by Stephan Bergmann-2
On Thu, 2012-07-19 at 18:09 +0200, Stephan Bergmann wrote: > This is about direct use of JNI, not via UNO. Ah - I wonder ... ;-) if it is the transition that is the problem; could we have a Java thread polling a loopback socket and sucking simple commands & parsing & executing them, then passing the data back in some other form ;-> It'd be hideous if that actually was quicker than a direct JNI call but ... ;-) it's possible. Particularly in the world of under-utilized multi-threaded CPUs it might not be -so- bad. Bit of a pain cross-platform, but might work for suitably chunky methods; I wonder - could we use the wonder binary-urp magic to prototype it [ though I guess we'd need a native Java implementation of that - hmm ]. Urgh ! Michael. -- [hidden email] <><, Pseudo Engineer, itinerant idiot _______________________________________________ LibreOffice mailing list [hidden email] http://lists.freedesktop.org/mailman/listinfo/libreoffice |
| Powered by Nabble | Edit this page |