Collecting User Statistics

Jaskaran Singh

Collecting User Statistics

Hi,

Currently we collect user stats only when someone downloads LO from our
website. These are of limited use, since a download reveals very little
information. Also, not every user is counted, because not everyone
downloads LO: some get it preinstalled with their OS, while others get a
copy from friends.

I believe it's important for us to know about our users as deeply as
possible so as to make informed choices. The information which we should
be looking for is:

1. Operating System, word size and kernel version
2. RAM and Cache amount
3. CPU and GPU specs
4. OpenCL driver
5. Display specs
6. Country
7. Default Language
8. <anything_else?>
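
A minimal sketch of how a client could gather part of this list, using
only the Python standard library. The function name and dictionary keys
are hypothetical, and RAM, cache, GPU, OpenCL and display details are left
out because the standard library does not expose them portably. Note that
the word size reported here is that of the running process, which can
differ from the OS word size.

```python
import locale
import os
import platform
import struct


def collect_system_info():
    """Gather the items from the list above that the standard library exposes."""
    try:
        language, _encoding = locale.getlocale()
    except ValueError:
        # Some platforms report locale names that Python cannot parse.
        language = None
    return {
        "os": platform.system(),                     # e.g. "Linux", "Windows", "Darwin"
        "kernel_version": platform.release(),
        "word_size_bits": struct.calcsize("P") * 8,  # pointer size of this process
        "cpu_arch": platform.machine(),
        "cpu_count": os.cpu_count(),
        "default_language": language,                # None if no locale is set
    }


if __name__ == "__main__":
    for key, value in collect_system_info().items():
        print(f"{key}: {value}")
```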

Now, obviously this is sensitive information and most users would
decline to share it. So we could introduce a way to share this data
anonymously: the client could send it through a proxy, or over Tor
(The Onion Router). But again, most users wouldn't want that.

So I've found another way of doing this. Have a look at RAPPOR [1]. It
adds random noise on the client, so that we can never be sure about the
data any single client sends us; the statistics we get are probabilistic.
For example, a system with an i3 processor rolls a die to decide whether
to report the truth. By default we could have an 80% (?) chance of
reporting the truth, so any individual report of an i3 processor is
truthful with probability 80% and noise with probability 20%. No single
report can be trusted, but aggregated over a large number of users the
noise averages out and we get a rough trend.
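
The idea can be sketched with plain randomized response. This is a
simplification and not the RAPPOR algorithm itself, which encodes values
in Bloom filters and applies two rounds of randomization. The category
list and function names below are hypothetical, and "not telling the
truth" is assumed to mean "report a uniformly random category". With k
categories and truth probability p, the observed share of a category is
p * true_share + (1 - p) / k, which the server can invert.

```python
import random
from collections import Counter

CATEGORIES = ["i3", "i5", "i7", "other"]  # hypothetical CPU buckets
P_TRUTH = 0.8                             # the 80% figure suggested above


def randomized_report(true_value, rng):
    """Report the true value with probability P_TRUTH, else a uniformly random category."""
    if rng.random() < P_TRUTH:
        return true_value
    return rng.choice(CATEGORIES)


def estimate_shares(reports):
    """Invert the noise: observed = P_TRUTH * true + (1 - P_TRUTH) / k."""
    n = len(reports)
    k = len(CATEGORIES)
    counts = Counter(reports)
    return {
        c: (counts[c] / n - (1 - P_TRUTH) / k) / P_TRUTH
        for c in CATEGORIES
    }


if __name__ == "__main__":
    rng = random.Random(42)
    true_shares = {"i3": 0.5, "i5": 0.3, "i7": 0.15, "other": 0.05}
    population = rng.choices(
        list(true_shares), weights=list(true_shares.values()), k=100_000
    )
    reports = [randomized_report(v, rng) for v in population]
    for cpu, share in estimate_shares(reports).items():
        print(f"{cpu}: estimated {share:.3f}, true {true_shares[cpu]:.3f}")
```

With few reports the estimates are noisy and can even fall slightly below
zero or above one; they only become useful for a large number of users.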

We could also publish this data on our website in the form of numbers,
graphs, and other representations.

It would work this way: whenever someone installs or upgrades LO
and starts it for the first time, a dialog box appears asking for
permission to share some data, while also explaining how this would not
compromise their privacy.

I'd like to know your views on this. And I'd like to implement this if
none of you want to. I may apply for this as a project in GSoC. So
please inform me if you can be a mentor for this project.

[1] https://www.chromium.org/developers/design-documents/rappor

Regards,
Jaskaran Veer Singh


_______________________________________________
LibreOffice mailing list
[hidden email]
https://lists.freedesktop.org/mailman/listinfo/libreoffice

Tor Lillqvist-2

Re: Collecting User Statistics

What would be the usefulness of such information? We don't want to let what machine and OS users run at some (future) point in time dictate what we do in even more future LibreOffice versions.

For example, if we put such code into 5.4, and in spring 2018 it said that 5% of users, let's say a million, still run Windows XP or have CPUs without SSE2, would those users then effectively hold us to ransom, forcing us to keep supporting XP and CPUs without SSE2 forever? I don't think we want that. This is a developer-driven project and we don't like to keep historical baggage forever.

Commercial suppliers of LibreOffice support are another thing, of course. If a paying customer insists on XP support (or some other silly thing), the company does that.

Just my opinion, of course.

---tml


Jaskaran Singh

Re: Collecting User Statistics

Hi Tor,

On Thursday 09 March 2017 03:29 PM, Tor Lillqvist wrote:
> What would be the usefulness of such information? We don't want to let
> what machine and OS users run at some (future) point in time dictate
> what we do in even more future LibreOffice versions.

Of course, it should not dictate what we do. But having some information
is better than having none. Until now we have only been estimating the
numbers. Suppose the share of XP users comes out to be 20%
(hypothetical, but possible, since XP is still quite popular in the third
world, where free software is also popular); then I guess it would be more
sensible to continue supporting it. But that's the organization's
decision. My point is that having data would help us make informed
decisions, not that the data would dictate which decisions we make.

Actually, my original intent was to get data that would help our
marketing department figure out where we are lacking and where new
opportunities await. The same goes for the development team.
Over time, it could also serve as an indicator of LO's popularity among
different categories of users (such as Mac users or users of
high-end/low-end systems).


Regards,
Jaskaran Veer Singh
IRC Nick: jvsg

Heiko Tietze-2

Re: Collecting User Statistics

In reply to this post by Jaskaran Singh
Hi Jaskaran,

Most of this information is submitted when you access a website, unless you actively use countermeasures. Therefore we already know the percentage of users who stick to XP, what their average screen resolution is, and where they come from.

Much more interesting is how LibreOffice is being used. That starts with sessions, e.g. how often people open a module and how long it stays open on average. And we are very interested in the actual interaction, for example how often "Clone Formatting" is used. Such data was gathered during Project Renaissance [1], and it was the basis for the recent menu and toolbar changes [2]. The data clearly needs an update.

In addition to the overall statistics, it would be interesting to go into more detail. For example, we could base decisions on which function is most often followed by an undo. If we go ahead with this approach, data storage becomes an interesting problem: consider a large number of users submitting all their interactions...
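
As a rough illustration of the "followed by an undo" metric, here is a
minimal sketch. It assumes each session is recorded as an ordered list of
command names; the labels "Undo", "Bold", "Paste" and "CloneFormatting"
are hypothetical and not actual LibreOffice command identifiers.

```python
from collections import Counter


def undo_rates(session, undo="Undo"):
    """Return, per command, the fraction of its uses immediately followed by undo."""
    used = Counter()
    undone = Counter()
    # Pair every command with its successor; the last command has none.
    for current, following in zip(session, session[1:] + [None]):
        if current == undo:
            continue
        used[current] += 1
        if following == undo:
            undone[current] += 1
    return {cmd: undone[cmd] / used[cmd] for cmd in used}


if __name__ == "__main__":
    log = ["Bold", "Undo", "CloneFormatting", "Bold", "Paste", "Undo"]
    for command, rate in sorted(undo_rates(log).items()):
        print(f"{command}: {rate:.0%} of uses undone")
```

Only these per-command counts would need to be stored or submitted, which
is far less data than the full interaction log.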

TDF issued a tender in 2015 [3], but there wasn't much interest in doing the work. It's still a topic, though. And AFAIK we have something implemented: some interactions (or just a summary?) are stored when a command-line switch is set, but I don't remember exactly.

I would really appreciate it if you worked on this topic.

Cheers,
Heiko

[1] https://wiki.openoffice.org/wiki/Renaissance
[2] https://design.blog.documentfoundation.org/2016/01/22/way-down-in-the-libreoffice-menus/
[3] https://blog.documentfoundation.org/blog/2015/02/24/tender-to-develop-and-incorporate-usability-metrics-collection-for-libreoffice-201502-02/

--
Dr. Heiko Tietze
UX Designer
Tel. +49 (0)179/1268509


Markus Mohrhard

Re: Collecting User Statistics

In reply to this post by Jaskaran Singh
Hey Jaskaran,

So basically this requires an opt-in scheme instead of the opt-out that you have in mind. Users are very sensitive when it comes to collecting information that is perceived as personal. Based on that, I think the value might not be as big as you hope. Currently the plan is to collect info about the number of active users as part of the automatic update, but not much more. Like Tor, I'm not sure I see the value in having a huge collection of statistics that we are not planning to use. Besides the obvious privacy problem, the bigger your data set, the more work you need to invest in processing the data.

Based on that, it would help if you provided some cases where having such detailed statistics would help us improve LibreOffice.

Regards,
Markus

Jaskaran Singh

Re: Collecting User Statistics

Hi moggi,

Originally, my vision was to use the collected data to help our
marketing team. They would get to know which categories of people they
should target, and they could do all kinds of data mining to extract
what's useful for them.

l10n team could also use this data to see where LO is gaining popularity
and where they should focus their efforts on. The dev team could also
make use of it. But I can't think of any at the moment.

And yes, users can be very cautious about the data they share with us.
Maybe a polite dialog box explaining to them how their privacy isn't
compromised would work. But quite a few of them would still opt out.

Regards,
Jaskaran

Dennis Roczek

Re: Collecting User Statistics

Hi Jaskaran,

On 14.03.2017 16:17, Jaskaran Singh wrote:
> l10n team could also use this data to see where LO is gaining popularity
> and where they should focus their efforts on. The dev team could also
> make use of it. But I can't think of any at the moment.
If you speak, say, two languages (English plus your native language), then
you can only translate into one language, and you won't start learning
(e.g.) Mongolian.

Or did I miss something?

Dennis