Issue 38942 - project home page data stats for 2004
Summary: project home page data stats for 2004
Status: CLOSED IRREPRODUCIBLE
Alias: None
Product: stats
Classification: Infrastructure
Component: www
Version: current
Hardware: All All
Importance: P3 Trivial
Target Milestone: ---
Assignee: issues@stats
QA Contact: issues@stats
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2004-12-13 20:16 UTC by diane
Modified: 2005-01-21 16:49 UTC

See Also:
Issue Type: TASK
Latest Confirmation in: ---
Developer Difficulty: ---


Attachments
data only for project home page hits (893.85 KB, text/plain)
2004-12-13 20:21 UTC, diane
cleaned up text file; shows only data (848.86 KB, text/plain)
2004-12-14 18:40 UTC, diane
data in OOo1.9m62, with defined data range and auto-filter in place. (341.74 KB, application/octet-stream)
2004-12-14 19:11 UTC, diane
sorted and hits sub-totaled by project by month; data expandable by project. (529.85 KB, application/octet-stream)
2004-12-15 00:23 UTC, diane
altered csv file ported into OOoBase file (4.88 KB, application/vnd.oasis.opendocument.base)
2005-01-21 15:11 UTC, diane
updated csv file (thanx for the feedback! sorry for the inconvenience.) (848.88 KB, text/plain)
2005-01-21 16:39 UTC, diane

Description diane 2004-12-13 20:16:35 UTC
I'm opening this issue to attach a text version of the project home page hits
provided in the HTML file linked in the following thread on the dev@stats list:

http://stats.openoffice.org/servlets/ReadMsg?list=dev&msgNo=55
Comment 1 diane 2004-12-13 20:21:07 UTC
Created attachment 20492
data only for project home page hits
Comment 2 diane 2004-12-14 18:25:36 UTC
That txt file has some data that was not cleaned up properly.
Comment 3 diane 2004-12-14 18:40:39 UTC
Created attachment 20515
cleaned up text file; shows only data
Comment 4 diane 2004-12-14 18:45:37 UTC
I have pumped the data into the snapshot version of Calc (40165 rows), defined
the data range (all), and enabled an auto-filter, which should help when pulling
data by project from it. Now I am hunting for the MIME type: can I use a Calc
type, or must I use binary for this new OOo file format?
Comment 5 diane 2004-12-14 19:11:29 UTC
Created attachment 20516
data in OOo1.9m62, with defined data range and auto-filter in place.
Comment 6 diane 2004-12-15 00:23:44 UTC
Created attachment 20524
sorted and hits sub-totaled by project by month; data expandable by project.
Comment 7 diane 2004-12-15 00:48:09 UTC
Crunching the data was not too terrible a task; the computer did most of the
work. I wonder if a script could be written to automate parts of the task,
since the CollabNet data will be somewhat uniform. The process I used was as
follows:

1. Opened the HTML file in a text editor.
2. Deleted all the peripheral HTML code above and below the data.
3. Used Find and Replace to Replace All of the repetitive and unnecessary
strings, either with commas or with nothing, to trim the data down into
comma-delimited lines of data.
4. Saved the file as .csv, then opened it in Calc.
5. Defined a data range to allow quick selection of all cells with data.
6. Added a column called Month, and used the MONTH function to assign a month to
each line of data. I copied the MONTH formula from cell B1 and pasted it down the
column: I defined another data range for column A, selected that column A data
range, then used the Control and Enter keys to deactivate the title cell and
cell B1, whose formula I was pasting into the other cells.
7. Sorted the data by project, then by date.
8. Applied sub-totals: First group by Project with sub-total for Hits; second
group by Month with sub-total for Hits; third group by Date with no sub-total.
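Steps 1 to 4 could be scripted roughly as follows. This is a minimal Python sketch, not the process used above; it assumes (hypothetically) that each data line in the CollabNet HTML report is a table row of `<td>` cells, which has not been checked against the real report, so the extraction rule may need adjusting. The class and function names are illustrative only.

```python
import csv
import io
from html.parser import HTMLParser


class TableRowExtractor(HTMLParser):
    """Collect the text of <td> cells, one list per <tr> row.

    Everything outside <td> elements (page header, footer, <th> titles)
    is ignored, which stands in for deleting the peripheral HTML by hand.
    Assumes the (hypothetical) report keeps its data in a plain HTML table.
    """

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the <tr> currently open, if any
        self._cell = None   # text fragments of the <td> currently open, if any

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # rows with only <th> cells come out empty; skip them
                self.rows.append(self._row)
            self._row = None
            self._cell = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def html_table_to_csv(html_text):
    """Return comma-delimited text, one line per table data row."""
    parser = TableRowExtractor()
    parser.feed(html_text)
    parser.close()
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()
```

The returned text could then be saved with a .csv extension and opened in Calc as in step 4. Using a real HTML parser instead of a chain of Replace All operations means markup inside a cell (for example a link around a project name) is dropped without a separate find-and-replace pass.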

I'm guessing this data would be easier to handle in a database. Opening the CSV
file in Calc and saving it greatly reduced its file size. Adding the sub-totals
made saves take a long time, but (snapshot) Calc seemed pretty stable throughout
the whole process.
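As a rough illustration of the database route, steps 6 to 8 (month column, sort, sub-totals) reduce to one grouped query. This Python/SQLite sketch assumes, hypothetically, that each CSV line is `project,date,hits` with ISO dates (YYYY-MM-DD) and no title row; the table and column names are invented for the example.

```python
import csv
import io
import sqlite3


def load_hits(csv_text):
    """Load hypothetical 'project,date,hits' lines into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE hits (project TEXT, day TEXT, hits INTEGER)")
    rows = []
    for record in csv.reader(io.StringIO(csv_text)):
        if not record:  # skip blank lines
            continue
        project, day, hits = record
        rows.append((project, day, int(hits)))
    conn.executemany("INSERT INTO hits VALUES (?, ?, ?)", rows)
    return conn


def monthly_subtotals(conn, project=None):
    """Sum hits per project per month, ordered by project then month.

    Pass a project name to restrict the result to that one project.
    """
    sql = ("SELECT project, substr(day, 1, 7) AS month, SUM(hits) "
           "FROM hits WHERE ? IS NULL OR project = ? "
           "GROUP BY project, substr(day, 1, 7) "
           "ORDER BY project, month")
    return conn.execute(sql, (project, project)).fetchall()
```

Grouping on `substr(day, 1, 7)` (for example `2004-11`) keeps the year together with the month, so the same month in different years would not be merged, as it would be if only Calc's MONTH value (1 to 12) were used as the group key.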
Comment 8 diane 2004-12-15 00:51:06 UTC
In step 6 of the process, I should have said the A1 cell, not the B1 cell, in
both instances.
Comment 9 diane 2005-01-21 15:11:02 UTC
Created attachment 21754
altered csv file ported into OOoBase file
Comment 10 diane 2005-01-21 15:36:31 UTC
I altered the csv file by adding a title row and by replacing all commas with
semicolons, so that I could open it in OOoBase. Once the data was in OOoBase, I
created three queries; we could set up a query for each project. I used the
Query Wizard, and it seemed pretty straightforward.
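The title-row and comma-to-semicolon change could also be scripted. This is a minimal Python sketch assuming the same hypothetical `project,date,hits` layout; the title-row names below are placeholders, since the actual title row is not shown in this issue.

```python
import csv
import io

# Placeholder titles (hypothetical); the real title row is not shown in the issue.
TITLE_ROW = ["Project", "Date", "Hits"]


def to_semicolon_csv(comma_text, title_row=TITLE_ROW):
    """Prepend a title row and re-delimit comma-separated text with semicolons."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter=";", lineterminator="\n")
    writer.writerow(title_row)
    for record in csv.reader(io.StringIO(comma_text)):
        if record:  # skip blank lines
            writer.writerow(record)
    return out.getvalue()
```

Going through the `csv` module rather than a blind Replace All means a quoted field that contains a comma stays a single field, and any field that happens to contain a semicolon is quoted in the output.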

The data is not normalized properly, I am sure. (Wicked newbie in operation
here.) :) Amazingly (to me), the file size shrank to 4.9 KB! There are still two
sets of data per project for most dates. I asked earlier why, hoping to better
define the second entries, but no response came back. I don't yet know how we
would add new data to this file if more were provided. Maybe I'll come back to
play further as I learn more, or maybe others would prefer to pick up where I
have stopped.
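To look more closely at the dates that carry two sets of data per project, a sketch like the following lists every (project, date) pair that appears more than once, together with its hit counts. Again this is only an illustration that assumes hypothetical `project,date,hits` lines with no title row.

```python
import csv
import io
from collections import defaultdict


def repeated_entries(csv_text):
    """Return {(project, date): [hits, ...]} for pairs that occur more than once."""
    seen = defaultdict(list)
    for record in csv.reader(io.StringIO(csv_text)):
        if not record:  # skip blank lines
            continue
        project, day, hits = record
        seen[(project, day)].append(int(hits))
    # Keep only the (project, date) keys that have a second (or later) entry.
    return {key: counts for key, counts in seen.items() if len(counts) > 1}
```

Putting the two counts for each pair side by side may help show whether the second entries are a separate measure or simply duplicates.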

For now, I've done what I planned to do with this data. For me it was mostly
play; I prefer to play with real data, and this was a fun playground. I am
closing this issue now due to an apparent lack of interest. The data is here if
anyone needs or wants it, and the issue can be reopened if there is a need.
Comment 11 diane 2005-01-21 15:36:59 UTC
closed.
Comment 12 Joost Andrae 2005-01-21 16:24:54 UTC
JA->Dianne: remark: It would be very useful if you attached the csv data that you
have stored in /home/diane/Files/OOoStats, because the odb file format doesn't
hold all the data in a self-contained form. I would have to rebuild the structure
on my system to open such a file.

Unfortunately, I cannot access the data from that file...
Comment 13 diane 2005-01-21 16:39:25 UTC
Created attachment 21756
update csv file (thanx for the feedback! sorry for the inconvenience.)
Comment 14 diane 2005-01-21 16:44:16 UTC
Hi Joost,

I attached the csv file. Is that what you meant? Or do I need to somehow attach
(or embed) it into the database file? I will do more if it is needed. I'm just
painfully inexperienced ATM, which maybe is the whole point of my work in the
first place. :)
Comment 15 diane 2005-01-21 16:49:58 UTC
the database file was created in OOo1.9m69.