Apache OpenOffice (AOO) Bugzilla – Issue 38942
project home page data stats for 2004
Last modified: 2005-01-21 16:49:58 UTC
I'm opening this issue to attach a text version of the project home page hits provided in the html file linked in the following thread at the dev@stats list: http://stats.openoffice.org/servlets/ReadMsg?list=dev&msgNo=55
Created attachment 20492 [details] data only for project home page hits
That txt file has some data that was not cleaned up properly.
Created attachment 20515 [details] cleaned up text file; shows only data
I have pumped the data into the snapshot version of Calc (40165 rows), defined the data range (all), and enabled an auto-filter, which should help when pulling data out by project. Now I am hunting for the MIME type... can I use calc, or must I use binary for this new OOo file format?
Created attachment 20516 [details] data in OOo1.9m62, with defined data range and auto-filter in place.
Created attachment 20524 [details] sorted, with hits sub-totaled by project by month; data expandable by project.
Crunching the data was not too terrible a task. The computer did most of the work. I wonder if a script might be created to automate parts of the task, because the Collabnet data will be somewhat uniform. The process I used was as follows:
1. Opened the HTML file in a text editor.
2. Deleted all the peripheral HTML code above and below the data.
3. Used Find and Replace to Replace All of the repetitive and unnecessary strings, either with commas or with nothing, to trim the data down into comma-delimited lines of data.
4. Saved the file as .csv, then opened it in Calc.
5. Defined a data range to allow quick selection of all cells with data.
6. Added a column called Month, and used the MONTH function to assign a month to each line of data. Copied the MONTH function from cell B1 and pasted it down the column by defining another data range for column A, selecting the column A data range, then using the control and enter keys to deactivate the title cell and the B1 cell, whose formula I was pasting into the other cells.
7. Sorted the data by project, then by date.
8. Applied sub-totals: first group by Project with sub-total for Hits; second group by Month with sub-total for Hits; third group by Date with no sub-total.
I'm guessing this data would be easier to handle in a database. Opening the CSV file in Calc and saving it greatly reduced its file size. Adding the sub-totals made saves take a long time, but (snapshot) Calc seemed pretty stable throughout the whole process.
In step 6 of the process, I should have said the A1 cell, not the B1 cell, in both instances.
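On the question of scripting it: steps 6 to 8 above could be sketched in Python roughly as follows. This is only a sketch under assumptions, not something that was run against the attached files. It assumes each cleaned-up line has exactly three comma-separated fields in the order date, project, hits, and that dates look like 2004-01-05; the function name `subtotal_hits` and the sample rows are made up for illustration.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime


def subtotal_hits(csv_text, date_format="%Y-%m-%d"):
    """Sum hits per (project, month) from lines of date,project,hits.

    The column order and the date format are assumptions; adjust both
    to match the real attachment.
    """
    totals = defaultdict(int)
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) != 3:
            continue  # skip blank or malformed lines
        date_str, project, hits = (cell.strip() for cell in row)
        # Same idea as Calc's MONTH function: take the month from the date.
        month = datetime.strptime(date_str, date_format).month
        totals[(project, month)] += int(hits)
    return dict(totals)


# Made-up sample rows, not taken from the real data.
sample = """2004-01-05,www,120
2004-01-06,www,80
2004-02-01,www,50
2004-01-05,stats,7
"""

result = subtotal_hits(sample)
# Sorting the keys gives the "by project, then by month" order of step 7.
for (project, month), hits in sorted(result.items()):
    print(project, month, hits)
```

If a project has two entries for the same date, both are simply added into the monthly total, which is what the Calc sub-totals would do as well.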
Created attachment 21754 [details] altered csv file ported into OOoBase file
I altered the csv file by adding a title row, and by replacing all commas with semicolons, so that I could open it in OOoBase. Once the data was in OOoBase, I created three queries. We could set up a query for each project. I used the query wizard, and it seemed pretty straightforward. The data is not normalized properly, I am sure. (Wicked newbie in operation here.) :) Amazingly (to me), the file size shrank to 4.9K! There are still two sets of data per project for most dates. I asked earlier why, hoping to better define the second entries, but no response came back. I am not yet aware of how we would add new data to this file, if more were provided. Maybe I'll come back to play further as I learn more, or maybe others would prefer to pick up where I have stopped. For now, I've done what I planned to do with this data. For me it was mostly play; I prefer to play with real data and this was a fun playground. I am closing this issue now, due to a seeming lack of interest. The data is here if anyone needs or wants it, and the issue can be reopened if there is need.
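For anyone who wants to try the database route without OOoBase, here is a rough equivalent using Python's built-in sqlite3 module. It is a swapped-in technique, not what the attached file does. The title-row names Date, Project and Hits, the table name `hits`, the helper names, and the sample rows are all assumptions made up for this sketch.

```python
import csv
import io
import sqlite3


def load_hits(csv_text):
    """Load semicolon-delimited text with a title row into an in-memory table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE hits (day TEXT, project TEXT, hits INTEGER)")
    # The header names Date/Project/Hits are assumed, not taken from the file.
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=";")
    conn.executemany(
        "INSERT INTO hits (day, project, hits) VALUES (?, ?, ?)",
        ((r["Date"], r["Project"], int(r["Hits"])) for r in reader),
    )
    return conn


def monthly_totals(conn, project):
    """Return (year-month, total hits) pairs for one project."""
    return conn.execute(
        "SELECT substr(day, 1, 7), SUM(hits) FROM hits "
        "WHERE project = ? GROUP BY substr(day, 1, 7) ORDER BY 1",
        (project,),
    ).fetchall()


# Made-up sample rows, not taken from the real data.
sample = """Date;Project;Hits
2004-01-05;www;120
2004-01-05;www;80
2004-02-01;www;50
2004-01-05;stats;7
"""

conn = load_hits(sample)
totals = monthly_totals(conn, "www")
print(totals)
```

One parameterized query like this covers every project, so there is no need to define a separate query per project. New data could be added later with further INSERT statements against the same table.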
closed.
JA->Dianne: remark: It would be very useful if you attached the csv data which you have stored within /home/diane/Files/OOoStats, because the odb file format doesn't hold all of the data in a self-contained form. I have to rebuild the structure on my system if I want to open such a file. Unfortunately I cannot access that data from that file...
Created attachment 21756 [details] update csv file (thanx for the feedback! sorry for the inconvenience.)
Hi Joost, I attached the csv file. Is that what you meant? Or do I need to somehow attach (or embed) it into the database file? I will do more if it is needed. I'm just painfully inexperienced ATM, which maybe is the whole point of my work in the first place. :)
the database file was created in OOo1.9m69.