This page in German: Wikipedia:WikiProjekt Vorlagenauswertung

This project (Codename: Templatetiger) extracts all templates from the database dump. It intends to analyse the variable values contained in the templates and represent them in new ways, apply filters and do other useful stuff.

Start page: https://iw.toolforge.org/templatetiger

Objectives

The project aims to revive the now dormant Wikidata project and to prepare Wikipedia itself for Semantic MediaWiki.

On the one hand, this project intends to facilitate maintenance work (on categories and templates) and projects on Wikipedia; on the other hand, it intends to offer new search capabilities to interested users of Wikipedia. Although such data was previously extracted within the projects Geographical coordinates and Persondata, no similar effort had been made for smaller subjects until now.

Another objective is to demonstrate that template extraction via the parser (real-time data) is sensible and feasible with regard to performance. Moreover, not every template necessarily needs a table of its own.

The project layout is intended to support all the major Wikipedia languages from the start.


The tool is operated by modifying the URL, so its usability is very limited. To mitigate these limitations, we will answer questions about the project on the talk page. As the amount of data is very large, we ask interested parties for a little patience until the results of a query are displayed. Once a query has completed, however, its results can be retrieved very quickly.

Template selection

To select from the available templates, open one of the following pages:

Table display

It is possible to filter a template's table by one variable and its value, using the URL parameters "where" (the variable name) and "is" (the desired variable value). For example:

displays all people born in London.

displays all mountains which were first climbed during the 19th century.

The query looks for a substring by using the SQL operator LIKE %...%. Two wildcards are available: "%" stands for any number of characters, and "_" stands for exactly one arbitrary character. The following query displays all mountains for which a last eruption is recorded in the template (volcanoes): https://iw.toolforge.org/templatetiger/tt-table4.php?lang=de&template=Infobox%20Berg&where=LETZTE%20ERUPTION&is=_&offset=0&limit=30
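A filter URL of this kind can also be assembled programmatically. The following is a minimal sketch: the helper name `build_query_url` is hypothetical, while the base address and the parameter names are taken from the example URL above.

```python
from urllib.parse import quote, urlencode

# Base address taken from the example URL in the text.
BASE_URL = "https://iw.toolforge.org/templatetiger/tt-table4.php"


def build_query_url(lang, template, where, value, offset=0, limit=30):
    """Assemble a table URL (hypothetical helper, not part of the tool)."""
    params = {
        "lang": lang,
        "template": template,
        "where": where,
        "is": value,
        "offset": offset,
        "limit": limit,
    }
    # quote_via=quote encodes spaces as %20, as in the example URL.
    return BASE_URL + "?" + urlencode(params, quote_via=quote)


url = build_query_url("de", "Infobox Berg", "LETZTE ERUPTION", "_")
print(url)
```

Because the values are percent-encoded, a "%" wildcard passed as `value` is written as `%25` in the resulting URL, which is the standard URL encoding of that character.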

Regular expressions

https://iw.toolforge.org/templatetiger/tt-table4.php?lang=de&template=Infobox%20Berg&where=H%he&is=%5b8%5d%5b0-9%5d%5b0-9%5d%5b0-9%5d&offset=0&limit=30&regexp=yes

Displays all 8000 m mountains by using a regular expression (please copy the address into the browser's address bar).
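To see what the encoded "is" value in the URL above stands for, it can be decoded and tried out locally. This sketch uses Python's `re` module and assumes that the tool matches the pattern anywhere inside the field value, as an unanchored SQL regular expression would; the sample values are hypothetical.

```python
import re
from urllib.parse import unquote

# The "is" value from the example URL, still percent-encoded.
encoded = "%5b8%5d%5b0-9%5d%5b0-9%5d%5b0-9%5d"
pattern = unquote(encoded)  # an 8 followed by three digits

# Hypothetical field values, for illustration only.
samples = ["8848", "8611 m", "7999", "4478"]

# re.search finds the pattern anywhere in the string (assumed behaviour).
matches = [s for s in samples if re.search(pattern, s)]
print(pattern)
print(matches)
```

Under that assumption a value such as "18000" would match as well, since the pattern is not anchored to the start of the field.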

Negating the query

Appending &not=yes to a query (&where=...&is=...&not=yes) shows only the items that do not match the query. Items must still contain an entry for the variable to be listed.
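The negation rule can be illustrated with a small in-memory SQLite table. This is only one possible reading of the behaviour described above, not the tool's actual query: the table `demo`, its rows, and the helper `filter_names` are hypothetical, and "must contain an entry" is interpreted here as "the variable exists and its value is not empty".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (name TEXT, entry_name TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO demo VALUES (?, ?, ?)",
    [
        ("Person A", "BIRTHPLACE", "London, England"),
        ("Person B", "BIRTHPLACE", "Paris"),
        ("Person C", "BIRTHPLACE", ""),         # variable present but empty
        ("Person D", "OCCUPATION", "Painter"),  # no BIRTHPLACE variable
    ],
)


def filter_names(where, value, negate=False):
    """Return article names whose variable does (or does not) match."""
    operator = "NOT LIKE" if negate else "LIKE"
    sql = (
        "SELECT name FROM demo "
        f"WHERE entry_name = ? AND value <> '' AND value {operator} ? "
        "ORDER BY name"
    )
    # The search term is wrapped in % wildcards for a substring match.
    return [row[0] for row in conn.execute(sql, (where, f"%{value}%"))]


print(filter_names("BIRTHPLACE", "London"))
print(filter_names("BIRTHPLACE", "London", negate=True))
```

In this sketch the negated query returns only "Person B": "Person C" and "Person D" are left out because they have no usable entry for the variable.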

Sorting the result

  • &order=article sorts the articles alphabetically by article name (Example)
  • &order=columnname sorts the articles alphabetically by a selectable column. Only articles with an entry in that column are shown. (Example: games sorted by producer)

Sorting works only without a filter.

Selection of Columns

With the parameter &columns=column1,column2,... only the listed columns are displayed, which can make the result faster to load and easier to read. Example: https://iw.toolforge.org/templatetiger/tt-table4.php?template=Infobox%20See&lang=de&where=&is=&columns=LAGE,MAX-TIEFE

Change of line count

By changing the limit variable in the URL it is possible to display more than the default number of articles, which is 30. For security reasons the maximum is currently limited to 2000.
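For larger result sets, the limit can be combined with the offset parameter that appears in the example URLs. The sketch below generates one URL per page and caps the limit at the stated maximum. It assumes that offset is the number of rows to skip, which the source does not document; the helper `page_urls` and the row count are hypothetical.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://iw.toolforge.org/templatetiger/tt-table4.php"
MAX_LIMIT = 2000  # maximum stated in the text


def page_urls(lang, template, total_rows, limit=30):
    """Return one URL per result page (hypothetical helper)."""
    limit = max(1, min(limit, MAX_LIMIT))  # cap at the allowed maximum
    urls = []
    for offset in range(0, total_rows, limit):
        params = {
            "lang": lang,
            "template": template,
            "where": "",  # empty filter, as in the column example
            "is": "",
            "offset": offset,  # assumed to mean "rows to skip"
            "limit": limit,
        }
        urls.append(BASE_URL + "?" + urlencode(params, quote_via=quote))
    return urls


urls = page_urls("de", "Infobox See", total_rows=4500, limit=5000)
for url in urls:
    print(url)
```

A requested limit of 5000 is reduced to 2000 here, so 4500 rows are spread over three pages.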

Re-use of data in spreadsheets

By copying the tables into a spreadsheet it is possible to hide rows, sort the content or change data fields.

OpenOffice Calc

Besides copy and paste, Calc can load a result page directly from its URL via Insert/Link to External Data...

MS Excel

Excel can import a result page through Extras/Web query.

Disadvantages of this procedure

All data field entries are uniformly treated as text. This limits, for example, the ability to sort numbers. It is difficult to apply more than one filter criterion, and it also appears to be difficult to find articles which lack certain field entries.
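The sorting problem is easy to reproduce. The following sketch compares a plain text sort with a sort on the leading number of each value; the sample values and the helper `leading_number` are hypothetical.

```python
import re

# Hypothetical field values as they might arrive from a copied table.
heights = ["8848", "999", "4478 m", "unknown", "12"]


def leading_number(text):
    """Return the leading integer of a text value, or None."""
    match = re.match(r"\s*(\d+)", text)
    return int(match.group(1)) if match else None


# Text sort: compares character by character, so "8848" precedes "999".
as_text = sorted(heights)

# Numeric sort: values without a leading number are dropped first.
as_numbers = sorted(
    (h for h in heights if leading_number(h) is not None),
    key=leading_number,
)

print(as_text)
print(as_numbers)
```

Converting the values before sorting restores the expected order, at the price of deciding what to do with entries such as "unknown".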

Database

Creation

The data in the database is collected during the extraction of the geographical data (WikiProject Geographical coordinates). The Perl script used there was extended so that it also reads all text between double curly braces ({{…}}). Templates without variables are ignored; from all other templates, the variable names and values are extracted.

The current method reads only templates that are not nested inside another template. If, for example, Template:Coordinate is used in the variable "POSITION=" of Template:Infobox, only the variable value "Template:Coordinate" can be read. Comments inside templates are also left out, because they would have made interpreting the data much more difficult.
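The extraction rules described above (top-level templates only, comments removed, templates without variables skipped) can be sketched as follows. This is a simplified Python illustration, not the project's Perl script: all function names are hypothetical, and special cases such as triple braces or nowiki sections are not handled.

```python
import re


def split_top_level(body):
    """Split on '|' that is not inside nested {{...}} or [[...]]."""
    parts, current, depth, i = [], [], 0, 0
    while i < len(body):
        pair = body[i:i + 2]
        if pair in ("{{", "[["):
            depth += 1
            current.append(pair)
            i += 2
        elif pair in ("}}", "]]") and depth > 0:
            depth -= 1
            current.append(pair)
            i += 2
        elif body[i] == "|" and depth == 0:
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(body[i])
            i += 1
    parts.append("".join(current))
    return parts


def parse_args(args):
    """Map variable names to values; unnamed ones are numbered 1, 2, ..."""
    values, position = {}, 0
    for arg in args:
        key, sep, value = arg.partition("=")
        if sep and "{{" not in key and "[[" not in key:
            values[key.strip()] = value.strip()
        else:
            position += 1
            values[str(position)] = arg.strip()
    return values


def extract_templates(wikitext):
    """Yield (template_name, variables) for each top-level template."""
    text = re.sub(r"<!--.*?-->", "", wikitext, flags=re.DOTALL)
    depth, start, i = 0, None, 0
    while i < len(text) - 1:
        pair = text[i:i + 2]
        if pair == "{{":
            if depth == 0:
                start = i + 2
            depth += 1
            i += 2
        elif pair == "}}" and depth > 0:
            depth -= 1
            if depth == 0:
                name, *args = split_top_level(text[start:i])
                if args:  # templates without variables are ignored
                    yield name.strip(), parse_args(args)
            i += 2
        else:
            i += 1


sample = """{{Infobox
| NAME = Example <!-- a comment -->
| POSITION = {{Coordinate|NS=47.4|EW=10.9}}
}}
{{Stub}}"""

result = list(extract_templates(sample))
print(result)
```

In this sketch the nested template is kept as the raw text of the POSITION value and is not parsed further, which mirrors the limitation described above.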

Database layout

Each language version gets its own table. For the German language, this is "pub_tt1_de". Every variable in every template of every article is stored as one record, so for the German Wikipedia there are currently 1.9 million records. A record contains:

  • name: Name of the article
  • name_id: ID number of the article
  • tp_name: Name of the template
  • entry_name: Name of the template variable (1,2,3,… or name1,name2…)
  • value: Value of the template variable
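Because each record holds a single variable, a table display has to regroup the records into one row per article. A minimal sketch of that regrouping, using the five fields listed above with hypothetical sample records and a hypothetical helper `pivot`:

```python
from collections import defaultdict

# Hypothetical records: (name, name_id, tp_name, entry_name, value)
records = [
    ("Mountain A", 101, "Infobox Berg", "HEIGHT", "2962"),
    ("Mountain A", 101, "Infobox Berg", "RANGE", "Range X"),
    ("Mountain B", 102, "Infobox Berg", "HEIGHT", "2713"),
    ("Lake C", 103, "Infobox See", "DEPTH", "190"),
]


def pivot(records, template):
    """Group per-variable records into one row (dict) per article."""
    rows = defaultdict(dict)
    for name, name_id, tp_name, entry_name, value in records:
        if tp_name == template:
            rows[(name_id, name)][entry_name] = value
    return dict(rows)


table = pivot(records, "Infobox Berg")

# The column set is the union of all variable names that occur.
columns = sorted({col for row in table.values() for col in row})

for (name_id, name), row in table.items():
    print(name, [row.get(col, "") for col in columns])
```

Articles that lack a variable simply get an empty cell in that column.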

How to access

If you have a Toolserver account, you can access the u_kolossos_p database there. This way, you can write your own application.

Data download

There are plans to make the data download possible later.

Project participants

We are looking for people to help optimize the data analysis further and to promote the project in the other language versions.

Project coordinators

We are keen to answer any of your questions.