Thursday, April 20, 2017

#Wikidata user stories - Suggesting Henry Putnam, a great #Librarian

As software suggests what articles to write, it is relevant to understand what logic it is based on. A phenomenon like the "six degrees of separation", made popular around Kevin Bacon, has its scientific counterpart in graph theory: "betweenness centrality". This is used as a basis in the research that determines what articles are important and what automated suggestions to make.
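To make the notion of betweenness a bit more tangible, here is a minimal sketch in Python. It is not the code used in the research; it is Brandes' well-known algorithm applied to a toy graph. The graph itself is an assumption for illustration only: people and the statements they share are nodes, "person A", "person B" and "person C" are made-up placeholders, and an edge means "this person has this statement".

```python
from collections import deque


def build_graph(edges):
    """Turn a list of (a, b) pairs into an undirected adjacency mapping."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def betweenness(graph):
    """Betweenness centrality (Brandes) for an unweighted, undirected graph."""
    score = {v: 0.0 for v in graph}
    for source in graph:
        # Breadth-first search that counts shortest paths from the source.
        order = []
        preds = {v: [] for v in graph}
        paths = {v: 0 for v in graph}
        dist = {v: -1 for v in graph}
        paths[source] = 1
        dist[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    paths[w] += paths[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes and accumulate dependencies.
        delta = {v: 0.0 for v in graph}
        while order:
            w = order.pop()
            for v in preds[w]:
                delta[v] += paths[v] / paths[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    # In an undirected graph every pair has been counted twice.
    return {v: value / 2 for v, value in score.items()}


# Hypothetical toy data: the other people are placeholders, not real items.
edges = [
    ("Putnam", "ALA president"),
    ("Putnam", "Librarian of Congress"),
    ("Putnam", "Order of the Polar Star"),
    ("person A", "ALA president"),
    ("person B", "Librarian of Congress"),
    ("person C", "Order of the Polar Star"),
]
scores = betweenness(build_graph(edges))
most_central = max(scores, key=scores.get)
```

In this toy graph the person who shares statements with all the others ends up as the most central node. Every statement that is added for the other people creates more paths, and that is how added data changes what gets suggested.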

Mr Putnam is one of the more relevant librarians. He developed an eponymous classification system, continued its development as the Librarian of Congress (it is still in use), was twice president of the American Library Association and was a knight of the order of the Polar Star. When weight is applied to references to a person, all this is of relevance in the right setting.

When an article is to be written or improved, it helps when it can be suggested what it is that can be improved. By including statements in Wikidata, suggestions can be made in the local language. Facts like date of birth and death are also easy and obvious.

So when people consider a particular subject to be of universal relevance, it helps when associated subjects are well developed in Wikidata. For instance, when for all the presidents of the American Library Association facts like where they studied, where they worked and what awards they received are included, and when this is done for all the people who share categories, the betweenness of many influential librarians increases. This will have its influence on what is suggested for people to do.
Thanks,
       GerardM

Wednesday, April 19, 2017

#Wikidata user stories - the sum of all #knowledge


Map showing all places English Wikipedia covers


Map showing all places GeoNames covers

They say "a picture paints a thousand words". There is no argument; English Wikipedia covers only so much. With such a lack of coverage it is impossible to understand what is missing and its relevance particularly to people who do not read English.

LSJbot has created lots of articles for the places GeoNames knows about in several Wikipedias. As a consequence, much of the missing information enters Wikidata through the backdoor. There have been some rumblings among Wikidatans that the GeoNames data is not perfect. But hey, let's make "Be bold", a Wikipedia quality, a Wikidata quality as well.

For many Wikipedians, the notion of bot generated articles is anathema. For others, the fact that there is so much that we do not cover is just as problematic. The good news is that more information in Wikidata will enable us to predict what is lacking in content. We only need to acknowledge that Wikipedia is not the sum of all knowledge... yet.
Thanks,
      GerardM

#Wikidata user story - Suggestions to #Wikipedia editors

The #research done on "suggestions to Wikipedia editors" is exciting. There is a paper and a great presentation. The bottom line is that when you know what to suggest to people, when you make it personal, the result is what you would hope. Consider: 3.2 times the number of articles created, and two times more articles created than without personalised recommendations.

There is math involved, obviously, but the gist is that when suggestions are in line with previous activities, people will be triggered to do more. When you listen to the presentation, this first experiment asks people to translate from English. The assumption is that English covers more than most.

The slides of the presentation include visualisations showing the coverage of several Wikipedias. When you consider them, it becomes clear where the Wikimedia projects are challenged.

Leila Zia, the presenter, makes it clear: all this would not be possible without Wikidata. One thing where Wikidata is different from the assumptions of the research is that there is an increasing number of subjects that have no links to Wiki(m/p)edia articles at all. Many of these are connected to existing content as they share common statements, statements like "profession: soccer player" or "award received: whatever award".

When totally new subjects are to be considered, there is already plenty that might be suggested in Wikidata itself.
Thanks,
      GerardM

Monday, April 17, 2017

#Wikidata user story - #DBpedia, #death and #Federation

Federation between DBpedia and Wikidata became possible. As a consequence, the results of a query that runs on DBpedia can be linked to Wikidata.

Some time ago people at DBpedia created a wonderful query that shows differences between DBpedia and the Dutch and Greek Wikipedia. It received approval from the Dutch Wikipedia community.

With federation something much more interesting became possible: a federated query comparing Wikidata with one DBpedia at a time. When the query runs, current data from Wikidata and DBpedia is presented. When a Wikipedia associated with a DBpedia changes, that DBpedia may import the differences from an RSS feed; consequently, running the query again will show the latest differences.
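What such a comparison comes down to can be sketched in a few lines of Python. This is not the federated query itself, and it does not talk to any endpoint; it assumes that the results of both sides have already been fetched as simple mappings from an item to a date of death. The function name and the sample items ("item-1" and so on) are made up for illustration.

```python
def date_differences(wikidata, dbpedia):
    """Return {item: (wikidata_value, dbpedia_value)} where the sources disagree.

    A value of None means that the source has no date for that item.
    """
    differences = {}
    for item in sorted(wikidata.keys() | dbpedia.keys()):
        ours = wikidata.get(item)
        theirs = dbpedia.get(item)
        if ours != theirs:
            differences[item] = (ours, theirs)
    return differences


# Hypothetical sample data, not real query results.
wikidata_dates = {"item-1": "1998-02-05", "item-2": "2008-01-01"}
dbpedia_dates = {"item-1": "1998-02-05", "item-2": "1998-01-01", "item-3": "1955-08-14"}

todo = date_differences(wikidata_dates, dbpedia_dates)
```

Running it again after either side has been updated gives a shorter list, and that is exactly what makes working from the current differences so motivating.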

Updating information about one particular type of statement, like date of death, place of death or whatever, will always be based on the current differences. Experiencing the results in this way is truly motivating. Federation is an instrument that can help us improve the quality of either federated system.
Thanks,
      GerardM

#Wikidata user story - #Wikipedia #diversity and diversity #research

Diversity, especially the "gender gap", is one of the best researched subjects of Wikipedia. There are many projects that have it as their goal to diminish the gap they object to.

Wikidata has the best and most up to date information about any Wikipedia. People are updating Wikidata all the time; typically its information is based on a Wikipedia.

Take gender; many a Wikipedia has a category for this, so it is easy to update Wikidata based on what is in such categories. When researchers are interested in the articles where Wikidata does not have such information, those articles can be found, and it is appreciated when they update Wikidata as part of these activities. As a rule, the percentage of "humans" with no known gender is dropping anyway.

When a Wikipedia editor has an interest in female scientists who do not have an article in English, it is easy enough to have a query for that. Not all female scientists with or without a Wikipedia article can be found this way, but it is just a matter of adding them in Wikidata. When another editor is interested in female scientists with no article in German or Kannada, it is just one change in the same query.
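To show how small that one change is, here is a sketch in Python that only builds the text of a SPARQL query for the Wikidata query service. The function name and its parameters are my own assumptions; the identifiers are the usual ones (P31/Q5 for human, P21/Q6581072 for female, P106 for occupation, Q901 for scientist). It matches only people whose occupation is literally "scientist", so physicists, chemists and the like would need a more generous pattern.

```python
def missing_article_query(language_code, occupation="Q901", limit=100):
    """Build a SPARQL query for women with an occupation but no article in one Wikipedia."""
    site = f"https://{language_code}.wikipedia.org/"
    return f"""
SELECT ?person ?personLabel WHERE {{
  ?person wdt:P31 wd:Q5 ;
          wdt:P21 wd:Q6581072 ;
          wdt:P106 wd:{occupation} .
  # Keep only people without an article on the chosen Wikipedia.
  FILTER NOT EXISTS {{
    ?article schema:about ?person ;
             schema:isPartOf <{site}> .
  }}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en" . }}
}}
LIMIT {limit}
""".strip()


english = missing_article_query("en")
german = missing_article_query("de")
kannada = missing_article_query("kn")
```

The three queries differ only in the line that names the Wikipedia; the resulting text can be pasted into the query service as it is.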
Thanks,
        GerardM

#Wikidata user story - the #library

The OCLC is an organisation combining most of the libraries in the world. It used to connect to the English Wikipedia but as Wikidata connects all Wikipedias, the OCLC does a better job linking to Wikidata. Through Wikidata it can link to articles about authors in any language.

For many authors the connection between VIAF, the system used by the OCLC, and Wikidata is still missing. Many people are adding VIAF identifiers, and once a month the data is imported and all the new data pops up.

Best practice at English Wikipedia has it that an {{authority control}} template is added in the reference section of articles about people. When a VIAF identifier is added in Wikidata, not only a VIAF identifier but also Worldcat information is shown (the example is for William Keepers Maxwell Jr.). Doing this is possible for any Wikipedia.

Now to expand on this; when a reader opts in, we could show if a book of an author is available in the local library. What do you think?
Thanks,
       GerardM

Why #Wikidata? Because it is useful!

Wikidata was useful from the start. It provides a service to all Wikipedias and after the startup, it now provides the same service to Commons and Wikisource. It connects information about the same subject; these are the interwiki links.

The next phase was to connect these subjects. This is an internal Wikidata project and it is not really used. This data could be useful but it is not always up to date, and the requirements for the primary use cases are not realistic and almost impossible to fulfil. The challenge is to provide sourced information for every statement.

The challenge is: how do we provide a use for the Wikidata data? How do we get people to actually use Wikidata, have an interest in the data and maintain what is in their interest?

Software developers create "user stories" to explain what their software is to achieve. Why not write user stories that show how Wikidata can already be used and expand the stories on how to be even more useful and usable?
Thanks,
      GerardM

Sunday, April 16, 2017

#Wikipedia - The death of Lanier Meaders

Mr Meaders was a notable potter who died in February 1998 according to folkpottery.com. The English Wikipedia article however is in two minds about his death. Yes, he is dead, but when did he die?

According to the category he was one of the living dead for 10 years. In the text the year of his demise is correctly stated as 1998. By googling for a source another date was found.

As I am not an English Wikipedian, I do not know how to indicate sources in English Wikipedia. The date of death in Wikidata does have a reference. The question is how differences like the dates of death of Mr Meaders can be found, and how we improve the consistency in the information that we provide in all of our projects.
Thanks,
      GerardM

NB the information in Wikidata on Mr Meaders is not complete.

Thursday, April 13, 2017

#Wikidata - People die; implications for another #policy approach

People die, notable people die. It is natural and it happens all the time. Many a #Wikipedia has a category for the people who died in a specific year. Such categories are what makes a wonderful tool by Pasleim tick. It shows those Wikidata items that have no date of death while a Wikipedia knows about the demise of the person involved.

This is a wonderful tool; it allows Wikidata to take care of those who died and update its data. It leaves us with another option: adding one more tool, a tool that checks if the date of death exists in the Wikipedias that do not have such a category.
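The logic of a tool like this is simple enough to sketch. This is not Pasleim's code; it is a minimal illustration in Python that assumes the members of the "died in a year" categories and the items that already have a date of death are available as plain sets. All names and sample items are made up.

```python
def missing_death_dates(death_categories, has_date_of_death):
    """List the items that a Wikipedia considers dead but that lack a date of death.

    death_categories: {wiki: set of items in a "died in <year>" category}
    has_date_of_death: set of items that already have a date of death in Wikidata
    Returns {item: sorted list of wikis that know about the death}.
    """
    report = {}
    for wiki, items in death_categories.items():
        for item in items:
            if item not in has_date_of_death:
                report.setdefault(item, []).append(wiki)
    return {item: sorted(wikis) for item, wikis in sorted(report.items())}


# Hypothetical sample data, not taken from any project.
categories = {
    "enwiki": {"item-1", "item-2"},
    "nlwiki": {"item-2", "item-3"},
}
with_date = {"item-1"}

worklist = missing_death_dates(categories, with_date)
```

Once a date of death is added, the item drops off the list the next time the check runs. The same approach works in the other direction, for articles that lack what Wikidata already knows.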

Consider this; a date of death is relevant when you consider the "Biographies of Living People". Having complete information for people is important. So why not flip our approach to the BLP and provide tools to improve the existing information in all of our projects?

First things first; the objective is to signal the death of a person. As is the current policy, it is up to every project to do with it as it likes. What should follow is looking for sources and, when one is available, preferably adding at least one to Wikidata for re-use.

What are the benefits? A positive approach to maintenance and an invitation to people to do something that actually matters now. It is an invitation to read the article and see what more can be done to get it into shape.

When the date for a death exists in an article, the article will be removed from the articles that need attention. There are plenty of valid approaches to this.

Improving user engagement is one of the objectives of the Wikimedia Foundation itself. I really want the WMF to include active engagement where it makes a difference and be as proactive as it can in this field. This is a positive approach and that is what we badly need.
Thanks,
      GerardM

Saturday, April 08, 2017

#WhiteHouse Fellows - Mrs Margarita Colmenares

Mrs Margarita Colmenares is a White House Fellow. A message was posted on Twitter that her article had been created and to support the message, it was easy enough to add her on Wikidata as well. The article mentioned that she was a White House Fellow and adding one layer of additional information is one way of making a person more relevant.

Adding this fellowship and adding other people who were fellows was easy enough. The Wikipedia article referred to the website of the White House for information and when you visit its website you will be thanked for having an interest in this subject.

At a time like this it is good to consider Archive.org. Its crawler worked well at some dates; for other dates the message you will see is: "Got an HTTP 301 response at crawl time".

Anyway, together the information at whitehouse.gov and at archive.org provides enough of a reference.
Thanks,
     GerardM

Friday, April 07, 2017

#Wikidata - #Perfection or #progress

When you consider the intention of the "BLP" or the "Biographies of Living People", you will find that it is defensive. It is the result of court cases brought against the Wikimedia Foundation or Wikipedians by living people. The result was a restrictive policy that intends to enforce the use of "sources" for all statements on living people.

The upside was fewer court cases; the downside was administrators who blindly applied this policy, particularly in the big Wikipedias. Many people left; they no longer edit Wikipedia.

At Wikidata there are proponents of enforcing a BLP explicitly so that they have the "mandate" to block people when they consider them too often in violation of such a policy.

For a reality check; there are many known BLP issues in Wikidata that are not taken care of. There are tools, like the one by Pasleim, that make it easy to do so. There have been no external complaints about Wikidata so far, but internal complaints, complaints about the quality of descriptions for instance, are easily waved away.

The implementation of a "DLP" or "Data of Living People" where "sources" are mandatory would kill much of the work done at Wikidata and would not have an effect on the existing backlog. Killing the backlog removes much of the usability of Wikidata and will prove to be even worse.

In order to responsibly consider new policies, first reflect on the current state of a project. What issues need to be addressed? What can be done to focus attention on the areas where it is most needed? How can we leverage what we know in other projects and in external sources? When it is really urgent, make a cost analysis and improve the usability of our software to support the needed progress. And yes, stop insisting on perfection; it is what you aim for. None of us is in a position to throw the first stone.
Thanks,
      GerardM