Re: [PATCH 5/7] remote-mediawiki: support fetching from (Main) namespace
From: Eric Sunshine <hidden>
Date: 2017-11-01 19:56:57
On Sun, Oct 29, 2017 at 10:51 PM, Antoine Beaupré [off-list ref] wrote:
When we specify a list of namespaces to fetch from, by default the MW API will not fetch from the default namespace, referred to as "(Main)" in the documentation:

https://www.mediawiki.org/wiki/Manual:Namespace#Built-in_namespaces

I haven't found a way to address that "(Main)" namespace when getting the namespace ids: indeed, when listing namespaces, there is no "canonical" field for the main namespace, although there is a "*" field that is set to "" (empty).

So in theory, we could specify the empty namespace to get the main namespace, but that would make specifying namespaces harder for the user: we would need to teach users about the "empty" default namespace. It would also make the code more complicated: we'd need to parse quotes in the configuration.

So we simply override the query here and allow the user to specify "(Main)" since that is the publicly documented name.
Thanks, this explanation makes the patch a lot clearer. More below...
Signed-off-by: Antoine Beaupré <redacted>
---
diff --git a/contrib/mw-to-git/git-remote-mediawiki.perl b/contrib/mw-to-git/git-remote-mediawiki.perl
@@ -264,9 +264,14 @@ sub get_mw_tracked_categories {
 sub get_mw_tracked_namespaces {
 	my $pages = shift;
 	foreach my $local_namespace (@tracked_namespaces) {
-		my $namespace_id = get_mw_namespace_id($local_namespace);
+		my ($namespace_id, $mw_pages);
+		if ($local_namespace eq "(Main)") {
+			$namespace_id = 0;
+		} else {
+			$namespace_id = get_mw_namespace_id($local_namespace);
+		}
I meant to ask this in the previous round, but with the earlier patch mixing several distinct changes into one, I plumb forgot: Would it make sense to move this "(Main)" special case into get_mw_namespace_id() itself? After all, that function is all about determining an ID associated with a name, and "(Main)" is a name.
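To make the suggestion concrete, here is a rough, self-contained sketch of what I have in mind. It is not the real function: lookup_namespace_id_remote() and the %known_namespace_ids table are hypothetical stand-ins for whatever get_mw_namespace_id() actually does to resolve a name, and returning undef for an unknown name is an assumption made only for this illustration.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the real lookup; the table is illustrative
# only and is not how the script resolves names.
my %known_namespace_ids = (
	'Talk'    => 1,
	'User'    => 2,
	'Special' => -1,	# virtual namespace
);

sub lookup_namespace_id_remote {
	my $name = shift;
	return $known_namespace_ids{$name};
}

sub get_mw_namespace_id {
	my $name = shift;

	# "(Main)" is the documented name of the default namespace,
	# which has id 0, so resolve it here rather than in each caller.
	return 0 if $name eq '(Main)';

	# Assumption for this sketch: undef means "unknown name".
	return lookup_namespace_id_remote($name);
}

# The caller no longer needs to know about "(Main)" at all.
foreach my $local_namespace ('(Main)', 'Talk', 'Special') {
	my $namespace_id = get_mw_namespace_id($local_namespace);
	next if !defined $namespace_id;
	next if $namespace_id < 0; # virtual namespaces don't support allpages
	print "$local_namespace => $namespace_id\n";
}
```

With the special case inside the lookup, the loop in get_mw_tracked_namespaces() could presumably keep its original single "my $namespace_id = get_mw_namespace_id(...)" line, and any other caller would get "(Main)" support as well.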
 		next if $namespace_id < 0; # virtual namespaces don't support allpages
-		my $mw_pages = $mediawiki->list( {
+		$mw_pages = $mediawiki->list( {

Why did the "my" of $mw_pages get moved up to the top of the foreach loop? I can't seem to see any reason for it. Is this an unrelated change accidentally included in this patch?

 			action => 'query',
 			list => 'allpages',
 			apnamespace => $namespace_id,
--