Sitemaps for multi languages #1

Closed
Lesrad opened this issue Mar 11, 2015 · 62 comments

Comments

@Lesrad

Lesrad commented Mar 11, 2015

When sitemaps are enabled with this plugin add-on, a sitemap is generated for only one language. It would be great if the plugin could generate a sitemap for all languages.

I used to hack wp-seo manually to make it work for my site, including sitemap generation. This post may give you some tips: https://wordpress.org/support/topic/mqtranslate-and-multilingual-seo-with-yoast-seo/page/2?replies=64

I'm not a coder, though, so I'm sure you guys could make a better version.

@johnclause
Member

Thanks a lot for the tips, we will definitely look into it shortly.

@Lesrad
Author

Lesrad commented Mar 20, 2015

For a better fix, there may be hints of another solution that someone who knows how to code may find helpful: http://trac.transposh.org/browser/trunk/WordPress/plugin/transposh/wp/transposh_3rdparty.php. From line 223 there seems to be another fix (plus lines 43-44); it looks like it needs manual patching too.
The file also includes fixes for several other popular plugins that qtranslate-x users may encounter.

@johnclause
Member

Thank you! We will take a look.

@johnclause
Member

Did you try https://wordpress.org/plugins/google-xml-sitemaps-v3-for-qtranslate/ and, if so, is there anything wrong with it?

Is there a complete solution which works for Yoast SEO and qTranslate-X, which we could put in the code?

@johnclause
Member

I have now looked over the Yoast SEO code. There is no filter that can easily be used to generate entries for each language. Making it work would be almost as much work as writing a new plugin. Why don't we simply use the existing plugin Google XML Sitemaps v3 for qTranslate, which works fine for many people, including myself?

@Lesrad
Author

Lesrad commented Sep 18, 2015

Hi John,

Thanks for looking the code over.

Not sure if you got my last link https://wordpress.org/plugins/wp-seo-yoast-integration-mq-translate/developers/
That plugin seems to have code for yoast sitemap.

Google XML Sitemaps v3 for qTranslate

I have used it in the past and then ran into issues at one stage; I can't recall what they were. The plugin isn't maintained, and people have several issues with it, including:

I understand that for over 80% of users it should suffice, but on the other hand, if your plugin could add Yoast's sitemaps it would be an added benefit.

If it's hard to hook directly into Yoast itself, perhaps commented-out code could be included in your plugin as a manually applied fix. Like I said, I'm not good at coding, but I got it working for my site manually.


Something like http://trac.transposh.org/browser/trunk/WordPress/plugin/transposh/wp/transposh_3rdparty.php from line 223, although apparently they left out one bracket: https://wordpress.org/support/topic/tranposh-make-wordpress-seo-by-yoast-sitemap-xml-pages-blank?replies=3

Hope that helps.

Another approach, which may be better (not sure), is to try to incorporate https://wordpress.org/plugins/wp-seo-yoast-integration-mq-translate/ (https://github.com/rufein/wp-seo-yoast-integration-mq-translate) into your plugin.

@johnclause
Member

https://wordpress.org/plugins/wp-seo-yoast-integration-mq-translate/developers/

Why is it not working with qtx? I would think qtx is no different from mq- at database level. Have you tried to turn on option "Compatibility Functions"? I am sure if it needs adjustments, it would be very little to make it work with qtx.

All that would be needed for a manual applied patch

Yes, I understand how to change Yoast code, and it is not much, but the problem is that it cannot be done in an encapsulated and updatable way. The way https://wordpress.org/plugins/wp-seo-yoast-integration-mq-translate/developers/ did it is a modified copy of Yoast code. This way we lose the ability to take advantage of Yoast's future improvements and oblige ourselves to update our code every time Yoast updates theirs. This does not go along with WP's main policy and design. People do this out of desperation just to fix their own site, but to make it publicly available and supported would be a lot of trouble and high maintenance.

I would rather try to convince Yoast to put a couple of filters in, that we could hook our little code on. I know that he does not like qTranslate and will refuse to do it for the sake of qtx, but if we design it in a way that any multilingual plugin can hook, and submit a pull request at his place, he might be more cooperative.

@johnclause
Member

I have just tried https://wordpress.org/plugins/wp-seo-yoast-integration-mq-translate/developers/, and it appears to be a copy of mq- as well. This is just not maintainable at all ...

@johnclause
Member

I was looking into Yoast code again, all we need to ask them is to put code:

function sitemap_url( $url ) {

    if(isset($url['urls'])){
        //means that $url is an array of urls, so concatenate the output from each of them
        $output = '';
        foreach($url['urls'] as $u){
            $output .= $this->sitemap_url( $u );//recursive call
        }
        return $output;
    }

right at the beginning of function sitemap_url( $url ) in /wp-content/plugins/wordpress-seo/inc/class-sitemaps.php.

Then we can make filter 'wpseo_sitemap_entry' to return an array of urls for each language, instead of a single url, which is normally expected. That is all. This would make Yoast SEO compatible-ready with any multilingual plugin.
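A minimal sketch of what such a filter callback could return, written as plain PHP so it runs without WordPress. The helper names, the 'loc' key, and the URL scheme (language as a path prefix, default language unprefixed) are assumptions for illustration; only the 'urls' wrapper and the recursive check come from the proposal above.

```php
<?php
/**
 * Hypothetical helper: expand one sitemap entry into one entry per language.
 * Assumes language codes are path prefixes (/de/...) and that the default
 * language has no prefix. The 'loc' key is an assumed name for the entry URL.
 */
function example_expand_entry_per_language(array $entry, array $languages, string $defaultLanguage): array
{
    $parts = parse_url($entry['loc']);
    $base  = $parts['scheme'] . '://' . $parts['host'];
    $path  = $parts['path'] ?? '/';

    $urls = [];
    foreach ($languages as $lang) {
        $copy   = $entry; // other keys such as 'pri' are copied unchanged
        $prefix = ($lang === $defaultLanguage) ? '' : '/' . $lang;
        $copy['loc'] = $base . $prefix . $path;
        $urls[] = $copy;
    }
    // The 'urls' wrapper is what the proposed check in sitemap_url() looks for.
    return ['urls' => $urls];
}

/**
 * Stand-in for Yoast's sitemap_url(), reduced to <loc> only, to show the recursion.
 */
function example_sitemap_url(array $url): string
{
    if (isset($url['urls'])) {
        $output = '';
        foreach ($url['urls'] as $u) {
            $output .= example_sitemap_url($u); // recursive call
        }
        return $output;
    }
    return "\t<url>\n\t\t<loc>" . htmlspecialchars($url['loc']) . "</loc>\n\t</url>\n";
}

$entry = ['loc' => 'http://example.com/foo/', 'pri' => 0.8];
echo example_sitemap_url(example_expand_entry_per_language($entry, ['en', 'de', 'da'], 'en'));
```

In a real plugin the first function would be hooked on 'wpseo_sitemap_entry' and would ask the multilingual plugin for the translated URLs instead of building prefixes by hand.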

What would you think?

@johnclause
Member

I will submit code needed for filter hopefully within next 48 hours and then you could test this idea on your site. Is that ok?

@johnclause
Member

Image part will be the same for all language urls, is that correct? We will simply copy it?

@johnclause
Member

Can you write to me directly through https://qtranslatexteam.wordpress.com/contact-us/ so that we can discuss something offline?

@Lesrad
Author

Lesrad commented Sep 18, 2015

Saw this, which is already in place and may help: Yoast/wordpress-seo#2579

@johnclause
Member

I committed the changes. Please, use the latest qtx, https://github.com/qTranslate-Team/qtranslate-x, and the latest integration plugin, https://github.com/qTranslate-Team/wp-seo-qtranslate-x.

After installing them, all should function as before, no changes should be observed, except a few improvements unrelated to sitemaps. Then insert the code listed above into yoast file /wp-content/plugins/wordpress-seo/inc/class-sitemaps.php right after the line

function sitemap_url( $url ) {

Do not duplicate the line itself, of course. Here is a copy of the code above to be inserted into /wp-content/plugins/wordpress-seo/inc/class-sitemaps.php for the sake of convenience:

function sitemap_url( $url ) {

    if(isset($url['urls'])){
        //means that $url is an array of urls, so concatenate the output from each of them
        $output = '';
        foreach($url['urls'] as $u){
            $output .= $this->sitemap_url( $u );//recursive call
        }
        return $output;
    }

Please, let me know if it does what you need. Thanks!

P.S. If you use QTranslate Slug, please try it too.

@Lesrad
Author

Lesrad commented Sep 19, 2015

Hi John,

It works nearly perfectly, ran on my test env.
Issues were:

Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 15728640 bytes) in E:\Websites\Backup\2015\InstantWP_4.4.2\iwpserver\htdocs\wordpress\wp-includes\wp-db.php on line 1285
Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 27250551 bytes) in E:\Websites\Backup\2015\InstantWP_4.4.2\iwpserver\htdocs\wordpress\wp-includes\wp-db.php on line 2864

I commented out the images, since there were 10 times as many for each language, which is probably what caused the memory errors. It worked fine after that.

$output .= "\t\t<priority>" . str_replace( ',', '.', $url['pri'] ) . "</priority>\n";

    // if ( isset( $url['images'] ) && ( is_array( $url['images'] ) && $url['images'] !== array() ) ) {
    //  foreach ( $url['images'] as $img ) {
    //      if ( ! isset( $img['src'] ) || empty( $img['src'] ) ) {
    //          continue;
    //      }
    //      $output .= "\t\t<image:image>\n";
    //      $output .= "\t\t\t<image:loc>" . esc_html( $img['src'] ) . "</image:loc>\n";
    //      if ( isset( $img['title'] ) && ! empty( $img['title'] ) ) {
    //          $output .= "\t\t\t<image:title><![CDATA[" . _wp_specialchars( html_entity_decode( $img['title'], ENT_QUOTES, $this->charset ) ) . "]]></image:title>\n";
    //      }
    //      if ( isset( $img['alt'] ) && ! empty( $img['alt'] ) ) {
    //          $output .= "\t\t\t<image:caption><![CDATA[" . _wp_specialchars( html_entity_decode( $img['alt'], ENT_QUOTES, $this->charset ) ) . "]]></image:caption>\n";
    //      }
    //      $output .= "\t\t</image:image>\n";
    //  }
    //  unset( $img );
    //}
    $output .= "\t</url>\n";

    return $output;

Obviously the images are not filtered, so wp-seo sees them all multiplied by the number of active languages, in my case 10.

I tested tags and they worked fine, but none of my tags are translated. So they work fine if your tag is e.g. TAG: en:word de:word da:word. I guess they would be replicated if you had en:word de:wort da:ord; you would probably get the number of active languages times the number of words.
In both instances it would be best practice to exclude tags from the sitemap; the option to exclude tags is already set up and functioning in Yoast SEO.

Another recommendation for users would be to set Max entries per sitemap inside of yoast sitemaps:
Max entries per sitemap:100
(in my case that equals 1000 urls per page, number of langs x Max entries per sitemap)

I don't have qslug set up. I will try adding it now; I can't imagine it would work, but I will test it.

@Lesrad
Author

Lesrad commented Sep 19, 2015

Another potential issue may be other users' site URL structures, like:
/en/foo/, en.yoursite.com, or ?lang=en.
My structure is simple: no prefix for the base language, then /de/, /da/, /es/, etc. For my structure it works perfectly.
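To make the three URL structures concrete, here is a rough sketch of how a language-specific URL could be derived in each case. The function and the mode names ('path', 'subdomain', 'query') are hypothetical and are not taken from qTranslate-X; a real integration would ask the multilingual plugin for the converted URL instead.

```php
<?php
/**
 * Hypothetical helper: build the URL of a page in a given language
 * for each of the three URL structures mentioned above.
 */
function example_language_url(string $url, string $lang, string $mode): string
{
    $p      = parse_url($url);
    $scheme = $p['scheme'];
    $host   = $p['host'];
    $path   = $p['path'] ?? '/';
    $query  = $p['query'] ?? '';
    $suffix = ($query !== '') ? '?' . $query : '';

    switch ($mode) {
        case 'path':      // /de/foo/
            return "{$scheme}://{$host}/{$lang}{$path}{$suffix}";
        case 'subdomain': // de.example.com/foo/
            return "{$scheme}://{$lang}.{$host}{$path}{$suffix}";
        case 'query':     // /foo/?lang=de
            $q = ($query !== '') ? $query . '&lang=' . $lang : 'lang=' . $lang;
            return "{$scheme}://{$host}{$path}?{$q}";
    }
    throw new InvalidArgumentException("Unknown mode: {$mode}");
}

echo example_language_url('http://example.com/foo/', 'de', 'path'), "\n";
echo example_language_url('http://example.com/foo/', 'de', 'subdomain'), "\n";
echo example_language_url('http://example.com/foo/', 'de', 'query'), "\n";
```

A sitemap generator would need to produce whichever of these forms the site actually uses, which is why a setup that works for the path-prefix structure still needs testing on the other two.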

@johnclause
Member

I am not sure how images need to be treated. Don't they need to be listed for each language? They come with "title" and "alt", which I translated for each language. Would that be correct, if there were no db error? It looks like your sitemap with images gets so big that it cannot even fit in a database entry?

The code for filter 'wpseo_sitemap_entry' is in file /wp-content/plugins/wp-seo-qtranslate-x/qwpseo-front.php. Take a look, it is simple enough. You may comment images there too.

Maybe we should use different approach and try to generate a separate sitemap for each language, like "page-sitemap-en.xml", "page-sitemap-xx.xml", etc? Each sitemap will then have approximately the same size of single language map?

I am not sure how hard that would be to implement. Would that work in general for search engines? Or pages with different languages should be listed next to each other? I am not sure about the sitemap requirements.

BTW, page /wp-admin/admin.php?page=wpseo_xml has option "Max entries per sitemap". Have you tried to make it smaller?

@Lesrad
Author

Lesrad commented Sep 19, 2015

I think the way it's set up now is correct, except that images are not filtered by language. If you made separate language pages, maintenance might be harder if they changed their code.

Yes, that's what I meant: there are already settings in Yoast for Max entries per sitemap etc. It was a recommendation for other users.

Was just trying out qtrans slug, doesn't play nicely with my site, anyway once others use this they will provide better feedback as to issues they are having.

@Lesrad
Author

Lesrad commented Sep 19, 2015

images are listed, but in my case 10 images per url per language
screenshot
http://ge.tt/9M3JTAO2/v/0
source:
http://ge.tt/6dC1UAO2/v/0
In that first instance the post has 40 images times 10 languages
it should only have 4 images in 10 languages if that makes sense

I set the Max entries per sitemap right down (5) to assist the generation of the sitemap with images, may help other users with memory issues.

@Lesrad
Author

Lesrad commented Sep 19, 2015

Basically that image loop needs to run (url images) divided by (number of languages)

Hmm, that will not work either if a post has a different image in one language, or more images than another language.

It may be easier just to disable images by commenting them out, unless you could add a break point/unique identifier for $img['src'] per language or, like you said earlier, make separate pages per language, which would get tricky and be harder to maintain.

Images are not really needed, and I get no errors from Google sitemaps when submitting without them. The alternative is Google XML Sitemaps v3 for qTranslate, which also doesn't do images. Having said that, it would be nice to have them ;)

Overall, as it stands now it would help many users. It's a pretty straightforward patch even if wp-seo don't add your updated function; if they did, users would still need to manually comment out images as it stands now.

@Lesrad Lesrad closed this as completed Sep 19, 2015
@Lesrad Lesrad reopened this Sep 19, 2015
@johnclause
Member

I now understand the problem with images. We can use the filter 'wpseo_sitemap_urlimages' to filter out duplicated entries, with no additional code at Yoast. I added image filtering in the latest https://github.com/qTranslate-Team/wp-seo-qtranslate-x.

But I discovered a problem: Yoast does not pass home_url through the filter 'wpseo_sitemap_entry', nor through any other filter, so it stays in the default language only in sitemaps.

In fact, after looking a bit more, I now suspect that making separate sitemaps per language might be possible without modifying Yoast code. I'll research a bit more. The question is, would that be OK for search engines? I do not see a reason why it would not be, but it would be good if you could research this issue to be sure.

@Lesrad
Author

Lesrad commented Sep 19, 2015

This http://wplang.org/sitemaps-multiple-languages-wordpress/ suggests separate sitemap per Lang

@johnclause
Member

How about this way? I committed new changes. Please, download again the latest qtx, https://github.com/qTranslate-Team/qtranslate-x, and the latest integration plugin, https://github.com/qTranslate-Team/wp-seo-qtranslate-x. No need to change Yoast code.

@johnclause
Member

Yes, I got it, I will put some answers here, because other people may have input too:

if a post has another image per language shows default language image only

This is a real problem, unrelated to Yoast. Those duplicates of images used to come from different languages on a page, but I do not have information which image from which language to pick correct "title", "caption", and "alt". I currently simply take the first out of all with the same "src". If title, caption and alt are entered on "Edit Media" screen per image, then I can extract a proper language. This can be a big additional work for the admin if those entries are not filled on "Edit Media" page. It can probably be automated with a script. Or I have to completely reimplement image extraction code. Let us think a bit more about the best way.
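The "take the first out of all with the same src" step could look roughly like the sketch below. The function name is hypothetical and the code is plain PHP rather than the plugin's actual filter; in WordPress it would be attached to 'wpseo_sitemap_urlimages'. The 'src', 'title' and 'alt' keys are the ones used in the Yoast image loop quoted earlier in this thread.

```php
<?php
/**
 * Hypothetical filter body: keep only the first image for each distinct 'src'.
 * Images without a 'src' are dropped, since the sitemap writer skips them anyway.
 */
function example_unique_images_by_src(array $images): array
{
    $seen   = [];
    $unique = [];
    foreach ($images as $img) {
        if (empty($img['src']) || isset($seen[$img['src']])) {
            continue;
        }
        $seen[$img['src']] = true;
        $unique[] = $img;
    }
    return $unique;
}

$images = [
    ['src' => 'http://example.com/a.jpg', 'title' => 'EN title'],
    ['src' => 'http://example.com/a.jpg', 'title' => 'DE Titel'],
    ['src' => 'http://example.com/b.jpg', 'title' => 'Other'],
];
print_r(example_unique_images_by_src($images));
```

As the comment above notes, this keeps whichever title happens to come first, so it removes duplicates but does not pick the attributes of the right language.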

Shows original language translation for default language in title/caption(alt) regardless of language

I am not sure what is meant here.

/de/sitemap_index.xml should preferably only show de links but shows all langs. Would be nice to filter/sort/ by language if possible or (don’t show other langs when in a language) then one could submit individual sitemaps to google per lang like /de/sitemap_index.xml / /da/sitemap_index.xml /es/sitemap_index.xml etc.

It does not fit in the way it is done now, but it maybe a good idea. I did not mean to use /xx/sitemap_index.xml other than for testing. The search robots will always go to /sitemap.xml, which gets redirected to /xx/sitemap_index.xml of the default language. I can probably do "/xx-sitemap_index.xml" to show one language sitemap, if it is important.

@johnclause
Member

On image attributes: how about now? Update to the latest from https://github.com/qTranslate-Team/wp-seo-qtranslate-x. I translated $p->post_content in filter 'wpseo_xml_sitemap_post_url', and the later search for images runs on the already translated text, which includes only one language.

Do you think it would be a good idea to fetch title, caption and "alt" from image properties entered on "Edit Media", if those attributes come empty from Yoast code?

@Lesrad
Author

Lesrad commented Sep 23, 2015

I can confirm that:

  • The image attributes are working perfectly now.
  • The images each load validly per language per post (very nice), i.e. if a post contains different images per language, only the relevant image is shown; if no image exists it correctly shows blank, which will stop any Google Webmaster Tools sitemap errors.

Do you think it would be a good idea to fetch title, caption and "alt" from image properties entered on "Edit Media", if those attributes come empty from Yoast code?

That sounds like a great idea.

Seems to be working pretty nicely now, only enhancement is to make xx/sitemap_index.xml or "/xx-sitemap_index.xml" filtered so only relevant language urls are shown, if that's even possible.

You should activate your sitemap plugin for the entire network of sites and create a sitemap for
each site (and language). One for your French version, one for your Spanish version
http://www.mydomain.com/fr/sitemap_index.xml
http://www.mydomain.com/es/sitemap_index.xml
http://www.mydomain.com/de/sitemap_index.xml
from: http://wplang.org/sitemaps-multiple-languages-wordpress/

As it stands now, it beats my sitemap :) and Google XML Sitemaps v3 for qTranslate, so good job.

@johnclause
Member

On sitemap structure. Here is my understanding:

I think robots try the name "sitemap.xml", which gets redirected to "sitemap_index.xml" with the default language active. That sitemap has all languages. Robots will never hit any "i18n-index-sitemap.xml" unless you submit those names to Google manually, and if you do, then whatever you submitted is the only thing that will get hit. If you submit all "i18n-index-sitemap.xml" and do not submit "sitemap_index.xml", then you are fine.

I am not sure if this theory is correct.

I thought that robots do not browse the site if there is "sitemap.htm". If it is there, then they will only browse what is listed in sitemap.xml.

I never saw two-level sitemap indices, and I am not sure if they are allowed. Probably yes. I can replace the main sitemap_index with an index pointing to all language index maps. Could you research to make sure that this would be a good idea?

@johnclause
Member

I've updated to hierarchical sitemaps, just to test the idea, but I think we need to revert to the previous schema. It looks like sitemap indices are not designed to list other indices. I could not find an example of that on internet. Try the latest version just out of curiosity.

In fact, with the current setup you have now, you may create file sitemap.xml manually and list "i18n-index" urls for each language. Then sitemap_index will be hidden from robots, but will still work if typed in manually.

@Lesrad
Author

Lesrad commented Sep 27, 2015

OK, Back to the sitemap.

I think the way it's set up now, with:

sitemap_index.xml showing a list of all i18n-index languages, is correct; the structure is like the "Best Sitemap Structure" image listed above. The only thing is that, displayed like this, only sitemap_index.xml should be visible, not /(lang)/sitemap_index.xml, but I could be wrong.

One issue I did see is that in my case:
/en/i18n-index-sitemap.xml should be /i18n-index-sitemap.xml
(the extra redirect probably loses some pagerank)

Maybe if case (wp-admin/options-general.php?page=qtranslate-x#general) Hide URL language information for default language = TRUE
Then
link /(language code)/ should be blank.
Similar to: qTranslate-Team/qtranslate-x#166 but maybe that's not possible.

I will ask for advice regarding the hierarchy of sitemaps:
http://wplang.org/sitemaps-multiple-languages-wordpress/
and google webmaster help https://productforums.google.com/forum/#!topic/webmasters/mWinc2_pggU;context-place=forum/webmasters

@johnclause
Member

I would not worry about redirections. Robots do not use cookies, and for them, if there is no /en/, the default language is assumed with no redirection, as you want it to be. It is only during human testing that it picks up the active language and sometimes does redirections to switch the language properly. I do not see an issue here.

What is wrong with the last setup is that the "Sitemap Index" link to the parent index does not appear on index maps and is wrong on sitemaps. This link comes from the style file "main-sitemap.xsl", as far as I understand. The style file is referenced in the .xml file; you can find it in the source. Replacing it with the correct link on sitemap pages is probably doable with hack-like code. However, I do not understand why this link does not appear on the language index pages. Following the logic, it should appear, but it does not, which makes me think that index pages cannot point to index pages again.

I feel more comfortable with the old sitemap_index with all languages listed in one index, unless we figure out the proper theory on this. I could not find an example of two-level sitemap indices so far. I am afraid that there is a reason for that.

As I mentioned before, you may still submit i18n-index* urls to google manually and that would be fine as well. Also if we put flat sitemap_index.xml with all languages by default, you may still create sitemap.xml actual file in place with index of indices manually and that will hide sitemap_index.xml from the robots.

Let us see what you can find out from people. If indices of indices are allowed, then we need to figure out the proper syntax for that. Please, try to prepare then files manually, which work correctly with "Sitemap Index" link to parent index, and I will try to do the same in PHP code.

@johnclause
Member

The problem with title & meta is really strange. I have no idea how that can happen. Could you try deactivating the plugin (then the line "plugins/wp-seo-qtranslate-x/i18n-config.json" will disappear from the option "Configuration Files"), removing all the files from the file system, then putting in all files freshly downloaded from GitHub, making sure the folder name is "wp-seo-qtranslate-x", activating again (the line "plugins/wp-seo-qtranslate-x/i18n-config.json" should be back), and seeing if it makes a difference?

Please, remove additional folders and copies you created, this does not help to figure out the problem and can be very confusing.

After that fresh install, send me please the configuration shown in "Configuration Inspector": /wp-admin/options-general.php?page=qtranslate-x&config_inspector=show.

@johnclause
Member

I did see a solid answer that any structure is acceptable on manual submission of maps, but I did not see an answer to two-level indices files read by robots.

@johnclause
Member

Could you please figure out the syntax of two-level indices files and send me how you would wish them to be for your site, for example. There is no problem with manual submission of maps to google either way, exactly as Irena says, it is already working in both setups. But I do not understand how to list them properly in the top level sitemap.xml.

@johnclause
Member

I'll read more too ...

@Lesrad
Author

Lesrad commented Sep 28, 2015

John, I think what you had before is correct: the top-level index should be the i18n-index-sitemap per language, manually submitted and linked to. Exactly like the image below (note that the top right balloon in the image should have /de/).
[image: alt-sitemap]
If you think that a flat main index is best, then leave it there, I will block that file on my own site but up to you.

I am trying to follow what exactly you mean by two-level indices files and what the problem is but I don't see an issue, think we should just leave it as is.

I feel more comfortable with the old sitemap_index with all languages listed in one index, unless we figure out the proper theory on this. I could not find an example of two-level sitemap indices so far. I am afraid that there is a reason for that.

As I mentioned before, you may still submit i18n-index* urls to google manually and that would be fine as well. Also if we put flat sitemap_index.xml with all languages by default, you may still create sitemap.xml actual file in place with index of indices manually and that will hide sitemap_index.xml from the robots.

Agreed, that's what needs to be done, will email you with qtranslate-x issues: Configuration Inspector etc, rather than adding them here.

@Lesrad
Author

Lesrad commented Sep 28, 2015

This is what I will likely do with my sitemap setup:

In .htaccess (only because all search engines already use sitemap_index.xml URL for my site)
RedirectMatch 301 ^/sitemap_index\.xml$ http://site.com/i18n-index-sitemap.xml

Link in footer.php
<a href="http://site.com/i18n-index-sitemap.xml">Sitemap</a>

Submit individual language sitemap urls to google webmaster tools, apply localization where possible:
http://site.com/i18n-index-sitemap.xml English
http://site.com/da/i18n-index-sitemap.xml Dansk
http://site.com/de/i18n-index-sitemap.xml Deutsch
http://site.com/es/i18n-index-sitemap.xml Espanol

Optionally, add to the end of the robots.txt file:
Sitemap: http://site.com/i18n-index-sitemap.xml
Sitemap: http://site.com/da/i18n-index-sitemap.xml
Sitemap: http://site.com/de/i18n-index-sitemap.xml
Sitemap: http://site.com/es/i18n-index-sitemap.xml
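For sites with many languages, the robots.txt lines above could be generated rather than typed by hand. This is only a sketch: the function name is hypothetical, and it assumes the default language has no URL prefix, as in the setup described here.

```php
<?php
/**
 * Hypothetical helper: build the "Sitemap:" lines for robots.txt,
 * one per language. Assumes the default language has no URL prefix.
 */
function example_robots_sitemap_lines(string $siteUrl, array $languages, string $defaultLanguage): array
{
    $siteUrl = rtrim($siteUrl, '/');
    $lines   = [];
    foreach ($languages as $lang) {
        $prefix  = ($lang === $defaultLanguage) ? '' : '/' . $lang;
        $lines[] = 'Sitemap: ' . $siteUrl . $prefix . '/i18n-index-sitemap.xml';
    }
    return $lines;
}

echo implode("\n", example_robots_sitemap_lines('http://site.com', ['en', 'da', 'de', 'es'], 'en')), "\n";
```

The same list, minus the "Sitemap: " prefix, is what would be submitted per language in the search engine console.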

@johnclause
Member

Yes, that is OK, but we also need a default solution for everybody that requires no additional setup.

We need to figure out the syntax for two-level indices for sitemaps. If you look in the source of all sitemaps, you will see that "index.xml" has the XML element "<sitemapindex>", which lists "<sitemap>" items. This type of file I call an "index map". Other files, called "sitemaps", have the XML element "<urlset>" with "<url>" items. In my theory, a "<sitemap>" item can only point to a "sitemap" and cannot point to an "index map", or else there is another syntax to point an item to an index map from a parent index map. I've never seen an example of a "<sitemap>" entry pointing to an index file within "<sitemapindex>". This is what I would like to clarify. Maybe it is OK to point a "<sitemap>" entry to another index map instead of a sitemap, but then why does the style file "main-sitemap.xsl" work differently in a sitemap and in an index map? In a sitemap file it produces a link to the parent index, in red, "Sitemap Index", but in an index map this link does not appear, although we should have it. Maybe all we need to do is figure out a different syntax for the style file referred to within an index map file.
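For reference, a flat index map in the sense used here can be sketched as below: a "<sitemapindex>" whose "<sitemap>" entries point only at regular sitemaps, never at other index maps. The function name and the example sitemap URLs are hypothetical; the element names and the namespace are the ones defined by the sitemaps.org protocol.

```php
<?php
/**
 * Hypothetical helper: render a flat sitemap index listing the given sitemap URLs.
 * Each URL is expected to point at a regular sitemap (<urlset>), not another index.
 */
function example_sitemap_index(array $sitemapUrls): string
{
    $xml  = '<?xml version="1.0" encoding="UTF-8"?>' . "\n";
    $xml .= '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n";
    foreach ($sitemapUrls as $loc) {
        $xml .= "\t<sitemap>\n";
        // Escape &, <, > and quotes so the URL is valid XML text.
        $xml .= "\t\t<loc>" . htmlspecialchars($loc, ENT_XML1 | ENT_QUOTES, 'UTF-8') . "</loc>\n";
        $xml .= "\t</sitemap>\n";
    }
    $xml .= '</sitemapindex>' . "\n";
    return $xml;
}

// Example URLs are made up for illustration.
echo example_sitemap_index([
    'http://site.com/page-sitemap.xml',
    'http://site.com/de/page-sitemap.xml',
    'http://site.com/es/page-sitemap.xml',
]);
```

Listing every language's sitemaps in one such index avoids the open question of whether an index may point to another index.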

Your solution with .htaccess file will work for you in any case, but we also need to worry about people who did not dig it that deeply and will not know that they need to do additional configuration. Manual submitting of maps to google is fine, but there are many other search engines, which might not be as important, but still one cannot manually submit all sitemaps to all search engines. We need to make sure that any search engine robot will figure out all appropriate sitemaps automatically, starting from the standard "/sitemap.xml".

I hope this clarifies what I wrote previously too briefly.

@johnclause
Member

Yes, I've read http://www.sitemaps.org/protocol.html, and there is no mention that an index map can be listed under the tag "<sitemap>" inside another index. They either assume that it is self-evidently allowed, or it means that it is disallowed. Not sure how to find out for sure.

@johnclause
Member

Yeah, Leslie, you are so good with finding the stuff 👍 That is probably the answer. Have you tried to validate your top sitemap_index.xml, which lists other index maps on google sitemap validator?

@johnclause
Member

Thanks. No I was not joking, I did not hit that one when I tried to search, although I was not trying hard enough. You got lucky to put better string to search then, so it is indeed good.

@Lesrad
Author

Lesrad commented Sep 30, 2015

It seems that commit 1545425 (hierarchical sitemaps) should be removed to prevent nested sitemap index errors.

Then users will be able to setup sitemaps one of 2 ways:

1) Simple - Using the default sitemap_index.xml
Advantages:
  • Easy to set up, nothing else to do at all; same setup as the Yoast sitemap.
Disadvantages:
  • No individual localization control from Google Webmaster Tools
  • May or may not have issues with sitemap duplication by language (/de/, /es/ sitemap_index.xml) if links to /??/sitemap_index.xml exist in the footer and are crawled. (needs live testing)

OR

2) Advanced - Submitting each language's i18n-index-sitemap.xml (i.e. not using sitemap_index.xml)
Advantages:
  • Individual sitemap per language.
  • Individual localization control from Google Webmaster Tools.
Disadvantages:
  • A bit harder to set up:
    • add a redirect in .htaccess from /sitemap_index.xml to /i18n-index-sitemap.xml
    • (optional) more URLs need to be added to robots.txt for the sitemaps, and more URLs submitted to Google/Bing.
For both types, optional:
  • Add sitemap/s to robots.txt; submit to Google/Bing; add a link in the footer

@johnclause
Member

I agree. The only thing is that we do not need to add a redirect in .htaccess /sitemap_index.xml to /i18n-index-sitemap.xml, if we manually submitted i18n-index-sitemap.xml to a search engine.

Flat sitemap_index.xml with all languages does not hurt to have in any case (it is virtual anyway, not an actual file on file system, as you probably notice). In your case, since it is already submitted, search engines will continue to work correctly picking up all languages through this file, until you change it in search engine console to make it even better with separate maps.

Search engines, which you did not configure manually, will still find all languages through default sitemap.xml, which is also virtual and gets redirected to flat sitemap_index.xml with all languages.

People should always be advised to modify robots.txt with all i18n-index-sitemap.xml. Then robots, which pay attention to this configuration, will also work correctly.

So, I do not see a reason to bother with additional redirection.

I will revert to flat sitemap_index shortly.

@johnclause
Member

I checked in hopefully the final version in terms of sitemaps. Please, review the notes on page "/wp-admin/admin.php?page=wpseo_xml". I hope all human links in sitemaps are also tuned correctly, depending on the referrer page; try to break it by clicking all the links.

Please, also test all other features that you use as well. Thanks a lot for all your help.

@johnclause
Member

If you kept your browser open all this time, please use Ctrl+F5 to refresh cache.

@johnclause
Member

Discovered some browser caching problems with xsl files. I just updated again; please download it again if you have already done so.

@Lesrad
Author

Lesrad commented Oct 1, 2015

Seems like it's all working fine. Good job, and thanks for adding it. I guess we can only be sure once it's live. I assume the next biggie everyone will want is working Page Analysis :)

FYI:

  • Real time page analysis is coming to Yoast SEO for WordPress on November 2nd.
  • wpseo_pre_analysis_post_content and a few other filters are going away, if you use that filter, update your code.
  • We will have a working beta on September 9th that you can test your code against. We might have it earlier, follow @YoastDev on Twitter for updates. Feel free to tweet your questions there too.

https://yoast.com/dev-blog/yoast-seo-breaking-api-changes/
https://github.com/Yoast/YoastSEO.js

@johnclause
Member

Yes, page analysis, I am afraid we will need Yoast cooperation for that ...

Did you have a chance to test all other stuff?

@Lesrad
Author

Lesrad commented Oct 2, 2015

Just had a major issue when putting the latest GitHub versions on my live site. I reverted to the current WP version in the meantime, until I figure out what's wrong.
Basically, in any language except my base language, index and category pages would say the post is only available in the default language, which wasn't true, because if I clicked the title the article exists.

My test server has the same issue.

Index.php

<div class="post" id="post-<?php the_ID(); ?>">
<div class="title">
<?php if( is_sticky() ) : ?>
<h1><?php if (function_exists('get_cat_icon')) get_cat_icon(); ?><a href="<?php the_permalink() ?>" rel="bookmark" title="<?php the_title(); ?>"><strong><?php the_title(); ?></strong></a></h1>
<?php else : ?>
<h2><?php if (function_exists('get_cat_icon')) get_cat_icon(); ?><a href="<?php the_permalink() ?>" rel="bookmark" title="<?php the_title(); ?>"><strong><?php the_title(); ?></strong></a></h2>
<?php endif; ?>
<div class="date"><span><script language="javascript" type="text/javascript">document.write('<?php the_time('jS F Y ') ?>');</script></span><?php the_category(', '); ?></div>  
</div>

<div class="cover">
<div class="entry">
                    <?php the_content('<!--:en-->Read More...<!--:--><!--:da-->Laes mere...<!--:--><!--:de-->Lesen Sie mehr...<!--:--><!--:es-->Leer mas ...<!--:--><!--:no-->Les mer ...<!--:--><!--:pt-->Leia Mais ...<!--:--><!--:sv-->Las mer ...<!--:--><!--:fr-->Lire la suite ...<!--:--><!--:nl-->Lees Meer ... <!--:--><!--:it-->Leggi Tutto ...<!--:-->'); ?>


</div>

</div>

Generates this:

<div class="post" id="post-42815">
<div class="title">
<h1><a href="http://127.0.0.1:4001/wordpress/no/titanpoker-ipops-micro-2015/" rel="bookmark" title="TitanPoker iPOPS Micro Turneringskalender"><strong>TitanPoker iPOPS Micro Turneringskalender</strong></a></h1>
<div class="date"><span><script language="javascript" type="text/javascript">document.write('19th februar 2015 ');</script></span><a href="http://127.0.0.1:4001/wordpress/no/category/poker-tournaments/" rel="category tag">Pokerturneringer</a></div>    
</div>

<div class="cover">
<div class="entry">
                    <p class="qtranxs-available-languages-message qtranxs-available-languages-message-no">Beklager, denne oppføringen bare tilgjengelig i <a href="http://127.0.0.1:4001/wordpress/en/" class="qtranxs-available-language-link qtranxs-available-language-link-en">English</a>.</p>                    

</div>

</div>

Seems like the_content isn't working and is instead outputting a paragraph with class="qtranxs-available-languages-message qtranxs-available-languages-message-no"

In the default language, it generates correctly, e.g.:

<div class="cover">
<div class="entry">
                    <p><a href="http://127.0.0.1:4001/wordpress/888-poker-awards-season-tournament"><img class="alignleft size-full wp-image-42772" style="margin: 5px;" src="http://127.0.0.1:4001/wordpress/wp-content/plugins/speed-booster-pack/inc/images/1x1.trans.gif" data-lazy-src="http://127.0.0.1:4001/wordpress/wp-content/uploads/888-poker-awards-season.jpg" alt="888 poker awards season" width="160" height="160" /><noscript><img class="alignleft size-full wp-image-42772" style="margin: 5px;" src="http://127.0.0.1:4001/wordpress/wp-content/uploads/888-poker-awards-season.jpg" alt="888 poker awards season" width="160" height="160" /></noscript></a>A special 888 Poker Awards Season Tournament will run on 22nd February offering $2,500 in cash prizes and $5,000 in electronic gadgets.</p>
<p>To get a seat into the tournament players can buy-in directly for $5 or make a deposit for a free tournament token.</p>
<p>Electronic prizes are paid out to players who finish in specific positions or eliminate the most players from the tournament. These prizes include:  LG TV, GoPro Hero 4 camera, Sony PlayStation 4, popcorn machine and a poker strategy book.</p>
<p style="text-align: right;"><a title="888 poker bonus" href="http://127.0.0.1:4001/wordpress/888-poker-bonus-code/">888Poker Bonus</a><br />
<a title="888poker app" href="http://127.0.0.1:4001/wordpress/888poker-android-app-for-mobile-phones/">888 Poker App</a></p>
<p> <a href="http://127.0.0.1:4001/wordpress/888-poker-awards-season-tournament/#more-42769" class="more-link">888 Poker Awards Season Tournament</a></p>                   

</div>

</div>

Commit qTranslate-Team/qtranslate-x@fa98566 is the cause. If I delete

        if($post->filter == 'raw') continue;//@since 3.4.5

in qtranslate_frontend.php, then the site works normally again. This may be related to issue qTranslate-Team/qtranslate-x#271

@floopyzicer

For me, the latest version's sitemap works just fine for all languages. I still have not added anything to the robots.txt file; I am testing on a live site.

@johnclause
Member

@Lesrad: yeah, this is what I meant by "test all other stuff". Yes, that line appears to be big trouble: it breaks "more" tags severely and unrecoverably. I think the filter "qtranxf_postsFilter" was created to fix the "more" tag problem, and I thought that it would be called with filter='display' from the_content, but for some reason WP calls it with 'raw'.

That line was added for one of the first versions of sitemaps, but as far as I can tell now, it was later done in a different way. When this line is commented out, do all sitemaps continue to work correctly, including proper translation of image attributes? If so, I will comment it out permanently. It also broke "get_the_excerpt", and it may cause more problems later.

Could you please re-test all without that line, and if all is ok, I will undo that line.

@floopyzicer

Lesrad, that person you are referring to is me :) I don't know the rules for sharing the site here, so I just sent a message to the qtranslatexteam email (I guess that's John and Gunu :) ). It works, and I think it is nearly perfect or even perfect, hehe. All post_sitemaps, page_sitemaps, category_sitemaps and tag_sitemaps are shown the proper way, each for its own language; my favorite tags are shown in all 3 languages :) But I did not notice any error, so I guess I will need to check if I have one now.
Also, q-translate slug works fine for me together with all other languages (so each language's slug is used in a sitemap).

@johnclause
Member

I guess we are ready to close this issue. We can still write into a closed issue, or we can re-open it if needed. If new problems are discovered, I would suggest opening a new issue, since this one is already exceptionally long. We can cite it from the new issue if needed.

Thank you very much to all of you for invaluable help.

@floopyzicer

On some pages I noticed that q-translate slugs in other languages are not read by Yoast, which gives the message:
PAGE URL : No

Only the first language version says Yes, but I don't think it is a problem; just letting you know.

@Lesrad
Author

Lesrad commented Oct 13, 2015

Sorry for the delayed response, got busy with some personal stuff.

Here are some screenshots that may help others. John, feel free to reuse them if you wish.

Submit Sitemap to Google

https://www.google.com/webmasters/tools/home?hl=en

g1

  • Create a new property ("site") for each language from the Google Search Console home page.
  • Submit only the sitemap for that language.
    • E.g. add property site.com, then add property site.com/fr/ (2 properties in this example).
    • Submit sitemap site.com/fr/i18n-index-sitemap.xml from the site.com/fr/ property, then submit site.com/i18n-index-sitemap.xml under the site.com property. You will also see the fr sitemap listed there, as in the image above.
  • Once added, you will see the site.com/fr/ sitemap from the base URL, i.e. from site.com.

g2

  • You will have stats/tools per language

g3

  • You can target a country per language directly.

g4

  • Overview of submitted indexed sitemaps from base url

Submit Sitemap to Bing

https://www.bing.com/webmaster/configure/sitemaps/home
b1

  • Add sitemap urls to Bing per language.

b2

  • You can set language/region settings as desired.

Robots (add/edit robots.txt in the site's root directory)

r

  • Example of robots.txt, helps other search engines find/crawl your sitemaps.

Footer (wp-theme: footer.php)

s2

  • Example footer code

s

  • Output of footer code.
