Google, Yahoo! and MSN search engines treat HTML character entities in different ways
This was discovered while validating some XML documents.
For example, the encoding for “»” is &raquo;. While researching HTML entities, we found that search engines convert query input in different ways.
The Google search engine converts an HTML-encoded character into the character it represents before the query is processed. Let’s take “Õ” as an example: this character is encoded as &#213;.
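The decoding step described here can be sketched in a few lines of Python. This is only an illustration of what "converting an entity into the character it represents" means, not a description of how Google actually implements it; the function name decode_query is made up for the example.

```python
import html


def decode_query(raw_query: str) -> str:
    """Replace HTML character references with the characters they stand for.

    Hypothetical helper: it illustrates the decoding step only and is not
    taken from any search engine.
    """
    return html.unescape(raw_query)


print(decode_query("&#213;"))   # numeric reference for Õ
print(decode_query("&raquo;"))  # named reference for »
print(decode_query("&#65;"))    # numeric reference for A
```

An engine that decodes queries this way sees “Õ”, “»” and “A” rather than the literal entity text that was typed.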
Searching for &#213; on Google returns results for both “Õ” and “O”. This suggests that Google treats “Õ” as a variation of “O” and considers the two probably equivalent. The same search in Yahoo! returns results for “213”. We can conclude that Yahoo! strips the surrounding characters before processing the query, or does not interpret character encodings at all. MSN behaves the same way as Yahoo!. Try searching for “A” and “&#65;” to see the difference for yourself.
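The two behaviours can be imitated side by side. The sketch below is a guess at the kind of processing that would produce the observed results, not the engines' real code: decode_and_fold and strip_punctuation are hypothetical names, and the accent folding uses Unicode NFKD decomposition, which is simply one common way to map “Õ” to “O”.

```python
import html
import re
import unicodedata


def decode_and_fold(raw_query: str) -> tuple:
    """Assumed Google-like handling: decode the entity, then also derive
    an unaccented variant of the decoded text."""
    decoded = html.unescape(raw_query)
    # NFKD splits "Õ" into "O" plus a combining tilde; dropping the
    # combining marks leaves the base letter.
    folded = "".join(
        ch
        for ch in unicodedata.normalize("NFKD", decoded)
        if not unicodedata.combining(ch)
    )
    return decoded, folded


def strip_punctuation(raw_query: str) -> str:
    """Assumed Yahoo!/MSN-like handling: keep only ASCII letters, digits
    and spaces, without decoding anything."""
    return re.sub(r"[^0-9A-Za-z ]+", "", raw_query).strip()


print(decode_and_fold("&#213;"))    # the decoded character and its base letter
print(strip_punctuation("&#213;"))  # only the digits survive
print(strip_punctuation("&#65;"))   # only the digits survive
```

Under these assumptions the first function yields both “Õ” and “O” for the query &#213;, while the second reduces the same query to “213”, which matches what the searches showed.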
Moreover, the search engines report errors differently. Searching Google for “&#36;” (which represents “$”) returns nothing: neither results nor an error message. The same happens if you search for “$” directly. Since Yahoo! and MSN strip characters that carry no linguistic meaning, the query “&#36;” becomes a search for “36”. Searching for “$” on those engines produces an error message saying that there are no search results for “$”.
Based on this research, we can conclude that Google is the only search engine of the “big trio” that understands (or allows) searches for specific HTML characters and their entity-encoded representations.