
Search function improvements


david wrote 8 years ago:

There are a few shows (mainly those with dots or dashes in their names) that the search function can't find under some of their common spellings.

I'm gathering all these cases here so that we can fix them all at once when I get around to it. Feel free to reply with any additional examples!

- Search for "The Man from UNCLE" to find "The Man From U.N.C.L.E."
- Search for "NTSF" to find "NTSF:SD:SUV"
- Search for "Marvels Agents of S H I E L D" to find "Marvels Agents of S.H.I.E.L.D"
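One way to handle these cases is to fold punctuation before comparing, assuming both the stored titles and the query pass through the same normalizer. `normalize_title` and `matches` are hypothetical helpers for illustration, not TVmaze's actual search code.

```python
import re


def normalize_title(title: str) -> str:
    """Hypothetical normalizer: make punctuation variants of a title compare equal."""
    t = title.lower()
    # Drop dots and apostrophes: "u.n.c.l.e." -> "uncle", "marvel's" -> "marvels".
    t = re.sub(r"[.'\u2019]", "", t)
    # Treat any other punctuation (":", "-", ...) as a word break: "ntsf:sd:suv" -> "ntsf sd suv".
    t = re.sub(r"[^\w\s]", " ", t)
    merged = []
    run = []  # consecutive single-letter tokens, e.g. "s h i e l d"
    for tok in t.split():
        if len(tok) == 1 and tok.isalpha():
            run.append(tok)
            continue
        if run:
            merged.append("".join(run))  # "s h i e l d" -> "shield"
            run = []
        merged.append(tok)
    if run:
        merged.append("".join(run))
    return " ".join(merged)


def matches(query: str, title: str) -> bool:
    """True if every normalized query word appears among the normalized title words."""
    q = normalize_title(query).split()
    words = set(normalize_title(title).split())
    return bool(q) and all(w in words for w in q)
```

With this, `normalize_title("The Man From U.N.C.L.E.")` returns `"the man from uncle"`, the same as the undotted spelling, and `matches("NTSF", "NTSF:SD:SUV")` is `True`.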


david wrote 8 years ago:

Scoring (sorting of the results):

- The best score for "The Amazing Race" is not "The Amazing Race" itself, but "The Amazing Race Canada"
- The best score for "The Walking Dead" is not "The Walking Dead" itself, but "Fear the Walking Dead"
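A minimal sketch of the ordering these examples call for: an exact title match first, then the most overlapping words, then the fewest extra words. This is a hypothetical client-side sort key, not how Elasticsearch computes its relevance score.

```python
def rank_titles(query: str, titles: list) -> list:
    """Hypothetical ordering: exact match, then word overlap, then fewest extra words."""
    q = query.lower().split()

    def sort_key(title: str):
        t = title.lower().split()
        overlap = sum(1 for word in q if word in t)
        extra = len(t) - overlap
        # False sorts before True, so exact matches (t != q is False) come first.
        return (t != q, -overlap, extra)

    return sorted(titles, key=sort_key)
```

Both titles share every query word, so the exact-match flag and the extra-word count are what push "The Walking Dead" above "Fear the Walking Dead".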


david wrote 8 years ago:

All of these should be fixed now. Please report if you see any more problems with the search functionality.

rastamanx wrote 8 years ago:

I still have kind of an issue with "The 100".

The singlesearch function returns "100 Questions". My guess is that "the" is ignored in the search, which would force us to use the search function and iterate through the results to find an exact match. I'm not sure that's ideal.

Also, adding "(2014)" to the search makes it return nothing at all.

So, here are my three suggestions:

- Do not ignore "the" when using singlesearch

- Take the year specification into account (either match it with a regexp and use it as a search parameter, or add "alternate name" entries to the shows that need it and search within alternate names to find an exact match)

- When no result is found, at least output an error message instead of a blank page
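The year suggestion could look roughly like this: strip a trailing "(YYYY)" from the query, search with the bare title, then keep only shows whose premiere year matches. The regex and `filter_by_year` are illustrative assumptions; the sketch also assumes each show object carries a `premiered` field holding "YYYY-MM-DD" or null.

```python
import re
from typing import List, Optional, Tuple

# Matches e.g. "The 100 (2014)" -> title "The 100", year "2014".
YEAR_SUFFIX = re.compile(r"^(?P<title>.*?)\s*\((?P<year>\d{4})\)\s*$")


def split_year(query: str) -> Tuple[str, Optional[int]]:
    """Split a trailing "(YYYY)" off the query, if present."""
    m = YEAR_SUFFIX.match(query)
    if m:
        return m.group("title"), int(m.group("year"))
    return query.strip(), None


def filter_by_year(shows: List[dict], year: Optional[int]) -> List[dict]:
    """Keep shows whose 'premiered' date ("YYYY-MM-DD") starts with the given year."""
    if year is None:
        return shows
    prefix = f"{year}-"
    return [s for s in shows if (s.get("premiered") or "").startswith(prefix)]
```

For example, `split_year("The 100 (2014)")` returns `("The 100", 2014)`: the bare title goes to the search, and the year only narrows the results instead of making the query return nothing.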

Tonks wrote 8 years ago:

When you search for Hawaii Five-0 (with a zero; I'm not sure it displays as one here, but that's the difference from the old show), some people may search for "hawaii 5-0", "hawaï 5-0", or spellings like "hawaï five-o" (with the letter O), depending on their country, I think.

I don't know whether the search engine takes aliases or AKAs (for foreign titles) into account, but this case gives no results at all, not even plain "Hawaii" results if you enter "hawaii 5-0", for example. At least it didn't when I tried. Would you mind adding those? Thank you.
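Accent folding plus an AKA lookup would cover most of these spellings. A minimal sketch, assuming each show record carries a hypothetical `akas` list of alternate titles. Folding turns "hawaï" into "hawai", not "hawaii", so aliases are still needed alongside it.

```python
import unicodedata


def fold(text: str) -> str:
    """Lowercase and strip diacritics: "HAWAÏ 5-0" -> "hawai 5-0"."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch)).casefold()


def find_by_name(query: str, shows: list) -> list:
    """Return names of shows whose title or any AKA equals the query after folding."""
    q = fold(query)
    return [
        s["name"]
        for s in shows
        if any(fold(n) == q for n in [s["name"], *s.get("akas", [])])
    ]
```

With a record like `{"name": "Hawaii Five-0", "akas": ["Hawaii 5-0", "Hawaï 5-0"]}` (hypothetical AKAs taken from the spellings above), `find_by_name("HAWAÏ 5-0", shows)` returns `["Hawaii Five-0"]`, while the letter-O spelling still needs its own alias.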

Tonks wrote 8 years ago:

Jaguar, it depends on the show we are talking about.

The old show with Jack Lord used the letter O; the number zero would be incorrect for it. The new show, however, uses the number zero. I still remember the notice CBS sent out when they launched the show six years ago, because it was a funny little note.

I tried both variants in the search, the zero and the letter O, and I get results for both shows. I also tried 5-0 (zero) and 5-o (letter), and that worked too, which it didn't when I tried yesterday.

I guess this is resolved ;) Thanks.


JuanArango wrote 8 years ago:

Tonks wrote:
When you search for Hawaii Five-0 (with a zero; I'm not sure it displays as one here, but that's the difference from the old show), some people may search for "hawaii 5-0", "hawaï 5-0", or spellings like "hawaï five-o" (with the letter O), depending on their country, I think.
I don't know whether the search engine takes aliases or AKAs (for foreign titles) into account, but this case gives no results at all, not even plain "Hawaii" results if you enter "hawaii 5-0", for example. At least it didn't when I tried. Would you mind adding those? Thank you.

If an AKA is added, the search should find it. I will add those AKAs to Hawaii 5-0 :)

EDIT: just saw Jaguar already did it, thx :)

cheers
Juan


JuanArango wrote 8 years ago:

rastamanx wrote:
I still have kind of an issue with "The 100".
The singlesearch function returns "100 Questions". My guess is that "the" is ignored in the search, which would force us to use the search function and iterate through the results to find an exact match. I'm not sure that's ideal.
Also, adding "(2014)" to the search makes it return nothing at all.
So, here are my three suggestions:
- Do not ignore "the" when using singlesearch
- Take the year specification into account (either match it with a regexp and use it as a search parameter, or add "alternate name" entries to the shows that need it and search within alternate names to find an exact match)
- When no result is found, at least output an error message instead of a blank page

As far as I know, david is optimizing the search function; he might be able to tell you more about it later today. One thing we don't do is write a year along with the title; we just go with the show name :)

cheers
Juan


david wrote 8 years ago:

rastamanx wrote:
I still have kind of an issue with "The 100".
The singlesearch function returns "100 Questions".

Damn, exact matches should always come out on top, but for some reason it sometimes just doesn't work. I'll fix this.

That said, I can't stress enough that, if at all possible, the search API should be used instead of singlesearch, because singlesearch has no way to give you the right result when multiple shows share the same name.
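Following that advice, a client can call the search endpoint and do its own exact-match check, which also reveals when several shows share a name. A sketch, assuming the public endpoint `https://api.tvmaze.com/search/shows?q=...` returns a JSON list of `{"score": ..., "show": {...}}` objects; `search_shows` and `exact_matches` are hypothetical helper names.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.tvmaze.com"  # assumed public API host


def search_shows(query: str) -> list:
    """Fetch /search/shows results (assumed: a list of {"score", "show"} objects)."""
    url = f"{API_BASE}/search/shows?{urllib.parse.urlencode({'q': query})}"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def exact_matches(query: str, results: list) -> list:
    """Return every show whose name equals the query, ignoring case and outer spaces."""
    wanted = query.strip().casefold()
    return [r["show"] for r in results if r["show"]["name"].strip().casefold() == wanted]
```

`exact_matches("The 100", search_shows("The 100"))` can come back with zero, one, or several shows; when there are several, the caller (or the user) has to choose, which is exactly what singlesearch cannot do.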

Gadfly wrote 8 years ago:

One thing might be to list matches in order of date.

I search for The Flash and get the 1990 version first, then the 2007 Flash Gordon, and only third the current CW series.


david wrote 8 years ago:

Thanks for noting "The Flash". I'll make sure that one's fixed too, as in, "The Flash" should definitely not rank below "Flash Gordon".

Ranking on date isn't the answer either, because a large and popular show can then be overtaken by a crappy Australian reality show with the same name.

We're trying to rank based on popularity (which is a combination of visitors and follows), but this is pretty hard as well. It's a thin line between giving preference to good name-based matches, and giving preference to popularity.
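One common way to walk that line is to keep exact name matches as a separate top tier and blend text relevance with a damped popularity signal inside each tier. The weights, the log damping, and the `Candidate` fields below are illustrative assumptions, not TVmaze's formula.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    text_score: float  # relevance score returned by the search engine
    visitors: int
    follows: int


def blended_score(c: Candidate, text_weight: float = 1.0, pop_weight: float = 0.3) -> float:
    """Text relevance plus log-damped popularity (follows weighted 5x, an arbitrary choice)."""
    popularity = math.log1p(c.visitors + 5 * c.follows)
    return text_weight * c.text_score + pop_weight * popularity


def rank_candidates(query: str, candidates: list) -> list:
    """Exact name matches first; within each tier, highest blended score first."""
    q = query.strip().casefold()
    return sorted(
        candidates,
        key=lambda c: (c.name.strip().casefold() != q, -blended_score(c)),
    )
```

Log damping keeps a hugely popular show from burying a better name match, while the tier rule guarantees that an exact title still wins over a popular partial match.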

vBm wrote 8 years ago:

Another example: when you search for "The Player" using /search/shows, the first result is "Player Gets Played" and the show we searched for is second. It seems you're currently sorting by "score": the first show has 5.0954947, while the one I was looking for has 4.417797.

On the other hand, searching via /singlesearch/shows returns "Player Gets Played". It would be ideal if you could make it work as you said, so that exact matches are shown whenever possible.

Rustak wrote 8 years ago:

Another problem with search: it doesn't show every show containing the search word.

A good example: if you search for "Gundam", it shows some of the Gundam shows, but not Mobile Suit Gundam Wing and some others.


david wrote 8 years ago:

vBm wrote:
Another example of it is when you search for "The Player" and as first result you get "Player Gets Played" and show we searched for is second on the list when using /search/shows.

Thanks for the example, will fix that one too.

Exact matches coming out on top is definitely the intention. It's already coded that way, but for reasons still unknown to me, Elasticsearch sometimes refuses to do so (and of course it's barely reproducible).

Gadfly wrote 8 years ago:

Searching for the show Zoo yields some odd results, too. You get Our Zoo, then The Zoo Gang, and only then the just-renewed CBS series Zoo.

I can see why the other two come up, and I'm not sure how you would get the latter to come first. But presumably that's what you would want.

rastamanx wrote 8 years ago:

david wrote:
Thanks for the example, will fix that one too.
Exact matches coming out on top is definitely the intention. It's already coded that way, but for reasons still unknown to me, Elasticsearch sometimes refuses to do so (and of course it's barely reproducible).

I'm pretty sure it's what I said in my original post: "the" is being ignored in the search query.

These might be relevant (or not):

https://www.elastic.co/guide/en/elasticsearch/refe...
https://www.elastic.co/guide/en/elasticsearch/refe..

Sadly, "Zoo" not coming up first is another story.
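The stop-word theory is easy to illustrate: once "the" is dropped at analysis time, "The 100" and "100 Questions" look alike to the matcher. The sketch below uses a made-up stop list and a plain exact-string check standing in for something like an unanalyzed keyword sub-field; it is not Elasticsearch's actual analyzer.

```python
STOP_WORDS = {"the", "a", "an", "of"}  # hypothetical stop list


def analyze(text: str) -> list:
    """Lowercase, split on whitespace, and drop stop words."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]


def score(query: str, title: str) -> tuple:
    """(exact title match, number of shared analyzed words); compared as a tuple."""
    q, t = analyze(query), analyze(title)
    overlap = sum(1 for w in q if w in t)
    exact = query.strip().lower() == title.strip().lower()
    return (exact, overlap)
```

`analyze("The 100")` gives `["100"]`, so word overlap alone scores "The 100" and "100 Questions" the same (one shared word each). The exact flag breaks the tie: `max(["100 Questions", "The 100"], key=lambda t: score("The 100", t))` returns `"The 100"`, and the same check puts "Zoo" above "Our Zoo".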


david wrote 8 years ago:

This is now fixed. (The root cause was the large number of foreign AKAs recently added to shows, which skewed Elasticsearch's tf-idf scoring.) I confirmed that all examples in this thread now work correctly. To summarize:

Exact matches should always come out on top. Please report it if you spot a scenario where this doesn't happen.

If there are multiple exact matches for a show name, their relative ordering is mostly undefined. We're hoping to improve on this in the future (having more popular shows come out on top), so there's no need to report this.
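A rough illustration of how extra AKAs can skew tf-idf scoring, assuming (hypothetically) that AKAs are indexed into the same text field as the main title. The formula is a simplified version of Lucene's classic similarity: per matching term, tf × idf² × length norm, with tf = √freq, idf = 1 + ln(N / (df + 1)) and length norm = 1 / √(field length). Query normalization, coordination, and norm encoding are omitted.

```python
import math


def classic_score(query_terms, field_terms, doc_freq, num_docs):
    """Simplified Lucene-classic-style score: sum of tf * idf^2 * lengthNorm."""
    length_norm = 1.0 / math.sqrt(len(field_terms))
    total = 0.0
    for term in query_terms:
        freq = field_terms.count(term)
        if freq == 0:
            continue
        tf = math.sqrt(freq)
        idf = 1.0 + math.log(num_docs / (doc_freq[term] + 1))
        total += tf * idf * idf * length_norm
    return total


query = ["walking", "dead"]
doc_freq = {"walking": 2, "dead": 2}  # hypothetical index statistics
num_docs = 1000

plain = "the walking dead".split()
spinoff = "fear the walking dead".split()
# Stand-in for a dozen words of foreign AKAs that contain neither query term.
with_akas = plain + ["aka"] * 12

for label, field in [("plain", plain), ("spinoff", spinoff), ("with_akas", with_akas)]:
    print(label, round(classic_score(query, field, doc_freq, num_docs), 2))
```

Every document matches each query term once, so tf and idf are identical and only the length norm differs. On its own the exact title scores highest (norm 1/√3), but once its field grows by twelve words the norm drops to 1/√15, below the spin-off's 1/√4, and "Fear the Walking Dead" wins.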


david wrote 8 years ago:

JAGUARDOG wrote:
Thanks for all your hard work, David; now you can finally go to bed? I do have a question, however: does the search function key off other letters when it brings up results, in addition to looking for exact matches? "Zoo" is a good example: after the first three results, which all have Zoo in them, you get all kinds of crazy responses, and the only thing I see in common is that they contain a double letter O, like Too, Goo, Doo, and Scooby-Doo. And then there's this weird one, "Fashionably Late with Rachel Zoe", which doesn't have a double O at all.

Thanks! Yep, that's the intended behavior. The search function returns similar results as well as exact results, to be able to deal with typos. For example, to be able to give you a result here: http://www.tvmaze.com/search?q=rachel+zoo. If you search for a show name that's a very common word (house, zoo), this means you'll also get some unrelated results, but the exact match should always come out on top.
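Typo tolerance like this is usually edit-distance based. A minimal sketch with plain Levenshtein distance shows why "zoo" pulls in "too", "goo", "doo", and "zoe": each is one edit away. `max_edits_auto` mirrors, as we understand it, Elasticsearch's documented `fuzziness: AUTO` default (exact match for terms of 1 to 2 characters, 1 edit for 3 to 5, 2 edits for longer); the real engine also counts transpositions by default, which this sketch does not.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def max_edits_auto(term: str) -> int:
    """Allowed edits by term length, modeled on Elasticsearch's AUTO fuzziness."""
    if len(term) <= 2:
        return 0
    return 1 if len(term) <= 5 else 2


def fuzzy_terms(query_term: str, vocabulary: list) -> list:
    """Vocabulary words within the allowed edit distance of the query term."""
    limit = max_edits_auto(query_term)
    return [w for w in vocabulary if levenshtein(query_term, w) <= limit]
```

`fuzzy_terms("zoo", ["zoo", "too", "goo", "doo", "zoe", "house"])` returns `["zoo", "too", "goo", "doo", "zoe"]`, which is why a short, common title brings in unrelated shows while the exact match should still rank first.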

vBm wrote 8 years ago:

david wrote:
This is now fixed.

Thank you very much.


david wrote 8 years ago:

Ha, nice find. Noted for the next time I have to work on the search function :)
