
Improving hit-rate on singlesearches

newznab wrote 8 years ago: 1

Firstly thanks for the work on the site and api.

We knocked up a simple test on the api to check out the singlesearch matches based on the file names we are dealing with. Results were as follows...

http://pastebin.com/raw.php?i=v7BraSJP

A 58% match rate based on a smallish sample.

Can you please give some thoughts on improving matching? Is it a matter of tweaks we can make to naming on our side, is it down to the fact that the unmatched shows are yet to make it into your database, or are there improvements that could be made to the api/search on your end?

Thanks in advance


david wrote 8 years ago: 1

Thank you!

I checked out the entries on your list. Most of the non-matches indeed simply don't exist in our database yet. I'd say we have very good coverage of currently running shows, but there are still gaps in older shows. (All shows are added to TVmaze manually to ensure validity and consistency. We're rapidly adding extra tools to help us with the increased work- and request load though).

There's one scenario your end can improve on though. Searching for "Big Brother AU" or "Doctor Who 2005" won't be very successful because we store the actual show names without any prefixes or suffixes. See this recent post for more details: http://www.tvmaze.com/threads/314/shows-using-same...

I'm not sure what your source for these show names is, but you can probably use some simple regexes to detect cases where your name contains a suffix besides the actual show name. For example for Doctor Who, you could query http://api.tvmaze.com/search/shows?q=doctor%20who, determine that your suffix is a year, and choose the entry that premiered in 2005 rather than the one from 1963.
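The approach above can be sketched roughly as follows. This is only an illustration: the suffix patterns, the short list of country codes, and the helper names `split_suffix` and `pick_by_year` are assumptions, and the result layout (a list of entries, each with a `show` object carrying a `premiered` date string) is assumed from what the search endpoint returns.

```python
import re

# Assumed suffix patterns: a trailing four-digit year, or a trailing
# upper-case country code from a small, hypothetical list.
_YEAR = re.compile(r"^(?P<name>.+?)\s+\(?(?P<year>(?:19|20)\d{2})\)?$")
_COUNTRY = re.compile(r"^(?P<name>.+?)\s+\(?(?P<country>AU|CA|NZ|UK|US)\)?$")


def split_suffix(raw_name):
    """Return (name, year, country); year and country are None if absent."""
    raw_name = raw_name.strip()
    match = _YEAR.match(raw_name)
    if match:
        return match.group("name"), int(match.group("year")), None
    # Matching is case-sensitive so that a title ending in "Us" is left alone.
    match = _COUNTRY.match(raw_name)
    if match:
        return match.group("name"), None, match.group("country")
    return raw_name, None, None


def pick_by_year(results, year):
    """Return the first show whose premiere date starts with `year`, else None."""
    for entry in results:
        premiered = entry["show"].get("premiered") or ""
        if premiered.startswith(str(year)):
            return entry["show"]
    return None
```

The idea is to query the search endpoint with the cleaned name only, then pass the decoded JSON list and the detected year to `pick_by_year`. A country suffix would need a similar lookup against whatever country field the show object exposes.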

hecks wrote 8 years ago: 1

BTW, here's a corner case that could be instructive: single-searching "Heroes" (the original 2006 series) returns "Heroes Reborn" (the 2015 reboot):

http://api.tvmaze.com/singlesearch/shows?q=heroes

So how might a single search return the 2006 series instead? TVRage had an &exact=1 parameter for such cases to force matching the exact search string and no more. Otherwise, I would expect that only single-searching "heroes reborn" should return the reboot.
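In the meantime, something like exact=1 could probably be approximated on the client side by running the regular search and keeping only exact title matches. A minimal sketch, assuming the search results are a list of entries that each hold a `show` object with a `name`; `exact_matches` is a hypothetical helper, not part of the API:

```python
def exact_matches(results, query):
    """Keep only the shows whose name equals the query, ignoring case."""
    wanted = query.strip().casefold()
    return [
        entry["show"]
        for entry in results
        if entry["show"]["name"].strip().casefold() == wanted
    ]
```

With results for "heroes" that include both "Heroes" and "Heroes Reborn", this would keep only the former. It can still return more than one show when several share the same name.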

spockers wrote 8 years ago: 1

So, this thread isn't about dating?

Seriously though, I have nothing to add, but that's what I thought of when I saw the title.

Carry on. :D

evol1 wrote 8 years ago: 1

david wrote:
Thank you!

Along these lines, it would be nice to exclude and include certain things in general when using the single search. For example:

1. When searching for Castle, I get Dani's Castle instead. I am sure that the majority of people searching for Castle are not looking for Dani's Castle.

- Castle is a running show and more popular, while Dani's Castle is on hiatus and airs in the UK

- I believe Dani's Castle is showing up only because it is a newer show

2. When searching for Top Gear I get the US version instead of the UK version.

- I am in the US, but I would like the UK version, which is more popular, to show

For both situations it would be great to have the ability to include or exclude the country (US vs UK), show status (running), etc, really any of the categories that show up in the api. A sort order by popularity instead of newest to oldest (which I think is the default) might help too.
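Until the api offers filters like that, the same effect could be had by filtering the regular search results on the client. A rough sketch, assuming each show object carries a `status` string and a `network` object with a nested `country` code (field names and values as I understand them from the api output, so treat them as assumptions); `filter_shows` is a hypothetical helper:

```python
def filter_shows(results, country=None, status=None):
    """Return shows matching the given country code and/or status."""
    kept = []
    for entry in results:
        show = entry["show"]
        # Some shows may have no network at all, so guard against None.
        network = show.get("network") or {}
        code = (network.get("country") or {}).get("code")
        if country is not None and code != country:
            continue
        if status is not None and show.get("status") != status:
            continue
        kept.append(show)
    return kept
```

For the Top Gear case this would be called with the UK country code (which I'd assume is "GB" rather than "UK"), and for Castle with status="Running".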

evol1 wrote 8 years ago: 1

hecks wrote:
BTW, here's a corner case that could be instructive: single-searching "Heroes" (the original 2006 series) returns "Heroes Reborn" (the 2015 reboot):
http://api.tvmaze.com/singlesearch/shows?q=heroes
So how might a single search return the 2006 series instead? TVRage had an &exact=1 parameter for such cases to force matching the exact search string and no more. Otherwise, I would expect that only single-searching "heroes reborn" should return the reboot.

Yes Please, &exact=1 would help out so much. Then it wouldn't find everything that has the name of the show in it, but only the show with that name.

I'm getting this with Castle now, and I remember I ran into this with House versus Desperate Housewives long ago on TVRage. It was pretty annoying when I kept seeing stuff pop up for Desperate Housewives.


david wrote 8 years ago: 1

The bug is now fixed, e.g. searching for Heroes or Castle will return the exact match as top result again. If you spot any more issues in that area, please report them here: http://www.tvmaze.com/threads/221/search-function-...

Feel free to continue the discussion here regarding the behavior if multiple shows with the same name exist. But for the time being we'll continue to strongly recommend using search instead of singlesearch if you want to be able to distinguish different shows with the same name.
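As a rough illustration of using search rather than singlesearch, the sketch below builds the query URL and lists every candidate with its premiere year so the caller can choose between same-named shows. The helper names are hypothetical, and the `id`, `name` and `premiered` fields are assumed from the show objects the search endpoint returns.

```python
import json
import urllib.parse
import urllib.request

SEARCH_URL = "http://api.tvmaze.com/search/shows"


def build_search_url(query):
    """Build the search URL, encoding spaces as %20."""
    params = urllib.parse.urlencode({"q": query}, quote_via=urllib.parse.quote)
    return SEARCH_URL + "?" + params


def fetch_search(query, timeout=10):
    """Call the search endpoint and return the decoded JSON list."""
    with urllib.request.urlopen(build_search_url(query), timeout=timeout) as resp:
        return json.load(resp)


def list_candidates(results):
    """Return (id, name, premiere year) for each result, in result order."""
    rows = []
    for entry in results:
        show = entry["show"]
        year = (show.get("premiered") or "")[:4] or None
        rows.append((show.get("id"), show.get("name"), year))
    return rows
```

Calling `list_candidates(fetch_search("doctor who"))` would give one row per matching show, which is enough to tell two shows with the same name apart by year.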

srob650 wrote 8 years ago: 1

David, I'm wondering about the Empire example as well. Is your general policy to score newer shows higher? If so, the Empire search is broken.

srob650 wrote 8 years ago: 1

FYI I have gone ahead and written logic for cases like this into my Python API.


david wrote 8 years ago: 1

srob650 wrote:
David, I'm wondering about the Empire example as well. Is your general policy to score newer shows higher? If so, the Empire search is broken.

No, "newer" has never been a factor.

Our intention is for more popular (based on recent views; follows; etc) shows to come out on top, if there are multiple matches. This is still a work in progress though.

srob650 wrote 8 years ago: 1

Understood. I'm surprised that the 1962 show comes out on top based on that but I totally get that it's a work in progress :)


david wrote 8 years ago: 1

Yeah, it's proven a little tricky to balance "how accurate is this match" and "how popular is this show".
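To make the trade-off concrete, here is a toy blend of the two signals. It is purely illustrative and not how our search actually ranks anything: it assumes each result has a numeric `score` and each show a 0 to 100 popularity value in a field called `weight`, and both `blended_rank` and the `alpha` knob are made up for the example.

```python
def blended_rank(results, alpha=0.7):
    """Order shows by alpha * relevance + (1 - alpha) * popularity."""
    if not results:
        return []
    # Normalise match scores against the best one; avoid dividing by zero.
    top = max(entry["score"] for entry in results) or 1.0

    def key(entry):
        relevance = entry["score"] / top
        popularity = (entry["show"].get("weight") or 0) / 100.0
        return alpha * relevance + (1 - alpha) * popularity

    ranked = sorted(results, key=key, reverse=True)
    return [entry["show"] for entry in ranked]
```

With alpha close to 1 the best text match always wins; lowering it lets a much more popular show overtake a slightly better text match, which is exactly where the balance gets tricky.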
