PHWinfo banniere

Titres
PORTAIL ANNUAIRE ARTICLES COMPARATEUR HÉBERGEURS DEVIS FORUMS RÉDUCTEUR D'URL
Précédent   PHWinfo > Autres forums > Forum Programmation & Conception > mysql.general > Fast relevance sorting of full text search results
S'inscrire FAQ Membres Recherche Messages du jour Marquer les forums comme lus
Fast relevance sorting of full text search results

Réponse
 
LinkBack Outils de la discussion
Vieux 02/01/2008, 00h09   #1
Urms
Aucun Avatar
 
Messages: n/a
Hébergeur:
Par défaut Fast relevance sorting of full text search results


I'm using pretty standard approach to sorting search results by relevancy:

SELECT DISTINCT product_name,
MATCH (keywords) AGAINST ('CONSOLIDATED* 16* bearing*' IN BOOLEAN MODE) AS
rate
FROM _TT
WHERE MATCH ( keywords ) AGAINST ('CONSOLIDATED* 16* bearing*' IN BOOLEAN
MODE ) >0
ORDER BY rate DESC

It works fine as long as the quantity of results is not big. Once the
quantity is about 50,000 and more (I have a very big database) the query
starts working way too slow. Total number of records is about 4 million. It
takes about 2 sec when there are 50,000 records in the result but at the
same time it takes only about 0.006 sec without ORDER BY clause.

I understand that ORDER BY is time consuming but maybe someone knows a
different way to have sorting by relevancy.

Thanks in advance!
--
View this message in context: http://www.nabble.com/Fast-relevance...p14571054.html
Sent from the MySQL - General mailing list archive at Nabble.com.

  Réponse avec citation
Vieux 06/01/2008, 15h57   #2
Shawn Green
Aucun Avatar
 
Messages: n/a
Hébergeur:
Par défaut Re: Fast relevance sorting of full text search results

Urms wrote:
> I'm using pretty standard approach to sorting search results by relevancy:
>
> SELECT DISTINCT product_name,
> MATCH (keywords) AGAINST ('CONSOLIDATED* 16* bearing*' IN BOOLEAN MODE) AS
> rate
> FROM _TT
> WHERE MATCH ( keywords ) AGAINST ('CONSOLIDATED* 16* bearing*' IN BOOLEAN
> MODE ) >0
> ORDER BY rate DESC
>
> It works fine as long as the quantity of results is not big. Once the
> quantity is about 50,000 and more (I have a very big database) the query
> starts working way too slow. Total number of records is about 4 million. It
> takes about 2 sec when there are 50,000 records in the result but at the
> same time it takes only about 0.006 sec without ORDER BY clause.
>
> I understand that ORDER BY is time consuming but maybe someone knows a
> different way to have sorting by relevancy.
>
> Thanks in advance!


I think it's your SELECT DISTINCT that is slowing you down. For each new
row being considered for inclusion to your result set, you are asking
the engine to compare that row against all other rows you already have
in the set. So what's happening is that you are doing a longer and
longer linear search the larger your datasets become.

One option is to cache your results in a temporary table then
de-duplicate your results from there.

Another option is to create a temporary table with a UNIQUE key on the
columns you want to remain unique and use an INSERT IGNORE. Because of
the UNIQUE key (or PRIMARY KEY if that's your choice) you will be doing
an indexed search of all values rather than a linear search through the
entire list. This would look something like:

CREATE TEMPORARY TABLE tmpFT_results (
product_name <insert datatype here>
, rate <insert datatype here>
, primary key (product_name)
) ENGINE = MEMORY;

INSERT IGNORE tmpFT_results
SELECT product_name,
MATCH (keywords) AGAINST ('CONSOLIDATED* 16* bearing*' IN BOOLEAN MODE) AS
rate
FROM _TT
WHERE MATCH ( keywords ) AGAINST ('CONSOLIDATED* 16* bearing*' IN BOOLEAN
MODE ) >0;

SELECT product_name, rate
FROM tmpFT_results
ORDER BY rate DESC;

Sure it's three separate steps but it's tuned to the process you are
trying to perform. The SELECT DISTINCT processing has no idea that you
only need to keep the values of product_name distinct as we would hope
the `rate` component may be duplicated.

If there is the possibility of different `rate` results for the same
product_name value then you may also want to use the temporary table
method to somehow weight (sum or average comes to mind) the match values
across all responses before returning the results.

I hope these ideas your performance and search accuracy.

Best wishes,
--
Shawn Green, Support Engineer
MySQL Inc., USA, www.mysql.com
Office: Blountville, TN
__ ___ ___ ____ __
/ |/ /_ __/ __/ __ \/ /
/ /|_/ / // /\ \/ /_/ / /__
/_/ /_/\_, /___/\___\_\___/
<___/
Join the Quality Contribution Program Today!
http://dev.mysql.com/qualitycontribution.html

  Réponse avec citation
Vieux 09/01/2008, 09h24   #3
Sebastian Mendel
Aucun Avatar
 
Messages: n/a
Hébergeur:
Par défaut Re: Fast relevance sorting of full text search results

Urms schrieb:
> I'm using pretty standard approach to sorting search results by relevancy:
>
> SELECT DISTINCT product_name,
> MATCH (keywords) AGAINST ('CONSOLIDATED* 16* bearing*' IN BOOLEAN MODE) AS
> rate
> FROM _TT
> WHERE MATCH ( keywords ) AGAINST ('CONSOLIDATED* 16* bearing*' IN BOOLEAN
> MODE ) >0
> ORDER BY rate DESC
>
> It works fine as long as the quantity of results is not big. Once the
> quantity is about 50,000 and more (I have a very big database) the query
> starts working way too slow. Total number of records is about 4 million. It
> takes about 2 sec when there are 50,000 records in the result but at the
> same time it takes only about 0.006 sec without ORDER BY clause.


you should reformat your query or table structure

for a quick solution:

probably with 50.000 records it exceeds your myisam_sort_buffer_size or
sort_buffer_size, try to raise them

--
Sebastian

  Réponse avec citation
Réponse


Outils de la discussion

Règles de messages
Vous ne pouvez pas créer de nouvelles discussions
Vous ne pouvez pas envoyer des réponses
Vous ne pouvez pas envoyer des pièces jointes
Vous ne pouvez pas modifier vos messages

Les balises BB sont activées : oui
Les smileys sont activés : oui
La balise [IMG] est activée : oui
Le code HTML peut être employé : non
Trackbacks are oui
Pingbacks are oui
Refbacks are oui


Fuseau horaire GMT +1. Il est actuellement 07h19.


Édité par : vBulletin® version 3.7.3
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
Search Engine Friendly URLs by vBSEO 3.2.0 RC5 Tous droits réservés.
Version française #16 par l'association vBulletin francophone
PHWinfo est un site Éducation Sans Frontières ©2000-2008
Ad Management by RedTyger
©Tous droits réservés par les parties respectives
Page generated in 0,14875 seconds with 11 queries