(Courtesy of fravia's searchlore.org)
(¯`·.¸(¯`·.¸ Using Fuzzy Logic ¸.·´¯)¸.·´¯)
by Shally Steckerl
slightly edited by fravia+, published @ searchlores
in March 2002
Shally Steckerl is a 'head-hunter', yet -this notwithstanding- a very nice person,
who has been "in the seeking trade" for many years
(by all means read
his well-known essay What
We Can Learn from Internet Email Headers). He emailed me recently:
"...I wrote a few more articles and I thought
this would be a good time to send you the best ones. The Fuzzy Logic
article is of particular
interest as it refers to using proximity search terms.
Please let me know, if you are still interested and still remember,
what it was you wanted me to do to publish these articles on your site". Well,
indeed I would love to receive a thorough essay
explaining his very sound 'pendulum' approach (see below), and
I still hope he'll send it over! In the meantime enjoy the many hints and try out
-you will be surprised- his 'AOL' searching
tips!
Using Fuzzy Logic
Everyone is right about using the NEAR fuzzy logic command in
AltaVista. I love AltaVista, but it's not the best at handling this
kind of fuzzy logic. There are others out there that handle it better.
I'll review AltaVista first since everyone is familiar with it,
but stay tuned until the later part of this article to learn how
to search the others.
NEAR searching is very useful for opening up a narrow search to
include other possible combinations of a set of words. AltaVista
considers near to be within 10 words to the left or right of the
first term. Like this:
Nurse NEAR licensed
That will return pages containing the term Nurse where it appears
within ten words of licensed. This way you catch all the types of
licensed nurses, like Licensed Vocational Nurse and Licensed
Practical Nurse. But you also catch the other ones, like
"Registered nurse in emergency room. Provided and supervised
licensed...", where Nurse is 7 words away from Licensed.
In contrast you don't find the "Licensed Driver" who was a "Sketch
Nurse" in a play in Wisconsin (read her resume at
http://suzanneadams.com/resume.htm). To further illustrate, in
AltaVista a search for "nurse NEAR licensed AND title:resume"
returned 63 documents, while "nurse AND licensed AND title:resume"
returned 103.
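For readers who like to see the logic spelled out, here is a minimal Python sketch of what a NEAR test amounts to, assuming the ten-word window described above. The function names and the simple tokenizer are my own illustration, not AltaVista's actual implementation.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def near(text, first, second, window=10):
    """True if `second` occurs within `window` words of `first`, on either side."""
    words = tokenize(text)
    first_pos = [i for i, w in enumerate(words) if w == first.lower()]
    second_pos = [i for i, w in enumerate(words) if w == second.lower()]
    return any(0 < abs(i - j) <= window for i in first_pos for j in second_pos)

snippet = "Registered nurse in emergency room. Provided and supervised licensed staff."
print(near(snippet, "nurse", "licensed"))            # True: seven words apart
print(near(snippet, "nurse", "licensed", window=3))  # False: too far for a 3-word window
```

A plain AND would accept any page that contains both words anywhere, which is why it returns more documents than NEAR.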
But the fun doesn't end here. Broaden your horizon a little and
use two other extremely powerful search engines. One very old, and
one very new. I am talking about AOL and Vivisimo.
http://search.aol.com
AOL has the little-known ability to search not only with the Boolean
NEAR, which I have used for many years, but also with the search
commands ADJ and W/n.
"What is that?" you ask?
ADJ means directly adjacent: with it you find documents in which
the term on the left of the operator sits directly next to the term
on the right. ADJ is different from "double quotes" for three
reasons. First, ADJ in AOL Search automatically allows for root
word variants or truncation, as in program, programming, and
programmer. Second, ADJ can connect complex expressions. For
example: (engineer or developer or architect) adj software finds
items containing either software engineer, software developer or
software architect. Finally, unlike "quoted phrases", your words
can be on either side of each other, not necessarily in order. To
find both versions of database next to design in another search
engine, you would have to use ("design database" OR "database
design").
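The three properties of ADJ can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the prefix test is a crude stand-in for root-word variants, the lists stand in for the OR groups, and none of it reflects how AOL Search actually works internally.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def matches(word, term):
    """Crude stand-in for root-word variants: the word starts with the term."""
    return word.startswith(term.lower())

def adj(text, left_terms, right_terms):
    """True if any left term sits directly next to any right term, in either order."""
    words = tokenize(text)
    for a, b in zip(words, words[1:]):          # every pair of neighbouring words
        for left in left_terms:
            for right in right_terms:
                if (matches(a, left) and matches(b, right)) or \
                   (matches(a, right) and matches(b, left)):
                    return True
    return False

roles = ["engineer", "developer", "architect"]
print(adj("Senior software architect wanted", roles, ["software"]))  # True
print(adj("database design", ["design"], ["database"]))              # True
print(adj("design of a database", ["design"], ["database"]))         # False
```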
W/n is a proximity operator that gives you the power to manually
set how close you want things to be. It will find documents in which
the word on the right of the operator occurs within a specified
number of words after the word on the left. Use any number for "n".
Example: optical W/5 engineer finds documents in which engineer
occurs within five words after, that is to the right of, optical - as
in optical systems engineer, optical board level design engineer,
optical long-haul systems engineer, etc. It will look only for words
in order: "optical" first, then any other words numbering up to
five, and finally "engineer", but not the other way around.
http://www.vivisimo.com
An automated, hierarchical, conceptual, just-in-time clustering
engine, Vivisimo is much more than meta-search. There are many
reasons, but the most relevant for this article is its ability to
offer total control. You can search with the most advanced
traditional commands like image:, title:, url:, link:, linktext:,
host:, site:, domain:, related:, and text:, in addition to every
form of Boolean both traditional and Fuzzy like AND, +, OR, |, AND
NOT, -, NEAR and ~.
Since this is not a search engine of its own but rather uses
results from Yahoo, MSN, Fast, Netscape, Open Directory, Direct
Hit, Looksmart, AskJeeves, Lycos, AOL and HotBot, use the advanced
commands as you would with any or all of those. Notice the absence
of Google and AltaVista? Also, be aware that Near is only used by
AOL and Lycos, and that on Lycos Near means within 25 words.
Vivisimo should handle command translation for you, so if you use
"host:" it should translate that to "url.host:" for Fast and
"domain:" for HotBot. If you want clarification on who uses which
commands and how, refer to Danny Sullivan's easy reference chart at
http://www.searchenginewatch.com/facts/ataglance.html
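To make the idea of command translation concrete, here is a small Python sketch. The table holds only the two mappings mentioned above (host: becomes url.host: for Fast and domain: for HotBot); the engine keys, the function name and the pass-through behaviour for everything else are my assumptions, not a description of Vivisimo's code.

```python
# Only the two translations named in the text; anything else passes through.
TRANSLATIONS = {
    "fast": {"host:": "url.host:"},
    "hotbot": {"host:": "domain:"},
}

def translate(query, engine):
    """Rewrite field prefixes in `query` for `engine`; unknown ones are left alone."""
    table = TRANSLATIONS.get(engine.lower(), {})
    out = []
    for token in query.split():
        for generic, specific in table.items():
            if token.startswith(generic):
                token = specific + token[len(generic):]
                break
        out.append(token)
    return " ".join(out)

print(translate("host:example.com resume", "Fast"))    # url.host:example.com resume
print(translate("host:example.com resume", "HotBot"))  # domain:example.com resume
print(translate("host:example.com resume", "Lycos"))   # host:example.com resume
```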
Shally Steckerl's PENDULUM approach
An appetizer, awaiting a more complete essay...
- Start with four to eight keywords. More confuse the search engine and may return too few results; fewer may return far too many.
- If your keywords are too general your results will be inaccurate.
If they are too specific, your results will be too sparse. Both approaches
mean wasted time.
- The best approach is one delivering 40 to 120 results.
Vary your choice of keywords until you get the best possible combination.
- Then you can limit results geographically.
And some 'simplicity' rules:
- Keep the search simple and save time.
- Complicated search strings confuse the search engine and waste time.
- The search should be limited to fewer than three ANDs.
- Start with the most important keyword, separate them with AND, then add the ORs. Then swing the pendulum by changing each term one at a time starting from the right.
- Terms should be ranked in order of importance: left to right.
To adjust, modify the second term first, then the third and finally add one more if needed.
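The swing of the pendulum can be sketched as a small loop in Python. This is my own reading of the rules above, not Shally's method as he would write it: terms are ranked left to right, the first term is never touched, one term is varied at a time starting from the right, and the search stops when the hit count lands between 40 and 120. The count_results function is supplied by the caller, and the counts in the example are made up purely to exercise the loop.

```python
def swing(terms, substitutes, count_results, low=40, high=120):
    """Vary one term at a time, rightmost first, until the hit count is in range.

    terms: keywords ranked by importance, most important first.
    substitutes: dict mapping a position to alternative keywords for that slot.
    count_results: caller-supplied function returning the hit count for a query.
    Returns the first query whose count falls in [low, high], or None.
    """
    def query(ts):
        return " AND ".join(ts)

    if low <= count_results(query(terms)) <= high:
        return query(terms)
    for pos in range(len(terms) - 1, 0, -1):      # rightmost term first, never position 0
        for alt in substitutes.get(pos, []):
            trial = terms[:pos] + [alt] + terms[pos + 1:]
            if low <= count_results(query(trial)) <= high:
                return query(trial)
    return None

# Invented hit counts standing in for a real search engine.
fake_counts = {
    "nurse AND licensed AND resume": 900,   # too many
    "nurse AND licensed AND vitae": 15,     # too few
    "nurse AND practical AND resume": 85,   # in range
}
best = swing(["nurse", "licensed", "resume"],
             {2: ["vitae"], 1: ["practical"]},
             lambda q: fake_counts.get(q, 0))
print(best)  # nurse AND practical AND resume
```

Three keywords joined this way use two ANDs, which stays inside the "fewer than three" rule.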
(c) 1952-2032: [fravia+], all rights
reserved