Copernic 2001 Pro (Version 5.0)
Light Version from: http://wwww.copernic.com/
[Use it to find its bigger brother ;)]
W32Dasm 8.93 - Recommended
HexWorkshop - Essential Tool
Filemon - Essential Tool
C Compiler - Language for Tool Writing
I have been on a quest to find the query URLs and the structure of queries as part
of my search for data for my local search bot. My last essay was finished
and the target's data had been extracted. With a fresh set of data in my hands,
I sat down and started writing a converter to put the data into a common file format.
This is where this essay begins. I had decided on a basic subset of the data
to use, but thought I should check it against other sources (in other bots).
First on the pile was webferret, a search-bot about which
Laurent has written an essay.
As is my usual habit, I did not let the software within wire distance of the
internet, so it did not get the updates, and the dataset provided as standard is
pretty poor - so I threw it in the bin.
Laurent had mentioned to me that I might find copernic interesting. Umm...
could this be a good target? I had heard of it, but had until recently steered
clear of all these search-bot programs. This was because I know you do not get anything
for nothing, and the thing that makes them money is knowing your searches, and
being able to make you sit through advert after advert after advert...
So off to the web, do a search for copernic and read some reviews. Seems like
another of these local search bots, where the main advantage is it knowing how
to talk to the search engines and co-ordinate the replies and present them to
the user in a nice simple way. This sounded interesting and it seemed to support
a large number of search engines but no specific numbers were given. I went to
some lengths to avoid visiting any of the copernic sites, for reasons, which will
become apparent later.
So the target was picked; the next step was to go and find it on the web.
So off to the web, where I grabbed the Pro version. I did not even go near their
site, so if they are busy checking logs they will not find me ;)
The Pro version came with a key - nice!
Out came the clean PC. This machine was not connected to any network or the internet,
after all we did not want any uncontrolled data to go out ;). Filemon was started
and left running and then copernic was installed on the pc. After the installation
the program was not run, and the installation process finished. The filemon log
of installation was then saved for later reference. So now to clear the Filemon log
and leave it running, to log files accessed by program.
Next step is to run the program and set it to point to the local proxy. Right - the first
thing it does is ask you for some registration details. When all the data has been entered
and the proxy set up, it tries to connect to get an update. [This is very optimistic of
the company - that all people who install and run it for the first time will be connected
to the internet]
Right, so look at the logs on the proxy and there are a number of requests to "updates.copernic.com".
Now let's try a search for 'searchlores'. At this point I know it is not going to get
any results, as the proxy does not connect to the internet and just returns 404 for every
request, as though routing was broken. So I did the search. Looking at the proxy logs, in
amongst the requests for search engine pages there is one that stands out, to
"regcards.copernic.com".
Now follows an explanation of these requests, as they are quite interesting. They go
to the copernic.com domain so they must contain some user data or be used to track
users of this program in some way.
Firstly let's look at the update requests:
HEAD http://updates.copernic.com/copernic2001upd/copernic2001plus.cui HTTP/1.1
This is the request sent:
HEAD http://updates.copernic.com/copernic2001upd/copernic2001plus.cui HTTP/1.1
Host: updates.copernic.com
Accept: */*
Connection: close
User-Agent: Copernic
Pragma: no-cache

Second it does a:
GET http://updates.copernic.com/copernic2001upd/copernic2001plus.cui HTTP/1.1
GET http://updates.copernic.com/copernic2001upd/copernic2001plus.cui HTTP/1.1
Host: updates.copernic.com
Accept: */*
Connection: close
User-Agent: Copernic
Pragma: no-cache

Why do a HEAD, if when it fails you go on to do the GET anyway? Why not simply do a GET? This seems very pointless ;)
GET http://www.copernic.com/cgi-bin/nph-osnvs2.pl?ns=##########################&iu=%7B********-****-****-****-************%7D&lo=http://updates.copernic.com/copernic2001upd/copernic2001plus.cui&cl=0 HTTP/1.1
Host: www.copernic.com
Accept: */*
Connection: close
User-Agent: Copernic
Pragma: no-cache

The field marked with '*'s will be explained in the next request, as it is a common parameter which is passed in both requests. The field marked with '#'s also seems to be a number of some form to be sent to their server.
Now let's look at the regcard information:
POST http://regcards.copernic.com/cgi-bin/regcard HTTP/1.1
This is the request sent:
POST http://regcards.copernic.com/cgi-bin/regcard HTTP/1.1
Host: regcards.copernic.com
Accept: */*
Connection: close
User-Agent: Copernic
Content-Type: application/x-www-form-urlencoded
Content-Length: 129

%5Ejohndoe%40mort.somewhere%5EUnited%20States%5E12345%5E0%5E0%5EENGPRO%5E5001%5E%********-****-****-****-************%7D%5EFrom%20web%20site%5E%5E0%5EJohn%20Doe
Plain text of last line:
^johndoe@mort.somewhere^United States^12345^0^0^ENGPRO^5001^{********-****-****-****-************}^From web site^^0^John Doe
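The plain text of that last line is simply the POST body with the URL escapes undone (%5E is '^', %40 is '@', %20 is a space), and the fields are separated by '^'. Below is a minimal sketch of how this can be done in C. The function names `url_decode` and `split_fields` are my own inventions for illustration and have nothing to do with the program's internals. Empty fields are kept on purpose, as the body contains at least one ('^^').

```c
#include <stddef.h>
#include <string.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_value(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Undo %XX escapes in place. Malformed escapes are copied unchanged.
   Returns the length of the decoded string. */
size_t url_decode(char *s)
{
    char *out = s;
    const char *in = s;

    while (*in) {
        if (in[0] == '%' &&
            hex_value((unsigned char)in[1]) >= 0 &&
            hex_value((unsigned char)in[2]) >= 0) {
            *out++ = (char)(hex_value((unsigned char)in[1]) * 16 +
                            hex_value((unsigned char)in[2]));
            in += 3;
        } else {
            *out++ = *in++;
        }
    }
    *out = '\0';
    return (size_t)(out - s);
}

/* Split s in place on sep, keeping empty fields. Stores up to
   max_fields pointers in fields[]; if there are more fields than
   that, the last stored one keeps the unsplit remainder.
   Returns the number of fields stored. */
size_t split_fields(char *s, char sep, char **fields, size_t max_fields)
{
    size_t n = 0;

    while (n < max_fields) {
        fields[n++] = s;
        s = strchr(s, sep);
        if (s == NULL)
            break;
        *s++ = '\0';
    }
    return n;
}
```

Because the body starts with '^', the first field returned is an empty one, and the email address lands in the second slot.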
| Value | Description |
| johndoe@mort.somewhere | Email Address |
| United States | Country |
| 12345 | Zip Code |
| 0 | Unknown |
| 0 | Unknown |
| ENGPRO | Version of Software |
| 5001 | Registration Card Version |
| {********-****-****-****-************} | GUID |
| From web site | Referrer for Product |
| (empty) | Unknown |
| 0 | Unknown |
| John Doe | Username |
The strings involved are:
"http://regcards.copernic.com/cgi-bin/regcard"
"http://updates.copernic.com/copernic2001upd/"
"http://www.copernic.com/cgi-bin/nph-osnvs2.pl"
"www.copernic.com"
The first ones can be nullified by writing "http://127.0.0.1/" at the start of the strings. This will then prevent all accesses to their servers. This is a good alternative to the hosts file, as the program seems to bypass the hosts file if using a proxy and just sends the requests straight to the proxy.
So next step is to close the program, save the filemon log and have a look around my system.
I had a browse through the install filemon log file and made a note of the location of files
added to my system. The first thing that hit me was a load of '.csf' files which had
the names of search engines, and a list of '.ssf' files which seemed to represent
categories.
The next thing is to look at the run filemon log, it seems to read the .ssf and .csf files
and then create a set of files, under the directory 'data' which seems to be a user profile
with the users name as the folder name. Ummm, so some kind of translation or copying going
on, but a lot fewer files get written than read.
So I opened up the main executable in our favourite hex viewer for a quick browse, but
first extracted all the strings from the file. I had a browse through the strings and it
looks like it was coded in Delphi. This was just a hunch, and I remembered having a copy of
DFM-Explorer around, so I tried it on the file and sure enough out came all the resources,
so it is for sure Delphi. So the task is now to find a Delphi decompiler. My thinking here
was that even though it might not be needed, if it is then it might make the program code
a bit easier to understand. Also it is better to check this option at the start rather than
later. As a teacher once told me, "Always get all your tools ready before starting any task!"
The catch is : this is a delphi application, warning bloatware imminent. I had thought that
the executable was a bit on the large side for something so seemingly simple, and this explained
it. No extra DLL's or files, so the delphi libs must be statically linked. I remember when
applications used to fit on a floppy, now the icon files will not ;(.
First step is to grab ye ole web browser and search for a Delphi decompiler (I must admit,
to my shame, that I had never used one before). The one that pops up the most in the list when
ranked is 'DeDe' by DaFixer!. OK, so let's grab it and let it rip.
A few sips of my drink later and it has finished downloading, so lets run it and see what
it comes up with. DeDe recognises the file and does its stuff, and yes it is delphi because
I now have the forms and pascal code nicely disassembled on my HD. So a quick browse through
them to get an idea of the structure. umm
I noticed that DeDe also supports exporting all its references to a W32dasm project. Since
one of the steps I was going to do was to disassemble the file, I ran Wdasm and generated
a project file, then pointed DeDe to it and let it do its stuff. Hopefully when it finishes
it will leave a nice big file with the combined references, so that should make life easier
later on. Being able to see the references to the Pascal and Delphi bits should make the code
a bit easier to follow.
While that was running (it takes some time) my next step was to search all the .pas files
for references to 'ssf' and 'csf' to find where it loaded the data files, I did not find
any references of these strings in any of the .pas files. Ok time to load up the W32Dasm
project and have a look in that file. OK PROBLEM! - the project is still being accessed
during the combining of references, so that option is out for an hour or so, as it seems
to take quite some time (35Mb File to process).
So lets have a look around, there are some DLL's in the directory, so lets check them out:
c4dll.dll is Database Engine Library (Sequiter CodeBase Components for Delphi)
xcdunz32.dll is a Zip Library [Xceed Zip Compression Library]
SSCE5253.dll is the Sentry Spelling-Checker Engine [Wintertree Software]
Zip Library - is this just there for the installation or unpacking updates, or might it
be used on the data files? Time to check, if the data files are zipped then they should
be fairly easy to unpack. That would make life very easy ;)
So lets look at the files that were generated when the program was run, the files in
what looked like a profile directory.
channel.ctb seems the most likely candidate, and matches (by some coincidence) roughly
the size of all the .ssf and .csf files. (1,158,690 bytes)
All .ssf - category files (73,718 bytes). All .csf - engine files (1,131,657 bytes)
This seems a strange coincidence, as opening up this file shows it does have the engine
names and the category names (from filenames) but also contains a LOT of space characters.
Given this is in a directory named after the user, this should be the user's preferences
for searches or something similar.
Back to the data files, as the only files looking good candidates are the '*.*sf' files
which fit the bill perfectly. So opened one up in notepad and it looks unreadable.
So right, copied three .ssf and three .csf files of different sizes to a temporary
directory to start looking at them. Opened the first one in a hex viewer and noticed
that it is not plain text, ok so it was expected they would be packed or encrypted
in some way, they would not leave their whole product out in the open. But one thing
that did jump out was the pattern of the characters.
Here is an excerpt from one of the files: (Boxes are unprintable characters)
Sssx?y[SSsS3SrQSSSSSSSSsx;SSss=
SS'3rrrQPSsS3SrQsS3rpSSssx;[yzys3|
xySSss\_yX[yyx;yxSSss?[[ۜ
SSss=yzX|SSss;x3SSSSSssx
;xSSss}۸[ySSss=Xy?X|S
Sss=Xy?3X|SSs"SSs|xSSss?9X
[;y9Xs3xxy99xx{zyٛ;99xSSss;
;__ysSSSss;}xyӐSQP9[yx2Q?|Q
ӐSs2Qxٸ;QrRpSSss;Pyp?yy8ظ98{
'SSss;Pypy8ҙQy90=XP3p0}
xy0=XP3p0Q;x0=XP3p0s0=XP3p0'SSss;P
};;p
Notice the repeated 'SS','SSs' and 'SSss' sequences. Instinct at this point says
that this is not a packed file as these repeats would have been eliminated
by the compression process. There are other repeated sequences present in
the encoded text.
This is the header common to the 1K category files: Auctions and Buyhardware
9D9D5373F41473F414DF78F8F93FDB79
F85BF213535373F05333F073F3515353
125353125353F414535373F31FF9B978
3BDBF91BF41453537373BE3DB8989BF2
11D3535313F0923372727251505373F0
5333F073
. . . . (more data)
F414
This is also the same in Buysoftware which is a 2k file, apart from one byte
9D9D5373F41473F414DF78F8F93FDB79
F85BF213535373F05333F073 72 [changed F3 to 72] 515353
125353125353F414535373F31FF9B978
3BDBF91BF41453537373BE3DB8989BF2
11D3535313F0923372727251505373F0
5333F073
. . . . (more data)
F414
This seems the only difference but is not the same in all 2k files...
in the copernic.csf file it is:
9D9D5373F41473F414DF78F8F93FDB79
F85BF213535373F05333F0 53 [changed 73 to 53] 72 [changed F3 to 72] 515353
125353125353F414535373F31FF9B978
3BDBF91BF41453537373BE3D
. . . . (more data)
F414
different after this..
So this looks like they are all encoded with the same method, and this is some kind of
common header to the files.. Also all files seem to end with 'F414'
This does not look like an xor'd pkzip, as the header is wrong. If this was a zip
file with a zip header, you would expect more bytes to be different; if this was a
zip file with the header removed, then the data would not show the same repetitive
patterns at such regular intervals. This led me towards thinking they were just
encrypted in some way. This was backed up by the observation that they come in all sizes
from 926 bytes to 3,000 bytes (in all steps), so they are not a fixed structure.
They do have a header and a footer which seem to be common; this could just be some
text at the start of the file, or could designate something else. To me it looks like
a constant bit at the start of the decoded file rather than a packed header,
or else more of it would change. So it looks like they are just mildly
encrypted and not packed, hopefully anyway ;)
The 'F414' sequence bothered me as soon as I saw it, the spacing throughout the file
and also the positioning of it, together with the fact that it appeared in the header
made me think that this could be '0d0a' or a newline in a text file. This fits with the
decoded file being plain text. So made a little tool which copied the file and just
changed those bytes over - the result was a file with what looked like reasonable line
lengths for a text configuration file. So I was on the right track, or so it seemed.
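A little tool of this kind needs only a few lines of C. The sketch below is not the original tool, just a minimal reconstruction of the idea: it works on a buffer already read from the file, and the name `swap_newlines` is made up for this example.

```c
#include <stddef.h>

/* Replace every 0xF4 0x14 pair in buf with 0x0D 0x0A (CR LF), in place.
   All other bytes are left untouched.
   Returns the number of pairs that were replaced. */
size_t swap_newlines(unsigned char *buf, size_t len)
{
    size_t count = 0;

    for (size_t i = 0; i + 1 < len; i++) {
        if (buf[i] == 0xF4 && buf[i + 1] == 0x14) {
            buf[i] = 0x0D;
            buf[i + 1] = 0x0A;
            count++;
            i++;            /* skip the second byte of the pair */
        }
    }
    return count;
}
```

Read the whole file into memory with fread(), call the function, and write the buffer back out with fwrite(). The files here are only a few KB, so there is no need for anything cleverer. The return value is a rough count of the lines in the file.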
Here is a snippet of the above file: (with line splits inserted)
Ss
s
x?y[SSsS3SrQSSSSSS
SSsx;
SSss=SS'3rrrQPSsS3SrQsS3rp
SSssx;[yzys3|xy
SSss\_yX[yyx;yx
SSss?[[ۜ
SSss=yzX|
SSss;x3SSS
SSssx;x
SSss}۸[y
SSss=Xy?X|
SSss=Xy?3X|
SSss;Pypy8ҙQy90=XP3p0}xy0=XP3p0Q;x0=XP3p0s0=XP3p0'
SSss;P};;p
This seems to fit the structure of a configuration file, short line lengths. Later in the
file are longer lines, about the size of a query URL, so this seems right ;) There is also
a pattern to the characters at the start of the line, and notable is that the repeated 'SS'
combination appears at the end of strings - this means (hopefully) that it is not a
position dependent (or offset) substitution.
After a bit of thinking I was convinced that these files are protected by a substitution
cipher, and more looking at the file content seemed to back this up as there are many
repeating patterns, as you would expect to see in a file with URL's inside it. So the
target was to find the translation function or table. I by this time had discounted a
packed format and had also discarded a binary file, it is a plain text file - this may
seem like a jump but if you had been sitting on my shoulder you would have seen it
the same way.
So there are two methods they could use to achieve this, the first would be to use a
lookup table to do the translation and the second would be to use a function to do the
same thing. In order to confirm some options, another look at the running program was
required, when viewed it seemed they did include all lower and uppercase chars and also
European characters - this was important as it means they have to use all 8 bits of the
character and cannot throw any away in the function, whereas if they had not included
any European characters they might be able to throw a bit away somewhere in the function
and this could affect the findings dramatically. It was also obvious that they used
normal ASCII characters as the patterns would have been different if they had used
some form of unicode or multi-byte character set. This gives us more ammunition
for the coming hunt.
One thing I must add at this point is that there are many known attacks on substitution
ciphers. These assume a language and work from character occurrence probability tables.
They are very effective, but were discarded for this target as the contents of the
configuration file were known not to match normal text: it would (presumably) be using
repeated keywords and values which would be meta tags and/or URLs. This meant that they
might give some results but probably would not. So I discounted them to save time!
Getting Hands Dirty
DeDe has now finished, so we can start looking at the assembler for the file. First task
is to hunt down the references to any .ssf or .csf files. When looking through the file you
will find a few references to this string. These were used as a starting point and breakpoints
were set on them.
I shall take a wander here - bear with me! When I started looking at DeDe, I was intending
to work from the disassembled files and track through the code in order to find the
decryption routine which would restore the files to plaintext. Now my priorities had
changed somewhat, what I was now after was a portion of the plaintext file and hopefully
all of one of the files in memory so that it could be saved. The fact that the cipher
seemed to be a substitution one from the data shown above means that although to find
the decryption routine would be nice, to find a portion of the plaintext would be just
as nice in helping find the result. If they have used a table then hopefully once we
have a portion of the plaintext and what it maps to in the encrypted file, finding the
table in memory would be very easy. This seems a nicer and quicker approach than
reading through page after page of disassembled code trying to put it together. This
point is made stronger by the fact that the app is in Delphi, so a simple instruction
could quite easily call many functions all over the place.
So, resisting the urge to go through the code and reassemble what happens (which
is very hard), I start the code running in W32Dasm with breakpoints set on every
instance of a string that ends in '.ssf' or '.csf'. It soon breaks on one of them.
At this point I set auto-api stop, and show parameters for local and system calls
and set it running again. What I am hoping for is one of the calls to have a
pointer to the plaintext in the call to it.
Here is the bit of code that loads 'Copernic.csf', which is thought to be the
master configuration file.
* Possible StringData Ref from Code Obj ->"Copernic.csf"
|
:52A00A BAB8A75200 mov edx, 52A7B8
:52A00F E8FCA0EDFF call 404110
:52A014 8B55E0 mov edx, dword ptr [ebp-20]
:52A017 8B45FC mov eax, dword ptr [ebp-04]
:52A01A 8B4020 mov eax, dword ptr [eax+20]
:52A01D 8B08 mov ecx, dword ptr [eax]
:52A01F FF5158 call [ecx+58]
:52A022 8B45FC mov eax, dword ptr [ebp-04]
:52A025 8B4020 mov eax, dword ptr [eax+20]
// This following call seems to handle the
// file and contains a call which exposes the
// plaintext
:52A028 E8970AFAFF call 4CAAC4 // HANDLEFILE
:52A02D 85C0 test eax, eax
:52A02F 7425 je 52A056
:52A031 6A00 push 0
:52A033 6A00 push 0
:52A035 A1C4255B00 mov eax, dword ptr [5B25C4]
:52A03A 8B00 mov eax, dword ptr [eax]
:52A03C 8B4050 mov eax, dword ptr [eax+50]
:52A03F BA02000000 mov edx, 2
The code below is the start of the HANDLEFILE routine:
* Referenced by a CALL at Addresses:
|:4EB84D, :52A028, :599F7B, :59A81A
:4CAAC4 55 push ebp
.
... next part is further down the function.
.
:4CAAFA 8D55E8 lea edx, dword ptr [ebp-18]
:4CAAFD 8B45FC mov eax, dword ptr [ebp-04]
:4CAB00 8B08 mov ecx, dword ptr [eax]
:4CAB02 FF511C call [ecx+1C]
:4CAB05 8B45E8 mov eax, dword ptr [ebp-18]
:4CAB08 BA01000000 mov edx, 1
// This function has the plain text for the
// line from the file passed into and out of
// it, so the decoding must happen before this!!!
:4CAB0D E892EDFFFF call 4C98A4
// [ebp-10] points to the start of text, both into
// and out of this function
So we have found a function that is called with one of the parameters as the plaintext
for the file currently being handled. This is what we were after, so remove all other
breakpoints and set a new breakpoint on 0x004CAB0D and make sure we tick the display
parameters to local calls in W32Dasm. Right now every time we hit this function filemon
tells us which file we are reading and the parameter display gives us the location of
the string.
After placing the breakpoint I grabbed a string of plaintext.
The start of the plaintext is: "FF01" followed by a newline - 0x46 0x46 0x30 0x31 0x0d 0x0a
While looking at this, I noticed a bit of code further down the disassembly
listing, which jumped out at me as some possible plaintext.
This is the code that seems to handle parsing the configuration files:
* Possible StringData Ref from Code Obj ->"DisplayName"
:599FA0 BA14A65900 mov edx, 59A614
:599FA5 8B45E4 mov eax, dword ptr [ebp-1C]
:599FA8 E8AB4DF2FF call 4BED58
:599FAD 8D45C4 lea eax, dword ptr [ebp-3C]
:599FB0 33D2 xor edx, edx
:599FB2 E8B5B6E6FF call 40566C
:599FB7 8D4DC4 lea ecx, dword ptr [ebp-3C]
this code is repeated with the following string references:
* Possible StringData Ref from Code Obj ->"Description"
* Possible StringData Ref from Code Obj ->"HomePage"
So this bit of code is parsing a file of some kind looking for the identifiers
given in the string references, and so that means our file MUST contain some
of the above strings, as they do not seem to be used in any other files.
Decoding files
So now we have a portion of the plaintext written down (or in a file)
and this looks very good, and seems to confirm a lot of things. The string
pointed to is shown below, and when looking for the first time you should
also refer back to the previous text and see what bells ring ;)
A portion of the plaintext:
FF01
0015Register
0011_Conv="4002->3999 (01-03-09, 10:37:59)"
0011DisplayName="123India"
0011HomePage="http://www.altavista.in/"
The order is slightly changed from the order in the file (only a couple of
entries swapped) but note the line lengths as these are a giveaway. So we now
know for sure that we are on the right track - GOOD! Now you can call me stupid
if you want, but '0011' looks a bit like 'SSss', and '001' would explain
the 'SSs' occurrences as well.
So this data was saved to a file, and a file was created with the lines mixed and
grouped in pairs of matching line length. Then a bit of code to read the lines in
and generate a mapping table from the characters in an encoded line to the
matching character in the decoded file. This table was then saved to a file as
a 256 byte list. Obviously this did not include all characters from the table as
the chances were that not all characters would be used in this one file, but
the thought was that as I stated above it would either give enough of a clue to
find the lookup table in memory, or a clue to the function. It was more
appealing than running through lines and lines of code. So the map table was
created and any holes were left with their original values, so that errors could
be spotted and added. Then this substitution lookup was loaded into the decoder and
compiled ready for use. At this point I decided to view the encrypted values with
the decrypted values in the form of the table, luckily there was a good spread in
the table and luckily I had picked a file with European characters inside it so
there were some of those represented in the table.
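The table builder described above can be sketched as follows. This is a reconstruction of the idea and not the code I used at the time; the names `map_init` and `map_learn` and the extra `seen` array are assumptions made for this example. Holes keep their original values, exactly as described, and a conflict counter is added so that a wrongly paired line shows up straight away.

```c
#include <stddef.h>

/* Start with the identity mapping, so any byte that is never
   observed decodes to itself, and clear the "seen" flags. */
void map_init(unsigned char table[256], unsigned char seen[256])
{
    for (int i = 0; i < 256; i++) {
        table[i] = (unsigned char)i;
        seen[i] = 0;
    }
}

/* Learn from one encoded line and its plaintext of the same length.
   The first mapping seen for an encoded byte is kept.
   Returns the number of conflicts, i.e. positions where an encoded
   byte was already mapped to a different plaintext byte. */
size_t map_learn(unsigned char table[256], unsigned char seen[256],
                 const unsigned char *enc, const unsigned char *plain,
                 size_t len)
{
    size_t conflicts = 0;

    for (size_t i = 0; i < len; i++) {
        unsigned char e = enc[i];

        if (seen[e] && table[e] != plain[i]) {
            conflicts++;
            continue;
        }
        table[e] = plain[i];
        seen[e] = 1;
    }
    return conflicts;
}
```

Feeding it the pair 'SSss' and '0011' maps 'S' to '0' and 's' to '1'. A non-zero return value on any pair means either the lines were matched up wrongly or the cipher is not a simple substitution.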
The original encoded file was then decoded using this partial table as a sort of
proof-of-concept for the code and the idea. Sure enough, the file was decrypted
and shown in total plain text. So I had proved to myself that I was on the right
track, and I had not even bothered to hunt through the disassembly file for the decode
routine.
The next step was to check for a lookup table in any of the files, so I took a
portion of the substitution table that contained proper plaintext values and did
a search of all the files in the root folder for copernic. NOTHING! - so it seems
they either do not have it in the files, they generate it or the data is encoded
by a function. This was good news, because the last two options both mean that
it is created by a function without a lookup table, which means there has to be
a simple logic to it, as there are only so many ways to scramble 256 entries
using code without losing any entries or values.
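The search itself is nothing special. For anybody wanting to repeat it, here is a minimal sketch of a byte-pattern search over a file that has been read into memory. The name `find_bytes` is made up for this example, and it is assumed that a stored table would hold the decoded values in order of the encoded byte.

```c
#include <stddef.h>
#include <string.h>

/* Return the offset of the first occurrence of needle in hay,
   or -1 if it does not occur (or if needle is empty). */
long find_bytes(const unsigned char *hay, size_t hay_len,
                const unsigned char *needle, size_t needle_len)
{
    if (needle_len == 0 || needle_len > hay_len)
        return -1;

    for (size_t i = 0; i + needle_len <= hay_len; i++) {
        if (memcmp(hay + i, needle, needle_len) == 0)
            return (long)i;
    }
    return -1;
}
```

A run of consecutive table entries makes a good needle. For example, the encoded bytes 0x18 to 0x1b decode to 6a 62 7a 72, so a stored table under that assumption would contain those four bytes in a row.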
Now at this point I should really have dived into the dead listing and tried to
find the routine, but I took a different approach. I instead turned my attention
to the output of my lookup table creator, and the results it had given me. I was
trying to look for a pattern within the mapping
This is a partial dump of the lookup table and values, showing the relationship
between the encoded and decoded characters: (all values are HEX)
Encoded  Decoded
10       2a
11       22
12       3a
13       32
14       0a

Encoded  Decoded    Encoded  Decoded
18       6a         f8       6d
19       62         f9       65
1a       7a         fa       7d
1b       72         fb       75
1c       4a         fc       4d
1d       42         fd       45
1e       5a         fe       5d
1f       52         ff       55
38       6b         58       68
39       63         59       60
3a       7b         5a       78
3b       73         5b       70
3c       4b         5c       48
3d       43         5d       40
It did not take long for one to jump out at me, did you pay attention to the
above table, did any bells go off? I left holes in the table on purpose so
you had to look at it. Have you seen the pattern, it is a nice one I must
admit - if you just arrange the table with the characters showing instead
of the hex, a pattern does jump out, but not as much as when viewing the
hex bytes. Hopefully you should agree with me when I now say that the dead listing
approach suddenly lost a LOT of its appeal for this target.
This is a regular pattern based substitution, done by a bit of code which
is not very complex or large. I have already gone down the road of abandoning
the dead listing, and it is now firmly in the bin. So to reverse this encoding
we simply need to analyse the pattern.
It also appears as though the resulting value is made up from two separate
nibbles (4bits) and they are bolted together, this is shown by the way they
seem to change out of step with each other.
Pseudo code:
Variables:
IN_A = encoded_byte
IN_H = encoded_byte_high_nibble
IN_L = encoded_byte_low_nibble
OUT_H = decoded_byte_high_nibble
OUT_L = decoded_byte_low_nibble
to set up the code do the following:
IN_A = read_from_file();
IN_H = IN_A & 0xf0;
IN_L = IN_A & 0x0f;
before exiting:
OUT_A = OUT_H | OUT_L;
Taking the examples:
0x38 -> 0x6B and 0x39 -> 0x63
It seems like there are two values for the lower nibble, and these seem to
be offset by 8, so no matter what the lower value is the higher one is that
plus 8. (Look at the table above to confirm this) The use of this value seems
to be dependent on the lower bit of IN_A. So the final step is to take
the low bit of IN_A and if it is clear to add 0x08 to the output byte.
You can also see that the lower nibble of decoded char (OUT_L) is related to
the upper nibble of encoded data (IN_H). And that the upper nibble of decoded
char (OUT_H) is related to lower nibble of encoded char (IN_L).
Look at the 0x*8 and 0x*9 values they all map to 0x6*, just like 0x*A and 0x*B
values map to 0x7*, and like 0x*E and 0x*F map to 0x5*. Now look at 0xff, the
lower value for the lower nibble is '5' so 0xf* -> *5 and 0x*F -> 0x5*.
If you do more checking it will reassure you, what is of interest is that these
mappings seem to be the same for both halves, which should make life a lot easier.
So now that we have isolated the components, lets create a mapping for the nibbles,
just taking the values from the previous table.
Original Nibble Output Nibble
0x0,0x1 0x2
0x2,0x3 0x3
0x4,0x5 0x0
0x6,0x7 0x1
0x8,0x9 0x6
0xA,0xB 0x7
0xC,0xD 0x4
0xE,0xF 0x5
So putting this together gives us:
Variables:
IN_A = encoded_byte
IN_H = encoded_byte_high_nibble
IN_L = encoded_byte_low_nibble
OUT_H = decoded_byte_high_nibble
OUT_L = decoded_byte_low_nibble
OUT_A = decoded_byte
LOOKUP = [2,2,3,3,0,0,1,1,6,6,7,7,4,4,5,5]
To set up the code do the following:
IN_A = read_from_file()
IN_H = (IN_A & 0xf0) >> 4   // Get high nibble into low nibble
IN_L = IN_A & 0x0f          // Isolate low nibble
OUT_H = LOOKUP[IN_L] << 4   // To get into high nibble
OUT_L = LOOKUP[IN_H]        // This is the low nibble
OUT_A = OUT_H | OUT_L       // Merge the two
if ((IN_A & 0x01) == 0)     // This does the offset on
    OUT_A = OUT_A + 0x08    // the lower nibble
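This can be simplified to the code below:
/* Nibble substitution table, indexed by an encoded nibble. */
static const unsigned char lookup[16] = {2,2,3,3,0,0,1,1,6,6,7,7,4,4,5,5};

/* Decode one byte: translate and swap the nibbles, then add 8
   when the low bit of the encoded byte is clear. */
int decode_character(int encoded)
{
    int decoded;

    encoded &= 0xff;    /* guard against sign-extended chars */
    decoded = (lookup[encoded & 0x0f] << 4) + lookup[(encoded & 0xf0) >> 4];
    if ((encoded & 0x01) == 0)
        decoded += 8;
    return decoded;
}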
I have not looked in the executable for this code or the bit that does the
same function, as that does not matter. If you use the above function as
a decoder for each character in all the '*.ssf' and '*.csf' files within
the program's directories, it will convert them to the plaintext (unencoded)
versions.
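To run whole files through the decoder, the per-character routine only needs a small loop around it. The sketch below stands on its own, so it repeats the nibble table. The names `decode_byte` and `decode_stream` are just for this example, and the routine remains my reconstruction from the observed mapping, not code lifted from the executable.

```c
#include <stdio.h>

/* Nibble substitution table, indexed by an encoded nibble. */
static const unsigned char nibble_map[16] =
    {2, 2, 3, 3, 0, 0, 1, 1, 6, 6, 7, 7, 4, 4, 5, 5};

/* Decode a single byte. The encoded low nibble selects the decoded
   high nibble and vice versa; bit 3 of the result is set when the
   encoded byte is even. Table values never exceed 7, so OR-ing in
   0x08 is the same as adding 8. */
unsigned char decode_byte(unsigned char enc)
{
    unsigned char out = (unsigned char)((nibble_map[enc & 0x0f] << 4) |
                                        nibble_map[enc >> 4]);
    if ((enc & 0x01) == 0)
        out |= 0x08;
    return out;
}

/* Decode everything readable from in and write it to out.
   Returns the number of bytes written, or -1 on a write error. */
long decode_stream(FILE *in, FILE *out)
{
    long written = 0;
    int c;

    while ((c = fgetc(in)) != EOF) {
        if (fputc(decode_byte((unsigned char)c), out) == EOF)
            return -1;
        written++;
    }
    return written;
}
```

Open both files in binary mode ("rb" and "wb") before calling decode_stream(), otherwise the C runtime on Windows may mangle the 0x0d 0x0a pairs. As a quick sanity check, the header bytes 9D 9D 53 73 F4 14 should come out as "FF01" followed by CR LF.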
So I had the files in plain text form and they were all text configuration
files, as I had thought. I counted (in the version I have) 754 search
engines or URLs - that is quite a lot of data. This product
has also got them grouped nicely, which will help with the problem of
how to organise them: it's already done.
So at this point I am pretty happy with how things have gone, I have a
routine which decodes their input files and have converted them all to
plain text, so the data is now usable. And to think this has been
achieved with only minimal time in front of code, only the period when
scanning for the plain text.
Scripting Language
When I started examining the decoded files, one of the first I looked at
was 'copernic.csf'. As this sits in the approot and has the same name as
the application, it was a good candidate for a master configuration or
some kind of global parameters file.
You should remember from earlier that most lines in the conf files seem to
have a 4 digit number (such as 0011) of varying value at the start of the line. The
example given earlier did not show this as clearly as the following example
hopefully will. This number is an instruction for the internal scripting language,
telling it how to handle the rest of the line.
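When working with the decoded files it helps to split each line into this leading number and the rest. The helper below is only a sketch: the name `split_opcode` is made up, and it assumes that an instruction is exactly four decimal digits at the very start of the line. It makes no attempt to say what any of the numbers mean.

```c
#include <ctype.h>

/* Split a decoded line into its 4 digit instruction and the rest.
   On success returns the instruction as a number (0..9999) and sets
   *rest to the text following it. If the line does not start with
   four decimal digits, returns -1 and sets *rest to the whole line. */
int split_opcode(const char *line, const char **rest)
{
    int opcode = 0;

    for (int i = 0; i < 4; i++) {
        /* A short line hits its terminating NUL here and stops. */
        if (!isdigit((unsigned char)line[i])) {
            *rest = line;
            return -1;
        }
        opcode = opcode * 10 + (line[i] - '0');
    }
    *rest = line + 4;
    return opcode;
}
```

Lines such as 'FF01', the lone '1' and the 'TimeStamp=' line have no such prefix and come back with -1, so they can be handled separately.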
This is the decoded version of 'copernic.csf':
FF01
1
TimeStamp=2001-03-09 00:00:00
0015Register
0011ChannelSet="Ad"
0011ChannelSet3="Ad"
0011Version=2525
0011FileVersion=0
0011SoftwareVersions="eng;engplus;engpro;fra;fraplus;frapro"
0016
0015Init
0011UseCookies=True
1001
0011SearchQuerySeparator="+"
1003
0011Key=SearchQuery
0011RNDSEED=""
0018Length(RNDSEED)<>12
0011RNDSEED=String(Random(99999999)*Random(9999))
0019
0011T=Random(999999)
0011PromoT=Numeric(Substring(RNDSEED,8,1))
0011PromoTI=Numeric(Substring(RNDSEED,9,1))
0011Random100=Numeric(Substring(RNDSEED,10,2))
0011SourceFLYCAST=Replace("ENG|1|http://ad-adex3.flycast.com/server/_img/Copernic/software/$RANDOMNUMBER$|http://ad-adex3.flycast.com/server/click/Copernic/software/$RANDOMNUMBER$","$RANDOMNUMBER$",String(T))
0011Source247ENG=Replace(Replace("ENG|1|http://connect.247media.ads.link4ads.com/serv/2/Copernic/ros/468x60/40543;uniq=$RANDOMNUMBER$?$KEY$|http://connect.247media.ads.link4ads.com/click/2/Copernic/ros/468x60/40543;uniq=$RANDOMNUMBER$","$KEY$",String(Key)),"$RANDOMNUMBER$",String(T))
0011Source247FRA=Replace(Replace("FRA|1|http://connect.247media.ads.link4ads.com/serv/2/fr-Copernic/ros/468x60/40543;uniq=$RANDOMNUMBER$?$KEY$|http://connect.247media.ads.link4ads.com/click/2/fr-Copernic/ros/468x60/40543;uniq=$RANDOMNUMBER$","$KEY$",String(Key)),"$RANDOMNUMBER$",String(T))
0011SourceUFS="UFS|1|http://banner.unifiedweb.com/cgi-bin/getimage.exe/copernic?GROUP=copernic|http://banner.unifiedweb.com/cgi-bin/redirect.exe/copernic"
0011SourceVALUECLICK="VALUECLICK|1|http://kansas.valueclick.com/cycle?host=hs0136917&b=1&noscript=1|http://kansas.valueclick.com/redirect?host=hs0136917&b=1&v=0"
0011SourceVALUECLICKOLD="VALUECLICK|1|http://kansas.valueclick.com/cycle?host=hs0194203&size=468x60&b=indexpage&noscript=1|http://kansas.valueclick.com/redirect?host=hs0194203&size=468x60&b=indexpage&v=0"
0011SourceSERVERFRA4552=Replace(Replace("BANNERSERVER|1|http://bannerpush.copernicserver.com/RealMedia/ads/adstream_nx.cgi/copernicclient/free/fra/recent/$RANDOMNUMBER$?$KEY$|http://bannerpush.copernicserver.com/RealMedia/ads/click_nx.cgi/copernicclient/free/fra/recent/$RANDOMNUMBER$","$KEY$",String(Key)),"$RANDOMNUMBER$",String(T))
0011SourceSERVERENG4552=Replace(Replace("BANNERSERVER|1|http://bannerpush.copernicserver.com/RealMedia/ads/adstream_nx.cgi/copernicclient/free/eng/recent/$RANDOMNUMBER$?$KEY$|http://bannerpush.copernicserver.com/RealMedia/ads/click_nx.cgi/copernicclient/free/eng/recent/$RANDOMNUMBER$","$KEY$",String(Key)),"$RANDOMNUMBER$",String(T))
0011SourceSERVERFRA4551=Replace(Replace("BANNERSERVER|1|http://bannerpush.copernicserver.com/RealMedia/ads/adstream_nx.cgi/copernicclient/free/fra/old/$RANDOMNUMBER$?$KEY$|http://bannerpush.copernicserver.com/RealMedia/ads/click_nx.cgi/copernicclient/free/fra/old/$RANDOMNUMBER$","$KEY$",String(Key)),"$RANDOMNUMBER$",String(T))
0011SourceSERVERENG4551=Replace(Replace("BANNERSERVER|1|http://bannerpush.copernicserver.com/RealMedia/ads/adstream_nx.cgi/copernicclient/free/eng/old/$RANDOMNUMBER$?$KEY$|http://bannerpush.copernicserver.com/RealMedia/ads/click_nx.cgi/copernicclient/free/eng/old/$RANDOMNUMBER$","$KEY$",String(Key)),"$RANDOMNUMBER$",String(T))
0012Find("ENGUFS",Edition)<>0
0011SourceUrl=Entry(3,SourceUFS,"|")
0011TargetUrl=Entry(4,SourceUFS,"|")
0013
0012(Find("PLUS",Edition)<>0)or(Find("PRO",Edition)<>0)
0012BuildNumber>4551
0011SourceUrl=Entry(3,SourceVALUECLICK,"|")
0011TargetUrl=Entry(4,SourceVALUECLICK,"|")
0013
0011SourceUrl=Entry(3,SourceVALUECLICKOLD,"|")
0011TargetUrl=Entry(4,SourceVALUECLICKOLD,"|")
0014
0013
0012BuildNumber>4551
0011SelfPromoPercent=0
0013
0012Substring(Edition,1,3)="FRA"
0011SelfPromoPercent=0
0013
0011SelfPromoPercent=10
0014
0014
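0012Random100<SelfPromoPercent
0011SourceUrl=Entry(3,SourceSERVERENG4551,"|")
0011TargetUrl=Entry(4,SourceSERVERENG4551,"|")
0013
0012BuildNumber>4551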
0012Substring(Edition,1,3)="FRA"
0011SourceUrl=Entry(3,SourceSERVERFRA4552,"|")
0011TargetUrl=Entry(4,SourceSERVERFRA4552,"|")
0013
0011SourceUrl=Entry(3,SourceSERVERENG4552,"|")
0011TargetUrl=Entry(4,SourceSERVERENG4552,"|")
0014
0013
0012Random100>54
0012Substring(Edition,1,3)="FRA"
0011SourceUrl=Entry(3,Source247FRA,"|")
0011TargetUrl=Entry(4,Source247FRA,"|")
0013
0011SourceUrl=Entry(3,Source247ENG,"|")
0011TargetUrl=Entry(4,Source247ENG,"|")
0014
0013
0011SourceUrl=Entry(3,SourceVALUECLICKOLD,"|")
0011TargetUrl=Entry(4,SourceVALUECLICKOLD,"|")
0014
0014
0014
0014
0014
0011RotationInterval=120000
0016
11A2
This is a table giving the function for each command string:

String  COMMAND   Description
0011    SET       SET variable=value
0012    IF        IF expression THEN
0013    ELSE      ELSE
0014    ENDIF     ENDIF
0015    FUNC      Function Definition Start
0016    ENDFUNC   End Function Def
0018    WHILE     WHILE expression DO
0019    WEND      End While Loop
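
The command table can be applied mechanically. The following is only a minimal Python sketch of such a translation, not the tool used for this essay: the names OPCODES, translate_line and translate are made up for illustration, and any line whose first four characters are not in the table (the header, the footer, or codes such as 1001 that have not been identified) is passed through unchanged.

```python
# Opcode -> mnemonic mapping, taken directly from the command table.
OPCODES = {
    "0011": "SET",
    "0012": "IF",
    "0013": "ELSE",
    "0014": "ENDIF",
    "0015": "FUNC",
    "0016": "ENDFUNC",
    "0018": "WHILE",
    "0019": "WEND",
}


def translate_line(line):
    """Replace a known 4-character opcode prefix with its mnemonic."""
    code, rest = line[:4], line[4:]
    if code not in OPCODES:
        # Header, footer or an opcode that is not in the table: leave as-is.
        return line
    return (OPCODES[code] + " " + rest).rstrip()


def translate(script):
    """Translate a whole decoded script, one line at a time."""
    return "\n".join(translate_line(line) for line in script.splitlines())


if __name__ == "__main__":
    sample = "0015Init\n0011UseCookies=True\n1001\n0012BuildNumber>4551\n0014\n0016"
    print(translate(sample))
```

Run on the decoded 'copernic.csf' this should give a listing of the same shape as the translated script, minus the comments.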
Also there are some functions:
Replace(String A,String B,String C)
This takes the string A, finds all occurrences of string B and replaces
them with the string C. So Replace("ABCCCBA","CCC","YYY") would return "ABYYYBA"
Substring(String A,Number B,Number C)
This takes the string A and grabs C characters, starting at position B (counting
from 1). So Substring("ENGPRO",1,3) would return "ENG"
Numeric(String A)
This returns the number represented by the string A. So Numeric("100") would return 100
Length(String A)
This returns the length of the String passed in. So Length("ENG") would return 3
Random(Number A)
This returns a random number up to the value of A. So Random(99999) could return
any value up to 99999.
String(Number A)
This returns the string representation of the Number A. So String(100) would return "100"
Find(String A,String B)
This reports whether string A is found in string B. The script always compares the
result against 0 (as in Find("PRO",Edition)<>0), so it appears to return a non-zero
value, probably the position of the match, when A is found and 0 when it is not.
So Find("PRO","ENGPRO") would return a non-zero value.
Entry(Number A, String B, String C)
This returns one entry from a string which contains delimited values. A is the number
of the data segment to return, counting from 1. B is the string which holds the data.
C is the character used for the separator. In the script, Entry(3,Source247FRA,"|")
returns the third "|" separated segment of Source247FRA.
Using the example Entry(NUM,"AAA|BBB|CCC|DDD","|")
if NUM is set to 1 it would return "AAA", if NUM is 2 then "BBB", if NUM is 3 then "CCC",
if NUM is 4 then "DDD".
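
To make the descriptions above concrete, here is a rough Python emulation of these functions. It is a sketch of the behaviour inferred from the script, not Copernic's own implementation. Two points are assumptions on my part: that Find returns a 1-based position (0 when not found), and that Random includes both 0 and A in its range.

```python
import random


def Replace(a, b, c):
    """Replace every occurrence of b in a with c."""
    return a.replace(b, c)


def Substring(a, b, c):
    """Return c characters of a, starting at 1-based position b."""
    return a[b - 1:b - 1 + c]


def Numeric(a):
    """Convert a string of digits to a number."""
    return int(a)


def Length(a):
    return len(a)


def String(a):
    return str(a)


def Find(a, b):
    """Assumed: 1-based position of a inside b, or 0 when a is absent."""
    return b.find(a) + 1


def Entry(n, data, sep):
    """Return the n-th (1-based) segment of data split on sep."""
    return data.split(sep)[n - 1]


def Random(a):
    """Assumed range: 0..a inclusive."""
    return random.randint(0, a)


if __name__ == "__main__":
    print(Replace("ABCCCBA", "CCC", "YYY"))
    print(Substring("ENGPRO", 1, 3))
    print(Entry(3, "AAA|BBB|CCC|DDD", "|"))
```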
Using the above command table, we can translate the script into a more
readable form, which gives the script below:
FF01
1
TimeStamp=2001-03-09 00:00:00
FUNC Register
SET ChannelSet="Ad"
SET ChannelSet3="Ad"
SET Version=2525
SET FileVersion=0
SET SoftwareVersions="eng;engplus;engpro;fra;fraplus;frapro"
ENDFUNC
FUNC Init
SET UseCookies=True
1001
SET SearchQuerySeparator="+"
1003
SET Key=SearchQuery
SET RNDSEED=""
WHILE Length(RNDSEED)<>12
SET RNDSEED=String(Random(99999999)*Random(9999))
WEND
SET T=Random(999999)
SET PromoT=Numeric(Substring(RNDSEED,8,1))
SET PromoTI=Numeric(Substring(RNDSEED,9,1))
SET Random100=Numeric(Substring(RNDSEED,10,2))
SET SourceFLYCAST=Replace("ENG|1|http://ad-adex3.flycast.com/server/_img/Copernic/software/$RANDOMNUMBER$|http://ad-adex3.flycast.com/server/click/Copernic/software/$RANDOMNUMBER$","$RANDOMNUMBER$",String(T))
SET Source247ENG=Replace(Replace("ENG|1|http://connect.247media.ads.link4ads.com/serv/2/Copernic/ros/468x60/40543;uniq=$RANDOMNUMBER$?$KEY$|http://connect.247media.ads.link4ads.com/click/2/Copernic/ros/468x60/40543;uniq=$RANDOMNUMBER$","$KEY$",String(Key)),"$RANDOMNUMBER$",String(T))
SET Source247FRA=Replace(Replace("FRA|1|http://connect.247media.ads.link4ads.com/serv/2/fr-Copernic/ros/468x60/40543;uniq=$RANDOMNUMBER$?$KEY$|http://connect.247media.ads.link4ads.com/click/2/fr-Copernic/ros/468x60/40543;uniq=$RANDOMNUMBER$","$KEY$",String(Key)),"$RANDOMNUMBER$",String(T))
SET SourceUFS="UFS|1|http://banner.unifiedweb.com/cgi-bin/getimage.exe/copernic?GROUP=copernic|http://banner.unifiedweb.com/cgi-bin/redirect.exe/copernic"
SET SourceVALUECLICK="VALUECLICK|1|http://kansas.valueclick.com/cycle?host=hs0136917&b=1&noscript=1|http://kansas.valueclick.com/redirect?host=hs0136917&b=1&v=0"
SET SourceVALUECLICKOLD="VALUECLICK|1|http://kansas.valueclick.com/cycle?host=hs0194203&size=468x60&b=indexpage&noscript=1|http://kansas.valueclick.com/redirect?host=hs0194203&size=468x60&b=indexpage&v=0"
SET SourceSERVERFRA4552=Replace(Replace("BANNERSERVER|1|http://bannerpush.copernicserver.com/RealMedia/ads/adstream_nx.cgi/copernicclient/free/fra/recent/$RANDOMNUMBER$?$KEY$|http://bannerpush.copernicserver.com/RealMedia/ads/click_nx.cgi/copernicclient/free/fra/recent/$RANDOMNUMBER$","$KEY$",String(Key)),"$RANDOMNUMBER$",String(T))
SET SourceSERVERENG4552=Replace(Replace("BANNERSERVER|1|http://bannerpush.copernicserver.com/RealMedia/ads/adstream_nx.cgi/copernicclient/free/eng/recent/$RANDOMNUMBER$?$KEY$|http://bannerpush.copernicserver.com/RealMedia/ads/click_nx.cgi/copernicclient/free/eng/recent/$RANDOMNUMBER$","$KEY$",String(Key)),"$RANDOMNUMBER$",String(T))
SET SourceSERVERFRA4551=Replace(Replace("BANNERSERVER|1|http://bannerpush.copernicserver.com/RealMedia/ads/adstream_nx.cgi/copernicclient/free/fra/old/$RANDOMNUMBER$?$KEY$|http://bannerpush.copernicserver.com/RealMedia/ads/click_nx.cgi/copernicclient/free/fra/old/$RANDOMNUMBER$","$KEY$",String(Key)),"$RANDOMNUMBER$",String(T))
SET SourceSERVERENG4551=Replace(Replace("BANNERSERVER|1|http://bannerpush.copernicserver.com/RealMedia/ads/adstream_nx.cgi/copernicclient/free/eng/old/$RANDOMNUMBER$?$KEY$|http://bannerpush.copernicserver.com/RealMedia/ads/click_nx.cgi/copernicclient/free/eng/old/$RANDOMNUMBER$","$KEY$",String(Key)),"$RANDOMNUMBER$",String(T))
IF Find("ENGUFS",Edition)<>0 // if ENGUFS version
  SET SourceUrl=Entry(3,SourceUFS,"|")
  SET TargetUrl=Entry(4,SourceUFS,"|")
ELSE
  IF (Find("PLUS",Edition)<>0)or(Find("PRO",Edition)<>0) // PRO or PLUS
    IF BuildNumber>4551 // BUILD > 4551
      SET SourceUrl=Entry(3,SourceVALUECLICK,"|")
      SET TargetUrl=Entry(4,SourceVALUECLICK,"|")
    ELSE // BUILD <= 4551
      SET SourceUrl=Entry(3,SourceVALUECLICKOLD,"|")
      SET TargetUrl=Entry(4,SourceVALUECLICKOLD,"|")
    ENDIF
  ELSE
    IF BuildNumber>4551 // BUILD > 4551
      SET SelfPromoPercent=0 // clear addshow variable
    ELSE
      IF Substring(Edition,1,3)="FRA" // FRENCH
        SET SelfPromoPercent=0 // clear addshow variable
      ELSE // ENGLISH
        SET SelfPromoPercent=10 // set addshow to 10%
      ENDIF
    ENDIF
    IF Random100<SelfPromoPercent // if random < addshow
      SET SourceUrl=Entry(3,SourceSERVERENG4551,"|")
      SET TargetUrl=Entry(4,SourceSERVERENG4551,"|")
    ELSE // if random >= addshow
      IF BuildNumber>4551 // BUILD > 4551
        IF Substring(Edition,1,3)="FRA" // FRENCH
          SET SourceUrl=Entry(3,SourceSERVERFRA4552,"|")
          SET TargetUrl=Entry(4,SourceSERVERFRA4552,"|")
        ELSE // ENGLISH
          SET SourceUrl=Entry(3,SourceSERVERENG4552,"|")
          SET TargetUrl=Entry(4,SourceSERVERENG4552,"|")
        ENDIF
      ELSE // BUILD <= 4551
        IF Random100>54 // if random > 54
          IF Substring(Edition,1,3)="FRA" // FRENCH
            SET SourceUrl=Entry(3,Source247FRA,"|")
            SET TargetUrl=Entry(4,Source247FRA,"|")
          ELSE // ENGLISH
            SET SourceUrl=Entry(3,Source247ENG,"|")
            SET TargetUrl=Entry(4,Source247ENG,"|")
          ENDIF
        ELSE // random <= 54
          SET SourceUrl=Entry(3,SourceVALUECLICKOLD,"|")
          SET TargetUrl=Entry(4,SourceVALUECLICKOLD,"|")
        ENDIF
      ENDIF
    ENDIF
  ENDIF
ENDIF
SET RotationInterval=120000
ENDFUNC
11A2
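
The nested IF block in the Init function boils down to a small decision tree. The Python sketch below restates it as a function that returns the name of the Source... variable the script would end up using. The function name choose_ad_source is hypothetical, and the sketch assumes that Edition holds an upper case string such as "ENGPRO" and that Find is a plain case-sensitive substring test, neither of which has been confirmed.

```python
def choose_ad_source(edition, build_number, random100):
    """Return the name of the Source... variable picked by the Init script.

    edition      -- assumed upper case edition string, e.g. "ENG", "FRAPLUS"
    build_number -- the BuildNumber variable
    random100    -- the two digit Random100 value (0..99)
    """
    if "ENGUFS" in edition:
        return "SourceUFS"
    if "PLUS" in edition or "PRO" in edition:
        if build_number > 4551:
            return "SourceVALUECLICK"
        return "SourceVALUECLICKOLD"

    french = edition[:3] == "FRA"
    # Only old English free builds get a 10% share of self promotion adverts.
    self_promo_percent = 0 if (build_number > 4551 or french) else 10

    if random100 < self_promo_percent:
        return "SourceSERVERENG4551"
    if build_number > 4551:
        return "SourceSERVERFRA4552" if french else "SourceSERVERENG4552"
    if random100 > 54:
        return "Source247FRA" if french else "Source247ENG"
    return "SourceVALUECLICKOLD"


if __name__ == "__main__":
    for args in [("ENGPRO", 4552, 50), ("ENG", 4552, 50), ("ENG", 4000, 60)]:
        print(args, "->", choose_ad_source(*args))
```

Laid out like this it is easier to see which editions and builds end up on which advert server.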
So this is a script which seems to control all the adverts, so surely a bit of
creative writing is called for. As we already have a decoder we can simply
reverse the process to encode the file after we have created the new one.
We can also figure out a couple of other things. The first is that the following
segment is the header for each file. It does not seem to contain any of the
script commands found so far, or even the characters for them, and it seems to be
present at the start of all the files:
FF01
1
TimeStamp=2001-03-09 00:00:00
The second is this entry at the end of the file, which seems to be a footer of
some kind. At first glance it looks like it could be some form of CRC:
11A2
But what if you are told that the length of this file in hex is 0x11C4?
Another example is a file with a footer of 03AC and a file length of 0x3CE.
So if we do 0x11C4 - 0x11A2 we get 0x22, and 0x3CE - 0x3AC = 0x22. This means
that this entry is the length of the file minus 0x22 (34 dec). So if we are to
alter the config file (with the hope of replacing it) then we should put the
correct value into this entry as well as encoding the file.
It should be noted that in experiments the file was not parsed and loaded unless
this file length value was correct. Copernic probably uses it to parse the
input file and strip the header, in which case it gives the length of the data
within the file. Make sure this value is set correctly!
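
The footer arithmetic is easy to get wrong by hand, so here is a small Python sketch of it. The name footer_for_length is made up for illustration, and the sketch assumes the footer is always written as four upper case hex digits, which is all that the two sample files show.

```python
# Observed difference between the file length and the footer value.
FOOTER_OFFSET = 0x22


def footer_for_length(file_length):
    """Return the footer string for a file of the given total length."""
    return format(file_length - FOOTER_OFFSET, "04X")


if __name__ == "__main__":
    print(footer_for_length(0x11C4))  # the copernic.csf example
    print(footer_for_length(0x3CE))   # the second example file
```

Both sample files from above come out with their original footer values, 11A2 and 03AC.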
Search Query Spying
It should be noted that all advert requests sent to the two servers
"bannerpush.copernicserver.com" and "connect.247media.ads.link4ads.com" contain
the user query variable (Key, which is set from SearchQuery) from the script.
This means that if your parameters cause adverts to be grabbed from either of these
two locations then they are getting details on what you are searching for.
You can verify this for yourself by looking at the above script and finding
the $KEY$ placeholder in the entries for these two servers.
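
To see what actually goes out on the wire, the two nested Replace calls for Source247ENG can be replayed in Python. This is only an illustration: the function name build_247_request is hypothetical, the template is the third "|" separated segment of Source247ENG copied from the script, and the search terms and random number are made up.

```python
# Third "|" separated segment of Source247ENG, copied from the script.
TEMPLATE_247ENG = (
    "http://connect.247media.ads.link4ads.com/serv/2/Copernic/"
    "ros/468x60/40543;uniq=$RANDOMNUMBER$?$KEY$"
)


def build_247_request(search_query, t):
    """Replay Replace(Replace(template,"$KEY$",Key),"$RANDOMNUMBER$",T)."""
    url = TEMPLATE_247ENG.replace("$KEY$", str(search_query))
    return url.replace("$RANDOMNUMBER$", str(t))


if __name__ == "__main__":
    # Made-up query; SearchQuerySeparator="+" suggests words are joined by "+".
    print(build_247_request("my+private+search", 123456))
```

The search terms end up in plain view after the "?" of the banner request.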
Advert Removal
Even though the 'PRO' version has a tick box to turn off adverts, I
assumed that the free version probably displays loads of adverts.
Why anyone with the pro version would leave the tick box turned on
really puzzles me, unless both versions use the same dialog and the
free version simply has the box ticked and disabled so the user cannot
change it - I will not verify this. But this gave me an idea: if all
versions use the config files then we can make a new one for the free
version, thus removing that part of the whole advert problem.
So the task was to create a new version of 'copernic.csf' with the
references to the advert servers removed. Because I was not sure of the
effect of returning empty strings, I chose instead to point the requests
at the local machine. This should at least avoid remote requests and also
save the user the bandwidth of fetching the advert images.
This is my version of the script:
FF01
1
TimeStamp=2001-03-09 00:00:00
0015Register
0011ChannelSet="Ad"
0011ChannelSet3="Ad"
0011Version=2525
0011FileVersion=0
0011SoftwareVersions="eng;engplus;engpro;fra;fraplus;frapro"
0016
0015Init
0011UseCookies=True
1001
0011SearchQuerySeparator="+"
1003
0011SelfPromoPercent=0
0011SourceUrl="http://127.0.0.1/"
0011TargetUrl="http://127.0.0.1/"
0011RotationInterval=120000
0016
11A2
Note that the 11A2 at the end of this listing is simply carried over from the
original file. We should not forget to change this size value, so set it to the
length of the new file minus 0x22, and write the encoded file to 'copernic.csf'.
Also 'updates.copernic.com', 'regcards.copernic.com' and 'www.copernic.com'
should be added to your hosts file pointing at localhost, or to the banned list
of your local proxy ;) This is to stop any updates or personal data transfer
from happening. This should stop the software from any phone home tactics and
hopefully should remove all adverts without having to touch any of the code.
After all we are simply using the program's scripts against itself.
I have not tested this but it should work, and I see no reason why it would
not have the desired effect!
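
For the hosts file part, the entries are just an address followed by a host name on each line. A tiny Python sketch that prints them for the three hosts named above follows. The function name hosts_entries is made up, and where your hosts file lives depends on your operating system.

```python
# The three hosts named in the text.
BLOCKED_HOSTS = [
    "updates.copernic.com",
    "regcards.copernic.com",
    "www.copernic.com",
]


def hosts_entries(hosts, address="127.0.0.1"):
    """Return hosts file lines that map each host name to the given address."""
    return [address + "\t" + name for name in hosts]


if __name__ == "__main__":
    print("\n".join(hosts_entries(BLOCKED_HOSTS)))
```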
Adding a Group
Looking at the decoded .ssf and .csf files you will see that they share the
same scripting language with a few additions. So the thought was: as it
parses all the files in the set directories and not specific ones, could
a new file or files be added, and so add engines and groups to the copernic
engine? This would mean that we are no longer tied to the ones they supply,
and it would also prove how it works.
Using one of the group files as an example, the following file was created:
FF01
1
TimeStamp=2001-03-15 00:00:00
0015Register
0011_Conv="4002->3999 (01-03-15, 10:58:42)"
0011DisplayName="Custom"
0011DisplayNames("FRA")="Custom French"
0011DisplayNames("DEU")="Custom German"
0011DisplayNames("ITA")="Custom Italian"
0011DisplayNames("ESP")="Custom Spanish"
0011DisplayNames("POR")="Custom Portugese"
0011Description="Custom Search Group"
0011Descriptions("FRA")="Custom Search Group"
0011Descriptions("DEU")="Custom Search Group"
0011Descriptions("ITA")="Custom Search Group"
0011Descriptions("ESP")="Custom Search Group"
0011Descriptions("POR")="Custom Search Group"
0011ResultsPerChannel=10
0011TotalResults=1000
0011Version=3000
0011FileVersion=1
0011AutoUpdate=True
0011SearchType="keywords"
0016
0015AfterDownload
0016
This file was saved as 'Custom.ssf', encoded using the encode routine
and placed in the 'Categories' directory. Now to run the application
and see if the group is in the lists. The puzzling thing was that
the group did not appear in the drop down of groups, or in the main tab
on the left giving all the groups, but if we do a search and then
browse the groups in that screen, it is there at the bottom of the list.
This might be because we have no search engines assigned to this
group. When we find the group setting in the category dialog it shows
no engines under the group. This is a good sign.
Note that the group appears only at the end of the list in the
categories dialog until you have either done a search using that group
or closed the program and reopened it; after that it seems to be sorted
alphabetically into the list.
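
If you want several groups, the plaintext of a group file like the one above can be generated rather than typed. The following Python sketch is untested against Copernic itself. The function name group_file_lines and its parameters are made up, it leaves out the _Conv line and the localised Descriptions entries (whether Copernic needs them is unknown), and the result still has to go through the encode routine before use.

```python
def group_file_lines(name, description, timestamp, localized=None):
    """Build the plaintext lines of a group (.ssf) file before encoding.

    localized -- optional dict of language code -> display name,
                 e.g. {"FRA": "Custom French"}
    """
    lines = [
        "FF01",
        "1",
        "TimeStamp=" + timestamp,
        "0015Register",
        '0011DisplayName="%s"' % name,
    ]
    for lang, text in (localized or {}).items():
        lines.append('0011DisplayNames("%s")="%s"' % (lang, text))
    lines += [
        '0011Description="%s"' % description,
        "0011ResultsPerChannel=10",
        "0011TotalResults=1000",
        "0011Version=3000",
        "0011FileVersion=1",
        "0011AutoUpdate=True",
        '0011SearchType="keywords"',
        "0016",
        "0015AfterDownload",
        "0016",
    ]
    return lines


if __name__ == "__main__":
    print("\n".join(group_file_lines(
        "Custom", "Custom Search Group", "2001-03-15 00:00:00",
        {"FRA": "Custom French"})))
```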
Adding a Search Engine
So to create a search engine file, I will use searchlores' own Namazu
engine as an example. The following file was created:
FF01
1
TimeStamp=2001-03-09 00:00:00
0015Register
0011_Conv="4002->3999 (01-03-09, 10:52:49)"
0011DisplayName="Namazu"
0011HomePage="http://www.searchlores.org/"
0011SupportNew=True
0011Category="Custom"
0011Version=3000
0011FileVersion=2
0011AutoUpdate=True
0011ChannelSet="Custom"
0011ChannelSet3="Custom"
0011SupportOr=True
0011SupportAnd=True
0011SupportQuotes=True
0016
0015Init
0011SourceUrl="http://www.searchlores.org/cgi-bin/search?query="
0011ResultsPerPage=20
100A("")
1004("searchlores.org")
0011Rules("Range").StartMarker="Search Results for"
0011Rules("Range").EndMarker=""
0011Rules("Address").Key=True
0011Rules("Title").StartMarker=">"
0011Rules("Title").EndMarker=""
0011Rules("Title").StartLine=0
0011Rules("Title").NbLines=1
0011Rules("Description").StartMarker=""
0011Rules("Description").EndMarker=""
0011Rules("Description").StartLine=0
0011Rules("Description").NbLines=1
0011SearchQuerySeparator="+"
1003
0016
0015BeforeDownload
1001
1002("query="+SearchQuery)
1002("result=normal")
1002("sort=score")
1002("max=20")
0016
0015AfterDownload
0016
This file was saved as 'Namazu.csf', encoded using the encode routine
and placed in the 'Categories\Engines' directory. Now to run the application
and see if the group is now in the lists.
Nope, the group is not in the normal lists, but it is still in the category
dialog, and if you click on a group to do a search it is in the
dropdown box, and when viewing it you can see the Namazu engine within
the group. So that worked quite well, but I still have to figure out how to
get it into the quick groups dropdown and the left hand list in the main view.
But I can select the group and also the search engine, and the request does
seem to go out (to the local proxy). So copernic will pick up any engine and
group configuration files you place in the app directories. This
is really nice and opens up a lot of possible routes.
It should be noted that the file above for Namazu is not quite complete, as
the results parsing part has been taken from another file and may not match,
but the parameters passed in are correct. Examination of the engine configuration
files is recommended as their scripting language allows some very nice things
to be performed and is certainly powerful enough for the task required.
After a bit of looking round the menus in copernic (I had not used it before)
I spotted Options in the Tools menu. In Options there is a button labelled
'Category Bar' settings. Ok so let's click on it. We have all the other
groups on one side as being part of the category bar (the groups
shortcut menus) and Custom sitting alone on the other side (not included),
so this seems simple. Select the group and add it to the other list using
the supplied button, then use up or down to put it where you want. Now exit
from this dialog. LO and BEHOLD the groups list in the main view now
contains the group 'Custom' and if we look inside Custom there is 'Namazu'.
So adding groups and engines is now possible with copernic.
Conclusions
My aim was not to take the program apart too much, just to get to the data on
the search engines, without spending hours looking at assembler code.
But during this task I have found out many things about how
this program does other things - some are good and some are bad. There are a lot
of hardcoded bits, especially to do with language and syntax (lexicon), which
cannot be changed by updates, or at least that is how it
appears to be. I do not like the intrusive phone home features of this
product at all - but at least it uses the proxy you give it for these requests
and does not try to bypass it like some similar products.
I was very disappointed with the encryption on the data files, mind you the
application was coded in Delphi. But seriously, you would have thought the
developers would have put a bit more in. After all, if you are going to
put some encryption in, at least make it worthwhile.
The task was also made a bit easier by the fact that the filenames and directory
structure of the configuration files told you exactly what group or engine
each file related to and what to expect in each
file. It seems like the author wants you to get the data out of
the program, or at least not make our task too hard.
In hindsight (always a good thing), once it had been decided that the
method of encryption was a substitution cipher, it would have been possible
to do a known plaintext attack on the encoded files. Collecting the request
URLs from the proxy server, the strings from the executable and the details
in the groups files would have given enough data to recover the encoding
method. This would have worked just as well as the path I chose to follow,
and although it might have taken a bit longer it would have had the same
result without having to even touch a disassembler or debugger. I
chose to grab the plaintext from the program, so that a whole file of plaintext
could be grabbed in one go and a translation table built easily, but a partial
plaintext lookup generator program would have worked equally well.
The scripting language they have included interested me most; it has some nice
ideas in it, even though it seems to have its roots in a BASIC type language.
Bot writers and OSLSE project fans should examine it and how
it works, as there is a lot to learn from it. It can provide many pointers and
ideas to programmers of VSLs for bots and other such programs, as it is
simple in concept but offers expandability and flexibility. It also seems a lot
more flexible than a simple macro type VSL, where you include commands in strings
and then parse them out, as in webferret. This does not mean that one is
better or worse than the other, but that both are interesting and that it
would be easier to include the webferret idea in this than the other way
around. From looking at it, it would be
very simple to parse and implement because of its defined structure and
the flexibility of being text based and not some form of microcode. This
also makes it very suitable for inclusion in a format such as XML, as an
embedded script.
Final Thoughts
Firstly I would like to point out that you should try to learn how your
target works before trying to take it apart. Reading the essay you should
hopefully have seen how the clues picked up early on helped later in the
process. While you are installing, LOG what the program does. When you run the
program for the first and subsequent times, LOG what the program does. These log
files will not cost you anything to make (apart from the time to start filemon
and regmon) and will save you doing it later. Then when a question comes up you
do not have to think - oh I must uninstall and reinstall to get a log of every
change - and not everything may be removed or put back, it depends on the
program. So do it the first time. Pick your target and work it, right from the start.
After seeing the script code I realise that I was trying to overcomplicate matters
and produce some fancy parsing macro type thing for the parsing part of
my bot. Seeing this has brought me back to a simple but very expandable
idea, which will be much easier to implement and extend as development
requires. Sometimes it takes seeing another point of view to bring some
clarity to your thoughts and put you back on the right track.
If you are going to write a paper on a subject you normally would research
other works on the same subject first, surely the same should be done if
you are working on some software. This might save you from reinventing the
wheel as a square. I am not saying use their ideas exactly as they do,
but you should observe and learn from them, then create a solution which
brings all the parts most suited to your task together.
I would also like to point out that people tend to download and use software
without really understanding what it does, or what data about them goes where.
You should take care over what software you use and should understand the
hidden data that it sends about you. A prime example is the entry in the
advert request in this product which gives them what you are searching for,
quite apart from the update and regcard information. Most products of this
type seem to conduct this form of activity and the users should be made
aware of this before using the products.
The use of adverts in products is actually robbing, yes robbing, the users
of their precious bandwidth: while they are showing adverts you are losing
bandwidth. I believe that reducing the advert shown to a 1x1 image or
simply hiding the advert is not a solution, as you are still using bandwidth.
The only proper method of advert removal is to make sure the request never
gets out, or at least not as far as your internet connection.
Disclaimer
I must point out that during the writing of this essay, at no point was Copernic
allowed to interact with the internet in any way shape or form. It has now
been removed from the PC it was installed on and will not be returning.
A lot of information was gained from log files, and some reversing of course! ;).
Hope you enjoyed reading.
Copyright (c) 2001, WayOutThere

(c) III Millennium: [fravia+], all rights
reserved