**** 8-bit document **** æøåÆØÅ

FFW - Freetext search For World wide web

Release 2.2

Copyright (C) 1994, 1995 by

	Telenor R&D, Norway

Please see the COPYRIGHT file for further details.



For new features and bug fixes in 2.2 see the CHANGES file.
Release 2.2 is a bug fix release only; no new functionality is
added.

**** IMPORTANT NOTICE ********
The index format was changed in FFW 2.0. If you are upgrading from
version 1 you MUST rebuild your old indexes from scratch! 

Version 2.1 added a date index file (.dix). You must upgrade your old
2.0 indexes by running ffwdateindex(1).
******************************


Introduction
------------

FFW is a package made to provide easy-to-use searching facilities 
over HTML documents. The output is intended as input to CGI-scripts
providing the user interface. A sample script is provided.
FFW is basically intended to replace similar solutions based on the
Wais search engine.

FFW features:

- Traditional inverted index, considerably smaller than a Wais index.
  On test datasets we have seen FFW indexes at 1/3 the size of a Wais index.
  This of course will depend on data set size and content. 

- Full HTML parsing on input; reserved HTML words are not indexed.
  The input parser can easily be replaced with a parser for other formats.

- Words with low semantic content, like and, or, not, if, etc., can be
  filtered out of the index to reduce its size. This is done by providing
  exclusion lists. Numbers and very short or very long words can also be
  left out of the index.

- Flexible indexer, can take document list from input, stdin or parameter 
  files.

- Memory conservative merge program allows efficient incremental building of 
  huge indexes. Two FFW indexes can be quickly merged into one. Building
  huge indexes can generally be a problem because indexer program size
  outgrows machine physical memory, leading to excessive paging load. 
  ffwmerge solves this problem.

- Can search in several indexes at the same time.

- Self-contained index. URLs and document titles are stored directly in
  the index, so no access to the source files is needed to present the
  search result to the user. The index server can be totally independent
  of the server holding the documents.

- Written in compiled C++ for efficiency.

- Searching supports a formal expression grammar with AND, OR, NOT and ().

- Can search for truncated words like ffw* or for any word with just *.

- Documents can be dated. It is then possible to request only documents dated
  in a given time period. Using * as the search expression fetches all
  documents between the given dates.

- Program messages are kept in a single file for easy translation.
  Norwegian and English versions are provided.

- Support for using several indexes with one CGI script, no need to use one 
  script for each searchable area.

- 8-bit characters fully supported, HTML character escape codes are changed 
  into their 8-bit ISO8859-1 equivalents where possible. This makes words
  with escape codes in them searchable.

- The size of a search can be limited by the webmaster to restrict resource
  usage on the Web server machine.
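
To make a few of the features above concrete, here is a small sketch in
Python of an inverted index with an exclusion list, truncated words and
AND/OR/NOT combined as set operations. It is an illustration only, not
FFW's C++ code or its index format. The names STOP_WORDS, MIN_LEN,
MAX_LEN, tokenize, build_index and lookup, and the sample documents, are
all assumptions made for the example.

```python
import re

# Hypothetical exclusion list and length limits (not FFW's real defaults).
STOP_WORDS = {"and", "or", "not", "if"}
MIN_LEN, MAX_LEN = 2, 30


def tokenize(text):
    """Lower-case the words; drop numbers, excluded and too short/long words."""
    words = re.findall(r"[^\W_]+", text.lower())
    return [w for w in words
            if not w.isdigit()
            and w not in STOP_WORDS
            and MIN_LEN <= len(w) <= MAX_LEN]


def build_index(docs):
    """Map each word to the set of document names that contain it."""
    index = {}
    for name, text in docs.items():
        for word in tokenize(text):
            index.setdefault(word, set()).add(name)
    return index


def lookup(index, term):
    """Exact lookup, or prefix lookup when the term ends in '*'."""
    if term.endswith("*"):
        prefix = term[:-1]
        hits = set()
        for word, names in index.items():
            if word.startswith(prefix):
                hits |= names
        return hits
    return set(index.get(term, ()))


# Made-up sample documents.
docs = {
    "a.html": "FFW builds an inverted index",
    "b.html": "Wais index and FFW search 1995",
    "c.html": "searching if not indexing",
}
index = build_index(docs)

both = lookup(index, "ffw") & lookup(index, "index")       # ffw AND index
either = lookup(index, "wais") | lookup(index, "search*")  # wais OR search*
without = lookup(index, "index") - lookup(index, "wais")   # index AND NOT wais

print(sorted(both), sorted(either), sorted(without))
```

In this sketch a lone * matches every indexed word, so it returns every
document that has at least one indexed word. A real index would also store
the URL and title with each document, as described above.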

Platforms and tools
-------------------

FFW is known to compile on 
- SunOS 4.1.3 with gcc 2.6.2
- Solaris 2.3 with gcc 2.5.8
- Linux 1.2.0
- SCO ODT 3.0/UNIX 3.2 release 4.2 with gcc 2.5.8

You'll also need the g++ library.

We do not, however, expect major problems compiling it on other platforms.
Reports on successful compiles (or fixes needed) are gratefully accepted.

Installation
------------

This package was made using the gcc (2.4.3) compiler with C++, 
so you need a C++ compiler.

From the distribution's top directory do:

1: Have a look at the toplevel Makefile and change the variables there as 
   needed. You should not need to change any other files unless you don't
   have a working makedepend, see step 3. 

   Notice the BINDIR path will be used from directories one level down
   from the toplevel directory, so if you specify a relative path you
   must start with ../ to find the toplevel directory.

2: Decide on the Norwegian or English version and do a 'make norwegian'
   or 'make english'. The distribution defaults to English.

3: 'make depend'

   You need to do this only if you are going to make changes in the sources.
   This is not required for a one-time compile.
   
   If your makedepend is badly configured and
   can't find all the necessary includes try using -I statements in OPTIONS
   to tell makedepend where the gcc includes are.

   'make depend' will build the include lists, 'make noinc' will remove them.

4: 'make'

To put the binaries into one binary directory do:

5: 'make install' ( Makefile default is set to ../bin, the toplevel ./bin 
   directory )

Provided you did an install to ./bin and selected the English language,
you can do:

6: 'make testbase'

7: 'make tests' ( These should at least not dump core! )

To use FFW:

8: Build an index, see examples in Makefile.

9: Put the sample ffwcgi{.en, .no} script into your cgi-bin and modify 
   the path in it to find your index directory. Make sure you set the x
   bit as required. Invoke it once from the command line (no parameters needed)
   to check it works properly with your Perl interpreter.

10: Invoke it by http://www.your.place/cgi-bin/ffwcgi.en/<indexname>

Assume your indexes are in /indexes, which contains the indexes ix1 and ix2.
You then put "/indexes" into the ffwcgi script in your cgi-bin.
URLs will be like http://your.machine/cgi-bin/ffwcgi/ix1
which will make searches look in ix1. To look in both ix1 and
ix2 use http://your.machine/cgi-bin/ffwcgi/ix1/ix2 etc.
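
The naming scheme above can be sketched as follows, here in Python. This
is not the code of the ffwcgi script. The function names search_url and
index_paths are made up for the example, and it is an assumption that the
index names are simply the path components that follow the script name.

```python
def search_url(host, index_names, script="ffwcgi"):
    """Build a search URL that names one or more indexes in its path."""
    if not index_names:
        raise ValueError("at least one index name is required")
    return "http://%s/cgi-bin/%s/%s" % (host, script, "/".join(index_names))


def index_paths(path_info, index_dir="/indexes"):
    """Turn the path that follows the script name into index locations."""
    names = [part for part in path_info.split("/") if part]
    return [index_dir + "/" + name for name in names]


print(search_url("your.machine", ["ix1"]))
print(search_url("your.machine", ["ix1", "ix2"]))
print(index_paths("/ix1/ix2"))
```

The first two lines printed are the two URLs from the example above; the
last shows the index directory joined with each index name.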

Look at the manpages for details on the programs. In particular you should
look at the manpage for ffwindex for an explanation of how document names
are mapped into filenames and URLs.

Contents
--------

These are the parts of this distribution:
- ./lib contains the general library doing most of the work
- ./index contains ffwindex, the indexing program
- ./dindex contains ffwdateindex which generates .dix files when needed
- ./merge contains ffwmerge, the index merger
- ./search contains ffwsearch, the index search program
- ./stat contains ffwstat, a program to write out various index information
- ./testdata contains a few simple html docs for testing
- ./doc contains documentation in man format
- ./cgi contains some versions of the ffwcgi script

Contacts
--------

You can reach us at ffw@nta.no. Also look at the FFW infobase
http://www.nta.no/produkter/ffw/ffw.html
It contains various material about FFW, including release information
and an overview of reported and fixed bugs in the current release.

Please note that we cannot guarantee fixing bugs or responding to problems;
on the other hand, we don't guarantee we won't!

If you report a bug, please include as much information as possible in the
message, such as the compiler, the platform and of course what the problem is.

Acknowledgements
----------------

The first release of FFW was written by Ken Ronny Schouten and Haiyan Yang
on a contract from Telenor Research. Responsible at Telenor Research was 
Bård Håfjeld.

Responsible for FFW release 2.2 is Bård Håfjeld.

These people contributed patches and valuable input:
frutig@nic.embratel.net.br 
Matt Heffron, heffron@falstaff.css.beckman.com

Thanks to everyone else who reported bugs and problems. All the reported 
problems should be fixed now.







