FLEMMv2 is a Perl5 program that performs 
inflectional analysis on French texts which have previously 
been tagged (e.g. by the Brill tagger). It is a small program
(60 KB zipped) and mainly rule-based: a lexicon of only 3000 words 
is used to deal with exceptions. It runs on PCs and workstations, 
under Unix, Linux or Windows 95/NT.

This is the second version. It differs from the first one as follows:
- bugs have been fixed,
- the input text can be tagged with either Brill or TreeTagger 
  (v1.1 accepted Brill only),
- the command line has been slightly modified.

See below for more details.

============
Availability
============
This program is provided within WinBrill, the 
Windows port of the BRILL tagger (trained for French).

If you want to use the FLEMM program alone, you need to provide it 
with a tagged text as input. At present, the only recognized 
tagsets are those of Brill 
(http://www.ciril.fr/INALF/inalf.presentation/analyseur.htm#Brill)
and TreeTagger 
(http://www.ims.uni-stuttgart.de/Tools/DecisionTreeTagger.html). 

===========
Description
===========
FLEMM computes the lemma of each inflected word (according 
to its tag) and also provides its main morphological features:
- gender and number for adjectives, determiners, participles
- number for nouns
- gender, number, person and case for pronouns
- number, person, tense, mood and conjugation group for verbs

The first table below summarizes the format of the analysed words for 
each grammatical category; the second lists the possible values of 
each feature:

======================================================================
Gram. cat. (TAG)  | Format
======================================================================
verbs (VCJ)       | InflectedWd/Tag:person:nb:tense:mood/Lemma:group/
======================================================================
participles,      |
adjectives, nouns,|
determiners,      |
relative pronouns |
(VPAR, VNCNT,     |
ADJ1PAR, ADJ2PAR, |
EPAR, APAR, ANCNT,|
ENCNT, ADJ, SBC,  |
DTN, DTC, REL)    | InflectedWd/Tag:gender:nb/Lemma/
======================================================================
personal pronouns |
(PRO/PRV)         | InflectedWd/Tag:person:gender:nb:case/Lemma/
======================================================================
Other categories  | Word/Tag/Word
======================================================================

======================================================================
Feature           | Possible values
======================================================================
person            | 1p, 2p, 3p, _
gender            | m, f, _
nb                | s, p, _
tense             | pst, impft, fut, ps
mood              | ind, subj, cond, imper
group             | 1g, 2g, 3g
case              | n, a, d, o, _
======================================================================

Remarks: 
--------
-	"_" is the undefined value.
-	The gender values are (m)asculine and (f)eminine; the number 
	("nb") values are (s)ingular and (p)lural.
-	The tense and mood values are morphosyntactic ones: 
	"pst, impft, fut, ps" respectively mean "present", "imperfect", 
	"future" and "simple past", and "ind, subj, cond, imper" stand 
	for "indicative", "subjunctive", "conditional" and "imperative".
-	The case values are (n)ominative, (a)ccusative, (d)ative and 
	(o)blique.
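
The feature codes above can be turned into readable labels with a small lookup table. The following is a minimal Python sketch, not part of FLEMM: the names FEATURE_NAMES, describe and describe_verb are hypothetical, and the glosses for the group codes ("first group", etc.) are an assumption based on the "conjugation group" wording above.

```python
# Hypothetical lookup table built from the feature table and remarks above.
FEATURE_NAMES = {
    "person": {"1p": "first person", "2p": "second person", "3p": "third person"},
    "gender": {"m": "masculine", "f": "feminine"},
    "nb": {"s": "singular", "p": "plural"},
    "tense": {"pst": "present", "impft": "imperfect",
              "fut": "future", "ps": "simple past"},
    "mood": {"ind": "indicative", "subj": "subjunctive",
             "cond": "conditional", "imper": "imperative"},
    # Assumed glosses: the README only calls these "conjugation groups".
    "group": {"1g": "first group", "2g": "second group", "3g": "third group"},
    "case": {"n": "nominative", "a": "accusative",
             "d": "dative", "o": "oblique"},
}


def describe(feature, code):
    """Return a readable label for one feature code ('_' means undefined)."""
    if code == "_":
        return "undefined"
    return FEATURE_NAMES[feature][code]


def describe_verb(features):
    """Decode a verb feature string of the form 'person:nb:tense:mood'."""
    keys = ("person", "nb", "tense", "mood")
    codes = features.split(":")
    if len(codes) != len(keys):
        raise ValueError("expected person:nb:tense:mood, got %r" % features)
    return {key: describe(key, code) for key, code in zip(keys, codes)}
```

For instance, describe_verb("3p:s:ps:ind") decodes the features of "offrit" in the Brill example of this README.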


Ambiguous analyses are factorized as disjunctive sets delimited by "{" 
and "}", with the alternatives separated by "|".

Examples (given in the Brill format):
ex1 :  {bruissant/PPRES:m:s/bruisser:1g/|bruissant/PPRES:m:s/bruire:3g/}
ex2 : allions/VCJ:1p:{impft:ind|pst:subj}/aller:3g/
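
A factorized analysis can be expanded back into the list of its plain alternatives. The Python sketch below is an illustration only, not FLEMM code; the function name expand is hypothetical, and it assumes that braces are never nested and that "{", "}" and "|" do not occur as literal characters in the analysed token.

```python
import re
from itertools import product

# One disjunctive set: '{' alternatives separated by '|' '}' (no nesting).
_GROUP = re.compile(r"\{([^{}]*)\}")


def expand(analysis):
    """Expand every {a|b|...} set and return all resulting analyses."""
    # re.split with one capturing group yields literal text at even
    # indices and the content of each brace group at odd indices.
    parts = _GROUP.split(analysis.strip())
    choices = []
    for index, part in enumerate(parts):
        if index % 2 == 0:
            choices.append([part])
        else:
            choices.append([alt.strip() for alt in part.split("|")])
    # The Cartesian product enumerates every combination of alternatives.
    return ["".join(combo) for combo in product(*choices)]
```

Applied to ex2 above, expand returns the two readings "allions/VCJ:1p:impft:ind/aller:3g/" and "allions/VCJ:1p:pst:subj/aller:3g/".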


=======
Example
=======
The example below shows the output produced from an input file in 
Brill format:

1) Brill tagged Input file :
----------------------------

La/DTN:sg IIIe/ADJ:sg République/SBC:sg nous/PRV:pl avait/ACJ:sg 
promis/VPAR:sg que/SUB$ la/DTN:sg Première/SBP:sg Guerre/SBP:sg 
mondiale/ADJ:sg serait/ECJ:sg aussi/ADV la/DTN:sg dernière/SBC:sg 
,/, "/" la/DTN:sg der/SBC:sg des/DTC:pl der/SBC:sg "/" ;/; pour/PREP 
tenir/VNCFF parole/SBC:sg ,/, elle/PRV:sg nous/PRV:pl offrit/VCJ:sg 
la/DTN:sg ligne/SBC:sg Maginot/SBP:sg ,/, qui/REL eut/ACJ:sg l'/DTN:sg 
utilité/SBC:sg que/SUB$ l'_on/PRV:sg sait/VCJ:sg ./. 
Mais/COO il/PRV:sg serait/ECJ:sg malvenu/ADJ:sg de/PREP gloser/VNCFF 
sur/PREP le/DTN:sg pitoyable/ADJ:sg désastre/SBC:sg de/PREP 1940/CAR ./. 
Mieux/ADV vaut/VCJ:sg se/PRV:++ souvenir/VNCFF de/PREP l'/DTN:sg 
éclatante/ADJ:sg victoire/SBC:sg de/PREP 1945/CAR ,/, victoire/SBC:sg 
célébrée/ADJ2PAR:sg "/" entre_nous/ADV "/" ,/, puisque/SUB de/PREP 
Gaulle/SBP:sg descendit/VCJ:sg seul/ADJ:sg les/DTN:pl 
Champs-Élysées/SBP:pl ,/, Churchill/SBP:sg restant/VNCNT à/PREP 
Londres/SBP:sg ,/, Roosevelt/SBP:sg infirme/ADJ:sg à/PREP 
Washington/SBP:sg ,/, et/COO Staline/SBP:sg à/PREP 
Moscou/SBP:sg ./. 

2) Brill formatted Output file :
--------------------------------

La/DTN:f:s/le IIIe/ADJ:f:s/iii République/SBC:_:s/république 
nous/PRV:1p:_:p:_/lui avait/ACJ:3p:s:impft:ind/avoir:3g 
promis/VPAR:m:s/promettre que/SUB$/que la/DTN:f:s/le 
Première/SBP/première Guerre/SBP/guerre mondiale/ADJ:f:s/mondial 
serait/ECJ:3p:s:pst:cond/être:3g aussi/ADV/aussi la/DTN:f:s/le 
dernière/SBC:_:s/dernière ,/, "/" la/DTN:f:s/le der/SBC:_:s/der 
des/DTC:_:p/du der/SBC:_:s/der "/" ;/; pour/PREP/pour tenir/VNCFF/tenir 
parole/SBC:_:s/parole ,/, elle/PRV:3p:f:s:{n|d|o}/lui 
nous/PRV:1p:_:p:_/lui offrit/VCJ:3p:s:ps:ind/offrir:3g la/DTN:f:s/le 
ligne/SBC:_:s/ligne Maginot/SBP/maginot ,/, qui/REL:_:_/qui 
eut/ACJ:3p:s:ps:ind/avoir:3g l'/DTN:_:s/le utilité/SBC:_:s/utilité 
que/SUB$/que l'_on/PRV:3p:m:s:_/l'_on 
sait/VCJ:3p:s:pst:ind/savoir:3g ./. 
Mais/COO/mais il/PRV:3p:m:s:n/lui serait/ECJ:3p:s:pst:cond/être:3g 
malvenu/ADJ:m:s/malvenu de/PREP/de gloser/VNCFF/gloser sur/PREP/sur 
le/DTN:m:s/le pitoyable/ADJ:_:s/pitoyable désastre/SBC:_:s/désastre 
de/PREP/de 1940/CAR/1940 ./. 
Mieux/ADV/mieux vaut/VCJ:3p:s:pst:ind/valoir:3g se/PRV:3p:_:_:{a|d}/lui 
souvenir/VNCFF/souvenir de/PREP/de l'/DTN:_:s/le 
éclatante/ADJ:f:s/éclatant victoire/SBC:_:s/victoire de/PREP/de 
1945/CAR/1945 ,/, victoire/SBC:_:s/victoire 
célébrée/ADJ2PAR:f:s/célébrer "/" entre_nous/ADV/entre_nous "/" ,/, 
puisque/SUB/puisque de/PREP/de Gaulle/SBP/gaulle 
descendit/VCJ:3p:s:ps:ind/descendre:3g seul/ADJ:m:s/seul 
les/DTN:_:p/le Champs-Élysées/SBP/champs-élysées ,/, 
Churchill/SBP/churchill restant/VNCNT:m:s/rester:1g à/PREP/à 
Londres/SBP/londres ,/, Roosevelt/SBP/roosevelt 
infirme/ADJ:_:s/infirme à/PREP/à Washington/SBP/washington ,/, 
et/COO/et Staline/SBP/staline à/PREP/à Moscou/SBP/moscou ./. 
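
For downstream processing, each token of the Brill-formatted output can be split into its fields. The following Python sketch is not part of FLEMM and makes several assumptions: the names Analysis and parse_token are hypothetical, tokens are separated by whitespace, a word never contains "/", and ambiguity sets that contain ":" (such as {impft:ind|pst:subj}) have been expanded beforehand.

```python
from typing import List, NamedTuple, Optional


class Analysis(NamedTuple):
    word: str
    tag: str
    features: List[str]
    lemma: Optional[str]
    group: Optional[str]


def parse_token(token):
    """Split one 'Word/Tag:features/Lemma:group' token into its fields."""
    fields = token.split("/")
    # The format table shows a trailing '/', the example output does not.
    if len(fields) > 3 and fields[-1] == "":
        fields = fields[:-1]
    if len(fields) == 2:
        # Punctuation in the example output appears as 'Word/Tag' only.
        return Analysis(fields[0], fields[1], [], None, None)
    if len(fields) != 3:
        raise ValueError("unexpected token: %r" % token)
    word, tag_part, lemma_part = fields
    tag, *features = tag_part.split(":")
    lemma, _, group = lemma_part.partition(":")
    return Analysis(word, tag, features, lemma, group or None)


def parse_text(text):
    """Parse a whole output text, one Analysis per whitespace-separated token."""
    return [parse_token(token) for token in text.split()]
```

With the output above, parse_token("avait/ACJ:3p:s:impft:ind/avoir:3g") gives the word "avait", the tag "ACJ", the features ["3p", "s", "impft", "ind"], the lemma "avoir" and the group "3g".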

The example below shows the output produced from an input file in 
TreeTagger format:

1) TreeTagger tagged Input file :
---------------------------------

C'	PRO:demo:pred	ce
est	VER:pres	être
en	PRE	en
Egypte	NOM	<unknown>
,	PON:comma	,
vers	PRE	vers
la	DET:def	le
fin	NOM	fin
de	PRE	de
la	DET:def	le
guerre	NOM	guerre
,	PON:comma	,
que	PRO:rela	que
je	PRO:pers:conj	je
fis	VER:simp	faire
la	DET:def	le
connaissance	NOM	connaissance
de	PRE	de
Sophia	NOM	<unknown>
.	PON:sep	.

2) TreeTagger formatted Output file :
-------------------------------------

C'	PRO(demo:pred):3p:_:_:n	ce
est	VER(pres):3p:s:pst:ind	être:3g
en	PRE	en
Egypte	NOM:_:s	egypte
,	PON:comma	,
vers	PRE	vers
la	DET(def):f:s	le
fin	NOM:_:s	fin
de	PRE	de
la	DET(def):f:s	le
guerre	NOM:_:s	guerre
,	PON:comma	,
que	PRO(rela):_:_	que
je	PRO(pers:conj):1p:_:s:n	lui
fis	VER(simp):{1|2}p:s:ps:ind	faire:3g
la	DET(def):f:s	le
connaissance	NOM:_:s	connaissance
de	PRE	de
Sophia	NOM:_:s	sophia
.	PON:sep	.
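
The TreeTagger-formatted output has one token per line with three tab-separated columns, so it can be read as follows. This Python sketch is an illustration under stated assumptions, not FLEMM code: the name parse_tt_line is hypothetical, and the layout (original subtype in parentheses, then FLEMM features, then "lemma:group") is inferred from the example above only. Categories that FLEMM leaves unanalysed keep their original tag, so for a tag such as "PON:comma" the original subtype cannot be told apart from a feature and ends up in the feature list.

```python
import re

# category, optional '(subtype)', optional ':feature:feature...'
_TAG = re.compile(r"^([^(:]+)(?:\(([^)]*)\))?(?::(.*))?$")
# lemma followed by a conjugation group (1g, 2g or 3g)
_LEMMA = re.compile(r"^(.+):([123]g)$")


def parse_tt_line(line):
    """Parse one 'word<TAB>tag<TAB>lemma' line of the output."""
    columns = line.rstrip("\n").split("\t")
    if len(columns) != 3:
        raise ValueError("expected 3 tab-separated columns: %r" % line)
    word, tag, lemma_part = columns
    tag_match = _TAG.match(tag)
    if tag_match is None:
        raise ValueError("unexpected tag: %r" % tag)
    category, subtype, rest = tag_match.groups()
    lemma_match = _LEMMA.match(lemma_part)
    if lemma_match:
        lemma, group = lemma_match.groups()
    else:
        lemma, group = lemma_part, None
    return {
        "word": word,
        "category": category,
        "subtype": subtype,
        "features": rest.split(":") if rest else [],
        "lemma": lemma,
        "group": group,
    }
```

For the line "fis", the result has category "VER", subtype "simp", features ["{1|2}p", "s", "ps", "ind"], lemma "faire" and group "3g"; the ambiguous person value is left factorized.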

=====================
Other Functionalities
=====================
FLEMM also detects and corrects some segmentation and tagging 
errors. On request (see the --log option below), the detected 
errors and the corresponding corrections are reported in 
separate log files.

Examples : 

1) tagging log file
-------------------

phytoplancton / VNCFF ==>  phytoplancton/SBC
phytoplanctivores / ADJ2PAR ==>  phytoplanctivores/ADJ

2) Segmentation log file
-------------------------

,inhibiteurs  est réduit à inhibiteurs (SBC) 
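
The tagging log can be summarized by counting how often each tag is rewritten into another. The Python sketch below is not part of FLEMM; the names parse_tagging_log_line and count_corrections are hypothetical, and the line layout "word / OLDTAG ==> word/NEWTAG" is assumed from the two sample lines above only.

```python
import re
from collections import Counter

# Assumed layout: 'word / OLDTAG ==>  word/NEWTAG'
_TAG_FIX = re.compile(r"^(\S+)\s*/\s*(\S+)\s*==>\s*(\S+)/(\S+)$")


def parse_tagging_log_line(line):
    """Return (word, old_tag, new_tag), or None if the line does not match."""
    match = _TAG_FIX.match(line.strip())
    if match is None:
        return None
    word, old_tag, _corrected_word, new_tag = match.groups()
    return word, old_tag, new_tag


def count_corrections(lines):
    """Count the (old_tag, new_tag) pairs found in a tagging log."""
    counts = Counter()
    for line in lines:
        parsed = parse_tagging_log_line(line)
        if parsed is not None:
            counts[(parsed[1], parsed[2])] += 1
    return counts
```

On the two sample lines, count_corrections reports one VNCFF to SBC correction and one ADJ2PAR to ADJ correction.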

=================
Program structure
=================

flemm.perl
entrees_sorties (reformatting Brill notations)
entrees_sortiesTT (reformatting TreeTagger notations)
lemmatizer
exceptions
EXCEP/

The startup program file is "flemm.perl". It calls either the 
"entrees_sorties" module or the "entrees_sortiesTT" module, according 
to which option is given (see below); both modules deal with 
input and output formats, and in turn call the main linguistic 
module, "lemmatizer". 
This module performs morphological analysis and calls the exception 
lists handler ("exceptions" module, which examines the exception 
files in the "EXCEP" directory).

==============
Command line :
==============

perl flemm.perl --entree INPUT_FILE 
               [--repertoire PROGRAM_DIRECTORY]
               [--sortie OUTPUT_FILE]
               [--log |--nolog]
               [--tagger TAGGER_NAME]

All option values (INPUT_FILE, PROGRAM_DIRECTORY, OUTPUT_FILE) are 
file or directory paths, except TAGGER_NAME, which is a simple identifier.
The --entree INPUT_FILE option is mandatory. 
The other options are optional, as indicated by '[ ... ]'. 

- if the --repertoire option is not provided, the FLEMM installation 
  directory is assumed to be the current directory ( . ).
- if the --sortie option is not provided, the output file defaults 
  to the INPUT_FILE path with the ".lemm" extension appended.
- if --log is given, the files INPUT_FILE.seg and INPUT_FILE.etiq are 
  created; they store respectively the segmentation errors and the 
  tagging errors detected and corrected by the program. 
  If --nolog is given, or neither option, no log file is produced.
- if --tagger is omitted, the program assumes the input file to 
  be tagged with Brill. If any value is given to --tagger, 
  the program assumes the input file to be tagged with 
  TreeTagger.
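
The default rules above can be restated as a small function. This is a Python sketch of the documented behaviour, not FLEMM's own option handling; the function name resolve_options and the keys of the returned dictionary are hypothetical.

```python
def resolve_options(entree, repertoire=None, sortie=None, log=False, tagger=None):
    """Apply the documented defaults to a set of command-line options."""
    if log:
        # --log: one file for segmentation errors, one for tagging errors.
        log_files = {"segmentation": entree + ".seg",
                     "tagging": entree + ".etiq"}
    else:
        # --nolog or neither option: no log file.
        log_files = {}
    return {
        # Default installation directory is the current directory.
        "repertoire": repertoire if repertoire is not None else ".",
        # Default output file is the input path plus '.lemm'.
        "sortie": sortie if sortie is not None else entree + ".lemm",
        "log_files": log_files,
        # Any value given to --tagger selects TreeTagger.
        "tagger": "Brill" if tagger is None else "TreeTagger",
    }
```

For example, resolve_options("f:/DATA/fic1", log=True) yields the output file "f:/DATA/fic1.lemm", the log files "f:/DATA/fic1.seg" and "f:/DATA/fic1.etiq", and the Brill tagger.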

Examples of command lines, each followed by the message displayed on 
the standard output before the analysis starts:

Example 1
=========
f:\LEMMAT\PGM> perl flemm.perl --entree f:/DATA/fic1 --log

Valeur par defaut de l'adresse d'installation de FLEMM : .
Fichier d'entree : f:\DATA\fic1

L'etiqueteur est Brill

Par defaut, le fichier de sortie s'appelle : f:\DATA\fic1.lemm
Les fichiers log s'appellent f:\DATA\fic1.etiq (etiquetage) et 
f:\DATA\fic1.seg (segmentation)

Example 2
=========
f:\LEMMAT> perl flemm.perl --entree f:/DATA/fic1 
           --repertoire ./PGM --sortie ./fic1.out --tagger TT


Repertoire d'installation de FLEMM : .\PGM
Fichier d'entree : f:\DATA\fic1

L'etiqueteur est TreeTagger

Fichier de sortie : f:\LEMMAT\fic1.out
Les fichiers log s'appellent f:\DATA\fic1.etiq (etiquetage) et 
f:\DATA\fic1.seg (segmentation)
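
When FLEMM is driven from another program, the command line can be assembled as an argument list. The Python sketch below is a hypothetical wrapper, not something shipped with FLEMM: the function name build_command and the perl and script parameters are assumptions, and only the options documented above are used.

```python
def build_command(entree, repertoire=None, sortie=None, log=None,
                  tagger=None, script="flemm.perl", perl="perl"):
    """Build the argument list for one FLEMM run.

    log=True adds --log, log=False adds --nolog, log=None adds neither.
    """
    command = [perl, script, "--entree", entree]
    if repertoire is not None:
        command += ["--repertoire", repertoire]
    if sortie is not None:
        command += ["--sortie", sortie]
    if log is True:
        command.append("--log")
    elif log is False:
        command.append("--nolog")
    if tagger is not None:
        command += ["--tagger", tagger]
    return command


# Possible use (requires perl and FLEMM to be installed):
#   import subprocess
#   subprocess.run(build_command("f:/DATA/fic1", log=True), check=True)
```

build_command("f:/DATA/fic1", log=True) reproduces the command of Example 1, and passing repertoire="./PGM", sortie="./fic1.out" and tagger="TT" reproduces that of Example 2.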






