TWiki
>
FST Web
>
FstEfficiency
(2017-01-23,
KyleGorman
)
(raw view)
E
dit
A
ttach
---+ <nop>Efficiency By reading the [[FstQuickTour][Quick Tour]] and [[FstConventions][Conventions]] and working through the [[FstExamples][examples]], users typically can create _correct_ implementations of applications. What requires more experience is creating _efficient_ implementations. ---++ Efficient Representations * *FST Type:* The specific [[FstAdvancedUsage#Fst_Types][FstType]] can affect both time and space in an application. * _Mutable:_ =VectorFst= is recommended for general mutations, while =EditFst= can be used for small changes to an existing FST. * _Immutable:_ =ConstFst= uses similar space to =VectorFst= but is much faster to read and write and possibly to access since it has less fragmentation. The =CompactFst= types permit smaller memory and file representations of acceptors, of unweighted acceptors and transducers and weighted and unweighted strings. These types admit optional memory-mapping (coming soon). * _Specialized:_ For some applications, specialized FST types can be created that are tailored to the structure of the automaton. One relatively simple way to do this is to extend the =CompactFst= types by defining a new compactor. A more general, but complex, way is to define an entirely new FST type. For example, =NGramFst= can be used to very compactly represent n-gram language models. This type admits memory-mapping. * *Arc Type:* A particularly simple way to improve the representation size is to use an [[FstAdvancedUsage#Arcs][Arc]] that has =StateId=, =Label=, and =Weight= types appropriately sized to a problem. * *Interface Type:* When creating [[FstAdvancedUsage#State_Iterators][state]] or [[FstAdvancedUsage#Arc_Iterators][arc iterators]], the specific [[FstQuickTour#OperationCallingCpp][interface type]] can greatly affect efficiency. Using the most specific Fst type as the template argument of the iterators will access specializations that avoid virtual function and other overhead of more generic calls. E.g., prefer =ArcIterator<VectorFst<Arc>>(fst, s)= to =ArcIterator<Fst<Arc>>(fst, s)=, =ArcIterator<ExpandedFst<Arc>>(fst, s)=, or =ArcIterator<MutableFst<Arc>>(fst, s)= if performance is critical. ---++ Efficient Algorithms ---++++ General Issues * *Association Order:* Although an operation may be associative, efficiency may greatly vary depending on the order of application. For example, prefer =(A ∪ B) ∪ C= to =A ∪ (B ∪ C)= if =A= and =B= are small and =C= is large. Similar considerations apply to concatenation and intersection/composition. * *Optimization:* What optimizations should be applied is very problem-dependent. Epsilon removal, determinization and minimization of an unweighted automaton will give in the smallest, irredundant result. However this could be exponentially larger than the (non-deterministic) input, so applying some or none of these operations may be preferred. In the weighted and/or transducer case, the input may not even be determinizable although it can be determinized and minimized as an appropriately [[EncodeDecodeDoc][encoded]] automaton. * *Eager vs Delayed Computation:* If only a small portion of the result is visited (e.g., of a composition), then prefer [[FstQuickTour#OperationCallingCpp][delayed]] computation (e.g., invoke =ComposeFst=). On the other hand, if much of the result is visited, then prefer the eager version (e.g., invoke =Compose=) due to lower overhead. * *Caching:* Various operations such as the [[FstQuickTour#OperationCallingCpp][delayed]] FST operations and the [[FstAdvancedUsage#Fst_Types][CompactFst]] classes may use [[FstAdvancedUsage#Caching][caching]] to avoid recomputation on repeated access. The size of the cache and whether (LRU) garbage collection is performed is controlled by options. In some cases, caching can also be disabled per state by [[FstQuickTour#Arc_Iterators][arc iterator]] flags. * *Properties:* An FST maintains stored [[FstAdvancedUsage#Properties][property bits]] such as whether it is an acceptor, acyclic, or input-label sorted. Many algorithms check for various of these properties, which will be computed in linear time if they are not already stored. This linear pass can thus be avoided if the needed bits are already set (e.g., by the user). #AlgoSpecificEfficiency ---++++ Algorithm-Specific Issues * *Composition:* The choice of the [[FstAdvancedUsage#Matchers][matcher]], [[FstAdvancedUsage#Composition_Filters][filter]], and [[FstAdvancedUsage#State_Tables][state table]], all which may be changed by options, can have significant affect on the efficiency of [[ComposeDoc][composition]]: * _Matchers:_ * _SortedMatcher:_ By default, the composition algorithm matches labels using binary search. This requires one or both FSTs to be appropriately label-sorted and this, in turn, requires property checking. If only one FST is sorted or, in fact, only one FST has the sorted property bit set, then that FST will be used for the binary search. In this case, it is important that this FST has the higher out-degree for efficient matching. * _SigmaMatcher,RhoMatcher:_ These matchers can be used to reduce time and space for matching _all_ or _otherwise unmatched_ labels at a state. * _ArcLookAheadMatcher,LabelLookAheadMatcher:_ These matchers can be used to reduce time by checking for a match with the following label or the following non-epsilon label on a path. * _Filters:_ * _SequenceComposeFilter,AltSequenceComposeFilter,MatchComposeFilter:_ These filters control how epsilons are matched and can give very different sized (if equivalent) composition results. * _LookAheadFilter_: needed with lookahead matchers * _State Tables:_ By default, a hash mapping is used from state pair to composition state ID. Other data structures can give better performance in specialized cases. Several alternative state tables are provided by the library. * *Determinization:* The pruning thresholds that can be passed to [[DeterminizeDoc][Determinize]] can greatly help in the determinization of acyclic weighted automata. For example, suppose the input has already been pruned (e.g. with [[PruneDoc][Prune]]) to a weight threshold of θ. If =Determinize= is passed the same threshold, then it will be used _during_ determinization to control it but with an effect the same as if =Prune= were called on the result. Using the same threshold will have an effect since determinization can change the structure of the result. * *Epsilon Removal:* By default, the [[RmEpsilonDoc][epsilon removal]] algorithm reassigns/copies non-epsilon transitions that _follow_ epsilon paths. An alternate choice is to reassign/copy non-epsilon transitions that _precede_ epsilon paths, which can be requested with the =reverse=true= option. This latter choice is equivalent to performing the former on the [[ReverseDoc][reverse]] of the FST. The two methods can give very different sized results depending on the input. * *Shortest Path/Distance:* The shortest path/distance algorithms use a [[FstAdvancedUsage#State_Queues][queue discipline]] to determine the order of expansion of states: * _LifoQueue:_ when the input is unweighted * _StateOrderQueue:_ when the input is topologically-ordered * _TopOrderQueue:_ when the input is acyclic, but not topologically-ordered * _ShortestFirstQueue:_ when the input is cyclic, but has no negative weights * _FifoQueue:_ otherwise * %ICON{hand}% Thus, the queue is selected based on the FST properties. This choice, with possible property testing, can be overridden by passing the queue as an option. With the =ShortestFirstQueue=, the shortest path algorithm can be passed the =first_path=true= option to stop when the first solution is discovered (correct if the weights are between <span style="text-decoration:overline">0</span> and <span style="text-decoration:overline">1</span>). This is a case where delayed computation of the FST being searched may be preferred. -- Main.MichaelRiley - 12 May 2012
E
dit
|
A
ttach
|
Watch
|
P
rint version
|
H
istory
: r8
<
r7
<
r6
<
r5
<
r4
|
B
acklinks
|
V
iew topic
|
WYSIWYG
|
M
ore topic actions
Topic revision: r8 - 2017-01-23
-
KyleGorman
FST
Log In
or
Register
FST Web
Create New Topic
Index
Search
Changes
Notifications
Statistics
Preferences
Webs
Contrib
FST
Forum
GRM
Kernel
Main
Sandbox
TWiki
Main
Copyright © 2008-2024 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding TWiki?
Send feedback