Query language
Introduction
The Algorithms Library Toolkit is an opensource project aiming to provide an implementation of algorithms from areas including automata theory, stringology, arbology and others. The project started with bachelor theses of students of Faculty of Information Technology of Czech Technical University in Prague. The original idea behind the library, formerly called Automata Library, is to provide another source material for students – show the implementation of algorithms that manipulate automata. Soon it got extended with algorithms that use automata – mainly from the area of stringology and arbology. Later, the library was used to base implementation of some state-of-the-art algorithms including efficient determinisation of subclass pushdown automata and some tree searching and indexing algorithms.
The library is designed to be mostly about stateless algorithms which manipulate datastructures. Data manipulated by algorithms can vary from automata through regexps to strings and others. The design allows a simple extension of functionality. If a new algorithm manipulates already-existing datatype, it immediately fits into the ecosystem. Of course, new datatypes can be introduced as well. Datatypes and algorithms are designed to be maximally independent of each other, which allows easier maintenance.
Besides using algorithms to manipulate data, the datatypes allow casting when appropriate, i.e. DFA to NFA and similar.
The user interaction with the library is through a built-in interactive command line interface or a graphical interface.
The command line interface aql2
binary allows interaction with all algorithms through an interactive command line interface.
The syntax of the language is similar to shell known from Unix operating systems.
Internally it handles the passing of algorithm’s results to other algorithm’s parameters directly without any transformation or manipulation.
Such an approach allows the fastest interpretation of the algorithm interconnection.
Even though the command line interface tries to hide some implementation details, a user can feel the fact that the implementation language of the library is C++ in some cases.
For example, the command line interface is aware of datatypes and it is also unable to handle templates (used to design datatypes and algorithms) otherwise than statically.
This, however, does not influence the overall functionality.
The command line interface binary at first automatically loads existing datatypes and algorithm provided by the linked libraries. The command line interface also provides a load and an unload command to load and unload an arbitrary library which can contribute to currently available datatypes and algorithms.
The command line interface is planned to be extended to support some procedural like language which may cause a drop of backward compatibility.
A graphical interface allowing to connect algorithms in a complex algorithm is also provided.
The toolkit is extensible and as long as the added datatype or algorithm connects itself to already existing code via casts or conversion algorithms, the extension can benefit from features already implemented.
Structure of the code
The library consists of modules compiled to libraries and some accessors represented by binaries.
Modules provide the functionality – either some core code, datatypes, or algorithms.
Accessors are either the command line interface binary aql2
or graphical interface.
Structure and overall description of modules
Each module can be compiled separately.
It shares a common configuration file used by the CMake generator.
Source files of the module are placed into src
directory.
Modules are tested with appended unit tests.
Unit tests are placed in a separate directory test-src
.
Integration tests are in tests/
subdirectory of the root directory.
Existing modules
The algorithm library toolkit currently consists of three types of modules: core, feature, and experimental.
The core modules include:
alib2std
(C++ standard library extensions),alib2measure
(support for measurements),alib2abstraction
(storage of registered datatypes and algorithms,alib2common
(base datatypes),alib2xml
(xml export and import of basic datatypes), andalib2cli
(command line interface implementation).
The feature modules include
alib2data
(automata, grammar, regexps, and other datatypes),alib2aux
(helper algorithms),alib2str
(parsing and composing a string representation of some datatypes),alib2raw
(parsing and composing of a raw representation of tree and string datatypes),alib2algo
(most of the algorithms).
The experimental modules are available mainly for testing purposes:
alib2data_experimental
(experimental datatypes),alib2algo_experimental
(experimental algorithms),alib2elgo
(more efficient implementation of some algorithms),alib2graph_data
(graph datatypes),alib2graph_algo
(graph algorithms), andalib2dummy
(playground library).
Standard library extensiton
The module alib2std
provides extensions to the C++ standard library.
The extensions are placed in namespace ext
, not to collide with the same classes, functions… in namespace std
and also because the standard disallows placing anything new into the std namespace except specialisations of templated types.
This module exists to simplify some common operations with standard library containers or to serve as a place for backported code from newer standards then used currently. To name a few examples now consider a three-way comparison of standard library containers, a print of containers to standard stream with an overloaded operator, implementation of copy on write shared pointer, and some standard library containers modified to store an array of pointers instead of an array of values, while providing almost the same interface.
To use these extended features use prepared includes.
For example to use extensions of a standard vector include alib2/vector
.
alib2measure
The measuring module contains datastructures needed for measurements.
Statistics of code execution can be measured using this module when measure
include is used.
Measurements are represented by frames where each frame can contain subframes.
Each frame remembers an amount of time spent within the frame (not including the subframes), a difference in memory usage caused by dynamic allocations and deallocations, and value general purpose counters.
alib2abstraction
Abstraction is one of the most fundamental concepts of the library.
Implementation of it is generalised to a separate module to allow reuse if needed.
This module greatly co-operates with alib2cli
module.
Every algorithm, cast operation, and datatype implemented in the algorithm library toolkit is so-called registered to the internal structures of the abstraction module. The registration allows later execution of each algorithm from the command line interface.
The registration is achieved through the construction of global variables placed to unnamed namespaces or via a function call.
The abstraction itself is universally represented by classes OperationAbstraction
and ValueProvider
.
OperationAbstraction
classes can be interlinked together in representation of algorithm (or any other entity) and parameter.
alib2common
Basic datastructures including representation of primitive datatypes, standard containers, exceptions, and common Object
wrapper of any datatype in the library’s type hierarchy is implemented in this module.
Some core functionality like visitor pattern support classes, components classes, and stack trace printing handler of segmentation fault is implemented here as well.
alib2xml
XML is one of the most common communication formats nowadays.
This module provides a base for XML parsing and composing with actual parsing and composing callbacks for basic datatypes from alib2common
module.
This module also provides some registration facility similar to the ones in the abstraction module. XML composing and parsing (including parsing of raw sets of given type) is registered here.
alib2data
Nontrivial datatypes are implemented in this module. The state of implementation is historically like so. The module is a subject of change in later versions of the library so that each group of datatypes is separated into a standalone module. So far everything is in one module.
Datatypes present include some support types like labels and alphabet types, some main datatypes including variants of strings, trees, automata, grammars, regexps, regular tree expressions, and some string or tree indexes.
This module also contains code extending the XML module, hence all datatypes from the module are registered within the XML registration structures.
alib2aux
Auxiliary or helper functions of comparing automata, grammars, or string, and conversion of automata to DOT, GasTeX, LaTeX, or TikZ formats are located in this module. These helper functions are registered as algorithms into the abstraction module.
alib2str
Similar to the alib2xml
, datatypes can be converted to and from strings.
Such string representation is currently designed for finite automata (DFA, NFA, EpsilonNFA and NFA with multiple initial states), grammars (Regular and Context-Free), regular expressions, strings, and trees.
There is also a registration facility present in the module to allow interfacing command line interface.
alib2raw
Some datatypes like strings and trees correspond to some basic file formats. String correspond to any file, tree to XML. This module allows interaction between mentioned datatypes and files of the given format: either reading the file and creating the representation of data or creating the file based on some provided data.
alib2algo
This is the main module where algorithms are implemented. Right now most of the algorithms are present in this module. Algorithms include automata, grammar, regexp operation and conversions. String and tree matching and indexing. Some random data generators are implemented for trees and automata.
Most of the algorithms are also registered so that the command line interface can call those. Those not registered ones are either support algorithms, where it does not make much sense or algorithms accepting something not yet supported on the command line interface level (like nontrivial function callback).
alib2cli
The module is responsible for parsing of the command line interface commands. The module implements a simple LL1 grammar-based parser (and lexer) as recursive descent parser producing an internal representation of the command as an abstract syntax tree. The representation can be converted to graph consisting of abstractions over algorithm, file reads and writes, etc.
The command line interface language is limited, however, the language will be extended to support procedures in future releases. It supports some introspection into the registered algorithm, their overloads, known datatypes, and casts. Execution of colon of commands is line-by-line. Each line can be quite complex, but there is no procedural extension, except variables.
The language is described in more details later.
Concepts used in the implementation
Components
Many data structures are defined as an n-tuple where the components are of different types and have some defined constraints. The components concept is present to aid with the design of data structures having this exact definition. A components concept is still in a stage of proof of concept. It does not support different types than sets and values. When extended, it will support maps, vectors, and trees to allow complete definition of most data structures.
Component as a base of datastructure
To compose a datatype using components requires to derive your datatype from a templated class code::Components
.
The code::Components
class uses the Curiously recurring template pattern (CRTP) in its first template parameter.
Next parameters come three at a time.
The first of the triplet is defining the internal datatype of the component, the second is the component behaviour scheme, and the third is a name (or names) of the component.
template < class SymbolType, class StateType >
class DFA final : public AutomatonBase, public core::Components < DFA < SymbolType, StateType >, ext::set < SymbolType >, module::Set, InputAlphabet, ext::set < StateType >, module::Set, std::tuple < States, FinalStates >, StateType, module::Value, InitialState >;
In this example, the deterministic finite automaton is constructed from a set of symbols represented by InputAlphabet
component, two sets of states represented by States and FinalStates
components, and a state represented by InitialState
component.
The internal type of component must provide a common interface required by the component behaviour scheme.
So far supported schemes are module::Set
and module::Value
.
The component set scheme expects the internal datatype to implement the interface of a set from the standard library.
The component value scheme expects the internal datatype to behave like a primitive type.
The name of the component is specified by a class name, where typically the class used here is incomplete. The class name can be used directly in case the component is contained only once or if more components have the same structure and behaviour in the datatype definition a tuple of class names can be used as well. If more names are provided, the resulting components are instantiated for each name in a set, and they are therefore independent instances.
Access to component content
Depending on the behaviour scheme of the component, each component provides some simplified interface to the underlying data type.
The common operations of all components are get
and set
, to get the data and set the data as a whole.
The module::Set
scheme additionally supports empty
, remove
, and add
, where these are equivalent to standard set operations empty
, erase
, and insert
.
Constraint specification
To maintain consistency of datatypes constructed from components, some constraints need to be specified. These constraints are specified by additional code inside a class specialised with the concrete data type and component name. Given the component behaviour scheme, the constraints may differ.
The module::Value
behavior scheme requires existence of available and valid static methods inside the ElementConstraint
class.
The method available represents a check that the value to be set is available in other related components.
The valid method represents an additional check for the validity of the set object.
If needed, the method is supposed to throw an exception.
template<class SymbolType, class StateType >
class ElementConstraint< automaton::DFA<SymbolType, StateType>, StateType, automaton::InitialState > {
public:
static bool available ( const automaton::DFA<SymbolType, StateType> & automaton, const StateType & state ) {
return automaton.getStates ( ).count ( state );
}
static void valid ( const automaton::DFA<SymbolType, StateType> &, const StateType & ) {
}
};
The module::Set
behavior scheme requires existence of used, available, and valid static methods inside the SetContraint
class.
The methods available and valid are the same as in the case of the module::Value
behaviour scheme.
The additional method used is a representation of a check whether the removed object from the set component is used in other related components.
template<class SymbolType, class StateType >
class SetConstraint< automaton::DFA<SymbolType, StateType>, StateType, automaton::FinalStates > {
public:
static bool used ( const automaton::DFA<SymbolType, StateType> &, const StateType & ) {
return false;
}
static bool available ( const automaton::DFA<SymbolType, StateType> & automaton, const StateType & state ) {
return automaton.getStates ( ).count ( state );
}
static void valid ( const automaton::DFA<SymbolType, StateType> &, const StateType & ) {
}
};
Abstraction
As mentioned, the alib2abstraction
module provides a facility to register algorithms, casts, datatypes, etc.
to its internal structures for later execution on demand from command line interface.
Here in this section more detailed description of the use and internal behaviour of the abstraction concept is provided.
First, the documentation focuses on the overview of the concept. Next, the registration of algorithms, Last, the casts and variables printing and other possibilities.
Abstraction concept overview
To provide an on-demand execution of algorithm based on the algorithm name and algorithm parameters, a lookup within the available algorithms must be possible. The C++ language does not provide any form of introspection that would allow detection of the existence of methods within a class. Similarly, detection of the number of parameters, their types, qualifications of a given method is not available in the C++ language as well.
The abstraction concept is designed to address this limitation of C++ language by registering available algorithms, casts, etc. internally, via some registration calls and maintaining registered callables.
There are two approaches to registration in the abstraction module. Variables of registration class type, when constructed, carry on information about the registered algorithm to the inside of the abstraction module via a call of registration function. Such an approach was chosen to easily hook some code before the execution of the main function so that all registrations are done beforehand at a load time of a shared library. Registration function can also be called directly from any context to avoid the creation of a global variable. Hence any algorithm can be registered at any time before or during the execution of the main function.
Execution of algorithms
The abstraction is in general mostly about algorithms and their execution. The algorithm to execute is specified by its name and parameter types (also with the category which is however unused now). Then a list of overloads of the algorithm is selected from registered ones. The set is filtered based on the number of actual parameters, parameter types. The best candidate is selected and returned for execution. Effectively the overload resolution implemented within the abstraction concept is capable of multiple dispatch.
Such a selection should also take into account casts between actual parameters and formal parameters of individual overloads to establish the quality of overloads. Qualifiers such as const, volatile, etc. and parameter passing by r-value reference, l-value reference, or by value are also limits of viable overloads to investigate further.
The abstraction is not fully implemented yet, but it tries to be as close to C++ language behaviour as possible. The current implementation does not fully implement the selection algorithm, but in final implementation, the selection should mimic the C++ function overload selection scheme completely. So far if the selection finds two or more viable overloads, an error is reported, even though best of those overloads should be selected.
In the registration, the algorithm name is provided via template parameter. Hence the algorithm name is derived from an existing type. The type can be either a class, in that case, the algorithm name is the class name and the algorithm itself is categorised by possible namespaces. The class can be itself templated, in which case, the templates are also registered as part of the name. During the execution of the algorithm, these templates must be specified along with the algorithm name to successfully find it.
Registration of algorithms via constructors
The registration of algorithms is performed via registration::AbstractRegister
class.
The class is templated with types Algorithm
, ReturnType
, and using variadic template also ParameterTypes
.
The Algorithm type will serve as a source of identifier naming the algorithm.
ReturnType
and ParameterTypes
are exact types of returned value and parameters including const
qualifications, reference specifiers, etc.
A constructor of the class accepts a function pointer, optionally algorithm category (unused yet), and optionally strings representing parameter names (up to the number of parameters specified by templates).
Parameter names default to arg0
to argn
.
The function pointer must exactly match the requirements set up by ReturnType
and ParameterTypes
.
auto RegExpEmptyFormalRegExp = registration::AbstractRegister < regexp::properties::RegExpEmpty, bool, const regexp::FormalRegExp < > & > ( regexp::properties::RegExpEmpty::languageIsEmpty );
Already registered algorithm, cast, printer of datatype can be re-registered as an algorithm.
Such a registration is available via registration::WrapperRegister
class.
The class is templated with types Algorithm and ParameterTypes with the same meaning as in the case of AbstractRegister.
The class constructor accepts a function pointer to a delegate function capable of looking up the registered algorithm, cast, …
Also, parameter names can be specified as well.
auto xmlParse = registration::WrapperRegister < xml::Parse, ext::deque < sax::Token > && > ( xml::Parse::abstractionFromTokens, "arg0" );
Last option is registration by a method.
The class allowing this kind of registration is registration::MethodRegister
.
It is parametrised again with types Algorithm
, ReturnType
, additionally ObjectType
, and again ParameterTypes
.
The ObjectType
is a decayed type of object on which to call the registered method.
A constructor of the method registration class accepts a pointer to a method, string representing the method name, and optionally names of parameters.
auto fooBar = registration::MethodRegister < Foo, int, Foo, int > ( & Foo::bar, "bar" );
The registration::AbstractRegister
and registration::MethodRegister
also register the exact return type for a normalisation automatically.
Registration of algorithms via function call
The basic registration is of a callback to the algorithm.
The registration is equivalent to the use of registration::AbstractRegister
class.
The call is to a function abstraction::AlgorithmRegistry::registerAlgorithm
which is parametrized with following types: Algorithm
, ReturnType
and ParameterTypes
.
As parameters, the registration accepts the callback of the algorithm, the category and parameter names.
The same requirements for the callback as specified for the registration via constructor apply.
The parameter names are accepted in the form of a standard C++ array of strings of size equal to the number of parameters.
abstraction::AlgorithmRegistry::registerAlgorithm < Divide > ( Divide::divide, abstraction::AlgorithmCategories::AlgorithmCategory::DEFAULT, std::array < std::string, 2 > ( "numerator", "denominator" ) );
A wrapper of already registered abstraction can be registered as an algorithm via function call too.
The function is abstraction::AlgorithmRegistry::registerWrapper
and it is templated by Algorithm
and ParameterTypes
.
The call accepts a callback to the same delegate function as in the case of registration via a constructor.
Parameter names must be specified by a string array of size equal to the number of parameters (ParameterTypes
template).
abstraction::AlgorithmRegistry::registerWrapper < xml::Compose, const Type \& > ( xml::Compose::abstractionFromType, std::array < std::string, 1 > { { "arg0" } } );
A method can be registered via abstraction::AlgorithmRegistry::registerMethod
function.
The template parameters are Algorithm
, ObjectType
, ReturnType
, and ParameterTypes
.
The call parameters are a method pointer, a method name, and parameter names.
Note that method effectively needs an extra parameter which is the object call the method is called on.
abstraction::AlgorithmRegistry::registerMethod < ComponentName > ( getMethod, "get", emptyNames );
Registration of functions does not register for normalisation.
Registration of casts
Full-featured selection of an algorithm to call, as implemented in C++ overload resolution algorithm, uses information about casts. Abstraction can keep track of casts available for such a purpose. The cast can also be executed manually when algorithm overload resolution would report multiple call candidates.
The cast is, in general, a conversion from one type to another. Again there are two possible registration approaches either via the creation of a global variable or a function call. The cast can later be referenced using the string representing the target type.
Registration of casts via constructor
Registration of cast can be execute via creation of registraion::CastRegister
class.
Template parameters To and From of the class specify the two types participating in the cast.
The class can be constructed either using the default constructor or with a constructor accepting a so-called cast function as a pointer to function.
The cast function must take a single argument of constant reference type identified by From
template parameter and return the type specified by To
template parameter.
Both constructors use the registration via function call internally again and both approaches do register the cast target type for normalisation.
The default constructor requires an expression using standard c-like cast From and To the participating types to be valid.
Registration of casts via function call
The same information about types participating in the cast must be provided to the registration via the function call. There are again overloads allowing either cast by C-like syntax or via a casting function. All overloads need to be templated with the To and From types.
The casting function must produce the To type of the cast.
There are however two overloads of registration of cast function.
One for the parameter being a const
reference to From
type and second for From
type being a value.
This gives in total three overloads of registration functions.
Since the target type is the source of a name used to access the cast operation and the target type name may not be appropriate to use, all three overloads also allow specification of target and source type names.
Component registration
All datatypes are constructed using the unified representation of its internals.
This datatype internals can also be registered as an algorithm.
The registration is possible either by variable construction.
The type of the variable is ComponentRegister
and it is parametrised with the datatype which components are to be registered.
Internally the datatype’s function registerComponent
is called.
The call to this function can serve as an equivalent to function call registration.
Normalization registration
For the purpose of normalisation of results of any operation, the datatype can be registered for normalisation. The need for normalisation was discussed in section Normalisation{reference-type=”ref” reference=”section:normalisation”}.
Registration via constructor
The normalisation registration is achieved by constructing NormalizationRegister
class.
The class conditionally calls registration function if the registration is needed.
The need of registration is computed by examining the result of normalisation function for the datatype parameter.
If they differ, normalisation is needed.
Registration via function call
The registration function call NormalizeRegistry::registerNormalize
can be used directly.
The function is templated with the normalised datatype.
Optionally additional argument specifying the name of the datatype to normalise can be provided.
The name, however, is not used from the cli directly.
Therefore this option has very limited external use.
Container construction registration
Immediate values of primitive types are internally supported, however, containers need some additional care builtin to registration framework. Containers are specified by their type and value. So far construction of sets is implemented. Such registration is possible again via constructor or function call.
Registration via constructor
Using class SetRegister
templated with the resulting set template parameter.
The constructor internally calls the registration function.
Registration via function call
Static function ContainerRegistry::registerSet
templated with the resulting set template parameter internally registers set of given type to be possible to construct.
Value printing registration
In order for the results of algorithms to be easily printed later in the command line interface, the datatype needs to be registered for printing.
An overload of operator<<
allowing the datatype to be printed to the output stream needs to be present for successful registration.
Registration via constructor
A variable of type ValuePrinterRegister
can be used to register a type provided by template parameter for printing.
Internally the constructor calls registration function.
Registration via function call
A function ValuePrinterRegistry::registerValuePrinter
parametrized with type to print can be called directly to register the type for printing as well.
Normalisation
Use of templates in the implementation requires the compiler to instantiate the templated code before it can be used in the executable. This mainly affects algorithms since they are processing templated datatypes. Internal use of algorithms implemented in the library can benefit from templates. The compiler can instantiate the algorithm template while preparing the executable. However, to be able to use templated algorithms in a statically prepared environment, templates need to instantiated for some normalised types.
All datatypes can be normalized by means of specialisation of a templated class core::normalize
.
The class must provide a public static method eval
accepting a templated datatype it normalises and it provides the same datatype with predefined, fixed template types.
template < class SymbolType, class StateType >
struct normalize < automaton::CompactNFA < SymbolType, StateType > > {
static automaton::CompactNFA < > eval ( automaton::CompactNFA < SymbolType, StateType > && value ) {
ext::set < DefaultSymbolType > alphabet = alphabet::SymbolNormalize::normalizeAlphabet ( std::move ( value ).getInputAlphabet ( ) );
ext::set < DefaultStateType > states = automaton::AutomatonNormalize::normalizeStates ( std::move ( value ).getStates ( ) );
DefaultStateType initialState = automaton::AutomatonNormalize::normalizeState ( std::move ( value ).getInitialState ( ) );
ext::set < DefaultStateType > finalStates = automaton::AutomatonNormalize::normalizeStates ( std::move ( value ).getFinalStates ( ) );
automaton::CompactNFA < > res ( std::move ( states ), std::move ( alphabet ), std::move ( initialState ), std::move ( finalStates ) );
for ( std::pair < ext::pair < StateType, ext::vector < SymbolType > >, ext::set < StateType > > && transition : ext::make\_mover ( std::move ( value ).getTransitions ( ) ) ) {
DefaultStateType from = automaton::AutomatonNormalize::normalizeState ( std::move ( transition.first.first ) );
ext::vector < DefaultSymbolType > input = alphabet::SymbolNormalize::normalizeSymbols ( std::move ( transition.first.second ) );
ext::set < DefaultStateType > targets = automaton::AutomatonNormalize::normalizeStates ( std::move ( transition.second ) );
res.addTransitions ( std::move ( from ), std::move ( input ), std::move ( targets ) );
}
return res;
}
};
Many normalisation helpers are prepared for symbols, ranked symbols, states, their sets, etc.
The normalisation class is used after all operations registered via abstraction that produces other than normalised type.
The library in its compiled version has all algorithms prepared for the normalised types by default.
Summary on datastructures
The library provides high-level data structures from the area of stringology, arbology and language processing. Namely, representations for strings and various trees are provided. Finite automata, regular grammars and regular expressions cover the regular languages area. Pushdown automata, context free grammars and their variants cover context free language area. Finally, tree automata and tree regular expressions represent data structures related to tree languages.
All of these data structures are designed as abstract data types, each accepting template parameters to specify types of individual datatype components.
Automata
Automata types provided by the library are available in automata namespace. Finite automata variants include deterministic finite automaton (DFA), nondeterministic and epsilon nondeterministic finite automaton (NFA and EpsilonNFA, respectively), nonterministic and epsilon nondeterministic multi initial state finite automaton (MultiInitialStateNFA and MultiInitialStateEpsilonNFA, respectively), compact finite automaton (CompactNFA), and extended finite automaton (ExtendedNFA).
All of these finite automata accept templates specifying the type of symbols read by the automaton instance and type of states used in the automaton. Additionally, the epsilon variants of automata accept the type of epsilon representation.
Tree automata implemented are those traversing trees starting at its leaves ending in the root, namely deterministic and nondeterministic finite tree automaton (DFTA and NFTA, respectively).
The tree automata accept template parameters similar to the finite automata. Symbols read by the automata are nodes of ranked trees. Therefore the symbol and rank types are needed. State type can also be specified via the template parameter.
Pushdown automata are like finite automata specified by symbols read, states used to represent the automaton, and possibly the type of epsilon instance. However, the pushdown automata use the pushdown store as an additional internal component which itself is parametrised with types of symbols it is using.
Grammars
Grammars are placed to grammar namespace and feature grammars designed to generate regular languages and context free languages. Regular languages correspond to left and right, regular and linear grammars (LeftRG, RightRG, LeftLG, and RightLG, respectively).
Context free grammars are represented by a general context free grammar (CFG), epsilon free context free grammar (EpsilonFreeCFG), linear grammar (LG), and context free grammars in Greibach and Chomsky normal forms (GNF and CNF, respectively)
All these grammars accept the type of terminal symbols and the type of nonterminal symbols.
Reguar (tree) rexpression
To describe regular language, the library provides formal and unbounded regular expressions (FormalRegExp and UnboundedRegExp) inside regexp namespace. These regular expressions are templated with the type of symbols used inside the expression.
The regular tree languages are described by formal regular tree expression (FormalRTE) placed inside rte namespace. The regular tree expressions describe tree languages and are templated with symbol and rank types of trees they describe.
Strings and Trees
The library provides a formal representation of strings (LinearString) placed in string namespace.
Ranked and unranked trees are represented inside the library using their natural and hierarchical structure (RankedTree, UnrankedTree), or using linear representations (PrefixRankedTree, PrefixRankedBarTree, PrefixBarTree, PostfixRankedTree). Additionally datastructures representing tree patterns and nonlinear tree patterns are implemented as hierarchical (RankedPattern, RankedNonlinearPattern, UnrankedPattern, UnrankedNonlinearPattern), or in their linear representations (PrefixRankedPattern, PrefixRankedNonlinearPattern, PrefixRankedBarPattern, PrefixRankedBarNonlinearPattern).
All the tree representations are templated with symbol type and rank type of tree nodes forming the tree.
The query language
This chapter mostly focuses on the description of the current status of the command line interface so-called aql. The command line interface was created to replace the interface represented by multiple binaries. The command line interface uses the abstraction module and registered algorithms. The algorithms are available to be called with appropriate arguments. The language of the command line interface is similar to bash, featuring subqueries and pipes.
Execution parameters
The query language shell binary accepts the following parameters:
-c allows execution of a single command
-f executes commands from a file
-e sets an enviroment variable
-i open interactive mode
The default behavior of the shell binary is to open the interactive mode.
The default behavior is disabled if any of -c
or -f
is specified.
The shell binary will stay in interactive mode if -i
is used with -c
or -f
parameter.
The -c
, -f
and -i
parameters are executed in order of appearance.
The environment variables are set up before any command is executed.
Any quit command executed from parameter, a file, or from interactive shell stops the complete sequence of execution plan.
Language description
The language is described as a grammar, see code.
Introspection
The query language supports the listing of registered algorithms, their overloads, datatypes, and casts. The introspection command is executed by ‘introspect’ keyword followed by the introspected entity, i.e. algorithms, overloads, datatypes, and casts.
Algorithms introspection may use namespace names to narrow the query. The namespace name must end with four-dot. The result of the introspection is a list of all algorithms (filtered by namespace if specified) one per line.
Overloads introspection requires exact algorithm name. The result is a list of signatures of available overloads, one per line. Each signature consists of the return type followed by parameter types.
Datatypes introspection allows printing datatypes that can be exported or imported as an XML file.
Casts introspection prints all available casts. Each cast is represented by to type and from type. The query may be limited to casts from or to the specific type.
An introspection command can list variables available in the environment of the command line interface. The type of variable can be also introspected by suffixing the variable name after introspect variables command.
Environment bindings are handled in the introspection similarly to variables.
Some examples of introspection commands follow.
introspect algorithms
introspect algorithms automaton::
introspect overloads automaton::determinize::Determinize
introspect datatypes
introspect casts
introspect casts :from int
introspect casts :to int
introspect variables
introspect bindings
Execute
The execute and print commands are intended to evaluate a sequence of statements. Statements are vertical bar separated algorithms introduced by the name, casts to specified type, or one can use a variable, file, string or integer constant, binded value, or set constructor. (The set constructor will in the future accompanied with constructor of other common structures.)
The syntax of execute and print commands is similar to the syntax of bash. In the simple case the sequence of statements is simply a sequence of algorithms and their parameters.
print string::Parse @Automaton "NFA a b
>0 0 1
1 - 2
<2 - -" | Determinize -
Result of a print command will be printed to the console, but result of execution can also be printed to a file and later read.
execute string::Parse @Automaton "NFA a b
>0 0 1
1 - 2
<2 - -" | Determinize - > automaton.xml
print < automaton.xml | Minimize -
File reading is parametrized by some clauses, the argument inside the brackets is a specification of the file type, the argument after the colon is the specification of the content of the file, and arguments after at sign give template parameters is needed.
The file type specifications are ‘xml’, ‘string’, ‘raw’, and ‘file’. The ‘xml’ file type contains XML representation of some datatype. The ‘string’ file type represents string representation of an automaton, grammar, regexp, or some other type (available to be listed via introspection. The ‘raw’ file type is a representation of an unranked tree, i.e. XML or linear string, i.e. sequence of chars. The file type reads the content of the file as a standard string.
The string and raw types require the content specification to be set with type argument introduced by a colon. The template parameter is extra information not used now.
execute "abc" | cli::builtin::WriteFile "file.txt" -
print < [ raw ] :string::LinearString file.txt
The use of the in_redirect symbol is overloaded and allows the creation of subquery.
print Determinize <( string::Parse @Automaton "NFA a b
>0 0 1
1 - 2
<2 - -")
Out_redirect knows the type of the value printed, hence only the file type specification is required. The syntax is again an identifier in brackets. Based on the specified file type the argument of out_redirect can be limited to certain types only.
execute Determinize <( string::Parse @Automaton "NFA a b
>0 0 1
1 - 2
<2 - -" ) > [ string ] dfa.txt
The out_redirect is also overloaded to set up a variable. That can later be used as a parameter of an algorithm.
execute Determinize <( string::Parse @Automaton "NFA a b
>0 0 1
1 - 2
<2 - -" ) > \$dfa
print Minimize \$dfa
The command line interface can be exited with quit command that can also report exit status of the execution. The syntax of the quit is same as execute in that case, however only integer and boolean results are interpreted as exit status. Integer result is used directly, boolean value true is interpreted as 0 exit status and false as 1. The behavior matches the expected results of commands in Unix like systems.
Command line examples
Interactive
execute cli::builtin::ReadFile examples2/automaton/DFA.txt | string::Parse @ automaton::Automaton ^ -
Command line execution
./aql2 -e input=../examples2/automaton/DFA.txt -c 'execute cli::builtin::ReadFile #input# > $in' -c 'execute string::Parse @ automaton::Automaton ^ $in'
./aql2 -c 'execute automaton::simplify::efficient::EpsilonRemoverIncoming <#stdin | automaton::determinize::Determinize - | automaton::simplify::Trim - | automaton::simplify::Minimize - | automaton::simplify::Normalize - > #stdout'
echo 'a + a' | ./aql2 -c 'execute cli::builtin::ReadFile \#stdin | string::Parse @ regexp::RegExp ^ - | regexp::convert::ToAutomatonGlushkov (UnboundedRegExp) -'
Execution parameters
test.aql:
execute string::Parse @Automaton "
DFA 0 1
<>0 0 0" > $dfa
print "f1"
print "f2"
$ aql2 -c "print 1" -c "print 2" -f test.aql ; echo "retval -> $?"
1
2
f1
f2
retval -> 0
$ aql2 -c "print 1" -c "print 2" -f test.aql -i ; echo "retval -> $?"
1
2
f1
f2
> print "i"
i
> <EOF>
retval -> 0
$ aql2 -c "print 1" -c "quit 55" -f test.aql -i ; echo "retval -> $?"
1
retval -> 55