Kea 2.5.8
isc::dns::MasterLexer Class Reference

Tokenizer for parsing DNS master files.

#include <master_lexer.h>

Classes

class  LexerError
 Exception thrown from a wrapper version of MasterLexer::getNextToken() for non fatal errors.
 
class  ReadError
 Exception thrown when we fail to read from the input stream or file.
 

Public Types

enum  Options { NONE = 0 , INITIAL_WS = 1 , QSTRING = 2 , NUMBER = 4 }
 Options for getNextToken.
 

Public Member Functions

 MasterLexer ()
 The constructor.
 
 ~MasterLexer ()
 The destructor.
 
const MasterToken & getNextToken (MasterToken::Type expect, bool eol_ok=false)
 Parse the input for the expected type of token.
 
const MasterToken & getNextToken (Options options=NONE)
 Parse and return another token from the input.
 
size_t getPosition () const
 Return the position of lexer in the pushed sources so far.
 
size_t getSourceCount () const
 Get number of sources inside the lexer.
 
size_t getSourceLine () const
 Return the input source line number.
 
std::string getSourceName () const
 Return the name of the current input source.
 
size_t getTotalSourceSize () const
 Return the total size of pushed sources.
 
void popSource ()
 Stop using the most recently opened input source (file or stream).
 
bool pushSource (const char *filename, std::string *error=0)
 Open a file and make it the current input source of MasterLexer.
 
void pushSource (std::istream &input)
 Make the given stream the current input source of MasterLexer.
 
void ungetToken ()
 Return the last token back to the lexer.
 

Static Public Attributes

static const size_t SOURCE_SIZE_UNKNOWN
 Special value for input source size meaning "unknown".
 

Friends

class master_lexer_internal::State
 

Detailed Description

Tokenizer for parsing DNS master files.

The MasterLexer class provides tokenize interfaces for parsing DNS master files. It understands some special rules of master files as defined in RFC 1035, such as comments, character escaping, and multi-line data, and provides the user application with the actual data in a more convenient form such as a std::string object.

In order to support the $INCLUDE notation, this class is designed to be able to operate on multiple files or input streams in the nested way. The pushSource() and popSource() methods correspond to the push and pop operations.
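
The nesting described above can be pictured as a plain stack. The following is a minimal sketch, not Kea code: SourceStack and its members are hypothetical names that only model how a pushed source hides the one below it until it is popped.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for the lexer's stack of sources; not Kea code.
class SourceStack {
public:
    // Entering a file (for example on $INCLUDE) puts it on top.
    void push(const std::string& name) { names_.push_back(name); }

    // Leaving the current file resumes the one below it.
    void pop() {
        if (names_.empty()) {
            throw std::logic_error("pop with no pushed source");
        }
        names_.pop_back();
    }

    // Name of the source being read, or an empty string if there is none.
    std::string current() const {
        return names_.empty() ? std::string() : names_.back();
    }

    // Number of sources currently on the stack.
    std::size_t count() const { return names_.size(); }

private:
    std::vector<std::string> names_;
};
```

In this model, pushing "included.zone" on top of "example.org.zone" makes the included file current, and popping it makes the outer file current again.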

While this class is public, it is less likely to be used by normal applications; it's mainly expected to be used within this library, specifically by the MasterLoader class and Rdata implementation classes.

Note
The error handling policy of this class is slightly different from that of other classes of this library. We generally throw an exception for an invalid input, whether it's more likely to be a program error or a "user error", which means an invalid input that comes from outside of the library. This class, however, returns an error code for certain types of user errors instead of throwing an exception. Such cases include a syntax error identified by the lexer or a misspelled file name that causes a system error at the time of open. This is based on the assumption that the main user of this class is a parser of master files, where we want to give an option to ignore some non-fatal errors and continue parsing. This is useful if the parser just performs overall error checks on a master file. When the (immediate) caller needs to do explicit error handling, exceptions are not that useful a tool for error reporting, because the caller cannot separate the normal and error code paths anyway, and that separation is one of the major advantages of exceptions. Exceptions are also generally more expensive, either when one is thrown or just by setting up try and catch handling (depending on the underlying implementation of exception handling). For these reasons, some methods of this class do not throw for an error that would be reported as an exception in other classes.

Definition at line 303 of file master_lexer.h.

Member Enumeration Documentation

◆ Options

Options for getNextToken.

A compound option, indicating multiple options are set, can be specified using the bitwise OR operator (operator|()).

Enumerator
NONE 

No option.

INITIAL_WS 

recognize begin-of-line spaces after an end-of-line

QSTRING 

recognize quoted string

NUMBER 

recognize numeric text as integer

Definition at line 345 of file master_lexer.h.
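
As a rough illustration of compound options, the sketch below mirrors the enumerator values listed above in a local enum. It is not the Kea header: the operator| overload and the hasOption() helper are written here for illustration only and are assumptions about how such flags are typically combined and tested.

```cpp
// Local mirror of the enumerator values listed above; not the Kea header.
enum Options { NONE = 0, INITIAL_WS = 1, QSTRING = 2, NUMBER = 4 };

// Combining two unscoped enumerators with the built-in | yields an int,
// so an overload is needed to get an Options value back.
inline Options operator|(Options a, Options b) {
    return static_cast<Options>(static_cast<int>(a) | static_cast<int>(b));
}

// Hypothetical helper: true if every bit of flag is set in opts.
inline bool hasOption(Options opts, Options flag) {
    return (static_cast<int>(opts) & static_cast<int>(flag)) ==
           static_cast<int>(flag);
}
```

With these definitions, QSTRING | NUMBER has the numeric value 6, and hasOption() reports QSTRING and NUMBER as set but INITIAL_WS as not set.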

Constructor & Destructor Documentation

◆ MasterLexer()

isc::dns::MasterLexer::MasterLexer ( )

The constructor.

Exceptions
std::bad_alloc  Internal resource allocation fails (rare case).

◆ ~MasterLexer()

isc::dns::MasterLexer::~MasterLexer ( )

The destructor.

It internally closes any remaining input sources.

Member Function Documentation

◆ getNextToken() [1/2]

const MasterToken & isc::dns::MasterLexer::getNextToken ( MasterToken::Type  expect,
bool  eol_ok = false 
)

Parse the input for the expected type of token.

This method is a wrapper of the other version, customized for the case where a particular type of token is expected as the next one. More specifically, it's intended to be used to get tokens for RDATA fields. Since most RDATA types have a fixed format, the token type is often predictable and the method interface can be simplified.

This method basically works as follows: it gets the type of the expected token, calls the other version of getNextToken(Options), and returns the token if it's of the expected type (due to the usage assumption this should be normally the case). There are some non trivial details though:

  • If the expected type is MasterToken::QSTRING, both quoted and unquoted strings are recognized and returned.
  • A string with quotation marks is not recognized as a MasterToken::STRING. You have to get it as a MasterToken::QSTRING.
  • If the optional eol_ok parameter is true (very rare case), MasterToken::END_OF_LINE and MasterToken::END_OF_FILE are recognized and returned if they are found instead of the expected type of token.
  • If the next token is not of the expected type (including the case where a number is expected but it's out of range), ungetToken() is internally called so the caller can re-read that token.
  • If other types or errors (such as unbalanced parentheses) are detected, the erroneous part isn't "ungotten"; the caller can continue parsing after that part.

In some very rare cases where the RDATA has an optional trailing field, the eol_ok parameter would be set to true. This way the caller can handle both cases (the field does or does not exist) by a single call to this method. In all other cases eol_ok should be set to false, and that is the default and can be omitted.

Unlike the other version of getNextToken(Options), this method throws an exception of type LexerError for non fatal errors such as broken syntax or encountering an unexpected type of token. This way the caller can write RDATA parser code without bothering to handle errors for each field. For example, pseudo parser code for MX RDATA would look like this:

const uint32_t pref =
    lexer.getNextToken(MasterToken::NUMBER).getNumber();
// check if pref is in the uint16_t range; no other check is needed.
const Name mx(lexer.getNextToken(MasterToken::STRING).getString());
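
The range check mentioned in the comment of the pseudo code could look roughly like the following. This is a self-contained sketch, not part of the library: toPreference() is a hypothetical helper, and the choice of std::out_of_range is an assumption made for the example.

```cpp
#include <cstdint>
#include <limits>
#include <stdexcept>

// Hypothetical helper: narrow a 32-bit number, as returned for a NUMBER
// token, to the 16-bit range used by the MX preference field.
inline std::uint16_t toPreference(std::uint32_t value) {
    if (value > std::numeric_limits<std::uint16_t>::max()) {
        throw std::out_of_range("MX preference out of range");
    }
    return static_cast<std::uint16_t>(value);
}
```

A value such as 65535 passes through unchanged, while 65536 is rejected.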

When a LexerError exception is thrown, it's expected to be handled comprehensively by the parser of the RDATA or at a higher layer. The token_ member variable of the corresponding LexerError exception object stores a token of type MasterToken::ERROR that indicates the reason for the error.

Due to the specific intended usage of this method, only a subset of MasterToken::Type values are acceptable for the expect parameter: MasterToken::STRING, MasterToken::QSTRING, and MasterToken::NUMBER. Specifying other values will result in an InvalidParameter exception.

Exceptions
InvalidParameter  The expected token type is not allowed for this method.
LexerError  The lexer finds a non fatal error or it finds an unexpected type of token.
other  Anything the other version of getNextToken() can throw.
Parameters
expect  Expected type of token. Must be either STRING, QSTRING, or NUMBER.
eol_ok  true iff END_OF_LINE or END_OF_FILE is acceptable.
Returns
The expected type of token.

◆ getNextToken() [2/2]

const MasterToken & isc::dns::MasterLexer::getNextToken ( Options  options = NONE)

Parse and return another token from the input.

It reads a bit of the last opened source and produces another token found in it.

This method does not provide the strong exception guarantee. Generally, if it throws, the object should not be used any more and should be discarded. It was decided all the exceptions thrown from here are serious enough that aborting the loading process is the only reasonable recovery anyway, so the strong exception guarantee is not needed.

Parameters
options  Options that modify the tokenization. They can make the method report things that are usually ignored. Multiple options can be passed at once with bitwise OR (e.g. option1 | option2). See the description of the available options.
Returns
Next token found in the input. Note that the token refers to some internal data in the lexer. It is valid only until getNextToken or ungetToken is called. Also, the token becomes invalid when the lexer is destroyed.
Exceptions
isc::InvalidOperation  in case the source is not available. This may mean that pushSource() has not been called yet, or that the current source has been read past the end.
ReadError  in case there's a problem reading from the underlying source (e.g. an I/O error in the file on the disk).
std::bad_alloc  in case allocation of some internal resources or the token fails.

Referenced by isc::dns::rdata::generic::Generic::Generic(), isc::dns::rdata::generic::detail::TXTLikeImpl< Type, typeCode >::TXTLikeImpl(), isc::dns::rdata::generic::detail::createNameFromLexer(), and isc::dns::rdata::createRdata().

◆ getPosition()

size_t isc::dns::MasterLexer::getPosition ( ) const

Return the position of lexer in the pushed sources so far.

This method returns the position in terms of the number of recognized characters from all sources that have been pushed by the time of the call. Conceptually, the position in a single source is the offset from the beginning of the file or stream to the current "read cursor" of the lexer. The return value of this method is the sum of the positions in all the pushed sources. If any of the sources has already been popped, the position of the source at the time of the pop operation will be used for the calculation.

If the lexer has reached the end of every pushed source, the return value should be equal to that of getTotalSourceSize(). It's generally expected that a source is popped when the lexer reaches the end of the source. So, when the application of this class parses all contents of all sources, possibly with multiple pushes and pops, the return value of this method and getTotalSourceSize() should be identical (unless the latter returns SOURCE_SIZE_UNKNOWN). But this is not necessarily guaranteed as the application can pop a source in the middle of parsing it.

Before pushing any source, it returns 0.

The return values of this method and getTotalSourceSize() would give the caller an idea of the progress of the lexer at the time of the call. Note, however, that since it's not predictable whether more sources will be pushed after the call, the progress determined this way may not make much sense; it can only give an informational hint of the progress.

Note that the conceptual "read cursor" would move backward after a call to ungetToken(), in which case this method will return a smaller value. That is, unlike getTotalSourceSize(), return values of this method may not always monotonically increase.
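
A caller that wants a progress hint from these two values might combine them along the following lines. This is only a sketch: progress() is a hypothetical helper, the "unknown" marker is passed in rather than taken from the library, and the -1.0 return value for "no estimate" is a convention chosen for this example.

```cpp
#include <cstddef>

// Hypothetical helper. position and total stand for the values returned by
// getPosition() and getTotalSourceSize(); unknown stands for
// SOURCE_SIZE_UNKNOWN. Returns a fraction, or -1.0 when no estimate exists.
inline double progress(std::size_t position, std::size_t total,
                       std::size_t unknown) {
    if (total == unknown || total == 0) {
        return -1.0;  // size unknown, or nothing has been pushed yet
    }
    return static_cast<double>(position) / static_cast<double>(total);
}
```

Because more sources may be pushed later, and because ungetToken() can move the position backward, a value computed this way is only an informational hint.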

Exceptions
None

Referenced by isc::dns::MasterLoader::MasterLoaderImpl::getPosition().

◆ getSourceCount()

size_t isc::dns::MasterLexer::getSourceCount ( ) const

Get number of sources inside the lexer.

This method never throws.

◆ getSourceLine()

size_t isc::dns::MasterLexer::getSourceLine ( ) const

Return the input source line number.

If there is an opened source, the return value will be a non-0 integer indicating the line number of the current source where the MasterLexer is currently working. The expected usage of this value is to print a helpful error message when parsing fails by specifically identifying the position of the error.

If there is no opened source at the time of the call, this method returns 0.
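
The error-message usage mentioned above could be sketched as follows. formatParseError() is a hypothetical helper, and the "name:line: message" layout is an assumption chosen for the example; the arguments stand for values a caller would obtain from getSourceName() and getSourceLine().

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical helper: build a diagnostic from a source name, a line number,
// and a description of the problem.
inline std::string formatParseError(const std::string& source_name,
                                    std::size_t line,
                                    const std::string& what) {
    std::ostringstream out;
    if (line == 0) {
        // Line 0 means no source was open when the line was queried.
        out << "<no source>: " << what;
    } else {
        out << source_name << ":" << line << ": " << what;
    }
    return out.str();
}
```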

Exceptions
None
Returns
The current line number of the source (see the description)

Referenced by isc::dns::rdata::createRdata().

◆ getSourceName()

std::string isc::dns::MasterLexer::getSourceName ( ) const

Return the name of the current input source.

If it's a file, it will be the C string given at the corresponding pushSource() call, that is, its filename. If it's a stream, it will be formatted as "stream-%p" where p is hex representation of the address of the stream object.

If there is no opened source at the time of the call, this method returns an empty string.
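
A name in the "stream-%p" style can be produced in standard C++ roughly as shown below. This is a sketch under the assumption that printing the stream object's address is all that is needed; streamSourceName() is a hypothetical function, and the exact text of the address is implementation-defined.

```cpp
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical helper: build a "stream-<address>" label for a stream object.
inline std::string streamSourceName(const std::istream& input) {
    std::ostringstream out;
    // Inserting a const void* prints the address in an
    // implementation-defined (typically hexadecimal) form.
    out << "stream-" << static_cast<const void*>(&input);
    return out.str();
}
```

Two distinct stream objects get distinct names this way, and the same object always gets the same name while it is alive.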

Exceptions
std::bad_alloc  Resource allocation failed for string construction (rare case)
Returns
A string representation of the current source (see the description)

Referenced by isc::dns::rdata::createRdata().

◆ getTotalSourceSize()

size_t isc::dns::MasterLexer::getTotalSourceSize ( ) const

Return the total size of pushed sources.

This method returns the sum of the size of sources that have been pushed to the lexer by the time of the call. It would give the caller some hint about the amount of data the lexer is working on.

The size of a normal file is equal to the file size at the time the source is pushed. The size of other types of input stream is the size of the data available in the stream at the time the source is pushed.

In some special cases, it's possible that the size of the file or stream is unknown. It happens, for example, if the standard input is associated with a pipe from the output of another process and it's specified as an input source. If the size of any of the pushed sources is unknown, this method returns SOURCE_SIZE_UNKNOWN.

The total size won't change when a source is popped. So the return values of this method will either monotonically increase or be SOURCE_SIZE_UNKNOWN; once it returns SOURCE_SIZE_UNKNOWN, any subsequent call will also result in that value, by the above definition.

Before pushing any source, it returns 0.
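
The accumulation rule described above can be modeled in a few lines. This is a sketch, not the library's implementation: TotalSize and kSizeUnknown are hypothetical names, and using the largest size_t value as the "unknown" marker is an assumption made for the example.

```cpp
#include <cstddef>
#include <limits>

// Assumption for this sketch: "unknown" is the largest size_t value.
const std::size_t kSizeUnknown = std::numeric_limits<std::size_t>::max();

// Hypothetical model of the running total of pushed source sizes.
class TotalSize {
public:
    // Called once per pushed source; popping a source changes nothing.
    void addSource(std::size_t size) {
        if (total_ == kSizeUnknown || size == kSizeUnknown) {
            total_ = kSizeUnknown;  // once unknown, it stays unknown
        } else {
            total_ += size;
        }
    }

    std::size_t get() const { return total_; }

private:
    std::size_t total_ = 0;  // 0 before any source is pushed
};
```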

Exceptions
None

Referenced by isc::dns::MasterLoader::MasterLoaderImpl::getSize().

◆ popSource()

void isc::dns::MasterLexer::popSource ( )

Stop using the most recently opened input source (file or stream).

If it's a file, the previously opened file will be closed internally. If it's a stream, MasterLexer will simply stop using the stream; the caller can assume it will never be used in MasterLexer thereafter.

This method must not be called when there is no source pushed for MasterLexer. This method is otherwise exception free.

Exceptions
isc::InvalidOperation  Called with no pushed source.

◆ pushSource() [1/2]

bool isc::dns::MasterLexer::pushSource ( const char *  filename,
std::string *  error = 0 
)

Open a file and make it the current input source of MasterLexer.

The opened file can be explicitly closed by the popSource() method; if popSource() is not called within the lifetime of the MasterLexer, it will be closed in the destructor.

In the case of a system error in opening the file (most likely because of specifying a non-existent or unreadable file), it returns false, and if the optional error parameter is non null, it will be set to a description of the error (any existing content of the string will be discarded). If opening the file succeeds, the given error parameter will be left intact.

Note that this method has two styles of error reporting: one by returning false (and setting error optionally) and the other by throwing an exception. See the note for the class description about the distinction.
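
The two reporting styles can be illustrated with a small stand-alone function. This is a sketch that only imitates the contract described above using std::ifstream: tryOpen() is a hypothetical name, std::invalid_argument stands in for the library's InvalidParameter, and the wording of the error text is made up for the example.

```cpp
#include <fstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: a program error (null filename) throws, while a
// user error (file cannot be opened) is reported through the return value.
inline bool tryOpen(const char* filename, std::ifstream& file,
                    std::string* error = nullptr) {
    if (filename == nullptr) {
        throw std::invalid_argument("filename is null");
    }
    file.open(filename);
    if (!file.is_open()) {
        if (error != nullptr) {
            // Any previous content of *error is discarded.
            *error = std::string("failed to open ") + filename;
        }
        return false;
    }
    return true;  // *error is left untouched on success
}
```

A caller that only wants to survey a set of files can test the return value and move on, while a null filename still surfaces as an exception.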

Exceptions
InvalidParameter  filename is null
Parameters
filename  A non null string specifying a master file
error  If non null, a placeholder to set error description in case of failure.
Returns
true if pushing the file succeeds; false otherwise.

Referenced by isc::dns::rdata::generic::Generic::Generic(), isc::dns::rdata::generic::detail::TXTLikeImpl< Type, typeCode >::TXTLikeImpl(), isc::dns::MasterLoader::MasterLoaderImpl::pushSource(), and isc::dns::MasterLoader::MasterLoaderImpl::pushStreamSource().

◆ pushSource() [2/2]

void isc::dns::MasterLexer::pushSource ( std::istream &  input)

Make the given stream the current input source of MasterLexer.

The caller still holds the ownership of the passed stream; it's the caller's responsibility to keep it valid as long as it's used in MasterLexer or to release any resource for the stream after that. The caller can explicitly tell MasterLexer to stop using the stream by calling the popSource() method.

The data in input must be complete at the time of this call. The behavior of the lexer is undefined if the caller builds or adds data in input after pushing it.

Except for rare case system errors such as memory allocation failure, this method is generally expected to be exception free. However, it can still throw if it encounters an unexpected failure when it tries to identify the "size" of the input source (see getTotalSourceSize()). It's an unexpected result unless the caller intentionally passes a broken stream; otherwise it would mean some system-dependent unexpected behavior or possibly an internal bug. In these cases it throws an Unexpected exception. Note that this version of the method doesn't return a boolean unlike the other version that takes a file name; since this failure is really unexpected and can be critical, it doesn't make sense to give the caller an option to continue (other than by explicitly catching the exception).

Exceptions
Unexpected  An unexpected failure happens in initialization.
Parameters
input  An input stream object that produces textual representation of DNS RRs.

◆ ungetToken()

void isc::dns::MasterLexer::ungetToken ( )

Return the last token back to the lexer.

The method undoes the last call to getNextToken(). If you call getNextToken() again with the same options, it'll return the same token. If the options are different, it may return a different token, but it acts as if the previous getNextToken() was never called.

It is possible to return only one token back in time (you can't call ungetToken() twice in a row without calling getNextToken() in between successfully).

It does not work after change of source (by pushSource or popSource).
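
The one-token limit can be modeled with a toy lexer over a fixed list of tokens. This is a sketch, not the library's implementation: ToyLexer is a hypothetical class, std::logic_error stands in for isc::InvalidOperation, and options and source changes are left out.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy lexer with a single level of "unget".
class ToyLexer {
public:
    explicit ToyLexer(std::vector<std::string> tokens)
        : tokens_(std::move(tokens)) {}

    const std::string& getNextToken() {
        if (next_ >= tokens_.size()) {
            throw std::logic_error("no more tokens");
        }
        can_unget_ = true;  // a successful read re-enables ungetToken()
        return tokens_[next_++];
    }

    void ungetToken() {
        if (!can_unget_) {
            throw std::logic_error("nothing to unget");
        }
        can_unget_ = false;  // only one step back is allowed
        --next_;
    }

private:
    std::vector<std::string> tokens_;
    std::size_t next_ = 0;
    bool can_unget_ = false;
};
```

After one ungetToken(), the next getNextToken() returns the same token again; a second ungetToken() without a read in between is rejected.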

Exceptions
isc::InvalidOperation  If called a second time in a row or if getNextToken() was not called since the last change of the source.

Friends And Related Function Documentation

◆ master_lexer_internal::State

friend class master_lexer_internal::State
friend

Definition at line 304 of file master_lexer.h.

Member Data Documentation

◆ SOURCE_SIZE_UNKNOWN

const size_t isc::dns::MasterLexer::SOURCE_SIZE_UNKNOWN
static

Special value for input source size meaning "unknown".

This constant value will be used as a return value of getTotalSourceSize() when the size of one of the pushed sources is unknown. Note that this value itself is a valid integer in the range of the type, so there's still a small possibility of ambiguity. In practice, however, the value should be large enough to make that ambiguity negligible.

Definition at line 339 of file master_lexer.h.


The documentation for this class was generated from the following file: