Title: | Parser Combinator for R |
---|---|
Description: | Parser generator for R using combinatory parsers. It is inspired by combinatory parsers developed in Haskell. |
Authors: | Chapman Siu |
Maintainer: | Chapman Siu <chpmn.siu@gmail.com> |
License: | MIT + file LICENSE |
Version: | 0.1.1 |
Built: | 2025-01-08 06:13:57 UTC |
Source: | https://github.com/sourdoughcat/ramble |
%alt% is the infix notation for the alt function.
p1 %alt% p2
p1 |
the first parser |
p2 |
the second parser |
Returns the result of the first parser if it succeeds, otherwise the result of the second parser.
(item() %alt% succeed("2")) ("abcdef")
%then% is the infix operator for the then combinator.
p1 %then% p2
p1 |
the first parser |
p2 |
the second parser |
Recognises anything that p1 and p2 would if placed in succession.
(item() %then% succeed("123")) ("abc")
%thentree% is the infix operator for the thentree combinator, and it is the preferred way to use thentree.
p1 %thentree% p2
p1 |
the first parser |
p2 |
the second parser |
Recognises anything that p1 and p2 would if placed in succession.
(item() %thentree% succeed("123")) ("abc")
%using% is the infix operator for using.
p %using% f
p |
is the parser to be applied |
f |
is the function to be applied to each result of p |
(item() %using% as.numeric) ("1abc")
Alpha checks for a single alphabetic character.
Alpha(...)
... |
additional arguments for the primitives to be parsed |
See also: Digit, Lower, Upper, AlphaNum, SpaceCheck, String, ident, nat, space, token, identifier, natural, symbol
Alpha()("abc")
AlphaNum checks for a single alphanumeric character
AlphaNum(...)
... |
additional arguments for the primitives to be parsed |
See also: Digit, Lower, Upper, Alpha, SpaceCheck, String, ident, nat, space, token, identifier, natural, symbol
AlphaNum()("123")
AlphaNum()("abc123")
The alt combinator is similar to alternation in BNF. The parser (alt(p1, p2)) recognises anything that p1 or p2 would.
The approach taken in this parser follows (Fairbairn86), in which either is interpreted in a sequential (or exclusive) manner, returning the result of the first parser to succeed, and failure if neither does.
%alt% is the infix notation for the alt function, and it is the preferred way to use the alt operator.
alt(p1, p2)
p1 |
the first parser |
p2 |
the second parser |
Returns the result of the first parser if it succeeds, otherwise the result of the second parser.
(item() %alt% succeed("2")) ("abcdef")
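The sequential reading of alternation described above can be sketched in a few lines of base R. This is an illustration only, not ramble's source: the names item_sketch, succeed_sketch and alt_sketch are hypothetical, and the list(result, leftover) shape with an empty list for failure is an assumption based on the description of item.

```r
# Assumed convention: a parser maps a string to list(result, leftover),
# or to an empty list() when it fails.
item_sketch <- function() function(s) {
  if (nchar(s) == 0) list()
  else list(result = substr(s, 1, 1), leftover = substring(s, 2))
}
succeed_sketch <- function(value) function(s) {
  list(result = value, leftover = s)
}

# Ordered choice: try p1, and fall back to p2 only if p1 failed.
alt_sketch <- function(p1, p2) function(s) {
  r <- p1(s)
  if (length(r) > 0) r else p2(s)
}

first  <- alt_sketch(item_sketch(), succeed_sketch("2"))("abcdef")
second <- alt_sketch(item_sketch(), succeed_sketch("2"))("")
```

Because the choice is ordered, the second parser is consulted only when the first one fails, as happens here on the empty string.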
Digit checks for a single digit.
Digit(...)
... |
additional arguments for the primitives to be parsed |
See also: Lower, Upper, Alpha, AlphaNum, SpaceCheck, String, ident, nat, space, token, identifier, natural, symbol
Digit()("123")
ident is a parser which matches zero or more alphanumeric characters.
ident()
See also: Digit, Lower, Upper, Alpha, AlphaNum, SpaceCheck, String, nat, space, token, identifier, natural, symbol
ident() ("variable1 = 123")
item is a parser that consumes the first character of the string and returns the rest. If it cannot consume a single character from the string, it will emit the empty list, indicating the parser has failed.
item(...)
... |
additional arguments for the parser |
item() ("abc")
item() ("")
literal is a parser for single symbols. It will attempt to match the single symbol with the first character in the string.
literal(char)
char |
is the character to be matched |
literal("a") ("abc")
Lower checks for a single lower case character.
Lower(...)
... |
additional arguments for the primitives to be parsed |
See also: Digit, Upper, Alpha, AlphaNum, SpaceCheck, String, ident, nat, space, token, identifier, natural, symbol
Lower() ("abc")
many matches 0 or more of pattern p. In BNF notation, repetition occurs often enough to merit its own abbreviation. When zero or more repetitions of a phrase p are admissible, we simply write p*. The many combinator corresponds directly to this operator, and is defined in much the same way.
This implementation of many differs from (Hutton92) due to the nature of R's data structures. Since R does not support the concept of a list of tuples, we must resort to using a list rather than a vector, since all values in an R vector must be the same datatype.
many(p)
p |
is the parser to match 0 or more times. |
Digit <- function(...) {satisfy(function(x) {return(grepl("[0-9]", x))})}
many(Digit()) ("123abc")
many(Digit()) ("abc")
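To make the "zero or more" behaviour concrete, here is a minimal base-R sketch. It is not ramble's implementation: satisfy_sketch, digit_sketch and many_sketch are hypothetical names, the list(result, leftover) shape is an assumption, and a loop is used in place of the recursive definition in (Hutton92). Results are collected in an R list, in line with the note above about vectors holding a single datatype.

```r
# Assumed convention: list(result, leftover) on success, list() on failure.
satisfy_sketch <- function(pred) function(s) {
  if (nchar(s) == 0) return(list())
  ch <- substr(s, 1, 1)
  if (pred(ch)) list(result = ch, leftover = substring(s, 2)) else list()
}
digit_sketch <- satisfy_sketch(function(ch) grepl("[0-9]", ch))

# Zero or more: keep applying p until it fails; the combinator itself
# always succeeds, possibly with an empty list of results.
many_sketch <- function(p) function(s) {
  results <- list()
  repeat {
    r <- p(s)
    if (length(r) == 0) break
    results[[length(results) + 1]] <- r$result
    s <- r$leftover
  }
  list(result = results, leftover = s)
}

digits <- many_sketch(digit_sketch)("123abc")
none   <- many_sketch(digit_sketch)("abc")
```

One caveat of this sketch: a parser that succeeds without consuming input would loop forever, so it should only be given parsers that consume at least one character on success.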
maybe matches 0 or 1 of pattern p. In EBNF notation, this corresponds to a question mark ('?').
maybe(p)
p |
is the parser to be matched 0 or 1 times. |
maybe(Digit())("123abc")
maybe(Digit())("abc123")
nat is a parser which matches one or more numeric characters.
nat()
See also: Digit, Lower, Upper, Alpha, AlphaNum, SpaceCheck, String, ident, space, token, identifier, natural, symbol
nat() ("123 + 456")
natural creates a token parser for natural numbers.
natural(...)
... |
additional arguments for the parser |
See also: Digit, Lower, Upper, Alpha, AlphaNum, SpaceCheck, String, ident, nat, space, token, identifier, symbol
Ramble allows you to write parsers in a functional manner, inspired by Haskell's Parsec library.
satisfy is a function which allows us to make parsers that recognise single symbols.
satisfy(p)
p |
is the predicate to determine if the arbitrary symbol is a member. |
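As a rough illustration of how a predicate becomes a single-symbol parser, consider the base-R sketch below. The names satisfy_sketch and vowel are hypothetical, and the list(result, leftover) return shape (with an empty list for failure) is an assumption; ramble's own satisfy may differ in detail.

```r
# Build a parser from a predicate on a single character.
# Assumed convention: list(result, leftover) on success, list() on failure.
satisfy_sketch <- function(pred) function(s) {
  if (nchar(s) == 0) return(list())   # nothing to consume
  ch <- substr(s, 1, 1)
  if (pred(ch)) list(result = ch, leftover = substring(s, 2)) else list()
}

# A parser that accepts one lower case vowel.
vowel <- satisfy_sketch(function(ch) ch %in% c("a", "e", "i", "o", "u"))

hit  <- vowel("apple")
miss <- vowel("pear")
```

The predicate only ever sees the first character, which is why primitives such as Digit in the examples elsewhere in this manual can be written as one-line wrappers around satisfy.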
some matches 1 or more of pattern p. In BNF notation, repetition occurs often enough to merit its own abbreviation. When one or more repetitions of a phrase p are admissible, we simply write p+. The some combinator corresponds directly to this operator, and is defined in much the same way.
some(p)
p |
is the parser to match 1 or more times. |
Digit <- function(...) {satisfy(function(x) {return(grepl("[0-9]", x))})}
some(Digit()) ("123abc")
space matches zero or more space characters.
space()
See also: Digit, Lower, Upper, Alpha, AlphaNum, SpaceCheck, String, ident, nat, token, identifier, natural, symbol
space() (" abc")
SpaceCheck checks for a single space character
SpaceCheck(...)
... |
additional arguments for the primitives to be parsed |
See also: Digit, Lower, Upper, Alpha, AlphaNum, String, ident, nat, space, token, identifier, natural, symbol
SpaceCheck()(" 123")
String is a combinator which allows us to build parsers which recognise strings of symbols, rather than just single symbols.
String(string)
string |
is the string to be matched |
See also: Digit, Lower, Upper, Alpha, AlphaNum, SpaceCheck, ident, nat, space, token, identifier, natural, symbol
String("123")("123 abc")
succeed is based on the empty string symbol in the BNF notation. The succeed parser always succeeds, without actually consuming any input string. Since the outcome of succeed does not depend on its input, its result value must be pre-determined, so it is included as an extra parameter.
succeed(string)
string |
the result value of succeed parser |
succeed("1") ("abc")
symbol creates a token for a symbol.
symbol(xs)
xs |
takes in a string to create a token |
See also: Digit, Lower, Upper, Alpha, AlphaNum, SpaceCheck, String, ident, nat, space, token, identifier, natural
symbol("[") (" [123]")
The then combinator corresponds to sequencing in BNF. The parser (then(p1, p2)) recognises anything that p1 and p2 would if placed in succession.
%then% is the infix operator for the then combinator, and it is the preferred way to use the then operator.
then(p1, p2)
p1 |
the first parser |
p2 |
the second parser |
Recognises anything that p1 and p2 would if placed in succession.
(item() %then% succeed("123")) ("abc")
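Sequencing can be sketched in base R as running the second parser on whatever the first one leaves behind. The code below is a hedged illustration rather than ramble's implementation: item_sketch, succeed_sketch and then_sketch are hypothetical names, the list(result, leftover) shape is an assumption, and the way the two results are combined (a plain two-element list here) is a simplification that does not model the difference between then and thentree.

```r
# Assumed convention: list(result, leftover) on success, list() on failure.
item_sketch <- function() function(s) {
  if (nchar(s) == 0) list()
  else list(result = substr(s, 1, 1), leftover = substring(s, 2))
}
succeed_sketch <- function(value) function(s) {
  list(result = value, leftover = s)
}

# Sequencing: p2 runs on the leftover of p1; both must succeed.
then_sketch <- function(p1, p2) function(s) {
  r1 <- p1(s)
  if (length(r1) == 0) return(list())
  r2 <- p2(r1$leftover)
  if (length(r2) == 0) return(list())
  list(result = list(r1$result, r2$result), leftover = r2$leftover)
}

both    <- then_sketch(item_sketch(), succeed_sketch("123"))("abc")
too_few <- then_sketch(item_sketch(), item_sketch())("a")
```

The second call fails because the first item consumes the only character, leaving nothing for the second.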
thentree keeps the full tree representation of the results of parsing. Otherwise, it is identical to then.
thentree(p1, p2)
p1 |
the first parser |
p2 |
the second parser |
Recognises anything that p1 and p2 would if placed in succession.
(item() %thentree% succeed("123")) ("abc")
token is a new primitive that ignores any space before and after applying a parser to a token.
token(p)
p |
is the parser to have spaces stripped. |
See also: Digit, Lower, Upper, Alpha, AlphaNum, SpaceCheck, String, ident, nat, space, identifier, natural, symbol
token(ident()) (" variable1 ")
Unlist is the same as unlist, but doesn't recurse all the way to preserve the type. This function is not well optimised.
Unlist(obj)
obj |
is a list to be flattened |
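The type-preservation point can be illustrated with base R alone. The snippet below does not call Unlist; it contrasts base unlist with and without full recursion, on the assumption (not confirmed by this manual) that Unlist behaves more like the partial flattening shown first.

```r
nested <- list(list(1L, "a"), list(TRUE))

# Flatten one level only: the elements keep their own types.
one_level <- unlist(nested, recursive = FALSE)

# Flatten all the way: everything is coerced to a common type (character).
all_levels <- unlist(nested)
```

Here one_level is still a list holding an integer, a character and a logical, whereas all_levels is a character vector.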
Upper checks for a single upper case character
Upper(...)
... |
additional arguments for the primitives to be parsed |
See also: Digit, Lower, Alpha, AlphaNum, SpaceCheck, String, ident, nat, space, token, identifier, natural, symbol
Upper()("Abc")
The using combinator allows us to manipulate results from a parser, for example building a parse tree. The parser (p %using% f) has the same behaviour as the parser p, except that the function f is applied to each of its result values.
%using% is the infix operator for using, and it is the preferred way to use the using operator.
using(p, f)
p |
is the parser to be applied |
f |
is the function to be applied to each result of p |
The parser (p %using% f) has the same behaviour as the parser p, except that the function f is applied to each of its result values.
(item() %using% as.numeric) ("1abc")
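The effect of using on a result can be sketched in base R as follows. This is an illustration under assumptions, not ramble's code: item_sketch and using_sketch are hypothetical names, and the list(result, leftover) shape with an empty list for failure is assumed.

```r
# Assumed convention: list(result, leftover) on success, list() on failure.
item_sketch <- function() function(s) {
  if (nchar(s) == 0) list()
  else list(result = substr(s, 1, 1), leftover = substring(s, 2))
}

# Same behaviour as p, except that f is applied to the result value.
using_sketch <- function(p, f) function(s) {
  r <- p(s)
  if (length(r) == 0) return(list())   # failure passes through untouched
  list(result = f(r$result), leftover = r$leftover)
}

num    <- using_sketch(item_sketch(), as.numeric)("1abc")
failed <- using_sketch(item_sketch(), as.numeric)("")
```

The leftover input is unchanged by f, so using affects only what is returned, never what is consumed.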