mirror of https://github.com/qpdf/qpdf.git (synced 2025-01-31 02:48:31 +00:00)

remove files not needed for building

git-svn-id: svn+q:///qpdf/trunk@767 71b93d88-0707-0410-a8cf-f5a4172ac649

parent eb355c60c1
commit f3bf8d3110
@@ -1,185 +0,0 @@
Basic Installation
==================

These are generic installation instructions that apply to systems that
can run the `configure' shell script - Unix systems and any that imitate
it. They are not specific to PCRE. There are PCRE-specific instructions
for non-Unix systems in the file NON-UNIX-USE.

The `configure' shell script attempts to guess correct values for
various system-dependent variables used during compilation. It uses
those values to create a `Makefile' in each directory of the package.
It may also create one or more `.h' files containing system-dependent
definitions. Finally, it creates a shell script `config.status' that
you can run in the future to recreate the current configuration, a file
`config.cache' that saves the results of its tests to speed up
reconfiguring, and a file `config.log' containing compiler output
(useful mainly for debugging `configure').

If you need to do unusual things to compile the package, please try
to figure out how `configure' could check whether to do them, and mail
diffs or instructions to the address given in the `README' so they can
be considered for the next release. If at some point `config.cache'
contains results you don't want to keep, you may remove or edit it.

The file `configure.in' is used to create `configure' by a program
called `autoconf'. You only need `configure.in' if you want to change
it or regenerate `configure' using a newer version of `autoconf'.

The simplest way to compile this package is:

1. `cd' to the directory containing the package's source code and type
   `./configure' to configure the package for your system. If you're
   using `csh' on an old version of System V, you might need to type
   `sh ./configure' instead to prevent `csh' from trying to execute
   `configure' itself.

   Running `configure' takes a while. While running, it prints some
   messages telling which features it is checking for.

2. Type `make' to compile the package.

3. Optionally, type `make check' to run any self-tests that come with
   the package.

4. Type `make install' to install the programs and any data files and
   documentation.

5. You can remove the program binaries and object files from the
   source code directory by typing `make clean'. To also remove the
   files that `configure' created (so you can compile the package for
   a different kind of computer), type `make distclean'. There is
   also a `make maintainer-clean' target, but that is intended mainly
   for the package's developers. If you use it, you may have to get
   all sorts of other programs in order to regenerate files that came
   with the distribution.

Compilers and Options
=====================

Some systems require unusual options for compilation or linking that
the `configure' script does not know about. You can give `configure'
initial values for variables by setting them in the environment. Using
a Bourne-compatible shell, you can do that on the command line like
this:
     CC=c89 CFLAGS=-O2 LIBS=-lposix ./configure

Or on systems that have the `env' program, you can do it like this:
     env CPPFLAGS=-I/usr/local/include LDFLAGS=-s ./configure

Compiling For Multiple Architectures
====================================

You can compile the package for more than one kind of computer at the
same time, by placing the object files for each architecture in their
own directory. To do this, you must use a version of `make' that
supports the `VPATH' variable, such as GNU `make'. `cd' to the
directory where you want the object files and executables to go and run
the `configure' script. `configure' automatically checks for the
source code in the directory that `configure' is in and in `..'.

If you have to use a `make' that does not support the `VPATH'
variable, you have to compile the package for one architecture at a time
in the source code directory. After you have installed the package for
one architecture, use `make distclean' before reconfiguring for another
architecture.

Installation Names
==================

By default, `make install' will install the package's files in
`/usr/local/bin', `/usr/local/man', etc. You can specify an
installation prefix other than `/usr/local' by giving `configure' the
option `--prefix=PATH'.

You can specify separate installation prefixes for
architecture-specific files and architecture-independent files. If you
give `configure' the option `--exec-prefix=PATH', the package will use
PATH as the prefix for installing programs and libraries.
Documentation and other data files will still use the regular prefix.

In addition, if you use an unusual directory layout you can give
options like `--bindir=PATH' to specify different values for particular
kinds of files. Run `configure --help' for a list of the directories
you can set and what kinds of files go in them.

If the package supports it, you can cause programs to be installed
with an extra prefix or suffix on their names by giving `configure' the
option `--program-prefix=PREFIX' or `--program-suffix=SUFFIX'.

Optional Features
=================

Some packages pay attention to `--enable-FEATURE' options to
`configure', where FEATURE indicates an optional part of the package.
They may also pay attention to `--with-PACKAGE' options, where PACKAGE
is something like `gnu-as' or `x' (for the X Window System). The
`README' should mention any `--enable-' and `--with-' options that the
package recognizes.

For packages that use the X Window System, `configure' can usually
find the X include and library files automatically, but if it doesn't,
you can use the `configure' options `--x-includes=DIR' and
`--x-libraries=DIR' to specify their locations.

Specifying the System Type
==========================

There may be some features `configure' cannot figure out
automatically, but needs to determine by the type of host the package
will run on. Usually `configure' can figure that out, but if it prints
a message saying it cannot guess the host type, give it the
`--host=TYPE' option. TYPE can either be a short name for the system
type, such as `sun4', or a canonical name with three fields:
     CPU-COMPANY-SYSTEM

See the file `config.sub' for the possible values of each field. If
`config.sub' isn't included in this package, then this package doesn't
need to know the host type.

If you are building compiler tools for cross-compiling, you can also
use the `--target=TYPE' option to select the type of system they will
produce code for and the `--build=TYPE' option to select the type of
system on which you are compiling the package.

Sharing Defaults
================

If you want to set default values for `configure' scripts to share,
you can create a site shell script called `config.site' that gives
default values for variables like `CC', `cache_file', and `prefix'.
`configure' looks for `PREFIX/share/config.site' if it exists, then
`PREFIX/etc/config.site' if it exists. Or, you can set the
`CONFIG_SITE' environment variable to the location of the site script.
A warning: not all `configure' scripts look for a site script.

Operation Controls
==================

`configure' recognizes the following options to control how it
operates.

`--cache-file=FILE'
     Use and save the results of the tests in FILE instead of
     `./config.cache'. Set FILE to `/dev/null' to disable caching, for
     debugging `configure'.

`--help'
     Print a summary of the options to `configure', and exit.

`--quiet'
`--silent'
`-q'
     Do not print messages saying which checks are being made. To
     suppress all normal output, redirect it to `/dev/null' (any error
     messages will still be shown).

`--srcdir=DIR'
     Look for the package's source code in directory DIR. Usually
     `configure' can determine that directory automatically.

`--version'
     Print the version of Autoconf used to generate the `configure'
     script, and exit.

`configure' also accepts some other, not widely useful, options.
@@ -1,279 +0,0 @@

# Makefile.in for PCRE (Perl-Compatible Regular Expression) library.


#############################################################################

# PCRE is developed on a Unix system. I do not use Windows or Macs, and know
# nothing about building software on them. Although the code of PCRE should
# be very portable, the building system in this Makefile is designed for Unix
# systems. However, there are features that have been supplied to me by various
# people that should make it work on MinGW and Cygwin systems.

# This setting enables Unix-style directory scanning in pcregrep, triggered
# by the -r option. Maybe one day someone will add code for other systems.

PCREGREP_OSTYPE=-DIS_UNIX

#############################################################################


#---------------------------------------------------------------------------#
# The following lines are modified by "configure" to insert data that it is #
# given in its arguments, or which it finds out for itself.                 #
#---------------------------------------------------------------------------#

SHELL = @SHELL@
prefix = @prefix@
exec_prefix = @exec_prefix@
top_srcdir = @top_srcdir@

mkinstalldirs = $(SHELL) $(top_srcdir)/mkinstalldirs

# NB: top_builddir is not referred to directly below, but it is used in the
# setting of $(LIBTOOL), so don't remove it!

top_builddir = .

# BINDIR is the directory in which the pcregrep, pcretest, and pcre-config
# commands are installed.
# INCDIR is the directory in which the public header files pcre.h and
# pcreposix.h are installed.
# LIBDIR is the directory in which the libraries are installed.
# MANDIR is the directory in which the man pages are installed.

BINDIR = @bindir@
LIBDIR = @libdir@
INCDIR = @includedir@
MANDIR = @mandir@

# EXEEXT is set by configure to the extension of an executable file
# OBJEXT is set by configure to the extension of an object file
# The BUILD_* equivalents are the same but for the host we're building on

EXEEXT = @EXEEXT@
OBJEXT = @OBJEXT@
# Note that these are just here to have a convenient place to look at the
# outcome.
BUILD_EXEEXT = @BUILD_EXEEXT@
BUILD_OBJEXT = @BUILD_OBJEXT@

# The compiler, C flags, preprocessor flags, etc

CC = @CC@
CFLAGS = @CFLAGS@
CPPFLAGS = @CPPFLAGS@

CC_FOR_BUILD = @CC_FOR_BUILD@
CFLAGS_FOR_BUILD = @CFLAGS_FOR_BUILD@
CPPFLAGS_FOR_BUILD = @CPPFLAGS_FOR_BUILD@

UTF8 = @UTF8@
NEWLINE = @NEWLINE@
POSIX_MALLOC_THRESHOLD = @POSIX_MALLOC_THRESHOLD@
LINK_SIZE = @LINK_SIZE@
MATCH_LIMIT = @MATCH_LIMIT@
NO_RECURSE = @NO_RECURSE@
EBCDIC = @EBCDIC@

INSTALL = @INSTALL@
INSTALL_DATA = @INSTALL_DATA@

# LIBTOOL enables the building of shared and static libraries. It is set up
# to do one or the other or both by ./configure.

LIBTOOL = @LIBTOOL@
LTCOMPILE = $(LIBTOOL) --mode=compile $(CC) -c $(CFLAGS) -I. -I$(top_srcdir) $(NEWLINE) $(LINK_SIZE) $(MATCH_LIMIT) $(NO_RECURSE) $(EBCDIC)
@ON_WINDOWS@LINK = $(CC) $(CFLAGS) -I. -I$(top_srcdir) -L.libs
@NOT_ON_WINDOWS@LINK = $(LIBTOOL) --mode=link $(CC) $(CFLAGS) -I. -I$(top_srcdir)
LINKLIB = $(LIBTOOL) --mode=link $(CC) $(CFLAGS) -I. -I$(top_srcdir)
LINK_FOR_BUILD = $(LIBTOOL) --mode=link $(CC_FOR_BUILD) $(CFLAGS_FOR_BUILD) -I. -I$(top_srcdir)

# These are the version numbers for the shared libraries

PCRELIBVERSION = @PCRE_LIB_VERSION@
PCREPOSIXLIBVERSION = @PCRE_POSIXLIB_VERSION@

##############################################################################


OBJ = maketables.@OBJEXT@ get.@OBJEXT@ study.@OBJEXT@ pcre.@OBJEXT@ @POSIX_OBJ@
LOBJ = maketables.lo get.lo study.lo pcre.lo @POSIX_LOBJ@

all: libpcre.la @POSIX_LIB@ pcretest@EXEEXT@ pcregrep@EXEEXT@ @ON_WINDOWS@ winshared

pcregrep@EXEEXT@: libpcre.la pcregrep.@OBJEXT@ @ON_WINDOWS@ winshared
	$(LINK) -o pcregrep@EXEEXT@ pcregrep.@OBJEXT@ libpcre.la

pcretest@EXEEXT@: libpcre.la @POSIX_LIB@ pcretest.@OBJEXT@ @ON_WINDOWS@ winshared
	$(LINK) $(PURIFY) $(EFENCE) -o pcretest@EXEEXT@ pcretest.@OBJEXT@ \
	libpcre.la @POSIX_LIB@

libpcre.la: $(OBJ)
	-rm -f libpcre.la
	$(LINKLIB) -rpath $(LIBDIR) -version-info \
	'$(PCRELIBVERSION)' -o libpcre.la $(LOBJ)

libpcreposix.la: libpcre.la pcreposix.@OBJEXT@
	-rm -f libpcreposix.la
	$(LINKLIB) -rpath $(LIBDIR) libpcre.la -version-info \
	'$(PCREPOSIXLIBVERSION)' -o libpcreposix.la pcreposix.lo

pcre.@OBJEXT@: $(top_srcdir)/chartables.c $(top_srcdir)/pcre.c \
  $(top_srcdir)/internal.h $(top_srcdir)/printint.c \
  pcre.h config.h Makefile
	$(LTCOMPILE) $(UTF8) $(POSIX_MALLOC_THRESHOLD) $(top_srcdir)/pcre.c

pcreposix.@OBJEXT@: $(top_srcdir)/pcreposix.c $(top_srcdir)/pcreposix.h \
  $(top_srcdir)/internal.h pcre.h config.h Makefile
	$(LTCOMPILE) $(POSIX_MALLOC_THRESHOLD) $(top_srcdir)/pcreposix.c

maketables.@OBJEXT@: $(top_srcdir)/maketables.c $(top_srcdir)/internal.h \
  pcre.h config.h Makefile
	$(LTCOMPILE) $(top_srcdir)/maketables.c

get.@OBJEXT@: $(top_srcdir)/get.c $(top_srcdir)/internal.h \
  pcre.h config.h Makefile
	$(LTCOMPILE) $(top_srcdir)/get.c

study.@OBJEXT@: $(top_srcdir)/study.c $(top_srcdir)/internal.h \
  pcre.h config.h Makefile
	$(LTCOMPILE) $(UTF8) $(top_srcdir)/study.c

pcretest.@OBJEXT@: $(top_srcdir)/pcretest.c $(top_srcdir)/internal.h \
  $(top_srcdir)/printint.c \
  pcre.h config.h Makefile
	$(CC) -c $(CFLAGS) -I. $(UTF8) $(LINK_SIZE) $(top_srcdir)/pcretest.c

pcregrep.@OBJEXT@: $(top_srcdir)/pcregrep.c pcre.h Makefile config.h
	$(CC) -c $(CFLAGS) -I. $(UTF8) $(PCREGREP_OSTYPE) $(top_srcdir)/pcregrep.c

# Some Windows-specific targets for MinGW. Do not use for Cygwin.

winshared : .libs/@WIN_PREFIX@pcre.dll .libs/@WIN_PREFIX@pcreposix.dll

.libs/@WIN_PREFIX@pcre.dll : libpcre.la
	$(CC) $(CFLAGS) -shared -o $@ \
	-Wl,--whole-archive .libs/libpcre.a \
	-Wl,--out-implib,.libs/libpcre.dll.a \
	-Wl,--output-def,.libs/@WIN_PREFIX@pcre.dll-def \
	-Wl,--export-all-symbols \
	-Wl,--no-whole-archive
	sed -e "s#dlname=''#dlname='../bin/@WIN_PREFIX@pcre.dll'#" \
	-e "s#library_names=''#library_names='libpcre.dll.a'#" \
	< .libs/libpcre.lai > .libs/libpcre.lai.tmp && \
	mv .libs/libpcre.lai.tmp .libs/libpcre.lai
	sed -e "s#dlname=''#dlname='../bin/@WIN_PREFIX@pcre.dll'#" \
	-e "s#library_names=''#library_names='libpcre.dll.a'#" \
	< libpcre.la > libpcre.la.tmp && \
	mv libpcre.la.tmp libpcre.la


.libs/@WIN_PREFIX@pcreposix.dll: libpcreposix.la libpcre.la
	$(CC) $(CFLAGS) -shared -o $@ \
	-Wl,--whole-archive .libs/libpcreposix.a \
	-Wl,--out-implib,.libs/@WIN_PREFIX@pcreposix.dll.a \
	-Wl,--output-def,.libs/@WIN_PREFIX@libpcreposix.dll-def \
	-Wl,--export-all-symbols \
	-Wl,--no-whole-archive .libs/libpcre.a
	sed -e "s#dlname=''#dlname='../bin/@WIN_PREFIX@pcreposix.dll'#" \
	-e "s#library_names=''#library_names='libpcreposix.dll.a'#"\
	< .libs/libpcreposix.lai > .libs/libpcreposix.lai.tmp && \
	mv .libs/libpcreposix.lai.tmp .libs/libpcreposix.lai
	sed -e "s#dlname=''#dlname='../bin/@WIN_PREFIX@pcreposix.dll'#" \
	-e "s#library_names=''#library_names='libpcreposix.dll.a'#"\
	< libpcreposix.la > libpcreposix.la.tmp && \
	mv libpcreposix.la.tmp libpcreposix.la


wininstall : winshared
	$(mkinstalldirs) $(DESTDIR)$(LIBDIR)
	$(mkinstalldirs) $(DESTDIR)$(BINDIR)
	$(INSTALL) .libs/@WIN_PREFIX@pcre.dll $(DESTDIR)$(BINDIR)/@WIN_PREFIX@pcre.dll
	$(INSTALL) .libs/@WIN_PREFIX@pcreposix.dll $(DESTDIR)$(BINDIR)/@WIN_PREFIX@pcreposix.dll
	$(INSTALL) .libs/@WIN_PREFIX@libpcreposix.dll.a $(DESTDIR)$(LIBDIR)/@WIN_PREFIX@libpcreposix.dll.a
	$(INSTALL) .libs/@WIN_PREFIX@libpcre.dll.a $(DESTDIR)$(LIBDIR)/@WIN_PREFIX@libpcre.dll.a
	-strip -g $(DESTDIR)$(BINDIR)/@WIN_PREFIX@pcre.dll
	-strip -g $(DESTDIR)$(BINDIR)/@WIN_PREFIX@pcreposix.dll
	-strip $(DESTDIR)$(BINDIR)/pcregrep@EXEEXT@
	-strip $(DESTDIR)$(BINDIR)/pcretest@EXEEXT@

# An auxiliary program makes the default character table source

$(top_srcdir)/chartables.c: dftables
	./dftables $(top_srcdir)/chartables.c

dftables.@BUILD_OBJEXT@: $(top_srcdir)/dftables.c $(top_srcdir)/maketables.c \
  $(top_srcdir)/internal.h pcre.h config.h Makefile
	$(CC_FOR_BUILD) -c $(CFLAGS_FOR_BUILD) -I. $(top_srcdir)/dftables.c

dftables: dftables.@BUILD_OBJEXT@
	$(LINK_FOR_BUILD) -o dftables dftables.@BUILD_OBJEXT@

install: all @ON_WINDOWS@ wininstall
	@NOT_ON_WINDOWS@ $(mkinstalldirs) $(DESTDIR)$(LIBDIR)
	@NOT_ON_WINDOWS@ echo "$(LIBTOOL) --mode=install $(INSTALL) libpcre.la $(DESTDIR)$(LIBDIR)/libpcre.la"
	@NOT_ON_WINDOWS@ $(LIBTOOL) --mode=install $(INSTALL) libpcre.la $(DESTDIR)$(LIBDIR)/libpcre.la
	@NOT_ON_WINDOWS@ echo "$(LIBTOOL) --mode=install $(INSTALL) libpcreposix.la $(DESTDIR)$(LIBDIR)/libpcreposix.la"
	@NOT_ON_WINDOWS@ $(LIBTOOL) --mode=install $(INSTALL) libpcreposix.la $(DESTDIR)$(LIBDIR)/libpcreposix.la
	@NOT_ON_WINDOWS@ $(LIBTOOL) --finish $(DESTDIR)$(LIBDIR)
	$(mkinstalldirs) $(DESTDIR)$(INCDIR)
	$(INSTALL_DATA) pcre.h $(DESTDIR)$(INCDIR)/pcre.h
	$(INSTALL_DATA) $(top_srcdir)/pcreposix.h $(DESTDIR)$(INCDIR)/pcreposix.h
	$(mkinstalldirs) $(DESTDIR)$(MANDIR)/man3
	$(INSTALL_DATA) $(top_srcdir)/doc/pcre.3 $(DESTDIR)$(MANDIR)/man3/pcre.3
	$(INSTALL_DATA) $(top_srcdir)/doc/pcreapi.3 $(DESTDIR)$(MANDIR)/man3/pcreapi.3
	$(INSTALL_DATA) $(top_srcdir)/doc/pcrebuild.3 $(DESTDIR)$(MANDIR)/man3/pcrebuild.3
	$(INSTALL_DATA) $(top_srcdir)/doc/pcrecallout.3 $(DESTDIR)$(MANDIR)/man3/pcrecallout.3
	$(INSTALL_DATA) $(top_srcdir)/doc/pcrecompat.3 $(DESTDIR)$(MANDIR)/man3/pcrecompat.3
	$(INSTALL_DATA) $(top_srcdir)/doc/pcrepattern.3 $(DESTDIR)$(MANDIR)/man3/pcrepattern.3
	$(INSTALL_DATA) $(top_srcdir)/doc/pcreperform.3 $(DESTDIR)$(MANDIR)/man3/pcreperform.3
	$(INSTALL_DATA) $(top_srcdir)/doc/pcreposix.3 $(DESTDIR)$(MANDIR)/man3/pcreposix.3
	$(INSTALL_DATA) $(top_srcdir)/doc/pcresample.3 $(DESTDIR)$(MANDIR)/man3/pcresample.3
	$(INSTALL_DATA) $(top_srcdir)/doc/pcre_compile.3 $(DESTDIR)$(MANDIR)/man3/pcre_compile.3
	$(INSTALL_DATA) $(top_srcdir)/doc/pcre_config.3 $(DESTDIR)$(MANDIR)/man3/pcre_config.3
	$(INSTALL_DATA) $(top_srcdir)/doc/pcre_copy_named_substring.3 $(DESTDIR)$(MANDIR)/man3/pcre_copy_named_substring.3
	$(INSTALL_DATA) $(top_srcdir)/doc/pcre_copy_substring.3 $(DESTDIR)$(MANDIR)/man3/pcre_copy_substring.3
	$(INSTALL_DATA) $(top_srcdir)/doc/pcre_exec.3 $(DESTDIR)$(MANDIR)/man3/pcre_exec.3
	$(INSTALL_DATA) $(top_srcdir)/doc/pcre_free_substring.3 $(DESTDIR)$(MANDIR)/man3/pcre_free_substring.3
	$(INSTALL_DATA) $(top_srcdir)/doc/pcre_free_substring_list.3 $(DESTDIR)$(MANDIR)/man3/pcre_free_substring_list.3
	$(INSTALL_DATA) $(top_srcdir)/doc/pcre_fullinfo.3 $(DESTDIR)$(MANDIR)/man3/pcre_fullinfo.3
	$(INSTALL_DATA) $(top_srcdir)/doc/pcre_get_named_substring.3 $(DESTDIR)$(MANDIR)/man3/pcre_get_named_substring.3
	$(INSTALL_DATA) $(top_srcdir)/doc/pcre_get_stringnumber.3 $(DESTDIR)$(MANDIR)/man3/pcre_get_stringnumber.3
	$(INSTALL_DATA) $(top_srcdir)/doc/pcre_get_substring.3 $(DESTDIR)$(MANDIR)/man3/pcre_get_substring.3
	$(INSTALL_DATA) $(top_srcdir)/doc/pcre_get_substring_list.3 $(DESTDIR)$(MANDIR)/man3/pcre_get_substring_list.3
	$(INSTALL_DATA) $(top_srcdir)/doc/pcre_info.3 $(DESTDIR)$(MANDIR)/man3/pcre_info.3
	$(INSTALL_DATA) $(top_srcdir)/doc/pcre_maketables.3 $(DESTDIR)$(MANDIR)/man3/pcre_maketables.3
	$(INSTALL_DATA) $(top_srcdir)/doc/pcre_study.3 $(DESTDIR)$(MANDIR)/man3/pcre_study.3
	$(INSTALL_DATA) $(top_srcdir)/doc/pcre_version.3 $(DESTDIR)$(MANDIR)/man3/pcre_version.3
	$(mkinstalldirs) $(DESTDIR)$(MANDIR)/man1
	$(INSTALL_DATA) $(top_srcdir)/doc/pcregrep.1 $(DESTDIR)$(MANDIR)/man1/pcregrep.1
	$(INSTALL_DATA) $(top_srcdir)/doc/pcretest.1 $(DESTDIR)$(MANDIR)/man1/pcretest.1
	$(mkinstalldirs) $(DESTDIR)$(BINDIR)
	$(LIBTOOL) --mode=install $(INSTALL) pcregrep@EXEEXT@ $(DESTDIR)$(BINDIR)/pcregrep@EXEEXT@
	$(LIBTOOL) --mode=install $(INSTALL) pcretest@EXEEXT@ $(DESTDIR)$(BINDIR)/pcretest@EXEEXT@
	$(INSTALL) pcre-config $(DESTDIR)$(BINDIR)/pcre-config

# We deliberately omit dftables and chartables.c from 'make clean'; once made
# chartables.c shouldn't change, and if people have edited the tables by hand,
# you don't want to throw them away.

clean:; -rm -rf *.@OBJEXT@ *.lo *.a *.la .libs pcretest@EXEEXT@ pcregrep@EXEEXT@ testtry

# But "make distclean" should get back to a virgin distribution

distclean: clean
	-rm -f chartables.c libtool pcre-config pcre.h \
	Makefile config.h config.status config.log config.cache

check: runtest

@WIN_PREFIX@pcre.dll : winshared
	cp .libs/@WIN_PREFIX@pcre.dll .

test: runtest

runtest: all @ON_WINDOWS@ @WIN_PREFIX@pcre.dll
	@./RunTest

# End
@@ -1,154 +0,0 @@
News about PCRE releases
------------------------

Release 4.5 01-Dec-03
---------------------

Again mainly a bug-fix and tidying release, with only a couple of new features:

1. It's possible now to compile PCRE so that it does not use recursive
function calls when matching. Instead it gets memory from the heap. This slows
things down, but may be necessary on systems with limited stacks.

2. UTF-8 string checking has been tightened to reject overlong sequences and to
check that a starting offset points to the start of a character. Failure of the
latter returns a new error code: PCRE_ERROR_BADUTF8_OFFSET.

3. PCRE can now be compiled for systems that use EBCDIC code.
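The two checks in item 2 can be illustrated with a short C sketch. This is not PCRE's code. The function names are hypothetical, and only the 2-, 3- and 4-byte forms of overlong sequences are examined; a real validator also has to check sequence lengths and continuation bytes. An offset counts as valid if it is the end of the subject, or if the byte there is not a continuation byte (10xxxxxx). According to the item above, PCRE reports a misplaced offset as PCRE_ERROR_BADUTF8_OFFSET.

```c
#include <stddef.h>

/* Sketch only, not PCRE's implementation; all names are hypothetical. */

/* Nonzero if byte b is a UTF-8 continuation byte (10xxxxxx). */
static int is_continuation(unsigned char b)
{
    return (b & 0xC0) == 0x80;
}

/* Nonzero if offset is an acceptable place to start matching: either the
 * end of the subject or a byte that begins a character. */
int utf8_offset_ok(const unsigned char *s, size_t len, size_t offset)
{
    if (offset > len)
        return 0;
    return offset == len || !is_continuation(s[offset]);
}

/* Nonzero if the sequence at s[0] (n bytes available) is overlong, i.e. it
 * uses more bytes than its code point needs. Only the 2-, 3- and 4-byte
 * forms are checked here. */
int utf8_is_overlong(const unsigned char *s, size_t n)
{
    if (n >= 1 && (s[0] == 0xC0 || s[0] == 0xC1))
        return 1;               /* 2 bytes used for a code point < 0x80    */
    if (n >= 2 && s[0] == 0xE0 && s[1] < 0xA0)
        return 1;               /* 3 bytes used for a code point < 0x800   */
    if (n >= 2 && s[0] == 0xF0 && s[1] < 0x90)
        return 1;               /* 4 bytes used for a code point < 0x10000 */
    return 0;
}
```

For the subject "cé" (bytes 0x63 0xC3 0xA9), offsets 0, 1 and 3 pass utf8_offset_ok(). Offset 2 fails, because 0xA9 is a continuation byte. The pair 0xC0 0xAF is an overlong encoding of "/" and is rejected.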


Release 4.4 21-Aug-03
---------------------

This is mainly a bug-fix and tidying release. The only new feature is that PCRE
checks UTF-8 strings for validity by default. There is an option to suppress
this, just in case anybody wants that teeny extra bit of performance.


Releases 4.1 - 4.3
------------------

Sorry, I forgot about updating the NEWS file for these releases. Please take a
look at ChangeLog.


Release 4.0 17-Feb-03
---------------------

There have been a lot of changes for the 4.0 release, adding additional
functionality and mending bugs. Below is a list of the highlights of the new
functionality. For full details of these features, please consult the
documentation. For a complete list of changes, see the ChangeLog file.

1. Support for Perl's \Q...\E escapes.

2. "Possessive quantifiers" ?+, *+, ++, and {,}+ which come from Sun's Java
package. They provide some syntactic sugar for simple cases of "atomic
grouping".

3. Support for the \G assertion. It is true when the current matching position
is at the start point of the match.

4. A new feature that provides some of the functionality that Perl provides
with (?{...}). The facility is termed a "callout". The way it is done in PCRE
is for the caller to provide an optional function, by setting pcre_callout to
its entry point. To get the function called, the regex must include (?C) at
appropriate points.

5. Support for recursive calls to individual subpatterns. This makes it really
easy to get totally confused.

6. Support for named subpatterns. The Python syntax (?P<name>...) is used to
name a group.

7. Several extensions to UTF-8 support; it is now fairly complete. There is an
option for pcregrep to make it operate in UTF-8 mode.

8. The single man page has been split into a number of separate man pages.
These also give rise to individual HTML pages which are put in a separate
directory. There is an index.html page that lists them all. Some hyperlinking
between the pages has been installed.


Release 3.5 15-Aug-01
---------------------

1. The configuring system has been upgraded to use later versions of autoconf
and libtool. By default it builds both a shared and a static library if the OS
supports it. You can use --disable-shared or --disable-static on the configure
command if you want only one of them.

2. The pcretest utility is now installed along with pcregrep because it is
useful for users (to test regexes) and by doing this, it automatically gets
relinked by libtool. The documentation has been turned into a man page, so
there are now .1, .txt, and .html versions in /doc.

3. Upgrades to pcregrep:
   (i)   Added long-form option names like GNU grep.
   (ii)  Added --help to list all options with an explanatory phrase.
   (iii) Added -r, --recursive to recurse into sub-directories.
   (iv)  Added -f, --file to read patterns from a file.

4. Added --enable-newline-is-cr and --enable-newline-is-lf to the configure
script, to force use of CR or LF instead of \n in the source. On non-Unix
systems, the value can be set in config.h.

5. The limit of 200 on non-capturing parentheses is a _nesting_ limit, not an
absolute limit. Changed the text of the error message to make this clear, and
likewise updated the man page.

6. The limit of 99 on the number of capturing subpatterns has been removed.
The new limit is 65535, which I hope will not be a "real" limit.


Release 3.3 01-Aug-00
---------------------

There is some support for UTF-8 character strings. This is incomplete and
experimental. The documentation describes what is and what is not implemented.
Otherwise, this is just a bug-fixing release.


Release 3.0 01-Feb-00
---------------------

1. A "configure" script is now used to configure PCRE for Unix systems. It
builds a Makefile, a config.h file, and the pcre-config script.

2. PCRE is built as a shared library by default.

3. There is support for POSIX classes such as [:alpha:].

4. There is an experimental recursion feature.

----------------------------------------------------------------------------
IMPORTANT FOR THOSE UPGRADING FROM VERSIONS BEFORE 2.00

Please note that there has been a change in the API such that a larger
ovector is required at matching time, to provide some additional workspace.
The new man page has details. This change was necessary in order to support
some of the new functionality in Perl 5.005.
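Here is a rough guide to what "larger" means. The C sketch below assumes the sizing rule documented in the pcreapi man page. Under that rule the vector holds one pair of offsets for the whole match, plus one pair per capturing subpattern. A further third of its length is used as workspace, so the size should be a multiple of 3 and is rounded down if it is not. The helper names are hypothetical. In a real program the capture count could come from pcre_fullinfo() with PCRE_INFO_CAPTURECOUNT.

```c
#include <stddef.h>

/* Hypothetical helpers illustrating the assumed ovector sizing rule:
 * 2 ints per (start, end) offset pair plus 1 int of workspace per pair. */

/* Number of ints needed so that the whole match and all capture_count
 * capturing subpatterns can be returned. */
size_t ovector_size(size_t capture_count)
{
    size_t pairs = capture_count + 1;   /* pair 0 is the whole match */
    return pairs * 3;
}

/* Number of offset pairs that fit in a vector of ovecsize ints; a size that
 * is not a multiple of 3 is effectively rounded down. */
size_t ovector_pairs(size_t ovecsize)
{
    return ovecsize / 3;
}
```

So a pattern with two capturing subpatterns needs a vector of 9 ints. The first 6 receive offsets and the last 3 are workspace.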

IMPORTANT FOR THOSE UPGRADING FROM VERSION 2.00

Another (I hope this is the last!) change has been made to the API for the
pcre_compile() function. An additional argument has been added to make it
possible to pass over a pointer to character tables built in the current
locale by pcre_maketables(). To use the default tables, this new argument
should be passed as NULL.

IMPORTANT FOR THOSE UPGRADING FROM VERSION 2.05

Yet another (and again I hope this really is the last) change has been made
to the API for the pcre_exec() function. An additional argument has been
added to make it possible to start the match other than at the start of the
subject string. This is important if there are lookbehinds. The new man
page has the details, but if you just want to convert existing programs, all
you need to do is to stick in a new fifth argument to pcre_exec(), with a
value of zero. For example, change

  pcre_exec(pattern, extra, subject, length, options, ovec, ovecsize)
to
  pcre_exec(pattern, extra, subject, length, 0, options, ovec, ovecsize)
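The reason a start offset matters for lookbehinds can be shown without PCRE at all. The C sketch below is a hypothetical toy matcher, not PCRE. It imitates the effect of the pattern (?<= )cat: it scans from start_offset but may look at the byte just before it. Passing a start offset keeps that preceding byte visible. Advancing the subject pointer instead, the obvious workaround without the extra argument, hides it, so the two calls in the usage below give different answers.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical toy matcher, not PCRE: finds "cat" preceded by a space,
 * i.e. the effect of (?<= )cat. It scans from start_offset but may look at
 * the byte *before* start_offset, which is what a lookbehind needs.
 * Returns the offset of the match, or -1 if there is none. */
long find_cat_after_space(const char *subject, size_t length,
                          size_t start_offset)
{
    for (size_t i = start_offset; i + 3 <= length; i++) {
        if (i > 0 && subject[i - 1] == ' ' &&
            memcmp(subject + i, "cat", 3) == 0)
            return (long)i;
    }
    return -1;
}
```

With s = "a cat", find_cat_after_space(s, 5, 2) returns 2 because the lookbehind can see the space at offset 1. find_cat_after_space(s + 2, 3, 0) returns -1 because that byte is no longer part of the subject.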

****
@@ -1,122 +0,0 @@
Compiling PCRE on non-Unix systems
----------------------------------

See below for comments on Cygwin or MinGW usage. I (Philip Hazel) have no
knowledge of Windows systems and how their libraries work. The items in the
PCRE Makefile that relate to anything other than Unix-like systems have been
contributed by PCRE users. There are some other comments and files in the
Contrib directory on the ftp site that you may find useful.

The following are generic comments about building PCRE:

If you want to compile PCRE for a non-Unix system (or perhaps, more strictly,
for a system that does not support "configure" and make files), note that PCRE
consists entirely of code written in Standard C, and so should compile
successfully on any machine with a Standard C compiler and library, using
normal compiling commands to do the following:

(1) Copy or rename the file config.in as config.h, and change the macros that
define HAVE_STRERROR and HAVE_MEMMOVE to define them as 1 rather than 0.
Unfortunately, because of the way Unix autoconf works, the default setting has
to be 0. You may also want to make changes to other macros in config.h. In
particular, if you want to force a specific value for newline, you can define
the NEWLINE macro. The default is to use '\n', thereby using whatever value
your compiler gives to '\n'.
|
||||
|
||||
(2) Copy or rename the file pcre.in as pcre.h, and change the macro definitions
|
||||
for PCRE_MAJOR, PCRE_MINOR, and PCRE_DATE near its start to the values set in
|
||||
configure.in.
|
||||
|
||||
(3) Compile dftables.c as a stand-alone program, and then run it with
|
||||
the single argument "chartables.c". This generates a set of standard
|
||||
character tables and writes them to that file.
|
||||
|
||||
(4) Compile maketables.c, get.c, study.c and pcre.c and link them all
|
||||
together into an object library in whichever form your system keeps such
|
||||
libraries. This is the pcre library (chartables.c is included by means of an
|
||||
#include directive). If your system has static and shared libraries, you may
|
||||
have to do this once for each type.
|
||||
|
||||
(5) Similarly, compile pcreposix.c and link it (on its own) as the pcreposix
|
||||
library.
|
||||
|
||||
(6) Compile the test program pcretest.c. This needs the functions in the
|
||||
pcre and pcreposix libraries when linking.
|
||||
|
||||
(7) Run pcretest on the testinput files in the testdata directory, and check
|
||||
that the output matches the corresponding testoutput files. You must use the
|
||||
-i option when checking testinput2. Note that the supplied files are in Unix
|
||||
format, with just LF characters as line terminators. You may need to edit them
|
||||
to change this if your system uses a different convention.
|
||||
|
||||
If you have a system without "configure" but where you can use a Makefile, edit
|
||||
Makefile.in to create Makefile, substituting suitable values for the variables
|
||||
at the head of the file.
|
||||
|
||||
Some help in building a Win32 DLL of PCRE in GnuWin32 environments was
|
||||
contributed by Paul Sokolovsky. These environments are Mingw32
|
||||
(http://www.xraylith.wisc.edu/~khan/software/gnu-win32/) and CygWin
|
||||
(http://sourceware.cygnus.com/cygwin/). Paul comments:
|
||||
|
||||
For CygWin, set CFLAGS=-mno-cygwin, and do 'make dll'. You'll get
|
||||
pcre.dll (containing pcreposix also), libpcre.dll.a, and dynamically
|
||||
linked pgrep and pcretest. If you have /bin/sh, run RunTest (three
|
||||
main tests go OK, locale not supported).
|
||||
|
||||
Changes to do MinGW with autoconf 2.50 were supplied by Fred Cox
|
||||
<sailorFred@yahoo.com>, who comments as follows:
|
||||
|
||||
If you are using the PCRE DLL, the normal Unix style configure && make &&
|
||||
make check && make install should just work[*]. If you want to statically
|
||||
link against the .a file, you must define PCRE_STATIC before including
|
||||
pcre.h, otherwise the pcre_malloc and pcre_free exported functions will be
|
||||
declared __declspec(dllimport), with hilarious results. See the configure.in
|
||||
and pcretest.c for how it is done for the static test.
|
||||
|
||||
Also, there will only be a libpcre.la, not a libpcreposix.la, as you
|
||||
would expect from the Unix version. The single DLL includes the pcreposix
|
||||
interface.
|
||||
|
||||
[*] But note that the supplied test files are in Unix format, with just LF
|
||||
characters as line terminators. You will have to edit them to change to CR LF
|
||||
terminators.
|
||||
|
||||
A script for building PCRE using Borland's C++ compiler for use with VPASCAL
|
||||
was contributed by Alexander Tokarev. It is called makevp.bat.
|
||||
|
||||
These are some further comments about Win32 builds from Mark Evans. They
|
||||
were contributed before Fred Cox's changes were made, so it is possible that
|
||||
they may no longer be relevant.
|
||||
|
||||
"The documentation for Win32 builds is a bit shy. Under MSVC6 I
|
||||
followed their instructions to the letter, but there were still
|
||||
some things missing.
|
||||
|
||||
(1) Must #define STATIC for entire project if linking statically.
|
||||
(I see no reason to use DLLs for code this compact.) This of
|
||||
course is a project setting in MSVC under Preprocessor.
|
||||
|
||||
(2) Missing some #ifdefs relating to the function pointers
|
||||
pcre_malloc and pcre_free. See my solution below. (The stubs
|
||||
may not be mandatory but they made me feel better.)"
|
||||
|
||||
=========================
|
||||
#include <stdlib.h>   /* declares malloc() and free() on every platform */

#ifdef _WIN32
#include <malloc.h>

void* malloc_stub(size_t N)
{ return malloc(N); }
void free_stub(void* p)
{ free(p); }
void *(*pcre_malloc)(size_t) = &malloc_stub;
void (*pcre_free)(void *) = &free_stub;

#else

void *(*pcre_malloc)(size_t) = malloc;
void (*pcre_free)(void *) = free;

#endif
|
||||
=========================
|
||||
|
||||
****
|
@ -1,139 +0,0 @@
|
||||
#! /bin/sh
|
||||
|
||||
# This file is generated by configure from RunTest.in. Make any changes
|
||||
# to that file.
|
||||
|
||||
# Run PCRE tests
|
||||
|
||||
cf=diff
|
||||
testdata=@top_srcdir@/testdata
|
||||
|
||||
# Select which tests to run; if no selection, run all
|
||||
|
||||
do1=no
|
||||
do2=no
|
||||
do3=no
|
||||
do4=no
|
||||
do5=no
|
||||
|
||||
while [ $# -gt 0 ] ; do
|
||||
case $1 in
|
||||
1) do1=yes;;
|
||||
2) do2=yes;;
|
||||
3) do3=yes;;
|
||||
4) do4=yes;;
|
||||
5) do5=yes;;
|
||||
*) echo "Unknown test number $1"; exit 1;;
|
||||
esac
|
||||
shift
|
||||
done
|
||||
|
||||
if [ "@UTF8@" = "" ] ; then
|
||||
if [ $do4 = yes ] ; then
|
||||
echo "Can't run test 4 because UTF-8 support is not configured"
|
||||
exit 1
|
||||
fi
|
||||
if [ $do5 = yes ] ; then
|
||||
echo "Can't run test 5 because UTF-8 support is not configured"
|
||||
exit 1
|
||||
fi
|
||||
fi
|
||||
|
||||
if [ $do1 = no -a $do2 = no -a $do3 = no -a $do4 = no -a\
|
||||
$do5 = no ] ; then
|
||||
do1=yes
|
||||
do2=yes
|
||||
do3=yes
|
||||
if [ "@UTF8@" != "" ] ; then do4=yes; fi
|
||||
if [ "@UTF8@" != "" ] ; then do5=yes; fi
|
||||
fi
|
||||
|
||||
# Show which release
|
||||
|
||||
./pcretest /dev/null
|
||||
|
||||
# Primary test, Perl-compatible
|
||||
|
||||
if [ $do1 = yes ] ; then
|
||||
echo "Testing main functionality (Perl compatible)"
|
||||
./pcretest $testdata/testinput1 testtry
|
||||
if [ $? = 0 ] ; then
|
||||
$cf testtry $testdata/testoutput1
|
||||
if [ $? != 0 ] ; then exit 1; fi
|
||||
echo " "
|
||||
else exit 1
|
||||
fi
|
||||
fi
|
||||
|
||||
# PCRE tests that are not Perl-compatible - API & error tests, mostly
|
||||
|
||||
if [ $do2 = yes ] ; then
|
||||
echo "Testing API and error handling (not Perl compatible)"
|
||||
./pcretest -i $testdata/testinput2 testtry
|
||||
if [ $? = 0 ] ; then
|
||||
$cf testtry $testdata/testoutput2
|
||||
if [ $? != 0 ] ; then exit 1; fi
|
||||
else exit 1
|
||||
fi
|
||||
fi
|
||||
|
||||
if [ $do1 = yes -a $do2 = yes ] ; then
|
||||
echo " "
|
||||
echo "The two main tests ran OK"
|
||||
echo " "
|
||||
fi
|
||||
|
||||
# Locale-specific tests, provided the "fr_FR" locale is available
|
||||
|
||||
if [ $do3 = yes ] ; then
|
||||
locale -a | grep '^fr_FR$' >/dev/null
|
||||
if [ $? -eq 0 ] ; then
|
||||
echo "Testing locale-specific features (using 'fr_FR' locale)"
|
||||
./pcretest $testdata/testinput3 testtry
|
||||
if [ $? = 0 ] ; then
|
||||
$cf testtry $testdata/testoutput3
|
||||
if [ $? != 0 ] ; then
|
||||
echo " "
|
||||
echo "Locale test did not run entirely successfully."
|
||||
echo "This usually means that there is a problem with the locale"
|
||||
echo "settings rather than a bug in PCRE."
|
||||
else
|
||||
echo "Locale test ran OK"
|
||||
fi
|
||||
echo " "
|
||||
else exit 1
|
||||
fi
|
||||
else
|
||||
echo "Cannot test locale-specific features - 'fr_FR' locale not found,"
|
||||
echo "or the \"locale\" command is not available to check for it."
|
||||
echo " "
|
||||
fi
|
||||
fi
|
||||
|
||||
# Additional tests for UTF8 support
|
||||
|
||||
if [ $do4 = yes ] ; then
|
||||
echo "Testing UTF-8 support (Perl compatible)"
|
||||
./pcretest $testdata/testinput4 testtry
|
||||
if [ $? = 0 ] ; then
|
||||
$cf testtry $testdata/testoutput4
|
||||
if [ $? != 0 ] ; then exit 1; fi
|
||||
else exit 1
|
||||
fi
|
||||
echo "UTF8 test ran OK"
|
||||
echo " "
|
||||
fi
|
||||
|
||||
if [ $do5 = yes ] ; then
|
||||
echo "Testing API and internals for UTF-8 support (not Perl compatible)"
|
||||
./pcretest $testdata/testinput5 testtry
|
||||
if [ $? = 0 ] ; then
|
||||
$cf testtry $testdata/testoutput5
|
||||
if [ $? != 0 ] ; then exit 1; fi
|
||||
else exit 1
|
||||
fi
|
||||
echo "UTF8 internals test ran OK"
|
||||
echo " "
|
||||
fi
|
||||
|
||||
# End
|
1400
external-libs/pcre/config.guess
vendored
File diff suppressed because it is too large
@ -1,107 +0,0 @@
|
||||
|
||||
/* On Unix systems config.in is converted by configure into config.h. PCRE is
|
||||
written in Standard C, but there are a few non-standard things it can cope
|
||||
with, allowing it to run on SunOS4 and other "close to standard" systems.
|
||||
|
||||
On a non-Unix system you should just copy this file into config.h, and set up
|
||||
the macros the way you need them. You should normally change the definitions of
|
||||
HAVE_STRERROR and HAVE_MEMMOVE to 1. Unfortunately, because of the way autoconf
|
||||
works, these cannot be made the defaults. If your system has bcopy() and not
|
||||
memmove(), change the definition of HAVE_BCOPY instead of HAVE_MEMMOVE. If your
|
||||
system has neither bcopy() nor memmove(), leave them both as 0; an emulation
|
||||
function will be used. */
|
||||
|
||||
/* If you are compiling for a system that uses EBCDIC instead of ASCII
|
||||
character codes, define this macro as 1. On systems that can use "configure",
|
||||
this can be done via --enable-ebcdic. */
|
||||
|
||||
#ifndef EBCDIC
|
||||
#define EBCDIC 0
|
||||
#endif
|
||||
|
||||
/* If you are compiling for a system that needs some magic to be inserted
|
||||
before the definition of an exported function, define this macro to contain the
|
||||
relevant magic. It appears at the start of every exported function. */
|
||||
|
||||
#define EXPORT
|
||||
|
||||
/* Define to empty if the "const" keyword does not work. */
|
||||
|
||||
#undef const
|
||||
|
||||
/* Define to "unsigned" if <stddef.h> doesn't define size_t. */
|
||||
|
||||
#undef size_t
|
||||
|
||||
/* The following two definitions are mainly for the benefit of SunOS4, which
|
||||
doesn't have the strerror() or memmove() functions that should be present in
|
||||
all Standard C libraries. The macros HAVE_STRERROR and HAVE_MEMMOVE should
|
||||
normally be defined with the value 1 for other systems, but unfortunately we
|
||||
can't make this the default because "configure" files generated by autoconf
|
||||
will only change 0 to 1; they won't change 1 to 0 if the functions are not
|
||||
found. */
|
||||
|
||||
#define HAVE_STRERROR 0
|
||||
#define HAVE_MEMMOVE 0
|
||||
|
||||
/* There are some non-Unix systems that don't even have bcopy(). If this macro
|
||||
is false, an emulation is used. If HAVE_MEMMOVE is set to 1, the value of
|
||||
HAVE_BCOPY is not relevant. */
|
||||
|
||||
#define HAVE_BCOPY 0
|
||||
|
||||
/* The value of NEWLINE determines the newline character. The default is to
|
||||
leave it up to the compiler, but some sites want to force a particular value.
|
||||
On Unix systems, "configure" can be used to override this default. */
|
||||
|
||||
#ifndef NEWLINE
|
||||
#define NEWLINE '\n'
|
||||
#endif
|
||||
|
||||
/* The value of LINK_SIZE determines the number of bytes used to store
|
||||
links as offsets within the compiled regex. The default is 2, which allows for
|
||||
compiled patterns up to 64K long. This covers the vast majority of cases.
|
||||
However, PCRE can also be compiled to use 3 or 4 bytes instead. This allows for
|
||||
longer patterns in extreme cases. On Unix systems, "configure" can be used to
|
||||
override this default. */
|
||||
|
||||
#ifndef LINK_SIZE
|
||||
#define LINK_SIZE 2
|
||||
#endif
|
||||
|
||||
/* The value of MATCH_LIMIT determines the default number of times the match()
|
||||
function can be called during a single execution of pcre_exec(). (There is a
|
||||
runtime method of setting a different limit.) The limit exists in order to
|
||||
catch runaway regular expressions that take for ever to determine that they do
|
||||
not match. The default is set very large so that it does not accidentally catch
|
||||
legitimate cases. On Unix systems, "configure" can be used to override this
|
||||
default. */
|
||||
|
||||
#ifndef MATCH_LIMIT
|
||||
#define MATCH_LIMIT 10000000
|
||||
#endif
|
||||
|
||||
/* When calling PCRE via the POSIX interface, additional working storage is
|
||||
required for holding the pointers to capturing substrings because PCRE requires
|
||||
three integers per substring, whereas the POSIX interface provides only two. If
|
||||
the number of expected substrings is small, the wrapper function uses space on
|
||||
the stack, because this is faster than using malloc() for each call. The
|
||||
threshold above which the stack is no longer used is defined by
POSIX_MALLOC_THRESHOLD. On Unix systems, "configure" can be used to override
this default.
|
||||
*/
|
||||
|
||||
#ifndef POSIX_MALLOC_THRESHOLD
|
||||
#define POSIX_MALLOC_THRESHOLD 10
|
||||
#endif
|
||||
|
||||
/* PCRE uses recursive function calls to handle backtracking while matching.
|
||||
This can sometimes be a problem on systems that have stacks of limited size.
|
||||
Define NO_RECURSE to get a version that doesn't use recursion in the match()
|
||||
function; instead it creates its own stack by steam using pcre_recurse_malloc
|
||||
to get memory. For more detail, see comments and other stuff just above the
|
||||
match() function. On Unix systems, "configure" can be used to set this in the
|
||||
Makefile (use --disable-recursion). */
|
||||
|
||||
/* #define NO_RECURSE */
|
||||
|
||||
/* End */
|
1469
external-libs/pcre/config.sub
vendored
File diff suppressed because it is too large
8927
external-libs/pcre/configure
vendored
File diff suppressed because it is too large
@ -1,201 +0,0 @@
|
||||
dnl Process this file with autoconf to produce a configure script.
|
||||
|
||||
dnl This is required at the start; the name is the name of a file
|
||||
dnl it should be seeing, to verify it is in the same directory.
|
||||
|
||||
AC_INIT(dftables.c)
|
||||
|
||||
dnl A safety precaution
|
||||
|
||||
AC_PREREQ(2.57)
|
||||
|
||||
dnl Arrange to build config.h from config.in. Note that pcre.h is
|
||||
dnl built differently, as it is just a "substitution" file.
|
||||
dnl Manual says this macro should come right after AC_INIT.
|
||||
AC_CONFIG_HEADER(config.h:config.in)
|
||||
|
||||
dnl Provide the current PCRE version information. Do not use numbers
|
||||
dnl with leading zeros for the minor version, as they end up in a C
|
||||
dnl macro, and may be treated as octal constants. Stick to single
|
||||
dnl digits for minor numbers less than 10. There are unlikely to be
|
||||
dnl that many releases anyway.
|
||||
|
||||
PCRE_MAJOR=4
|
||||
PCRE_MINOR=5
|
||||
PCRE_DATE=01-December-2003
|
||||
PCRE_VERSION=${PCRE_MAJOR}.${PCRE_MINOR}
|
||||
|
||||
dnl Default values for miscellaneous macros
|
||||
|
||||
POSIX_MALLOC_THRESHOLD=-DPOSIX_MALLOC_THRESHOLD=10
|
||||
|
||||
dnl Provide versioning information for libtool shared libraries that
|
||||
dnl are built by default on Unix systems.
|
||||
|
||||
PCRE_LIB_VERSION=0:1:0
|
||||
PCRE_POSIXLIB_VERSION=0:0:0
|
||||
|
||||
dnl Checks for programs.
|
||||
|
||||
AC_PROG_CC
|
||||
AC_PROG_INSTALL
|
||||
AC_LIBTOOL_WIN32_DLL
|
||||
AC_PROG_LIBTOOL
|
||||
|
||||
dnl We need to find a compiler for compiling a program to run on the local host
|
||||
dnl while building. It needs to be different from CC when cross-compiling.
|
||||
dnl There is a macro called AC_PROG_CC_FOR_BUILD in the GNU archive for
|
||||
dnl figuring this out automatically. Unfortunately, it does not work with the
|
||||
dnl latest versions of autoconf. So for the moment, we just default to the
|
||||
dnl same values as the "main" compiler. People who are cross-compiling will
|
||||
dnl just have to adjust the Makefile by hand or set these values when they
|
||||
dnl run "configure".
|
||||
|
||||
CC_FOR_BUILD=${CC_FOR_BUILD:-'$(CC)'}
|
||||
CFLAGS_FOR_BUILD=${CFLAGS_FOR_BUILD:-'$(CFLAGS)'}
|
||||
BUILD_EXEEXT=${BUILD_EXEEXT:-'$(EXEEXT)'}
|
||||
BUILD_OBJEXT=${BUILD_OBJEXT:-'$(OBJEXT)'}
|
||||
|
||||
dnl Checks for header files.
|
||||
|
||||
AC_HEADER_STDC
|
||||
AC_CHECK_HEADERS(limits.h)
|
||||
|
||||
dnl Checks for typedefs, structures, and compiler characteristics.
|
||||
|
||||
AC_C_CONST
|
||||
AC_TYPE_SIZE_T
|
||||
|
||||
dnl Checks for library functions.
|
||||
|
||||
AC_CHECK_FUNCS(bcopy memmove strerror)
|
||||
|
||||
dnl Handle --enable-utf8
|
||||
|
||||
AC_ARG_ENABLE(utf8,
|
||||
[ --enable-utf8 enable UTF8 support],
|
||||
if test "$enableval" = "yes"; then
|
||||
UTF8=-DSUPPORT_UTF8
|
||||
fi
|
||||
)
|
||||
|
||||
dnl Handle --enable-newline-is-cr
|
||||
|
||||
AC_ARG_ENABLE(newline-is-cr,
|
||||
[ --enable-newline-is-cr use CR as the newline character],
|
||||
if test "$enableval" = "yes"; then
|
||||
NEWLINE=-DNEWLINE=13
|
||||
fi
|
||||
)
|
||||
|
||||
dnl Handle --enable-newline-is-lf
|
||||
|
||||
AC_ARG_ENABLE(newline-is-lf,
|
||||
[ --enable-newline-is-lf use LF as the newline character],
|
||||
if test "$enableval" = "yes"; then
|
||||
NEWLINE=-DNEWLINE=10
|
||||
fi
|
||||
)
|
||||
|
||||
dnl Handle --enable-ebcdic
|
||||
|
||||
AC_ARG_ENABLE(ebcdic,
|
||||
[ --enable-ebcdic assume EBCDIC coding rather than ASCII],
|
||||
if test "$enableval" = "yes"; then
|
||||
EBCDIC=-DEBCDIC=1
|
||||
fi
|
||||
)
|
||||
|
||||
dnl Handle --disable-stack-for-recursion
|
||||
AC_ARG_ENABLE(recursion,
|
||||
[ --disable-stack-for-recursion disable use of stack recursion when matching],
|
||||
if test "$enableval" = "no"; then
|
||||
NO_RECURSE=-DNO_RECURSE
|
||||
fi
|
||||
)
|
||||
|
||||
dnl There doesn't seem to be a straightforward way of having parameters
|
||||
dnl that set values, other than fudging the --with thing. So that's what
|
||||
dnl I've done.
|
||||
|
||||
dnl Handle --with-posix-malloc-threshold=n
|
||||
|
||||
AC_ARG_WITH(posix-malloc-threshold,
|
||||
[ --with-posix-malloc-threshold=5 threshold for POSIX malloc usage],
|
||||
POSIX_MALLOC_THRESHOLD=-DPOSIX_MALLOC_THRESHOLD=$withval
|
||||
)
|
||||
|
||||
dnl Handle --with-link-size=n
|
||||
|
||||
AC_ARG_WITH(link-size,
|
||||
[ --with-link-size=2 internal link size (2, 3, or 4 allowed)],
|
||||
LINK_SIZE=-DLINK_SIZE=$withval
|
||||
)
|
||||
|
||||
dnl Handle --with-match-limit=n
|
||||
|
||||
AC_ARG_WITH(match-limit,
|
||||
[ --with-match-limit=10000000 default limit on internal looping],
|
||||
MATCH_LIMIT=-DMATCH_LIMIT=$withval
|
||||
)
|
||||
|
||||
dnl Now arrange to build libtool
|
||||
|
||||
AC_PROG_LIBTOOL
|
||||
|
||||
dnl "Export" these variables
|
||||
|
||||
AC_SUBST(BUILD_EXEEXT)
|
||||
AC_SUBST(BUILD_OBJEXT)
|
||||
AC_SUBST(CC_FOR_BUILD)
|
||||
AC_SUBST(CFLAGS_FOR_BUILD)
|
||||
AC_SUBST(EBCDIC)
|
||||
AC_SUBST(HAVE_MEMMOVE)
|
||||
AC_SUBST(HAVE_STRERROR)
|
||||
AC_SUBST(LINK_SIZE)
|
||||
AC_SUBST(MATCH_LIMIT)
|
||||
AC_SUBST(NEWLINE)
|
||||
AC_SUBST(NO_RECURSE)
|
||||
AC_SUBST(PCRE_MAJOR)
|
||||
AC_SUBST(PCRE_MINOR)
|
||||
AC_SUBST(PCRE_DATE)
|
||||
AC_SUBST(PCRE_VERSION)
|
||||
AC_SUBST(PCRE_LIB_VERSION)
|
||||
AC_SUBST(PCRE_POSIXLIB_VERSION)
|
||||
AC_SUBST(POSIX_MALLOC_THRESHOLD)
|
||||
AC_SUBST(UTF8)
|
||||
|
||||
dnl Stuff to make MinGW work better. Special treatment is no longer
|
||||
dnl needed for Cygwin.
|
||||
|
||||
case $host_os in
|
||||
mingw* )
|
||||
POSIX_OBJ=pcreposix.o
|
||||
POSIX_LOBJ=pcreposix.lo
|
||||
POSIX_LIB=
|
||||
ON_WINDOWS=
|
||||
NOT_ON_WINDOWS="#"
|
||||
WIN_PREFIX=
|
||||
;;
|
||||
* )
|
||||
ON_WINDOWS="#"
|
||||
NOT_ON_WINDOWS=
|
||||
POSIX_OBJ=
|
||||
POSIX_LOBJ=
|
||||
POSIX_LIB=libpcreposix.la
|
||||
WIN_PREFIX=
|
||||
;;
|
||||
esac
|
||||
AC_SUBST(WIN_PREFIX)
|
||||
AC_SUBST(ON_WINDOWS)
|
||||
AC_SUBST(NOT_ON_WINDOWS)
|
||||
AC_SUBST(POSIX_OBJ)
|
||||
AC_SUBST(POSIX_LOBJ)
|
||||
AC_SUBST(POSIX_LIB)
|
||||
|
||||
if test "x$enable_shared" = "xno" ; then
|
||||
AC_DEFINE([PCRE_STATIC],[1],[to link statically])
|
||||
fi
|
||||
|
||||
dnl This must be last; it determines what files are written as well as config.h
|
||||
AC_OUTPUT(Makefile pcre.h:pcre.in pcre-config:pcre-config.in RunTest:RunTest.in,[chmod a+x RunTest pcre-config])
|
@ -1,281 +0,0 @@
|
||||
Technical Notes about PCRE
|
||||
--------------------------
|
||||
|
||||
Many years ago I implemented some regular expression functions to an algorithm
|
||||
suggested by Martin Richards. These were not Unix-like in form, and were quite
|
||||
restricted in what they could do by comparison with Perl. The interesting part
|
||||
about the algorithm was that the amount of space required to hold the compiled
|
||||
form of an expression was known in advance. The code to apply an expression did
|
||||
not operate by backtracking, as the original Henry Spencer code and current
|
||||
Perl code does, but instead checked all possibilities simultaneously by keeping
|
||||
a list of current states and checking all of them as it advanced through the
|
||||
subject string. (In the terminology of Jeffrey Friedl's book, it was a "DFA
|
||||
algorithm".) When the pattern was all used up, all remaining states were
|
||||
possible matches, and the one matching the longest subset of the subject string
|
||||
was chosen. This did not necessarily maximize the individual wild portions of
|
||||
the pattern, as is expected in Unix and Perl-style regular expressions.
|
||||
|
||||
By contrast, the code originally written by Henry Spencer and subsequently
|
||||
heavily modified for Perl actually compiles the expression twice: once in a
|
||||
dummy mode in order to find out how much store will be needed, and then for
|
||||
real. The execution function operates by backtracking and maximizing (or,
|
||||
optionally, minimizing in Perl) the amount of the subject that matches
|
||||
individual wild portions of the pattern. This is an "NFA algorithm" in Friedl's
|
||||
terminology.
|
||||
|
||||
For the set of functions that forms PCRE (which are unrelated to those
|
||||
mentioned above), I tried at first to invent an algorithm that used an amount
|
||||
of store bounded by a multiple of the number of characters in the pattern, to
|
||||
save on compiling time. However, because of the greater complexity in Perl
|
||||
regular expressions, I couldn't do this. In any case, a first pass through the
|
||||
pattern is needed, for a number of reasons. PCRE works by running a very
|
||||
degenerate first pass to calculate a maximum store size, and then a second pass
|
||||
to do the real compile - which may use a bit less than the predicted amount of
|
||||
store. The idea is that this is going to turn out faster because the first pass
|
||||
is degenerate and the second pass can just store stuff straight into the
|
||||
vector. It does make the compiling functions bigger, of course, but they have
|
||||
got quite big anyway to handle all the Perl stuff.
|
||||
|
||||
The compiled form of a pattern is a vector of bytes, containing items of
|
||||
variable length. The first byte in an item is an opcode, and the length of the
|
||||
item is either implicit in the opcode or contained in the data bytes which
|
||||
follow it. A list of all the opcodes follows:
|
||||
|
||||
Opcodes with no following data
|
||||
------------------------------
|
||||
|
||||
These items are all just one byte long
|
||||
|
||||
OP_END end of pattern
|
||||
OP_ANY match any character
|
||||
OP_ANYBYTE match any single byte, even in UTF-8 mode
|
||||
OP_SOD match start of data: \A
|
||||
OP_SOM start of match (subject + offset): \G
|
||||
OP_CIRC ^ (start of data, or after \n in multiline)
|
||||
OP_NOT_WORD_BOUNDARY \B
|
||||
OP_WORD_BOUNDARY \b
|
||||
OP_NOT_DIGIT \D
|
||||
OP_DIGIT \d
|
||||
OP_NOT_WHITESPACE \S
|
||||
OP_WHITESPACE \s
|
||||
OP_NOT_WORDCHAR \W
|
||||
OP_WORDCHAR \w
|
||||
OP_EODN match end of data or \n at end: \Z
|
||||
OP_EOD match end of data: \z
|
||||
OP_DOLL $ (end of data, or before \n in multiline)
|
||||
|
||||
|
||||
Repeating single characters
|
||||
---------------------------
|
||||
|
||||
The common repeats (*, +, ?) when applied to a single character appear as
|
||||
two-byte items using the following opcodes:
|
||||
|
||||
OP_STAR
|
||||
OP_MINSTAR
|
||||
OP_PLUS
|
||||
OP_MINPLUS
|
||||
OP_QUERY
|
||||
OP_MINQUERY
|
||||
|
||||
Those with "MIN" in their name are the minimizing versions. Each is followed by
|
||||
the character that is to be repeated. Other repeats make use of
|
||||
|
||||
OP_UPTO
|
||||
OP_MINUPTO
|
||||
OP_EXACT
|
||||
|
||||
which are followed by a two-byte count (most significant first) and the
|
||||
repeated character. OP_UPTO matches from 0 to the given number. A repeat with a
|
||||
non-zero minimum and a fixed maximum is coded as an OP_EXACT followed by an
|
||||
OP_UPTO (or OP_MINUPTO).
|
||||
|
||||
|
||||
Repeating character types
|
||||
-------------------------
|
||||
|
||||
Repeats of things like \d are done exactly as for single characters, except
|
||||
that instead of a character, the opcode for the type is stored in the data
|
||||
byte. The opcodes are:
|
||||
|
||||
OP_TYPESTAR
|
||||
OP_TYPEMINSTAR
|
||||
OP_TYPEPLUS
|
||||
OP_TYPEMINPLUS
|
||||
OP_TYPEQUERY
|
||||
OP_TYPEMINQUERY
|
||||
OP_TYPEUPTO
|
||||
OP_TYPEMINUPTO
|
||||
OP_TYPEEXACT
|
||||
|
||||
|
||||
Matching a character string
|
||||
---------------------------
|
||||
|
||||
The OP_CHARS opcode is followed by a one-byte count and then that number of
|
||||
characters. If there are more than 255 characters in sequence, successive
|
||||
instances of OP_CHARS are used.
|
||||
|
||||
|
||||
Character classes
|
||||
-----------------
|
||||
|
||||
If there is only one character, OP_CHARS is used for a positive class,
|
||||
and OP_NOT for a negative one (that is, for something like [^a]). However, in
|
||||
UTF-8 mode, this applies only to characters with values < 128, because OP_NOT
|
||||
is confined to single bytes.
|
||||
|
||||
Another set of repeating opcodes (OP_NOTSTAR etc.) is used for a repeated,
|
||||
negated, single-character class. The normal ones (OP_STAR etc.) are used for a
|
||||
repeated positive single-character class.
|
||||
|
||||
When there's more than one character in a class and all the characters are less
|
||||
than 256, OP_CLASS is used for a positive class, and OP_NCLASS for a negative
|
||||
one. In either case, the opcode is followed by a 32-byte bit map containing a 1
|
||||
bit for every character that is acceptable. The bits are counted from the least
|
||||
significant end of each byte.
|
||||
|
||||
The reason for having both OP_CLASS and OP_NCLASS is so that, in UTF-8 mode,
|
||||
subject characters with values greater than 255 can be handled correctly. For
|
||||
OP_CLASS they don't match, whereas for OP_NCLASS they do.
|
||||
|
||||
For classes containing characters with values > 255, OP_XCLASS is used. It
|
||||
optionally uses a bit map (if any characters lie within it), followed by a list
|
||||
of pairs and single characters. There is a flag character that indicates
|
||||
whether it's a positive or a negative class.
|
||||
|
||||
|
||||
Back references
|
||||
---------------
|
||||
|
||||
OP_REF is followed by two bytes containing the reference number.
|
||||
|
||||
|
||||
Repeating character classes and back references
|
||||
-----------------------------------------------
|
||||
|
||||
Single-character classes are handled specially (see above). This section applies to
|
||||
OP_CLASS and OP_REF. In both cases, the repeat information follows the base
|
||||
item. The matching code looks at the following opcode to see if it is one of
|
||||
|
||||
OP_CRSTAR
|
||||
OP_CRMINSTAR
|
||||
OP_CRPLUS
|
||||
OP_CRMINPLUS
|
||||
OP_CRQUERY
|
||||
OP_CRMINQUERY
|
||||
OP_CRRANGE
|
||||
OP_CRMINRANGE
|
||||
|
||||
All but the last two are just single-byte items. The others are followed by
|
||||
four bytes of data, comprising the minimum and maximum repeat counts.
|
||||
|
||||
|
||||
Brackets and alternation
|
||||
------------------------
|
||||
|
||||
A pair of non-capturing (round) brackets is wrapped round each expression at
|
||||
compile time, so alternation always happens in the context of brackets.
|
||||
|
||||
Non-capturing brackets use the opcode OP_BRA, while capturing brackets use
|
||||
OP_BRA+1, OP_BRA+2, etc. [Note for North Americans: "bracket" to some English
|
||||
speakers, including myself, can be round, square, curly, or pointy. Hence this
|
||||
usage.]
|
||||
|
||||
Originally PCRE was limited to 99 capturing brackets (so as not to use up all
|
||||
the opcodes). From release 3.5, there is no limit. What happens is that the
|
||||
first ones, up to EXTRACT_BASIC_MAX, are handled with separate opcodes, as
|
||||
above. If there are more, the opcode is set to EXTRACT_BASIC_MAX+1, and the
|
||||
first operation in the bracket is OP_BRANUMBER, followed by a 2-byte bracket
|
||||
number. This opcode is ignored while matching, but is fished out when handling
|
||||
the bracket itself. (They could have all been done like this, but I was making
|
||||
minimal changes.)
|
||||
|
||||
A bracket opcode is followed by two bytes which give the offset to the next
|
||||
alternative OP_ALT or, if there aren't any branches, to the matching KET
|
||||
opcode. Each OP_ALT is followed by two bytes giving the offset to the next one,
|
||||
or to the KET opcode.
|
||||
|
||||
OP_KET is used for subpatterns that do not repeat indefinitely, while
|
||||
OP_KETRMIN and OP_KETRMAX are used for indefinite repetitions, minimally or
|
||||
maximally respectively. All three are followed by two bytes giving (as a
|
||||
positive number) the offset back to the matching BRA opcode.
|
||||
|
||||
If a subpattern is quantified such that it is permitted to match zero times, it
|
||||
is preceded by one of OP_BRAZERO or OP_BRAMINZERO. These are single-byte
|
||||
opcodes which tell the matcher that skipping this subpattern entirely is a
|
||||
valid branch.
|
||||
|
||||
A subpattern with an indefinite maximum repetition is replicated in the
|
||||
compiled data its minimum number of times (or once with a BRAZERO if the
|
||||
minimum is zero), with the final copy terminating with a KETRMIN or KETRMAX as
|
||||
appropriate.
|
||||
|
||||
A subpattern with a bounded maximum repetition is replicated in a nested
|
||||
fashion up to the maximum number of times, with BRAZERO or BRAMINZERO before
|
||||
each replication after the minimum, so that, for example, (abc){2,5} is
|
||||
compiled as (abc)(abc)((abc)((abc)(abc)?)?)?. The 99 and 200 bracket limits do
|
||||
not apply to these internally generated brackets.
|
||||
|
||||
|
||||
Assertions
|
||||
----------
|
||||
|
||||
Forward assertions are just like other subpatterns, but starting with one of
|
||||
the opcodes OP_ASSERT or OP_ASSERT_NOT. Backward assertions use the opcodes
|
||||
OP_ASSERTBACK and OP_ASSERTBACK_NOT, and the first opcode inside the assertion
|
||||
is OP_REVERSE, followed by a two-byte count of the number of characters to move
|
||||
back the pointer in the subject string. When operating in UTF-8 mode, the count
|
||||
is a character count rather than a byte count. A separate count is present in
|
||||
each alternative of a lookbehind assertion, allowing them to have different
|
||||
fixed lengths.


Once-only subpatterns
---------------------

These are also just like other subpatterns, but they start with the opcode
OP_ONCE.


Conditional subpatterns
-----------------------

These are like other subpatterns, but they start with the opcode OP_COND. If
the condition is a back reference, this is stored at the start of the
subpattern using the opcode OP_CREF followed by two bytes containing the
reference number. If the condition is "in recursion" (coded as "(?(R)"), the
same scheme is used, with a "reference number" of 0xffff. Otherwise, a
conditional subpattern always starts with one of the assertions.


Recursion
---------

Recursion matches either the whole current regex or some subexpression. The
opcode OP_RECURSE is followed by a value which is the offset to the starting
bracket from the start of the whole pattern.


Callout
-------

OP_CALLOUT is followed by one byte of data that holds a callout number in the
range 0 to 255.


Changing options
----------------

If any of the /i, /m, or /s options are changed within a pattern, an OP_OPT
opcode is compiled, followed by one byte containing the new settings of these
flags. If there are several alternatives, there is an occurrence of OP_OPT at
the start of all those following the first options change, to set appropriate
options for the start of the alternative. Immediately after the end of the
group there is another such item to reset the flags to their previous values. A
change of flag right at the very start of the pattern can be handled entirely
at compile time, and so does not cause anything to be put into the compiled
data.
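
The single option byte that follows OP_OPT can be modelled as a small bit set. The sketch assumes flag values that match PCRE_CASELESS (0x01), PCRE_MULTILINE (0x02) and PCRE_DOTALL (0x04) in pcre.h. The SK_ names and apply_inline_options are hypothetical. Letters other than i, m and s are rejected, because those other options are not carried in this byte.

```c
/* Hypothetical flag bits for the options an OP_OPT byte carries. The
   values are assumed to mirror PCRE_CASELESS, PCRE_MULTILINE and
   PCRE_DOTALL in pcre.h. */
#define SK_CASELESS  0x01
#define SK_MULTILINE 0x02
#define SK_DOTALL    0x04

/* Apply an inline setting such as "i", "-s" or "im-s" (the text between
   "(?" and ")") to the current flags and return the new option byte.
   Letters before '-' set a flag; letters after it unset the flag.
   Returns -1 for any letter this byte does not represent. */
static int apply_inline_options(unsigned int current, const char *spec)
{
    int unset = 0;
    for (; *spec != '\0'; spec++) {
        unsigned int bit;
        switch (*spec) {
        case '-': unset = 1; continue;
        case 'i': bit = SK_CASELESS; break;
        case 'm': bit = SK_MULTILINE; break;
        case 's': bit = SK_DOTALL; break;
        default: return -1;
        }
        if (unset)
            current &= ~bit;
        else
            current |= bit;
    }
    return (int)(current & 0xff);
}
```

The value returned for the new group is what would follow OP_OPT. The value in force before the group would be emitted again after the group to restore the previous flags.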

Philip Hazel
August 2003
@ -1,102 +0,0 @@
<html>
<head>
<title>PCRE specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
<h1>Perl-compatible Regular Expressions (PCRE)</h1>
<p>
The HTML documentation for PCRE comprises the following pages:
</p>

<table>
<tr><td><a href="pcre.html">pcre</a></td>
<td> Introductory page</td></tr>

<tr><td><a href="pcreapi.html">pcreapi</a></td>
<td> PCRE's native API</td></tr>

<tr><td><a href="pcrebuild.html">pcrebuild</a></td>
<td> Options for building PCRE</td></tr>

<tr><td><a href="pcrecallout.html">pcrecallout</a></td>
<td> The <i>callout</i> facility</td></tr>

<tr><td><a href="pcrecompat.html">pcrecompat</a></td>
<td> Compatibility with Perl</td></tr>

<tr><td><a href="pcregrep.html">pcregrep</a></td>
<td> The <b>pcregrep</b> command</td></tr>

<tr><td><a href="pcrepattern.html">pcrepattern</a></td>
<td> Regular expressions supported by PCRE</td></tr>

<tr><td><a href="pcreperform.html">pcreperform</a></td>
<td> Some comments on performance</td></tr>

<tr><td><a href="pcreposix.html">pcreposix</a></td>
<td> The POSIX API to the PCRE library</td></tr>

<tr><td><a href="pcresample.html">pcresample</a></td>
<td> Description of the sample program</td></tr>

<tr><td><a href="pcretest.html">pcretest</a></td>
<td> The <b>pcretest</b> command for testing PCRE</td></tr>
</table>

<p>
There are also individual pages that summarize the interface for each function
in the library:
</p>

<table>

<tr><td><a href="pcre_compile.html">pcre_compile</a></td>
<td> Compile a regular expression</td></tr>

<tr><td><a href="pcre_config.html">pcre_config</a></td>
<td> Show build-time configuration options</td></tr>

<tr><td><a href="pcre_copy_named_substring.html">pcre_copy_named_substring</a></td>
<td> Extract named substring into given buffer</td></tr>

<tr><td><a href="pcre_copy_substring.html">pcre_copy_substring</a></td>
<td> Extract numbered substring into given buffer</td></tr>

<tr><td><a href="pcre_exec.html">pcre_exec</a></td>
<td> Match a compiled pattern to a subject string</td></tr>

<tr><td><a href="pcre_free_substring.html">pcre_free_substring</a></td>
<td> Free extracted substring</td></tr>

<tr><td><a href="pcre_free_substring_list.html">pcre_free_substring_list</a></td>
<td> Free list of extracted substrings</td></tr>

<tr><td><a href="pcre_fullinfo.html">pcre_fullinfo</a></td>
<td> Extract information about a pattern</td></tr>

<tr><td><a href="pcre_get_named_substring.html">pcre_get_named_substring</a></td>
<td> Extract named substring into new memory</td></tr>

<tr><td><a href="pcre_get_stringnumber.html">pcre_get_stringnumber</a></td>
<td> Convert captured string name to number</td></tr>

<tr><td><a href="pcre_get_substring.html">pcre_get_substring</a></td>
<td> Extract numbered substring into new memory</td></tr>

<tr><td><a href="pcre_get_substring_list.html">pcre_get_substring_list</a></td>
<td> Extract all substrings into new memory</td></tr>

<tr><td><a href="pcre_info.html">pcre_info</a></td>
<td> Obsolete information extraction function</td></tr>

<tr><td><a href="pcre_maketables.html">pcre_maketables</a></td>
<td> Build character tables in current locale</td></tr>

<tr><td><a href="pcre_study.html">pcre_study</a></td>
<td> Study a compiled pattern</td></tr>

<tr><td><a href="pcre_version.html">pcre_version</a></td>
<td> Return PCRE version and release date</td></tr>
</table>

</html>
@ -1,190 +0,0 @@
<html>
<head>
<title>pcre specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
This HTML document has been generated automatically from the original man page.
If there is any nonsense in it, please consult the man page, in case the
conversion went wrong.<br>
<ul>
<li><a name="TOC1" href="#SEC1">DESCRIPTION</a>
<li><a name="TOC2" href="#SEC2">USER DOCUMENTATION</a>
<li><a name="TOC3" href="#SEC3">LIMITATIONS</a>
<li><a name="TOC4" href="#SEC4">UTF-8 SUPPORT</a>
<li><a name="TOC5" href="#SEC5">AUTHOR</a>
</ul>
<br><a name="SEC1" href="#TOC1">DESCRIPTION</a><br>
<P>
The PCRE library is a set of functions that implement regular expression
pattern matching using the same syntax and semantics as Perl, with just a few
differences. The current implementation of PCRE (release 4.x) corresponds
approximately with Perl 5.8, including support for UTF-8 encoded strings.
However, this support has to be explicitly enabled; it is not the default.
</P>
<P>
PCRE is written in C and released as a C library. However, a number of people
have written wrappers and interfaces of various kinds. A C++ class is included
in these contributions, which can be found in the <i>Contrib</i> directory at
the primary FTP site, which is:
</P>
<a href="ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre">ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre</a>
<P>
Details of exactly which Perl regular expression features are and are not
supported by PCRE are given in separate documents. See the
<a href="pcrepattern.html"><b>pcrepattern</b></a>
and
<a href="pcrecompat.html"><b>pcrecompat</b></a>
pages.
</P>
<P>
Some features of PCRE can be included, excluded, or changed when the library is
built. The
<a href="pcre_config.html"><b>pcre_config()</b></a>
function makes it possible for a client to discover which features are
available. Documentation about building PCRE for various operating systems can
be found in the <b>README</b> file in the source distribution.
</P>
<br><a name="SEC2" href="#TOC1">USER DOCUMENTATION</a><br>
<P>
The user documentation for PCRE has been split up into a number of different
sections. In the "man" format, each of these is a separate "man page". In the
HTML format, each is a separate page, linked from the index page. In the plain
text format, all the sections are concatenated, for ease of searching. The
sections are as follows:
</P>
<P>
<pre>
  pcre              this document
  pcreapi           details of PCRE's native API
  pcrebuild         options for building PCRE
  pcrecallout       details of the callout feature
  pcrecompat        discussion of Perl compatibility
  pcregrep          description of the <b>pcregrep</b> command
  pcrepattern       syntax and semantics of supported
                      regular expressions
  pcreperform       discussion of performance issues
  pcreposix         the POSIX-compatible API
  pcresample        discussion of the sample program
  pcretest          the <b>pcretest</b> testing command
</PRE>
</P>
<P>
In addition, in the "man" and HTML formats, there is a short page for each
library function, listing its arguments and results.
</P>
<br><a name="SEC3" href="#TOC1">LIMITATIONS</a><br>
<P>
There are some size limitations in PCRE, but it is hoped that they will never
be relevant in practice.
</P>
<P>
The maximum length of a compiled pattern is 65539 (sic) bytes if PCRE is
compiled with the default internal linkage size of 2. If you want to process
regular expressions that are truly enormous, you can compile PCRE with an
internal linkage size of 3 or 4 (see the <b>README</b> file in the source
distribution and the
<a href="pcrebuild.html"><b>pcrebuild</b></a>
documentation for details). In these cases the limit is substantially larger.
However, execution will be slower.
</P>
<P>
All values in repeating quantifiers must be less than 65536.
The maximum number of capturing subpatterns is 65535.
</P>
<P>
There is no limit to the number of non-capturing subpatterns, but the maximum
depth of nesting of all kinds of parenthesized subpattern, including capturing
subpatterns, assertions, and other types of subpattern, is 200.
</P>
<P>
The maximum length of a subject string is the largest positive number that an
integer variable can hold. However, PCRE uses recursion to handle subpatterns
and indefinite repetition. This means that the available stack space may limit
the size of a subject string that can be processed by certain patterns.
</P>
<a name="utf8support"></a><br><a name="SEC4" href="#TOC1">UTF-8 SUPPORT</a><br>
<P>
Starting at release 3.3, PCRE has had some support for character strings
encoded in the UTF-8 format. For release 4.0 this has been greatly extended to
cover most common requirements.
</P>
<P>
In order to process UTF-8 strings, you must build PCRE to include UTF-8 support
in the code, and, in addition, you must call
<a href="pcre_compile.html"><b>pcre_compile()</b></a>
with the PCRE_UTF8 option flag. When you do this, both the pattern and any
subject strings that are matched against it are treated as UTF-8 strings
instead of just strings of bytes.
</P>
<P>
If you compile PCRE with UTF-8 support, but do not use it at run time, the
library will be a bit bigger, but the additional run-time overhead is limited
to testing the PCRE_UTF8 flag in several places, so it should not be very large.
</P>
<P>
The following comments apply when PCRE is running in UTF-8 mode:
</P>
<P>
1. When you set the PCRE_UTF8 flag, the strings passed as patterns and subjects
are checked for validity on entry to the relevant functions. If an invalid
UTF-8 string is passed, an error return is given. In some situations, you may
already know that your strings are valid, and therefore want to skip these
checks in order to improve performance. If you set the PCRE_NO_UTF8_CHECK flag
at compile time or at run time, PCRE assumes that the pattern or subject it
is given (respectively) contains only valid UTF-8 codes. In this case, it does
not diagnose an invalid UTF-8 string. If you pass an invalid UTF-8 string to
PCRE when PCRE_NO_UTF8_CHECK is set, the results are undefined. Your program
may crash.
</P>
<P>
2. In a pattern, the escape sequence \x{...}, where the contents of the braces
are a string of hexadecimal digits, is interpreted as a UTF-8 character whose
code number is the given hexadecimal number, for example: \x{1234}. If a
non-hexadecimal digit appears between the braces, the item is not recognized.
This escape sequence can be used either as a literal, or within a character
class.
</P>
<P>
3. The original hexadecimal escape sequence, \xhh, matches a two-byte UTF-8
character if the value is greater than 127.
</P>
<P>
4. Repeat quantifiers apply to complete UTF-8 characters, not to individual
bytes, for example: \x{100}{3}.
</P>
<P>
5. The dot metacharacter matches one UTF-8 character instead of a single byte.
</P>
<P>
6. The escape sequence \C can be used to match a single byte in UTF-8 mode,
but its use can lead to some strange effects.
</P>
<P>
7. The character escapes \b, \B, \d, \D, \s, \S, \w, and \W correctly
test characters of any code value, but the characters that PCRE recognizes as
digits, spaces, or word characters remain the same set as before, all with
values less than 256.
</P>
<P>
8. Case-insensitive matching applies only to characters whose values are less
than 256. PCRE does not support the notion of "case" for higher-valued
characters.
</P>
<P>
9. PCRE does not support the use of Unicode tables and properties or the Perl
escapes \p, \P, and \X.
</P>
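<P>
The byte-level effect of points 1 to 5 can be sketched in a few lines of C. The helpers below are hypothetical and handle code points only up to 0x10FFFF. <b>utf8_encode()</b> produces the bytes that an escape such as \x{1234} stands for; the same applies to \xhh with a value above 127. <b>utf8_count_chars()</b> counts characters while performing a simple structural validity check, of the kind that PCRE_NO_UTF8_CHECK skips. It is not PCRE's own validation code and, for brevity, does not reject overlong encodings.
</P>

```c
#include <stddef.h>

/* Hypothetical helper: encode code point cp as UTF-8 into out and return
   the number of bytes written, or 0 if cp is above 0x10FFFF (a limit
   chosen for this sketch). */
static size_t utf8_encode(unsigned long cp, unsigned char out[4])
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xc0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3f));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xe0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3f));
        out[2] = (unsigned char)(0x80 | (cp & 0x3f));
        return 3;
    }
    if (cp <= 0x10ffff) {
        out[0] = (unsigned char)(0xf0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3f));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3f));
        out[3] = (unsigned char)(0x80 | (cp & 0x3f));
        return 4;
    }
    return 0;
}

/* Hypothetical helper: count the characters in a UTF-8 string of len
   bytes. Returns -1 if the string is structurally invalid: a bad lead
   byte, a stray continuation byte, or a truncated sequence. */
static long utf8_count_chars(const unsigned char *s, size_t len)
{
    size_t i = 0;
    long count = 0;
    while (i < len) {
        unsigned char c = s[i];
        size_t extra, k;
        if (c < 0x80) extra = 0;
        else if ((c & 0xe0) == 0xc0) extra = 1;
        else if ((c & 0xf0) == 0xe0) extra = 2;
        else if ((c & 0xf8) == 0xf0) extra = 3;
        else return -1;                 /* stray 10xxxxxx or bad lead byte */
        if (extra > len - i - 1)
            return -1;                  /* sequence runs past the end */
        for (k = 1; k <= extra; k++)
            if ((s[i + k] & 0xc0) != 0x80)
                return -1;              /* expected a continuation byte */
        i += extra + 1;
        count++;
    }
    return count;
}
```

<P>
For example, \x{1234} encodes as the three bytes 0xE1 0x88 0xB4, and \xe9 as the two bytes 0xC3 0xA9. Counting characters rather than bytes is what lets a quantifier or the dot apply to a whole character.
</P>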
<br><a name="SEC5" href="#TOC1">AUTHOR</a><br>
<P>
Philip Hazel <ph10@cam.ac.uk>
<br>
University Computing Service,
<br>
Cambridge CB2 3QG, England.
<br>
Phone: +44 1223 334714
</P>
<P>
Last updated: 20 August 2003
<br>
Copyright © 1997-2003 University of Cambridge.
@ -1,71 +0,0 @@
<html>
<head>
<title>pcre_compile specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
This HTML document has been generated automatically from the original man page.
If there is any nonsense in it, please consult the man page, in case the
conversion went wrong.<br>
<br><b>
SYNOPSIS
</b><br>
<P>
<b>#include <pcre.h></b>
</P>
<P>
<b>pcre *pcre_compile(const char *<i>pattern</i>, int <i>options</i>,</b>
<b>const char **<i>errptr</i>, int *<i>erroffset</i>,</b>
<b>const unsigned char *<i>tableptr</i>);</b>
</P>
<br><b>
DESCRIPTION
</b><br>
<P>
This function compiles a regular expression into an internal form. Its
arguments are:
</P>
<P>
<pre>
  <i>pattern</i>     A zero-terminated string containing the
                regular expression to be compiled
  <i>options</i>     Zero or more option bits
  <i>errptr</i>      Where to put an error message
  <i>erroffset</i>   Offset in pattern where error was found
  <i>tableptr</i>    Pointer to character tables, or NULL to
                use the built-in default
</PRE>
</P>
<P>
The option bits are:
</P>
<P>
<pre>
  PCRE_ANCHORED         Force pattern anchoring
  PCRE_CASELESS         Do caseless matching
  PCRE_DOLLAR_ENDONLY   $ not to match newline at end
  PCRE_DOTALL           . matches anything including NL
  PCRE_EXTENDED         Ignore whitespace and # comments
  PCRE_EXTRA            PCRE extra features
                          (not much use currently)
  PCRE_MULTILINE        ^ and $ match newlines within data
  PCRE_NO_AUTO_CAPTURE  Disable numbered capturing
                          parentheses (named ones available)
  PCRE_UNGREEDY         Invert greediness of quantifiers
  PCRE_UTF8             Run in UTF-8 mode
  PCRE_NO_UTF8_CHECK    Do not check the pattern for UTF-8
                          validity (only relevant if
                          PCRE_UTF8 is set)
</PRE>
</P>
<P>
PCRE must be compiled with UTF-8 support in order to use PCRE_UTF8
(or PCRE_NO_UTF8_CHECK).
</P>
<P>
The yield of the function is a pointer to a private data structure that
contains the compiled pattern, or NULL if an error was detected.
</P>
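<P>
When <b>pcre_compile()</b> returns NULL, the message in <i>errptr</i> and the offset in <i>erroffset</i> can be combined into a readable diagnostic. The helper below is a hypothetical sketch that prints the pattern with a caret under the position where the error was found. The error text in the tests is illustrative and is not a quote of PCRE's messages.
</P>

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: build a two-line diagnostic from the values that
   pcre_compile() stores in *errptr and *erroffset. The caret is placed
   under byte offset erroffset, which is clamped to the pattern length.
   Returns the snprintf() result. */
static int format_compile_error(const char *pattern, const char *errmsg,
                                int erroffset, char *buf, size_t bufsize)
{
    int patlen = (int)strlen(pattern);
    if (erroffset < 0)
        erroffset = 0;
    if (erroffset > patlen)
        erroffset = patlen;
    /* "%*s" pads an empty string to erroffset spaces before the caret. */
    return snprintf(buf, bufsize, "%s\n%*s^ %s", pattern, erroffset, "",
                    errmsg);
}
```

<P>
A typical call site would be <b>re = pcre_compile(pattern, 0, &error, &erroffset, NULL);</b>. If <i>re</i> is NULL, that call would be followed by <b>format_compile_error(pattern, error, erroffset, buf, sizeof buf)</b>. In UTF-8 mode the offset counts bytes, so the caret lines up only for ASCII patterns.
</P>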
<P>
There is a complete description of the PCRE API in the
<a href="pcreapi.html"><b>pcreapi</b></a>
page.
@ -1,56 +0,0 @@
<html>
<head>
<title>pcre_config specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
This HTML document has been generated automatically from the original man page.
If there is any nonsense in it, please consult the man page, in case the
conversion went wrong.<br>
<br><b>
SYNOPSIS
</b><br>
<P>
<b>#include <pcre.h></b>
</P>
<P>
<b>int pcre_config(int <i>what</i>, void *<i>where</i>);</b>
</P>
<br><b>
DESCRIPTION
</b><br>
<P>
This function makes it possible for a client program to find out which optional
features are available in the version of the PCRE library it is using. Its
arguments are as follows:
</P>
<P>
<pre>
  <i>what</i>    A code specifying what information is required
  <i>where</i>   Points to where to put the data
</PRE>
</P>
<P>
The available codes are:
</P>
<P>
<pre>
  PCRE_CONFIG_LINK_SIZE     Internal link size: 2, 3, or 4
  PCRE_CONFIG_MATCH_LIMIT   Internal resource limit
  PCRE_CONFIG_NEWLINE       Value of the newline character
  PCRE_CONFIG_POSIX_MALLOC_THRESHOLD
                            Threshold of return slots, above
                              which <b>malloc()</b> is used by
                              the POSIX API
  PCRE_CONFIG_STACKRECURSE  Recursion implementation (1=stack 0=heap)
  PCRE_CONFIG_UTF8          Availability of UTF-8 support (1=yes 0=no)
</PRE>
</P>
<P>
The function yields 0 on success or PCRE_ERROR_BADOPTION otherwise.
</P>
<P>
There is a complete description of the PCRE native API in the
<a href="pcreapi.html"><b>pcreapi</b></a>
page, and a description of the POSIX API in the
<a href="pcreposix.html"><b>pcreposix</b></a>
page.
@ -1,46 +0,0 @@
<html>
<head>
<title>pcre_copy_named_substring specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
This HTML document has been generated automatically from the original man page.
If there is any nonsense in it, please consult the man page, in case the
conversion went wrong.<br>
<br><b>
SYNOPSIS
</b><br>
<P>
<b>#include <pcre.h></b>
</P>
<P>
<b>int pcre_copy_named_substring(const pcre *<i>code</i>,</b>
<b>const char *<i>subject</i>, int *<i>ovector</i>,</b>
<b>int <i>stringcount</i>, const char *<i>stringname</i>,</b>
<b>char *<i>buffer</i>, int <i>buffersize</i>);</b>
</P>
<br><b>
DESCRIPTION
</b><br>
<P>
This is a convenience function for extracting a captured substring, identified
by name, into a given buffer. The arguments are:
</P>
<P>
<pre>
  <i>code</i>         Pattern that was successfully matched
  <i>subject</i>      Subject that has been successfully matched
  <i>ovector</i>      Offset vector that <b>pcre_exec()</b> used
  <i>stringcount</i>  Value returned by <b>pcre_exec()</b>
  <i>stringname</i>   Name of the required substring
  <i>buffer</i>       Buffer to receive the string
  <i>buffersize</i>   Size of buffer
</PRE>
</P>
<P>
The yield is the length of the substring, PCRE_ERROR_NOMEMORY if the buffer was
too small, or PCRE_ERROR_NOSUBSTRING if the string name is invalid.
</P>
<P>
There is a complete description of the PCRE API in the
<a href="pcreapi.html"><b>pcreapi</b></a>
page.
@ -1,44 +0,0 @@
<html>
<head>
<title>pcre_copy_substring specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
This HTML document has been generated automatically from the original man page.
If there is any nonsense in it, please consult the man page, in case the
conversion went wrong.<br>
<br><b>
SYNOPSIS
</b><br>
<P>
<b>#include <pcre.h></b>
</P>
<P>
<b>int pcre_copy_substring(const char *<i>subject</i>, int *<i>ovector</i>,</b>
<b>int <i>stringcount</i>, int <i>stringnumber</i>, char *<i>buffer</i>,</b>
<b>int <i>buffersize</i>);</b>
</P>
<br><b>
DESCRIPTION
</b><br>
<P>
This is a convenience function for extracting a captured substring into a given
buffer. The arguments are:
</P>
<P>
<pre>
  <i>subject</i>       Subject that has been successfully matched
  <i>ovector</i>       Offset vector that <b>pcre_exec()</b> used
  <i>stringcount</i>   Value returned by <b>pcre_exec()</b>
  <i>stringnumber</i>  Number of the required substring
  <i>buffer</i>        Buffer to receive the string
  <i>buffersize</i>    Size of buffer
</PRE>
</P>
<P>
The yield is the length of the string, PCRE_ERROR_NOMEMORY if the buffer was
too small, or PCRE_ERROR_NOSUBSTRING if the string number is invalid.
</P>
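<P>
The arithmetic behind this function is simple. Substring <i>n</i> starts at <i>ovector</i>[2n] and ends just before <i>ovector</i>[2n+1]. The sketch below re-implements that behaviour under stated assumptions and is not the library's code. The SK_ERROR_ constants are hypothetical stand-ins for the real codes in <b>pcre.h</b>. The sketch also assumes that an unset subpattern has both offsets set to -1 and yields an empty string.
</P>

```c
#include <string.h>

/* Hypothetical stand-ins for PCRE_ERROR_NOMEMORY and
   PCRE_ERROR_NOSUBSTRING; the real values come from pcre.h. */
#define SK_ERROR_NOMEMORY    (-1001)
#define SK_ERROR_NOSUBSTRING (-1002)

/* Sketch of the offset-vector arithmetic behind pcre_copy_substring().
   stringcount is the positive value pcre_exec() returned, so valid
   substring numbers are 0 .. stringcount-1. */
static int copy_substring_sketch(const char *subject, const int *ovector,
                                 int stringcount, int stringnumber,
                                 char *buffer, int buffersize)
{
    int start, len;
    if (stringnumber < 0 || stringnumber >= stringcount)
        return SK_ERROR_NOSUBSTRING;
    start = ovector[2 * stringnumber];
    /* Assumed: an unset group has offsets -1, -1 and copies as "". */
    len = (start < 0) ? 0 : ovector[2 * stringnumber + 1] - start;
    if (len >= buffersize)              /* need room for the final zero */
        return SK_ERROR_NOMEMORY;
    if (len > 0)
        memcpy(buffer, subject + start, (size_t)len);
    buffer[len] = '\0';
    return len;
}
```

<P>
For the subject "hello world" matched by a two-group pattern, an ovector of {0, 11, 0, 5, 6, 11} makes substring 2 the five characters "world".
</P>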
<P>
There is a complete description of the PCRE API in the
<a href="pcreapi.html"><b>pcreapi</b></a>
page.
@ -1,58 +0,0 @@
<html>
<head>
<title>pcre_exec specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
This HTML document has been generated automatically from the original man page.
If there is any nonsense in it, please consult the man page, in case the
conversion went wrong.<br>
<br><b>
SYNOPSIS
</b><br>
<P>
<b>#include <pcre.h></b>
</P>
<P>
<b>int pcre_exec(const pcre *<i>code</i>, const pcre_extra *<i>extra</i>,</b>
<b>const char *<i>subject</i>, int <i>length</i>, int <i>startoffset</i>,</b>
<b>int <i>options</i>, int *<i>ovector</i>, int <i>ovecsize</i>);</b>
</P>
<br><b>
DESCRIPTION
</b><br>
<P>
This function matches a compiled regular expression against a given subject
string, and returns offsets to capturing subexpressions. Its arguments are:
</P>
<P>
<pre>
  <i>code</i>         Points to the compiled pattern
  <i>extra</i>        Points to an associated <b>pcre_extra</b> structure,
                 or is NULL
  <i>subject</i>      Points to the subject string
  <i>length</i>       Length of the subject string, in bytes
  <i>startoffset</i>  Offset in bytes in the subject at which to
                 start matching
  <i>options</i>      Option bits
  <i>ovector</i>      Points to a vector of ints for result offsets
  <i>ovecsize</i>     Size of the vector (a multiple of 3)
</PRE>
</P>
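<P>
The size of the offset vector follows from its layout. The first two thirds hold start/end pairs, and the last third is used by <b>pcre_exec()</b> as working space, which is why <i>ovecsize</i> should be a multiple of 3. The hypothetical helpers below encode that rule. They also encode the return-value convention given in the <b>pcreapi</b> page, as we understand it. A positive value is one more than the highest pair set. Zero means the vector was too small, so only <i>ovecsize</i>/3 pairs were filled. A negative value is an error, such as no match.
</P>

```c
/* Hypothetical helper: ovector size needed to receive the whole match
   plus capture_count capturing subpatterns (pairs plus workspace). */
static int ovector_size_for(int capture_count)
{
    return (capture_count + 1) * 3;
}

/* Hypothetical helper: number of leading start/end pairs that hold
   offsets after a call returning rc with a vector of ovecsize ints. */
static int pairs_set(int rc, int ovecsize)
{
    if (rc < 0)
        return 0;              /* no match or other error: nothing set */
    if (rc == 0)
        return ovecsize / 3;   /* vector too small: filled as far as possible */
    return rc;                 /* one more than the highest pair set */
}
```

<P>
The capture count can be obtained with <b>pcre_fullinfo()</b> and PCRE_INFO_CAPTURECOUNT before the vector is allocated.
</P>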
<P>
The options are:
</P>
<P>
<pre>
  PCRE_ANCHORED       Match only at the first position
  PCRE_NOTBOL         Subject is not the beginning of a line
  PCRE_NOTEOL         Subject is not the end of a line
  PCRE_NOTEMPTY       An empty string is not a valid match
  PCRE_NO_UTF8_CHECK  Do not check the subject for UTF-8
                        validity (only relevant if PCRE_UTF8
                        was set at compile time)
</PRE>
</P>
<P>
There is a complete description of the PCRE API in the
<a href="pcreapi.html"><b>pcreapi</b></a>
page.
@ -1,29 +0,0 @@
<html>
<head>
<title>pcre_free_substring specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
This HTML document has been generated automatically from the original man page.
If there is any nonsense in it, please consult the man page, in case the
conversion went wrong.<br>
<br><b>
SYNOPSIS
</b><br>
<P>
<b>#include <pcre.h></b>
</P>
<P>
<b>void pcre_free_substring(const char *<i>stringptr</i>);</b>
</P>
<br><b>
DESCRIPTION
</b><br>
<P>
This is a convenience function for freeing the store obtained by a previous
call to <b>pcre_get_substring()</b> or <b>pcre_get_named_substring()</b>. Its
only argument is a pointer to the string.
</P>
<P>
There is a complete description of the PCRE API in the
<a href="pcreapi.html"><b>pcreapi</b></a>
page.
@ -1,29 +0,0 @@
<html>
<head>
<title>pcre_free_substring_list specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
This HTML document has been generated automatically from the original man page.
If there is any nonsense in it, please consult the man page, in case the
conversion went wrong.<br>
<br><b>
SYNOPSIS
</b><br>
<P>
<b>#include <pcre.h></b>
</P>
<P>
<b>void pcre_free_substring_list(const char **<i>stringptr</i>);</b>
</P>
<br><b>
DESCRIPTION
</b><br>
<P>
This is a convenience function for freeing the store obtained by a previous
call to <b>pcre_get_substring_list()</b>. Its only argument is a pointer to the
list of string pointers.
</P>
<P>
There is a complete description of the PCRE API in the
<a href="pcreapi.html"><b>pcreapi</b></a>
page.
@ -1,68 +0,0 @@
<html>
<head>
<title>pcre_fullinfo specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
This HTML document has been generated automatically from the original man page.
If there is any nonsense in it, please consult the man page, in case the
conversion went wrong.<br>
<br><b>
SYNOPSIS
</b><br>
<P>
<b>#include <pcre.h></b>
</P>
<P>
<b>int pcre_fullinfo(const pcre *<i>code</i>, const pcre_extra *<i>extra</i>,</b>
<b>int <i>what</i>, void *<i>where</i>);</b>
</P>
<br><b>
DESCRIPTION
</b><br>
<P>
This function returns information about a compiled pattern. Its arguments are:
</P>
<P>
<pre>
  <i>code</i>    Compiled regular expression
  <i>extra</i>   Result of <b>pcre_study()</b> or NULL
  <i>what</i>    What information is required
  <i>where</i>   Where to put the information
</PRE>
</P>
<P>
The following information is available:
</P>
<P>
<pre>
  PCRE_INFO_BACKREFMAX     Number of highest back reference
  PCRE_INFO_CAPTURECOUNT   Number of capturing subpatterns
  PCRE_INFO_FIRSTBYTE      Fixed first byte for a match, or
                             -1 for start of string
                               or after newline, or
                             -2 otherwise
  PCRE_INFO_FIRSTTABLE     Table of first bytes
                             (after studying)
  PCRE_INFO_LASTLITERAL    Literal last byte required
  PCRE_INFO_NAMECOUNT      Number of named subpatterns
  PCRE_INFO_NAMEENTRYSIZE  Size of name table entry
  PCRE_INFO_NAMETABLE      Pointer to name table
  PCRE_INFO_OPTIONS        Options used for compilation
  PCRE_INFO_SIZE           Size of compiled pattern
</PRE>
</P>
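<P>
The three cases of PCRE_INFO_FIRSTBYTE listed above can be decoded as follows. The helper is hypothetical. It assumes the caller has already obtained the value with a call such as <b>pcre_fullinfo(re, NULL, PCRE_INFO_FIRSTBYTE, &firstbyte)</b>, where <i>firstbyte</i> is an <b>int</b>.
</P>

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: describe the PCRE_INFO_FIRSTBYTE value in words.
   A value >= 0 is a fixed first byte; -1 means a match can start only
   at the start of the subject or after a newline; -2 means neither
   constraint was found. The result is written to buf and returned. */
static const char *describe_firstbyte(int firstbyte, char *buf, size_t size)
{
    if (firstbyte >= 0)
        snprintf(buf, size, "first byte is 0x%02x", (unsigned)firstbyte);
    else if (firstbyte == -1)
        snprintf(buf, size, "starts at start of subject or after newline");
    else
        snprintf(buf, size, "no fixed first byte");
    return buf;
}
```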
|
||||
<P>
|
||||
The yield of the function is zero on success or:
|
||||
</P>
|
||||
<P>
|
||||
<pre>
|
||||
PCRE_ERROR_NULL the argument <i>code</i> was NULL
|
||||
the argument <i>where</i> was NULL
|
||||
PCRE_ERROR_BADMAGIC the "magic number" was not found
|
||||
PCRE_ERROR_BADOPTION the value of <i>what</i> was invalid
|
||||
</PRE>
|
||||
</P>
|
||||
<P>
|
||||
There is a complete description of the PCRE API in the
|
||||
<a href="pcreapi.html"><b>pcreapi</b></a>
|
||||
page.
|
@ -1,46 +0,0 @@
|
||||
<html>
|
||||
<head>
|
||||
<title>pcre_get_named_substring specification</title>
|
||||
</head>
|
||||
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
|
||||
This HTML document has been generated automatically from the original man page.
|
||||
If there is any nonsense in it, please consult the man page, in case the
|
||||
conversion went wrong.<br>
|
||||
<br><b>
|
||||
SYNOPSIS
|
||||
</b><br>
|
||||
<P>
|
||||
<b>#include <pcre.h></b>
|
||||
</P>
|
||||
<P>
|
||||
<b>int pcre_get_named_substring(const pcre *<i>code</i>,</b>
|
||||
<b>const char *<i>subject</i>, int *<i>ovector</i>,</b>
|
||||
<b>int <i>stringcount</i>, const char *<i>stringname</i>,</b>
|
||||
<b>const char **<i>stringptr</i>);</b>
|
||||
</P>
|
||||
<br><b>
|
||||
DESCRIPTION
|
||||
</b><br>
|
||||
<P>
|
||||
This is a convenience function for extracting a captured substring by name. The
|
||||
arguments are:
|
||||
</P>
|
||||
<P>
|
||||
<pre>
|
||||
<i>code</i> Compiled pattern
|
||||
<i>subject</i> Subject that has been successfully matched
|
||||
<i>ovector</i> Offset vector that <b>pcre_exec()</b> used
|
||||
<i>stringcount</i> Value returned by <b>pcre_exec()</b>
|
||||
<i>stringname</i> Name of the required substring
|
||||
<i>stringptr</i> Where to put the string pointer
|
||||
</PRE>
|
||||
</P>
|
||||
<P>
|
||||
The yield is the length of the extracted substring, PCRE_ERROR_NOMEMORY if
|
||||
sufficient memory could not be obtained, or PCRE_ERROR_NOSUBSTRING if the
|
||||
string name is invalid.
|
||||
</P>
|
||||
<P>
|
||||
There is a complete description of the PCRE API in the
|
||||
<a href="pcreapi.html"><b>pcreapi</b></a>
|
||||
page.
|
@ -1,39 +0,0 @@
|
||||
<html>
|
||||
<head>
|
||||
<title>pcre_get_stringnumber specification</title>
|
||||
</head>
|
||||
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
|
||||
This HTML document has been generated automatically from the original man page.
|
||||
If there is any nonsense in it, please consult the man page, in case the
|
||||
conversion went wrong.<br>
|
||||
<br><b>
|
||||
SYNOPSIS
|
||||
</b><br>
|
||||
<P>
|
||||
<b>#include <pcre.h></b>
|
||||
</P>
|
||||
<P>
|
||||
<b>int pcre_get_stringnumber(const pcre *<i>code</i>,</b>
|
||||
<b>const char *<i>name</i>);</b>
|
||||
</P>
|
||||
<br><b>
|
||||
DESCRIPTION
|
||||
</b><br>
|
||||
<P>
|
||||
This convenience function finds the number of a named substring capturing
|
||||
parenthesis in a compiled pattern. Its arguments are:
|
||||
</P>
|
||||
<P>
|
||||
<pre>
|
||||
<i>code</i> Compiled regular expression
|
||||
<i>name</i> Name whose number is required
|
||||
</PRE>
|
||||
</P>
|
||||
<P>
|
||||
The yield of the function is the number of the parenthesis if the name is
|
||||
found, or PCRE_ERROR_NOSUBSTRING otherwise.
|
||||
</P>
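<P>
The lookup can be pictured as a search of a table of (name, number) pairs. The sketch below is not PCRE's internal name table. It is a self-contained illustration in which <i>struct demo_name_entry</i>, <i>demo_get_stringnumber()</i> and <i>DEMO_ERROR_NOSUBSTRING</i> are hypothetical stand-ins. For a pattern such as (?P&lt;year&gt;\d{4})-(?P&lt;month&gt;\d\d), the name "month" maps to 2.
</P>

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for PCRE_ERROR_NOSUBSTRING. */
#define DEMO_ERROR_NOSUBSTRING (-7)

/* Hypothetical name table: one entry per named capturing subpattern. */
struct demo_name_entry {
  const char *name;
  int number;
};

/* Return the number of the subpattern called `name`, or
   DEMO_ERROR_NOSUBSTRING if no subpattern has that name. */
static int demo_get_stringnumber(const struct demo_name_entry *table,
                                 int count, const char *name)
{
  for (int i = 0; i < count; i++)
    if (strcmp(table[i].name, name) == 0)
      return table[i].number;
  return DEMO_ERROR_NOSUBSTRING;
}
```

<P>
With the real library, the number returned by <b>pcre_get_stringnumber()</b> can be passed straight to <b>pcre_get_substring()</b>. That pair of calls is essentially what <b>pcre_get_named_substring()</b> does in one step.
</P>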
|
||||
<P>
|
||||
There is a complete description of the PCRE API in the
|
||||
<a href="pcreapi.html"><b>pcreapi</b></a>
|
||||
page.
|
@ -1,44 +0,0 @@
|
||||
<html>
|
||||
<head>
|
||||
<title>pcre_get_substring specification</title>
|
||||
</head>
|
||||
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
|
||||
This HTML document has been generated automatically from the original man page.
|
||||
If there is any nonsense in it, please consult the man page, in case the
|
||||
conversion went wrong.<br>
|
||||
<br><b>
|
||||
SYNOPSIS
|
||||
</b><br>
|
||||
<P>
|
||||
<b>#include <pcre.h></b>
|
||||
</P>
|
||||
<P>
|
||||
<b>int pcre_get_substring(const char *<i>subject</i>, int *<i>ovector</i>,</b>
|
||||
<b>int <i>stringcount</i>, int <i>stringnumber</i>,</b>
|
||||
<b>const char **<i>stringptr</i>);</b>
|
||||
</P>
|
||||
<br><b>
|
||||
DESCRIPTION
|
||||
</b><br>
|
||||
<P>
|
||||
This is a convenience function for extracting a captured substring. The
|
||||
arguments are:
|
||||
</P>
|
||||
<P>
|
||||
<pre>
|
||||
<i>subject</i> Subject that has been successfully matched
|
||||
<i>ovector</i> Offset vector that <b>pcre_exec()</b> used
|
||||
<i>stringcount</i> Value returned by <b>pcre_exec()</b>
|
||||
<i>stringnumber</i> Number of the required substring
|
||||
<i>stringptr</i> Where to put the string pointer
|
||||
</PRE>
|
||||
</P>
|
||||
<P>
|
||||
The yield is the length of the substring, PCRE_ERROR_NOMEMORY if sufficient
|
||||
memory could not be obtained, or PCRE_ERROR_NOSUBSTRING if the string number is
|
||||
invalid.
|
||||
</P>
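<P>
In the offset vector, substring <i>n</i> starts at ovector[2n] and ends just before ovector[2n+1]. The minimal sketch below shows the extraction under that assumption. It is not PCRE's code. The function <i>demo_get_substring()</i> and the <i>DEMO_ERROR_*</i> codes are hypothetical stand-ins, and in this sketch unset substrings (offsets of -1) come back as empty strings.
</P>

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for PCRE's error codes. */
#define DEMO_ERROR_NOMEMORY    (-6)
#define DEMO_ERROR_NOSUBSTRING (-7)

/* Copy substring `stringnumber` into a new NUL-terminated buffer.
   Substring n spans ovector[2n] up to (not including) ovector[2n+1];
   `stringcount` is the value pcre_exec() returned. Unset substrings
   (offsets of -1) are returned as empty strings in this sketch. */
static int demo_get_substring(const char *subject, const int *ovector,
                              int stringcount, int stringnumber,
                              const char **stringptr)
{
  if (stringnumber < 0 || stringnumber >= stringcount)
    return DEMO_ERROR_NOSUBSTRING;
  int start = ovector[2 * stringnumber];
  int len = start < 0 ? 0 : ovector[2 * stringnumber + 1] - start;
  char *copy = malloc((size_t)len + 1);
  if (copy == NULL)
    return DEMO_ERROR_NOMEMORY;
  if (len > 0)
    memcpy(copy, subject + start, (size_t)len);
  copy[len] = '\0';
  *stringptr = copy;
  return len;   /* length of the substring, as described above */
}
```

<P>
For the subject "2003-12" matched by (\d+)-(\d+), the vector {0,7, 0,4, 5,7} with a count of 3 yields "12" for substring 2. The caller owns the copy and must release it.
</P>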
|
||||
<P>
|
||||
There is a complete description of the PCRE API in the
|
||||
<a href="pcreapi.html"><b>pcreapi</b></a>
|
||||
page.
|
@ -1,41 +0,0 @@
|
||||
<html>
|
||||
<head>
|
||||
<title>pcre_get_substring_list specification</title>
|
||||
</head>
|
||||
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
|
||||
This HTML document has been generated automatically from the original man page.
|
||||
If there is any nonsense in it, please consult the man page, in case the
|
||||
conversion went wrong.<br>
|
||||
<br><b>
|
||||
SYNOPSIS
|
||||
</b><br>
|
||||
<P>
|
||||
<b>#include <pcre.h></b>
|
||||
</P>
|
||||
<P>
|
||||
<b>int pcre_get_substring_list(const char *<i>subject</i>,</b>
|
||||
<b>int *<i>ovector</i>, int <i>stringcount</i>, const char ***<i>listptr</i>);</b>
|
||||
</P>
|
||||
<br><b>
|
||||
DESCRIPTION
|
||||
</b><br>
|
||||
<P>
|
||||
This is a convenience function for extracting a list of all the captured
|
||||
substrings. The arguments are:
|
||||
</P>
|
||||
<P>
|
||||
<pre>
|
||||
<i>subject</i> Subject that has been successfully matched
|
||||
<i>ovector</i> Offset vector that <b>pcre_exec()</b> used
|
||||
<i>stringcount</i> Value returned by <b>pcre_exec()</b>
|
||||
<i>listptr</i> Where to put a pointer to the list
|
||||
</PRE>
|
||||
</P>
|
||||
<P>
|
||||
The yield is zero on success or PCRE_ERROR_NOMEMORY if sufficient memory could
|
||||
not be obtained.
|
||||
</P>
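<P>
One way to build such a list is to make a single allocation that holds a NULL-terminated array of pointers followed by the string bytes. A single free() then releases everything. This layout is a design choice of the sketch below and is not a description of PCRE's internals. The names <i>demo_get_substring_list()</i> and <i>DEMO_ERROR_NOMEMORY</i> are hypothetical.
</P>

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_ERROR_NOMEMORY (-6)  /* hypothetical stand-in */

/* Build a NULL-terminated array of all captured substrings in ONE
   allocation: the pointer array comes first, then the string bytes. */
static int demo_get_substring_list(const char *subject, const int *ovector,
                                   int stringcount, const char ***listptr)
{
  assert(stringcount >= 0);
  size_t size = sizeof(char *) * ((size_t)stringcount + 1);
  for (int i = 0; i < stringcount; i++) {
    int len = ovector[2 * i] < 0 ? 0 : ovector[2 * i + 1] - ovector[2 * i];
    size += (size_t)len + 1;           /* bytes plus terminating NUL */
  }
  char **list = malloc(size);
  if (list == NULL)
    return DEMO_ERROR_NOMEMORY;
  char *p = (char *)(list + stringcount + 1);  /* strings follow pointers */
  for (int i = 0; i < stringcount; i++) {
    int start = ovector[2 * i];
    int len = start < 0 ? 0 : ovector[2 * i + 1] - start;
    if (len > 0)
      memcpy(p, subject + start, (size_t)len);
    p[len] = '\0';
    list[i] = p;
    p += len + 1;
  }
  list[stringcount] = NULL;
  *listptr = (const char **)list;
  return 0;   /* zero on success, as described above */
}
```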
|
||||
<P>
|
||||
There is a complete description of the PCRE API in the
|
||||
<a href="pcreapi.html"><b>pcreapi</b></a>
|
||||
page.
|
@ -1,28 +0,0 @@
|
||||
<html>
|
||||
<head>
|
||||
<title>pcre_info specification</title>
|
||||
</head>
|
||||
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
|
||||
This HTML document has been generated automatically from the original man page.
|
||||
If there is any nonsense in it, please consult the man page, in case the
|
||||
conversion went wrong.<br>
|
||||
<br><b>
|
||||
SYNOPSIS
|
||||
</b><br>
|
||||
<P>
|
||||
<b>#include <pcre.h></b>
|
||||
</P>
|
||||
<P>
|
||||
<b>int pcre_info(const pcre *<i>code</i>, int *<i>optptr</i>, int</b>
|
||||
<b>*<i>firstcharptr</i>);</b>
|
||||
</P>
|
||||
<br><b>
|
||||
DESCRIPTION
|
||||
</b><br>
|
||||
<P>
|
||||
This function is obsolete. You should be using <b>pcre_fullinfo()</b> instead.
|
||||
</P>
|
||||
<P>
|
||||
There is a complete description of the PCRE API in the
|
||||
<a href="pcreapi.html"><b>pcreapi</b></a>
|
||||
page.
|
@ -1,31 +0,0 @@
|
||||
<html>
|
||||
<head>
|
||||
<title>pcre_maketables specification</title>
|
||||
</head>
|
||||
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
|
||||
This HTML document has been generated automatically from the original man page.
|
||||
If there is any nonsense in it, please consult the man page, in case the
|
||||
conversion went wrong.<br>
|
||||
<br><b>
|
||||
SYNOPSIS
|
||||
</b><br>
|
||||
<P>
|
||||
<b>#include <pcre.h></b>
|
||||
</P>
|
||||
<P>
|
||||
<b>const unsigned char *pcre_maketables(void);</b>
|
||||
</P>
|
||||
<br><b>
|
||||
DESCRIPTION
|
||||
</b><br>
|
||||
<P>
|
||||
This function builds a set of character tables which can be passed to
|
||||
<b>pcre_compile()</b> to override PCRE's internal, built-in tables (which were
|
||||
made by <b>pcre_maketables()</b> when PCRE was compiled). You might want to do
|
||||
this if you are using a non-standard locale. The function yields a pointer to
|
||||
the tables.
|
||||
</P>
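<P>
Character tables are locale-dependent because functions such as tolower() change their behaviour with LC_CTYPE. The sketch below does not reproduce PCRE's table format. It only builds one hypothetical lower-casing table under a chosen locale to show where that dependence comes from. In real code you would call setlocale() first, then pass the pointer from <b>pcre_maketables()</b> as the final (table pointer) argument of <b>pcre_compile()</b>.
</P>

```c
#include <assert.h>
#include <ctype.h>
#include <locale.h>

/* Switch LC_CTYPE to `locale_name`, then record what tolower() does to
   every byte value. Returns -1 if the locale is not available. */
static int demo_build_lower_table(const char *locale_name,
                                  unsigned char table[256])
{
  if (setlocale(LC_CTYPE, locale_name) == NULL)
    return -1;
  for (int c = 0; c < 256; c++)
    table[c] = (unsigned char)tolower(c);
  return 0;
}
```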
|
||||
<P>
|
||||
There is a complete description of the PCRE API in the
|
||||
<a href="pcreapi.html"><b>pcreapi</b></a>
|
||||
page.
|
@ -1,45 +0,0 @@
|
||||
<html>
|
||||
<head>
|
||||
<title>pcre_study specification</title>
|
||||
</head>
|
||||
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
|
||||
This HTML document has been generated automatically from the original man page.
|
||||
If there is any nonsense in it, please consult the man page, in case the
|
||||
conversion went wrong.<br>
|
||||
<br><b>
|
||||
SYNOPSIS
|
||||
</b><br>
|
||||
<P>
|
||||
<b>#include <pcre.h></b>
|
||||
</P>
|
||||
<P>
|
||||
<b>pcre_extra *pcre_study(const pcre *<i>code</i>, int <i>options</i>,</b>
|
||||
<b>const char **<i>errptr</i>);</b>
|
||||
</P>
|
||||
<br><b>
|
||||
DESCRIPTION
|
||||
</b><br>
|
||||
<P>
|
||||
This function studies a compiled pattern, to see if additional information can
|
||||
be extracted that might speed up matching. Its arguments are:
|
||||
</P>
|
||||
<P>
|
||||
<pre>
|
||||
<i>code</i> A compiled regular expression
|
||||
<i>options</i> Options for <b>pcre_study()</b>
|
||||
<i>errptr</i> Where to put an error message
|
||||
</PRE>
|
||||
</P>
|
||||
<P>
|
||||
If the function returns NULL, either it could not find any additional
|
||||
information, or there was an error. You can tell the difference by looking at
|
||||
the error message pointer: *<i>errptr</i> is NULL in the first case.
|
||||
</P>
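<P>
The rule above gives three possible outcomes after a call such as pcre_study(re, 0, &amp;error). The helper below, <i>demo_classify_study()</i>, is a hypothetical name. It applies that rule to the returned pointer and the error message and does not need the library.
</P>

```c
#include <assert.h>
#include <stddef.h>

enum demo_study_outcome { STUDY_GOT_DATA, STUDY_NOTHING_FOUND, STUDY_ERROR };

/* Interpret the pair (return value, *errptr) left by a study call. */
static enum demo_study_outcome demo_classify_study(const void *extra,
                                                   const char *error)
{
  if (extra != NULL)
    return STUDY_GOT_DATA;           /* extra data to pass to pcre_exec() */
  return error == NULL ? STUDY_NOTHING_FOUND  /* nothing useful learned */
                       : STUDY_ERROR;         /* error holds a message */
}
```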
|
||||
<P>
|
||||
There are currently no options defined; the value of the second argument should
|
||||
always be zero.
|
||||
</P>
|
||||
<P>
|
||||
There is a complete description of the PCRE API in the
|
||||
<a href="pcreapi.html"><b>pcreapi</b></a>
|
||||
page.
|
@ -1,28 +0,0 @@
|
||||
<html>
|
||||
<head>
|
||||
<title>pcre_version specification</title>
|
||||
</head>
|
||||
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
|
||||
This HTML document has been generated automatically from the original man page.
|
||||
If there is any nonsense in it, please consult the man page, in case the
|
||||
conversion went wrong.<br>
|
||||
<br><b>
|
||||
SYNOPSIS
|
||||
</b><br>
|
||||
<P>
|
||||
<b>#include <pcre.h></b>
|
||||
</P>
|
||||
<P>
|
||||
<b>char *pcre_version(void);</b>
|
||||
</P>
|
||||
<br><b>
|
||||
DESCRIPTION
|
||||
</b><br>
|
||||
<P>
|
||||
This function returns a character string that gives the version number of the
|
||||
PCRE library, and its date of release.
|
||||
</P>
|
||||
<P>
|
||||
There is a complete description of the PCRE API in the
|
||||
<a href="pcreapi.html"><b>pcreapi</b></a>
|
||||
page.
|
File diff suppressed because it is too large
@ -1,189 +0,0 @@
|
||||
<html>
|
||||
<head>
|
||||
<title>pcrebuild specification</title>
|
||||
</head>
|
||||
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
|
||||
This HTML document has been generated automatically from the original man page.
|
||||
If there is any nonsense in it, please consult the man page, in case the
|
||||
conversion went wrong.<br>
|
||||
<ul>
|
||||
<li><a name="TOC1" href="#SEC1">PCRE BUILD-TIME OPTIONS</a>
|
||||
<li><a name="TOC2" href="#SEC2">UTF-8 SUPPORT</a>
|
||||
<li><a name="TOC3" href="#SEC3">CODE VALUE OF NEWLINE</a>
|
||||
<li><a name="TOC4" href="#SEC4">BUILDING SHARED AND STATIC LIBRARIES</a>
|
||||
<li><a name="TOC5" href="#SEC5">POSIX MALLOC USAGE</a>
|
||||
<li><a name="TOC6" href="#SEC6">LIMITING PCRE RESOURCE USAGE</a>
|
||||
<li><a name="TOC7" href="#SEC7">HANDLING VERY LARGE PATTERNS</a>
|
||||
<li><a name="TOC8" href="#SEC8">AVOIDING EXCESSIVE STACK USAGE</a>
|
||||
<li><a name="TOC9" href="#SEC9">USING EBCDIC CODE</a>
|
||||
</ul>
|
||||
<br><a name="SEC1" href="#TOC1">PCRE BUILD-TIME OPTIONS</a><br>
|
||||
<P>
|
||||
This document describes the optional features of PCRE that can be selected when
|
||||
the library is compiled. They are all selected, or deselected, by providing
|
||||
options to the <b>configure</b> script which is run before the <b>make</b>
|
||||
command. The complete list of options for <b>configure</b> (which includes the
|
||||
standard ones such as the selection of the installation directory) can be
|
||||
obtained by running
|
||||
</P>
|
||||
<P>
|
||||
<pre>
|
||||
./configure --help
|
||||
</PRE>
|
||||
</P>
|
||||
<P>
|
||||
The following sections describe certain options whose names begin with --enable
|
||||
or --disable. These settings specify changes to the defaults for the
|
||||
<b>configure</b> command. Because of the way that <b>configure</b> works,
|
||||
--enable and --disable always come in pairs, so the complementary option always
|
||||
exists as well, but as it specifies the default, it is not described.
|
||||
</P>
|
||||
<br><a name="SEC2" href="#TOC1">UTF-8 SUPPORT</a><br>
|
||||
<P>
|
||||
To build PCRE with support for UTF-8 character strings, add
|
||||
</P>
|
||||
<P>
|
||||
<pre>
|
||||
--enable-utf8
|
||||
</PRE>
|
||||
</P>
|
||||
<P>
|
||||
to the <b>configure</b> command. Of itself, this does not make PCRE treat
|
||||
strings as UTF-8. As well as compiling PCRE with this option, you also have
|
||||
to set the PCRE_UTF8 option when you call the <b>pcre_compile()</b>
|
||||
function.
|
||||
</P>
|
||||
<br><a name="SEC3" href="#TOC1">CODE VALUE OF NEWLINE</a><br>
|
||||
<P>
|
||||
By default, PCRE treats character 10 (linefeed) as the newline character. This
|
||||
is the normal newline character on Unix-like systems. You can compile PCRE to
|
||||
use character 13 (carriage return) instead by adding
|
||||
</P>
|
||||
<P>
|
||||
<pre>
|
||||
--enable-newline-is-cr
|
||||
</PRE>
|
||||
</P>
|
||||
<P>
|
||||
to the <b>configure</b> command. For completeness there is also a
|
||||
--enable-newline-is-lf option, which explicitly specifies linefeed as the
|
||||
newline character.
|
||||
</P>
|
||||
<br><a name="SEC4" href="#TOC1">BUILDING SHARED AND STATIC LIBRARIES</a><br>
|
||||
<P>
|
||||
The PCRE building process uses <b>libtool</b> to build both shared and static
|
||||
Unix libraries by default. You can suppress one of these by adding one of
|
||||
</P>
|
||||
<P>
|
||||
<pre>
|
||||
--disable-shared
|
||||
--disable-static
|
||||
</PRE>
|
||||
</P>
|
||||
<P>
|
||||
to the <b>configure</b> command, as required.
|
||||
</P>
|
||||
<br><a name="SEC5" href="#TOC1">POSIX MALLOC USAGE</a><br>
|
||||
<P>
|
||||
When PCRE is called through the POSIX interface (see the <b>pcreposix</b>
|
||||
documentation), additional working storage is required for holding the pointers
|
||||
to capturing substrings because PCRE requires three integers per substring,
|
||||
whereas the POSIX interface provides only two. If the number of expected
|
||||
substrings is small, the wrapper function uses space on the stack, because this
|
||||
is faster than using <b>malloc()</b> for each call. The default threshold above
|
||||
which the stack is no longer used is 10; it can be changed by adding a setting
|
||||
such as
|
||||
</P>
|
||||
<P>
|
||||
<pre>
|
||||
--with-posix-malloc-threshold=20
|
||||
</PRE>
|
||||
</P>
|
||||
<P>
|
||||
to the <b>configure</b> command.
|
||||
</P>
|
||||
<br><a name="SEC6" href="#TOC1">LIMITING PCRE RESOURCE USAGE</a><br>
|
||||
<P>
|
||||
Internally, PCRE has a function called <b>match()</b> which it calls repeatedly
|
||||
(possibly recursively) when performing a matching operation. By limiting the
|
||||
number of times this function may be called, a limit can be placed on the
|
||||
resources used by a single call to <b>pcre_exec()</b>. The limit can be changed
|
||||
at run time, as described in the <b>pcreapi</b> documentation. The default is 10
|
||||
million, but this can be changed by adding a setting such as
|
||||
</P>
|
||||
<P>
|
||||
<pre>
|
||||
--with-match-limit=500000
|
||||
</PRE>
|
||||
</P>
|
||||
<P>
|
||||
to the <b>configure</b> command.
|
||||
</P>
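<P>
The idea of the limit is to count calls to the recursive matching function and give up once a budget is spent. The toy backtracking matcher below illustrates that idea. It is not PCRE's <b>match()</b>. It handles only literals, "." and "x*", and <i>demo_match()</i>, <i>struct demo_budget</i> and the <i>DEMO_*</i> codes are hypothetical. A pattern such as a*a*a*a*a*a*b against a run of a's exhausts a small budget long before it would fail naturally.
</P>

```c
#include <assert.h>

/* Hypothetical result codes for this sketch. */
#define DEMO_MATCH        1
#define DEMO_NOMATCH      0
#define DEMO_MATCHLIMIT (-8)

struct demo_budget {
  unsigned long calls;  /* calls to demo_match() so far */
  unsigned long limit;  /* maximum calls allowed */
};

/* Anchored matcher for literals, '.' and 'x*' (whole subject must match).
   Every call is counted; once the budget is exceeded the whole attempt
   is abandoned with DEMO_MATCHLIMIT. */
static int demo_match(const char *re, const char *s, struct demo_budget *b)
{
  if (++b->calls > b->limit)
    return DEMO_MATCHLIMIT;
  if (re[0] == '\0')
    return *s == '\0' ? DEMO_MATCH : DEMO_NOMATCH;
  if (re[1] == '*') {
    /* Try the rest of the pattern after 0, 1, 2, ... repetitions. */
    do {
      int rc = demo_match(re + 2, s, b);
      if (rc != DEMO_NOMATCH)
        return rc;  /* success, or the limit was hit */
    } while (*s != '\0' && (*s++ == re[0] || re[0] == '.'));
    return DEMO_NOMATCH;
  }
  if (*s != '\0' && (re[0] == '.' || re[0] == *s))
    return demo_match(re + 1, s + 1, b);
  return DEMO_NOMATCH;
}
```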
|
||||
<br><a name="SEC7" href="#TOC1">HANDLING VERY LARGE PATTERNS</a><br>
|
||||
<P>
|
||||
Within a compiled pattern, offset values are used to point from one part to
|
||||
another (for example, from an opening parenthesis to an alternation
|
||||
metacharacter). By default two-byte values are used for these offsets, leading
|
||||
to a maximum size for a compiled pattern of around 64K. This is sufficient to
|
||||
handle all but the most gigantic patterns. Nevertheless, some people do want to
|
||||
process enormous patterns, so it is possible to compile PCRE to use three-byte
|
||||
or four-byte offsets by adding a setting such as
|
||||
</P>
|
||||
<P>
|
||||
<pre>
|
||||
--with-link-size=3
|
||||
</PRE>
|
||||
</P>
|
||||
<P>
|
||||
to the <b>configure</b> command. The value given must be 2, 3, or 4. Using
|
||||
longer offsets slows down the operation of PCRE because it has to load
|
||||
additional bytes when handling them.
|
||||
</P>
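<P>
The size limits follow from simple arithmetic. An unsigned offset of <i>n</i> bytes can hold at most 2^(8n) - 1, so two bytes give 65535 (the "around 64K" above), three give about 16M, and four give about 4G. The helper below, <i>demo_max_link_offset()</i>, is hypothetical and assumes unsigned offsets. Treat its results as upper bounds rather than guaranteed pattern sizes.
</P>

```c
#include <assert.h>

/* Largest value an unsigned link of `link_size` bytes (2, 3 or 4)
   can hold: 2^(8 * link_size) - 1. */
static unsigned long long demo_max_link_offset(int link_size)
{
  assert(link_size >= 2 && link_size <= 4);
  return (1ULL << (8 * link_size)) - 1;
}
```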
|
||||
<P>
|
||||
If you build PCRE with an increased link size, test 2 (and test 5 if you are
|
||||
using UTF-8) will fail. Part of the output of these tests is a representation
|
||||
of the compiled pattern, and this changes with the link size.
|
||||
</P>
|
||||
<br><a name="SEC8" href="#TOC1">AVOIDING EXCESSIVE STACK USAGE</a><br>
|
||||
<P>
|
||||
PCRE implements backtracking while matching by making recursive calls to an
|
||||
internal function called <b>match()</b>. In environments where the size of the
|
||||
stack is limited, this can severely limit PCRE's operation. (The Unix
|
||||
environment does not usually suffer from this problem.) An alternative approach
|
||||
that uses memory from the heap to remember data, instead of using recursive
|
||||
function calls, has been implemented to work round this problem. If you want to
|
||||
build a version of PCRE that works this way, add
|
||||
</P>
|
||||
<P>
|
||||
<pre>
|
||||
--disable-stack-for-recursion
|
||||
</PRE>
|
||||
</P>
|
||||
<P>
|
||||
to the <b>configure</b> command. With this configuration, PCRE will use the
|
||||
<b>pcre_stack_malloc</b> and <b>pcre_stack_free</b> variables to call memory
|
||||
management functions. Separate functions are provided because the usage is very
|
||||
predictable: the block sizes requested are always the same, and the blocks are
|
||||
always freed in reverse order. A calling program might be able to implement
|
||||
optimized functions that perform better than the standard <b>malloc()</b> and
|
||||
<b>free()</b> functions. PCRE runs noticeably more slowly when built in this
|
||||
way.
|
||||
</P>
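<P>
Because every block has the same size and blocks are freed in reverse order, a simple LIFO pool can replace malloc()/free() for this purpose. The sketch below is illustrative only. The pool depth and block size are hypothetical, the block size would have to cover PCRE's actual request size, and oversized requests fall back to malloc(). In a build configured with --disable-stack-for-recursion you would presumably assign these functions to <b>pcre_stack_malloc</b> and <b>pcre_stack_free</b>. That assignment is omitted here so the sketch compiles without PCRE.
</P>

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define DEMO_POOL_BLOCKS 64   /* hypothetical pool depth */
#define DEMO_BLOCK_SIZE  256  /* hypothetical; must cover the request size */

/* The union keeps every block suitably aligned for any object type. */
typedef union {
  max_align_t align;
  unsigned char bytes[DEMO_BLOCK_SIZE];
} demo_block;

static demo_block demo_pool[DEMO_POOL_BLOCKS];
static int demo_pool_top = 0;  /* number of pool blocks in use */

static void *demo_stack_malloc(size_t size)
{
  if (size <= DEMO_BLOCK_SIZE && demo_pool_top < DEMO_POOL_BLOCKS)
    return demo_pool[demo_pool_top++].bytes;
  return malloc(size);  /* oversized request or pool exhausted */
}

static void demo_stack_free(void *block)
{
  /* Blocks are released in reverse order, so a pool block being freed
     is always the most recently allocated one. */
  if (demo_pool_top > 0 && block == demo_pool[demo_pool_top - 1].bytes) {
    demo_pool_top--;
    return;
  }
  free(block);
}
```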
|
||||
<br><a name="SEC9" href="#TOC1">USING EBCDIC CODE</a><br>
|
||||
<P>
|
||||
PCRE assumes by default that it will run in an environment where the character
|
||||
code is ASCII (or UTF-8, which is a superset of ASCII). PCRE can, however, be
|
||||
compiled to run in an EBCDIC environment by adding
|
||||
</P>
|
||||
<P>
|
||||
<pre>
|
||||
--enable-ebcdic
|
||||
</PRE>
|
||||
</P>
|
||||
<P>
|
||||
to the <b>configure</b> command.
|
||||
</P>
|
||||
<P>
|
||||
Last updated: 09 December 2003
|
||||
<br>
|
||||
Copyright © 1997-2003 University of Cambridge.
|
@ -1,117 +0,0 @@
|
||||
<html>
|
||||
<head>
|
||||
<title>pcrecallout specification</title>
|
||||
</head>
|
||||
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
|
||||
This HTML document has been generated automatically from the original man page.
|
||||
If there is any nonsense in it, please consult the man page, in case the
|
||||
conversion went wrong.<br>
|
||||
<ul>
|
||||
<li><a name="TOC1" href="#SEC1">PCRE CALLOUTS</a>
|
||||
<li><a name="TOC2" href="#SEC2">RETURN VALUES</a>
|
||||
</ul>
|
||||
<br><a name="SEC1" href="#TOC1">PCRE CALLOUTS</a><br>
|
||||
<P>
|
||||
<b>int (*pcre_callout)(pcre_callout_block *);</b>
|
||||
</P>
|
||||
<P>
|
||||
PCRE provides a feature called "callout", which is a means of temporarily
|
||||
passing control to the caller of PCRE in the middle of pattern matching. The
|
||||
caller of PCRE provides an external function by putting its entry point in the
|
||||
global variable <i>pcre_callout</i>. By default, this variable contains NULL,
|
||||
which disables all calling out.
|
||||
</P>
|
||||
<P>
|
||||
Within a regular expression, (?C) indicates the points at which the external
|
||||
function is to be called. Different callout points can be identified by putting
|
||||
a number less than 256 after the letter C. The default value is zero.
|
||||
For example, this pattern has two callout points:
|
||||
</P>
|
||||
<P>
|
||||
<pre>
|
||||
(?C1)\dabc(?C2)def
|
||||
</PRE>
|
||||
</P>
|
||||
<P>
|
||||
During matching, when PCRE reaches a callout point (and <i>pcre_callout</i> is
|
||||
set), the external function is called. Its only argument is a pointer to a
|
||||
<b>pcre_callout_block</b> structure, which contains the following fields:
|
||||
</P>
|
||||
<P>
|
||||
<pre>
|
||||
int <i>version</i>;
|
||||
int <i>callout_number</i>;
|
||||
int *<i>offset_vector</i>;
|
||||
const char *<i>subject</i>;
|
||||
int <i>subject_length</i>;
|
||||
int <i>start_match</i>;
|
||||
int <i>current_position</i>;
|
||||
int <i>capture_top</i>;
|
||||
int <i>capture_last</i>;
|
||||
void *<i>callout_data</i>;
|
||||
</PRE>
|
||||
</P>
|
||||
<P>
|
||||
The <i>version</i> field is an integer containing the version number of the
|
||||
block format. The current version is zero. The version number may change in
|
||||
future if additional fields are added, but the intention is never to remove any
|
||||
of the existing fields.
|
||||
</P>
|
||||
<P>
|
||||
The <i>callout_number</i> field contains the number of the callout, as compiled
|
||||
into the pattern (that is, the number after ?C).
|
||||
</P>
|
||||
<P>
|
||||
The <i>offset_vector</i> field is a pointer to the vector of offsets that was
|
||||
passed by the caller to <b>pcre_exec()</b>. The contents can be inspected in
|
||||
order to extract substrings that have been matched so far, in the same way as
|
||||
for extracting substrings after a match has completed.
|
||||
</P>
|
||||
<P>
|
||||
The <i>subject</i> and <i>subject_length</i> fields contain copies of the values
|
||||
that were passed to <b>pcre_exec()</b>.
|
||||
</P>
|
||||
<P>
|
||||
The <i>start_match</i> field contains the offset within the subject at which the
|
||||
current match attempt started. If the pattern is not anchored, the callout
|
||||
function may be called several times for different starting points.
|
||||
</P>
|
||||
<P>
|
||||
The <i>current_position</i> field contains the offset within the subject of the
|
||||
current match pointer.
|
||||
</P>
|
||||
<P>
|
||||
The <i>capture_top</i> field contains one more than the number of the highest
|
||||
numbered captured substring so far. If no substrings have been captured,
|
||||
the value of <i>capture_top</i> is one.
|
||||
</P>
|
||||
<P>
|
||||
The <i>capture_last</i> field contains the number of the most recently captured
|
||||
substring.
|
||||
</P>
|
||||
<P>
|
||||
The <i>callout_data</i> field contains a value that is passed to
|
||||
<b>pcre_exec()</b> by the caller specifically so that it can be passed back in
|
||||
callouts. It is passed in the <i>callout_data</i> field of the <b>pcre_extra</b>
|
||||
data structure. If no such data was passed, the value of <i>callout_data</i> in
|
||||
a <b>pcre_callout_block</b> is NULL. There is a description of the
|
||||
<b>pcre_extra</b> structure in the <b>pcreapi</b> documentation.
|
||||
</P>
|
||||
<br><a name="SEC2" href="#TOC1">RETURN VALUES</a><br>
|
||||
<P>
|
||||
The callout function returns an integer. If the value is zero, matching
|
||||
proceeds as normal. If the value is greater than zero, matching fails at the
|
||||
current point, but backtracking to test other possibilities goes ahead, just as
|
||||
if a lookahead assertion had failed. If the value is less than zero, the match
|
||||
is abandoned, and <b>pcre_exec()</b> returns the value.
|
||||
</P>
|
||||
<P>
|
||||
Negative values should normally be chosen from the set of PCRE_ERROR_xxx
|
||||
values. In particular, PCRE_ERROR_NOMATCH forces a standard "no match" failure.
|
||||
The error number PCRE_ERROR_CALLOUT is reserved for use by callout functions;
|
||||
it will never be used by PCRE itself.
|
||||
</P>
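<P>
The sketch below shows the three kinds of return value in one callout function. The policy at callout 1 is: fail at the current point if the attempt has consumed more than a caller-supplied number of characters, and abandon the match if no limit was supplied. So that it compiles without pcre.h, it uses <i>demo_callout_block</i>, a hypothetical mirror of the fields listed above, together with the hypothetical code <i>DEMO_ERROR_CALLOUT</i>. With the real library the function would take a <b>pcre_callout_block</b> pointer and be installed by assigning it to <i>pcre_callout</i> before calling <b>pcre_exec()</b>.
</P>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mirror of the pcre_callout_block fields listed above. */
typedef struct {
  int version;
  int callout_number;
  int *offset_vector;
  const char *subject;
  int subject_length;
  int start_match;
  int current_position;
  int capture_top;
  int capture_last;
  void *callout_data;
} demo_callout_block;

#define DEMO_ERROR_CALLOUT (-9)  /* hypothetical stand-in for PCRE_ERROR_CALLOUT */

static int demo_callout(demo_callout_block *cb)
{
  const int *max_len = cb->callout_data;
  if (max_len == NULL)
    return DEMO_ERROR_CALLOUT;   /* negative: abandon the whole match */
  if (cb->callout_number == 1 &&
      cb->current_position - cb->start_match > *max_len)
    return 1;                    /* positive: fail here, keep backtracking */
  return 0;                      /* zero: carry on matching */
}
```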
|
||||
<P>
|
||||
Last updated: 21 January 2003
|
||||
<br>
|
||||
Copyright © 1997-2003 University of Cambridge.
|
@ -1,136 +0,0 @@
|
||||
<html>
|
||||
<head>
|
||||
<title>pcrecompat specification</title>
|
||||
</head>
|
||||
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
|
||||
This HTML document has been generated automatically from the original man page.
|
||||
If there is any nonsense in it, please consult the man page, in case the
|
||||
conversion went wrong.<br>
|
||||
<ul>
|
||||
<li><a name="TOC1" href="#SEC1">DIFFERENCES FROM PERL</a>
|
||||
</ul>
|
||||
<br><a name="SEC1" href="#TOC1">DIFFERENCES FROM PERL</a><br>
|
||||
<P>
|
||||
This document describes the differences in the ways that PCRE and Perl handle
|
||||
regular expressions. The differences described here are with respect to Perl
|
||||
5.8.
|
||||
</P>
|
||||
<P>
|
||||
1. PCRE does not have full UTF-8 support. Details of what it does have are
|
||||
given in the
|
||||
<a href="pcre.html#utf8support">section on UTF-8 support</a>
|
||||
in the main
|
||||
<a href="pcre.html"><b>pcre</b></a>
|
||||
page.
|
||||
</P>
|
||||
<P>
|
||||
2. PCRE does not allow repeat quantifiers on lookahead assertions. Perl permits
|
||||
them, but they do not mean what you might think. For example, (?!a){3} does
|
||||
not assert that the next three characters are not "a". It just asserts that the
|
||||
next character is not "a" three times.
|
||||
</P>
|
||||
<P>
|
||||
3. Capturing subpatterns that occur inside negative lookahead assertions are
|
||||
counted, but their entries in the offsets vector are never set. Perl sets its
|
||||
numerical variables from any such patterns that are matched before the
|
||||
assertion fails to match something (thereby succeeding), but only if the
|
||||
negative lookahead assertion contains just one branch.
|
||||
</P>
|
||||
<P>
|
||||
4. Though binary zero characters are supported in the subject string, they are
|
||||
not allowed in a pattern string because it is passed as a normal C string,
|
||||
terminated by zero. The escape sequence "\0" can be used in the pattern to
|
||||
represent a binary zero.
|
||||
</P>
|
||||
<P>
|
||||
5. The following Perl escape sequences are not supported: \l, \u, \L,
|
||||
\U, \P, \p, \N, and \X. In fact these are implemented by Perl's general
|
||||
string-handling and are not part of its pattern matching engine. If any of
|
||||
these are encountered by PCRE, an error is generated.
|
||||
</P>
|
||||
<P>
|
||||
6. PCRE does support the \Q...\E escape for quoting substrings. Characters in
|
||||
between are treated as literals. This is slightly different from Perl in that $
|
||||
and @ are also handled as literals inside the quotes. In Perl, they cause
|
||||
variable interpolation (but of course PCRE does not have variables). Note the
|
||||
following examples:
|
||||
</P>
|
||||
<P>
|
||||
<pre>
|
||||
Pattern PCRE matches Perl matches
|
||||
</PRE>
|
||||
</P>
|
||||
<P>
|
||||
<pre>
|
||||
\Qabc$xyz\E abc$xyz abc followed by the
|
||||
contents of $xyz
|
||||
\Qabc\$xyz\E abc\$xyz abc\$xyz
|
||||
\Qabc\E\$\Qxyz\E abc$xyz abc$xyz
|
||||
</PRE>
|
||||
</P>
|
||||
<P>
|
||||
The \Q...\E sequence is recognized both inside and outside character classes.
|
||||
</P>
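<P>
A common use of \Q...\E is to embed arbitrary literal text in a pattern. The only sequence that needs care is an embedded "\E", which would end the quoting early. The sketch below rewrites it as "\E\\E\Q": close the quote, match a literal backslash and "E", then reopen. This is a general idiom, not a PCRE function, and <i>demo_quote_literal()</i> is a hypothetical name.
</P>

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Wrap `literal` in \Q...\E so every character is matched literally.
   Returns a malloc()ed string, or NULL on allocation failure. */
static char *demo_quote_literal(const char *literal)
{
  size_t n = strlen(literal);
  /* Worst case: each 2-byte "\E" grows to 7 bytes, plus "\Q", "\E", NUL. */
  char *out = malloc(4 * n + 5);
  if (out == NULL)
    return NULL;
  char *p = out;
  memcpy(p, "\\Q", 2);
  p += 2;
  for (size_t i = 0; i < n; i++) {
    if (literal[i] == '\\' && literal[i + 1] == 'E') {
      memcpy(p, "\\E\\\\E\\Q", 7);  /* \E  \\  E  \Q */
      p += 7;
      i++;                          /* skip the 'E' as well */
    } else {
      *p++ = literal[i];
    }
  }
  memcpy(p, "\\E", 3);              /* also copies the terminating NUL */
  return out;
}
```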
|
||||
<P>
|
||||
7. Fairly obviously, PCRE does not support the (?{code}) and (?p{code})
|
||||
constructions. However, there is some experimental support for recursive
|
||||
patterns using the non-Perl items (?R), (?number) and (?P>name). Also, the PCRE
|
||||
"callout" feature allows an external function to be called during pattern
|
||||
matching.
|
||||
</P>
|
||||
<P>
|
||||
8. There are some differences that are concerned with the settings of captured
|
||||
strings when part of a pattern is repeated. For example, matching "aba" against
|
||||
the pattern /^(a(b)?)+$/ in Perl leaves $2 unset, but in PCRE it is set to "b".
|
||||
</P>
|
||||
<P>
|
||||
9. PCRE provides some extensions to the Perl regular expression facilities:
|
||||
</P>
|
||||
<P>
|
||||
(a) Although lookbehind assertions must match fixed length strings, each
|
||||
alternative branch of a lookbehind assertion can match a different length of
|
||||
string. Perl requires them all to have the same length.
|
||||
</P>
|
||||
<P>
|
||||
(b) If PCRE_DOLLAR_ENDONLY is set and PCRE_MULTILINE is not set, the $
|
||||
meta-character matches only at the very end of the string.
|
||||
</P>
|
||||
<P>
|
||||
(c) If PCRE_EXTRA is set, a backslash followed by a letter with no special
|
||||
meaning is faulted.
|
||||
</P>
|
||||
<P>
|
||||
(d) If PCRE_UNGREEDY is set, the greediness of the repetition quantifiers is
|
||||
inverted, that is, by default they are not greedy, but if followed by a
|
||||
question mark they are.
|
||||
</P>
|
||||
<P>
|
||||
(e) PCRE_ANCHORED can be used to force a pattern to be tried only at the first
|
||||
matching position in the subject string.
|
||||
</P>
|
||||
<P>
|
||||
(f) The PCRE_NOTBOL, PCRE_NOTEOL, and PCRE_NOTEMPTY options for <b>pcre_exec()</b>,
|
||||
and the PCRE_NO_AUTO_CAPTURE option for <b>pcre_compile()</b>, have no Perl equivalents.
|
||||
</P>
|
||||
<P>
|
||||
(g) The (?R), (?number), and (?P>name) constructs allow for recursive pattern
|
||||
matching (Perl can do this using the (?p{code}) construct, which PCRE cannot
|
||||
support).
|
||||
</P>
|
||||
<P>
|
||||
(h) PCRE supports named capturing substrings, using the Python syntax.
|
||||
</P>
|
||||
<P>
|
||||
(i) PCRE supports the possessive quantifier "++" syntax, taken from Sun's Java
|
||||
package.
|
||||
</P>
|
||||
<P>
|
||||
(j) The (R) condition, for testing recursion, is a PCRE extension.
|
||||
</P>
|
||||
<P>
|
||||
(k) The callout facility is PCRE-specific.
|
||||
</P>
|
||||
<P>
|
||||
Last updated: 09 December 2003
|
||||
<br>
|
||||
Copyright © 1997-2003 University of Cambridge.
|
@ -1,153 +0,0 @@
|
||||
<html>
|
||||
<head>
|
||||
<title>pcregrep specification</title>
|
||||
</head>
|
||||
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
|
||||
This HTML document has been generated automatically from the original man page.
|
||||
If there is any nonsense in it, please consult the man page, in case the
|
||||
conversion went wrong.<br>
|
||||
<ul>
|
||||
<li><a name="TOC1" href="#SEC1">SYNOPSIS</a>
|
||||
<li><a name="TOC2" href="#SEC2">DESCRIPTION</a>
|
||||
<li><a name="TOC3" href="#SEC3">OPTIONS</a>
|
||||
<li><a name="TOC4" href="#SEC4">LONG OPTIONS</a>
|
||||
<li><a name="TOC5" href="#SEC5">DIAGNOSTICS</a>
|
||||
<li><a name="TOC6" href="#SEC6">AUTHOR</a>
|
||||
</ul>
|
||||
<br><a name="SEC1" href="#TOC1">SYNOPSIS</a><br>
|
||||
<P>
|
||||
<b>pcregrep [-Vcfhilnrsuvx] [long options] [pattern] [file1 file2 ...]</b>
|
||||
</P>
|
||||
<br><a name="SEC2" href="#TOC1">DESCRIPTION</a><br>
|
||||
<P>
|
||||
<b>pcregrep</b> searches files for character patterns, in the same way as other
|
||||
grep commands do, but it uses the PCRE regular expression library to support
|
||||
patterns that are compatible with the regular expressions of Perl 5. See
|
||||
<a href="pcrepattern.html"><b>pcrepattern</b></a>
|
||||
for a full description of syntax and semantics of the regular expressions that
|
||||
PCRE supports.
|
||||
</P>
|
||||
<P>
|
||||
A pattern must be specified on the command line unless the <b>-f</b> option is
|
||||
used (see below).
|
||||
</P>
|
||||
<P>
|
||||
If no files are specified, <b>pcregrep</b> reads the standard input. By default,
|
||||
each line that matches the pattern is copied to the standard output, and if
|
||||
there is more than one file, the file name is printed before each line of
|
||||
output. However, there are options that can change how <b>pcregrep</b> behaves.
|
||||
</P>
|
||||
<P>
|
||||
Lines are limited to BUFSIZ characters. BUFSIZ is defined in <b><stdio.h></b>.
|
||||
The newline character is removed from the end of each line before it is matched
|
||||
against the pattern.
|
||||
</P>
|
||||
<br><a name="SEC3" href="#TOC1">OPTIONS</a><br>
|
||||
<P>
|
||||
<b>-V</b>
|
||||
Write the version number of the PCRE library being used to the standard error
|
||||
stream.
|
||||
</P>
|
||||
<P>
|
||||
<b>-c</b>
|
||||
Do not print individual lines; instead just print a count of the number of
|
||||
lines that would otherwise have been printed. If several files are given, a
|
||||
count is printed for each of them.
|
||||
</P>
|
||||
<P>
|
||||
<b>-f</b><i>filename</i>
|
||||
Read a number of patterns from the file, one per line, and match all of them
|
||||
against each line of input. A line is output if any of the patterns match it.
|
||||
When <b>-f</b> is used, no pattern is taken from the command line; all arguments
|
||||
are treated as file names. There is a maximum of 100 patterns. Trailing white
|
||||
space is removed, and blank lines are ignored. An empty file contains no
|
||||
patterns and therefore matches nothing.
|
||||
</P>
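<P>
The pattern-file rules above (one pattern per line, trailing white space removed, blank lines ignored, at most 100 patterns) can be sketched as follows. This is not pcregrep's source. <i>demo_read_patterns()</i> is a hypothetical helper that works on an in-memory buffer, and reporting -1 for too many patterns is an assumption of the sketch.
</P>

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

#define DEMO_MAX_PATTERNS 100   /* the limit quoted above */

/* Split `text` (modified in place) into lines, strip trailing white space,
   skip blank lines, and store pointers to the patterns that remain.
   Returns the number of patterns, or -1 if there are too many. */
static int demo_read_patterns(char *text, char *patterns[DEMO_MAX_PATTERNS])
{
  int count = 0;
  char *line = text;
  while (line != NULL && *line != '\0') {
    char *next = strchr(line, '\n');
    if (next != NULL)
      *next++ = '\0';               /* terminate this line, move past '\n' */
    size_t len = strlen(line);
    while (len > 0 && isspace((unsigned char)line[len - 1]))
      line[--len] = '\0';           /* drop trailing white space */
    if (len > 0) {                  /* blank lines are ignored */
      if (count == DEMO_MAX_PATTERNS)
        return -1;
      patterns[count++] = line;
    }
    line = next;
  }
  return count;
}
```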
<P>
<b>-h</b>
Suppress printing of filenames when searching multiple files.
</P>
<P>
<b>-i</b>
Ignore upper/lower case distinctions during comparisons.
</P>
<P>
<b>-l</b>
Instead of printing lines from the files, just print the names of the files
containing lines that would have been printed. Each file name is printed
once, on a separate line.
</P>
<P>
<b>-n</b>
Precede each line by its line number in the file.
</P>
<P>
<b>-r</b>
If any file is a directory, recursively scan the files it contains. Without
<b>-r</b>, a directory is scanned as a normal file.
</P>
<P>
<b>-s</b>
Work silently; that is, display nothing except error messages.
The exit status indicates whether any matches were found.
</P>
<P>
<b>-u</b>
Operate in UTF-8 mode. This option is available only if PCRE has been compiled
with UTF-8 support. Both the pattern and each subject line are assumed to be
valid strings of UTF-8 characters.
</P>
<P>
<b>-v</b>
Invert the sense of the match, so that lines which do <i>not</i> match the
pattern are now the ones that are found.
</P>
<P>
<b>-x</b>
Force the pattern to be anchored (it must start matching at the beginning of
the line) and, in addition, require it to match the entire line. This is
equivalent to having ^ and $ characters at the start and end of each
alternative branch in the regular expression.
</P>
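<P>
One way to picture <b>-x</b> is as a rewrite of the pattern before it is compiled. The sketch below wraps a pattern as ^(...)$ and tests it through the POSIX-style <b>regcomp()</b>/<b>regexec()</b> interface. It illustrates the equivalence only and is not how <b>pcregrep</b> itself implements <b>-x</b>; <i>whole_line_match</i> is a hypothetical name, and the code is written against the system &lt;regex.h&gt; (with PCRE, the same calls are available from &lt;pcreposix.h&gt;).
</P>

```c
#include <regex.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the whole of line matches pattern,
   0 if it does not, and -1 on error.  Wrapping the pattern as ^(...)$ puts
   the anchors outside the group, so they apply to every alternative. */
int whole_line_match(const char *pattern, const char *line)
{
  size_t size = strlen(pattern) + sizeof "^()$";   /* includes the zero */
  char *wrapped = malloc(size);
  regex_t re;
  int rc;

  if (wrapped == NULL) return -1;
  snprintf(wrapped, size, "^(%s)$", pattern);

  /* A system <regex.h> needs REG_EXTENDED for ( and | to be special;
     pcreposix.h defines it as zero, so it does no harm there. */
  rc = regcomp(&re, wrapped, REG_EXTENDED | REG_NOSUB);
  free(wrapped);
  if (rc != 0) return -1;

  rc = regexec(&re, line, 0, NULL, 0);
  regfree(&re);
  if (rc == 0) return 1;
  return rc == REG_NOMATCH ? 0 : -1;
}
```

<P>
The group matters: simply adding ^ and $ to the pattern cat|dog gives ^cat|dog$, which anchors only the first and last alternatives, so it would still accept a line such as "hotdog".
</P>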
<br><a name="SEC4" href="#TOC1">LONG OPTIONS</a><br>
<P>
Long forms of all the options are available, as in GNU grep. They are shown in
the following table:
</P>
<P>
<pre>
  -c  --count
  -h  --no-filename
  -i  --ignore-case
  -l  --files-with-matches
  -n  --line-number
  -r  --recursive
  -s  --no-messages
  -u  --utf-8
  -V  --version
  -v  --invert-match
  -x  --line-regex
  -x  --line-regexp
</PRE>
</P>
<P>
In addition, --file=<i>filename</i> is equivalent to -f<i>filename</i>, and
--help shows the list of options and then exits.
</P>
<br><a name="SEC5" href="#TOC1">DIAGNOSTICS</a><br>
<P>
Exit status is 0 if any matches were found, 1 if no matches were found, and 2
for syntax errors or inaccessible files (even if matches were found).
</P>
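<P>
The precedence in these rules (an error wins even when matches were found) is easy to get backwards, so here it is as a small C sketch. The names are hypothetical and the code is not taken from the <b>pcregrep</b> source.
</P>

```c
/* Exit codes as listed under DIAGNOSTICS. */
enum { GREP_MATCHED = 0, GREP_NO_MATCH = 1, GREP_TROUBLE = 2 };

/* Hypothetical helper: combine the outcome of a run into an exit status.
   An error (syntax error or inaccessible file) takes precedence, even if
   some matches were found. */
int grep_exit_status(int matches_found, int had_error)
{
  if (had_error) return GREP_TROUBLE;
  return matches_found ? GREP_MATCHED : GREP_NO_MATCH;
}
```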
<br><a name="SEC6" href="#TOC1">AUTHOR</a><br>
<P>
Philip Hazel &lt;ph10@cam.ac.uk&gt;
<br>
University Computing Service
<br>
Cambridge CB2 3QG, England.
</P>
<P>
Last updated: 03 February 2003
<br>
Copyright © 1997-2003 University of Cambridge.
File diff suppressed because it is too large
@@ -1,93 +0,0 @@
<html>
<head>
<title>pcreperform specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
This HTML document has been generated automatically from the original man page.
If there is any nonsense in it, please consult the man page, in case the
conversion went wrong.<br>
<ul>
<li><a name="TOC1" href="#SEC1">PCRE PERFORMANCE</a>
</ul>
<br><a name="SEC1" href="#TOC1">PCRE PERFORMANCE</a><br>
<P>
Certain items that may appear in regular expression patterns are more efficient
than others. It is more efficient to use a character class like [aeiou] than a
set of alternatives such as (a|e|i|o|u). The simplest construction that
provides the required behaviour is usually the most efficient. Jeffrey
Friedl's book, <i>Mastering Regular Expressions</i>, contains a lot of
discussion about optimizing regular expressions for efficient performance.
</P>
<P>
When a pattern begins with .* not in parentheses, or in parentheses that are
not the subject of a backreference, and the PCRE_DOTALL option is set, the
pattern is implicitly anchored by PCRE, since it can match only at the start of
a subject string. However, if PCRE_DOTALL is not set, PCRE cannot make this
optimization, because the . metacharacter does not then match a newline, and if
the subject string contains newlines, the pattern may match from the character
immediately following one of them instead of from the very start. For example,
the pattern
</P>
<P>
<pre>
.*second
</PRE>
</P>
<P>
matches the subject "first\nand second" (where \n stands for a newline
character), with the match starting at the seventh character. In order to do
this, PCRE has to retry the match starting after every newline in the subject.
</P>
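<P>
The retry rule can be modelled in a few lines of C. This is a simplified model under stated assumptions, not PCRE's implementation, and <i>dotstar_start_positions</i> is a hypothetical name. It lists the offsets at which an unanchored pattern beginning with .* has to be tried when PCRE_DOTALL is not set: the start of the subject and the character after each newline. For "first\nand second" it yields offsets 0 and 6, the second being the seventh character, where the match above begins.
</P>

```c
#include <stddef.h>

/* Simplified model, not PCRE's code.  An unanchored pattern that begins
   with .* (PCRE_DOTALL not set) can only start matching at the beginning of
   the subject or just after a newline, so those are the start positions
   that have to be tried.  Stores at most max offsets; returns the count. */
size_t dotstar_start_positions(const char *subject, size_t positions[],
                               size_t max)
{
  size_t n = 0;
  if (max > 0) positions[n++] = 0;           /* the start of the subject */
  for (size_t i = 0; subject[i] != '\0' && n < max; i++)
    if (subject[i] == '\n')
      positions[n++] = i + 1;                /* just after each newline */
  return n;
}
```

<P>
With PCRE_DOTALL set, or with the pattern written as ^.*, only offset 0 would need to be tried.
</P>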
<P>
If you are using such a pattern with subject strings that do not contain
newlines, the best performance is obtained by setting PCRE_DOTALL, or starting
the pattern with ^.* to indicate explicit anchoring. That saves PCRE from
having to scan along the subject looking for a newline to restart at.
</P>
<P>
Beware of patterns that contain nested indefinite repeats. These can take a
long time to run when applied to a string that does not match. Consider the
pattern fragment
</P>
<P>
<pre>
(a+)*
</PRE>
</P>
<P>
Starting at the beginning of the subject, this can match "aaaa" in 16
different ways, and this number increases very rapidly as the string gets
longer, doubling with each additional "a". (The * repeat can match 0, 1, 2, 3,
or 4 times, and for each of those cases other than 0 or 4, the + repeats can
match different numbers of times.) When the remainder of the pattern is such
that the entire match is going to fail, PCRE has in principle to try every
possible variation, and this can take an extremely long time.
</P>
<P>
An optimization catches some of the simpler cases, such as
</P>
<P>
<pre>
(a+)*b
</PRE>
</P>
<P>
where a literal character follows. Before embarking on the standard matching
procedure, PCRE checks that there is a "b" later in the subject string, and if
there is not, it fails the match immediately. However, when there is no
following literal this optimization cannot be used. You can see the difference
by comparing the behaviour of
</P>
<P>
<pre>
(a+)*\d
</PRE>
</P>
<P>
with the pattern above. When applied to a whole line of "a" characters,
(a+)*b fails almost instantly, whereas (a+)*\d takes an appreciable time with
strings longer than about 20 characters.
</P>
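<P>
The difference can be reproduced with a toy backtracking matcher. The C sketch below is an illustrative model under stated assumptions, not PCRE's matching code: it handles only (a+)* followed by one tail item, counts every attempt in a hypothetical <i>attempts</i> counter, and applies the "is the literal present at all?" check before searching when the tail is a literal. With a \d tail, a subject of n "a" characters costs 2^(n+1) - 1 attempts; with a literal tail that is absent, it costs none.
</P>

```c
#include <stddef.h>
#include <string.h>

/* Illustrative model only, not PCRE's matcher: a naive backtracking search
   for (a+)* followed by a one-character tail, counting every attempt. */
static unsigned long attempts;

/* Try (a+)* then the tail at s[i].  literal == 0 stands for \d; any other
   value is a literal tail such as 'b'. */
static int try_at(const char *s, size_t i, char literal)
{
  attempts++;
  if (literal != 0 ? s[i] == literal : (s[i] >= '0' && s[i] <= '9'))
    return 1;                              /* tail matched */
  for (size_t j = i; s[j] == 'a'; j++)     /* next (a+) takes s[i..j] */
    if (try_at(s, j + 1, literal))
      return 1;
  return 0;                                /* every split failed */
}

/* Unanchored search.  For a literal tail, first check that the literal
   occurs at all, like the optimization described above; \d has no such
   shortcut. */
int naive_search(const char *s, char literal)
{
  attempts = 0;
  if (literal != 0 && strchr(s, literal) == NULL)
    return 0;
  for (size_t start = 0; ; start++)
  {
    if (try_at(s, start, literal)) return 1;
    if (s[start] == '\0') return 0;
  }
}
```

<P>
For example, naive_search("aaaa", 0) makes 31 attempts before failing, and twenty "a" characters already take over two million, while naive_search("aaaa", 'b') fails without a single attempt.
</P>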
<P>
Last updated: 03 February 2003
<br>
Copyright © 1997-2003 University of Cambridge.
@@ -1,237 +0,0 @@
<html>
<head>
<title>pcreposix specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
This HTML document has been generated automatically from the original man page.
If there is any nonsense in it, please consult the man page, in case the
conversion went wrong.<br>
<ul>
<li><a name="TOC1" href="#SEC1">SYNOPSIS OF POSIX API</a>
<li><a name="TOC2" href="#SEC2">DESCRIPTION</a>
<li><a name="TOC3" href="#SEC3">COMPILING A PATTERN</a>
<li><a name="TOC4" href="#SEC4">MATCHING NEWLINE CHARACTERS</a>
<li><a name="TOC5" href="#SEC5">MATCHING A PATTERN</a>
<li><a name="TOC6" href="#SEC6">ERROR MESSAGES</a>
<li><a name="TOC7" href="#SEC7">STORAGE</a>
<li><a name="TOC8" href="#SEC8">AUTHOR</a>
</ul>
<br><a name="SEC1" href="#TOC1">SYNOPSIS OF POSIX API</a><br>
<P>
<b>#include &lt;pcreposix.h&gt;</b>
</P>
<P>
<b>int regcomp(regex_t *<i>preg</i>, const char *<i>pattern</i>,</b>
<b>int <i>cflags</i>);</b>
</P>
<P>
<b>int regexec(regex_t *<i>preg</i>, const char *<i>string</i>,</b>
<b>size_t <i>nmatch</i>, regmatch_t <i>pmatch</i>[], int <i>eflags</i>);</b>
</P>
<P>
<b>size_t regerror(int <i>errcode</i>, const regex_t *<i>preg</i>,</b>
<b>char *<i>errbuf</i>, size_t <i>errbuf_size</i>);</b>
</P>
<P>
<b>void regfree(regex_t *<i>preg</i>);</b>
</P>
<br><a name="SEC2" href="#TOC1">DESCRIPTION</a><br>
<P>
This set of functions provides a POSIX-style API to the PCRE regular expression
package. See the
<a href="pcreapi.html"><b>pcreapi</b></a>
documentation for a description of the native API, which contains additional
functionality.
</P>
<P>
The functions described here are just wrapper functions that ultimately call
the PCRE native API. Their prototypes are defined in the <b>pcreposix.h</b>
header file, and on Unix systems the library itself is called
<b>libpcreposix.a</b>, so it can be accessed by adding <b>-lpcreposix</b> to the
command for linking an application which uses them. Because the POSIX functions
call the native ones, it is also necessary to add <b>-lpcre</b>.
</P>
<P>
I have implemented only those option bits that can be reasonably mapped to PCRE
native options. In addition, the options REG_EXTENDED and REG_NOSUB are defined
with the value zero. They have no effect, but since programs that are written
to the POSIX interface often use them, this makes it easier to slot in PCRE as
a replacement library. Other POSIX options are not even defined.
</P>
<P>
When PCRE is called via these functions, it is only the API that is POSIX-like
in style. The syntax and semantics of the regular expressions themselves are
still those of Perl, subject to the setting of various PCRE options, as
described below. "POSIX-like in style" means that the API approximates to the
POSIX definition; it is not fully POSIX-compatible, and in multi-byte encoding
domains it is probably even less compatible.
</P>
<P>
The header for these functions is supplied as <b>pcreposix.h</b> to avoid any
potential clash with other POSIX libraries. It can, of course, be renamed or
aliased as <b>regex.h</b>, which is the "correct" name. It provides two
structure types, <i>regex_t</i> for compiled internal forms, and
<i>regmatch_t</i> for returning captured substrings. It also defines some
constants whose names start with "REG_"; these are used for setting options and
identifying error codes.
</P>
<br><a name="SEC3" href="#TOC1">COMPILING A PATTERN</a><br>
<P>
The function <b>regcomp()</b> is called to compile a pattern into an
internal form. The pattern is a C string terminated by a binary zero, and
is passed in the argument <i>pattern</i>. The <i>preg</i> argument is a pointer
to a regex_t structure which is used as a base for storing information about
the compiled expression.
</P>
<P>
The argument <i>cflags</i> is either zero, or contains one or more of the bits
defined by the following macros:
</P>
<P>
<pre>
REG_ICASE
</PRE>
</P>
<P>
The PCRE_CASELESS option is set when the expression is passed for compilation
to the native function.
</P>
<P>
<pre>
REG_NEWLINE
</PRE>
</P>
<P>
The PCRE_MULTILINE option is set when the expression is passed for compilation
to the native function. Note that this does <i>not</i> mimic the defined POSIX
behaviour for REG_NEWLINE (see the following section).
</P>
<P>
In the absence of these flags, no options are passed to the native function.
This means that the regex is compiled with PCRE default semantics. In
particular, the way it handles newline characters in the subject string is the
Perl way, not the POSIX way. Note that setting PCRE_MULTILINE has only
<i>some</i> of the effects specified for REG_NEWLINE. It does not affect the way
newlines are matched by . (they aren't) or by a negative class such as [^a]
(they are).
</P>
<P>
The yield of <b>regcomp()</b> is zero on success, and non-zero otherwise. The
<i>preg</i> structure is filled in on success, and one member of the structure
is public: <i>re_nsub</i> contains the number of capturing subpatterns in
the regular expression. Various error codes are defined in the header file.
</P>
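<P>
A minimal sketch of the compile, inspect, and free cycle follows. It is written against the standard &lt;regex.h&gt; so that it builds without PCRE; to use PCRE instead, include &lt;pcreposix.h&gt; and link with <b>-lpcreposix -lpcre</b> as described above. The function name <i>count_subpatterns</i> is hypothetical. Callers on a system library need REG_EXTENDED for grouping with plain parentheses; pcreposix defines it as zero, so passing it is harmless there.
</P>

```c
#include <regex.h>

/* Compile a pattern, read the public re_nsub member, and release the
   compiled form again.  Returns the number of capturing subpatterns, or -1
   if regcomp() reports an error (preg is then not usable and must not be
   passed to regfree()). */
int count_subpatterns(const char *pattern, int cflags)
{
  regex_t re;
  if (regcomp(&re, pattern, cflags) != 0)
    return -1;
  int n = (int)re.re_nsub;
  regfree(&re);                /* frees the memory regcomp() allocated */
  return n;
}
```

<P>
For example, count_subpatterns("(a)(b(c))", REG_EXTENDED) returns 3, and an unbalanced pattern such as "a(" returns -1.
</P>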
<br><a name="SEC4" href="#TOC1">MATCHING NEWLINE CHARACTERS</a><br>
<P>
This area is not simple, because POSIX and Perl take different views of things.
It is not possible to get PCRE to obey POSIX semantics, but then PCRE was never
intended to be a POSIX engine. The following table lists the different
possibilities for matching newline characters in PCRE:
</P>
<P>
<pre>
                            Default   Change with
  . matches newline         no        PCRE_DOTALL
  newline matches [^a]      yes       not changeable
  $ matches \n at end       yes       PCRE_DOLLAR_ENDONLY
  $ matches \n in middle    no        PCRE_MULTILINE
  ^ matches \n in middle    no        PCRE_MULTILINE
</PRE>
</P>
<P>
This is the equivalent table for POSIX:
</P>
<P>
<pre>
                            Default   Change with
  . matches newline         yes       REG_NEWLINE
  newline matches [^a]      yes       REG_NEWLINE
  $ matches \n at end       no        REG_NEWLINE
  $ matches \n in middle    no        REG_NEWLINE
  ^ matches \n in middle    no        REG_NEWLINE
</PRE>
</P>
<P>
PCRE's behaviour is the same as Perl's, except that there is no equivalent for
PCRE_DOLLAR_ENDONLY in Perl. In both PCRE and Perl, there is no way to stop
newline from matching [^a].
</P>
<P>
The default POSIX newline handling can be obtained by setting PCRE_DOTALL and
PCRE_DOLLAR_ENDONLY, but there is no way to make PCRE behave exactly as for the
REG_NEWLINE action.
</P>
<br><a name="SEC5" href="#TOC1">MATCHING A PATTERN</a><br>
<P>
The function <b>regexec()</b> is called to match a pre-compiled pattern
<i>preg</i> against a given <i>string</i>, which is terminated by a zero byte,
subject to the options in <i>eflags</i>. These can be:
</P>
<P>
<pre>
REG_NOTBOL
</PRE>
</P>
<P>
The PCRE_NOTBOL option is set when calling the underlying PCRE matching
function.
</P>
<P>
<pre>
REG_NOTEOL
</PRE>
</P>
<P>
The PCRE_NOTEOL option is set when calling the underlying PCRE matching
function.
</P>
<P>
The portion of the string that was matched, and also any captured substrings,
are returned via the <i>pmatch</i> argument, which points to an array of
<i>nmatch</i> structures of type <i>regmatch_t</i>, containing the members
<i>rm_so</i> and <i>rm_eo</i>. These contain the offset to the first character of
each substring and the offset to the first character after the end of each
substring, respectively. The 0th element of the vector relates to the entire
portion of <i>string</i> that was matched; subsequent elements relate to the
capturing subpatterns of the regular expression. Unused entries in the array
have both structure members set to -1.
</P>
<P>
A successful match yields a zero return; various error codes are defined in the
header file, of which REG_NOMATCH is the "expected" failure code.
</P>
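<P>
The offsets in <i>pmatch</i> are used as in the C sketch below, which copies one captured substring out of the subject and treats an <i>rm_so</i> of -1 as an unset group. It is written against the system &lt;regex.h&gt; (the same calls are declared in &lt;pcreposix.h&gt;); the name <i>get_group</i> and the limit of ten entries are arbitrary choices for this example.
</P>

```c
#include <regex.h>
#include <string.h>

#define MAX_GROUPS 10   /* arbitrary size for this sketch */

/* Hypothetical helper: match pattern against subject and copy capture
   `group` (0 = the whole match) into out.  Returns 1 if it was copied, 0 if
   there was no match or the group did not take part (rm_so == -1), and -1
   on error. */
int get_group(const char *pattern, const char *subject, size_t group,
              char *out, size_t outsize)
{
  regex_t re;
  regmatch_t pm[MAX_GROUPS];
  int result = -1;

  if (group >= MAX_GROUPS || outsize == 0) return -1;
  if (regcomp(&re, pattern, REG_EXTENDED) != 0) return -1;

  int rc = regexec(&re, subject, MAX_GROUPS, pm, 0);
  if (rc == REG_NOMATCH)
    result = 0;
  else if (rc == 0 && pm[group].rm_so == -1)
    result = 0;                                  /* unused entry */
  else if (rc == 0)
  {
    /* rm_eo is the offset of the first character after the substring. */
    size_t len = (size_t)(pm[group].rm_eo - pm[group].rm_so);
    if (len >= outsize) len = outsize - 1;       /* truncate to fit */
    memcpy(out, subject + pm[group].rm_so, len);
    out[len] = '\0';
    result = 1;
  }
  regfree(&re);
  return result;
}
```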
<br><a name="SEC6" href="#TOC1">ERROR MESSAGES</a><br>
<P>
The <b>regerror()</b> function maps a non-zero error code from either
<b>regcomp()</b> or <b>regexec()</b> to a printable message. If <i>preg</i> is not
NULL, the error should have arisen from the use of that structure. A message
terminated by a binary zero is placed in <i>errbuf</i>. The length of the
message, including the zero, is limited to <i>errbuf_size</i>. The yield of the
function is the size of buffer needed to hold the whole message.
</P>
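<P>
Because the yield is the size needed, a caller can size the buffer exactly with two calls, as in this sketch (written against the system &lt;regex.h&gt;; the name <i>error_message</i> is hypothetical). Passing an <i>errbuf_size</i> of zero on the first call relies on the POSIX rule that <i>errbuf</i> is then ignored; that rule comes from the POSIX specification, not from the text above.
</P>

```c
#include <regex.h>
#include <stdlib.h>

/* Hypothetical helper: return the message for errcode in a malloc()ed
   buffer (the caller frees it), or NULL if memory runs out.  The first call
   uses a zero-sized buffer just to learn the size needed, including the
   terminating zero. */
char *error_message(int errcode, const regex_t *preg)
{
  size_t needed = regerror(errcode, preg, NULL, 0);
  char *msg = malloc(needed);
  if (msg != NULL)
    regerror(errcode, preg, msg, needed);
  return msg;
}
```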
<br><a name="SEC7" href="#TOC1">STORAGE</a><br>
<P>
Compiling a regular expression causes memory to be allocated and associated
with the <i>preg</i> structure. The function <b>regfree()</b> frees all such
memory, after which <i>preg</i> may no longer be used as a compiled expression.
</P>
<br><a name="SEC8" href="#TOC1">AUTHOR</a><br>
<P>
Philip Hazel &lt;ph10@cam.ac.uk&gt;
<br>
University Computing Service
<br>
Cambridge CB2 3QG, England.
</P>
<P>
Last updated: 03 February 2003
<br>
Copyright © 1997-2003 University of Cambridge.
@@ -1,79 +0,0 @@
<html>
<head>
<title>pcresample specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
This HTML document has been generated automatically from the original man page.
If there is any nonsense in it, please consult the man page, in case the
conversion went wrong.<br>
<ul>
<li><a name="TOC1" href="#SEC1">PCRE SAMPLE PROGRAM</a>
</ul>
<br><a name="SEC1" href="#TOC1">PCRE SAMPLE PROGRAM</a><br>
<P>
A simple, complete demonstration program, to get you started with using PCRE,
is supplied in the file <i>pcredemo.c</i> in the PCRE distribution.
</P>
<P>
The program compiles the regular expression that is its first argument, and
matches it against the subject string in its second argument. No PCRE options
are set, and default character tables are used. If matching succeeds, the
program outputs the portion of the subject that matched, together with the
contents of any captured substrings.
</P>
<P>
If the -g option is given on the command line, the program then goes on to
check for further matches of the same regular expression in the same subject
string. The logic is a little bit tricky because of the possibility of matching
an empty string. Comments in the code explain what is going on.
</P>
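<P>
The shape of such a loop is shown in the C sketch below. It uses the POSIX-style interface from &lt;regex.h&gt; rather than the native PCRE calls in <i>pcredemo.c</i>, and it handles an empty match in the simplest way, by stepping on one character. With a Perl-style engine such as PCRE, that can skip a non-empty match starting at the same place, which is the case the PCRE_NOTEMPTY retry used by <b>pcretest</b>'s <b>/g</b> is designed to catch. The name <i>count_matches</i> is hypothetical.
</P>

```c
#include <regex.h>
#include <string.h>

/* Hypothetical helper: count successive matches of pattern in subject,
   or return -1 if the pattern does not compile. */
int count_matches(const char *pattern, const char *subject)
{
  regex_t re;
  regmatch_t m;
  size_t offset = 0, len = strlen(subject);
  int count = 0;

  if (regcomp(&re, pattern, REG_EXTENDED) != 0) return -1;

  while (offset <= len)
  {
    /* Later searches start mid-string, so ^ must not match there. */
    int eflags = (offset > 0) ? REG_NOTBOL : 0;
    if (regexec(&re, subject + offset, 1, &m, eflags) != 0) break;
    count++;
    if (m.rm_eo == m.rm_so)
      offset += (size_t)m.rm_eo + 1;   /* empty match: step past it */
    else
      offset += (size_t)m.rm_eo;       /* resume after this match */
  }
  regfree(&re);
  return count;
}
```

<P>
For the subject "the dog sat on the cat", as in the second <b>pcredemo</b> example, the loop finds two matches of cat|dog.
</P>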
<P>
On a Unix system that has PCRE installed in <i>/usr/local</i>, you can compile
the demonstration program using a command like this:
</P>
<P>
<pre>
gcc -o pcredemo pcredemo.c -I/usr/local/include \
  -L/usr/local/lib -lpcre
</PRE>
</P>
<P>
Then you can run simple tests like this:
</P>
<P>
<pre>
./pcredemo 'cat|dog' 'the cat sat on the mat'
./pcredemo -g 'cat|dog' 'the dog sat on the cat'
</PRE>
</P>
<P>
Note that there is a much more comprehensive test program, called
<b>pcretest</b>, which supports many more facilities for testing regular
expressions and the PCRE library. The <b>pcredemo</b> program is provided as a
simple coding example.
</P>
<P>
On some operating systems (e.g. Solaris) you may get an error like this when
you try to run <b>pcredemo</b>:
</P>
<P>
<pre>
ld.so.1: a.out: fatal: libpcre.so.0: open failed: No such file or directory
</PRE>
</P>
<P>
This is caused by the way shared library support works on those systems. You
need to add
</P>
<P>
<pre>
-R/usr/local/lib
</PRE>
</P>
<P>
to the compile command to get round this problem.
</P>
<P>
Last updated: 28 January 2003
<br>
Copyright © 1997-2003 University of Cambridge.
@@ -1,443 +0,0 @@
<html>
<head>
<title>pcretest specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
This HTML document has been generated automatically from the original man page.
If there is any nonsense in it, please consult the man page, in case the
conversion went wrong.<br>
<ul>
<li><a name="TOC1" href="#SEC1">SYNOPSIS</a>
<li><a name="TOC2" href="#SEC2">OPTIONS</a>
<li><a name="TOC3" href="#SEC3">DESCRIPTION</a>
<li><a name="TOC4" href="#SEC4">PATTERN MODIFIERS</a>
<li><a name="TOC5" href="#SEC5">CALLOUTS</a>
<li><a name="TOC6" href="#SEC6">DATA LINES</a>
<li><a name="TOC7" href="#SEC7">OUTPUT FROM PCRETEST</a>
<li><a name="TOC8" href="#SEC8">AUTHOR</a>
</ul>
<br><a name="SEC1" href="#TOC1">SYNOPSIS</a><br>
<P>
<b>pcretest [-C] [-d] [-i] [-m] [-o osize] [-p] [-t] [source] [destination]</b>
</P>
<P>
<b>pcretest</b> was written as a test program for the PCRE regular expression
library itself, but it can also be used for experimenting with regular
expressions. This document describes the features of the test program; for
details of the regular expressions themselves, see the
<a href="pcrepattern.html"><b>pcrepattern</b></a>
documentation. For details of PCRE and its options, see the
<a href="pcreapi.html"><b>pcreapi</b></a>
documentation.
</P>
<br><a name="SEC2" href="#TOC1">OPTIONS</a><br>
<P>
<b>-C</b>
Output the version number of the PCRE library, and all available information
about the optional features that are included, and then exit.
</P>
<P>
<b>-d</b>
Behave as if each regex had the <b>/D</b> modifier (see below); the internal
form is output after compilation.
</P>
<P>
<b>-i</b>
Behave as if each regex had the <b>/I</b> modifier; information about the
compiled pattern is given after compilation.
</P>
<P>
<b>-m</b>
Output the size of each compiled pattern after it has been compiled. This is
equivalent to adding /M to each regular expression. For compatibility with
earlier versions of pcretest, <b>-s</b> is a synonym for <b>-m</b>.
</P>
<P>
<b>-o</b> <i>osize</i>
Set the number of elements in the output vector that is used when calling PCRE
to be <i>osize</i>. The default value is 45, which is enough for 14 capturing
subexpressions. The vector size can be changed for individual matching calls by
including \O in the data line (see below).
</P>
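<P>
The arithmetic behind "45 is enough for 14" can be written down as below. It rests on the rule, documented in the <b>pcreapi</b> page rather than here, that <b>pcre_exec()</b> uses the first two-thirds of the output vector for pairs of offsets and the last third as workspace. The helper names are hypothetical.
</P>

```c
/* A vector of osize ints can report osize / 3 pairs of offsets: one for
   the whole match and the rest for capturing subpatterns. */
int captures_for_ovector(int osize)
{
  int pairs = osize / 3;
  return pairs > 0 ? pairs - 1 : 0;
}

/* Smallest output vector (a multiple of 3) that can report n captures. */
int ovector_for_captures(int n)
{
  return (n + 1) * 3;
}
```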
<P>
<b>-p</b>
Behave as if each regex had the <b>/P</b> modifier; the POSIX wrapper API is used
to call PCRE. None of the other options has any effect when <b>-p</b> is set.
</P>
<P>
<b>-t</b>
Run each compile, study, and match many times with a timer, and output the
resulting time per compile or match (in milliseconds). Do not set <b>-t</b> with
<b>-m</b>, because you will then get the size output 20000 times and the timing
will be distorted.
</P>
<br><a name="SEC3" href="#TOC1">DESCRIPTION</a><br>
<P>
If <b>pcretest</b> is given two filename arguments, it reads from the first and
writes to the second. If it is given only one filename argument, it reads from
that file and writes to stdout. Otherwise, it reads from stdin and writes to
stdout, and prompts for each line of input, using "re>" to prompt for regular
expressions, and "data>" to prompt for data lines.
</P>
<P>
The program handles any number of sets of input in a single input file. Each
set starts with a regular expression, and continues with any number of data
lines to be matched against the pattern.
</P>
<P>
Each line is matched separately and independently. If you want to do
multiple-line matches, you have to use the \n escape sequence in a single line
of input to encode the newline characters. The maximum length of a data line is
30,000 characters.
</P>
<P>
An empty line signals the end of the data lines, at which point a new regular
expression is read. The regular expressions are given enclosed in any
non-alphanumeric delimiters other than backslash, for example
</P>
<P>
<pre>
/(a|bc)x+yz/
</PRE>
</P>
<P>
White space before the initial delimiter is ignored. A regular expression may
be continued over several input lines, in which case the newline characters are
included within it. It is possible to include the delimiter within the pattern
by escaping it, for example
</P>
<P>
<pre>
/abc\/def/
</PRE>
</P>
<P>
If you do so, the escape and the delimiter form part of the pattern, but since
delimiters are always non-alphanumeric, this does not affect its interpretation.
If the terminating delimiter is immediately followed by a backslash, for
example,
</P>
<P>
<pre>
/abc/\
</PRE>
</P>
<P>
then a backslash is added to the end of the pattern. This is done to provide a
way of testing the error condition that arises if a pattern finishes with a
backslash, because
</P>
<P>
<pre>
/abc\/
</PRE>
</P>
<P>
is interpreted as the first line of a pattern that starts with "abc/", causing
pcretest to read the next line as a continuation of the regular expression.
</P>
<br><a name="SEC4" href="#TOC1">PATTERN MODIFIERS</a><br>
<P>
The pattern may be followed by <b>i</b>, <b>m</b>, <b>s</b>, or <b>x</b> to set the
PCRE_CASELESS, PCRE_MULTILINE, PCRE_DOTALL, or PCRE_EXTENDED options,
respectively. For example:
</P>
<P>
<pre>
/caseless/i
</PRE>
</P>
<P>
These modifier letters have the same effect as they do in Perl. There are
others that set PCRE options that do not correspond to anything in Perl:
<b>/A</b>, <b>/E</b>, <b>/N</b>, <b>/U</b>, and <b>/X</b> set PCRE_ANCHORED,
PCRE_DOLLAR_ENDONLY, PCRE_NO_AUTO_CAPTURE, PCRE_UNGREEDY, and PCRE_EXTRA
respectively.
</P>
<P>
Searching for all possible matches within each subject string can be requested
by the <b>/g</b> or <b>/G</b> modifier. After finding a match, PCRE is called
again to search the remainder of the subject string. The difference between
<b>/g</b> and <b>/G</b> is that the former uses the <i>startoffset</i> argument to
<b>pcre_exec()</b> to start searching at a new point within the entire string
(which is in effect what Perl does), whereas the latter passes over a shortened
substring. This makes a difference to the matching process if the pattern
begins with a lookbehind assertion (including \b or \B).
</P>
<P>
If any call to <b>pcre_exec()</b> in a <b>/g</b> or <b>/G</b> sequence matches an
empty string, the next call is done with the PCRE_NOTEMPTY and PCRE_ANCHORED
flags set in order to search for another, non-empty, match at the same point.
If this second match fails, the start offset is advanced by one, and the normal
match is retried. This imitates the way Perl handles such cases when using the
<b>/g</b> modifier or the <b>split()</b> function.
</P>
<P>
There are a number of other modifiers for controlling the way <b>pcretest</b>
operates.
</P>
<P>
The <b>/+</b> modifier requests that, as well as outputting the substring that
matched the entire pattern, pcretest should output the remainder of the
subject string. This is useful for tests where the subject contains multiple
copies of the same substring.
</P>
<P>
The <b>/L</b> modifier must be followed directly by the name of a locale, for
example,
</P>
<P>
<pre>
/pattern/Lfr
</PRE>
</P>
<P>
For this reason, it must be the last modifier letter. The given locale is set,
<b>pcre_maketables()</b> is called to build a set of character tables for the
locale, and this is then passed to <b>pcre_compile()</b> when compiling the
regular expression. Without an <b>/L</b> modifier, NULL is passed as the tables
pointer; that is, <b>/L</b> applies only to the expression on which it appears.
</P>
<P>
The <b>/I</b> modifier requests that <b>pcretest</b> output information about the
compiled expression (whether it is anchored, has a fixed first character, and
so on). It does this by calling <b>pcre_fullinfo()</b> after compiling an
expression, and outputting the information it gets back. If the pattern is
studied, the results of that are also output.
</P>
<P>
The <b>/D</b> modifier is a PCRE debugging feature, which also implies <b>/I</b>.
It causes the internal form of compiled regular expressions to be output after
compilation. If the pattern was studied, the information returned is also
output.
</P>
<P>
The <b>/S</b> modifier causes <b>pcre_study()</b> to be called after the
expression has been compiled, and the results used when the expression is
matched.
</P>
<P>
The <b>/M</b> modifier causes the size of the memory block used to hold the
compiled pattern to be output.
</P>
<P>
The <b>/P</b> modifier causes <b>pcretest</b> to call PCRE via the POSIX wrapper
API rather than its native API. When this is done, all other modifiers except
<b>/i</b>, <b>/m</b>, and <b>/+</b> are ignored. REG_ICASE is set if <b>/i</b> is
present, and REG_NEWLINE is set if <b>/m</b> is present. The wrapper functions
force PCRE_DOLLAR_ENDONLY always, and PCRE_DOTALL unless REG_NEWLINE is set.
</P>
<P>
The <b>/8</b> modifier causes <b>pcretest</b> to call PCRE with the PCRE_UTF8
option set. This turns on support for UTF-8 character handling in PCRE,
provided that it was compiled with this support enabled. This modifier also
causes any non-printing characters in output strings to be printed using the
\x{hh...} notation if they are valid UTF-8 sequences.
</P>
<P>
If the <b>/?</b> modifier is used with <b>/8</b>, it causes <b>pcretest</b> to
call <b>pcre_compile()</b> with the PCRE_NO_UTF8_CHECK option, to suppress the
checking of the string for UTF-8 validity.
</P>
<br><a name="SEC5" href="#TOC1">CALLOUTS</a><br>
<P>
If the pattern contains any callout requests, <b>pcretest</b>'s callout function
will be called. By default, it displays the callout number, and the start and
current positions in the text at the callout time. For example, the output
</P>
<P>
<pre>
  --->pqrabcdef
    0    ^  ^
</PRE>
</P>
<P>
indicates that callout number 0 occurred for a match attempt starting at the
fourth character of the subject string, when the pointer was at the seventh
character. The callout function returns zero (carry on matching) by default.
</P>
<P>
Inserting callouts may be helpful when using <b>pcretest</b> to check
complicated regular expressions. For further information about callouts, see
the
<a href="pcrecallout.html"><b>pcrecallout</b></a>
documentation.
</P>
<P>
For testing the PCRE library, additional control of callout behaviour is
available via escape sequences in the data, as described in the following
section. In particular, it is possible to pass in a number as callout data (the
default is zero). If the callout function receives a non-zero number, it
returns that value instead of zero.
</P>
<br><a name="SEC6" href="#TOC1">DATA LINES</a><br>
|
||||
<P>
|
||||
Before each data line is passed to <b>pcre_exec()</b>, leading and trailing
|
||||
whitespace is removed, and it is then scanned for \ escapes. Some of these are
|
||||
pretty esoteric features, intended for checking out some of the more
|
||||
complicated features of PCRE. If you are just testing "ordinary" regular
|
||||
expressions, you probably don't need any of these. The following escapes are
|
||||
recognized:
|
||||
</P>
|
||||
<P>
|
||||
<pre>
|
||||
\a         alarm (= BEL)
\b         backspace
\e         escape
\f         formfeed
\n         newline
\r         carriage return
\t         tab
\v         vertical tab
\nnn       octal character (up to 3 octal digits)
\xhh       hexadecimal character (up to 2 hex digits)
\x{hh...}  hexadecimal character, any number of digits
             in UTF-8 mode
\A         pass the PCRE_ANCHORED option to <b>pcre_exec()</b>
\B         pass the PCRE_NOTBOL option to <b>pcre_exec()</b>
\Cdd       call pcre_copy_substring() for substring dd
             after a successful match (any decimal number
             less than 32)
\Cname     call pcre_copy_named_substring() for substring
             "name" after a successful match (name terminated
             by the next non-alphanumeric character)
\C+        show the current captured substrings at callout
             time
\C-        do not supply a callout function
\C!n       return 1 instead of 0 when callout number n is
             reached
\C!n!m     return 1 instead of 0 when callout number n is
             reached for the mth time
\C*n       pass the number n (may be negative) as callout
             data
\Gdd       call pcre_get_substring() for substring dd
             after a successful match (any decimal number
             less than 32)
\Gname     call pcre_get_named_substring() for substring
             "name" after a successful match (name terminated
             by the next non-alphanumeric character)
\L         call pcre_get_substring_list() after a
             successful match
\M         discover the minimum MATCH_LIMIT setting
\N         pass the PCRE_NOTEMPTY option to <b>pcre_exec()</b>
\Odd       set the size of the output vector passed to
             <b>pcre_exec()</b> to dd (any number of decimal
             digits)
\S         output details of memory get/free calls during
             matching
\Z         pass the PCRE_NOTEOL option to <b>pcre_exec()</b>
\?         pass the PCRE_NO_UTF8_CHECK option to
             <b>pcre_exec()</b>
|
||||
</PRE>
|
||||
</P>
|
||||
<P>
|
||||
If \M is present, <b>pcretest</b> calls <b>pcre_exec()</b> several times, with
|
||||
different values in the <i>match_limit</i> field of the <b>pcre_extra</b> data
|
||||
structure, until it finds the minimum number that is needed for
|
||||
<b>pcre_exec()</b> to complete. This number is a measure of the amount of
|
||||
recursion and backtracking that takes place, and checking it out can be
|
||||
instructive. For most simple matches, the number is quite small, but for
|
||||
patterns with very large numbers of matching possibilities, it can become large
|
||||
very quickly with increasing length of subject string.
|
||||
</P>
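<P>
One way to find such a minimum is to double the limit until the match completes and then bisect between the last failure and the first success. The sketch below assumes that success is monotone in the limit. find_min_limit and the example predicate completes_at_or_above are hypothetical stand-ins for repeated <b>pcre_exec()</b> calls, and this is not necessarily the search that pcretest itself performs.
</P>

```c
/* Hypothetical predicate: does the match complete when the limit is `limit`?
   In pcretest this would be a pcre_exec() call with match_limit set in the
   pcre_extra block; here it is any monotone function supplied by the caller. */
typedef int (*completes_fn)(unsigned long limit, void *ctx);

/* Smallest limit for which completes() is non-zero, assuming monotonicity
   (once it succeeds, every larger limit succeeds too). Returns 0 if no
   limit up to max_limit succeeds. */
unsigned long find_min_limit(completes_fn completes, void *ctx,
                             unsigned long max_limit)
{
    unsigned long lo = 1, hi = 1;

    /* Double the limit until the match completes, or give up at max_limit. */
    while (!completes(hi, ctx)) {
        if (hi >= max_limit)
            return 0;
        lo = hi + 1;                      /* every limit <= hi has failed */
        hi = (hi > max_limit / 2) ? max_limit : hi * 2;
    }

    /* Invariant: completes(hi) is true and every limit below lo fails. */
    while (lo < hi) {
        unsigned long mid = lo + (hi - lo) / 2;
        if (completes(mid, ctx))
            hi = mid;
        else
            lo = mid + 1;
    }
    return hi;
}

/* Example predicate for trying the search out: "completes" once the
   limit reaches the threshold stored in *ctx. */
int completes_at_or_above(unsigned long limit, void *ctx)
{
    return limit >= *(const unsigned long *)ctx;
}
```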
|
||||
<P>
|
||||
When \O is used, the value it specifies may be higher or lower than the size set
by the <b>-O</b> option (or the default of 45); \O applies only to the call of
<b>pcre_exec()</b> for the line in which it appears.
|
||||
</P>
|
||||
<P>
|
||||
A backslash followed by any other character simply escapes that character. If the
|
||||
very last character is a backslash, it is ignored. This gives a way of passing
|
||||
an empty line as data, since a real empty line terminates the data input.
|
||||
</P>
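<P>
The character escapes in the table above can be decoded with a short loop. This is a hypothetical sketch (decode_data_line is not part of pcretest). It handles only the single-character escapes, octal \nnn and \xhh. \x{...} is not supported, and the option-setting escapes such as \A, \B and \C are treated as ordinary escaped letters. It also assumes that leading and trailing whitespace has already been removed.
</P>

```c
#include <ctype.h>
#include <stddef.h>

/* Decode the character escapes of a pcretest-style data line into out.
   out needs room for strlen(in) + 1 bytes (decoding never grows the text).
   Returns the number of bytes written, excluding the final NUL. */
size_t decode_data_line(const char *in, char *out)
{
    size_t n = 0;
    while (*in != '\0') {
        unsigned char c = (unsigned char)*in++;
        if (c != '\\') {
            out[n++] = (char)c;
            continue;
        }
        c = (unsigned char)*in;
        if (c == '\0')                  /* trailing backslash: ignored */
            break;
        in++;
        switch (c) {
        case 'a': out[n++] = '\a'; break;
        case 'b': out[n++] = '\b'; break;
        case 'e': out[n++] = 27;   break;   /* ESC */
        case 'f': out[n++] = '\f'; break;
        case 'n': out[n++] = '\n'; break;
        case 'r': out[n++] = '\r'; break;
        case 't': out[n++] = '\t'; break;
        case 'v': out[n++] = '\v'; break;
        case 'x': {                          /* \xhh: up to 2 hex digits */
            int v = 0, digits = 0;
            while (digits < 2 && isxdigit((unsigned char)*in)) {
                int d = (unsigned char)*in++;
                v = v * 16 + (isdigit(d) ? d - '0' : tolower(d) - 'a' + 10);
                digits++;
            }
            out[n++] = (char)v;
            break;
        }
        default:
            if (c >= '0' && c <= '7') {      /* \nnn: up to 3 octal digits */
                int v = c - '0', digits = 1;
                while (digits < 3 && *in >= '0' && *in <= '7') {
                    v = v * 8 + (*in++ - '0');
                    digits++;
                }
                out[n++] = (char)v;          /* values > 255 truncate here */
            } else {
                out[n++] = (char)c;          /* any other char: itself */
            }
        }
    }
    out[n] = '\0';
    return n;
}
```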
|
||||
<P>
|
||||
If <b>/P</b> was present on the regex, causing the POSIX wrapper API to be used,
|
||||
only <b>\B</b> and <b>\Z</b> have any effect, causing REG_NOTBOL and REG_NOTEOL
|
||||
to be passed to <b>regexec()</b> respectively.
|
||||
</P>
|
||||
<P>
|
||||
The use of \x{hh...} to represent UTF-8 characters is not dependent on the use
|
||||
of the <b>/8</b> modifier on the pattern. It is recognized always. There may be
|
||||
any number of hexadecimal digits inside the braces. The result is from one to
|
||||
six bytes, encoded according to the UTF-8 rules.
|
||||
</P>
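<P>
The byte sequences follow the original UTF-8 rules, under which a code point needs one to six bytes depending on its magnitude. Below is a minimal sketch of that encoding, assuming values no larger than 0x7FFFFFFF. utf8_encode is a hypothetical helper, not a PCRE function.
</P>

```c
#include <stddef.h>

/* Encode code point v (0 .. 0x7FFFFFFF) using the original UTF-8 rules,
   which allow sequences of one to six bytes. Writes the bytes to buf
   (at least 6 bytes long) and returns how many were written. */
size_t utf8_encode(unsigned long v, unsigned char *buf)
{
    /* Exclusive upper bound of the values each sequence length can hold. */
    static const unsigned long limits[] =
        { 0x80, 0x800, 0x10000, 0x200000, 0x4000000, 0x80000000 };
    /* Marker bits for the lead byte of a 1..6 byte sequence. */
    static const unsigned char lead[] = { 0x00, 0xC0, 0xE0, 0xF0, 0xF8, 0xFC };
    size_t len = 1;

    while (len < 6 && v >= limits[len - 1])
        len++;

    /* Fill continuation bytes (10xxxxxx) from the end, six bits at a time. */
    for (size_t i = len - 1; i > 0; i--) {
        buf[i] = (unsigned char)(0x80 | (v & 0x3F));
        v >>= 6;
    }
    buf[0] = (unsigned char)(lead[len - 1] | v);
    return len;
}
```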
|
||||
<br><a name="SEC7" href="#TOC1">OUTPUT FROM PCRETEST</a><br>
|
||||
<P>
|
||||
When a match succeeds, pcretest outputs the list of captured substrings that
|
||||
<b>pcre_exec()</b> returns, starting with number 0 for the string that matched
|
||||
the whole pattern. Here is an example of an interactive pcretest run.
|
||||
</P>
|
||||
<P>
|
||||
<pre>
|
||||
$ pcretest
|
||||
PCRE version 4.00 08-Jan-2003
|
||||
</PRE>
|
||||
</P>
|
||||
<P>
|
||||
<pre>
|
||||
  re> /^abc(\d+)/
data> abc123
 0: abc123
 1: 123
data> xyz
No match
|
||||
</PRE>
|
||||
</P>
|
||||
<P>
|
||||
If the strings contain any non-printing characters, they are output as \0x
|
||||
escapes, or as \x{...} escapes if the <b>/8</b> modifier was present on the
|
||||
pattern. If the pattern has the <b>/+</b> modifier, then the output for
|
||||
substring 0 is followed by the rest of the subject string, identified by
|
||||
"0+" like this:
|
||||
</P>
|
||||
<P>
|
||||
<pre>
|
||||
  re> /cat/+
data> cataract
 0: cat
 0+ aract
|
||||
</PRE>
|
||||
</P>
|
||||
<P>
|
||||
If the pattern has the <b>/g</b> or <b>/G</b> modifier, the results of successive
|
||||
matching attempts are output in sequence, like this:
|
||||
</P>
|
||||
<P>
|
||||
<pre>
|
||||
  re> /\Bi(\w\w)/g
data> Mississippi
 0: iss
 1: ss
 0: iss
 1: ss
 0: ipp
 1: pp
|
||||
</PRE>
|
||||
</P>
|
||||
<P>
|
||||
"No match" is output only if the first match attempt fails.
|
||||
</P>
|
||||
<P>
|
||||
If any of the sequences <b>\C</b>, <b>\G</b>, or <b>\L</b> are present in a
|
||||
data line that is successfully matched, the substrings extracted by the
|
||||
convenience functions are output with C, G, or L after the string number
|
||||
instead of a colon. This is in addition to the normal full list. The string
|
||||
length (that is, the return from the extraction function) is given in
|
||||
parentheses after each string for <b>\C</b> and <b>\G</b>.
|
||||
</P>
|
||||
<P>
|
||||
Note that while patterns can be continued over several lines (a plain ">"
|
||||
prompt is used for continuations), data lines cannot. However, newlines can be
|
||||
included in data by means of the \n escape.
|
||||
</P>
|
||||
<br><a name="SEC8" href="#TOC1">AUTHOR</a><br>
|
||||
<P>
|
||||
Philip Hazel <ph10@cam.ac.uk>
|
||||
<br>
|
||||
University Computing Service,
|
||||
<br>
|
||||
Cambridge CB2 3QG, England.
|
||||
</P>
|
||||
<P>
|
||||
Last updated: 09 December 2003
|
||||
<br>
|
||||
Copyright © 1997-2003 University of Cambridge.
|
@ -1,174 +0,0 @@
|
||||
.TH PCRE 3
|
||||
.SH NAME
|
||||
PCRE - Perl-compatible regular expressions
|
||||
.SH DESCRIPTION
|
||||
.rs
|
||||
.sp
|
||||
The PCRE library is a set of functions that implement regular expression
|
||||
pattern matching using the same syntax and semantics as Perl, with just a few
|
||||
differences. The current implementation of PCRE (release 4.x) corresponds
|
||||
approximately with Perl 5.8, including support for UTF-8 encoded strings.
|
||||
However, this support has to be explicitly enabled; it is not the default.
|
||||
|
||||
PCRE is written in C and released as a C library. However, a number of people
|
||||
have written wrappers and interfaces of various kinds. These contributions, which
include a C++ class, can be found in the \fIContrib\fR directory at the primary
FTP site:
|
||||
|
||||
.\" HTML <a href="ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre">
|
||||
.\" </a>
|
||||
ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre
|
||||
|
||||
Details of exactly which Perl regular expression features are and are not
|
||||
supported by PCRE are given in separate documents. See the
|
||||
.\" HREF
|
||||
\fBpcrepattern\fR
|
||||
.\"
|
||||
and
|
||||
.\" HREF
|
||||
\fBpcrecompat\fR
|
||||
.\"
|
||||
pages.
|
||||
|
||||
Some features of PCRE can be included, excluded, or changed when the library is
|
||||
built. The
|
||||
.\" HREF
|
||||
\fBpcre_config()\fR
|
||||
.\"
|
||||
function makes it possible for a client to discover which features are
|
||||
available. Documentation about building PCRE for various operating systems can
|
||||
be found in the \fBREADME\fR file in the source distribution.
|
||||
|
||||
.SH USER DOCUMENTATION
|
||||
.rs
|
||||
.sp
|
||||
The user documentation for PCRE has been split up into a number of different
|
||||
sections. In the "man" format, each of these is a separate "man page". In the
|
||||
HTML format, each is a separate page, linked from the index page. In the plain
|
||||
text format, all the sections are concatenated, for ease of searching. The
|
||||
sections are as follows:
|
||||
|
||||
  pcre              this document
  pcreapi           details of PCRE's native API
  pcrebuild         options for building PCRE
  pcrecallout       details of the callout feature
  pcrecompat        discussion of Perl compatibility
  pcregrep          description of the \fBpcregrep\fR command
  pcrepattern       syntax and semantics of supported
                      regular expressions
  pcreperform       discussion of performance issues
  pcreposix         the POSIX-compatible API
  pcresample        discussion of the sample program
  pcretest          the \fBpcretest\fR testing command
|
||||
|
||||
In addition, in the "man" and HTML formats, there is a short page for each
|
||||
library function, listing its arguments and results.
|
||||
|
||||
.SH LIMITATIONS
|
||||
.rs
|
||||
.sp
|
||||
There are some size limitations in PCRE but it is hoped that they will never in
|
||||
practice be relevant.
|
||||
|
||||
The maximum length of a compiled pattern is 65539 (sic) bytes if PCRE is
|
||||
compiled with the default internal linkage size of 2. If you want to process
|
||||
regular expressions that are truly enormous, you can compile PCRE with an
|
||||
internal linkage size of 3 or 4 (see the \fBREADME\fR file in the source
|
||||
distribution and the
|
||||
.\" HREF
|
||||
\fBpcrebuild\fR
|
||||
.\"
|
||||
documentation for details). In these cases the limit is substantially larger.
|
||||
However, the speed of execution will be slower.
|
||||
|
||||
All values in repeating quantifiers must be less than 65536.
|
||||
The maximum number of capturing subpatterns is 65535.
|
||||
|
||||
There is no limit to the number of non-capturing subpatterns, but the maximum
|
||||
depth of nesting of all kinds of parenthesized subpattern, including capturing
|
||||
subpatterns, assertions, and other types of subpattern, is 200.
|
||||
|
||||
The maximum length of a subject string is the largest positive number that an
|
||||
integer variable can hold. However, PCRE uses recursion to handle subpatterns
|
||||
and indefinite repetition. This means that the available stack space may limit
|
||||
the size of a subject string that can be processed by certain patterns.
|
||||
|
||||
.\" HTML <a name="utf8support"></a>
|
||||
.SH UTF-8 SUPPORT
|
||||
.rs
|
||||
.sp
|
||||
Starting at release 3.3, PCRE has had some support for character strings
|
||||
encoded in the UTF-8 format. For release 4.0 this has been greatly extended to
|
||||
cover most common requirements.
|
||||
|
||||
In order to process UTF-8 strings, you must build PCRE to include UTF-8 support in
|
||||
the code, and, in addition, you must call
|
||||
.\" HREF
|
||||
\fBpcre_compile()\fR
|
||||
.\"
|
||||
with the PCRE_UTF8 option flag. When you do this, both the pattern and any
|
||||
subject strings that are matched against it are treated as UTF-8 strings
|
||||
instead of just strings of bytes.
|
||||
|
||||
If you compile PCRE with UTF-8 support, but do not use it at run time, the
|
||||
library will be a bit bigger, but the additional run time overhead is limited
|
||||
to testing the PCRE_UTF8 flag in several places, so should not be very large.
|
||||
|
||||
The following comments apply when PCRE is running in UTF-8 mode:
|
||||
|
||||
1. When you set the PCRE_UTF8 flag, the strings passed as patterns and subjects
|
||||
are checked for validity on entry to the relevant functions. If an invalid
|
||||
UTF-8 string is passed, an error return is given. In some situations, you may
|
||||
already know that your strings are valid, and therefore want to skip these
|
||||
checks in order to improve performance. If you set the PCRE_NO_UTF8_CHECK flag
|
||||
at compile time or at run time, PCRE assumes that the pattern or subject it
|
||||
is given (respectively) contains only valid UTF-8 codes. In this case, it does
|
||||
not diagnose an invalid UTF-8 string. If you pass an invalid UTF-8 string to
|
||||
PCRE when PCRE_NO_UTF8_CHECK is set, the results are undefined. Your program
|
||||
may crash.
|
||||
|
||||
2. In a pattern, the escape sequence \\x{...}, where the contents of the braces
|
||||
is a string of hexadecimal digits, is interpreted as a UTF-8 character whose
|
||||
code number is the given hexadecimal number, for example: \\x{1234}. If a
|
||||
non-hexadecimal digit appears between the braces, the item is not recognized.
|
||||
This escape sequence can be used either as a literal, or within a character
|
||||
class.
|
||||
|
||||
3. The original hexadecimal escape sequence, \\xhh, matches a two-byte UTF-8
|
||||
character if the value is greater than 127.
|
||||
|
||||
4. Repeat quantifiers apply to complete UTF-8 characters, not to individual
|
||||
bytes, for example: \\x{100}{3}.
|
||||
|
||||
5. The dot metacharacter matches one UTF-8 character instead of a single byte.
|
||||
|
||||
6. The escape sequence \\C can be used to match a single byte in UTF-8 mode,
|
||||
but its use can lead to some strange effects.
|
||||
|
||||
7. The character escapes \\b, \\B, \\d, \\D, \\s, \\S, \\w, and \\W correctly
|
||||
test characters of any code value, but the characters that PCRE recognizes as
|
||||
digits, spaces, or word characters remain the same set as before, all with
|
||||
values less than 256.
|
||||
|
||||
8. Case-insensitive matching applies only to characters whose values are less
|
||||
than 256. PCRE does not support the notion of "case" for higher-valued
|
||||
characters.
|
||||
|
||||
9. PCRE does not support the use of Unicode tables and properties or the Perl
|
||||
escapes \\p, \\P, and \\X.
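
Items 4 and 5 mean that PCRE steps through a UTF-8 subject one character at a
time, using the lead byte of each sequence to find its length. The sketch below
illustrates that stepping. utf8_seq_len and utf8_char_count are hypothetical
helpers, and they assume the input has already been validated. With them, the
six bytes matched by \\x{100}{3} count as three characters.

```c
#include <stddef.h>

/* Number of bytes in the UTF-8 sequence that starts with lead byte c,
   using the original one-to-six-byte rules. Assumes c is a valid lead
   byte (not a 10xxxxxx continuation byte). */
static size_t utf8_seq_len(unsigned char c)
{
    if (c < 0x80) return 1;
    if (c < 0xE0) return 2;
    if (c < 0xF0) return 3;
    if (c < 0xF8) return 4;
    if (c < 0xFC) return 5;
    return 6;
}

/* Count characters (not bytes) in a valid UTF-8 string of len bytes,
   the unit that "." and repeat quantifiers work on in UTF-8 mode. */
size_t utf8_char_count(const unsigned char *s, size_t len)
{
    size_t count = 0;
    for (size_t i = 0; i < len; i += utf8_seq_len(s[i]))
        count++;
    return count;
}
```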
|
||||
|
||||
.SH AUTHOR
|
||||
.rs
|
||||
.sp
|
||||
Philip Hazel <ph10@cam.ac.uk>
|
||||
.br
|
||||
University Computing Service,
|
||||
.br
|
||||
Cambridge CB2 3QG, England.
|
||||
.br
|
||||
Phone: +44 1223 334714
|
||||
|
||||
.in 0
|
||||
Last updated: 20 August 2003
|
||||
.br
|
||||
Copyright (c) 1997-2003 University of Cambridge.
|
File diff suppressed because it is too large
@ -1,59 +0,0 @@
|
||||
.TH PCRE 3
|
||||
.SH NAME
|
||||
PCRE - Perl-compatible regular expressions
|
||||
.SH SYNOPSIS
|
||||
.rs
|
||||
.sp
|
||||
.B #include <pcre.h>
|
||||
.PP
|
||||
.SM
|
||||
.br
|
||||
.B pcre *pcre_compile(const char *\fIpattern\fR, int \fIoptions\fR,
|
||||
.ti +5n
|
||||
.B const char **\fIerrptr\fR, int *\fIerroffset\fR,
|
||||
.ti +5n
|
||||
.B const unsigned char *\fItableptr\fR);
|
||||
|
||||
.SH DESCRIPTION
|
||||
.rs
|
||||
.sp
|
||||
This function compiles a regular expression into an internal form. Its
|
||||
arguments are:
|
||||
|
||||
\fIpattern\fR A zero-terminated string containing the
|
||||
regular expression to be compiled
|
||||
\fIoptions\fR Zero or more option bits
|
||||
\fIerrptr\fR Where to put an error message
|
||||
\fIerroffset\fR Offset in pattern where error was found
|
||||
\fItableptr\fR Pointer to character tables, or NULL to
|
||||
use the built-in default
|
||||
|
||||
The option bits are:
|
||||
|
||||
  PCRE_ANCHORED           Force pattern anchoring
  PCRE_CASELESS           Do caseless matching
  PCRE_DOLLAR_ENDONLY     $ not to match newline at end
  PCRE_DOTALL             . matches anything including NL
  PCRE_EXTENDED           Ignore whitespace and # comments
  PCRE_EXTRA              PCRE extra features
                            (not much use currently)
  PCRE_MULTILINE          ^ and $ match newlines within data
  PCRE_NO_AUTO_CAPTURE    Disable numbered capturing
                            parentheses (named ones available)
  PCRE_UNGREEDY           Invert greediness of quantifiers
  PCRE_UTF8               Run in UTF-8 mode
  PCRE_NO_UTF8_CHECK      Do not check the pattern for UTF-8
                            validity (only relevant if
                            PCRE_UTF8 is set)
|
||||
|
||||
PCRE must be compiled with UTF-8 support in order to use PCRE_UTF8
|
||||
(or PCRE_NO_UTF8_CHECK).
|
||||
|
||||
The yield of the function is a pointer to a private data structure that
|
||||
contains the compiled pattern, or NULL if an error was detected.
|
||||
|
||||
There is a complete description of the PCRE API in the
|
||||
.\" HREF
|
||||
\fBpcreapi\fR
|
||||
.\"
|
||||
page.
|
@ -1,45 +0,0 @@
|
||||
.TH PCRE 3
|
||||
.SH NAME
|
||||
PCRE - Perl-compatible regular expressions
|
||||
.SH SYNOPSIS
|
||||
.rs
|
||||
.sp
|
||||
.B #include <pcre.h>
|
||||
.PP
|
||||
.SM
|
||||
.br
|
||||
.B int pcre_config(int \fIwhat\fR, void *\fIwhere\fR);
|
||||
|
||||
.SH DESCRIPTION
|
||||
.rs
|
||||
.sp
|
||||
This function makes it possible for a client program to find out which optional
|
||||
features are available in the version of the PCRE library it is using. Its
|
||||
arguments are as follows:
|
||||
|
||||
\fIwhat\fR A code specifying what information is required
|
||||
\fIwhere\fR Points to where to put the data
|
||||
|
||||
The available codes are:
|
||||
|
||||
  PCRE_CONFIG_LINK_SIZE     Internal link size: 2, 3, or 4
  PCRE_CONFIG_MATCH_LIMIT   Internal resource limit
  PCRE_CONFIG_NEWLINE       Value of the newline character
  PCRE_CONFIG_POSIX_MALLOC_THRESHOLD
                            Threshold of return slots, above
                            which \fBmalloc()\fR is used by
                            the POSIX API
  PCRE_CONFIG_STACKRECURSE  Recursion implementation (1=stack 0=heap)
  PCRE_CONFIG_UTF8          Availability of UTF-8 support (1=yes 0=no)
|
||||
|
||||
The function yields 0 on success or PCRE_ERROR_BADOPTION otherwise.
|
||||
|
||||
There is a complete description of the PCRE native API in the
|
||||
.\" HREF
|
||||
\fBpcreapi\fR
|
||||
.\"
|
||||
page, and a description of the POSIX API in the
|
||||
.\" HREF
|
||||
\fBpcreposix\fR
|
||||
.\"
|
||||
page.
|
@ -1,40 +0,0 @@
|
||||
.TH PCRE 3
|
||||
.SH NAME
|
||||
PCRE - Perl-compatible regular expressions
|
||||
.SH SYNOPSIS
|
||||
.rs
|
||||
.sp
|
||||
.B #include <pcre.h>
|
||||
.PP
|
||||
.SM
|
||||
.br
|
||||
.B int pcre_copy_named_substring(const pcre *\fIcode\fR,
|
||||
.ti +5n
|
||||
.B const char *\fIsubject\fR, int *\fIovector\fR,
|
||||
.ti +5n
|
||||
.B int \fIstringcount\fR, const char *\fIstringname\fR,
|
||||
.ti +5n
|
||||
.B char *\fIbuffer\fR, int \fIbuffersize\fR);
|
||||
|
||||
.SH DESCRIPTION
|
||||
.rs
|
||||
.sp
|
||||
This is a convenience function for extracting a captured substring, identified
|
||||
by name, into a given buffer. The arguments are:
|
||||
|
||||
\fIcode\fR Pattern that was successfully matched
|
||||
\fIsubject\fR Subject that has been successfully matched
|
||||
\fIovector\fR Offset vector that \fBpcre_exec()\fR used
|
||||
\fIstringcount\fR Value returned by \fBpcre_exec()\fR
|
||||
\fIstringname\fR Name of the required substring
|
||||
\fIbuffer\fR Buffer to receive the string
|
||||
\fIbuffersize\fR Size of buffer
|
||||
|
||||
The yield is the length of the substring, PCRE_ERROR_NOMEMORY if the buffer was
|
||||
too small, or PCRE_ERROR_NOSUBSTRING if the string name is invalid.
|
||||
|
||||
There is a complete description of the PCRE API in the
|
||||
.\" HREF
|
||||
\fBpcreapi\fR
|
||||
.\"
|
||||
page.
|
@ -1,37 +0,0 @@
|
||||
.TH PCRE 3
|
||||
.SH NAME
|
||||
PCRE - Perl-compatible regular expressions
|
||||
.SH SYNOPSIS
|
||||
.rs
|
||||
.sp
|
||||
.B #include <pcre.h>
|
||||
.PP
|
||||
.SM
|
||||
.br
|
||||
.B int pcre_copy_substring(const char *\fIsubject\fR, int *\fIovector\fR,
|
||||
.ti +5n
|
||||
.B int \fIstringcount\fR, int \fIstringnumber\fR, char *\fIbuffer\fR,
|
||||
.ti +5n
|
||||
.B int \fIbuffersize\fR);
|
||||
|
||||
.SH DESCRIPTION
|
||||
.rs
|
||||
.sp
|
||||
This is a convenience function for extracting a captured substring into a given
|
||||
buffer. The arguments are:
|
||||
|
||||
\fIsubject\fR Subject that has been successfully matched
|
||||
\fIovector\fR Offset vector that \fBpcre_exec()\fR used
|
||||
\fIstringcount\fR Value returned by \fBpcre_exec()\fR
|
||||
\fIstringnumber\fR Number of the required substring
|
||||
\fIbuffer\fR Buffer to receive the string
|
||||
\fIbuffersize\fR Size of buffer
|
||||
|
||||
The yield is the length of the string, PCRE_ERROR_NOMEMORY if the buffer was
|
||||
too small, or PCRE_ERROR_NOSUBSTRING if the string number is invalid.
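
The contract above can be illustrated without the library. In this sketch,
copy_substring_sketch and the SKETCH_ERROR_* values are hypothetical
placeholders; real code calls \fBpcre_copy_substring()\fR and tests
PCRE_ERROR_NOMEMORY and PCRE_ERROR_NOSUBSTRING from pcre.h. It assumes that
\fIovector\fR holds start and end offset pairs, that an unset subpattern is
marked by a pair of -1 offsets, and that the buffer must also have room for a
terminating zero.

```c
#include <string.h>

/* Placeholder error codes for this sketch only; they are NOT the values
   of PCRE_ERROR_NOMEMORY and PCRE_ERROR_NOSUBSTRING in pcre.h. */
#define SKETCH_ERROR_NOMEMORY    (-100)
#define SKETCH_ERROR_NOSUBSTRING (-101)

/* Copy captured substring `stringnumber` into buffer as a zero-terminated
   string. Substring i occupies subject[ovector[2*i]] up to, but not
   including, subject[ovector[2*i+1]]. Returns its length or an error. */
int copy_substring_sketch(const char *subject, const int *ovector,
                          int stringcount, int stringnumber,
                          char *buffer, int buffersize)
{
    int start, length;

    if (stringnumber < 0 || stringnumber >= stringcount)
        return SKETCH_ERROR_NOSUBSTRING;

    start = ovector[2 * stringnumber];
    length = ovector[2 * stringnumber + 1] - start;
    if (start < 0) {                  /* unset subpattern: empty string */
        start = 0;
        length = 0;
    }

    if (buffersize < length + 1)      /* need room for the terminating zero */
        return SKETCH_ERROR_NOMEMORY;

    memcpy(buffer, subject + start, (size_t)length);
    buffer[length] = '\0';
    return length;
}
```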
|
||||
|
||||
There is a complete description of the PCRE API in the
|
||||
.\" HREF
|
||||
\fBpcreapi\fR
|
||||
.\"
|
||||
page.
|
@ -1,48 +0,0 @@
|
||||
.TH PCRE 3
|
||||
.SH NAME
|
||||
PCRE - Perl-compatible regular expressions
|
||||
.SH SYNOPSIS
|
||||
.rs
|
||||
.sp
|
||||
.B #include <pcre.h>
|
||||
.PP
|
||||
.SM
|
||||
.br
|
||||
.B int pcre_exec(const pcre *\fIcode\fR, "const pcre_extra *\fIextra\fR,"
|
||||
.ti +5n
|
||||
.B "const char *\fIsubject\fR," int \fIlength\fR, int \fIstartoffset\fR,
|
||||
.ti +5n
|
||||
.B int \fIoptions\fR, int *\fIovector\fR, int \fIovecsize\fR);
|
||||
|
||||
.SH DESCRIPTION
|
||||
.rs
|
||||
.sp
|
||||
This function matches a compiled regular expression against a given subject
|
||||
string, and returns offsets to capturing subexpressions. Its arguments are:
|
||||
|
||||
\fIcode\fR Points to the compiled pattern
|
||||
\fIextra\fR Points to an associated \fBpcre_extra\fR structure,
|
||||
or is NULL
|
||||
\fIsubject\fR Points to the subject string
|
||||
\fIlength\fR Length of the subject string, in bytes
|
||||
\fIstartoffset\fR Offset in bytes in the subject at which to
|
||||
start matching
|
||||
\fIoptions\fR Option bits
|
||||
\fIovector\fR Points to a vector of ints for result offsets
|
||||
\fIovecsize\fR Size of the vector (a multiple of 3)
|
||||
|
||||
The options are:
|
||||
|
||||
  PCRE_ANCHORED       Match only at the first position
  PCRE_NOTBOL         Subject is not the beginning of a line
  PCRE_NOTEOL         Subject is not the end of a line
  PCRE_NOTEMPTY       An empty string is not a valid match
  PCRE_NO_UTF8_CHECK  Do not check the subject for UTF-8
                        validity (only relevant if PCRE_UTF8
                        was set at compile time)
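
The vector is normally sized from the number of capturing subpatterns, which
\fBpcre_fullinfo()\fR reports for PCRE_INFO_CAPTURECOUNT. It needs two offsets
for the whole match and two for each subpattern, plus a further third that
\fBpcre_exec()\fR uses as workspace. Below is a small sketch of that arithmetic
with hypothetical helper names. It assumes that unset subpatterns are reported
as a pair of -1 offsets.

```c
/* Number of ints to allocate for ovector so that pcre_exec() can report
   the whole match plus capture_count capturing subpatterns: two offsets
   per substring, plus the extra third used as workspace. */
int ovector_size_for(int capture_count)
{
    return (capture_count + 1) * 3;
}

/* Length of substring i from a filled-in ovector, or -1 if it is unset. */
int substring_length(const int *ovector, int i)
{
    return ovector[2 * i] < 0 ? -1 : ovector[2 * i + 1] - ovector[2 * i];
}
```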
|
||||
|
||||
There is a complete description of the PCRE API in the
|
||||
.\" HREF
|
||||
\fBpcreapi\fR
|
||||
.\"
|
||||
page.
|
@ -1,24 +0,0 @@
|
||||
.TH PCRE 3
|
||||
.SH NAME
|
||||
PCRE - Perl-compatible regular expressions
|
||||
.SH SYNOPSIS
|
||||
.rs
|
||||
.sp
|
||||
.B #include <pcre.h>
|
||||
.PP
|
||||
.SM
|
||||
.br
|
||||
.B void pcre_free_substring(const char *\fIstringptr\fR);
|
||||
|
||||
.SH DESCRIPTION
|
||||
.rs
|
||||
.sp
|
||||
This is a convenience function for freeing the store obtained by a previous
|
||||
call to \fBpcre_get_substring()\fR or \fBpcre_get_named_substring()\fR. Its
|
||||
only argument is a pointer to the string.
|
||||
|
||||
There is a complete description of the PCRE API in the
|
||||
.\" HREF
|
||||
\fBpcreapi\fR
|
||||
.\"
|
||||
page.
|
@ -1,24 +0,0 @@
|
||||
.TH PCRE 3
|
||||
.SH NAME
|
||||
PCRE - Perl-compatible regular expressions
|
||||
.SH SYNOPSIS
|
||||
.rs
|
||||
.sp
|
||||
.B #include <pcre.h>
|
||||
.PP
|
||||
.SM
|
||||
.br
|
||||
.B void pcre_free_substring_list(const char **\fIstringptr\fR);
|
||||
|
||||
.SH DESCRIPTION
|
||||
.rs
|
||||
.sp
|
||||
This is a convenience function for freeing the store obtained by a previous
|
||||
call to \fBpcre_get_substring_list()\fR. Its only argument is a pointer to the
|
||||
list of string pointers.
|
||||
|
||||
There is a complete description of the PCRE API in the
|
||||
.\" HREF
|
||||
\fBpcreapi\fR
|
||||
.\"
|
||||
page.
|
@ -1,53 +0,0 @@
|
||||
.TH PCRE 3
|
||||
.SH NAME
|
||||
PCRE - Perl-compatible regular expressions
|
||||
.SH SYNOPSIS
|
||||
.rs
|
||||
.sp
|
||||
.B #include <pcre.h>
|
||||
.PP
|
||||
.SM
|
||||
.br
|
||||
.B int pcre_fullinfo(const pcre *\fIcode\fR, "const pcre_extra *\fIextra\fR,"
|
||||
.ti +5n
|
||||
.B int \fIwhat\fR, void *\fIwhere\fR);
|
||||
|
||||
.SH DESCRIPTION
|
||||
.rs
|
||||
.sp
|
||||
This function returns information about a compiled pattern. Its arguments are:
|
||||
|
||||
\fIcode\fR Compiled regular expression
|
||||
\fIextra\fR Result of \fBpcre_study()\fR or NULL
|
||||
\fIwhat\fR What information is required
|
||||
\fIwhere\fR Where to put the information
|
||||
|
||||
The following information is available:
|
||||
|
||||
  PCRE_INFO_BACKREFMAX      Number of highest back reference
  PCRE_INFO_CAPTURECOUNT    Number of capturing subpatterns
  PCRE_INFO_FIRSTBYTE       Fixed first byte for a match, or
                              -1 for start of string
                                 or after newline, or
                              -2 otherwise
  PCRE_INFO_FIRSTTABLE      Table of first bytes
                              (after studying)
  PCRE_INFO_LASTLITERAL     Literal last byte required
  PCRE_INFO_NAMECOUNT       Number of named subpatterns
  PCRE_INFO_NAMEENTRYSIZE   Size of name table entry
  PCRE_INFO_NAMETABLE       Pointer to name table
  PCRE_INFO_OPTIONS         Options used for compilation
  PCRE_INFO_SIZE            Size of compiled pattern
|
||||
|
||||
The yield of the function is zero on success or:
|
||||
|
||||
  PCRE_ERROR_NULL       the argument \fIcode\fR was NULL
                        or the argument \fIwhere\fR was NULL
  PCRE_ERROR_BADMAGIC   the "magic number" was not found
  PCRE_ERROR_BADOPTION  the value of \fIwhat\fR was invalid
|
||||
|
||||
There is a complete description of the PCRE API in the
|
||||
.\" HREF
|
||||
\fBpcreapi\fR
|
||||
.\"
|
||||
page.
|
@ -1,40 +0,0 @@
|
||||
.TH PCRE 3
|
||||
.SH NAME
|
||||
PCRE - Perl-compatible regular expressions
|
||||
.SH SYNOPSIS
|
||||
.rs
|
||||
.sp
|
||||
.B #include <pcre.h>
|
||||
.PP
|
||||
.SM
|
||||
.br
|
||||
.B int pcre_get_named_substring(const pcre *\fIcode\fR,
|
||||
.ti +5n
|
||||
.B const char *\fIsubject\fR, int *\fIovector\fR,
|
||||
.ti +5n
|
||||
.B int \fIstringcount\fR, const char *\fIstringname\fR,
|
||||
.ti +5n
|
||||
.B const char **\fIstringptr\fR);
|
||||
|
||||
.SH DESCRIPTION
|
||||
.rs
|
||||
.sp
|
||||
This is a convenience function for extracting a captured substring by name. The
|
||||
arguments are:
|
||||
|
||||
\fIcode\fR Compiled pattern
|
||||
\fIsubject\fR Subject that has been successfully matched
|
||||
\fIovector\fR Offset vector that \fBpcre_exec()\fR used
|
||||
\fIstringcount\fR Value returned by \fBpcre_exec()\fR
|
||||
\fIstringname\fR Name of the required substring
|
||||
\fIstringptr\fR Where to put the string pointer
|
||||
|
||||
The yield is the length of the extracted substring, PCRE_ERROR_NOMEMORY if
|
||||
sufficient memory could not be obtained, or PCRE_ERROR_NOSUBSTRING if the
|
||||
string name is invalid.
|
||||
|
||||
There is a complete description of the PCRE API in the
|
||||
.\" HREF
|
||||
\fBpcreapi\fR
|
||||
.\"
|
||||
page.
|
@ -1,31 +0,0 @@
|
||||
.TH PCRE 3
|
||||
.SH NAME
|
||||
PCRE - Perl-compatible regular expressions
|
||||
.SH SYNOPSIS
|
||||
.rs
|
||||
.sp
|
||||
.B #include <pcre.h>
|
||||
.PP
|
||||
.SM
|
||||
.br
|
||||
.B int pcre_get_stringnumber(const pcre *\fIcode\fR,
|
||||
.ti +5n
|
||||
.B const char *\fIname\fR);
|
||||
|
||||
.SH DESCRIPTION
|
||||
.rs
|
||||
.sp
|
||||
This convenience function finds the number of a named substring capturing
|
||||
parenthesis in a compiled pattern. Its arguments are:
|
||||
|
||||
\fIcode\fR Compiled regular expression
|
||||
\fIname\fR Name whose number is required
|
||||
|
||||
The yield of the function is the number of the parenthesis if the name is
|
||||
found, or PCRE_ERROR_NOSUBSTRING otherwise.
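
Conceptually, the lookup is a search of the pattern's name table. This sketch
assumes the table layout described for PCRE_INFO_NAMETABLE in the
\fBpcreapi\fR documentation: fixed-size entries, each holding a two-byte group
number (most significant byte first) followed by the zero-terminated name.
stringnumber_sketch and its error value are hypothetical, and it uses a linear
scan for clarity.

```c
#include <string.h>

/* Placeholder error value for this sketch only; NOT pcre.h's value. */
#define SKETCH_ERROR_NOSUBSTRING (-101)

/* Look up `name` in a name table of `count` entries, each `entrysize`
   bytes: group number in two bytes (MSB first), then the name, NUL-ended. */
int stringnumber_sketch(const unsigned char *table, int count, int entrysize,
                        const char *name)
{
    for (int i = 0; i < count; i++) {
        const unsigned char *entry = table + i * entrysize;
        if (strcmp((const char *)(entry + 2), name) == 0)
            return (entry[0] << 8) | entry[1];
    }
    return SKETCH_ERROR_NOSUBSTRING;
}
```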
|
||||
|
||||
There is a complete description of the PCRE API in the
|
||||
.\" HREF
|
||||
\fBpcreapi\fR
|
||||
.\"
|
||||
page.
|
@ -1,37 +0,0 @@
|
||||
.TH PCRE 3
|
||||
.SH NAME
|
||||
PCRE - Perl-compatible regular expressions
|
||||
.SH SYNOPSIS
|
||||
.rs
|
||||
.sp
|
||||
.B #include <pcre.h>
|
||||
.PP
|
||||
.SM
|
||||
.br
|
||||
.B int pcre_get_substring(const char *\fIsubject\fR, int *\fIovector\fR,
|
||||
.ti +5n
|
||||
.B int \fIstringcount\fR, int \fIstringnumber\fR,
|
||||
.ti +5n
|
||||
.B const char **\fIstringptr\fR);
|
||||
|
||||
.SH DESCRIPTION
|
||||
.rs
|
||||
.sp
|
||||
This is a convenience function for extracting a captured substring. The
|
||||
arguments are:
|
||||
|
||||
\fIsubject\fR Subject that has been successfully matched
|
||||
\fIovector\fR Offset vector that \fBpcre_exec()\fR used
|
||||
\fIstringcount\fR Value returned by \fBpcre_exec()\fR
|
||||
\fIstringnumber\fR Number of the required substring
|
||||
\fIstringptr\fR Where to put the string pointer
|
||||
|
||||
The yield is the length of the substring, PCRE_ERROR_NOMEMORY if sufficient
|
||||
memory could not be obtained, or PCRE_ERROR_NOSUBSTRING if the string number is
|
||||
invalid.
|
||||
|
||||
There is a complete description of the PCRE API in the
|
||||
.\" HREF
|
||||
\fBpcreapi\fR
|
||||
.\"
|
||||
page.
|
@ -1,33 +0,0 @@
|
||||
.TH PCRE 3
|
||||
.SH NAME
|
||||
PCRE - Perl-compatible regular expressions
|
||||
.SH SYNOPSIS
|
||||
.rs
|
||||
.sp
|
||||
.B #include <pcre.h>
|
||||
.PP
|
||||
.SM
|
||||
.br
|
||||
.B int pcre_get_substring_list(const char *\fIsubject\fR,
|
||||
.ti +5n
|
||||
.B int *\fIovector\fR, int \fIstringcount\fR, "const char ***\fIlistptr\fR);"
|
||||
|
||||
.SH DESCRIPTION
|
||||
.rs
|
||||
.sp
|
||||
This is a convenience function for extracting a list of all the captured
|
||||
substrings. The arguments are:
|
||||
|
||||
\fIsubject\fR Subject that has been successfully matched
|
||||
\fIovector\fR Offset vector that \fBpcre_exec()\fR used
|
||||
\fIstringcount\fR Value returned by \fBpcre_exec()\fR
|
||||
\fIlistptr\fR Where to put a pointer to the list
|
||||
|
||||
The yield is zero on success or PCRE_ERROR_NOMEMORY if sufficient memory could
|
||||
not be obtained.
|
||||
|
||||
There is a complete description of the PCRE API in the
|
||||
.\" HREF
|
||||
\fBpcreapi\fR
|
||||
.\"
|
||||
page.
|
@ -1,23 +0,0 @@
|
||||
.TH PCRE 3
|
||||
.SH NAME
|
||||
PCRE - Perl-compatible regular expressions
|
||||
.SH SYNOPSIS
|
||||
.rs
|
||||
.sp
|
||||
.B #include <pcre.h>
|
||||
.PP
|
||||
.SM
|
||||
.br
|
||||
.B int pcre_info(const pcre *\fIcode\fR, int *\fIoptptr\fR, int
|
||||
.B *\fIfirstcharptr\fR);
|
||||
|
||||
.SH DESCRIPTION
|
||||
.rs
|
||||
.sp
|
||||
This function is obsolete. You should be using \fBpcre_fullinfo()\fR instead.
|
||||
|
||||
There is a complete description of the PCRE API in the
|
||||
.\" HREF
|
||||
\fBpcreapi\fR
|
||||
.\"
|
||||
page.
|
@ -1,26 +0,0 @@
|
||||
.TH PCRE 3
|
||||
.SH NAME
|
||||
PCRE - Perl-compatible regular expressions
|
||||
.SH SYNOPSIS
|
||||
.rs
|
||||
.sp
|
||||
.B #include <pcre.h>
|
||||
.PP
|
||||
.SM
|
||||
.br
|
||||
.B const unsigned char *pcre_maketables(void);
|
||||
|
||||
.SH DESCRIPTION
|
||||
.rs
|
||||
.sp
|
||||
This function builds a set of character tables which can be passed to
|
||||
\fBpcre_compile()\fR to override PCRE's internal, built-in tables (which were
|
||||
made by \fBpcre_maketables()\fR when PCRE was compiled). You might want to do
|
||||
this if you are using a non-standard locale. The function yields a pointer to
|
||||
the tables.
|
||||
|
||||
There is a complete description of the PCRE API in the
|
||||
.\" HREF
|
||||
\fBpcreapi\fR
|
||||
.\"
|
||||
page.
|
@ -1,36 +0,0 @@
|
||||
.TH PCRE 3
|
||||
.SH NAME
|
||||
PCRE - Perl-compatible regular expressions
|
||||
.SH SYNOPSIS
|
||||
.rs
|
||||
.sp
|
||||
.B #include <pcre.h>
|
||||
.PP
|
||||
.SM
|
||||
.br
|
||||
.B pcre_extra *pcre_study(const pcre *\fIcode\fR, int \fIoptions\fR,
|
||||
.ti +5n
|
||||
.B const char **\fIerrptr\fR);
|
||||
|
||||
.SH DESCRIPTION
|
||||
.rs
|
||||
.sp
|
||||
This function studies a compiled pattern, to see if additional information can
|
||||
be extracted that might speed up matching. Its arguments are:
|
||||
|
||||
\fIcode\fR A compiled regular expression
|
||||
\fIoptions\fR Options for \fBpcre_study()\fR
|
||||
\fIerrptr\fR Where to put an error message
|
||||
|
||||
If the function returns NULL, either it could not find any additional
|
||||
information, or there was an error. You can tell the difference by looking at
|
||||
the error value: it is set to NULL in the first case.
|
||||
|
||||
There are currently no options defined; the value of the second argument should
|
||||
always be zero.
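Because a NULL return from pcre_study() is ambiguous, a caller has to inspect the error pointer before drawing any conclusion from the return value. The sketch below models that calling pattern. It uses a hypothetical stand-in, study_model(), that imitates only the contract described above and does not call PCRE. In real code you would call pcre_study(code, 0, &error) and apply the same check.

```c
#include <stddef.h>

static int dummy_extra; /* placeholder for a pcre_extra block */

/* Hypothetical stand-in for pcre_study(): returns non-NULL when extra
   information was found; otherwise returns NULL with *errptr set to
   NULL (nothing found) or to a message (error). Not the real function. */
static void *study_model(int found_info, int options, const char **errptr)
{
    *errptr = NULL;
    if (options != 0) {                /* no study options are defined */
        *errptr = "unknown option bit(s) set";
        return NULL;
    }
    return found_info ? (void *)&dummy_extra : NULL;
}

/* Classify a study result the way a caller should:
   1 = study data available, 0 = no extra information, -1 = error. */
static int classify_study(const void *extra, const char *error)
{
    if (error != NULL)                 /* check the error pointer first */
        return -1;
    return extra != NULL ? 1 : 0;
}
```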
|
||||
|
||||
There is a complete description of the PCRE API in the
|
||||
.\" HREF
|
||||
\fBpcreapi\fR
|
||||
.\"
|
||||
page.
|
@ -1,23 +0,0 @@
|
||||
.TH PCRE 3
|
||||
.SH NAME
|
||||
PCRE - Perl-compatible regular expressions
|
||||
.SH SYNOPSIS
|
||||
.rs
|
||||
.sp
|
||||
.B #include <pcre.h>
|
||||
.PP
|
||||
.SM
|
||||
.br
|
||||
.B char *pcre_version(void);
|
||||
|
||||
.SH DESCRIPTION
|
||||
.rs
|
||||
.sp
|
||||
This function returns a character string that gives the version number of the
|
||||
PCRE library, and its date of release.
|
||||
|
||||
There is a complete description of the PCRE API in the
|
||||
.\" HREF
|
||||
\fBpcreapi\fR
|
||||
.\"
|
||||
page.
|
File diff suppressed because it is too large
@ -1,145 +0,0 @@
|
||||
.TH PCRE 3
|
||||
.SH NAME
|
||||
PCRE - Perl-compatible regular expressions
|
||||
.SH PCRE BUILD-TIME OPTIONS
|
||||
.rs
|
||||
.sp
|
||||
This document describes the optional features of PCRE that can be selected when
|
||||
the library is compiled. They are all selected, or deselected, by providing
|
||||
options to the \fBconfigure\fR script which is run before the \fBmake\fR
|
||||
command. The complete list of options for \fBconfigure\fR (which includes the
|
||||
standard ones such as the selection of the installation directory) can be
|
||||
obtained by running
|
||||
|
||||
./configure --help
|
||||
|
||||
The following sections describe certain options whose names begin with --enable
|
||||
or --disable. These settings specify changes to the defaults for the
|
||||
\fBconfigure\fR command. Because of the way that \fBconfigure\fR works,
|
||||
--enable and --disable always come in pairs, so the complementary option always
|
||||
exists as well, but as it specifies the default, it is not described.
|
||||
|
||||
.SH UTF-8 SUPPORT
|
||||
.rs
|
||||
.sp
|
||||
To build PCRE with support for UTF-8 character strings, add
|
||||
|
||||
--enable-utf8
|
||||
|
||||
to the \fBconfigure\fR command. Of itself, this does not make PCRE treat
|
||||
strings as UTF-8. As well as compiling PCRE with this option, you also have
|
||||
to set the PCRE_UTF8 option when you call the \fBpcre_compile()\fR
|
||||
function.
|
||||
|
||||
.SH CODE VALUE OF NEWLINE
|
||||
.rs
|
||||
.sp
|
||||
By default, PCRE treats character 10 (linefeed) as the newline character. This
|
||||
is the normal newline character on Unix-like systems. You can compile PCRE to
|
||||
use character 13 (carriage return) instead by adding
|
||||
|
||||
--enable-newline-is-cr
|
||||
|
||||
to the \fBconfigure\fR command. For completeness there is also a
|
||||
--enable-newline-is-lf option, which explicitly specifies linefeed as the
|
||||
newline character.
|
||||
|
||||
.SH BUILDING SHARED AND STATIC LIBRARIES
|
||||
.rs
|
||||
.sp
|
||||
The PCRE building process uses \fBlibtool\fR to build both shared and static
|
||||
Unix libraries by default. You can suppress one of these by adding one of
|
||||
|
||||
--disable-shared
|
||||
--disable-static
|
||||
|
||||
to the \fBconfigure\fR command, as required.
|
||||
|
||||
.SH POSIX MALLOC USAGE
|
||||
.rs
|
||||
.sp
|
||||
When PCRE is called through the POSIX interface (see the \fBpcreposix\fR
|
||||
documentation), additional working storage is required for holding the pointers
|
||||
to capturing substrings because PCRE requires three integers per substring,
|
||||
whereas the POSIX interface provides only two. If the number of expected
|
||||
substrings is small, the wrapper function uses space on the stack, because this
|
||||
is faster than using \fBmalloc()\fR for each call. The default threshold above
|
||||
which the stack is no longer used is 10; it can be changed by adding a setting
|
||||
such as
|
||||
|
||||
--with-posix-malloc-threshold=20
|
||||
|
||||
to the \fBconfigure\fR command.
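The stack-versus-heap decision described above can be sketched as follows. This is a minimal model under stated assumptions. pair_t, wrap_exec(), and fake_native() are hypothetical names: pair_t mirrors regmatch_t, and fake_native() stands in for the native matcher. MALLOC_THRESHOLD plays the role of the value set with --with-posix-malloc-threshold. The real wrapper in pcreposix may differ in detail.

```c
#include <stdlib.h>

#define MALLOC_THRESHOLD 10     /* illustrative; mirrors the default of 10 */

/* Hypothetical mirror of regmatch_t: two offsets per substring. */
typedef struct { int so, eo; } pair_t;

/* Hypothetical native matcher: fills start/end pairs into the first
   two-thirds of ovector (3 ints per substring) and returns the number
   of pairs filled in. */
typedef int (*native_match_fn)(int *ovector, int ovecsize);

/* Demo matcher: whole match at 2..5, group 1 at 3..4. */
static int fake_native(int *ovector, int ovecsize)
{
    static const int offsets[4] = { 2, 5, 3, 4 };
    int pairs = ovecsize / 3;          /* usable start/end pairs */
    if (pairs > 2) pairs = 2;
    for (int i = 0; i < 2 * pairs; i++) ovector[i] = offsets[i];
    return pairs;
}

/* Copy native offsets into nmatch pairs. Small requests use a stack
   buffer; larger ones use malloc(). Returns the native result or -1. */
static int wrap_exec(native_match_fn native, pair_t *pmatch, size_t nmatch,
                     int *used_heap)
{
    int small_ovector[MALLOC_THRESHOLD * 3];
    int *ovector = small_ovector;
    *used_heap = 0;

    if (nmatch > MALLOC_THRESHOLD) {   /* too big for the stack buffer */
        ovector = malloc(sizeof(int) * nmatch * 3);
        if (ovector == NULL) return -1;
        *used_heap = 1;
    }

    int rc = native(ovector, (int)(nmatch * 3));
    for (size_t i = 0; i < nmatch; i++) {
        if (rc > 0 && (int)i < rc) {
            pmatch[i].so = ovector[2 * i];
            pmatch[i].eo = ovector[2 * i + 1];
        } else {
            pmatch[i].so = pmatch[i].eo = -1;   /* unset substring */
        }
    }
    if (*used_heap) free(ovector);
    return rc;
}
```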
|
||||
|
||||
.SH LIMITING PCRE RESOURCE USAGE
|
||||
.rs
|
||||
.sp
|
||||
Internally, PCRE has a function called \fBmatch()\fR which it calls repeatedly
|
||||
(possibly recursively) when performing a matching operation. By limiting the
|
||||
number of times this function may be called, a limit can be placed on the
|
||||
resources used by a single call to \fBpcre_exec()\fR. The limit can be changed
|
||||
at run time, as described in the \fBpcreapi\fR documentation. The default is 10
|
||||
million, but this can be changed by adding a setting such as
|
||||
|
||||
--with-match-limit=500000
|
||||
|
||||
to the \fBconfigure\fR command.
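To make the idea of a call limit concrete, here is a toy backtracking matcher for the single pattern (a+)*b, anchored at the start of the subject. Every entry to the recursive function counts against a budget, which loosely mirrors how the match limit counts calls to the internal \fBmatch()\fR. Everything here is hypothetical and is not PCRE code: toy_match(), toy_exec(), and TOY_LIMIT_EXCEEDED are all illustrative names.

```c
#include <string.h>

#define TOY_LIMIT_EXCEEDED (-1)  /* hypothetical, not a PCRE error code */

struct toy_ctx {
    const char *s;
    int len;
    long calls;   /* entries to toy_match() so far */
    long limit;   /* analogue of the match limit */
};

/* Match (a+)*b at pos. Returns 1 on match, 0 on no match, or
   TOY_LIMIT_EXCEEDED when the call budget runs out. */
static int toy_match(struct toy_ctx *c, int pos)
{
    if (++c->calls > c->limit)
        return TOY_LIMIT_EXCEEDED;

    /* Greedy *: try another (a+) iteration, longest run first. */
    int run = 0;
    while (pos + run < c->len && c->s[pos + run] == 'a')
        run++;
    for (int take = run; take >= 1; take--) {
        int rc = toy_match(c, pos + take);
        if (rc != 0)
            return rc;            /* matched, or limit exceeded */
    }

    /* The * gives up here, so the literal b must follow. */
    return (pos < c->len && c->s[pos] == 'b') ? 1 : 0;
}

static int toy_exec(const char *subject, long limit, long *calls_out)
{
    struct toy_ctx c = { subject, (int)strlen(subject), 0, limit };
    int rc = toy_match(&c, 0);
    *calls_out = c.calls;
    return rc;
}
```

On n "a" characters with no "b", this toy makes 2^n calls before failing. A modest limit therefore stops it long before the full search completes.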
|
||||
|
||||
.SH HANDLING VERY LARGE PATTERNS
|
||||
.rs
|
||||
.sp
|
||||
Within a compiled pattern, offset values are used to point from one part to
|
||||
another (for example, from an opening parenthesis to an alternation
|
||||
metacharacter). By default two-byte values are used for these offsets, leading
|
||||
to a maximum size for a compiled pattern of around 64K. This is sufficient to
|
||||
handle all but the most gigantic patterns. Nevertheless, some people do want to
|
||||
process enormous patterns, so it is possible to compile PCRE to use three-byte
|
||||
or four-byte offsets by adding a setting such as
|
||||
|
||||
--with-link-size=3
|
||||
|
||||
to the \fBconfigure\fR command. The value given must be 2, 3, or 4. Using
|
||||
longer offsets slows down the operation of PCRE because it has to load
|
||||
additional bytes when handling them.
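The cost of a larger link size comes from storing and loading each offset one byte at a time. The helpers below sketch this for a configurable link size. The most-significant-byte-first order and the names put_link(), get_link(), and max_link() are assumptions made for illustration. They are not PCRE's internal macros.

```c
/* Store offset d in link_size bytes, most significant byte first. */
static void put_link(unsigned char *p, int link_size, unsigned long d)
{
    for (int i = link_size - 1; i >= 0; i--) {
        p[i] = (unsigned char)(d & 0xff);
        d >>= 8;
    }
}

/* Load an offset; each extra byte of link size is one more load. */
static unsigned long get_link(const unsigned char *p, int link_size)
{
    unsigned long d = 0;
    for (int i = 0; i < link_size; i++)
        d = (d << 8) | p[i];
    return d;
}

/* Largest offset representable in link_size bytes. */
static unsigned long max_link(int link_size)
{
    if (link_size >= (int)sizeof(unsigned long))
        return ~0UL;
    return (1UL << (8 * link_size)) - 1;
}
```

With two bytes the largest offset is 65535, which is where the 64K limit on compiled patterns comes from. Three bytes allow offsets up to 16777215.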
|
||||
|
||||
If you build PCRE with an increased link size, test 2 (and test 5 if you are
|
||||
using UTF-8) will fail. Part of the output of these tests is a representation
|
||||
of the compiled pattern, and this changes with the link size.
|
||||
|
||||
.SH AVOIDING EXCESSIVE STACK USAGE
|
||||
.rs
|
||||
.sp
|
||||
PCRE implements backtracking while matching by making recursive calls to an
|
||||
internal function called \fBmatch()\fR. In environments where the size of the
|
||||
stack is limited, this can severely limit PCRE's operation. (The Unix
|
||||
environment does not usually suffer from this problem.) An alternative approach
|
||||
that uses memory from the heap to remember data, instead of using recursive
|
||||
function calls, has been implemented to work round this problem. If you want to
|
||||
build a version of PCRE that works this way, add
|
||||
|
||||
--disable-stack-for-recursion
|
||||
|
||||
to the \fBconfigure\fR command. With this configuration, PCRE will use the
|
||||
\fBpcre_stack_malloc\fR and \fBpcre_stack_free\fR variables to call memory
|
||||
management functions. Separate functions are provided because the usage is very
|
||||
predictable: the block sizes requested are always the same, and the blocks are
|
||||
always freed in reverse order. A calling program might be able to implement
|
||||
optimized functions that perform better than the standard \fBmalloc()\fR and
|
||||
\fBfree()\fR functions. PCRE runs noticeably more slowly when built in this
|
||||
way.
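Because the blocks are requested in a predictable way, a caller can serve them from a simple LIFO pool. The following is a minimal sketch that assumes equal-sized requests freed in reverse order, as described above. POOL_BLOCK_SIZE is a guessed upper bound, not a value taken from PCRE. lifo_stack_malloc() and lifo_stack_free() are hypothetical names. They have the same shapes as \fBmalloc()\fR and \fBfree()\fR, so they could be assigned to \fBpcre_stack_malloc\fR and \fBpcre_stack_free\fR; that assignment is not shown here.

```c
#include <stdlib.h>

#define POOL_BLOCKS 64
#define POOL_BLOCK_SIZE 1024   /* assumed upper bound on one frame */

static unsigned char pool[POOL_BLOCKS][POOL_BLOCK_SIZE];
static int pool_top = 0;       /* number of pool blocks in use */

/* Hand out the next pool block; fall back to malloc() when the pool
   is exhausted or the request is larger than a pool block. */
static void *lifo_stack_malloc(size_t size)
{
    if (size <= POOL_BLOCK_SIZE && pool_top < POOL_BLOCKS)
        return pool[pool_top++];
    return malloc(size);
}

/* Blocks arrive in reverse order of allocation, so a pool block is
   always the current top; anything else came from malloc(). */
static void lifo_stack_free(void *block)
{
    if (pool_top > 0 && block == (void *)pool[pool_top - 1])
        pool_top--;
    else
        free(block);
}
```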
|
||||
|
||||
.SH USING EBCDIC CODE
|
||||
.rs
|
||||
.sp
|
||||
PCRE assumes by default that it will run in an environment where the character
|
||||
code is ASCII (or UTF-8, which is a superset of ASCII). PCRE can, however, be
|
||||
compiled to run in an EBCDIC environment by adding
|
||||
|
||||
--enable-ebcdic
|
||||
|
||||
to the \fBconfigure\fR command.
|
||||
|
||||
.in 0
|
||||
Last updated: 09 December 2003
|
||||
.br
|
||||
Copyright (c) 1997-2003 University of Cambridge.
|
@ -1,92 +0,0 @@
|
||||
.TH PCRE 3
|
||||
.SH NAME
|
||||
PCRE - Perl-compatible regular expressions
|
||||
.SH PCRE CALLOUTS
|
||||
.rs
|
||||
.sp
|
||||
.B int (*pcre_callout)(pcre_callout_block *);
|
||||
.PP
|
||||
PCRE provides a feature called "callout", which is a means of temporarily
|
||||
passing control to the caller of PCRE in the middle of pattern matching. The
|
||||
caller of PCRE provides an external function by putting its entry point in the
|
||||
global variable \fIpcre_callout\fR. By default, this variable contains NULL,
|
||||
which disables all calling out.
|
||||
|
||||
Within a regular expression, (?C) indicates the points at which the external
|
||||
function is to be called. Different callout points can be identified by putting
|
||||
a number less than 256 after the letter C. The default value is zero.
|
||||
For example, this pattern has two callout points:
|
||||
|
||||
(?C1)\dabc(?C2)def
|
||||
|
||||
During matching, when PCRE reaches a callout point (and \fIpcre_callout\fR is
|
||||
set), the external function is called. Its only argument is a pointer to a
|
||||
\fBpcre_callout_block\fR structure. This contains the following fields:
|
||||
|
||||
int \fIversion\fR;
|
||||
int \fIcallout_number\fR;
|
||||
int *\fIoffset_vector\fR;
|
||||
const char *\fIsubject\fR;
|
||||
int \fIsubject_length\fR;
|
||||
int \fIstart_match\fR;
|
||||
int \fIcurrent_position\fR;
|
||||
int \fIcapture_top\fR;
|
||||
int \fIcapture_last\fR;
|
||||
void *\fIcallout_data\fR;
|
||||
|
||||
The \fIversion\fR field is an integer containing the version number of the
|
||||
block format. The current version is zero. The version number may change in
|
||||
future if additional fields are added, but the intention is never to remove any
|
||||
of the existing fields.
|
||||
|
||||
The \fIcallout_number\fR field contains the number of the callout, as compiled
|
||||
into the pattern (that is, the number after ?C).
|
||||
|
||||
The \fIoffset_vector\fR field is a pointer to the vector of offsets that was
|
||||
passed by the caller to \fBpcre_exec()\fR. The contents can be inspected in
|
||||
order to extract substrings that have been matched so far, in the same way as
|
||||
for extracting substrings after a match has completed.
|
||||
|
||||
The \fIsubject\fR and \fIsubject_length\fR fields contain copies of the values
|
||||
that were passed to \fBpcre_exec()\fR.
|
||||
|
||||
The \fIstart_match\fR field contains the offset within the subject at which the
|
||||
current match attempt started. If the pattern is not anchored, the callout
|
||||
function may be called several times for different starting points.
|
||||
|
||||
The \fIcurrent_position\fR field contains the offset within the subject of the
|
||||
current match pointer.
|
||||
|
||||
The \fIcapture_top\fR field contains one more than the number of the highest
|
||||
numbered captured substring so far. If no substrings have been captured,
|
||||
the value of \fIcapture_top\fR is one.
|
||||
|
||||
The \fIcapture_last\fR field contains the number of the most recently captured
|
||||
substring.
|
||||
|
||||
The \fIcallout_data\fR field contains a value that is passed to
|
||||
\fBpcre_exec()\fR by the caller specifically so that it can be passed back in
|
||||
callouts. It is passed in the \fIcallout_data\fR field of the \fBpcre_extra\fR
|
||||
data structure. If no such data was passed, the value of \fIcallout_data\fR in
|
||||
a \fBpcre_callout_block\fR structure is NULL. There is a description of the
|
||||
\fBpcre_extra\fR structure in the \fBpcreapi\fR documentation.
|
||||
|
||||
|
||||
.SH RETURN VALUES
|
||||
.rs
|
||||
.sp
|
||||
The callout function returns an integer. If the value is zero, matching
|
||||
proceeds as normal. If the value is greater than zero, matching fails at the
|
||||
current point, but backtracking to test other possibilities goes ahead, just as
|
||||
if a lookahead assertion had failed. If the value is less than zero, the match
|
||||
is abandoned, and \fBpcre_exec()\fR returns the value.
|
||||
|
||||
Negative values should normally be chosen from the set of PCRE_ERROR_xxx
|
||||
values. In particular, PCRE_ERROR_NOMATCH forces a standard "no match" failure.
|
||||
The error number PCRE_ERROR_CALLOUT is reserved for use by callout functions;
|
||||
it will never be used by PCRE itself.
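The sketch below shows a callout function that uses each of the three kinds of return value and reads a captured substring from the offset vector. It declares a hypothetical mirror of the block's fields so that it compiles without pcre.h; real code would use pcre_callout_block. MODEL_ERROR_CALLOUT is a stand-in for PCRE_ERROR_CALLOUT. The per-match state struct is an illustrative assumption, and so is the assumption that unset offset entries hold -1.

```c
#include <stddef.h>

/* Hypothetical mirror of the fields listed above. */
typedef struct {
    int version;
    int callout_number;
    int *offset_vector;
    const char *subject;
    int subject_length;
    int start_match;
    int current_position;
    int capture_top;
    int capture_last;
    void *callout_data;
} callout_block_model;

#define MODEL_ERROR_CALLOUT (-9)  /* stand-in; use PCRE_ERROR_CALLOUT */

/* Hypothetical state passed through callout_data. */
struct callout_state { int calls; int max_calls; };

/* Length of group n captured so far, or -1 if it is not set
   (assumes unset entries hold -1). */
static int captured_length(const callout_block_model *cb, int n)
{
    if (n >= cb->capture_top || cb->offset_vector[2 * n] < 0)
        return -1;
    return cb->offset_vector[2 * n + 1] - cb->offset_vector[2 * n];
}

static int my_callout(callout_block_model *cb)
{
    struct callout_state *st = cb->callout_data;

    /* < 0: abandon the match; pcre_exec() returns this value */
    if (st != NULL && ++st->calls > st->max_calls)
        return MODEL_ERROR_CALLOUT;

    /* > 0: fail at this point and backtrack, like a failed assertion */
    if (cb->callout_number == 1 && cb->current_position % 2 != 0)
        return 1;

    return 0;  /* 0: continue matching normally */
}
```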
|
||||
|
||||
.in 0
|
||||
Last updated: 21 January 2003
|
||||
.br
|
||||
Copyright (c) 1997-2003 University of Cambridge.
|
@ -1,107 +0,0 @@
|
||||
.TH PCRE 3
|
||||
.SH NAME
|
||||
PCRE - Perl-compatible regular expressions
|
||||
.SH DIFFERENCES FROM PERL
|
||||
.rs
|
||||
.sp
|
||||
This document describes the differences in the ways that PCRE and Perl handle
|
||||
regular expressions. The differences described here are with respect to Perl
|
||||
5.8.
|
||||
|
||||
1. PCRE does not have full UTF-8 support. Details of what it does have are
|
||||
given in the
|
||||
.\" HTML <a href="pcre.html#utf8support">
|
||||
.\" </a>
|
||||
section on UTF-8 support
|
||||
.\"
|
||||
in the main
|
||||
.\" HREF
|
||||
\fBpcre\fR
|
||||
.\"
|
||||
page.
|
||||
|
||||
2. PCRE does not allow repeat quantifiers on lookahead assertions. Perl permits
|
||||
them, but they do not mean what you might think. For example, (?!a){3} does
|
||||
not assert that the next three characters are not "a". It just asserts that the
|
||||
next character is not "a" three times.
|
||||
|
||||
3. Capturing subpatterns that occur inside negative lookahead assertions are
|
||||
counted, but their entries in the offsets vector are never set. Perl sets its
|
||||
numerical variables from any such patterns that are matched before the
|
||||
assertion fails to match something (thereby succeeding), but only if the
|
||||
negative lookahead assertion contains just one branch.
|
||||
|
||||
4. Though binary zero characters are supported in the subject string, they are
|
||||
not allowed in a pattern string because it is passed as a normal C string,
|
||||
terminated by zero. The escape sequence "\\0" can be used in the pattern to
|
||||
represent a binary zero.
|
||||
|
||||
5. The following Perl escape sequences are not supported: \\l, \\u, \\L,
|
||||
\\U, \\P, \\p, \\N, and \\X. In fact these are implemented by Perl's general
|
||||
string-handling and are not part of its pattern matching engine. If any of
|
||||
these are encountered by PCRE, an error is generated.
|
||||
|
||||
6. PCRE does support the \\Q...\\E escape for quoting substrings. Characters in
|
||||
between are treated as literals. This is slightly different from Perl in that $
|
||||
and @ are also handled as literals inside the quotes. In Perl, they cause
|
||||
variable interpolation (but of course PCRE does not have variables). Note the
|
||||
following examples:
|
||||
|
||||
Pattern PCRE matches Perl matches
|
||||
|
||||
\\Qabc$xyz\\E abc$xyz abc followed by the
|
||||
contents of $xyz
|
||||
\\Qabc\\$xyz\\E abc\\$xyz abc\\$xyz
|
||||
\\Qabc\\E\\$\\Qxyz\\E abc$xyz abc$xyz
|
||||
|
||||
The \\Q...\\E sequence is recognized both inside and outside character classes.
|
||||
|
||||
7. Fairly obviously, PCRE does not support the (?{code}) and (?p{code})
|
||||
constructions. However, there is some experimental support for recursive
|
||||
patterns using the non-Perl items (?R), (?number) and (?P>name). Also, the PCRE
|
||||
"callout" feature allows an external function to be called during pattern
|
||||
matching.
|
||||
|
||||
8. There are some differences that are concerned with the settings of captured
|
||||
strings when part of a pattern is repeated. For example, matching "aba" against
|
||||
the pattern /^(a(b)?)+$/ in Perl leaves $2 unset, but in PCRE it is set to "b".
|
||||
|
||||
9. PCRE provides some extensions to the Perl regular expression facilities:
|
||||
|
||||
(a) Although lookbehind assertions must match fixed length strings, each
|
||||
alternative branch of a lookbehind assertion can match a different length of
|
||||
string. Perl requires them all to have the same length.
|
||||
|
||||
(b) If PCRE_DOLLAR_ENDONLY is set and PCRE_MULTILINE is not set, the $
|
||||
meta-character matches only at the very end of the string.
|
||||
|
||||
(c) If PCRE_EXTRA is set, a backslash followed by a letter with no special
|
||||
meaning is faulted.
|
||||
|
||||
(d) If PCRE_UNGREEDY is set, the greediness of the repetition quantifiers is
|
||||
inverted, that is, by default they are not greedy, but if followed by a
|
||||
question mark they are.
|
||||
|
||||
(e) PCRE_ANCHORED can be used to force a pattern to be tried only at the first
|
||||
matching position in the subject string.
|
||||
|
||||
(f) The PCRE_NOTBOL, PCRE_NOTEOL, and PCRE_NOTEMPTY options for \fBpcre_exec()\fR,
|
||||
and the PCRE_NO_AUTO_CAPTURE option for \fBpcre_compile()\fR, have no Perl equivalents.
|
||||
|
||||
(g) The (?R), (?number), and (?P>name) constructs allow for recursive pattern
|
||||
matching (Perl can do this using the (?p{code}) construct, which PCRE cannot
|
||||
support.)
|
||||
|
||||
(h) PCRE supports named capturing substrings, using the Python syntax.
|
||||
|
||||
(i) PCRE supports the possessive quantifier "++" syntax, taken from Sun's Java
|
||||
package.
|
||||
|
||||
(j) The (R) condition, for testing recursion, is a PCRE extension.
|
||||
|
||||
(k) The callout facility is PCRE-specific.
|
||||
|
||||
.in 0
|
||||
Last updated: 09 December 2003
|
||||
.br
|
||||
Copyright (c) 1997-2003 University of Cambridge.
|
@ -1,130 +0,0 @@
|
||||
.TH PCREGREP 1
|
||||
.SH NAME
|
||||
pcregrep - a grep with Perl-compatible regular expressions.
|
||||
.SH SYNOPSIS
|
||||
.B pcregrep [-Vcfhilnrsuvx] [long options] [pattern] [file1 file2 ...]
|
||||
|
||||
.SH DESCRIPTION
|
||||
.rs
|
||||
.sp
|
||||
\fBpcregrep\fR searches files for character patterns, in the same way as other
|
||||
grep commands do, but it uses the PCRE regular expression library to support
|
||||
patterns that are compatible with the regular expressions of Perl 5. See
|
||||
.\" HREF
|
||||
\fBpcrepattern\fR
|
||||
.\"
|
||||
for a full description of syntax and semantics of the regular expressions that
|
||||
PCRE supports.
|
||||
|
||||
A pattern must be specified on the command line unless the \fB-f\fR option is
|
||||
used (see below).
|
||||
|
||||
If no files are specified, \fBpcregrep\fR reads the standard input. By default,
|
||||
each line that matches the pattern is copied to the standard output, and if
|
||||
there is more than one file, the file name is printed before each line of
|
||||
output. However, there are options that can change how \fBpcregrep\fR behaves.
|
||||
|
||||
Lines are limited to BUFSIZ characters. BUFSIZ is defined in \fB<stdio.h>\fR.
|
||||
The newline character is removed from the end of each line before it is matched
|
||||
against the pattern.
|
||||
|
||||
.SH OPTIONS
|
||||
.rs
|
||||
.sp
|
||||
.TP 10
|
||||
\fB-V\fR
|
||||
Write the version number of the PCRE library being used to the standard error
|
||||
stream.
|
||||
.TP
|
||||
\fB-c\fR
|
||||
Do not print individual lines; instead just print a count of the number of
|
||||
lines that would otherwise have been printed. If several files are given, a
|
||||
count is printed for each of them.
|
||||
.TP
|
||||
\fB-f\fR\fIfilename\fR
|
||||
Read a number of patterns from the file, one per line, and match all of them
|
||||
against each line of input. A line is output if any of the patterns match it.
|
||||
When \fB-f\fR is used, no pattern is taken from the command line; all arguments
|
||||
are treated as file names. There is a maximum of 100 patterns. Trailing white
|
||||
space is removed, and blank lines are ignored. An empty file contains no
|
||||
patterns and therefore matches nothing.
|
||||
.TP
|
||||
\fB-h\fR
|
||||
Suppress printing of filenames when searching multiple files.
|
||||
.TP
|
||||
\fB-i\fR
|
||||
Ignore upper/lower case distinctions during comparisons.
|
||||
.TP
|
||||
\fB-l\fR
|
||||
Instead of printing lines from the files, just print the names of the files
|
||||
containing lines that would have been printed. Each file name is printed
|
||||
once, on a separate line.
|
||||
.TP
|
||||
\fB-n\fR
|
||||
Precede each line by its line number in the file.
|
||||
.TP
|
||||
\fB-r\fR
|
||||
If any file is a directory, recursively scan the files it contains. Without
|
||||
\fB-r\fR a directory is scanned as a normal file.
|
||||
.TP
|
||||
\fB-s\fR
|
||||
Work silently, that is, display nothing except error messages.
|
||||
The exit status indicates whether any matches were found.
|
||||
.TP
|
||||
\fB-u\fR
|
||||
Operate in UTF-8 mode. This option is available only if PCRE has been compiled
|
||||
with UTF-8 support. Both the pattern and each subject line are assumed to be
|
||||
valid strings of UTF-8 characters.
|
||||
.TP
|
||||
\fB-v\fR
|
||||
Invert the sense of the match, so that lines which do \fInot\fR match the
|
||||
pattern are now the ones that are found.
|
||||
.TP
|
||||
\fB-x\fR
|
||||
Force the pattern to be anchored (it must start matching at the beginning of
|
||||
the line) and in addition, require it to match the entire line. This is
|
||||
equivalent to having ^ and $ characters at the start and end of each
|
||||
alternative branch in the regular expression.
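The -x description matters for patterns with alternatives. Writing ^cat|dog$ anchors only the first and last branches, whereas wrapping the whole pattern in a non-capturing group anchors every branch. The sketch below builds such a wrapped pattern. wrap_whole_line() is a hypothetical helper, and pcregrep itself may implement -x differently.

```c
#include <stdio.h>
#include <string.h>

/* Build "^(?:PATTERN)$" in out. Returns 0 on success, or -1 if the
   result does not fit in outsize bytes. */
static int wrap_whole_line(const char *pattern, char *out, size_t outsize)
{
    int n = snprintf(out, outsize, "^(?:%s)$", pattern);
    return (n < 0 || (size_t)n >= outsize) ? -1 : 0;
}
```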
|
||||
|
||||
.SH LONG OPTIONS
|
||||
.rs
|
||||
.sp
|
||||
Long forms of all the options are available, as in GNU grep. They are shown in
|
||||
the following table:
|
||||
|
||||
-c --count
|
||||
-h --no-filename
|
||||
-i --ignore-case
|
||||
-l --files-with-matches
|
||||
-n --line-number
|
||||
-r --recursive
|
||||
-s --no-messages
|
||||
-u --utf-8
|
||||
-V --version
|
||||
-v --invert-match
|
||||
-x --line-regex
|
||||
-x --line-regexp
|
||||
|
||||
In addition, --file=\fIfilename\fR is equivalent to -f\fIfilename\fR, and
|
||||
--help shows the list of options and then exits.
|
||||
|
||||
.SH DIAGNOSTICS
|
||||
.rs
|
||||
.sp
|
||||
Exit status is 0 if any matches were found, 1 if no matches were found, and 2
|
||||
for syntax errors or inaccessible files (even if matches were found).
|
||||
|
||||
|
||||
.SH AUTHOR
|
||||
.rs
|
||||
.sp
|
||||
Philip Hazel <ph10@cam.ac.uk>
|
||||
.br
|
||||
University Computing Service
|
||||
.br
|
||||
Cambridge CB2 3QG, England.
|
||||
|
||||
.in 0
|
||||
Last updated: 03 February 2003
|
||||
.br
|
||||
Copyright (c) 1997-2003 University of Cambridge.
|
@ -1,124 +0,0 @@
|
||||
PCREGREP(1) PCREGREP(1)
|
||||
|
||||
|
||||
|
||||
NAME
|
||||
pcregrep - a grep with Perl-compatible regular expressions.
|
||||
|
||||
SYNOPSIS
|
||||
pcregrep [-Vcfhilnrsuvx] [long options] [pattern] [file1 file2 ...]
|
||||
|
||||
|
||||
DESCRIPTION
|
||||
|
||||
pcregrep searches files for character patterns, in the same way as
|
||||
other grep commands do, but it uses the PCRE regular expression library
|
||||
to support patterns that are compatible with the regular expressions of
|
||||
Perl 5. See pcrepattern for a full description of syntax and semantics
|
||||
of the regular expressions that PCRE supports.
|
||||
|
||||
A pattern must be specified on the command line unless the -f option is
|
||||
used (see below).
|
||||
|
||||
If no files are specified, pcregrep reads the standard input. By
|
||||
default, each line that matches the pattern is copied to the standard
|
||||
output, and if there is more than one file, the file name is printed
|
||||
before each line of output. However, there are options that can change
|
||||
how pcregrep behaves.
|
||||
|
||||
Lines are limited to BUFSIZ characters. BUFSIZ is defined in <stdio.h>.
|
||||
The newline character is removed from the end of each line before it is
|
||||
matched against the pattern.
|
||||
|
||||
|
||||
OPTIONS
|
||||
|
||||
|
||||
-V Write the version number of the PCRE library being used to
|
||||
the standard error stream.
|
||||
|
||||
-c Do not print individual lines; instead just print a count of
|
||||
the number of lines that would otherwise have been printed.
|
||||
If several files are given, a count is printed for each of
|
||||
them.
|
||||
|
||||
-ffilename
|
||||
Read a number of patterns from the file, one per line, and
|
||||
match all of them against each line of input. A line is out-
|
||||
put if any of the patterns match it. When -f is used, no
|
||||
pattern is taken from the command line; all arguments are
|
||||
treated as file names. There is a maximum of 100 patterns.
|
||||
Trailing white space is removed, and blank lines are ignored.
|
||||
An empty file contains no patterns and therefore matches
|
||||
nothing.
|
||||
|
||||
-h Suppress printing of filenames when searching multiple files.
|
||||
|
||||
-i Ignore upper/lower case distinctions during comparisons.
|
||||
|
||||
-l Instead of printing lines from the files, just print the
|
||||
names of the files containing lines that would have been
|
||||
printed. Each file name is printed once, on a separate line.
|
||||
|
||||
-n Precede each line by its line number in the file.
|
||||
|
||||
-r If any file is a directory, recursively scan the files it
|
||||
contains. Without -r a directory is scanned as a normal file.
|
||||
|
||||
-s Work silently, that is, display nothing except error mes-
|
||||
sages. The exit status indicates whether any matches were
|
||||
found.
|
||||
|
||||
-u Operate in UTF-8 mode. This option is available only if PCRE
|
||||
has been compiled with UTF-8 support. Both the pattern and
|
||||
each subject line are assumed to be valid strings of UTF-8
|
||||
characters.
|
||||
|
||||
-v Invert the sense of the match, so that lines which do not
|
||||
match the pattern are now the ones that are found.
|
||||
|
||||
-x Force the pattern to be anchored (it must start matching at
|
||||
the beginning of the line) and in addition, require it to
|
||||
match the entire line. This is equivalent to having ^ and $
|
||||
characters at the start and end of each alternative branch in
|
||||
the regular expression.
|
||||
|
||||
|
||||
LONG OPTIONS
|
||||
|
||||
Long forms of all the options are available, as in GNU grep. They are
|
||||
shown in the following table:
|
||||
|
||||
-c --count
|
||||
-h --no-filename
|
||||
-i --ignore-case
|
||||
-l --files-with-matches
|
||||
-n --line-number
|
||||
-r --recursive
|
||||
-s --no-messages
|
||||
-u --utf-8
|
||||
-V --version
|
||||
-v --invert-match
|
||||
-x --line-regex
|
||||
-x --line-regexp
|
||||
|
||||
In addition, --file=filename is equivalent to -ffilename, and --help
|
||||
shows the list of options and then exits.
|
||||
|
||||
|
||||
DIAGNOSTICS
|
||||
|
||||
Exit status is 0 if any matches were found, 1 if no matches were found,
|
||||
and 2 for syntax errors or inaccessible files (even if matches were
|
||||
found).
|
||||
|
||||
|
||||
|
||||
AUTHOR
|
||||
|
||||
Philip Hazel <ph10@cam.ac.uk>
|
||||
University Computing Service
|
||||
Cambridge CB2 3QG, England.
|
||||
|
||||
Last updated: 03 February 2003
|
||||
Copyright (c) 1997-2003 University of Cambridge.
|
File diff suppressed because it is too large
@ -1,66 +0,0 @@
|
||||
.TH PCRE 3
|
||||
.SH NAME
|
||||
PCRE - Perl-compatible regular expressions
|
||||
.SH PCRE PERFORMANCE
|
||||
.rs
|
||||
.sp
|
||||
Certain items that may appear in regular expression patterns are more efficient
|
||||
than others. It is more efficient to use a character class like [aeiou] than a
|
||||
set of alternatives such as (a|e|i|o|u). In general, the simplest construction
|
||||
that provides the required behaviour is usually the most efficient. Jeffrey
|
||||
Friedl's book contains a lot of discussion about optimizing regular expressions
|
||||
for efficient performance.
|
||||
|
||||
When a pattern begins with .* not in parentheses, or in parentheses that are
|
||||
not the subject of a backreference, and the PCRE_DOTALL option is set, the
|
||||
pattern is implicitly anchored by PCRE, since it can match only at the start of
|
||||
a subject string. However, if PCRE_DOTALL is not set, PCRE cannot make this
|
||||
optimization, because the . metacharacter does not then match a newline, and if
|
||||
the subject string contains newlines, the pattern may match from the character
|
||||
immediately following one of them instead of from the very start. For example,
|
||||
the pattern
|
||||
|
||||
.*second
|
||||
|
||||
matches the subject "first\\nand second" (where \\n stands for a newline
|
||||
character), with the match starting at the seventh character. In order to do
|
||||
this, PCRE has to retry the match starting after every newline in the subject.
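The extra work comes from the number of places a match can start. The sketch below lists the candidate start offsets for an unanchored pattern that begins with .* when PCRE_DOTALL is not set: offset 0, plus the offset just after each newline. With PCRE_DOTALL or an explicit ^.*, only offset 0 is needed. dotstar_start_points() is a hypothetical illustration, not a PCRE function.

```c
#include <stddef.h>

/* Store up to max candidate start offsets in starts and return the
   total number of candidates. */
static size_t dotstar_start_points(const char *subject, size_t len,
                                   size_t *starts, size_t max)
{
    size_t n = 0;
    for (size_t i = 0; i <= len; i++) {
        if (i == 0 || subject[i - 1] == '\n') {
            if (n < max) starts[n] = i;
            n++;
        }
    }
    return n;
}
```

For "first\\nand second" this yields offsets 0 and 6. Offset 6 is the seventh character, where the match in the example begins.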
|
||||
|
||||
If you are using such a pattern with subject strings that do not contain
|
||||
newlines, the best performance is obtained by setting PCRE_DOTALL, or starting
|
||||
the pattern with ^.* to indicate explicit anchoring. That saves PCRE from
|
||||
having to scan along the subject looking for a newline to restart at.
|
||||
|
||||
Beware of patterns that contain nested indefinite repeats. These can take a
|
||||
long time to run when applied to a string that does not match. Consider the
|
||||
pattern fragment
|
||||
|
||||
(a+)*
|
||||
|
||||
When matching starts at the first character, this can match "aaaa" in 16 different
|
||||
ways, and this number doubles with each additional "a". (The * repeat can match 0,
|
||||
1, 2, 3, or 4 times, and for each of those cases other than 0 and 4, the + repeats can match
|
||||
different numbers of times.) When the remainder of the pattern is such that the
|
||||
entire match is going to fail, PCRE has in principle to try every possible
|
||||
variation, and this can take an extremely long time.
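The growth can be checked by counting. The function below counts the ways (a+)* can match when matching starts at the first character of a run of n "a" characters. At each step, either the * stops, or one (a+) iteration consumes 1 to n characters and the remainder is counted recursively. The result is 2^n, so each extra character doubles the work. count_ways() is an illustrative helper, not part of PCRE.

```c
/* Ways (a+)* can match at the start of a run of n 'a' characters. */
static unsigned long count_ways(int n)
{
    unsigned long total = 1;           /* the * stops here */
    for (int take = 1; take <= n; take++)
        total += count_ways(n - take); /* one (a+) takes `take` chars */
    return total;
}
```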
|
||||
|
||||
An optimization catches some of the more simple cases such as
|
||||
|
||||
(a+)*b
|
||||
|
||||
where a literal character follows. Before embarking on the standard matching
|
||||
procedure, PCRE checks that there is a "b" later in the subject string, and if
|
||||
there is not, it fails the match immediately. However, when there is no
|
||||
following literal this optimization cannot be used. You can see the difference
|
||||
by comparing the behaviour of
|
||||
|
||||
(a+)*\\d
|
||||
|
||||
with the pattern above. When applied to a whole line of "a" characters, (a+)*b
|
||||
fails almost instantly, whereas (a+)*\\d takes an appreciable time with strings
|
||||
longer than about 20 characters.
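The required-character optimization amounts to a cheap scan that runs before the expensive search. A minimal sketch, using the standard \fBmemchr()\fR, follows. required_char_present() is a hypothetical name, and PCRE's internal check may differ in detail.

```c
#include <string.h>

/* For a pattern such as (a+)*b: if the required literal does not occur
   at or after start, no match is possible and backtracking is skipped. */
static int required_char_present(const char *subject, size_t len,
                                 size_t start, char required)
{
    if (start >= len)
        return 0;
    return memchr(subject + start, required, len - start) != NULL;
}
```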
|
||||
|
||||
.in 0
|
||||
Last updated: 03 February 2003
|
||||
.br
|
||||
Copyright (c) 1997-2003 University of Cambridge.
|
@ -1,194 +0,0 @@
|
||||
.TH PCRE 3
|
||||
.SH NAME
|
||||
PCRE - Perl-compatible regular expressions.
|
||||
.SH SYNOPSIS OF POSIX API
|
||||
.B #include <pcreposix.h>
|
||||
.PP
|
||||
.SM
|
||||
.br
|
||||
.B int regcomp(regex_t *\fIpreg\fR, const char *\fIpattern\fR,
|
||||
.ti +5n
|
||||
.B int \fIcflags\fR);
|
||||
.PP
|
||||
.br
|
||||
.B int regexec(regex_t *\fIpreg\fR, const char *\fIstring\fR,
|
||||
.ti +5n
|
||||
.B size_t \fInmatch\fR, regmatch_t \fIpmatch\fR[], int \fIeflags\fR);
|
||||
.PP
|
||||
.br
|
||||
.B size_t regerror(int \fIerrcode\fR, const regex_t *\fIpreg\fR,
|
||||
.ti +5n
|
||||
.B char *\fIerrbuf\fR, size_t \fIerrbuf_size\fR);
|
||||
.PP
|
||||
.br
|
||||
.B void regfree(regex_t *\fIpreg\fR);
|
||||
|
||||
.SH DESCRIPTION
|
||||
.rs
|
||||
.sp
|
||||
This set of functions provides a POSIX-style API to the PCRE regular expression
|
||||
package. See the
|
||||
.\" HREF
|
||||
\fBpcreapi\fR
|
||||
.\"
|
||||
documentation for a description of the native API, which contains additional
|
||||
functionality.
|
||||
|
||||
The functions described here are just wrapper functions that ultimately call
|
||||
the PCRE native API. Their prototypes are defined in the \fBpcreposix.h\fR
|
||||
header file, and on Unix systems the library itself is called
|
||||
\fBpcreposix.a\fR, so it can be accessed by adding \fB-lpcreposix\fR to the
|
||||
command for linking an application which uses them. Because the POSIX functions
|
||||
call the native ones, it is also necessary to add \fB-lpcre\fR.
|
||||
|
||||
I have implemented only those option bits that can be reasonably mapped to PCRE
|
||||
native options. In addition, the options REG_EXTENDED and REG_NOSUB are defined
|
||||
with the value zero. They have no effect, but since programs that are written
|
||||
to the POSIX interface often use them, this makes it easier to slot in PCRE as
|
||||
a replacement library. Other POSIX options are not even defined.
|
||||
|
||||
When PCRE is called via these functions, it is only the API that is POSIX-like
|
||||
in style. The syntax and semantics of the regular expressions themselves are
|
||||
still those of Perl, subject to the setting of various PCRE options, as
|
||||
described below. "POSIX-like in style" means that the API approximates to the
|
||||
POSIX definition; it is not fully POSIX-compatible, and in multi-byte encoding
|
||||
domains it is probably even less compatible.
|
||||
|
||||
The header for these functions is supplied as \fBpcreposix.h\fR to avoid any
|
||||
potential clash with other POSIX libraries. It can, of course, be renamed or
|
||||
aliased as \fBregex.h\fR, which is the "correct" name. It provides two
|
||||
structure types, \fIregex_t\fR for compiled internal forms, and
|
||||
\fIregmatch_t\fR for returning captured substrings. It also defines some
|
||||
constants whose names start with "REG_"; these are used for setting options and
|
||||
identifying error codes.
|
||||
|
||||
.SH COMPILING A PATTERN
|
||||
.rs
|
||||
.sp
|
||||
The function \fBregcomp()\fR is called to compile a pattern into an
|
||||
internal form. The pattern is a C string terminated by a binary zero, and
|
||||
is passed in the argument \fIpattern\fR. The \fIpreg\fR argument is a pointer
|
||||
to a regex_t structure which is used as a base for storing information about
|
||||
the compiled expression.
|
||||
|
||||
The argument \fIcflags\fR is either zero, or contains one or more of the bits
|
||||
defined by the following macros:
|
||||
|
||||
REG_ICASE
|
||||
|
||||
The PCRE_CASELESS option is set when the expression is passed for compilation
|
||||
to the native function.
|
||||
|
||||
REG_NEWLINE
|
||||
|
||||
The PCRE_MULTILINE option is set when the expression is passed for compilation
|
||||
to the native function. Note that this does \fInot\fR mimic the defined POSIX
|
||||
behaviour for REG_NEWLINE (see the following section).
|
||||
|
||||
In the absence of these flags, no options are passed to the native function.
|
||||
This means that the regex is compiled with PCRE default semantics. In
|
||||
particular, the way it handles newline characters in the subject string is the
|
||||
Perl way, not the POSIX way. Note that setting PCRE_MULTILINE has only
|
||||
\fIsome\fR of the effects specified for REG_NEWLINE. It does not affect the way
|
||||
newlines are matched by . (they aren't) or by a negative class such as [^a]
|
||||
(they are).
|
||||
|
||||
The yield of \fBregcomp()\fR is zero on success, and non-zero otherwise. The
|
||||
\fIpreg\fR structure is filled in on success, and one member of the structure
|
||||
is public: \fIre_nsub\fR contains the number of capturing subpatterns in
|
||||
the regular expression. Various error codes are defined in the header file.
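As a minimal sketch of the compile/free cycle, the following C function compiles a pattern, reads the public re_nsub member, and releases the compiled form. It is written against the system <regex.h> so that it builds without PCRE; with PCRE you would include pcreposix.h instead and link with -lpcreposix -lpcre as described above. The helper name count_subpatterns is hypothetical. REG_EXTENDED is passed because a system POSIX library needs it for grouping and alternation syntax; the PCRE wrapper defines it as zero and ignores it.

```c
#include <regex.h>

/* Compile `pattern` with `cflags`, return the number of capturing
   subpatterns (the public re_nsub member), then free the compiled form.
   Returns -1 if regcomp() reports an error (any non-zero yield). */
int count_subpatterns(const char *pattern, int cflags)
{
    regex_t re;
    if (regcomp(&re, pattern, cflags) != 0)
        return -1;              /* this sketch does not call regfree() after a failed compile */
    int n = (int)re.re_nsub;
    regfree(&re);               /* `re` must not be used as a compiled expression after this */
    return n;
}
```

For example, a pattern with two parenthesized groups yields 2, and an unbalanced parenthesis makes regcomp() fail, so the helper returns -1.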
|
||||
|
||||
.SH MATCHING NEWLINE CHARACTERS
|
||||
.rs
|
||||
.sp
|
||||
This area is not simple, because POSIX and Perl take different views of things.
|
||||
It is not possible to get PCRE to obey POSIX semantics, but then PCRE was never
|
||||
intended to be a POSIX engine. The following table lists the different
|
||||
possibilities for matching newline characters in PCRE:
|
||||
|
||||
                             Default   Change with
|
||||
|
||||
  . matches newline          no        PCRE_DOTALL
|
||||
  newline matches [^a]       yes       not changeable
|
||||
  $ matches \\n at end        yes       PCRE_DOLLAR_ENDONLY
|
||||
  $ matches \\n in middle     no        PCRE_MULTILINE
|
||||
  ^ matches \\n in middle     no        PCRE_MULTILINE
|
||||
|
||||
This is the equivalent table for POSIX:
|
||||
|
||||
                             Default   Change with
|
||||
|
||||
  . matches newline          yes       REG_NEWLINE
|
||||
  newline matches [^a]       yes       REG_NEWLINE
|
||||
  $ matches \\n at end        no        REG_NEWLINE
|
||||
  $ matches \\n in middle     no        REG_NEWLINE
|
||||
  ^ matches \\n in middle     no        REG_NEWLINE
|
||||
|
||||
PCRE's behaviour is the same as Perl's, except that there is no equivalent for
|
||||
PCRE_DOLLAR_ENDONLY in Perl. In both PCRE and Perl, there is no way to stop
|
||||
newline from matching [^a].
|
||||
|
||||
The default POSIX newline handling can be obtained by setting PCRE_DOTALL and
|
||||
PCRE_DOLLAR_ENDONLY, but there is no way to make PCRE behave exactly as POSIX
|
||||
specifies for REG_NEWLINE.
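To see the POSIX column of the second table in action, the sketch below runs a few patterns through a POSIX-conforming system regex library (such as the one in glibc) with and without REG_NEWLINE. It does not use PCRE, which, as explained above, follows the first table instead; the helper posix_matches is hypothetical.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Return true if `pattern` matches anywhere in `subject` when compiled with
   `cflags` (REG_EXTENDED and REG_NOSUB are always added). A pattern that
   fails to compile is reported as "no match". */
bool posix_matches(const char *pattern, const char *subject, int cflags)
{
    regex_t re;
    if (regcomp(&re, pattern, cflags | REG_EXTENDED | REG_NOSUB) != 0)
        return false;
    /* With REG_NOSUB no match offsets are needed, so nmatch is 0. */
    int rc = regexec(&re, subject, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

With a conforming library, "a.b" and "a[^x]b" match a subject that has a newline between a and b only when REG_NEWLINE is absent, while "^b" and "a$" match around that newline only when REG_NEWLINE is set.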
|
||||
|
||||
.SH MATCHING A PATTERN
|
||||
.rs
|
||||
.sp
|
||||
The function \fBregexec()\fR is called to match a pre-compiled pattern
|
||||
\fIpreg\fR against a given \fIstring\fR, which is terminated by a zero byte,
|
||||
subject to the options in \fIeflags\fR. These can be:
|
||||
|
||||
REG_NOTBOL
|
||||
|
||||
The PCRE_NOTBOL option is set when calling the underlying PCRE matching
|
||||
function.
|
||||
|
||||
REG_NOTEOL
|
||||
|
||||
The PCRE_NOTEOL option is set when calling the underlying PCRE matching
|
||||
function.
|
||||
|
||||
The portion of the string that was matched, and also any captured substrings,
|
||||
are returned via the \fIpmatch\fR argument, which points to an array of
|
||||
\fInmatch\fR structures of type \fIregmatch_t\fR, containing the members
|
||||
\fIrm_so\fR and \fIrm_eo\fR. These contain the offset to the first character of
|
||||
each substring and the offset to the first character after the end of each
|
||||
substring, respectively. The 0th element of the vector relates to the entire
|
||||
portion of \fIstring\fR that was matched; subsequent elements relate to the
|
||||
capturing subpatterns of the regular expression. Unused entries in the array
|
||||
have both structure members set to -1.
|
||||
|
||||
A successful match yields a zero return; various error codes are defined in the
|
||||
header file, of which REG_NOMATCH is the "expected" failure code.
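A minimal sketch of reading captured substrings from the regmatch_t array follows, again written against the system <regex.h> (PCRE's wrapper exposes the same structure). The helper extract_group and the fixed limit of ten entries are assumptions for illustration. Because a system POSIX library has no digit escape, the sketch uses the bracket expression [0-9] for digits.

```c
#include <regex.h>
#include <stdio.h>
#include <string.h>

enum { MAX_GROUPS = 10 };   /* whole match plus up to 9 captures */

/* Copy substring number `group` of the first match of `pattern` in `subject`
   into `out` (truncated to fit `outsize`). Returns the substring length, or
   -1 if compilation fails, there is no match, or the group is unset. */
int extract_group(const char *pattern, const char *subject, size_t group,
                  char *out, size_t outsize)
{
    regex_t re;
    regmatch_t pmatch[MAX_GROUPS];
    int len = -1;

    if (group >= MAX_GROUPS || outsize == 0 ||
        regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    /* Element 0 is the whole match; unused elements have rm_so == -1. */
    if (regexec(&re, subject, MAX_GROUPS, pmatch, 0) == 0 &&
        pmatch[group].rm_so != -1) {
        /* rm_so: offset of the first character; rm_eo: offset just past the end. */
        len = (int)(pmatch[group].rm_eo - pmatch[group].rm_so);
        snprintf(out, outsize, "%.*s", len, subject + pmatch[group].rm_so);
    }
    regfree(&re);
    return len;
}
```

For "^abc([0-9]+)" against "abc123", element 0 covers "abc123" and element 1 covers "123". With "(a)|(b)" against "b", element 1 is unused and reports -1.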
|
||||
|
||||
.SH ERROR MESSAGES
|
||||
.rs
|
||||
.sp
|
||||
The \fBregerror()\fR function maps a non-zero error code from either
|
||||
\fBregcomp()\fR or \fBregexec()\fR to a printable message. If \fIpreg\fR is not
|
||||
NULL, the error should have arisen from the use of that structure. A message
|
||||
terminated by a binary zero is placed in \fIerrbuf\fR. The length of the
|
||||
message, including the zero, is limited to \fIerrbuf_size\fR. The yield of the
|
||||
function is the size of buffer needed to hold the whole message.
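Because regerror() returns the buffer size needed for the whole message, a caller can size the buffer exactly. The sketch below, with a hypothetical helper regex_error_message, first calls regerror() with a zero-length buffer (POSIX specifies that errbuf is then ignored) and then allocates and fills a buffer of the returned size. It uses the system <regex.h>.

```c
#include <regex.h>
#include <stdlib.h>

/* Return a malloc()ed copy of the message for `errcode` (a non-zero yield
   of regcomp() or regexec()), or NULL if allocation fails. The caller must
   free() the result. */
char *regex_error_message(int errcode, const regex_t *preg)
{
    /* With errbuf_size == 0, regerror() only reports the size needed,
       including the terminating zero. */
    size_t needed = regerror(errcode, preg, NULL, 0);
    char *msg = malloc(needed);
    if (msg != NULL)
        regerror(errcode, preg, msg, needed);
    return msg;
}
```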
|
||||
|
||||
.SH STORAGE
|
||||
.rs
|
||||
.sp
|
||||
Compiling a regular expression causes memory to be allocated and associated
|
||||
with the \fIpreg\fR structure. The function \fBregfree()\fR frees all such
|
||||
memory, after which \fIpreg\fR may no longer be used as a compiled expression.
|
||||
|
||||
.SH AUTHOR
|
||||
.rs
|
||||
.sp
|
||||
Philip Hazel <ph10@cam.ac.uk>
|
||||
.br
|
||||
University Computing Service,
|
||||
.br
|
||||
Cambridge CB2 3QG, England.
|
||||
|
||||
.in 0
|
||||
Last updated: 03 February 2003
|
||||
.br
|
||||
Copyright (c) 1997-2003 University of Cambridge.
|
@ -1,52 +0,0 @@
|
||||
.TH PCRE 3
|
||||
.SH NAME
|
||||
PCRE - Perl-compatible regular expressions
|
||||
.SH PCRE SAMPLE PROGRAM
|
||||
.rs
|
||||
.sp
|
||||
A simple, complete demonstration program, to get you started with using PCRE,
|
||||
is supplied in the file \fIpcredemo.c\fR in the PCRE distribution.
|
||||
|
||||
The program compiles the regular expression that is its first argument, and
|
||||
matches it against the subject string in its second argument. No PCRE options
|
||||
are set, and default character tables are used. If matching succeeds, the
|
||||
program outputs the portion of the subject that matched, together with the
|
||||
contents of any captured substrings.
|
||||
|
||||
If the -g option is given on the command line, the program then goes on to
|
||||
check for further matches of the same regular expression in the same subject
|
||||
string. The logic is a little bit tricky because of the possibility of matching
|
||||
an empty string. Comments in the code explain what is going on.
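The empty-match problem mentioned above can be illustrated with a simplified global-matching loop. This is not the pcredemo source: it uses the POSIX API from <regex.h>, which has no counterpart to the PCRE_NOTEMPTY and PCRE_ANCHORED retry that pcredemo performs, so after an empty match it simply moves on by one character. The function count_matches is hypothetical, and its results may differ from pcredemo for patterns where that retry would matter.

```c
#include <regex.h>
#include <stddef.h>

/* Count successive matches of `pattern` in `subject`, or return -1 if the
   pattern does not compile. After an empty match the loop advances one
   byte so that it cannot match the same empty string forever. */
int count_matches(const char *pattern, const char *subject)
{
    regex_t re;
    regmatch_t m;
    int count = 0;
    size_t offset = 0;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    for (;;) {
        /* REG_NOTBOL stops ^ from matching at a restart point inside the string. */
        int eflags = offset > 0 ? REG_NOTBOL : 0;
        if (regexec(&re, subject + offset, 1, &m, eflags) != 0)
            break;
        count++;
        if (m.rm_eo == m.rm_so) {              /* empty match */
            if (subject[offset + m.rm_eo] == '\0')
                break;                         /* nothing left to step over */
            offset += m.rm_eo + 1;
        } else {
            offset += m.rm_eo;                 /* resume after a non-empty match */
        }
    }
    regfree(&re);
    return count;
}
```

For "a*" against "baaac" this loop finds four matches: an empty one before b, then "aaa", then empty ones before c and at the end of the string.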
|
||||
|
||||
On a Unix system that has PCRE installed in \fI/usr/local\fR, you can compile
|
||||
the demonstration program using a command like this:
|
||||
|
||||
gcc -o pcredemo pcredemo.c -I/usr/local/include \\
|
||||
-L/usr/local/lib -lpcre
|
||||
|
||||
Then you can run simple tests like this:
|
||||
|
||||
./pcredemo 'cat|dog' 'the cat sat on the mat'
|
||||
./pcredemo -g 'cat|dog' 'the dog sat on the cat'
|
||||
|
||||
Note that there is a much more comprehensive test program, called
|
||||
\fBpcretest\fR, which supports many more facilities for testing regular
|
||||
expressions and the PCRE library. The \fBpcredemo\fR program is provided as a
|
||||
simple coding example.
|
||||
|
||||
On some operating systems (e.g. Solaris) you may get an error like this when
|
||||
you try to run \fBpcredemo\fR:
|
||||
|
||||
ld.so.1: a.out: fatal: libpcre.so.0: open failed: No such file or directory
|
||||
|
||||
This is caused by the way shared library support works on those systems. You
|
||||
need to add
|
||||
|
||||
-R/usr/local/lib
|
||||
|
||||
to the compile command to get round this problem.
|
||||
|
||||
.in 0
|
||||
Last updated: 28 January 2003
|
||||
.br
|
||||
Copyright (c) 1997-2003 University of Cambridge.
|
@ -1,364 +0,0 @@
|
||||
.TH PCRETEST 1
|
||||
.SH NAME
|
||||
pcretest - a program for testing Perl-compatible regular expressions.
|
||||
.SH SYNOPSIS
|
||||
.B pcretest "[-C] [-d] [-i] [-m] [-o osize] [-p] [-t] [source] [destination]"
|
||||
|
||||
\fBpcretest\fR was written as a test program for the PCRE regular expression
|
||||
library itself, but it can also be used for experimenting with regular
|
||||
expressions. This document describes the features of the test program; for
|
||||
details of the regular expressions themselves, see the
|
||||
.\" HREF
|
||||
\fBpcrepattern\fR
|
||||
.\"
|
||||
documentation. For details of PCRE and its options, see the
|
||||
.\" HREF
|
||||
\fBpcreapi\fR
|
||||
.\"
|
||||
documentation.
|
||||
|
||||
.SH OPTIONS
|
||||
.rs
|
||||
.sp
|
||||
.TP 10
|
||||
\fB-C\fR
|
||||
Output the version number of the PCRE library, and all available information
|
||||
about the optional features that are included, and then exit.
|
||||
.TP 10
|
||||
\fB-d\fR
|
||||
Behave as if each regex had the \fB/D\fR modifier (see below); the internal
|
||||
form is output after compilation.
|
||||
.TP 10
|
||||
\fB-i\fR
|
||||
Behave as if each regex had the \fB/I\fR modifier; information about the
|
||||
compiled pattern is given after compilation.
|
||||
.TP 10
|
||||
\fB-m\fR
|
||||
Output the size of each compiled pattern after it has been compiled. This is
|
||||
equivalent to adding \fB/M\fR to each regular expression. For compatibility with
|
||||
earlier versions of \fBpcretest\fR, \fB-s\fR is a synonym for \fB-m\fR.
|
||||
.TP 10
|
||||
\fB-o\fR \fIosize\fR
|
||||
Set the number of elements in the output vector that is used when calling PCRE
|
||||
to be \fIosize\fR. The default value is 45, which is enough for 14 capturing
|
||||
subexpressions. The vector size can be changed for individual matching calls by
|
||||
including \\O in the data line (see below).
|
||||
.TP 10
|
||||
\fB-p\fR
|
||||
Behave as if each regex had the \fB/P\fR modifier; the POSIX wrapper API is used
|
||||
to call PCRE. None of the other options has any effect when \fB-p\fR is set.
|
||||
.TP 10
|
||||
\fB-t\fR
|
||||
Run each compile, study, and match many times with a timer, and output
|
||||
the resulting time per compile or match (in milliseconds). Do not set \fB-t\fR with
|
||||
\fB-m\fR, because you will then get the size output 20000 times and the timing
|
||||
will be distorted.
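The default of 45 for \fB-o\fR follows from how \fBpcre_exec()\fR uses its output vector, as described in the pcreapi documentation: the first two-thirds hold pairs of start and end offsets, and the final third is working space. The following sketch of that arithmetic uses hypothetical helper names.

```c
/* A vector of `osize` ints reports at most osize/3 offset pairs: one for
   the whole match plus osize/3 - 1 capturing subpatterns. */
int max_captures_for_osize(int osize)
{
    return osize / 3 - 1;
}

/* Smallest vector size that can report `ncaptures` captured substrings
   in addition to the whole match. */
int osize_for_captures(int ncaptures)
{
    return 3 * (ncaptures + 1);
}
```

So 45 ints give 15 pairs, which is the whole match plus 14 capturing subexpressions.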
|
||||
|
||||
.SH DESCRIPTION
|
||||
.rs
|
||||
.sp
|
||||
If \fBpcretest\fR is given two filename arguments, it reads from the first and
|
||||
writes to the second. If it is given only one filename argument, it reads from
|
||||
that file and writes to stdout. Otherwise, it reads from stdin and writes to
|
||||
stdout, and prompts for each line of input, using "re>" to prompt for regular
|
||||
expressions, and "data>" to prompt for data lines.
|
||||
|
||||
The program handles any number of sets of input on a single input file. Each
|
||||
set starts with a regular expression, and continues with any number of data
|
||||
lines to be matched against the pattern.
|
||||
|
||||
Each line is matched separately and independently. If you want to do
|
||||
multiple-line matches, you have to use the \\n escape sequence in a single line
|
||||
of input to encode the newline characters. The maximum length of a data line is
|
||||
30,000 characters.
|
||||
|
||||
An empty line signals the end of the data lines, at which point a new regular
|
||||
expression is read. The regular expressions are given enclosed in any
|
||||
non-alphameric delimiters other than backslash, for example
|
||||
|
||||
/(a|bc)x+yz/
|
||||
|
||||
White space before the initial delimiter is ignored. A regular expression may
|
||||
be continued over several input lines, in which case the newline characters are
|
||||
included within it. It is possible to include the delimiter within the pattern
|
||||
by escaping it, for example
|
||||
|
||||
/abc\\/def/
|
||||
|
||||
If you do so, the escape and the delimiter form part of the pattern, but since
|
||||
delimiters are always non-alphameric, this does not affect its interpretation.
|
||||
If the terminating delimiter is immediately followed by a backslash, for
|
||||
example,
|
||||
|
||||
/abc/\\
|
||||
|
||||
then a backslash is added to the end of the pattern. This is done to provide a
|
||||
way of testing the error condition that arises if a pattern finishes with a
|
||||
backslash, because
|
||||
|
||||
/abc\\/
|
||||
|
||||
is interpreted as the first line of a pattern that starts with "abc/", causing
|
||||
pcretest to read the next line as a continuation of the regular expression.
|
||||
|
||||
.SH PATTERN MODIFIERS
|
||||
.rs
|
||||
.sp
|
||||
The pattern may be followed by \fBi\fR, \fBm\fR, \fBs\fR, or \fBx\fR to set the
|
||||
PCRE_CASELESS, PCRE_MULTILINE, PCRE_DOTALL, or PCRE_EXTENDED options,
|
||||
respectively. For example:
|
||||
|
||||
/caseless/i
|
||||
|
||||
These modifier letters have the same effect as they do in Perl. There are
|
||||
others that set PCRE options that do not correspond to anything in Perl:
|
||||
\fB/A\fR, \fB/E\fR, \fB/N\fR, \fB/U\fR, and \fB/X\fR set PCRE_ANCHORED,
|
||||
PCRE_DOLLAR_ENDONLY, PCRE_NO_AUTO_CAPTURE, PCRE_UNGREEDY, and PCRE_EXTRA
|
||||
respectively.
|
||||
|
||||
Searching for all possible matches within each subject string can be requested
|
||||
by the \fB/g\fR or \fB/G\fR modifier. After finding a match, PCRE is called
|
||||
again to search the remainder of the subject string. The difference between
|
||||
\fB/g\fR and \fB/G\fR is that the former uses the \fIstartoffset\fR argument to
|
||||
\fBpcre_exec()\fR to start searching at a new point within the entire string
|
||||
(which is in effect what Perl does), whereas the latter passes over a shortened
|
||||
substring. This makes a difference to the matching process if the pattern
|
||||
begins with a lookbehind assertion (including \\b or \\B).
|
||||
|
||||
If any call to \fBpcre_exec()\fR in a \fB/g\fR or \fB/G\fR sequence matches an
|
||||
empty string, the next call is done with the PCRE_NOTEMPTY and PCRE_ANCHORED
|
||||
flags set in order to search for another, non-empty, match at the same point.
|
||||
If this second match fails, the start offset is advanced by one, and the normal
|
||||
match is retried. This imitates the way Perl handles such cases when using the
|
||||
\fB/g\fR modifier or the \fBsplit()\fR function.
|
||||
|
||||
There are a number of other modifiers for controlling the way \fBpcretest\fR
|
||||
operates.
|
||||
|
||||
The \fB/+\fR modifier requests that as well as outputting the substring that
|
||||
matched the entire pattern, \fBpcretest\fR should also output the remainder of
|
||||
the subject string. This is useful for tests where the subject contains
|
||||
multiple copies of the same substring.
|
||||
|
||||
The \fB/L\fR modifier must be followed directly by the name of a locale, for
|
||||
example,
|
||||
|
||||
/pattern/Lfr
|
||||
|
||||
For this reason, it must be the last modifier letter. The given locale is set,
|
||||
\fBpcre_maketables()\fR is called to build a set of character tables for the
|
||||
locale, and this is then passed to \fBpcre_compile()\fR when compiling the
|
||||
regular expression. Without an \fB/L\fR modifier, NULL is passed as the tables
|
||||
pointer; that is, \fB/L\fR applies only to the expression on which it appears.
|
||||
|
||||
The \fB/I\fR modifier requests that \fBpcretest\fR output information about the
|
||||
compiled expression (whether it is anchored, has a fixed first character, and
|
||||
so on). It does this by calling \fBpcre_fullinfo()\fR after compiling an
|
||||
expression, and outputting the information it gets back. If the pattern is
|
||||
studied, the results of that are also output.
|
||||
|
||||
The \fB/D\fR modifier is a PCRE debugging feature, which also assumes \fB/I\fR.
|
||||
It causes the internal form of compiled regular expressions to be output after
|
||||
compilation. If the pattern was studied, the information returned is also
|
||||
output.
|
||||
|
||||
The \fB/S\fR modifier causes \fBpcre_study()\fR to be called after the
|
||||
expression has been compiled, and the results used when the expression is
|
||||
matched.
|
||||
|
||||
The \fB/M\fR modifier causes the size of the memory block used to hold the compiled
|
||||
pattern to be output.
|
||||
|
||||
The \fB/P\fR modifier causes \fBpcretest\fR to call PCRE via the POSIX wrapper
|
||||
API rather than its native API. When this is done, all other modifiers except
|
||||
\fB/i\fR, \fB/m\fR, and \fB/+\fR are ignored. REG_ICASE is set if \fB/i\fR is
|
||||
present, and REG_NEWLINE is set if \fB/m\fR is present. The wrapper functions
|
||||
force PCRE_DOLLAR_ENDONLY always, and PCRE_DOTALL unless REG_NEWLINE is set.
|
||||
|
||||
The \fB/8\fR modifier causes \fBpcretest\fR to call PCRE with the PCRE_UTF8
|
||||
option set. This turns on support for UTF-8 character handling in PCRE,
|
||||
provided that it was compiled with this support enabled. This modifier also
|
||||
causes any non-printing characters in output strings to be printed using the
|
||||
\\x{hh...} notation if they are valid UTF-8 sequences.
|
||||
|
||||
If the \fB/?\fR modifier is used with \fB/8\fR, it causes \fBpcretest\fR to
|
||||
call \fBpcre_compile()\fR with the PCRE_NO_UTF8_CHECK option, to suppress the
|
||||
checking of the string for UTF-8 validity.
|
||||
|
||||
.SH CALLOUTS
|
||||
.rs
|
||||
.sp
|
||||
If the pattern contains any callout requests, \fBpcretest\fR's callout function
|
||||
will be called. By default, it displays the callout number, and the start and
|
||||
current positions in the text at the callout time. For example, the output
|
||||
|
||||
  --->pqrabcdef
|
||||
    0    ^  ^
|
||||
|
||||
indicates that callout number 0 occurred for a match attempt starting at the
|
||||
fourth character of the subject string, when the pointer was at the seventh
|
||||
character. The callout function returns zero (carry on matching) by default.
|
||||
|
||||
Inserting callouts may be helpful when using \fBpcretest\fR to check
|
||||
complicated regular expressions. For further information about callouts, see
|
||||
the
|
||||
.\" HREF
|
||||
\fBpcrecallout\fR
|
||||
.\"
|
||||
documentation.
|
||||
|
||||
For testing the PCRE library, additional control of callout behaviour is
|
||||
available via escape sequences in the data, as described in the following
|
||||
section. In particular, it is possible to pass in a number as callout data (the
|
||||
default is zero). If the callout function receives a non-zero number, it
|
||||
returns that value instead of zero.
|
||||
|
||||
.SH DATA LINES
|
||||
.rs
|
||||
.sp
|
||||
Before each data line is passed to \fBpcre_exec()\fR, leading and trailing
|
||||
whitespace is removed, and it is then scanned for \\ escapes. Some of these are
|
||||
pretty esoteric features, intended for checking out some of the more
|
||||
complicated features of PCRE. If you are just testing "ordinary" regular
|
||||
expressions, you probably don't need any of these. The following escapes are
|
||||
recognized:
|
||||
|
||||
  \\a         alarm (= BEL)
|
||||
  \\b         backspace
|
||||
  \\e         escape
|
||||
  \\f         formfeed
|
||||
  \\n         newline
|
||||
  \\r         carriage return
|
||||
  \\t         tab
|
||||
  \\v         vertical tab
|
||||
  \\nnn       octal character (up to 3 octal digits)
|
||||
  \\xhh       hexadecimal character (up to 2 hex digits)
|
||||
  \\x{hh...}  hexadecimal character, any number of digits
|
||||
              in UTF-8 mode
|
||||
  \\A         pass the PCRE_ANCHORED option to \fBpcre_exec()\fR
|
||||
  \\B         pass the PCRE_NOTBOL option to \fBpcre_exec()\fR
|
||||
  \\Cdd       call pcre_copy_substring() for substring dd
|
||||
              after a successful match (any decimal number
|
||||
              less than 32)
|
||||
  \\Cname     call pcre_copy_named_substring() for substring
|
||||
              "name" after a successful match (name terminated
|
||||
              by next non-alphanumeric character)
|
||||
  \\C+        show the current captured substrings at callout
|
||||
              time
|
||||
  \\C-        do not supply a callout function
|
||||
  \\C!n       return 1 instead of 0 when callout number n is
|
||||
              reached
|
||||
  \\C!n!m     return 1 instead of 0 when callout number n is
|
||||
              reached for the mth time
|
||||
  \\C*n       pass the number n (may be negative) as callout
|
||||
              data
|
||||
  \\Gdd       call pcre_get_substring() for substring dd
|
||||
              after a successful match (any decimal number
|
||||
              less than 32)
|
||||
  \\Gname     call pcre_get_named_substring() for substring
|
||||
              "name" after a successful match (name terminated
|
||||
              by next non-alphanumeric character)
|
||||
  \\L         call pcre_get_substringlist() after a
|
||||
              successful match
|
||||
  \\M         discover the minimum MATCH_LIMIT setting
|
||||
  \\N         pass the PCRE_NOTEMPTY option to \fBpcre_exec()\fR
|
||||
  \\Odd       set the size of the output vector passed to
|
||||
              \fBpcre_exec()\fR to dd (any number of decimal
|
||||
              digits)
|
||||
  \\S         output details of memory get/free calls during matching
|
||||
  \\Z         pass the PCRE_NOTEOL option to \fBpcre_exec()\fR
|
||||
  \\?         pass the PCRE_NO_UTF8_CHECK option to
|
||||
              \fBpcre_exec()\fR
|
||||
|
||||
If \\M is present, \fBpcretest\fR calls \fBpcre_exec()\fR several times, with
|
||||
different values in the \fImatch_limit\fR field of the \fBpcre_extra\fR data
|
||||
structure, until it finds the minimum number that is needed for
|
||||
\fBpcre_exec()\fR to complete. This number is a measure of the amount of
|
||||
recursion and backtracking that takes place, and checking it out can be
|
||||
instructive. For most simple matches, the number is quite small, but for
|
||||
patterns with very large numbers of matching possibilities, it can become large
|
||||
very quickly with increasing length of subject string.
|
||||
|
||||
When \\O is used, the value may be higher or lower than the size set by the \fB-o\fR
|
||||
option (or defaulted to 45); \\O applies only to the call of \fBpcre_exec()\fR
|
||||
for the line in which it appears.
|
||||
|
||||
A backslash followed by anything else just escapes the anything else. If the
|
||||
very last character is a backslash, it is ignored. This gives a way of passing
|
||||
an empty line as data, since a real empty line terminates the data input.
|
||||
|
||||
If \fB/P\fR was present on the regex, causing the POSIX wrapper API to be used,
|
||||
only \fB\\B\fR and \fB\\Z\fR have any effect, causing REG_NOTBOL and REG_NOTEOL
|
||||
to be passed to \fBregexec()\fR respectively.
|
||||
|
||||
The use of \\x{hh...} to represent UTF-8 characters is not dependent on the use
|
||||
of the \fB/8\fR modifier on the pattern. It is recognized always. There may be
|
||||
any number of hexadecimal digits inside the braces. The result is from one to
|
||||
six bytes, encoded according to the UTF-8 rules.
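As an illustration of the one-to-six-byte rule, the sketch below encodes a code point the way the original UTF-8 definition (RFC 2279) does, which allows values up to 0x7FFFFFFF. It is not taken from \fBpcretest\fR; utf8_encode is a hypothetical name. Later UTF-8 (RFC 3629) limits sequences to four bytes.

```c
#include <stddef.h>

/* Encode code point `c` (0 .. 0x7FFFFFFF) into `buf` using the RFC 2279
   rules, which allow sequences of up to six bytes. Returns the number of
   bytes written, or 0 if `c` is out of range. */
size_t utf8_encode(unsigned long c, unsigned char buf[6])
{
    /* Largest value for 1..6-byte sequences, and the matching lead-byte marks. */
    static const unsigned long limits[6] = {
        0x7FUL, 0x7FFUL, 0xFFFFUL, 0x1FFFFFUL, 0x3FFFFFFUL, 0x7FFFFFFFUL };
    static const unsigned char leads[6] = { 0x00, 0xC0, 0xE0, 0xF0, 0xF8, 0xFC };
    size_t len, i;

    for (len = 1; len <= 6; len++)
        if (c <= limits[len - 1])
            break;
    if (len > 6)
        return 0;

    /* Fill continuation bytes (10xxxxxx) from the end, six bits at a time. */
    for (i = len - 1; i > 0; i--) {
        buf[i] = (unsigned char)(0x80 | (c & 0x3F));
        c >>= 6;
    }
    buf[0] = (unsigned char)(leads[len - 1] | c);   /* remaining high bits */
    return len;
}
```

For example, U+00E9 encodes as the two bytes C3 A9 and U+20AC as the three bytes E2 82 AC, while 0x7FFFFFFF needs all six bytes.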
|
||||
|
||||
.SH OUTPUT FROM PCRETEST
|
||||
.rs
|
||||
.sp
|
||||
When a match succeeds, pcretest outputs the list of captured substrings that
|
||||
\fBpcre_exec()\fR returns, starting with number 0 for the string that matched
|
||||
the whole pattern. Here is an example of an interactive pcretest run.
|
||||
|
||||
$ pcretest
|
||||
PCRE version 4.00 08-Jan-2003
|
||||
|
||||
re> /^abc(\\d+)/
|
||||
data> abc123
|
||||
0: abc123
|
||||
1: 123
|
||||
data> xyz
|
||||
No match
|
||||
|
||||
If the strings contain any non-printing characters, they are output as \\0x
|
||||
escapes, or as \\x{...} escapes if the \fB/8\fR modifier was present on the
|
||||
pattern. If the pattern has the \fB/+\fR modifier, then the output for
|
||||
substring 0 is followed by the rest of the subject string, identified by
|
||||
"0+" like this:
|
||||
|
||||
re> /cat/+
|
||||
data> cataract
|
||||
0: cat
|
||||
0+ aract
|
||||
|
||||
If the pattern has the \fB/g\fR or \fB/G\fR modifier, the results of successive
|
||||
matching attempts are output in sequence, like this:
|
||||
|
||||
re> /\\Bi(\\w\\w)/g
|
||||
data> Mississippi
|
||||
0: iss
|
||||
1: ss
|
||||
0: iss
|
||||
1: ss
|
||||
0: ipp
|
||||
1: pp
|
||||
|
||||
"No match" is output only if the first match attempt fails.
|
||||
|
||||
If any of the sequences \fB\\C\fR, \fB\\G\fR, or \fB\\L\fR are present in a
|
||||
data line that is successfully matched, the substrings extracted by the
|
||||
convenience functions are output with C, G, or L after the string number
|
||||
instead of a colon. This is in addition to the normal full list. The string
|
||||
length (that is, the return from the extraction function) is given in
|
||||
parentheses after each string for \fB\\C\fR and \fB\\G\fR.
|
||||
|
||||
Note that while patterns can be continued over several lines (a plain ">"
|
||||
prompt is used for continuations), data lines may not. However, newlines can be
|
||||
included in data by means of the \\n escape.
|
||||
|
||||
.SH AUTHOR
|
||||
.rs
|
||||
.sp
|
||||
Philip Hazel <ph10@cam.ac.uk>
|
||||
.br
|
||||
University Computing Service,
|
||||
.br
|
||||
Cambridge CB2 3QG, England.
|
||||
|
||||
.in 0
|
||||
Last updated: 09 December 2003
|
||||
.br
|
||||
Copyright (c) 1997-2003 University of Cambridge.
|
@ -1,357 +0,0 @@
|
||||
PCRETEST(1) PCRETEST(1)
|
||||
|
||||
|
||||
|
||||
NAME
|
||||
pcretest - a program for testing Perl-compatible regular expressions.
|
||||
|
||||
SYNOPSIS
|
||||
pcretest [-C] [-d] [-i] [-m] [-o osize] [-p] [-t] [source] [destination]
|
||||
|
||||
pcretest was written as a test program for the PCRE regular expression
|
||||
library itself, but it can also be used for experimenting with regular
|
||||
expressions. This document describes the features of the test program;
|
||||
for details of the regular expressions themselves, see the pcrepattern
|
||||
documentation. For details of PCRE and its options, see the pcreapi
|
||||
documentation.
|
||||
|
||||
|
||||
OPTIONS
|
||||
|
||||
|
||||
-C Output the version number of the PCRE library, and all
|
||||
available information about the optional features that are
|
||||
included, and then exit.
|
||||
|
||||
-d Behave as if each regex had the /D modifier (see below); the
|
||||
internal form is output after compilation.
|
||||
|
||||
-i Behave as if each regex had the /I modifier; information
|
||||
about the compiled pattern is given after compilation.
|
||||
|
||||
-m Output the size of each compiled pattern after it has been
|
||||
compiled. This is equivalent to adding /M to each regular
|
||||
expression. For compatibility with earlier versions of
|
||||
pcretest, -s is a synonym for -m.
|
||||
|
||||
-o osize Set the number of elements in the output vector that is used
|
||||
when calling PCRE to be osize. The default value is 45, which
|
||||
is enough for 14 capturing subexpressions. The vector size
|
||||
can be changed for individual matching calls by including \O
|
||||
in the data line (see below).
|
||||
|
||||
-p Behave as if each regex had the /P modifier; the POSIX wrapper
|
||||
API is used to call PCRE. None of the other options has any
|
||||
effect when -p is set.
|
||||
|
||||
-t Run each compile, study, and match many times with a timer,
|
||||
and output the resulting time per compile or match (in
|
||||
milliseconds). Do not set -t with -m, because you will then get the
|
||||
size output 20000 times and the timing will be distorted.
|
||||
|
||||
|
||||
DESCRIPTION
|
||||
|
||||
If pcretest is given two filename arguments, it reads from the first
|
||||
and writes to the second. If it is given only one filename argument, it
|
||||
reads from that file and writes to stdout. Otherwise, it reads from
|
||||
stdin and writes to stdout, and prompts for each line of input, using
|
||||
"re>" to prompt for regular expressions, and "data>" to prompt for data
|
||||
lines.
|
||||
|
||||
The program handles any number of sets of input on a single input file.
|
||||
Each set starts with a regular expression, and continues with any
|
||||
number of data lines to be matched against the pattern.
|
||||
|
||||
Each line is matched separately and independently. If you want to do
|
||||
multiple-line matches, you have to use the \n escape sequence in a
|
||||
single line of input to encode the newline characters. The maximum
|
||||
length of a data line is 30,000 characters.
|
||||
|
||||
An empty line signals the end of the data lines, at which point a new
|
||||
regular expression is read. The regular expressions are given enclosed
|
||||
in any non-alphameric delimiters other than backslash, for example
|
||||
|
||||
/(a|bc)x+yz/
|
||||
|
||||
White space before the initial delimiter is ignored. A regular
|
||||
expression may be continued over several input lines, in which case the
|
||||
newline characters are included within it. It is possible to include the
|
||||
delimiter within the pattern by escaping it, for example
|
||||
|
||||
/abc\/def/
|
||||
|
||||
If you do so, the escape and the delimiter form part of the pattern,
|
||||
but since delimiters are always non-alphameric, this does not affect
|
||||
its interpretation. If the terminating delimiter is immediately
|
||||
followed by a backslash, for example,
|
||||
|
||||
/abc/\
|
||||
|
||||
then a backslash is added to the end of the pattern. This is done to
|
||||
provide a way of testing the error condition that arises if a pattern
|
||||
finishes with a backslash, because
|
||||
|
||||
/abc\/
|
||||
|
||||
is interpreted as the first line of a pattern that starts with "abc/",
|
||||
causing pcretest to read the next line as a continuation of the regular
|
||||
expression.
|
||||
|
||||
|
||||
PATTERN MODIFIERS
|
||||
|
||||
The pattern may be followed by i, m, s, or x to set the PCRE_CASELESS,
|
||||
PCRE_MULTILINE, PCRE_DOTALL, or PCRE_EXTENDED options, respectively.
|
||||
For example:
|
||||
|
||||
/caseless/i
|
||||
|
||||
These modifier letters have the same effect as they do in Perl. There
|
||||
are others that set PCRE options that do not correspond to anything in
|
||||
Perl: /A, /E, /N, /U, and /X set PCRE_ANCHORED, PCRE_DOLLAR_ENDONLY,
|
||||
PCRE_NO_AUTO_CAPTURE, PCRE_UNGREEDY, and PCRE_EXTRA respectively.
|
||||
|
||||
Searching for all possible matches within each subject string can be
|
||||
requested by the /g or /G modifier. After finding a match, PCRE is
|
||||
called again to search the remainder of the subject string. The
|
||||
difference between /g and /G is that the former uses the startoffset argument
|
||||
to pcre_exec() to start searching at a new point within the entire
|
||||
string (which is in effect what Perl does), whereas the latter passes
|
||||
over a shortened substring. This makes a difference to the matching
|
||||
process if the pattern begins with a lookbehind assertion (including \b
|
||||
or \B).
|
||||
|
||||
If any call to pcre_exec() in a /g or /G sequence matches an empty
|
||||
string, the next call is done with the PCRE_NOTEMPTY and PCRE_ANCHORED
|
||||
flags set in order to search for another, non-empty, match at the same
|
||||
point. If this second match fails, the start offset is advanced by
|
||||
one, and the normal match is retried. This imitates the way Perl
|
||||
handles such cases when using the /g modifier or the split() function.

There are a number of other modifiers for controlling the way pcretest
operates.

The /+ modifier requests that, as well as outputting the substring that
matched the entire pattern, pcretest should also output the remainder
of the subject string. This is useful for tests where the subject
contains multiple copies of the same substring.

The /L modifier must be followed directly by the name of a locale, for
example,

  /pattern/Lfr

For this reason, it must be the last modifier letter. The given locale
is set, pcre_maketables() is called to build a set of character tables
for the locale, and this is then passed to pcre_compile() when
compiling the regular expression. Without an /L modifier, NULL is
passed as the tables pointer; that is, /L applies only to the
expression on which it appears.

The /I modifier requests that pcretest output information about the
compiled expression (whether it is anchored, has a fixed first
character, and so on). It does this by calling pcre_fullinfo() after
compiling an expression, and outputting the information it gets back.
If the pattern is studied, the results of that are also output.

The /D modifier is a PCRE debugging feature, which also assumes /I. It
causes the internal form of compiled regular expressions to be output
after compilation. If the pattern was studied, the information returned
is also output.

The /S modifier causes pcre_study() to be called after the expression
has been compiled, and the results used when the expression is matched.

The /M modifier causes the size of the memory block used to hold the
compiled pattern to be output.

The /P modifier causes pcretest to call PCRE via the POSIX wrapper API
rather than its native API. When this is done, all other modifiers
except /i, /m, and /+ are ignored. REG_ICASE is set if /i is present,
and REG_NEWLINE is set if /m is present. The wrapper functions force
PCRE_DOLLAR_ENDONLY always, and PCRE_DOTALL unless REG_NEWLINE is set.
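The /i and /m mapping can be sketched with the system's own POSIX regcomp(), used here as a stand-in for PCRE's wrapper (an assumption; posix_cflags() and posix_match() are hypothetical helpers, and /+ is not modelled). REG_EXTENDED is also set, since basic POSIX syntax is further from PCRE's than extended syntax.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: translate pcretest-style modifier letters into
   regcomp() flags as described for /P. Letters other than i and m are
   ignored, much as /P ignores most other modifiers. */
static int posix_cflags(const char *modifiers)
{
    int cflags = REG_EXTENDED;   /* assumption: ERE is closer to PCRE syntax */
    for (; *modifiers != '\0'; modifiers++) {
        if (*modifiers == 'i')
            cflags |= REG_ICASE;
        else if (*modifiers == 'm')
            cflags |= REG_NEWLINE;
    }
    return cflags;
}

/* Returns 1 on a match, 0 on no match, -1 if the pattern fails to compile. */
static int posix_match(const char *pattern, const char *modifiers,
                       const char *subject)
{
    regex_t re;
    if (regcomp(&re, pattern, posix_cflags(modifiers)) != 0)
        return -1;
    int rc = regexec(&re, subject, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

With "m", the pattern ^b matches after the newline in "a\nb"; without it, "^" matches only at the start of the subject.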

The /8 modifier causes pcretest to call PCRE with the PCRE_UTF8 option
set. This turns on support for UTF-8 character handling in PCRE,
provided that it was compiled with this support enabled. This modifier
also causes any non-printing characters in output strings to be printed
using the \x{hh...} notation if they are valid UTF-8 sequences.

If the /? modifier is used with /8, it causes pcretest to call
pcre_compile() with the PCRE_NO_UTF8_CHECK option, to suppress the
checking of the pattern string for UTF-8 validity.


CALLOUTS

If the pattern contains any callout requests, pcretest's callout
function will be called. By default, it displays the callout number,
and the start and current positions in the text at the callout time.
For example, the output

  --->pqrabcdef
    0    ^  ^

indicates that callout number 0 occurred for a match attempt starting
at the fourth character of the subject string, when the pointer was at
the seventh character. The callout function returns zero (carry on
matching) by default.
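A sketch of how the marker line could be laid out. The column arithmetic is inferred from the example above rather than taken from pcretest's source: the callout number occupies four columns ("%3d "), the same width as "--->", so each offset becomes a count of spaces. callout_marker() is a hypothetical name.

```c
#include <stdio.h>

/* Hypothetical helper: build the marker line shown under "--->subject".
   A '^' placed after `start` spaces sits under the start of the match
   attempt; a second '^' marks the current position if it differs.
   buf must hold at least current + 6 bytes (callout numbers up to 999). */
static void callout_marker(char *buf, int number, int start, int current)
{
    int n = sprintf(buf, "%3d ", number);   /* same width as "--->" */
    for (int i = 0; i < start; i++)
        buf[n++] = ' ';
    buf[n++] = '^';
    if (current > start) {
        for (int i = 0; i < current - start - 1; i++)
            buf[n++] = ' ';
        buf[n++] = '^';
    }
    buf[n] = '\0';
}
```

Called with number 0, start offset 3, and current offset 6, it produces the "  0    ^  ^" line of the example.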

Inserting callouts may be helpful when using pcretest to check
complicated regular expressions. For further information about
callouts, see the pcrecallout documentation.

For testing the PCRE library, additional control of callout behaviour
is available via escape sequences in the data, as described in the
following section. In particular, it is possible to pass in a number
as callout data (the default is zero). If the callout function receives
a non-zero number, it returns that value instead of zero.


DATA LINES

Before each data line is passed to pcre_exec(), leading and trailing
whitespace is removed, and it is then scanned for \ escapes. Some of
these are pretty esoteric features, intended for checking out some of
the more complicated features of PCRE. If you are just testing
"ordinary" regular expressions, you probably don't need any of these.
The following escapes are recognized:

  \a         alarm (= BEL)
  \b         backspace
  \e         escape
  \f         formfeed
  \n         newline
  \r         carriage return
  \t         tab
  \v         vertical tab
  \nnn       octal character (up to 3 octal digits)
  \xhh       hexadecimal character (up to 2 hex digits)
  \x{hh...}  hexadecimal character, any number of digits
               in UTF-8 mode
  \A         pass the PCRE_ANCHORED option to pcre_exec()
  \B         pass the PCRE_NOTBOL option to pcre_exec()
  \Cdd       call pcre_copy_substring() for substring dd
               after a successful match (any decimal number
               less than 32)
  \Cname     call pcre_copy_named_substring() for substring
               "name" after a successful match (name
               terminated by next non-alphanumeric character)
  \C+        show the current captured substrings at callout
               time
  \C-        do not supply a callout function
  \C!n       return 1 instead of 0 when callout number n is
               reached
  \C!n!m     return 1 instead of 0 when callout number n is
               reached for the mth time
  \C*n       pass the number n (may be negative) as callout
               data
  \Gdd       call pcre_get_substring() for substring dd
               after a successful match (any decimal number
               less than 32)
  \Gname     call pcre_get_named_substring() for substring
               "name" after a successful match (name
               terminated by next non-alphanumeric character)
  \L         call pcre_get_substring_list() after a
               successful match
  \M         discover the minimum MATCH_LIMIT setting
  \N         pass the PCRE_NOTEMPTY option to pcre_exec()
  \Odd       set the size of the output vector passed to
               pcre_exec() to dd (any number of decimal
               digits)
  \S         output details of memory get/free calls during
               matching
  \Z         pass the PCRE_NOTEOL option to pcre_exec()
  \?         pass the PCRE_NO_UTF8_CHECK option to
               pcre_exec()
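The return-value rules for \C!n, \C!n!m, and \C*n can be summarised as a small function. This is a sketch of the documented behaviour, not pcretest's code: struct callout_rules and callout_return() are hypothetical, and returning 1 on the mth arrival and on any later one (rather than only the mth) is an assumption.

```c
/* Hypothetical state built from a data line's \C!n, \C!n!m and \C*n escapes. */
struct callout_rules {
    int fail_id;     /* n from \C!n or \C!n!m; -1 if neither was given */
    int fail_count;  /* m from \C!n!m; 1 for plain \C!n */
    int seen;        /* how many times callout fail_id has been reached */
    int data;        /* n from \C*n; 0 by default */
};

/* Value the callout function hands back for one callout: non-zero callout
   data is returned as-is; otherwise 1 once callout fail_id has been reached
   fail_count times, and 0 (carry on matching) in every other case. */
static int callout_return(struct callout_rules *r, int callout_number)
{
    if (r->data != 0)
        return r->data;
    if (callout_number != r->fail_id)
        return 0;
    return (++r->seen >= r->fail_count) ? 1 : 0;
}
```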

If \M is present, pcretest calls pcre_exec() several times, with
different values in the match_limit field of the pcre_extra data
structure, until it finds the minimum number that is needed for
pcre_exec() to complete. This number is a measure of the amount of
recursion and backtracking that takes place, and checking it out can
be instructive. For most simple matches, the number is quite small,
but for patterns with very large numbers of matching possibilities, it
can become large very quickly with increasing length of subject string.
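One way to find the minimum limit, assuming that success is monotone in match_limit (if a limit suffices, every larger one does): double until the call succeeds, then bisect. The doubling-then-bisection strategy is an assumption, not necessarily what pcretest does, and the predicate is a stand-in for "pcre_exec() does not return PCRE_ERROR_MATCHLIMIT". find_min_limit() and at_least() are hypothetical.

```c
#include <limits.h>
#include <stdbool.h>

/* Stand-in for "pcre_exec() with match_limit set to `limit` does not fail
   with PCRE_ERROR_MATCHLIMIT". Assumed monotone: once true, true for all
   larger limits. */
typedef bool (*limit_ok_fn)(unsigned long limit, void *ctx);

/* Smallest limit >= 1 for which ok() holds, or 0 if none is found. */
static unsigned long find_min_limit(limit_ok_fn ok, void *ctx)
{
    unsigned long lo = 0, hi = 1;       /* lo is known to fail (0 counts as failing) */

    while (!ok(hi, ctx)) {              /* doubling phase: bracket the answer */
        if (hi > ULONG_MAX / 2)
            return 0;                   /* give up rather than overflow */
        lo = hi;
        hi *= 2;
    }
    while (hi - lo > 1) {               /* bisection: ok(lo) false, ok(hi) true */
        unsigned long mid = lo + (hi - lo) / 2;
        if (ok(mid, ctx))
            hi = mid;
        else
            lo = mid;
    }
    return hi;
}

/* Toy predicate for trying the search: succeeds once limit reaches *ctx. */
static bool at_least(unsigned long limit, void *ctx)
{
    return limit >= *(const unsigned long *)ctx;
}
```

The number of trial calls grows only logarithmically with the answer, which matters when each trial is a full, possibly expensive, match attempt.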

When \O is used, the value given may be higher or lower than the size
set by the -O option (45 if -O is not given); \O applies only to the
call of pcre_exec() for the line in which it appears.

A backslash followed by anything else just escapes the anything else.
If the very last character is a backslash, it is ignored. This gives a
way of passing an empty line as data, since a real empty line
terminates the data input.
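A sketch of decoding the character escapes listed above, together with the rule that a final backslash is ignored. decode_escapes() is a hypothetical helper; it does not handle \x{...} or the option-setting escapes (\A, \B, \C..., \G..., \L, \M, \N, \O, \S, \Z, \?), which this version would pass through as literal characters. Treating \x with no following hex digit as a zero byte is an assumption.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: decode a subset of the data-line escapes from `in`
   into `out`, which needs at least strlen(in) + 1 bytes (decoding never
   lengthens the string). Returns the number of bytes written. */
static size_t decode_escapes(const char *in, char *out)
{
    size_t n = 0;
    while (*in != '\0') {
        if (*in != '\\') {
            out[n++] = *in++;
            continue;
        }
        in++;                           /* skip the backslash */
        if (*in == '\0')
            break;                      /* a final backslash is ignored */
        char c = *in++;
        switch (c) {
        case 'a': out[n++] = '\a'; break;
        case 'b': out[n++] = '\b'; break;
        case 'e': out[n++] = '\033'; break;
        case 'f': out[n++] = '\f'; break;
        case 'n': out[n++] = '\n'; break;
        case 'r': out[n++] = '\r'; break;
        case 't': out[n++] = '\t'; break;
        case 'v': out[n++] = '\v'; break;
        case 'x': {                     /* \xhh: up to 2 hex digits */
            int v = 0;
            for (int i = 0; i < 2 && isxdigit((unsigned char)*in); i++, in++) {
                int d = tolower((unsigned char)*in);
                v = v * 16 + (isdigit(d) ? d - '0' : d - 'a' + 10);
            }
            out[n++] = (char)v;
            break;
        }
        default:
            if (c >= '0' && c <= '7') { /* \nnn: up to 3 octal digits */
                int v = c - '0';
                for (int i = 1; i < 3 && *in >= '0' && *in <= '7'; i++, in++)
                    v = v * 8 + (*in - '0');
                out[n++] = (char)v;
            } else {
                out[n++] = c;           /* any other escaped character stands for itself */
            }
            break;
        }
    }
    out[n] = '\0';
    return n;
}
```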

If /P was present on the regex, causing the POSIX wrapper API to be
used, only \B and \Z have any effect, causing REG_NOTBOL and REG_NOTEOL
respectively to be passed to regexec().
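A sketch of the effect of REG_NOTBOL and REG_NOTEOL, again using the system's POSIX regexec() in place of PCRE's wrapper (an assumption; match_eflags() is a hypothetical helper). REG_NOTBOL stops "^" from matching at the start of the subject, and REG_NOTEOL stops "$" from matching at its end.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: compile `pattern` as an ERE and run regexec() with
   the given eflags. Returns 1 on a match, 0 on no match, -1 on a compile error. */
static int match_eflags(const char *pattern, const char *subject, int eflags)
{
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    int rc = regexec(&re, subject, 0, NULL, eflags);
    regfree(&re);
    return rc == 0;
}
```

For example, match_eflags("^abc", "abc", REG_NOTBOL) reports no match, which corresponds to what \B requests under /P.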

The use of \x{hh...} to represent UTF-8 characters is not dependent on
the use of the /8 modifier on the pattern. It is recognized always.
There may be any number of hexadecimal digits inside the braces. The
result is from one to six bytes, encoded according to the UTF-8 rules.
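The one-to-six-byte encoding can be sketched as follows. It follows the original UTF-8 definition, which allowed values up to 0x7FFFFFFF; the later RFC 3629 limit of 0x10FFFF is not applied here. encode_utf8() is a hypothetical helper, not a PCRE function.

```c
/* Hypothetical helper: encode code point cp into buf using the original
   UTF-8 rules (1 to 6 bytes, values up to 0x7FFFFFFF). Returns the number
   of bytes written, or 0 if cp is out of range. */
static int encode_utf8(unsigned long cp, unsigned char buf[6])
{
    /* Largest value that fits in 1, 2, ..., 6 bytes, and each lead-byte mark. */
    static const unsigned long max_for_len[6] = {
        0x7F, 0x7FF, 0xFFFF, 0x1FFFFF, 0x3FFFFFF, 0x7FFFFFFF
    };
    static const unsigned char lead_mark[6] = {
        0x00, 0xC0, 0xE0, 0xF0, 0xF8, 0xFC
    };
    int len = 1;
    while (len <= 6 && cp > max_for_len[len - 1])
        len++;
    if (len > 6)
        return 0;
    /* Fill continuation bytes (10xxxxxx) from the end, six bits at a time. */
    for (int i = len - 1; i > 0; i--) {
        buf[i] = (unsigned char)(0x80 | (cp & 0x3F));
        cp >>= 6;
    }
    buf[0] = (unsigned char)(lead_mark[len - 1] | cp);
    return len;
}
```

For instance, \x{100} becomes the two bytes 0xC4 0x80, and \x{100000} becomes the four bytes 0xF4 0x80 0x80 0x80.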


OUTPUT FROM PCRETEST

When a match succeeds, pcretest outputs the list of captured substrings
that pcre_exec() returns, starting with number 0 for the string that
matched the whole pattern. Here is an example of an interactive
pcretest run.

  $ pcretest
  PCRE version 4.00 08-Jan-2003

    re> /^abc(\d+)/
  data> abc123
   0: abc123
   1: 123
  data> xyz
  No match

If the strings contain any non-printing characters, they are output as
\xhh escapes, or as \x{...} escapes if the /8 modifier was present on
the pattern. If the pattern has the /+ modifier, the output for
substring 0 is followed by the rest of the subject string, identified
by "0+" like this:

    re> /cat/+
  data> cataract
   0: cat
   0+ aract

If the pattern has the /g or /G modifier, the results of successive
matching attempts are output in sequence, like this:

    re> /\Bi(\w\w)/g
  data> Mississippi
   0: iss
   1: ss
   0: iss
   1: ss
   0: ipp
   1: pp

"No match" is output only if the first match attempt fails.

If any of the sequences \C, \G, or \L are present in a data line that
is successfully matched, the substrings extracted by the convenience
functions are output with C, G, or L after the string number instead of
a colon. This is in addition to the normal full list. The string length
(that is, the return from the extraction function) is given in
parentheses after each string for \C and \G.

Note that while patterns can be continued over several lines (a plain
">" prompt is used for continuations), data lines cannot. However,
newlines can be included in data by means of the \n escape.


AUTHOR

Philip Hazel <ph10@cam.ac.uk>
University Computing Service,
Cambridge CB2 3QG, England.

Last updated: 09 December 2003
Copyright (c) 1997-2003 University of Cambridge.
@ -1,34 +0,0 @@
The perltest program
--------------------

The perltest program tests Perl's regular expressions; it has the same
specification as pcretest, and so can be given identical input, except that
input patterns can be followed only by Perl's lower case modifiers and /+ (as
used by pcretest), which is recognized and handled by the program.

The data lines are processed as Perl double-quoted strings, so if they contain
" \ $ or @ characters, these have to be escaped. For this reason, all such
characters in testinput1 and testinput3 are escaped so that they can be used
for perltest as well as for pcretest, and the special upper case modifiers such
as /A that pcretest recognizes are not used in these files. The output should
be identical, apart from the initial identifying banner.

The perltest script can also test UTF-8 features. It works as is for Perl 5.8
or higher. It recognizes the special modifier /8 that pcretest uses to invoke
UTF-8 functionality. The testinput5 file can be fed to perltest to run UTF-8
tests.

For Perl 5.6, perltest won't work unmodified for the UTF-8 tests. You need to
uncomment the "use utf8" lines that it contains. It is best to do this on a
copy of the script, because for non-UTF-8 tests, these lines should remain
commented out.

The testinput2 and testinput4 files are not suitable for feeding to perltest,
since they do make use of the special upper case modifiers and escapes that
pcretest uses to test some features of PCRE. The first of these files also
contains malformed regular expressions, in order to check that PCRE diagnoses
them correctly. Similarly, testinput6 tests UTF-8 features that do not relate
to Perl.

Philip Hazel <ph10@cam.ac.uk>
August 2002
@ -1,251 +0,0 @@
#!/bin/sh
#
# install - install a program, script, or datafile
# This comes from X11R5 (mit/util/scripts/install.sh).
#
# Copyright 1991 by the Massachusetts Institute of Technology
#
# Permission to use, copy, modify, distribute, and sell this software and its
# documentation for any purpose is hereby granted without fee, provided that
# the above copyright notice appear in all copies and that both that
# copyright notice and this permission notice appear in supporting
# documentation, and that the name of M.I.T. not be used in advertising or
# publicity pertaining to distribution of the software without specific,
# written prior permission. M.I.T. makes no representations about the
# suitability of this software for any purpose. It is provided "as is"
# without express or implied warranty.
#
# Calling this script install-sh is preferred over install.sh, to prevent
# `make' implicit rules from creating a file called install from it
# when there is no Makefile.
#
# This script is compatible with the BSD install script, but was written
# from scratch. It can only install one file at a time, a restriction
# shared with many OS's install programs.


# set DOITPROG to echo to test this script

# Don't use :- since 4.3BSD and earlier shells don't like it.
doit="${DOITPROG-}"


# put in absolute paths if you don't have them in your path; or use env. vars.

mvprog="${MVPROG-mv}"
cpprog="${CPPROG-cp}"
chmodprog="${CHMODPROG-chmod}"
chownprog="${CHOWNPROG-chown}"
chgrpprog="${CHGRPPROG-chgrp}"
stripprog="${STRIPPROG-strip}"
rmprog="${RMPROG-rm}"
mkdirprog="${MKDIRPROG-mkdir}"

transformbasename=""
# initialize the same name that -t=* sets and that is tested below
transformarg=""
instcmd="$mvprog"
chmodcmd="$chmodprog 0755"
chowncmd=""
chgrpcmd=""
stripcmd=""
rmcmd="$rmprog -f"
mvcmd="$mvprog"
src=""
dst=""
dir_arg=""

while [ x"$1" != x ]; do
    case $1 in
        -c) instcmd="$cpprog"
            shift
            continue;;

        -d) dir_arg=true
            shift
            continue;;

        -m) chmodcmd="$chmodprog $2"
            shift
            shift
            continue;;

        -o) chowncmd="$chownprog $2"
            shift
            shift
            continue;;

        -g) chgrpcmd="$chgrpprog $2"
            shift
            shift
            continue;;

        -s) stripcmd="$stripprog"
            shift
            continue;;

        -t=*) transformarg=`echo $1 | sed 's/-t=//'`
            shift
            continue;;

        -b=*) transformbasename=`echo $1 | sed 's/-b=//'`
            shift
            continue;;

        *)  if [ x"$src" = x ]
            then
                src=$1
            else
                # this colon is to work around a 386BSD /bin/sh bug
                :
                dst=$1
            fi
            shift
            continue;;
    esac
done

if [ x"$src" = x ]
then
    echo "install: no input file specified"
    exit 1
else
    true
fi

if [ x"$dir_arg" != x ]; then
    dst=$src
    src=""

    if [ -d $dst ]; then
        instcmd=:
        chmodcmd=""
    else
        instcmd=mkdir
    fi
else

# Waiting for this to be detected by the "$instcmd $src $dsttmp" command
# might cause directories to be created, which would be especially bad
# if $src (and thus $dsttmp) contains '*'.

    if [ -f $src -o -d $src ]
    then
        true
    else
        echo "install: $src does not exist"
        exit 1
    fi

    if [ x"$dst" = x ]
    then
        echo "install: no destination specified"
        exit 1
    else
        true
    fi

# If destination is a directory, append the input filename; if your system
# does not like double slashes in filenames, you may need to add some logic

    if [ -d $dst ]
    then
        dst="$dst"/`basename $src`
    else
        true
    fi
fi

## this sed command emulates the dirname command
dstdir=`echo $dst | sed -e 's,[^/]*$,,;s,/$,,;s,^$,.,'`

# Make sure that the destination directory exists.
# this part is taken from Noah Friedman's mkinstalldirs script

# Skip lots of stat calls in the usual case.
if [ ! -d "$dstdir" ]; then
defaultIFS='
'
IFS="${IFS-${defaultIFS}}"

oIFS="${IFS}"
# Some sh's can't handle IFS=/ for some reason.
IFS='%'
set - `echo ${dstdir} | sed -e 's@/@%@g' -e 's@^%@/@'`
IFS="${oIFS}"

pathcomp=''

while [ $# -ne 0 ] ; do
    pathcomp="${pathcomp}${1}"
    shift

    if [ ! -d "${pathcomp}" ] ;
    then
        $mkdirprog "${pathcomp}"
    else
        true
    fi

    pathcomp="${pathcomp}/"
done
fi

if [ x"$dir_arg" != x ]
then
    $doit $instcmd $dst &&

    if [ x"$chowncmd" != x ]; then $doit $chowncmd $dst; else true ; fi &&
    if [ x"$chgrpcmd" != x ]; then $doit $chgrpcmd $dst; else true ; fi &&
    if [ x"$stripcmd" != x ]; then $doit $stripcmd $dst; else true ; fi &&
    if [ x"$chmodcmd" != x ]; then $doit $chmodcmd $dst; else true ; fi
else

# If we're going to rename the final executable, determine the name now.

    if [ x"$transformarg" = x ]
    then
        dstfile=`basename $dst`
    else
        dstfile=`basename $dst $transformbasename |
            sed $transformarg`$transformbasename
    fi

# don't allow the sed command to completely eliminate the filename

    if [ x"$dstfile" = x ]
    then
        dstfile=`basename $dst`
    else
        true
    fi

# Make a temp file name in the proper directory.

    dsttmp=$dstdir/#inst.$$#

# Move or copy the file name to the temp name

    $doit $instcmd $src $dsttmp &&

    trap "rm -f ${dsttmp}" 0 &&

# and set any options; do chmod last to preserve setuid bits

# If any of these fail, we abort the whole thing. If we want to
# ignore errors from any of these, just make sure not to ignore
# errors from the above "$doit $instcmd $src $dsttmp" command.

    if [ x"$chowncmd" != x ]; then $doit $chowncmd $dsttmp; else true;fi &&
    if [ x"$chgrpcmd" != x ]; then $doit $chgrpcmd $dsttmp; else true;fi &&
    if [ x"$stripcmd" != x ]; then $doit $stripcmd $dsttmp; else true;fi &&
    if [ x"$chmodcmd" != x ]; then $doit $chmodcmd $dsttmp; else true;fi &&

# Now rename the file to the real destination.

    $doit $rmcmd -f $dstdir/$dstfile &&
    $doit $mvcmd $dsttmp $dstdir/$dstfile

fi &&


exit 0
@ -1,19 +0,0 @@
LIBRARY libpcre
EXPORTS
pcre_malloc
pcre_free
pcre_config
pcre_callout
pcre_compile
pcre_copy_substring
pcre_exec
pcre_get_substring
pcre_get_stringnumber
pcre_get_substring_list
pcre_free_substring
pcre_free_substring_list
pcre_info
pcre_fullinfo
pcre_maketables
pcre_study
pcre_version
@ -1,24 +0,0 @@
LIBRARY libpcreposix
EXPORTS
pcre_malloc
pcre_free
pcre_config
pcre_callout
pcre_compile
pcre_copy_substring
pcre_exec
pcre_get_substring
pcre_get_stringnumber
pcre_get_substring_list
pcre_free_substring
pcre_free_substring_list
pcre_info
pcre_fullinfo
pcre_maketables
pcre_study
pcre_version

regcomp
regexec
regerror
regfree
File diff suppressed because it is too large
@ -1,25 +0,0 @@
@echo off

REM This file was contributed by Alexander Tokarev for building PCRE for use
REM with Virtual Pascal. It has not been tested with the latest PCRE release.

REM CHANGE THIS FOR YOUR BORLAND C++ COMPILER PATH

SET BORLAND=c:\usr\apps\bcc55

sh configure

bcc32 -DDFTABLES -DSTATIC -DVPCOMPAT -I%BORLAND%\include -L%BORLAND%\lib dftables.c

dftables > chartables.c

bcc32 -c -RT- -y- -v- -u- -P- -O2 -5 -DSTATIC -DVPCOMPAT -UDFTABLES -I%BORLAND%\include get.c maketables.c pcre.c study.c

tlib %BORLAND%\lib\cw32.lib *calloc *del *strncmp *memcpy *memmove *memset
tlib pcre.lib +get.obj +maketables.obj +pcre.obj +study.obj +calloc.obj +del.obj +strncmp.obj +memcpy.obj +memmove.obj +memset.obj

del *.obj *.exe *.tds *.bak >nul 2>nul

echo ---
echo Now the library should be complete. Please check all messages above.
echo Don't care for warnings, it's OK.
@ -1,40 +0,0 @@
#! /bin/sh
# mkinstalldirs --- make directory hierarchy
# Author: Noah Friedman <friedman@prep.ai.mit.edu>
# Created: 1993-05-16
# Public domain

# $Id: mkinstalldirs,v 1.12.2.1 1998/12/26 17:32:14 bje Exp $

errstatus=0

for file
do
    set fnord `echo ":$file" | sed -ne 's/^:\//#/;s/^://;s/\// /g;s/^#/\//;p'`
    shift

    pathcomp=
    for d
    do
        pathcomp="$pathcomp$d"
        case "$pathcomp" in
            -* ) pathcomp=./$pathcomp ;;
        esac

        if test ! -d "$pathcomp"; then
            echo "mkdir $pathcomp"

            mkdir "$pathcomp" || lasterr=$?

            if test ! -d "$pathcomp"; then
                errstatus=$lasterr
            fi
        fi

        pathcomp="$pathcomp/"
    done
done

exit $errstatus

# mkinstalldirs ends here
@ -1,22 +0,0 @@
EXPORTS

pcre_malloc DATA
pcre_free DATA

pcre_compile
pcre_copy_substring
pcre_exec
pcre_get_substring
pcre_get_substring_list
pcre_free_substring
pcre_free_substring_list
pcre_info
pcre_fullinfo
pcre_maketables
pcre_study
pcre_version

regcomp
regexec
regerror
regfree
@ -1,193 +0,0 @@
/*************************************************
*       Perl-Compatible Regular Expressions      *
*************************************************/

/* Copyright (c) 1997-2003 University of Cambridge */

#ifndef _PCRE_H
#define _PCRE_H

/* The file pcre.h is built by "configure". Do not edit it; instead
make changes to pcre.in. */

#define PCRE_MAJOR          @PCRE_MAJOR@
#define PCRE_MINOR          @PCRE_MINOR@
#define PCRE_DATE           @PCRE_DATE@

/* Win32 uses DLL by default */

#ifdef _WIN32
#  ifdef PCRE_DEFINITION
#    ifdef DLL_EXPORT
#      define PCRE_DATA_SCOPE __declspec(dllexport)
#    endif
#  else
#    ifndef PCRE_STATIC
#      define PCRE_DATA_SCOPE extern __declspec(dllimport)
#    endif
#  endif
#endif
#ifndef PCRE_DATA_SCOPE
#  define PCRE_DATA_SCOPE     extern
#endif

/* Have to include stdlib.h in order to ensure that size_t is defined;
it is needed here for malloc. */

#include <stdlib.h>

/* Allow for C++ users */

#ifdef __cplusplus
extern "C" {
#endif

/* Options */

#define PCRE_CASELESS           0x0001
#define PCRE_MULTILINE          0x0002
#define PCRE_DOTALL             0x0004
#define PCRE_EXTENDED           0x0008
#define PCRE_ANCHORED           0x0010
#define PCRE_DOLLAR_ENDONLY     0x0020
#define PCRE_EXTRA              0x0040
#define PCRE_NOTBOL             0x0080
#define PCRE_NOTEOL             0x0100
#define PCRE_UNGREEDY           0x0200
#define PCRE_NOTEMPTY           0x0400
#define PCRE_UTF8               0x0800
#define PCRE_NO_AUTO_CAPTURE    0x1000
#define PCRE_NO_UTF8_CHECK      0x2000

/* Exec-time and get/set-time error codes */

#define PCRE_ERROR_NOMATCH          (-1)
#define PCRE_ERROR_NULL             (-2)
#define PCRE_ERROR_BADOPTION        (-3)
#define PCRE_ERROR_BADMAGIC         (-4)
#define PCRE_ERROR_UNKNOWN_NODE     (-5)
#define PCRE_ERROR_NOMEMORY         (-6)
#define PCRE_ERROR_NOSUBSTRING      (-7)
#define PCRE_ERROR_MATCHLIMIT       (-8)
#define PCRE_ERROR_CALLOUT          (-9)  /* Never used by PCRE itself */
#define PCRE_ERROR_BADUTF8         (-10)
#define PCRE_ERROR_BADUTF8_OFFSET  (-11)

/* Request types for pcre_fullinfo() */

#define PCRE_INFO_OPTIONS            0
#define PCRE_INFO_SIZE               1
#define PCRE_INFO_CAPTURECOUNT       2
#define PCRE_INFO_BACKREFMAX         3
#define PCRE_INFO_FIRSTBYTE          4
#define PCRE_INFO_FIRSTCHAR          4  /* For backwards compatibility */
#define PCRE_INFO_FIRSTTABLE         5
#define PCRE_INFO_LASTLITERAL        6
#define PCRE_INFO_NAMEENTRYSIZE      7
#define PCRE_INFO_NAMECOUNT          8
#define PCRE_INFO_NAMETABLE          9
#define PCRE_INFO_STUDYSIZE         10

/* Request types for pcre_config() */

#define PCRE_CONFIG_UTF8                    0
#define PCRE_CONFIG_NEWLINE                 1
#define PCRE_CONFIG_LINK_SIZE               2
#define PCRE_CONFIG_POSIX_MALLOC_THRESHOLD  3
#define PCRE_CONFIG_MATCH_LIMIT             4
#define PCRE_CONFIG_STACKRECURSE            5

/* Bit flags for the pcre_extra structure */

#define PCRE_EXTRA_STUDY_DATA          0x0001
#define PCRE_EXTRA_MATCH_LIMIT         0x0002
#define PCRE_EXTRA_CALLOUT_DATA        0x0004

/* Types */

struct real_pcre;                 /* declaration; the definition is private */
typedef struct real_pcre pcre;

/* The structure for passing additional data to pcre_exec(). This is defined in
such a way as to be extensible. */

typedef struct pcre_extra {
  unsigned long int flags;        /* Bits for which fields are set */
  void *study_data;               /* Opaque data from pcre_study() */
  unsigned long int match_limit;  /* Maximum number of calls to match() */
  void *callout_data;             /* Data passed back in callouts */
} pcre_extra;

/* The structure for passing out data via the pcre_callout_function. We use a
structure so that new fields can be added on the end in future versions,
without changing the API of the function, thereby allowing old clients to work
without modification. */

typedef struct pcre_callout_block {
  int          version;           /* Identifies version of block */
  /* ------------------------ Version 0 ------------------------------- */
  int          callout_number;    /* Number compiled into pattern */
  int         *offset_vector;     /* The offset vector */
  const char  *subject;           /* The subject being matched */
  int          subject_length;    /* The length of the subject */
  int          start_match;       /* Offset to start of this match attempt */
  int          current_position;  /* Where we currently are */
  int          capture_top;       /* Max current capture */
  int          capture_last;      /* Most recently closed capture */
  void        *callout_data;      /* Data passed in with the call */
  /* ------------------------------------------------------------------ */
} pcre_callout_block;

/* Indirection for store get and free functions. These can be set to
alternative malloc/free functions if required. Special ones are used in the
non-recursive case for "frames". There is also an optional callout function
that is triggered by the (?) regex item. Some magic is required for Win32 DLL;
it is null on other OS. For Virtual Pascal, these have to be different again.
*/

#ifndef VPCOMPAT
PCRE_DATA_SCOPE void *(*pcre_malloc)(size_t);
PCRE_DATA_SCOPE void  (*pcre_free)(void *);
PCRE_DATA_SCOPE void *(*pcre_stack_malloc)(size_t);
PCRE_DATA_SCOPE void  (*pcre_stack_free)(void *);
PCRE_DATA_SCOPE int   (*pcre_callout)(pcre_callout_block *);
#else   /* VPCOMPAT */
extern void *pcre_malloc(size_t);
extern void  pcre_free(void *);
extern void *pcre_stack_malloc(size_t);
extern void  pcre_stack_free(void *);
extern int   pcre_callout(pcre_callout_block *);
#endif  /* VPCOMPAT */

/* Exported PCRE functions */

extern pcre *pcre_compile(const char *, int, const char **,
              int *, const unsigned char *);
extern int  pcre_config(int, void *);
extern int  pcre_copy_named_substring(const pcre *, const char *,
              int *, int, const char *, char *, int);
extern int  pcre_copy_substring(const char *, int *, int, int,
              char *, int);
extern int  pcre_exec(const pcre *, const pcre_extra *,
              const char *, int, int, int, int *, int);
extern void pcre_free_substring(const char *);
extern void pcre_free_substring_list(const char **);
extern int  pcre_fullinfo(const pcre *, const pcre_extra *, int,
              void *);
extern int  pcre_get_named_substring(const pcre *, const char *,
              int *, int, const char *, const char **);
extern int  pcre_get_stringnumber(const pcre *, const char *);
extern int  pcre_get_substring(const char *, int *, int, int,
              const char **);
extern int  pcre_get_substring_list(const char *, int *, int,
              const char ***);
extern int  pcre_info(const pcre *, int *, int *);
extern const unsigned char *pcre_maketables(void);
extern pcre_extra *pcre_study(const pcre *, int, const char **);
extern const char *pcre_version(void);

#ifdef __cplusplus
}  /* extern "C" */
#endif

#endif /* End of pcre.h */
3841
external-libs/pcre/testdata/testinput1
File diff suppressed because it is too large
1259
external-libs/pcre/testdata/testinput2
File diff suppressed because it is too large
65
external-libs/pcre/testdata/testinput3
@ -1,65 +0,0 @@
/^[\w]+/
    *** Failers
    École

/^[\w]+/Lfr_FR
    École

/^[\w]+/
    *** Failers
    École

/^[\W]+/
    École

/^[\W]+/Lfr_FR
    *** Failers
    École

/[\b]/
    \b
    *** Failers
    a

/[\b]/Lfr_FR
    \b
    *** Failers
    a

/^\w+/
    *** Failers
    École

/^\w+/Lfr_FR
    École

/(.+)\b(.+)/
    École

/(.+)\b(.+)/Lfr_FR
    *** Failers
    École

/École/i
    École
    *** Failers
    école

/École/iLfr_FR
    École
    école

/\w/IS

/\w/ISLfr_FR

/^[\xc8-\xc9]/iLfr_FR
    École
    école

/^[\xc8-\xc9]/Lfr_FR
    École
    *** Failers
    école

/ End of testinput3 /
517
external-libs/pcre/testdata/testinput4
@ -1,517 +0,0 @@
/-- Do not use the \x{} construct except with patterns that have the --/
/-- /8 option set, because PCRE doesn't recognize them as UTF-8 unless --/
/-- that option is set. However, the latest Perls recognize them always. --/

/a.b/8
acb
a\x7fb
a\x{100}b
*** Failers
a\nb

/a(.{3})b/8
a\x{4000}xyb
a\x{4000}\x7fyb
a\x{4000}\x{100}yb
*** Failers
a\x{4000}b
ac\ncb

/a(.*?)(.)/
a\xc0\x88b

/a(.*?)(.)/8
a\x{100}b

/a(.*)(.)/
a\xc0\x88b

/a(.*)(.)/8
a\x{100}b

/a(.)(.)/
a\xc0\x92bcd

/a(.)(.)/8
a\x{240}bcd

/a(.?)(.)/
a\xc0\x92bcd

/a(.?)(.)/8
a\x{240}bcd

/a(.??)(.)/
a\xc0\x92bcd

/a(.??)(.)/8
a\x{240}bcd

/a(.{3})b/8
a\x{1234}xyb
a\x{1234}\x{4321}yb
a\x{1234}\x{4321}\x{3412}b
*** Failers
a\x{1234}b
ac\ncb

/a(.{3,})b/8
a\x{1234}xyb
a\x{1234}\x{4321}yb
a\x{1234}\x{4321}\x{3412}b
axxxxbcdefghijb
a\x{1234}\x{4321}\x{3412}\x{3421}b
*** Failers
a\x{1234}b

/a(.{3,}?)b/8
a\x{1234}xyb
a\x{1234}\x{4321}yb
a\x{1234}\x{4321}\x{3412}b
axxxxbcdefghijb
a\x{1234}\x{4321}\x{3412}\x{3421}b
*** Failers
a\x{1234}b

/a(.{3,5})b/8
a\x{1234}xyb
a\x{1234}\x{4321}yb
a\x{1234}\x{4321}\x{3412}b
axxxxbcdefghijb
a\x{1234}\x{4321}\x{3412}\x{3421}b
axbxxbcdefghijb
axxxxxbcdefghijb
*** Failers
a\x{1234}b
axxxxxxbcdefghijb

/a(.{3,5}?)b/8
a\x{1234}xyb
a\x{1234}\x{4321}yb
a\x{1234}\x{4321}\x{3412}b
axxxxbcdefghijb
a\x{1234}\x{4321}\x{3412}\x{3421}b
axbxxbcdefghijb
axxxxxbcdefghijb
*** Failers
a\x{1234}b
axxxxxxbcdefghijb

/^[a\x{c0}]/8
*** Failers
\x{100}

/(?<=aXb)cd/8
aXbcd

/(?<=a\x{100}b)cd/8
a\x{100}bcd

/(?<=a\x{100000}b)cd/8
a\x{100000}bcd

/(?:\x{100}){3}b/8
\x{100}\x{100}\x{100}b
*** Failers
\x{100}\x{100}b

/\x{ab}/8
\x{ab}
\xc2\xab
*** Failers
\x00{ab}

/(?<=(.))X/8
WXYZ
\x{256}XYZ
*** Failers
XYZ

/X(\C{3})/8
X\x{1234}

/X(\C{4})/8
X\x{1234}YZ

/X\C*/8
XYZabcdce

/X\C*?/8
XYZabcde

/X\C{3,5}/8
Xabcdefg
X\x{1234}
X\x{1234}YZ
X\x{1234}\x{512}
X\x{1234}\x{512}YZ

/X\C{3,5}?/8
Xabcdefg
X\x{1234}
X\x{1234}YZ
X\x{1234}\x{512}

/[^a]+/8g
bcd
\x{100}aY\x{256}Z

/^[^a]{2}/8
\x{100}bc

/^[^a]{2,}/8
\x{100}bcAa

/^[^a]{2,}?/8
\x{100}bca

/[^a]+/8ig
bcd
\x{100}aY\x{256}Z

/^[^a]{2}/8i
\x{100}bc

/^[^a]{2,}/8i
\x{100}bcAa

/^[^a]{2,}?/8i
\x{100}bca

/\x{100}{0,0}/8
abcd

/\x{100}?/8
abcd
\x{100}\x{100}

/\x{100}{0,3}/8
\x{100}\x{100}
\x{100}\x{100}\x{100}\x{100}

/\x{100}*/8
abce
\x{100}\x{100}\x{100}\x{100}

/\x{100}{1,1}/8
abcd\x{100}\x{100}\x{100}\x{100}

/\x{100}{1,3}/8
abcd\x{100}\x{100}\x{100}\x{100}

/\x{100}+/8
abcd\x{100}\x{100}\x{100}\x{100}

/\x{100}{3}/8
abcd\x{100}\x{100}\x{100}XX

/\x{100}{3,5}/8
abcd\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}XX

/\x{100}{3,}/8
abcd\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}XX

/(?<=a\x{100}{2}b)X/8+
Xyyya\x{100}\x{100}bXzzz

/\D*/8
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

/\D*/8
\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}

/\D/8
1X2
1\x{100}2

/>\S/8
> >X Y
> >\x{100} Y

/\W/8
A.B
A\x{100}B

/\d/8
\x{100}3

/\s/8
\x{100} X

/\w/8
\x{100}X

/\D+/8
12abcd34
*** Failers
1234

/\D{2,3}/8
12abcd34
12ab34
*** Failers
1234
12a34

/\D{2,3}?/8
12abcd34
12ab34
*** Failers
1234
12a34

/\d+/8
12abcd34
*** Failers

/\d{2,3}/8
12abcd34
1234abcd
*** Failers
1.4

/\d{2,3}?/8
12abcd34
1234abcd
*** Failers
1.4

/\S+/8
12abcd34
*** Failers
\ \

/\S{2,3}/8
12abcd34
1234abcd
*** Failers
\ \

/\S{2,3}?/8
12abcd34
1234abcd
*** Failers
\ \

/>\s+</8+
12> <34
*** Failers

/>\s{2,3}</8+
ab> <cd
ab> <ce
*** Failers
ab> <cd

/>\s{2,3}?</8+
ab> <cd
ab> <ce
*** Failers
ab> <cd

/\w+/8
12 34
*** Failers
+++=*!

/\w{2,3}/8
ab cd
abcd ce
*** Failers
a.b.c

/\w{2,3}?/8
ab cd
abcd ce
*** Failers
a.b.c

/\W+/8
12====34
*** Failers
abcd

/\W{2,3}/8
ab====cd
ab==cd
*** Failers
a.b.c

/\W{2,3}?/8
ab====cd
ab==cd
*** Failers
a.b.c

/[\x{100}]/8
\x{100}
Z\x{100}
\x{100}Z
*** Failers

/[Z\x{100}]/8
Z\x{100}
\x{100}
\x{100}Z
*** Failers

/[\x{100}\x{200}]/8
ab\x{100}cd
ab\x{200}cd
*** Failers

/[\x{100}-\x{200}]/8
ab\x{100}cd
ab\x{200}cd
ab\x{111}cd
*** Failers

/[z-\x{200}]/8
ab\x{100}cd
ab\x{200}cd
ab\x{111}cd
abzcd
ab|cd
*** Failers

/[Q\x{100}\x{200}]/8
ab\x{100}cd
ab\x{200}cd
Q?
*** Failers

/[Q\x{100}-\x{200}]/8
ab\x{100}cd
ab\x{200}cd
ab\x{111}cd
Q?
*** Failers

/[Qz-\x{200}]/8
ab\x{100}cd
ab\x{200}cd
ab\x{111}cd
abzcd
ab|cd
Q?
*** Failers

/[\x{100}\x{200}]{1,3}/8
ab\x{100}cd
ab\x{200}cd
ab\x{200}\x{100}\x{200}\x{100}cd
*** Failers

/[\x{100}\x{200}]{1,3}?/8
ab\x{100}cd
ab\x{200}cd
ab\x{200}\x{100}\x{200}\x{100}cd
*** Failers

/[Q\x{100}\x{200}]{1,3}/8
ab\x{100}cd
ab\x{200}cd
ab\x{200}\x{100}\x{200}\x{100}cd
*** Failers

/[Q\x{100}\x{200}]{1,3}?/8
ab\x{100}cd
ab\x{200}cd
ab\x{200}\x{100}\x{200}\x{100}cd
*** Failers

/(?<=[\x{100}\x{200}])X/8
abc\x{200}X
abc\x{100}X
*** Failers
X

/(?<=[Q\x{100}\x{200}])X/8
abc\x{200}X
abc\x{100}X
abQX
*** Failers
X

/(?<=[\x{100}\x{200}]{3})X/8
abc\x{100}\x{200}\x{100}X
*** Failers
abc\x{200}X
X

/[^\x{100}\x{200}]X/8
AX
\x{150}X
\x{500}X
*** Failers
\x{100}X
\x{200}X

/[^Q\x{100}\x{200}]X/8
AX
\x{150}X
\x{500}X
*** Failers
\x{100}X
\x{200}X
QX

/[^\x{100}-\x{200}]X/8
AX
\x{500}X
*** Failers
\x{100}X
\x{150}X
\x{200}X

/a\Cb/
aXb
a\nb

/a\Cb/8
aXb
a\nb
*** Failers
a\x{100}b

/[z-\x{100}]/8i
z
Z
\x{100}
*** Failers
\x{101}
y

/[\xFF]/
>\xff<

/[\xff]/8
>\x{ff}<

/[^\xFF]/
XYZ

/[^\xff]/8
XYZ
\x{123}

/^[ac]*b/8
xb

/^[ac\x{100}]*b/8
xb

/^[^x]*b/8i
xb

/^[^x]*b/8
xb

/^\d*b/8
xb

/(|a)/g8
catac
a\x{256}a

/ End of testinput4 /
258
external-libs/pcre/testdata/testinput5
vendored
@@ -1,258 +0,0 @@
/\x{100}/8DM

/\x{1000}/8DM

/\x{10000}/8DM

/\x{100000}/8DM

/\x{1000000}/8DM

/\x{4000000}/8DM

/\x{7fffFFFF}/8DM

/[\x{ff}]/8DM

/[\x{100}]/8DM

/\x{ffffffff}/8

/\x{100000000}/8

/^\x{100}a\x{1234}/8
\x{100}a\x{1234}bcd

/\x80/8D

/\xff/8D

/\x{0041}\x{2262}\x{0391}\x{002e}/D8
\x{0041}\x{2262}\x{0391}\x{002e}

/\x{D55c}\x{ad6d}\x{C5B4}/D8
\x{D55c}\x{ad6d}\x{C5B4}

/\x{65e5}\x{672c}\x{8a9e}/D8
\x{65e5}\x{672c}\x{8a9e}

/\x{80}/D8

/\x{084}/D8

/\x{104}/D8

/\x{861}/D8

/\x{212ab}/D8

/.{3,5}X/D8
\x{212ab}\x{212ab}\x{212ab}\x{861}X


/.{3,5}?/D8
\x{212ab}\x{212ab}\x{212ab}\x{861}

/-- These tests are here rather than in testinput4 because Perl 5.6 has --/
/-- some problems with UTF-8 support, in the area of \x{..} where the --/
/-- value is < 255. It grumbles about invalid UTF-8 strings. --/

/^[a\x{c0}]b/8
\x{c0}b

/^([a\x{c0}]*?)aa/8
a\x{c0}aaaa/

/^([a\x{c0}]*?)aa/8
a\x{c0}aaaa/
a\x{c0}a\x{c0}aaa/

/^([a\x{c0}]*)aa/8
a\x{c0}aaaa/
a\x{c0}a\x{c0}aaa/

/^([a\x{c0}]*)a\x{c0}/8
a\x{c0}aaaa/
a\x{c0}a\x{c0}aaa/

/-- --/

/(?<=\C)X/8
Should produce an error diagnostic

/-- This one is here not because it's different to Perl, but because the --/
/-- way the captured single-byte is displayed. (In Perl it becomes a --/
/-- character, and you can't tell the difference.) --/

/X(\C)(.*)/8
X\x{1234}
X\nabc

/^[ab]/8D
bar
*** Failers
c
\x{ff}
\x{100}

/^[^ab]/8D
c
\x{ff}
\x{100}
*** Failers
aaa

/[^ab\xC0-\xF0]/8SD
\x{f1}
\x{bf}
\x{100}
\x{1000}
*** Failers
\x{c0}
\x{f0}

/Ä€{3,4}/8SD
\x{100}\x{100}\x{100}\x{100\x{100}

/(\x{100}+|x)/8SD

/(\x{100}*a|x)/8SD

/(\x{100}{0,2}a|x)/8SD

/(\x{100}{1,2}a|x)/8SD

/\x{100}*(\d+|"(?1)")/8
1234
"1234"
\x{100}1234
"\x{100}1234"
\x{100}\x{100}12ab
\x{100}\x{100}"12"
*** Failers
\x{100}\x{100}abcd

/\x{100}/8D

/\x{100}*/8D

/a\x{100}*/8D

/ab\x{100}*/8D

/a\x{100}\x{101}*/8D

/a\x{100}\x{101}+/8D

/\x{100}*A/8D
A

/\x{100}*\d(?R)/8D

/[^\x{c4}]/D

/[^\x{c4}]/8D

/[\x{100}]/8DM
\x{100}
Z\x{100}
\x{100}Z
*** Failers

/[Z\x{100}]/8DM
Z\x{100}
\x{100}
\x{100}Z
*** Failers

/[\x{200}-\x{100}]/8

/[Ä€-Ä„]/8
\x{100}
\x{104}
*** Failers
\x{105}
\x{ff}

/[z-\x{100}]/8D

/[z-\x{100}]/8Di

/[z\Qa-d]Ä€\E]/8D
\x{100}
Ä€

/[\xFF]/D
>\xff<

/[\xff]/D8
>\x{ff}<

/[^\xFF]/D

/[^\xff]/8D

/[Ä-Ü]/8
Ö # Matches without Study
\x{d6}

/[Ä-Ü]/8S
Ö <-- Same with Study
\x{d6}

/[\x{c4}-\x{dc}]/8
Ö # Matches without Study
\x{d6}

/[\x{c4}-\x{dc}]/8S
Ö <-- Same with Study
\x{d6}

/[Ã]/8

/Ã/8

/ÃÃÃxxx/8

/ÃÃÃxxx/8?D

/abc/8
Ã]
Ã
ÃÃÃ
ÃÃÃ\?

/anything/8
\xc0\x80
\xc1\x8f
\xe0\x9f\x80
\xf0\x8f\x80\x80
\xf8\x87\x80\x80\x80
\xfc\x83\x80\x80\x80\x80
\xfe\x80\x80\x80\x80\x80
\xff\x80\x80\x80\x80\x80
\xc3\x8f
\xe0\xaf\x80
\xe1\x80\x80
\xf0\x9f\x80\x80
\xf1\x8f\x80\x80
\xf8\x88\x80\x80\x80
\xf9\x87\x80\x80\x80
\xfc\x84\x80\x80\x80\x80
\xfd\x83\x80\x80\x80\x80

/\x{100}abc(xyz(?1))/8D

/[^\x{100}]abc(xyz(?1))/8D

/[ab\x{100}]abc(xyz(?1))/8D

/(\x{100}(b(?2)c))?/D8

/(\x{100}(b(?2)c)){0,2}/D8

/(\x{100}(b(?1)c))?/D8

/(\x{100}(b(?1)c)){0,2}/D8

/ End of testinput5 /
6274
external-libs/pcre/testdata/testoutput1
vendored
File diff suppressed because it is too large
4575
external-libs/pcre/testdata/testoutput2
vendored
File diff suppressed because it is too large
115
external-libs/pcre/testdata/testoutput3
vendored
@@ -1,115 +0,0 @@
PCRE version 4.5 01-December-2003

/^[\w]+/
*** Failers
No match
École
No match

/^[\w]+/Lfr_FR
École
0: École

/^[\w]+/
*** Failers
No match
École
No match

/^[\W]+/
École
0: \xc9

/^[\W]+/Lfr_FR
*** Failers
0: ***
École
No match

/[\b]/
\b
0: \x08
*** Failers
No match
a
No match

/[\b]/Lfr_FR
\b
0: \x08
*** Failers
No match
a
No match

/^\w+/
*** Failers
No match
École
No match

/^\w+/Lfr_FR
École
0: École

/(.+)\b(.+)/
École
0: \xc9cole
1: \xc9
2: cole

/(.+)\b(.+)/Lfr_FR
*** Failers
0: *** Failers
1: ***
2: Failers
École
No match

/École/i
École
0: \xc9cole
*** Failers
No match
école
No match

/École/iLfr_FR
École
0: École
école
0: école

/\w/IS
Capturing subpattern count = 0
No options
No first char
No need char
Starting character set: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P
Q R S T U V W X Y Z _ a b c d e f g h i j k l m n o p q r s t u v w x y z

/\w/ISLfr_FR
Capturing subpattern count = 0
No options
No first char
No need char
Starting character set: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P
Q R S T U V W X Y Z _ a b c d e f g h i j k l m n o p q r s t u v w x y z
µ À Á Â Ã Ä Å Æ Ç È É Ê Ë Ì Í Î Ï Ð Ñ Ò Ó Ô Õ Ö Ø Ù Ú Û Ü Ý Þ ß à á â ã ä
å æ ç è é ê ë ì í î ï ð ñ ò ó ô õ ö ø ù ú û ü ý þ ÿ

/^[\xc8-\xc9]/iLfr_FR
École
0: É
école
0: é

/^[\xc8-\xc9]/Lfr_FR
École
0: É
*** Failers
No match
école
No match

/ End of testinput3 /
909
external-libs/pcre/testdata/testoutput4
vendored
@@ -1,909 +0,0 @@
PCRE version 4.5 01-December-2003

/-- Do not use the \x{} construct except with patterns that have the --/
/-- /8 option set, because PCRE doesn't recognize them as UTF-8 unless --/
No match
/-- that option is set. However, the latest Perls recognize them always. --/
No match

/a.b/8
acb
0: acb
a\x7fb
0: a\x{7f}b
a\x{100}b
0: a\x{100}b
*** Failers
No match
a\nb
No match

/a(.{3})b/8
a\x{4000}xyb
0: a\x{4000}xyb
1: \x{4000}xy
a\x{4000}\x7fyb
0: a\x{4000}\x{7f}yb
1: \x{4000}\x{7f}y
a\x{4000}\x{100}yb
0: a\x{4000}\x{100}yb
1: \x{4000}\x{100}y
*** Failers
No match
a\x{4000}b
No match
ac\ncb
No match

/a(.*?)(.)/
a\xc0\x88b
0: a\xc0
1:
2: \xc0

/a(.*?)(.)/8
a\x{100}b
0: a\x{100}
1:
2: \x{100}

/a(.*)(.)/
a\xc0\x88b
0: a\xc0\x88b
1: \xc0\x88
2: b

/a(.*)(.)/8
a\x{100}b
0: a\x{100}b
1: \x{100}
2: b

/a(.)(.)/
a\xc0\x92bcd
0: a\xc0\x92
1: \xc0
2: \x92

/a(.)(.)/8
a\x{240}bcd
0: a\x{240}b
1: \x{240}
2: b

/a(.?)(.)/
a\xc0\x92bcd
0: a\xc0\x92
1: \xc0
2: \x92

/a(.?)(.)/8
a\x{240}bcd
0: a\x{240}b
1: \x{240}
2: b

/a(.??)(.)/
a\xc0\x92bcd
0: a\xc0
1:
2: \xc0

/a(.??)(.)/8
a\x{240}bcd
0: a\x{240}
1:
2: \x{240}

/a(.{3})b/8
a\x{1234}xyb
0: a\x{1234}xyb
1: \x{1234}xy
a\x{1234}\x{4321}yb
0: a\x{1234}\x{4321}yb
1: \x{1234}\x{4321}y
a\x{1234}\x{4321}\x{3412}b
0: a\x{1234}\x{4321}\x{3412}b
1: \x{1234}\x{4321}\x{3412}
*** Failers
No match
a\x{1234}b
No match
ac\ncb
No match

/a(.{3,})b/8
a\x{1234}xyb
0: a\x{1234}xyb
1: \x{1234}xy
a\x{1234}\x{4321}yb
0: a\x{1234}\x{4321}yb
1: \x{1234}\x{4321}y
a\x{1234}\x{4321}\x{3412}b
0: a\x{1234}\x{4321}\x{3412}b
1: \x{1234}\x{4321}\x{3412}
axxxxbcdefghijb
0: axxxxbcdefghijb
1: xxxxbcdefghij
a\x{1234}\x{4321}\x{3412}\x{3421}b
0: a\x{1234}\x{4321}\x{3412}\x{3421}b
1: \x{1234}\x{4321}\x{3412}\x{3421}
*** Failers
No match
a\x{1234}b
No match

/a(.{3,}?)b/8
a\x{1234}xyb
0: a\x{1234}xyb
1: \x{1234}xy
a\x{1234}\x{4321}yb
0: a\x{1234}\x{4321}yb
1: \x{1234}\x{4321}y
a\x{1234}\x{4321}\x{3412}b
0: a\x{1234}\x{4321}\x{3412}b
1: \x{1234}\x{4321}\x{3412}
axxxxbcdefghijb
0: axxxxb
1: xxxx
a\x{1234}\x{4321}\x{3412}\x{3421}b
0: a\x{1234}\x{4321}\x{3412}\x{3421}b
1: \x{1234}\x{4321}\x{3412}\x{3421}
*** Failers
No match
a\x{1234}b
No match

/a(.{3,5})b/8
a\x{1234}xyb
0: a\x{1234}xyb
1: \x{1234}xy
a\x{1234}\x{4321}yb
0: a\x{1234}\x{4321}yb
1: \x{1234}\x{4321}y
a\x{1234}\x{4321}\x{3412}b
0: a\x{1234}\x{4321}\x{3412}b
1: \x{1234}\x{4321}\x{3412}
axxxxbcdefghijb
0: axxxxb
1: xxxx
a\x{1234}\x{4321}\x{3412}\x{3421}b
0: a\x{1234}\x{4321}\x{3412}\x{3421}b
1: \x{1234}\x{4321}\x{3412}\x{3421}
axbxxbcdefghijb
0: axbxxb
1: xbxx
axxxxxbcdefghijb
0: axxxxxb
1: xxxxx
*** Failers
No match
a\x{1234}b
No match
axxxxxxbcdefghijb
No match

/a(.{3,5}?)b/8
a\x{1234}xyb
0: a\x{1234}xyb
1: \x{1234}xy
a\x{1234}\x{4321}yb
0: a\x{1234}\x{4321}yb
1: \x{1234}\x{4321}y
a\x{1234}\x{4321}\x{3412}b
0: a\x{1234}\x{4321}\x{3412}b
1: \x{1234}\x{4321}\x{3412}
axxxxbcdefghijb
0: axxxxb
1: xxxx
a\x{1234}\x{4321}\x{3412}\x{3421}b
0: a\x{1234}\x{4321}\x{3412}\x{3421}b
1: \x{1234}\x{4321}\x{3412}\x{3421}
axbxxbcdefghijb
0: axbxxb
1: xbxx
axxxxxbcdefghijb
0: axxxxxb
1: xxxxx
*** Failers
No match
a\x{1234}b
No match
axxxxxxbcdefghijb
No match

/^[a\x{c0}]/8
*** Failers
No match
\x{100}
No match

/(?<=aXb)cd/8
aXbcd
0: cd

/(?<=a\x{100}b)cd/8
a\x{100}bcd
0: cd

/(?<=a\x{100000}b)cd/8
a\x{100000}bcd
0: cd

/(?:\x{100}){3}b/8
\x{100}\x{100}\x{100}b
0: \x{100}\x{100}\x{100}b
*** Failers
No match
\x{100}\x{100}b
No match

/\x{ab}/8
\x{ab}
0: \x{ab}
\xc2\xab
0: \x{ab}
*** Failers
No match
\x00{ab}
No match

/(?<=(.))X/8
WXYZ
0: X
1: W
\x{256}XYZ
0: X
1: \x{256}
*** Failers
No match
XYZ
No match

/X(\C{3})/8
X\x{1234}
0: X\x{1234}
1: \x{1234}

/X(\C{4})/8
X\x{1234}YZ
0: X\x{1234}Y
1: \x{1234}Y

/X\C*/8
XYZabcdce
0: XYZabcdce

/X\C*?/8
XYZabcde
0: X

/X\C{3,5}/8
Xabcdefg
0: Xabcde
X\x{1234}
0: X\x{1234}
X\x{1234}YZ
0: X\x{1234}YZ
X\x{1234}\x{512}
0: X\x{1234}\x{512}
X\x{1234}\x{512}YZ
0: X\x{1234}\x{512}

/X\C{3,5}?/8
Xabcdefg
0: Xabc
X\x{1234}
0: X\x{1234}
X\x{1234}YZ
0: X\x{1234}
X\x{1234}\x{512}
0: X\x{1234}

/[^a]+/8g
bcd
0: bcd
\x{100}aY\x{256}Z
0: \x{100}
0: Y\x{256}Z

/^[^a]{2}/8
\x{100}bc
0: \x{100}b

/^[^a]{2,}/8
\x{100}bcAa
0: \x{100}bcA

/^[^a]{2,}?/8
\x{100}bca
0: \x{100}b

/[^a]+/8ig
bcd
0: bcd
\x{100}aY\x{256}Z
0: \x{100}
0: Y\x{256}Z

/^[^a]{2}/8i
\x{100}bc
0: \x{100}b

/^[^a]{2,}/8i
\x{100}bcAa
0: \x{100}bc

/^[^a]{2,}?/8i
\x{100}bca
0: \x{100}b

/\x{100}{0,0}/8
abcd
0:

/\x{100}?/8
abcd
0:
\x{100}\x{100}
0: \x{100}

/\x{100}{0,3}/8
\x{100}\x{100}
0: \x{100}\x{100}
\x{100}\x{100}\x{100}\x{100}
0: \x{100}\x{100}\x{100}

/\x{100}*/8
abce
0:
\x{100}\x{100}\x{100}\x{100}
0: \x{100}\x{100}\x{100}\x{100}

/\x{100}{1,1}/8
abcd\x{100}\x{100}\x{100}\x{100}
0: \x{100}

/\x{100}{1,3}/8
abcd\x{100}\x{100}\x{100}\x{100}
0: \x{100}\x{100}\x{100}

/\x{100}+/8
abcd\x{100}\x{100}\x{100}\x{100}
0: \x{100}\x{100}\x{100}\x{100}

/\x{100}{3}/8
abcd\x{100}\x{100}\x{100}XX
0: \x{100}\x{100}\x{100}

/\x{100}{3,5}/8
abcd\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}XX
0: \x{100}\x{100}\x{100}\x{100}\x{100}

/\x{100}{3,}/8
abcd\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}XX
0: \x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}

/(?<=a\x{100}{2}b)X/8+
Xyyya\x{100}\x{100}bXzzz
0: X
0+ zzz

/\D*/8
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
0: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

/\D*/8
\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}
0: \x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}
|
||||
|
||||
/\D/8
|
||||
1X2
|
||||
0: X
|
||||
1\x{100}2
|
||||
0: \x{100}
|
||||
|
||||
/>\S/8
|
||||
> >X Y
|
||||
0: >X
|
||||
> >\x{100} Y
|
||||
0: >\x{100}
|
||||
|
||||
/\W/8
|
||||
A.B
|
||||
0: .
|
||||
A\x{100}B
|
||||
0: \x{100}
|
||||
|
||||
/\d/8
|
||||
\x{100}3
|
||||
0: 3
|
||||
|
||||
/\s/8
|
||||
\x{100} X
|
||||
0:
|
||||
|
||||
/\w/8
|
||||
\x{100}X
|
||||
0: X
|
||||
|
||||
/\D+/8
|
||||
12abcd34
|
||||
0: abcd
|
||||
*** Failers
|
||||
0: *** Failers
|
||||
1234
|
||||
No match
|
||||
|
||||
/\D{2,3}/8
|
||||
12abcd34
|
||||
0: abc
|
||||
12ab34
|
||||
0: ab
|
||||
*** Failers
|
||||
0: ***
|
||||
1234
|
||||
No match
|
||||
12a34
|
||||
No match
|
||||
|
||||
/\D{2,3}?/8
|
||||
12abcd34
|
||||
0: ab
|
||||
12ab34
|
||||
0: ab
|
||||
*** Failers
|
||||
0: **
|
||||
1234
|
||||
No match
|
||||
12a34
|
||||
No match
|
||||
|
||||
/\d+/8
|
||||
12abcd34
|
||||
0: 12
|
||||
*** Failers
|
||||
No match
|
||||
|
||||
/\d{2,3}/8
|
||||
12abcd34
|
||||
0: 12
|
||||
1234abcd
|
||||
0: 123
|
||||
*** Failers
|
||||
No match
|
||||
1.4
|
||||
No match
|
||||
|
||||
/\d{2,3}?/8
|
||||
12abcd34
|
||||
0: 12
|
||||
1234abcd
|
||||
0: 12
|
||||
*** Failers
|
||||
No match
|
||||
1.4
|
||||
No match
|
||||
|
||||
/\S+/8
|
||||
12abcd34
|
||||
0: 12abcd34
|
||||
*** Failers
|
||||
0: ***
|
||||
\ \
|
||||
No match
|
||||
|
||||
/\S{2,3}/8
|
||||
12abcd34
|
||||
0: 12a
|
||||
1234abcd
|
||||
0: 123
|
||||
*** Failers
|
||||
0: ***
|
||||
\ \
|
||||
No match
|
||||
|
||||
/\S{2,3}?/8
|
||||
12abcd34
|
||||
0: 12
|
||||
1234abcd
|
||||
0: 12
|
||||
*** Failers
|
||||
0: **
|
||||
\ \
|
||||
No match
|
||||
|
||||
/>\s+</8+
|
||||
12> <34
|
||||
0: > <
|
||||
0+ 34
|
||||
*** Failers
|
||||
No match
|
||||
|
||||
/>\s{2,3}</8+
|
||||
ab> <cd
|
||||
0: > <
|
||||
0+ cd
|
||||
ab> <ce
|
||||
0: > <
|
||||
0+ ce
|
||||
*** Failers
|
||||
No match
|
||||
ab> <cd
|
||||
No match
|
||||
|
||||
/>\s{2,3}?</8+
|
||||
ab> <cd
|
||||
0: > <
|
||||
0+ cd
|
||||
ab> <ce
|
||||
0: > <
|
||||
0+ ce
|
||||
*** Failers
|
||||
No match
|
||||
ab> <cd
|
||||
No match
|
||||
|
||||
/\w+/8
|
||||
12 34
|
||||
0: 12
|
||||
*** Failers
|
||||
0: Failers
|
||||
+++=*!
|
||||
No match
|
||||
|
||||
/\w{2,3}/8
ab cd
0: ab
abcd ce
0: abc
*** Failers
0: Fai
a.b.c
No match

/\w{2,3}?/8
ab cd
0: ab
abcd ce
0: ab
*** Failers
0: Fa
a.b.c
No match

/\W+/8
12====34
0: ====
*** Failers
0: ***
abcd
No match

/\W{2,3}/8
ab====cd
0: ===
ab==cd
0: ==
*** Failers
0: ***
a.b.c
No match

/\W{2,3}?/8
ab====cd
0: ==
ab==cd
0: ==
*** Failers
0: **
a.b.c
No match

/[\x{100}]/8
\x{100}
0: \x{100}
Z\x{100}
0: \x{100}
\x{100}Z
0: \x{100}
*** Failers
No match

/[Z\x{100}]/8
Z\x{100}
0: Z
\x{100}
0: \x{100}
\x{100}Z
0: \x{100}
*** Failers
No match

/[\x{100}\x{200}]/8
ab\x{100}cd
0: \x{100}
ab\x{200}cd
0: \x{200}
*** Failers
No match

/[\x{100}-\x{200}]/8
ab\x{100}cd
0: \x{100}
ab\x{200}cd
0: \x{200}
ab\x{111}cd
0: \x{111}
*** Failers
No match

/[z-\x{200}]/8
ab\x{100}cd
0: \x{100}
ab\x{200}cd
0: \x{200}
ab\x{111}cd
0: \x{111}
abzcd
0: z
ab|cd
0: |
*** Failers
No match

/[Q\x{100}\x{200}]/8
ab\x{100}cd
0: \x{100}
ab\x{200}cd
0: \x{200}
Q?
0: Q
*** Failers
No match

/[Q\x{100}-\x{200}]/8
ab\x{100}cd
0: \x{100}
ab\x{200}cd
0: \x{200}
ab\x{111}cd
0: \x{111}
Q?
0: Q
*** Failers
No match

/[Qz-\x{200}]/8
ab\x{100}cd
0: \x{100}
ab\x{200}cd
0: \x{200}
ab\x{111}cd
0: \x{111}
abzcd
0: z
ab|cd
0: |
Q?
0: Q
*** Failers
No match

/[\x{100}\x{200}]{1,3}/8
ab\x{100}cd
0: \x{100}
ab\x{200}cd
0: \x{200}
ab\x{200}\x{100}\x{200}\x{100}cd
0: \x{200}\x{100}\x{200}
*** Failers
No match

/[\x{100}\x{200}]{1,3}?/8
ab\x{100}cd
0: \x{100}
ab\x{200}cd
0: \x{200}
ab\x{200}\x{100}\x{200}\x{100}cd
0: \x{200}
*** Failers
No match

/[Q\x{100}\x{200}]{1,3}/8
ab\x{100}cd
0: \x{100}
ab\x{200}cd
0: \x{200}
ab\x{200}\x{100}\x{200}\x{100}cd
0: \x{200}\x{100}\x{200}
*** Failers
No match

/[Q\x{100}\x{200}]{1,3}?/8
ab\x{100}cd
0: \x{100}
ab\x{200}cd
0: \x{200}
ab\x{200}\x{100}\x{200}\x{100}cd
0: \x{200}
*** Failers
No match

/(?<=[\x{100}\x{200}])X/8
abc\x{200}X
0: X
abc\x{100}X
0: X
*** Failers
No match
X
No match

/(?<=[Q\x{100}\x{200}])X/8
abc\x{200}X
0: X
abc\x{100}X
0: X
abQX
0: X
*** Failers
No match
X
No match

/(?<=[\x{100}\x{200}]{3})X/8
abc\x{100}\x{200}\x{100}X
0: X
*** Failers
No match
abc\x{200}X
No match
X
No match

/[^\x{100}\x{200}]X/8
AX
0: AX
\x{150}X
0: \x{150}X
\x{500}X
0: \x{500}X
*** Failers
No match
\x{100}X
No match
\x{200}X
No match

/[^Q\x{100}\x{200}]X/8
AX
0: AX
\x{150}X
0: \x{150}X
\x{500}X
0: \x{500}X
*** Failers
No match
\x{100}X
No match
\x{200}X
No match
QX
No match

/[^\x{100}-\x{200}]X/8
AX
0: AX
\x{500}X
0: \x{500}X
*** Failers
No match
\x{100}X
No match
\x{150}X
No match
\x{200}X
No match

/a\Cb/
aXb
0: aXb
a\nb
0: a\x0ab

/a\Cb/8
aXb
0: aXb
a\nb
0: a\x{0a}b
*** Failers
No match
a\x{100}b
No match

/[z-\x{100}]/8i
z
0: z
Z
0: Z
\x{100}
0: \x{100}
*** Failers
No match
\x{101}
No match
y
No match

/[\xFF]/
>\xff<
0: \xff

/[\xff]/8
>\x{ff}<
0: \x{ff}

/[^\xFF]/
XYZ
0: X

/[^\xff]/8
XYZ
0: X
\x{123}
0: \x{123}

/^[ac]*b/8
xb
No match

/^[ac\x{100}]*b/8
xb
No match

/^[^x]*b/8i
xb
No match

/^[^x]*b/8
xb
No match

/^\d*b/8
xb
No match

/(|a)/g8
catac
0:
1:
0:
1:
0: a
1: a
0:
1:
0:
1:
0: a
1: a
0:
1:
0:
1:
a\x{256}a
0:
1:
0: a
1: a
0:
1:
0:
1:
0: a
1: a
0:
1:

/ End of testinput4 /
1063
external-libs/pcre/testdata/testoutput5
vendored
File diff suppressed because it is too large