mirror of https://github.com/qpdf/qpdf.git synced 2025-01-31 02:48:31 +00:00

remove files not needed for building

git-svn-id: svn+q:///qpdf/trunk@767 71b93d88-0707-0410-a8cf-f5a4172ac649
Jay Berkenbilt 2009-10-10 17:41:30 +00:00
parent eb355c60c1
commit f3bf8d3110
89 changed files with 0 additions and 52553 deletions

@ -1,185 +0,0 @@
Basic Installation
==================
These are generic installation instructions that apply to systems that
can run the `configure' shell script - Unix systems and any that imitate
it. They are not specific to PCRE. There are PCRE-specific instructions
for non-Unix systems in the file NON-UNIX-USE.
The `configure' shell script attempts to guess correct values for
various system-dependent variables used during compilation. It uses
those values to create a `Makefile' in each directory of the package.
It may also create one or more `.h' files containing system-dependent
definitions. Finally, it creates a shell script `config.status' that
you can run in the future to recreate the current configuration, a file
`config.cache' that saves the results of its tests to speed up
reconfiguring, and a file `config.log' containing compiler output
(useful mainly for debugging `configure').
If you need to do unusual things to compile the package, please try
to figure out how `configure' could check whether to do them, and mail
diffs or instructions to the address given in the `README' so they can
be considered for the next release. If at some point `config.cache'
contains results you don't want to keep, you may remove or edit it.
The file `configure.in' is used to create `configure' by a program
called `autoconf'. You only need `configure.in' if you want to change
it or regenerate `configure' using a newer version of `autoconf'.
The simplest way to compile this package is:

  1. `cd' to the directory containing the package's source code and type
     `./configure' to configure the package for your system. If you're
     using `csh' on an old version of System V, you might need to type
     `sh ./configure' instead to prevent `csh' from trying to execute
     `configure' itself.

     Running `configure' takes a while. While running, it prints some
     messages telling which features it is checking for.

  2. Type `make' to compile the package.

  3. Optionally, type `make check' to run any self-tests that come with
     the package.

  4. Type `make install' to install the programs and any data files and
     documentation.

  5. You can remove the program binaries and object files from the
     source code directory by typing `make clean'. To also remove the
     files that `configure' created (so you can compile the package for
     a different kind of computer), type `make distclean'. There is
     also a `make maintainer-clean' target, but that is intended mainly
     for the package's developers. If you use it, you may have to get
     all sorts of other programs in order to regenerate files that came
     with the distribution.
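
As a minimal sketch, steps 1 to 4 above could be wrapped in a small shell
function. The names `run', `build_package' and `DRY_RUN' are hypothetical and
not part of the package. With `DRY_RUN=1' the commands are printed instead of
executed, which is one way to check the sequence before running it.

```shell
#!/bin/sh
# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical wrapper around steps 1-4 above.
# $1 is the directory containing the package's source code.
# The body runs in a subshell, so the caller's working directory is unchanged.
build_package() (
    run cd "$1" &&
    run ./configure &&
    run make &&
    run make check &&
    run make install
)
```

Setting `DRY_RUN=1' and calling `build_package' with the source directory
lists the five commands in order. Without it, the sequence stops at the first
command that fails.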
Compilers and Options
=====================
Some systems require unusual options for compilation or linking that
the `configure' script does not know about. You can give `configure'
initial values for variables by setting them in the environment. Using
a Bourne-compatible shell, you can do that on the command line like
this:
     CC=c89 CFLAGS=-O2 LIBS=-lposix ./configure

Or on systems that have the `env' program, you can do it like this:
     env CPPFLAGS=-I/usr/local/include LDFLAGS=-s ./configure
Compiling For Multiple Architectures
====================================
You can compile the package for more than one kind of computer at the
same time, by placing the object files for each architecture in their
own directory. To do this, you must use a version of `make' that
supports the `VPATH' variable, such as GNU `make'. `cd' to the
directory where you want the object files and executables to go and run
the `configure' script. `configure' automatically checks for the
source code in the directory that `configure' is in and in `..'.
If you have to use a `make' that does not support the `VPATH'
variable, you have to compile the package for one architecture at a time
in the source code directory. After you have installed the package for
one architecture, use `make distclean' before reconfiguring for another
architecture.
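
The separate-directory build described above might look like the following
minimal sketch. The function name `vpath_build' and the `MAKE' override are
hypothetical. The sketch assumes the source directory is given as an absolute
path and that the installed `make' supports `VPATH'.

```shell
#!/bin/sh
# Hypothetical helper: configure and build in a separate object directory.
# $1 = absolute path of the source directory (the one containing `configure')
# $2 = directory that receives the object files and executables
# `configure' is run through `sh', as the text suggests for shells that
# cannot execute it directly. MAKE may name an alternative `make' program.
vpath_build() (
    srcdir=$1
    builddir=$2
    mkdir -p "$builddir" &&
    cd "$builddir" &&
    sh "$srcdir/configure" &&
    ${MAKE:-make}
)
```

One object directory per architecture keeps the builds apart, for example
`vpath_build "$PWD" "$PWD/build-sun4"' (the directory name is illustrative).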
Installation Names
==================
By default, `make install' will install the package's files in
`/usr/local/bin', `/usr/local/man', etc. You can specify an
installation prefix other than `/usr/local' by giving `configure' the
option `--prefix=PATH'.
You can specify separate installation prefixes for
architecture-specific files and architecture-independent files. If you
give `configure' the option `--exec-prefix=PATH', the package will use
PATH as the prefix for installing programs and libraries.
Documentation and other data files will still use the regular prefix.
In addition, if you use an unusual directory layout you can give
options like `--bindir=PATH' to specify different values for particular
kinds of files. Run `configure --help' for a list of the directories
you can set and what kinds of files go in them.
If the package supports it, you can cause programs to be installed
with an extra prefix or suffix on their names by giving `configure' the
option `--program-prefix=PREFIX' or `--program-suffix=SUFFIX'.
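
To illustrate how the two prefixes divide the installed files, here is a
minimal sketch. The function `install_dirs' is hypothetical and is not
something `configure' provides. It assumes the default layout implied above,
with man pages under PREFIX/man; newer Autoconf versions may place them
elsewhere, so check `configure --help' for the real values.

```shell
#!/bin/sh
# Hypothetical helper: print where each kind of file would be installed.
# $1 = value given to --prefix (default /usr/local)
# $2 = value given to --exec-prefix (defaults to the prefix)
install_dirs() (
    prefix=${1:-/usr/local}
    exec_prefix=${2:-$prefix}
    # Architecture-specific files follow the exec prefix.
    echo "bindir=$exec_prefix/bin"
    echo "libdir=$exec_prefix/lib"
    # Architecture-independent files follow the regular prefix.
    echo "includedir=$prefix/include"
    echo "mandir=$prefix/man"
)
```

Calling `install_dirs /opt/pcre /opt/pcre/sparc' (illustrative paths) shows
programs and libraries moving to the second directory while headers and man
pages stay under the first.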
Optional Features
=================
Some packages pay attention to `--enable-FEATURE' options to
`configure', where FEATURE indicates an optional part of the package.
They may also pay attention to `--with-PACKAGE' options, where PACKAGE
is something like `gnu-as' or `x' (for the X Window System). The
`README' should mention any `--enable-' and `--with-' options that the
package recognizes.
For packages that use the X Window System, `configure' can usually
find the X include and library files automatically, but if it doesn't,
you can use the `configure' options `--x-includes=DIR' and
`--x-libraries=DIR' to specify their locations.
Specifying the System Type
==========================
There may be some features `configure' cannot figure out
automatically, but needs to determine by the type of host the package
will run on. Usually `configure' can figure that out, but if it prints
a message saying it cannot guess the host type, give it the
`--host=TYPE' option. TYPE can either be a short name for the system
type, such as `sun4', or a canonical name with three fields:
     CPU-COMPANY-SYSTEM
See the file `config.sub' for the possible values of each field. If
`config.sub' isn't included in this package, then this package doesn't
need to know the host type.
If you are building compiler tools for cross-compiling, you can also
use the `--target=TYPE' option to select the type of system they will
produce code for and the `--build=TYPE' option to select the type of
system on which you are compiling the package.
Sharing Defaults
================
If you want to set default values for `configure' scripts to share,
you can create a site shell script called `config.site' that gives
default values for variables like `CC', `cache_file', and `prefix'.
`configure' looks for `PREFIX/share/config.site' if it exists, then
`PREFIX/etc/config.site' if it exists. Or, you can set the
`CONFIG_SITE' environment variable to the location of the site script.
A warning: not all `configure' scripts look for a site script.
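
A site script is ordinary shell code that assigns variables. The following
minimal sketch writes one to a temporary location. The file path and the
default values are illustrative assumptions, and the `NONE' test assumes that
`configure' leaves `prefix' set to `NONE' when no `--prefix' option was given.

```shell
#!/bin/sh
# Write a hypothetical site script. The path and the default values are
# illustrative assumptions, not recommendations.
site_script=${TMPDIR:-/tmp}/config.site.example
cat > "$site_script" <<'EOF'
# Site-wide defaults read by `configure'.
# Assumption: prefix is NONE unless --prefix was given on the command line,
# so an explicit --prefix still takes precedence over this default.
if test "$prefix" = NONE; then prefix=/opt/local; fi
# Only choose a compiler when the user has not picked one.
if test -z "$CC"; then CC=gcc; fi
EOF

# To use the script explicitly instead of relying on the PREFIX/share and
# PREFIX/etc lookup, pass its location in the environment:
#   CONFIG_SITE=$site_script ./configure
```

Guarding each assignment keeps values given on the command line or in the
environment from being overwritten by the site defaults.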
Operation Controls
==================
`configure' recognizes the following options to control how it
operates.

`--cache-file=FILE'
     Use and save the results of the tests in FILE instead of
     `./config.cache'. Set FILE to `/dev/null' to disable caching, for
     debugging `configure'.

`--help'
     Print a summary of the options to `configure', and exit.

`--quiet'
`--silent'
`-q'
     Do not print messages saying which checks are being made. To
     suppress all normal output, redirect it to `/dev/null' (any error
     messages will still be shown).

`--srcdir=DIR'
     Look for the package's source code in directory DIR. Usually
     `configure' can determine that directory automatically.

`--version'
     Print the version of Autoconf used to generate the `configure'
     script, and exit.

`configure' also accepts some other, not widely useful, options.

@ -1,279 +0,0 @@
# Makefile.in for PCRE (Perl-Compatible Regular Expression) library.
#############################################################################
# PCRE is developed on a Unix system. I do not use Windows or Macs, and know
# nothing about building software on them. Although the code of PCRE should
# be very portable, the building system in this Makefile is designed for Unix
# systems. However, there are features that have been supplied to me by various
# people that should make it work on MinGW and Cygwin systems.
# This setting enables Unix-style directory scanning in pcregrep, triggered
# by the -f option. Maybe one day someone will add code for other systems.
PCREGREP_OSTYPE=-DIS_UNIX
#############################################################################
#---------------------------------------------------------------------------#
# The following lines are modified by "configure" to insert data that it is #
# given in its arguments, or which it finds out for itself. #
#---------------------------------------------------------------------------#
SHELL = @SHELL@
prefix = @prefix@
exec_prefix = @exec_prefix@
top_srcdir = @top_srcdir@
mkinstalldirs = $(SHELL) $(top_srcdir)/mkinstalldirs
# NB: top_builddir is not referred to directly below, but it is used in the
# setting of $(LIBTOOL), so don't remove it!
top_builddir = .
# BINDIR is the directory in which the pcregrep, pcretest, and pcre-config
# commands are installed.
# INCDIR is the directory in which the public header files pcre.h and
# pcreposix.h are installed.
# LIBDIR is the directory in which the libraries are installed.
# MANDIR is the directory in which the man pages are installed.
BINDIR = @bindir@
LIBDIR = @libdir@
INCDIR = @includedir@
MANDIR = @mandir@
# EXEEXT is set by configure to the extension of an executable file
# OBJEXT is set by configure to the extension of an object file
# The BUILD_* equivalents are the same but for the host we're building on
EXEEXT = @EXEEXT@
OBJEXT = @OBJEXT@
# Note that these are just here to have a convenient place to look at the
# outcome.
BUILD_EXEEXT = @BUILD_EXEEXT@
BUILD_OBJEXT = @BUILD_OBJEXT@
# The compiler, C flags, preprocessor flags, etc
CC = @CC@
CFLAGS = @CFLAGS@
CPPFLAGS = @CPPFLAGS@
CC_FOR_BUILD = @CC_FOR_BUILD@
CFLAGS_FOR_BUILD = @CFLAGS_FOR_BUILD@
CPPFLAGS_FOR_BUILD = @CPPFLAGS_FOR_BUILD@
UTF8 = @UTF8@
NEWLINE = @NEWLINE@
POSIX_MALLOC_THRESHOLD = @POSIX_MALLOC_THRESHOLD@
LINK_SIZE = @LINK_SIZE@
MATCH_LIMIT = @MATCH_LIMIT@
NO_RECURSE = @NO_RECURSE@
EBCDIC = @EBCDIC@
INSTALL = @INSTALL@
INSTALL_DATA = @INSTALL_DATA@
# LIBTOOL enables the building of shared and static libraries. It is set up
# to do one or the other or both by ./configure.
LIBTOOL = @LIBTOOL@
LTCOMPILE = $(LIBTOOL) --mode=compile $(CC) -c $(CFLAGS) -I. -I$(top_srcdir) $(NEWLINE) $(LINK_SIZE) $(MATCH_LIMIT) $(NO_RECURSE) $(EBCDIC)
@ON_WINDOWS@LINK = $(CC) $(CFLAGS) -I. -I$(top_srcdir) -L.libs
@NOT_ON_WINDOWS@LINK = $(LIBTOOL) --mode=link $(CC) $(CFLAGS) -I. -I$(top_srcdir)
LINKLIB = $(LIBTOOL) --mode=link $(CC) $(CFLAGS) -I. -I$(top_srcdir)
LINK_FOR_BUILD = $(LIBTOOL) --mode=link $(CC_FOR_BUILD) $(CFLAGS_FOR_BUILD) -I. -I$(top_srcdir)
# These are the version numbers for the shared libraries
PCRELIBVERSION = @PCRE_LIB_VERSION@
PCREPOSIXLIBVERSION = @PCRE_POSIXLIB_VERSION@
##############################################################################
OBJ = maketables.@OBJEXT@ get.@OBJEXT@ study.@OBJEXT@ pcre.@OBJEXT@ @POSIX_OBJ@
LOBJ = maketables.lo get.lo study.lo pcre.lo @POSIX_LOBJ@
all: libpcre.la @POSIX_LIB@ pcretest@EXEEXT@ pcregrep@EXEEXT@ @ON_WINDOWS@ winshared
pcregrep@EXEEXT@: libpcre.la pcregrep.@OBJEXT@ @ON_WINDOWS@ winshared
$(LINK) -o pcregrep@EXEEXT@ pcregrep.@OBJEXT@ libpcre.la
pcretest@EXEEXT@: libpcre.la @POSIX_LIB@ pcretest.@OBJEXT@ @ON_WINDOWS@ winshared
$(LINK) $(PURIFY) $(EFENCE) -o pcretest@EXEEXT@ pcretest.@OBJEXT@ \
libpcre.la @POSIX_LIB@
libpcre.la: $(OBJ)
-rm -f libpcre.la
$(LINKLIB) -rpath $(LIBDIR) -version-info \
'$(PCRELIBVERSION)' -o libpcre.la $(LOBJ)
libpcreposix.la: libpcre.la pcreposix.@OBJEXT@
-rm -f libpcreposix.la
$(LINKLIB) -rpath $(LIBDIR) libpcre.la -version-info \
'$(PCREPOSIXLIBVERSION)' -o libpcreposix.la pcreposix.lo
pcre.@OBJEXT@: $(top_srcdir)/chartables.c $(top_srcdir)/pcre.c \
$(top_srcdir)/internal.h $(top_srcdir)/printint.c \
pcre.h config.h Makefile
$(LTCOMPILE) $(UTF8) $(POSIX_MALLOC_THRESHOLD) $(top_srcdir)/pcre.c
pcreposix.@OBJEXT@: $(top_srcdir)/pcreposix.c $(top_srcdir)/pcreposix.h \
$(top_srcdir)/internal.h pcre.h config.h Makefile
$(LTCOMPILE) $(POSIX_MALLOC_THRESHOLD) $(top_srcdir)/pcreposix.c
maketables.@OBJEXT@: $(top_srcdir)/maketables.c $(top_srcdir)/internal.h \
pcre.h config.h Makefile
$(LTCOMPILE) $(top_srcdir)/maketables.c
get.@OBJEXT@: $(top_srcdir)/get.c $(top_srcdir)/internal.h \
pcre.h config.h Makefile
$(LTCOMPILE) $(top_srcdir)/get.c
study.@OBJEXT@: $(top_srcdir)/study.c $(top_srcdir)/internal.h \
pcre.h config.h Makefile
$(LTCOMPILE) $(UTF8) $(top_srcdir)/study.c
pcretest.@OBJEXT@: $(top_srcdir)/pcretest.c $(top_srcdir)/internal.h \
$(top_srcdir)/printint.c \
pcre.h config.h Makefile
$(CC) -c $(CFLAGS) -I. $(UTF8) $(LINK_SIZE) $(top_srcdir)/pcretest.c
pcregrep.@OBJEXT@: $(top_srcdir)/pcregrep.c pcre.h Makefile config.h
$(CC) -c $(CFLAGS) -I. $(UTF8) $(PCREGREP_OSTYPE) $(top_srcdir)/pcregrep.c
# Some Windows-specific targets for MinGW. Do not use for Cygwin.
winshared : .libs/@WIN_PREFIX@pcre.dll .libs/@WIN_PREFIX@pcreposix.dll
.libs/@WIN_PREFIX@pcre.dll : libpcre.la
$(CC) $(CFLAGS) -shared -o $@ \
-Wl,--whole-archive .libs/libpcre.a \
-Wl,--out-implib,.libs/libpcre.dll.a \
-Wl,--output-def,.libs/@WIN_PREFIX@pcre.dll-def \
-Wl,--export-all-symbols \
-Wl,--no-whole-archive
sed -e "s#dlname=''#dlname='../bin/@WIN_PREFIX@pcre.dll'#" \
-e "s#library_names=''#library_names='libpcre.dll.a'#" \
< .libs/libpcre.lai > .libs/libpcre.lai.tmp && \
mv .libs/libpcre.lai.tmp .libs/libpcre.lai
sed -e "s#dlname=''#dlname='../bin/@WIN_PREFIX@pcre.dll'#" \
-e "s#library_names=''#library_names='libpcre.dll.a'#" \
< libpcre.la > libpcre.la.tmp && \
mv libpcre.la.tmp libpcre.la
.libs/@WIN_PREFIX@pcreposix.dll: libpcreposix.la libpcre.la
$(CC) $(CFLAGS) -shared -o $@ \
-Wl,--whole-archive .libs/libpcreposix.a \
-Wl,--out-implib,.libs/@WIN_PREFIX@pcreposix.dll.a \
-Wl,--output-def,.libs/@WIN_PREFIX@libpcreposix.dll-def \
-Wl,--export-all-symbols \
-Wl,--no-whole-archive .libs/libpcre.a
sed -e "s#dlname=''#dlname='../bin/@WIN_PREFIX@pcreposix.dll'#" \
-e "s#library_names=''#library_names='libpcreposix.dll.a'#"\
< .libs/libpcreposix.lai > .libs/libpcreposix.lai.tmp && \
mv .libs/libpcreposix.lai.tmp .libs/libpcreposix.lai
sed -e "s#dlname=''#dlname='../bin/@WIN_PREFIX@pcreposix.dll'#" \
-e "s#library_names=''#library_names='libpcreposix.dll.a'#"\
< libpcreposix.la > libpcreposix.la.tmp && \
mv libpcreposix.la.tmp libpcreposix.la
wininstall : winshared
$(mkinstalldirs) $(DESTDIR)$(LIBDIR)
$(mkinstalldirs) $(DESTDIR)$(BINDIR)
$(INSTALL) .libs/@WIN_PREFIX@pcre.dll $(DESTDIR)$(BINDIR)/@WIN_PREFIX@pcre.dll
$(INSTALL) .libs/@WIN_PREFIX@pcreposix.dll $(DESTDIR)$(BINDIR)/@WIN_PREFIX@pcreposix.dll
$(INSTALL) .libs/@WIN_PREFIX@libpcreposix.dll.a $(DESTDIR)$(LIBDIR)/@WIN_PREFIX@libpcreposix.dll.a
$(INSTALL) .libs/@WIN_PREFIX@libpcre.dll.a $(DESTDIR)$(LIBDIR)/@WIN_PREFIX@libpcre.dll.a
-strip -g $(DESTDIR)$(BINDIR)/@WIN_PREFIX@pcre.dll
-strip -g $(DESTDIR)$(BINDIR)/@WIN_PREFIX@pcreposix.dll
-strip $(DESTDIR)$(BINDIR)/pcregrep@EXEEXT@
-strip $(DESTDIR)$(BINDIR)/pcretest@EXEEXT@
# An auxiliary program makes the default character table source
$(top_srcdir)/chartables.c: dftables
./dftables $(top_srcdir)/chartables.c
dftables.@BUILD_OBJEXT@: $(top_srcdir)/dftables.c $(top_srcdir)/maketables.c \
$(top_srcdir)/internal.h pcre.h config.h Makefile
$(CC_FOR_BUILD) -c $(CFLAGS_FOR_BUILD) -I. $(top_srcdir)/dftables.c
dftables: dftables.@BUILD_OBJEXT@
$(LINK_FOR_BUILD) -o dftables dftables.@BUILD_OBJEXT@
install: all @ON_WINDOWS@ wininstall
@NOT_ON_WINDOWS@ $(mkinstalldirs) $(DESTDIR)$(LIBDIR)
@NOT_ON_WINDOWS@ echo "$(LIBTOOL) --mode=install $(INSTALL) libpcre.la $(DESTDIR)$(LIBDIR)/libpcre.la"
@NOT_ON_WINDOWS@ $(LIBTOOL) --mode=install $(INSTALL) libpcre.la $(DESTDIR)$(LIBDIR)/libpcre.la
@NOT_ON_WINDOWS@ echo "$(LIBTOOL) --mode=install $(INSTALL) libpcreposix.la $(DESTDIR)$(LIBDIR)/libpcreposix.la"
@NOT_ON_WINDOWS@ $(LIBTOOL) --mode=install $(INSTALL) libpcreposix.la $(DESTDIR)$(LIBDIR)/libpcreposix.la
@NOT_ON_WINDOWS@ $(LIBTOOL) --finish $(DESTDIR)$(LIBDIR)
$(mkinstalldirs) $(DESTDIR)$(INCDIR)
$(INSTALL_DATA) pcre.h $(DESTDIR)$(INCDIR)/pcre.h
$(INSTALL_DATA) $(top_srcdir)/pcreposix.h $(DESTDIR)$(INCDIR)/pcreposix.h
$(mkinstalldirs) $(DESTDIR)$(MANDIR)/man3
$(INSTALL_DATA) $(top_srcdir)/doc/pcre.3 $(DESTDIR)$(MANDIR)/man3/pcre.3
$(INSTALL_DATA) $(top_srcdir)/doc/pcreapi.3 $(DESTDIR)$(MANDIR)/man3/pcreapi.3
$(INSTALL_DATA) $(top_srcdir)/doc/pcrebuild.3 $(DESTDIR)$(MANDIR)/man3/pcrebuild.3
$(INSTALL_DATA) $(top_srcdir)/doc/pcrecallout.3 $(DESTDIR)$(MANDIR)/man3/pcrecallout.3
$(INSTALL_DATA) $(top_srcdir)/doc/pcrecompat.3 $(DESTDIR)$(MANDIR)/man3/pcrecompat.3
$(INSTALL_DATA) $(top_srcdir)/doc/pcrepattern.3 $(DESTDIR)$(MANDIR)/man3/pcrepattern.3
$(INSTALL_DATA) $(top_srcdir)/doc/pcreperform.3 $(DESTDIR)$(MANDIR)/man3/pcreperform.3
$(INSTALL_DATA) $(top_srcdir)/doc/pcreposix.3 $(DESTDIR)$(MANDIR)/man3/pcreposix.3
$(INSTALL_DATA) $(top_srcdir)/doc/pcresample.3 $(DESTDIR)$(MANDIR)/man3/pcresample.3
$(INSTALL_DATA) $(top_srcdir)/doc/pcre_compile.3 $(DESTDIR)$(MANDIR)/man3/pcre_compile.3
$(INSTALL_DATA) $(top_srcdir)/doc/pcre_config.3 $(DESTDIR)$(MANDIR)/man3/pcre_config.3
$(INSTALL_DATA) $(top_srcdir)/doc/pcre_copy_named_substring.3 $(DESTDIR)$(MANDIR)/man3/pcre_copy_named_substring.3
$(INSTALL_DATA) $(top_srcdir)/doc/pcre_copy_substring.3 $(DESTDIR)$(MANDIR)/man3/pcre_copy_substring.3
$(INSTALL_DATA) $(top_srcdir)/doc/pcre_exec.3 $(DESTDIR)$(MANDIR)/man3/pcre_exec.3
$(INSTALL_DATA) $(top_srcdir)/doc/pcre_free_substring.3 $(DESTDIR)$(MANDIR)/man3/pcre_free_substring.3
$(INSTALL_DATA) $(top_srcdir)/doc/pcre_free_substring_list.3 $(DESTDIR)$(MANDIR)/man3/pcre_free_substring_list.3
$(INSTALL_DATA) $(top_srcdir)/doc/pcre_fullinfo.3 $(DESTDIR)$(MANDIR)/man3/pcre_fullinfo.3
$(INSTALL_DATA) $(top_srcdir)/doc/pcre_get_named_substring.3 $(DESTDIR)$(MANDIR)/man3/pcre_get_named_substring.3
$(INSTALL_DATA) $(top_srcdir)/doc/pcre_get_stringnumber.3 $(DESTDIR)$(MANDIR)/man3/pcre_get_stringnumber.3
$(INSTALL_DATA) $(top_srcdir)/doc/pcre_get_substring.3 $(DESTDIR)$(MANDIR)/man3/pcre_get_substring.3
$(INSTALL_DATA) $(top_srcdir)/doc/pcre_get_substring_list.3 $(DESTDIR)$(MANDIR)/man3/pcre_get_substring_list.3
$(INSTALL_DATA) $(top_srcdir)/doc/pcre_info.3 $(DESTDIR)$(MANDIR)/man3/pcre_info.3
$(INSTALL_DATA) $(top_srcdir)/doc/pcre_maketables.3 $(DESTDIR)$(MANDIR)/man3/pcre_maketables.3
$(INSTALL_DATA) $(top_srcdir)/doc/pcre_study.3 $(DESTDIR)$(MANDIR)/man3/pcre_study.3
$(INSTALL_DATA) $(top_srcdir)/doc/pcre_version.3 $(DESTDIR)$(MANDIR)/man3/pcre_version.3
$(mkinstalldirs) $(DESTDIR)$(MANDIR)/man1
$(INSTALL_DATA) $(top_srcdir)/doc/pcregrep.1 $(DESTDIR)$(MANDIR)/man1/pcregrep.1
$(INSTALL_DATA) $(top_srcdir)/doc/pcretest.1 $(DESTDIR)$(MANDIR)/man1/pcretest.1
$(mkinstalldirs) $(DESTDIR)$(BINDIR)
$(LIBTOOL) --mode=install $(INSTALL) pcregrep@EXEEXT@ $(DESTDIR)$(BINDIR)/pcregrep@EXEEXT@
$(LIBTOOL) --mode=install $(INSTALL) pcretest@EXEEXT@ $(DESTDIR)$(BINDIR)/pcretest@EXEEXT@
$(INSTALL) pcre-config $(DESTDIR)$(BINDIR)/pcre-config
# We deliberately omit dftables and chartables.c from 'make clean'; once made
# chartables.c shouldn't change, and if people have edited the tables by hand,
# you don't want to throw them away.
clean:; -rm -rf *.@OBJEXT@ *.lo *.a *.la .libs pcretest@EXEEXT@ pcregrep@EXEEXT@ testtry
# But "make distclean" should get back to a virgin distribution
distclean: clean
-rm -f chartables.c libtool pcre-config pcre.h \
Makefile config.h config.status config.log config.cache
check: runtest
@WIN_PREFIX@pcre.dll : winshared
cp .libs/@WIN_PREFIX@pcre.dll .
test: runtest
runtest: all @ON_WINDOWS@ @WIN_PREFIX@pcre.dll
@./RunTest
# End

@ -1,154 +0,0 @@
News about PCRE releases
------------------------
Release 4.5 01-Dec-03
---------------------
Again mainly a bug-fix and tidying release, with only a couple of new features:
1. It's possible now to compile PCRE so that it does not use recursive
function calls when matching. Instead it gets memory from the heap. This slows
things down, but may be necessary on systems with limited stacks.
2. UTF-8 string checking has been tightened to reject overlong sequences and to
check that a starting offset points to the start of a character. Failure of the
latter returns a new error code: PCRE_ERROR_BADUTF8_OFFSET.
3. PCRE can now be compiled for systems that use EBCDIC code.
Release 4.4 21-Aug-03
---------------------
This is mainly a bug-fix and tidying release. The only new feature is that PCRE
checks UTF-8 strings for validity by default. There is an option to suppress
this, just in case anybody wants that teeny extra bit of performance.
Releases 4.1 - 4.3
------------------
Sorry, I forgot about updating the NEWS file for these releases. Please take a
look at ChangeLog.
Release 4.0 17-Feb-03
---------------------
There have been a lot of changes for the 4.0 release, adding additional
functionality and mending bugs. Below is a list of the highlights of the new
functionality. For full details of these features, please consult the
documentation. For a complete list of changes, see the ChangeLog file.
1. Support for Perl's \Q...\E escapes.
2. "Possessive quantifiers" ?+, *+, ++, and {,}+ which come from Sun's Java
package. They provide some syntactic sugar for simple cases of "atomic
grouping".
3. Support for the \G assertion. It is true when the current matching position
is at the start point of the match.
4. A new feature that provides some of the functionality that Perl provides
with (?{...}). The facility is termed a "callout". The way it is done in PCRE
is for the caller to provide an optional function, by setting pcre_callout to
its entry point. To get the function called, the regex must include (?C) at
appropriate points.
5. Support for recursive calls to individual subpatterns. This makes it really
easy to get totally confused.
6. Support for named subpatterns. The Python syntax (?P<name>...) is used to
name a group.
7. Several extensions to UTF-8 support; it is now fairly complete. There is an
option for pcregrep to make it operate in UTF-8 mode.
8. The single man page has been split into a number of separate man pages.
These also give rise to individual HTML pages which are put in a separate
directory. There is an index.html page that lists them all. Some hyperlinking
between the pages has been installed.
Release 3.5 15-Aug-01
---------------------
1. The configuring system has been upgraded to use later versions of autoconf
and libtool. By default it builds both a shared and a static library if the OS
supports it. You can use --disable-shared or --disable-static on the configure
command if you want only one of them.
2. The pcretest utility is now installed along with pcregrep because it is
useful for users (to test regexs) and by doing this, it automatically gets
relinked by libtool. The documentation has been turned into a man page, so
there are now .1, .txt, and .html versions in /doc.
3. Upgrades to pcregrep:
(i) Added long-form option names like gnu grep.
(ii) Added --help to list all options with an explanatory phrase.
(iii) Added -r, --recursive to recurse into sub-directories.
(iv) Added -f, --file to read patterns from a file.
4. Added --enable-newline-is-cr and --enable-newline-is-lf to the configure
script, to force use of CR or LF instead of \n in the source. On non-Unix
systems, the value can be set in config.h.
5. The limit of 200 on non-capturing parentheses is a _nesting_ limit, not an
absolute limit. Changed the text of the error message to make this clear, and
likewise updated the man page.
6. The limit of 99 on the number of capturing subpatterns has been removed.
The new limit is 65535, which I hope will not be a "real" limit.
Release 3.3 01-Aug-00
---------------------
There is some support for UTF-8 character strings. This is incomplete and
experimental. The documentation describes what is and what is not implemented.
Otherwise, this is just a bug-fixing release.
Release 3.0 01-Feb-00
---------------------
1. A "configure" script is now used to configure PCRE for Unix systems. It
builds a Makefile, a config.h file, and the pcre-config script.
2. PCRE is built as a shared library by default.
3. There is support for POSIX classes such as [:alpha:].
4. There is an experimental recursion feature.
----------------------------------------------------------------------------
IMPORTANT FOR THOSE UPGRADING FROM VERSIONS BEFORE 2.00
Please note that there has been a change in the API such that a larger
ovector is required at matching time, to provide some additional workspace.
The new man page has details. This change was necessary in order to support
some of the new functionality in Perl 5.005.
IMPORTANT FOR THOSE UPGRADING FROM VERSION 2.00
Another (I hope this is the last!) change has been made to the API for the
pcre_compile() function. An additional argument has been added to make it
possible to pass over a pointer to character tables built in the current
locale by pcre_maketables(). To use the default tables, this new argument
should be passed as NULL.
IMPORTANT FOR THOSE UPGRADING FROM VERSION 2.05
Yet another (and again I hope this really is the last) change has been made
to the API for the pcre_exec() function. An additional argument has been
added to make it possible to start the match other than at the start of the
subject string. This is important if there are lookbehinds. The new man
page has the details, but if you just want to convert existing programs, all
you need to do is to stick in a new fifth argument to pcre_exec(), with a
value of zero. For example, change
  pcre_exec(pattern, extra, subject, length, options, ovec, ovecsize)
to
  pcre_exec(pattern, extra, subject, length, 0, options, ovec, ovecsize)
****

@ -1,122 +0,0 @@
Compiling PCRE on non-Unix systems
----------------------------------
See below for comments on Cygwin or MinGW usage. I (Philip Hazel) have no
knowledge of Windows systems and how their libraries work. The items in the
PCRE Makefile that relate to anything other than Unix-like systems have been
contributed by PCRE users. There are some other comments and files in the
Contrib directory on the ftp site that you may find useful.
The following are generic comments about building PCRE:
If you want to compile PCRE for a non-Unix system (or perhaps, more strictly,
for a system that does not support "configure" and make files), note that PCRE
consists entirely of code written in Standard C, and so should compile
successfully on any machine with a Standard C compiler and library, using
normal compiling commands to do the following:
(1) Copy or rename the file config.in as config.h, and change the macros that
define HAVE_STRERROR and HAVE_MEMMOVE to define them as 1 rather than 0.
Unfortunately, because of the way Unix autoconf works, the default setting has
to be 0. You may also want to make changes to other macros in config.h. In
particular, if you want to force a specific value for newline, you can define
the NEWLINE macro. The default is to use '\n', thereby using whatever value
your compiler gives to '\n'.
(2) Copy or rename the file pcre.in as pcre.h, and change the macro definitions
for PCRE_MAJOR, PCRE_MINOR, and PCRE_DATE near its start to the values set in
configure.in.
(3) Compile dftables.c as a stand-alone program, and then run it with
the single argument "chartables.c". This generates a set of standard
character tables and writes them to that file.
(4) Compile maketables.c, get.c, study.c and pcre.c and link them all
together into an object library in whichever form your system keeps such
libraries. This is the pcre library (chartables.c is included by means of an
#include directive). If your system has static and shared libraries, you may
have to do this once for each type.
(5) Similarly, compile pcreposix.c and link it (on its own) as the pcreposix
library.
(6) Compile the test program pcretest.c. This needs the functions in the
pcre and pcreposix libraries when linking.
(7) Run pcretest on the testinput files in the testdata directory, and check
that the output matches the corresponding testoutput files. You must use the
-i option when checking testinput2. Note that the supplied files are in Unix
format, with just LF characters as line terminators. You may need to edit them
to change this if your system uses a different convention.
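
For step (7), on a system that expects CR LF line terminators, the conversion
can be scripted rather than done by hand. The following is a minimal sketch
that assumes a POSIX shell and awk are available on the machine doing the
conversion and that the files contain no NUL bytes. The function name to_crlf
is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: rewrite LF-terminated text with CR LF terminators.
# Reads the named files (or standard input) and writes to standard output.
# A CR already present at the end of a line is dropped first, so running
# the conversion twice does not double the terminators.
# LC_ALL=C makes awk treat the data as plain bytes.
to_crlf() {
    LC_ALL=C awk '{ sub(/\r$/, ""); printf "%s\r\n", $0 }' "$@"
}
```

A converted copy of a test file could then be produced with, for example,
"to_crlf testdata/testinput1 > testinput1.crlf" (the output name is
illustrative); the matching testoutput file needs the same treatment before
comparing.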
If you have a system without "configure" but where you can use a Makefile, edit
Makefile.in to create Makefile, substituting suitable values for the variables
at the head of the file.
Some help in building a Win32 DLL of PCRE in GnuWin32 environments was
contributed by Paul Sokolovsky. These environments are Mingw32
(http://www.xraylith.wisc.edu/~khan/software/gnu-win32/) and CygWin
(http://sourceware.cygnus.com/cygwin/). Paul comments:
For CygWin, set CFLAGS=-mno-cygwin, and do 'make dll'. You'll get
pcre.dll (containing pcreposix also), libpcre.dll.a, and dynamically
linked pgrep and pcretest. If you have /bin/sh, run RunTest (three
main test go ok, locale not supported).
Changes to do MinGW with autoconf 2.50 were supplied by Fred Cox
<sailorFred@yahoo.com>, who comments as follows:
If you are using the PCRE DLL, the normal Unix style configure && make &&
make check && make install should just work[*]. If you want to statically
link against the .a file, you must define PCRE_STATIC before including
pcre.h, otherwise the pcre_malloc and pcre_free exported functions will be
declared __declspec(dllimport), with hilarious results. See the configure.in
and pcretest.c for how it is done for the static test.
Also, there will only be a libpcre.la, not a libpcreposix.la, as you
would expect from the Unix version. The single DLL includes the pcreposix
interface.
[*] But note that the supplied test files are in Unix format, with just LF
characters as line terminators. You will have to edit them to change to CR LF
terminators.
A script for building PCRE using Borland's C++ compiler for use with VPASCAL
was contributed by Alexander Tokarev. It is called makevp.bat.
These are some further comments about Win32 builds from Mark Evans. They
were contributed before Fred Cox's changes were made, so it is possible that
they may no longer be relevant.
"The documentation for Win32 builds is a bit shy. Under MSVC6 I
followed their instructions to the letter, but there were still
some things missing.
(1) Must #define STATIC for entire project if linking statically.
(I see no reason to use DLLs for code this compact.) This of
course is a project setting in MSVC under Preprocessor.
(2) Missing some #ifdefs relating to the function pointers
pcre_malloc and pcre_free. See my solution below. (The stubs
may not be mandatory but they made me feel better.)"
=========================
#ifdef _WIN32
#include <malloc.h>
void* malloc_stub(size_t N)
{ return malloc(N); }
void free_stub(void* p)
{ free(p); }
void *(*pcre_malloc)(size_t) = &malloc_stub;
void (*pcre_free)(void *) = &free_stub;
#else
void *(*pcre_malloc)(size_t) = malloc;
void (*pcre_free)(void *) = free;
#endif
=========================
****

@ -1,139 +0,0 @@
#! /bin/sh
# This file is generated by configure from RunTest.in. Make any changes
# to that file.
# Run PCRE tests
cf=diff
testdata=@top_srcdir@/testdata
# Select which tests to run; if no selection, run all
do1=no
do2=no
do3=no
do4=no
do5=no
while [ $# -gt 0 ] ; do
case $1 in
1) do1=yes;;
2) do2=yes;;
3) do3=yes;;
4) do4=yes;;
5) do5=yes;;
*) echo "Unknown test number $1"; exit 1;;
esac
shift
done
if [ "@UTF8@" = "" ] ; then
if [ $do4 = yes ] ; then
echo "Can't run test 4 because UTF8 support is not configured"
exit 1
fi
if [ $do5 = yes ] ; then
echo "Can't run test 5 because UTF8 support is not configured"
exit 1
fi
fi
if [ $do1 = no -a $do2 = no -a $do3 = no -a $do4 = no -a \
$do5 = no ] ; then
do1=yes
do2=yes
do3=yes
if [ "@UTF8@" != "" ] ; then do4=yes; fi
if [ "@UTF8@" != "" ] ; then do5=yes; fi
fi
# Show which release
./pcretest /dev/null
# Primary test, Perl-compatible
if [ $do1 = yes ] ; then
echo "Testing main functionality (Perl compatible)"
./pcretest $testdata/testinput1 testtry
if [ $? = 0 ] ; then
$cf testtry $testdata/testoutput1
if [ $? != 0 ] ; then exit 1; fi
echo " "
else exit 1
fi
fi
# PCRE tests that are not Perl-compatible - API & error tests, mostly
if [ $do2 = yes ] ; then
echo "Testing API and error handling (not Perl compatible)"
./pcretest -i $testdata/testinput2 testtry
if [ $? = 0 ] ; then
$cf testtry $testdata/testoutput2
if [ $? != 0 ] ; then exit 1; fi
else exit 1
fi
fi
if [ $do1 = yes -a $do2 = yes ] ; then
echo " "
echo "The two main tests ran OK"
echo " "
fi
# Locale-specific tests, provided the "fr_FR" locale is available
if [ $do3 = yes ] ; then
locale -a | grep '^fr_FR$' >/dev/null
if [ $? -eq 0 ] ; then
echo "Testing locale-specific features (using 'fr_FR' locale)"
./pcretest $testdata/testinput3 testtry
if [ $? = 0 ] ; then
$cf testtry $testdata/testoutput3
if [ $? != 0 ] ; then
echo " "
echo "Locale test did not run entirely successfully."
echo "This usually means that there is a problem with the locale"
echo "settings rather than a bug in PCRE."
else
echo "Locale test ran OK"
fi
echo " "
else exit 1
fi
else
echo "Cannot test locale-specific features - 'fr_FR' locale not found,"
echo "or the \"locale\" command is not available to check for it."
echo " "
fi
fi
# Additional tests for UTF8 support
if [ $do4 = yes ] ; then
echo "Testing UTF-8 support (Perl compatible)"
./pcretest $testdata/testinput4 testtry
if [ $? = 0 ] ; then
$cf testtry $testdata/testoutput4
if [ $? != 0 ] ; then exit 1; fi
else exit 1
fi
echo "UTF8 test ran OK"
echo " "
fi
if [ $do5 = yes ] ; then
echo "Testing API and internals for UTF-8 support (not Perl compatible)"
./pcretest $testdata/testinput5 testtry
if [ $? = 0 ] ; then
$cf testtry $testdata/testoutput5
if [ $? != 0 ] ; then exit 1; fi
else exit 1
fi
echo "UTF8 internals test ran OK"
echo " "
fi
# End


View File

@ -1,107 +0,0 @@
/* On Unix systems config.in is converted by configure into config.h. PCRE is
written in Standard C, but there are a few non-standard things it can cope
with, allowing it to run on SunOS4 and other "close to standard" systems.
On a non-Unix system you should just copy this file into config.h, and set up
the macros the way you need them. You should normally change the definitions of
HAVE_STRERROR and HAVE_MEMMOVE to 1. Unfortunately, because of the way autoconf
works, these cannot be made the defaults. If your system has bcopy() and not
memmove(), change the definition of HAVE_BCOPY instead of HAVE_MEMMOVE. If your
system has neither bcopy() nor memmove(), leave them both as 0; an emulation
function will be used. */
/* If you are compiling for a system that uses EBCDIC instead of ASCII
character codes, define this macro as 1. On systems that can use "configure",
this can be done via --enable-ebcdic. */
#ifndef EBCDIC
#define EBCDIC 0
#endif
/* If you are compiling for a system that needs some magic to be inserted
before the definition of an exported function, define this macro to contain the
relevant magic. It appears at the start of every exported function. */
#define EXPORT
/* Define to empty if the "const" keyword does not work. */
#undef const
/* Define to "unsigned" if <stddef.h> doesn't define size_t. */
#undef size_t
/* The following two definitions are mainly for the benefit of SunOS4, which
doesn't have the strerror() or memmove() functions that should be present in
all Standard C libraries. The macros HAVE_STRERROR and HAVE_MEMMOVE should
normally be defined with the value 1 for other systems, but unfortunately we
can't make this the default because "configure" files generated by autoconf
will only change 0 to 1; they won't change 1 to 0 if the functions are not
found. */
#define HAVE_STRERROR 0
#define HAVE_MEMMOVE 0
/* There are some non-Unix systems that don't even have bcopy(). If this macro
is false, an emulation is used. If HAVE_MEMMOVE is set to 1, the value of
HAVE_BCOPY is not relevant. */
#define HAVE_BCOPY 0
/* The value of NEWLINE determines the newline character. The default is to
leave it up to the compiler, but some sites want to force a particular value.
On Unix systems, "configure" can be used to override this default. */
#ifndef NEWLINE
#define NEWLINE '\n'
#endif
/* The value of LINK_SIZE determines the number of bytes used to store
links as offsets within the compiled regex. The default is 2, which allows for
compiled patterns up to 64K long. This covers the vast majority of cases.
However, PCRE can also be compiled to use 3 or 4 bytes instead. This allows for
longer patterns in extreme cases. On Unix systems, "configure" can be used to
override this default. */
#ifndef LINK_SIZE
#define LINK_SIZE 2
#endif
/* The value of MATCH_LIMIT determines the default number of times the match()
function can be called during a single execution of pcre_exec(). (There is a
runtime method of setting a different limit.) The limit exists in order to
catch runaway regular expressions that take for ever to determine that they do
not match. The default is set very large so that it does not accidentally catch
legitimate cases. On Unix systems, "configure" can be used to override this
default. */
#ifndef MATCH_LIMIT
#define MATCH_LIMIT 10000000
#endif
/* When calling PCRE via the POSIX interface, additional working storage is
required for holding the pointers to capturing substrings because PCRE requires
three integers per substring, whereas the POSIX interface provides only two. If
the number of expected substrings is small, the wrapper function uses space on
the stack, because this is faster than using malloc() for each call. The
threshold above which the stack is no longer used is defined by
POSIX_MALLOC_THRESHOLD. On Unix systems, "configure" can be used to override
this default. */
#ifndef POSIX_MALLOC_THRESHOLD
#define POSIX_MALLOC_THRESHOLD 10
#endif
/* PCRE uses recursive function calls to handle backtracking while matching.
This can sometimes be a problem on systems that have stacks of limited size.
Define NO_RECURSE to get a version that doesn't use recursion in the match()
function; instead it creates its own stack by steam using pcre_recurse_malloc
to get memory. For more detail, see comments and other stuff just above the
match() function. On Unix systems, "configure" can be used to set this in the
Makefile (use --disable-stack-for-recursion). */
/* #define NO_RECURSE */
/* End */


View File

@ -1,201 +0,0 @@
dnl Process this file with autoconf to produce a configure script.
dnl This is required at the start; the name is the name of a file
dnl it should be seeing, to verify it is in the same directory.
AC_INIT(dftables.c)
dnl A safety precaution
AC_PREREQ(2.57)
dnl Arrange to build config.h from config.in. Note that pcre.h is
dnl built differently, as it is just a "substitution" file.
dnl Manual says this macro should come right after AC_INIT.
AC_CONFIG_HEADER(config.h:config.in)
dnl Provide the current PCRE version information. Do not use numbers
dnl with leading zeros for the minor version, as they end up in a C
dnl macro, and may be treated as octal constants. Stick to single
dnl digits for minor numbers less than 10. There are unlikely to be
dnl that many releases anyway.
PCRE_MAJOR=4
PCRE_MINOR=5
PCRE_DATE=01-December-2003
PCRE_VERSION=${PCRE_MAJOR}.${PCRE_MINOR}
dnl Default values for miscellaneous macros
POSIX_MALLOC_THRESHOLD=-DPOSIX_MALLOC_THRESHOLD=10
dnl Provide versioning information for libtool shared libraries that
dnl are built by default on Unix systems.
PCRE_LIB_VERSION=0:1:0
PCRE_POSIXLIB_VERSION=0:0:0
dnl Checks for programs.
AC_PROG_CC
AC_PROG_INSTALL
AC_LIBTOOL_WIN32_DLL
AC_PROG_LIBTOOL
dnl We need to find a compiler for compiling a program to run on the local host
dnl while building. It needs to be different from CC when cross-compiling.
dnl There is a macro called AC_PROG_CC_FOR_BUILD in the GNU archive for
dnl figuring this out automatically. Unfortunately, it does not work with the
dnl latest versions of autoconf. So for the moment, we just default to the
dnl same values as the "main" compiler. People who are cross-compiling will
dnl just have to adjust the Makefile by hand or set these values when they
dnl run "configure".
CC_FOR_BUILD=${CC_FOR_BUILD:-'$(CC)'}
CFLAGS_FOR_BUILD=${CFLAGS_FOR_BUILD:-'$(CFLAGS)'}
BUILD_EXEEXT=${BUILD_EXEEXT:-'$(EXEEXT)'}
BUILD_OBJEXT=${BUILD_OBJEXT:-'$(OBJEXT)'}
dnl Checks for header files.
AC_HEADER_STDC
AC_CHECK_HEADERS(limits.h)
dnl Checks for typedefs, structures, and compiler characteristics.
AC_C_CONST
AC_TYPE_SIZE_T
dnl Checks for library functions.
AC_CHECK_FUNCS(bcopy memmove strerror)
dnl Handle --enable-utf8
AC_ARG_ENABLE(utf8,
[ --enable-utf8 enable UTF8 support],
if test "$enableval" = "yes"; then
UTF8=-DSUPPORT_UTF8
fi
)
dnl Handle --enable-newline-is-cr
AC_ARG_ENABLE(newline-is-cr,
[ --enable-newline-is-cr use CR as the newline character],
if test "$enableval" = "yes"; then
NEWLINE=-DNEWLINE=13
fi
)
dnl Handle --enable-newline-is-lf
AC_ARG_ENABLE(newline-is-lf,
[ --enable-newline-is-lf use LF as the newline character],
if test "$enableval" = "yes"; then
NEWLINE=-DNEWLINE=10
fi
)
dnl Handle --enable-ebcdic
AC_ARG_ENABLE(ebcdic,
[ --enable-ebcdic assume EBCDIC coding rather than ASCII],
if test "$enableval" = "yes"; then
EBCDIC=-DEBCDIC=1
fi
)
dnl Handle --disable-stack-for-recursion
AC_ARG_ENABLE(stack-for-recursion,
[ --disable-stack-for-recursion disable use of stack recursion when matching],
if test "$enableval" = "no"; then
NO_RECURSE=-DNO_RECURSE
fi
)
dnl There doesn't seem to be a straightforward way of having parameters
dnl that set values, other than fudging the --with thing. So that's what
dnl I've done.
dnl Handle --with-posix-malloc-threshold=n
AC_ARG_WITH(posix-malloc-threshold,
[ --with-posix-malloc-threshold=5 threshold for POSIX malloc usage],
POSIX_MALLOC_THRESHOLD=-DPOSIX_MALLOC_THRESHOLD=$withval
)
dnl Handle --with-link-size=n
AC_ARG_WITH(link-size,
[ --with-link-size=2 internal link size (2, 3, or 4 allowed)],
LINK_SIZE=-DLINK_SIZE=$withval
)
dnl Handle --with-match-limit=n
AC_ARG_WITH(match-limit,
[ --with-match-limit=10000000 default limit on internal looping],
MATCH_LIMIT=-DMATCH_LIMIT=$withval
)
dnl Now arrange to build libtool
AC_PROG_LIBTOOL
dnl "Export" these variables
AC_SUBST(BUILD_EXEEXT)
AC_SUBST(BUILD_OBJEXT)
AC_SUBST(CC_FOR_BUILD)
AC_SUBST(CFLAGS_FOR_BUILD)
AC_SUBST(EBCDIC)
AC_SUBST(HAVE_MEMMOVE)
AC_SUBST(HAVE_STRERROR)
AC_SUBST(LINK_SIZE)
AC_SUBST(MATCH_LIMIT)
AC_SUBST(NEWLINE)
AC_SUBST(NO_RECURSE)
AC_SUBST(PCRE_MAJOR)
AC_SUBST(PCRE_MINOR)
AC_SUBST(PCRE_DATE)
AC_SUBST(PCRE_VERSION)
AC_SUBST(PCRE_LIB_VERSION)
AC_SUBST(PCRE_POSIXLIB_VERSION)
AC_SUBST(POSIX_MALLOC_THRESHOLD)
AC_SUBST(UTF8)
dnl Stuff to make MinGW work better. Special treatment is no longer
dnl needed for Cygwin.
case $host_os in
mingw* )
POSIX_OBJ=pcreposix.o
POSIX_LOBJ=pcreposix.lo
POSIX_LIB=
ON_WINDOWS=
NOT_ON_WINDOWS="#"
WIN_PREFIX=
;;
* )
ON_WINDOWS="#"
NOT_ON_WINDOWS=
POSIX_OBJ=
POSIX_LOBJ=
POSIX_LIB=libpcreposix.la
WIN_PREFIX=
;;
esac
AC_SUBST(WIN_PREFIX)
AC_SUBST(ON_WINDOWS)
AC_SUBST(NOT_ON_WINDOWS)
AC_SUBST(POSIX_OBJ)
AC_SUBST(POSIX_LOBJ)
AC_SUBST(POSIX_LIB)
if test "x$enable_shared" = "xno" ; then
AC_DEFINE([PCRE_STATIC],[1],[to link statically])
fi
dnl This must be last; it determines what files are written as well as config.h
AC_OUTPUT(Makefile pcre.h:pcre.in pcre-config:pcre-config.in RunTest:RunTest.in,[chmod a+x RunTest pcre-config])

View File

@ -1,281 +0,0 @@
Technical Notes about PCRE
--------------------------
Many years ago I implemented some regular expression functions to an algorithm
suggested by Martin Richards. These were not Unix-like in form, and were quite
restricted in what they could do by comparison with Perl. The interesting part
about the algorithm was that the amount of space required to hold the compiled
form of an expression was known in advance. The code to apply an expression did
not operate by backtracking, as the original Henry Spencer code and current
Perl code does, but instead checked all possibilities simultaneously by keeping
a list of current states and checking all of them as it advanced through the
subject string. (In the terminology of Jeffrey Friedl's book, it was a "DFA
algorithm".) When the pattern was all used up, all remaining states were
possible matches, and the one matching the longest subset of the subject string
was chosen. This did not necessarily maximize the individual wild portions of
the pattern, as is expected in Unix and Perl-style regular expressions.
By contrast, the code originally written by Henry Spencer and subsequently
heavily modified for Perl actually compiles the expression twice: once in a
dummy mode in order to find out how much store will be needed, and then for
real. The execution function operates by backtracking and maximizing (or,
optionally, minimizing in Perl) the amount of the subject that matches
individual wild portions of the pattern. This is an "NFA algorithm" in Friedl's
terminology.
For the set of functions that forms PCRE (which are unrelated to those
mentioned above), I tried at first to invent an algorithm that used an amount
of store bounded by a multiple of the number of characters in the pattern, to
save on compiling time. However, because of the greater complexity in Perl
regular expressions, I couldn't do this. In any case, a first pass through the
pattern is needed, for a number of reasons. PCRE works by running a very
degenerate first pass to calculate a maximum store size, and then a second pass
to do the real compile - which may use a bit less than the predicted amount of
store. The idea is that this is going to turn out faster because the first pass
is degenerate and the second pass can just store stuff straight into the
vector. It does make the compiling functions bigger, of course, but they have
got quite big anyway to handle all the Perl stuff.
The compiled form of a pattern is a vector of bytes, containing items of
variable length. The first byte in an item is an opcode, and the length of the
item is either implicit in the opcode or contained in the data bytes which
follow it. A list of all the opcodes follows:
Opcodes with no following data
------------------------------
These items are all just one byte long
OP_END end of pattern
OP_ANY match any character
OP_ANYBYTE match any single byte, even in UTF-8 mode
OP_SOD match start of data: \A
OP_SOM start of match (subject + offset): \G
OP_CIRC ^ (start of data, or after \n in multiline)
OP_NOT_WORD_BOUNDARY \B
OP_WORD_BOUNDARY \b
OP_NOT_DIGIT \D
OP_DIGIT \d
OP_NOT_WHITESPACE \S
OP_WHITESPACE \s
OP_NOT_WORDCHAR \W
OP_WORDCHAR \w
OP_EODN match end of data or \n at end: \Z
OP_EOD match end of data: \z
OP_DOLL $ (end of data, or before \n in multiline)
Repeating single characters
---------------------------
The common repeats (*, +, ?) when applied to a single character appear as
two-byte items using the following opcodes:
OP_STAR
OP_MINSTAR
OP_PLUS
OP_MINPLUS
OP_QUERY
OP_MINQUERY
Those with "MIN" in their name are the minimizing versions. Each is followed by
the character that is to be repeated. Other repeats make use of
OP_UPTO
OP_MINUPTO
OP_EXACT
which are followed by a two-byte count (most significant first) and the
repeated character. OP_UPTO matches from 0 to the given number. A repeat with a
non-zero minimum and a fixed maximum is coded as an OP_EXACT followed by an
OP_UPTO (or OP_MINUPTO).
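
As an illustration of the layout just described, here is a minimal C sketch that emits the items for a greedy single-character repeat such as x{2,5}. The opcode numbers, the function names put_repeat and encode_bounded_repeat, and the use of (max - min) as the OP_UPTO count are assumptions made for this example; they are not taken from the PCRE source.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcode values, for illustration only. The real numbers are
   defined inside PCRE and are not given in these notes. */
enum { OP_EXACT = 1, OP_UPTO = 2 };

/* Append one counted-repeat item: opcode, two-byte count (most significant
   byte first), then the repeated character. Returns the new write position. */
static unsigned char *put_repeat(unsigned char *p, int op, unsigned count,
                                 unsigned char ch)
{
    assert(count <= 0xFFFF);
    *p++ = (unsigned char)op;
    *p++ = (unsigned char)(count >> 8);
    *p++ = (unsigned char)(count & 0xFF);
    *p++ = ch;
    return p;
}

/* Encode ch{min,max} as OP_EXACT(min) followed by OP_UPTO(max - min).
   The caller must supply at least 8 bytes. Returns the bytes written. */
static size_t encode_bounded_repeat(unsigned char *buf, unsigned min,
                                    unsigned max, unsigned char ch)
{
    unsigned char *p = buf;
    assert(min <= max && max > 0);
    if (min > 0)
        p = put_repeat(p, OP_EXACT, min, ch);
    if (max > min)
        p = put_repeat(p, OP_UPTO, max - min, ch);
    return (size_t)(p - buf);
}
```

Under these assumptions, x{2,5} becomes the eight bytes OP_EXACT 0 2 'x' OP_UPTO 0 3 'x'.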
Repeating character types
-------------------------
Repeats of things like \d are done exactly as for single characters, except
that instead of a character, the opcode for the type is stored in the data
byte. The opcodes are:
OP_TYPESTAR
OP_TYPEMINSTAR
OP_TYPEPLUS
OP_TYPEMINPLUS
OP_TYPEQUERY
OP_TYPEMINQUERY
OP_TYPEUPTO
OP_TYPEMINUPTO
OP_TYPEEXACT
Matching a character string
---------------------------
The OP_CHARS opcode is followed by a one-byte count and then that number of
characters. If there are more than 255 characters in sequence, successive
instances of OP_CHARS are used.
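
The splitting of long literal strings can be sketched in C as follows. This is only an illustration of the one-byte count rule: the opcode value and the function name encode_chars are hypothetical, and the real compiler does a good deal more than this.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical opcode value, for illustration only. */
enum { OP_CHARS = 1 };

/* Emit a literal string as successive OP_CHARS items, each holding at most
   255 characters because the count occupies a single byte. The caller must
   supply room for len + 2 * ((len + 254) / 255) bytes. Returns the number
   of bytes written. */
static size_t encode_chars(unsigned char *out, const unsigned char *s,
                           size_t len)
{
    size_t written = 0;
    assert(out != NULL && (s != NULL || len == 0));
    while (len > 0) {
        size_t n = len > 255 ? 255 : len;
        out[written++] = OP_CHARS;
        out[written++] = (unsigned char)n;
        memcpy(out + written, s, n);
        written += n;
        s += n;
        len -= n;
    }
    return written;
}
```

A 300-character literal would therefore occupy two items, one holding 255 characters and one holding 45, for 304 bytes in total.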
Character classes
-----------------
If there is only one character, OP_CHARS is used for a positive class,
and OP_NOT for a negative one (that is, for something like [^a]). However, in
UTF-8 mode, this applies only to characters with values < 128, because OP_NOT
is confined to single bytes.
Another set of repeating opcodes (OP_NOTSTAR etc.) are used for a repeated,
negated, single-character class. The normal ones (OP_STAR etc.) are used for a
repeated positive single-character class.
When there's more than one character in a class and all the characters are less
than 256, OP_CLASS is used for a positive class, and OP_NCLASS for a negative
one. In either case, the opcode is followed by a 32-byte bit map containing a 1
bit for every character that is acceptable. The bits are counted from the least
significant end of each byte.
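
The bit numbering can be made concrete with a short C sketch. The type and function names here (class_map, class_clear, class_add, class_has) are invented for the example and do not appear in PCRE; only the 32-byte size and the least-significant-bit-first ordering come from the description above.

```c
#include <assert.h>
#include <string.h>

/* 32 bytes * 8 bits = one bit for each character value 0..255. */
typedef struct { unsigned char bits[32]; } class_map;

/* Start with an empty class (no character is acceptable). */
static void class_clear(class_map *m)
{
    memset(m->bits, 0, sizeof m->bits);
}

/* Mark character c as acceptable. Bit (c % 8) of byte (c / 8) is used,
   counting from the least significant end of the byte. */
static void class_add(class_map *m, unsigned c)
{
    assert(c < 256);
    m->bits[c / 8] |= (unsigned char)(1u << (c % 8));
}

/* Return 1 if c is in the map, 0 otherwise. Values of 256 or more can
   never be in the map, so they are rejected without touching it. */
static int class_has(const class_map *m, unsigned c)
{
    return c < 256 && ((m->bits[c / 8] >> (c % 8)) & 1);
}
```

For example, the character with value 97 ('a' in ASCII) is recorded in byte 12 as the bit with value 0x02.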
The reason for having both OP_CLASS and OP_NCLASS is so that, in UTF-8 mode,
subject characters with values greater than 255 can be handled correctly. For
OP_CLASS they don't match, whereas for OP_NCLASS they do.
For classes containing characters with values > 255, OP_XCLASS is used. It
optionally uses a bit map (if any characters lie within it), followed by a list
of pairs and single characters. There is a flag character that indicates
whether it's a positive or a negative class.
Back references
---------------
OP_REF is followed by two bytes containing the reference number.
Repeating character classes and back references
-----------------------------------------------
Single-character classes are handled specially (see above). This section applies to
OP_CLASS and OP_REF. In both cases, the repeat information follows the base
item. The matching code looks at the following opcode to see if it is one of
OP_CRSTAR
OP_CRMINSTAR
OP_CRPLUS
OP_CRMINPLUS
OP_CRQUERY
OP_CRMINQUERY
OP_CRRANGE
OP_CRMINRANGE
All but the last two are just single-byte items. The others are followed by
four bytes of data, comprising the minimum and maximum repeat counts.
Brackets and alternation
------------------------
A pair of non-capturing (round) brackets is wrapped round each expression at
compile time, so alternation always happens in the context of brackets.
Non-capturing brackets use the opcode OP_BRA, while capturing brackets use
OP_BRA+1, OP_BRA+2, etc. [Note for North Americans: "bracket" to some English
speakers, including myself, can be round, square, curly, or pointy. Hence this
usage.]
Originally PCRE was limited to 99 capturing brackets (so as not to use up all
the opcodes). From release 3.5, there is no limit. What happens is that the
first ones, up to EXTRACT_BASIC_MAX are handled with separate opcodes, as
above. If there are more, the opcode is set to EXTRACT_BASIC_MAX+1, and the
first operation in the bracket is OP_BRANUMBER, followed by a 2-byte bracket
number. This opcode is ignored while matching, but is fished out when handling
the bracket itself. (They could have all been done like this, but I was making
minimal changes.)
A bracket opcode is followed by two bytes which give the offset to the next
alternative OP_ALT or, if there aren't any branches, to the matching KET
opcode. Each OP_ALT is followed by two bytes giving the offset to the next one,
or to the KET opcode.
OP_KET is used for subpatterns that do not repeat indefinitely, while
OP_KETRMIN and OP_KETRMAX are used for indefinite repetitions, minimally or
maximally respectively. All three are followed by two bytes giving (as a
positive number) the offset back to the matching BRA opcode.
If a subpattern is quantified such that it is permitted to match zero times, it
is preceded by one of OP_BRAZERO or OP_BRAMINZERO. These are single-byte
opcodes which tell the matcher that skipping this subpattern entirely is a
valid branch.
A subpattern with an indefinite maximum repetition is replicated in the
compiled data its minimum number of times (or once with a BRAZERO if the
minimum is zero), with the final copy terminating with a KETRMIN or KETRMAX as
appropriate.
A subpattern with a bounded maximum repetition is replicated in a nested
fashion up to the maximum number of times, with BRAZERO or BRAMINZERO before
each replication after the minimum, so that, for example, (abc){2,5} is
compiled as (abc)(abc)((abc)((abc)(abc)?)?)?. The 99 and 200 bracket limits do
not apply to these internally generated brackets.
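
The nesting rule for a bounded maximum can be illustrated with a small C sketch that works on pattern text rather than on compiled bytes. It is not how PCRE does it (PCRE replicates the compiled subpattern and inserts BRAZERO or BRAMINZERO opcodes); the function names append and expand_bounded are hypothetical, and only the greedy form is shown.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Append s to out (capacity cap, current length *len), keeping it
   NUL-terminated. Returns 0 on success, -1 if it does not fit. */
static int append(char *out, size_t cap, size_t *len, const char *s)
{
    size_t n = strlen(s);
    if (*len + n + 1 > cap)
        return -1;
    memcpy(out + *len, s, n + 1);
    *len += n;
    return 0;
}

/* Write the expansion of group{min,max} into out, where group is the text
   of a bracketed subpattern such as "(abc)". The first min copies are
   mandatory; the remaining (max - min) copies are nested optional groups.
   Returns 0 on success, -1 if out is too small. */
static int expand_bounded(const char *group, unsigned min, unsigned max,
                          char *out, size_t cap)
{
    size_t len = 0;
    unsigned optional, i;

    assert(min <= max);
    if (cap == 0)
        return -1;
    out[0] = '\0';
    optional = max - min;

    for (i = 0; i < min; i++)
        if (append(out, cap, &len, group) != 0)
            return -1;

    for (i = 0; i < optional; i++) {
        if (i + 1 < optional) {
            /* Open a wrapper that will also hold the later copies. */
            if (append(out, cap, &len, "(") != 0 ||
                append(out, cap, &len, group) != 0)
                return -1;
        } else {
            /* Innermost copy: the group itself is optional. */
            if (append(out, cap, &len, group) != 0 ||
                append(out, cap, &len, "?") != 0)
                return -1;
        }
    }

    /* Close each wrapper and make it optional. */
    for (i = 1; i < optional; i++)
        if (append(out, cap, &len, ")?") != 0)
            return -1;

    return 0;
}
```

Calling expand_bounded("(abc)", 2, 5, buf, sizeof buf) with a sufficiently large buffer produces the string given above, (abc)(abc)((abc)((abc)(abc)?)?)?.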
Assertions
----------
Forward assertions are just like other subpatterns, but starting with one of
the opcodes OP_ASSERT or OP_ASSERT_NOT. Backward assertions use the opcodes
OP_ASSERTBACK and OP_ASSERTBACK_NOT, and the first opcode inside the assertion
is OP_REVERSE, followed by a two-byte count of the number of characters to move
back the pointer in the subject string. When operating in UTF-8 mode, the count
is a character count rather than a byte count. A separate count is present in
each alternative of a lookbehind assertion, allowing them to have different
fixed lengths.
Once-only subpatterns
---------------------
These are also just like other subpatterns, but they start with the opcode
OP_ONCE.
Conditional subpatterns
-----------------------
These are like other subpatterns, but they start with the opcode OP_COND. If
the condition is a back reference, this is stored at the start of the
subpattern using the opcode OP_CREF followed by two bytes containing the
reference number. If the condition is "in recursion" (coded as "(?(R)"), the
same scheme is used, with a "reference number" of 0xffff. Otherwise, a
conditional subpattern always starts with one of the assertions.
Recursion
---------
Recursion either matches the current regex, or some subexpression. The opcode
OP_RECURSE is followed by a value which is the offset to the starting bracket
from the start of the whole pattern.
Callout
-------
OP_CALLOUT is followed by one byte of data that holds a callout number in the
range 0 to 255.
Changing options
----------------
If any of the /i, /m, or /s options are changed within a pattern, an OP_OPT
opcode is compiled, followed by one byte containing the new settings of these
flags. If there are several alternatives, there is an occurrence of OP_OPT at
the start of all those following the first options change, to set appropriate
options for the start of the alternative. Immediately after the end of the
group there is another such item to reset the flags to their previous values. A
change of flag right at the very start of the pattern can be handled entirely
at compile time, and so does not cause anything to be put into the compiled
data.
Philip Hazel
August 2003

View File

@ -1,102 +0,0 @@
<html>
<head>
<title>PCRE specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
<h1>Perl-compatible Regular Expressions (PCRE)</h1>
<p>
The HTML documentation for PCRE comprises the following pages:
</p>
<table>
<tr><td><a href="pcre.html">pcre</a></td>
<td>&nbsp;&nbsp;Introductory page</td></tr>
<tr><td><a href="pcreapi.html">pcreapi</a></td>
<td>&nbsp;&nbsp;PCRE's native API</td></tr>
<tr><td><a href="pcrebuild.html">pcrebuild</a></td>
<td>&nbsp;&nbsp;Options for building PCRE</td></tr>
<tr><td><a href="pcrecallout.html">pcrecallout</a></td>
<td>&nbsp;&nbsp;The <i>callout</i> facility</td></tr>
<tr><td><a href="pcrecompat.html">pcrecompat</a></td>
<td>&nbsp;&nbsp;Compatibility with Perl</td></tr>
<tr><td><a href="pcregrep.html">pcregrep</a></td>
<td>&nbsp;&nbsp;The <b>pcregrep</b> command</td></tr>
<tr><td><a href="pcrepattern.html">pcrepattern</a></td>
<td>&nbsp;&nbsp;Regular expressions supported by PCRE</td></tr>
<tr><td><a href="pcreperform.html">pcreperform</a></td>
<td>&nbsp;&nbsp;Some comments on performance</td></tr>
<tr><td><a href="pcreposix.html">pcreposix</a></td>
<td>&nbsp;&nbsp;The POSIX API to the PCRE library</td></tr>
<tr><td><a href="pcresample.html">pcresample</a></td>
<td>&nbsp;&nbsp;Description of the sample program</td></tr>
<tr><td><a href="pcretest.html">pcretest</a></td>
<td>&nbsp;&nbsp;The <b>pcretest</b> command for testing PCRE</td></tr>
</table>
<p>
There are also individual pages that summarize the interface for each function
in the library:
</p>
<table>
<tr><td><a href="pcre_compile.html">pcre_compile</a></td>
<td>&nbsp;&nbsp;Compile a regular expression</td></tr>
<tr><td><a href="pcre_config.html">pcre_config</a></td>
<td>&nbsp;&nbsp;Show build-time configuration options</td></tr>
<tr><td><a href="pcre_copy_named_substring.html">pcre_copy_named_substring</a></td>
<td>&nbsp;&nbsp;Extract named substring into given buffer</td></tr>
<tr><td><a href="pcre_copy_substring.html">pcre_copy_substring</a></td>
<td>&nbsp;&nbsp;Extract numbered substring into given buffer</td></tr>
<tr><td><a href="pcre_exec.html">pcre_exec</a></td>
<td>&nbsp;&nbsp;Match a compiled pattern to a subject string</td></tr>
<tr><td><a href="pcre_free_substring.html">pcre_free_substring</a></td>
<td>&nbsp;&nbsp;Free extracted substring</td></tr>
<tr><td><a href="pcre_free_substring_list.html">pcre_free_substring_list</a></td>
<td>&nbsp;&nbsp;Free list of extracted substrings</td></tr>
<tr><td><a href="pcre_fullinfo.html">pcre_fullinfo</a></td>
<td>&nbsp;&nbsp;Extract information about a pattern</td></tr>
<tr><td><a href="pcre_get_named_substring.html">pcre_get_named_substring</a></td>
<td>&nbsp;&nbsp;Extract named substring into new memory</td></tr>
<tr><td><a href="pcre_get_stringnumber.html">pcre_get_stringnumber</a></td>
<td>&nbsp;&nbsp;Convert captured string name to number</td></tr>
<tr><td><a href="pcre_get_substring.html">pcre_get_substring</a></td>
<td>&nbsp;&nbsp;Extract numbered substring into new memory</td></tr>
<tr><td><a href="pcre_get_substring_list.html">pcre_get_substring_list</a></td>
<td>&nbsp;&nbsp;Extract all substrings into new memory</td></tr>
<tr><td><a href="pcre_info.html">pcre_info</a></td>
<td>&nbsp;&nbsp;Obsolete information extraction function</td></tr>
<tr><td><a href="pcre_maketables.html">pcre_maketables</a></td>
<td>&nbsp;&nbsp;Build character tables in current locale</td></tr>
<tr><td><a href="pcre_study.html">pcre_study</a></td>
<td>&nbsp;&nbsp;Study a compiled pattern</td></tr>
<tr><td><a href="pcre_version.html">pcre_version</a></td>
<td>&nbsp;&nbsp;Return PCRE version and release date</td></tr>
</table>
</body>
</html>

View File

@ -1,190 +0,0 @@
<html>
<head>
<title>pcre specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
This HTML document has been generated automatically from the original man page.
If there is any nonsense in it, please consult the man page, in case the
conversion went wrong.<br>
<ul>
<li><a name="TOC1" href="#SEC1">DESCRIPTION</a>
<li><a name="TOC2" href="#SEC2">USER DOCUMENTATION</a>
<li><a name="TOC3" href="#SEC3">LIMITATIONS</a>
<li><a name="TOC4" href="#SEC4">UTF-8 SUPPORT</a>
<li><a name="TOC5" href="#SEC5">AUTHOR</a>
</ul>
<br><a name="SEC1" href="#TOC1">DESCRIPTION</a><br>
<P>
The PCRE library is a set of functions that implement regular expression
pattern matching using the same syntax and semantics as Perl, with just a few
differences. The current implementation of PCRE (release 4.x) corresponds
approximately with Perl 5.8, including support for UTF-8 encoded strings.
However, this support has to be explicitly enabled; it is not the default.
</P>
<P>
PCRE is written in C and released as a C library. However, a number of people
have written wrappers and interfaces of various kinds. A C++ class is included
in these contributions, which can be found in the <i>Contrib</i> directory at
the primary FTP site, which is:
</P>
<a href="ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre">ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre</a>
<P>
Details of exactly which Perl regular expression features are and are not
supported by PCRE are given in separate documents. See the
<a href="pcrepattern.html"><b>pcrepattern</b></a>
and
<a href="pcrecompat.html"><b>pcrecompat</b></a>
pages.
</P>
<P>
Some features of PCRE can be included, excluded, or changed when the library is
built. The
<a href="pcre_config.html"><b>pcre_config()</b></a>
function makes it possible for a client to discover which features are
available. Documentation about building PCRE for various operating systems can
be found in the <b>README</b> file in the source distribution.
</P>
<br><a name="SEC2" href="#TOC1">USER DOCUMENTATION</a><br>
<P>
The user documentation for PCRE has been split up into a number of different
sections. In the "man" format, each of these is a separate "man page". In the
HTML format, each is a separate page, linked from the index page. In the plain
text format, all the sections are concatenated, for ease of searching. The
sections are as follows:
</P>
<P>
<pre>
pcre this document
pcreapi details of PCRE's native API
pcrebuild options for building PCRE
pcrecallout details of the callout feature
pcrecompat discussion of Perl compatibility
pcregrep description of the <b>pcregrep</b> command
pcrepattern syntax and semantics of supported
regular expressions
pcreperform discussion of performance issues
pcreposix the POSIX-compatible API
pcresample discussion of the sample program
pcretest the <b>pcretest</b> testing command
</PRE>
</P>
<P>
In addition, in the "man" and HTML formats, there is a short page for each
library function, listing its arguments and results.
</P>
<br><a name="SEC3" href="#TOC1">LIMITATIONS</a><br>
<P>
There are some size limitations in PCRE but it is hoped that they will never in
practice be relevant.
</P>
<P>
The maximum length of a compiled pattern is 65539 (sic) bytes if PCRE is
compiled with the default internal linkage size of 2. If you want to process
regular expressions that are truly enormous, you can compile PCRE with an
internal linkage size of 3 or 4 (see the <b>README</b> file in the source
distribution and the
<a href="pcrebuild.html"><b>pcrebuild</b></a>
documentation for details). In these cases the limit is substantially larger.
However, the speed of execution will be slower.
</P>
<P>
All values in repeating quantifiers must be less than 65536.
The maximum number of capturing subpatterns is 65535.
</P>
<P>
There is no limit to the number of non-capturing subpatterns, but the maximum
depth of nesting of all kinds of parenthesized subpattern, including capturing
subpatterns, assertions, and other types of subpattern, is 200.
</P>
<P>
The maximum length of a subject string is the largest positive number that an
integer variable can hold. However, PCRE uses recursion to handle subpatterns
and indefinite repetition. This means that the available stack space may limit
the size of a subject string that can be processed by certain patterns.
</P>
<a name="utf8support"></a><br><a name="SEC4" href="#TOC1">UTF-8 SUPPORT</a><br>
<P>
Starting at release 3.3, PCRE has had some support for character strings
encoded in the UTF-8 format. For release 4.0 this has been greatly extended to
cover most common requirements.
</P>
<P>
In order to process UTF-8 strings, you must build PCRE to include UTF-8 support in
the code, and, in addition, you must call
<a href="pcre_compile.html"><b>pcre_compile()</b></a>
with the PCRE_UTF8 option flag. When you do this, both the pattern and any
subject strings that are matched against it are treated as UTF-8 strings
instead of just strings of bytes.
</P>
<P>
If you compile PCRE with UTF-8 support, but do not use it at run time, the
library will be a bit bigger, but the additional run time overhead is limited
to testing the PCRE_UTF8 flag in several places, so should not be very large.
</P>
<P>
The following comments apply when PCRE is running in UTF-8 mode:
</P>
<P>
1. When you set the PCRE_UTF8 flag, the strings passed as patterns and subjects
are checked for validity on entry to the relevant functions. If an invalid
UTF-8 string is passed, an error return is given. In some situations, you may
already know that your strings are valid, and therefore want to skip these
checks in order to improve performance. If you set the PCRE_NO_UTF8_CHECK flag
at compile time or at run time, PCRE assumes that the pattern or subject it
is given (respectively) contains only valid UTF-8 codes. In this case, it does
not diagnose an invalid UTF-8 string. If you pass an invalid UTF-8 string to
PCRE when PCRE_NO_UTF8_CHECK is set, the results are undefined. Your program
may crash.
</P>
<P>
2. In a pattern, the escape sequence \x{...}, where the braces contain a string
of hexadecimal digits, is interpreted as a UTF-8 character whose
code number is the given hexadecimal number, for example: \x{1234}. If a
non-hexadecimal digit appears between the braces, the item is not recognized.
This escape sequence can be used either as a literal, or within a character
class.
</P>
<P>
3. The original hexadecimal escape sequence, \xhh, matches a two-byte UTF-8
character if the value is greater than 127.
</P>
<P>
4. Repeat quantifiers apply to complete UTF-8 characters, not to individual
bytes, for example: \x{100}{3}.
</P>
<P>
5. The dot metacharacter matches one UTF-8 character instead of a single byte.
</P>
<P>
6. The escape sequence \C can be used to match a single byte in UTF-8 mode,
but its use can lead to some strange effects.
</P>
<P>
7. The character escapes \b, \B, \d, \D, \s, \S, \w, and \W correctly
test characters of any code value, but the characters that PCRE recognizes as
digits, spaces, or word characters remain the same set as before, all with
values less than 256.
</P>
<P>
8. Case-insensitive matching applies only to characters whose values are less
than 256. PCRE does not support the notion of "case" for higher-valued
characters.
</P>
<P>
9. PCRE does not support the use of Unicode tables and properties or the Perl
escapes \p, \P, and \X.
</P>
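<P>
Points 2 to 5 above depend on how a code value maps onto UTF-8 bytes. The
following is a minimal standalone sketch, not code from PCRE: the names
sketch_utf8_encode and sketch_utf8_char_count are hypothetical, and the encoder
assumes the modern four-byte form of UTF-8 (values up to 0x10FFFF, surrogates
not rejected). It shows why \x{1234} occupies three bytes, why a value such as
0xE9 occupies two, and why counting characters is different from counting
bytes.
</P>

```c
#include <stddef.h>

/* Encode one code value as UTF-8. Returns the number of bytes written
   (1 to 4), or 0 if the value is above 0x10FFFF. This is an illustration
   only; it is not how PCRE stores characters internally. */
size_t sketch_utf8_encode(unsigned long cp, unsigned char out[4])
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Count characters in a byte string that is assumed to be valid UTF-8:
   every byte that is not a continuation byte (10xxxxxx) starts a character. */
size_t sketch_utf8_char_count(const unsigned char *s, size_t nbytes)
{
    size_t count = 0;
    for (size_t i = 0; i < nbytes; i++) {
        if ((s[i] & 0xC0) != 0x80)
            count++;
    }
    return count;
}
```

<P>
With these helpers, a subject holding three copies of the character 0x100 is
six bytes long but three characters long, which is the unit that the quantifier
in \x{100}{3} and the dot metacharacter work in.
</P>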
<br><a name="SEC5" href="#TOC1">AUTHOR</a><br>
<P>
Philip Hazel &#60;ph10@cam.ac.uk&#62;
<br>
University Computing Service,
<br>
Cambridge CB2 3QG, England.
<br>
Phone: +44 1223 334714
</P>
<P>
Last updated: 20 August 2003
<br>
Copyright &copy; 1997-2003 University of Cambridge.


@ -1,71 +0,0 @@
<html>
<head>
<title>pcre_compile specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
This HTML document has been generated automatically from the original man page.
If there is any nonsense in it, please consult the man page, in case the
conversion went wrong.<br>
<br><b>
SYNOPSIS
</b><br>
<P>
<b>#include &#60;pcre.h&#62;</b>
</P>
<P>
<b>pcre *pcre_compile(const char *<i>pattern</i>, int <i>options</i>,</b>
<b>const char **<i>errptr</i>, int *<i>erroffset</i>,</b>
<b>const unsigned char *<i>tableptr</i>);</b>
</P>
<br><b>
DESCRIPTION
</b><br>
<P>
This function compiles a regular expression into an internal form. Its
arguments are:
</P>
<P>
<pre>
<i>pattern</i> A zero-terminated string containing the
regular expression to be compiled
<i>options</i> Zero or more option bits
<i>errptr</i> Where to put an error message
<i>erroffset</i> Offset in pattern where error was found
<i>tableptr</i> Pointer to character tables, or NULL to
use the built-in default
</PRE>
</P>
<P>
The option bits are:
</P>
<P>
<pre>
PCRE_ANCHORED Force pattern anchoring
PCRE_CASELESS Do caseless matching
PCRE_DOLLAR_ENDONLY $ not to match newline at end
PCRE_DOTALL . matches anything including NL
PCRE_EXTENDED Ignore whitespace and # comments
PCRE_EXTRA PCRE extra features
(not much use currently)
PCRE_MULTILINE ^ and $ match newlines within data
PCRE_NO_AUTO_CAPTURE Disable numbered capturing paren-
theses (named ones available)
PCRE_UNGREEDY Invert greediness of quantifiers
PCRE_UTF8 Run in UTF-8 mode
PCRE_NO_UTF8_CHECK Do not check the pattern for UTF-8
validity (only relevant if
PCRE_UTF8 is set)
</PRE>
</P>
<P>
PCRE must be compiled with UTF-8 support in order to use PCRE_UTF8
(or PCRE_NO_UTF8_CHECK).
</P>
<P>
The yield of the function is a pointer to a private data structure that
contains the compiled pattern, or NULL if an error was detected.
</P>
<P>
There is a complete description of the PCRE API in the
<a href="pcreapi.html"><b>pcreapi</b></a>
page.


@ -1,56 +0,0 @@
<html>
<head>
<title>pcre_config specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
This HTML document has been generated automatically from the original man page.
If there is any nonsense in it, please consult the man page, in case the
conversion went wrong.<br>
<br><b>
SYNOPSIS
</b><br>
<P>
<b>#include &#60;pcre.h&#62;</b>
</P>
<P>
<b>int pcre_config(int <i>what</i>, void *<i>where</i>);</b>
</P>
<br><b>
DESCRIPTION
</b><br>
<P>
This function makes it possible for a client program to find out which optional
features are available in the version of the PCRE library it is using. Its
arguments are as follows:
</P>
<P>
<pre>
<i>what</i> A code specifying what information is required
<i>where</i> Points to where to put the data
</PRE>
</P>
<P>
The available codes are:
</P>
<P>
<pre>
PCRE_CONFIG_LINK_SIZE Internal link size: 2, 3, or 4
PCRE_CONFIG_MATCH_LIMIT Internal resource limit
PCRE_CONFIG_NEWLINE Value of the newline character
PCRE_CONFIG_POSIX_MALLOC_THRESHOLD
Threshold of return slots, above
which <b>malloc()</b> is used by
the POSIX API
PCRE_CONFIG_STACKRECURSE Recursion implementation (1=stack 0=heap)
PCRE_CONFIG_UTF8 Availability of UTF-8 support (1=yes 0=no)
</PRE>
</P>
<P>
The function yields 0 on success or PCRE_ERROR_BADOPTION otherwise.
</P>
<P>
There is a complete description of the PCRE native API in the
<a href="pcreapi.html"><b>pcreapi</b></a>
page, and a description of the POSIX API in the
<a href="pcreposix.html"><b>pcreposix</b></a>
page.


@ -1,46 +0,0 @@
<html>
<head>
<title>pcre_copy_named_substring specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
This HTML document has been generated automatically from the original man page.
If there is any nonsense in it, please consult the man page, in case the
conversion went wrong.<br>
<br><b>
SYNOPSIS
</b><br>
<P>
<b>#include &#60;pcre.h&#62;</b>
</P>
<P>
<b>int pcre_copy_named_substring(const pcre *<i>code</i>,</b>
<b>const char *<i>subject</i>, int *<i>ovector</i>,</b>
<b>int <i>stringcount</i>, const char *<i>stringname</i>,</b>
<b>char *<i>buffer</i>, int <i>buffersize</i>);</b>
</P>
<br><b>
DESCRIPTION
</b><br>
<P>
This is a convenience function for extracting a captured substring, identified
by name, into a given buffer. The arguments are:
</P>
<P>
<pre>
<i>code</i> Pattern that was successfully matched
<i>subject</i> Subject that has been successfully matched
<i>ovector</i> Offset vector that <b>pcre_exec()</b> used
<i>stringcount</i> Value returned by <b>pcre_exec()</b>
<i>stringname</i> Name of the required substring
<i>buffer</i> Buffer to receive the string
<i>buffersize</i> Size of buffer
</PRE>
</P>
<P>
The yield is the length of the substring, PCRE_ERROR_NOMEMORY if the buffer was
too small, or PCRE_ERROR_NOSUBSTRING if the string name is invalid.
</P>
<P>
There is a complete description of the PCRE API in the
<a href="pcreapi.html"><b>pcreapi</b></a>
page.


@ -1,44 +0,0 @@
<html>
<head>
<title>pcre_copy_substring specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
This HTML document has been generated automatically from the original man page.
If there is any nonsense in it, please consult the man page, in case the
conversion went wrong.<br>
<br><b>
SYNOPSIS
</b><br>
<P>
<b>#include &#60;pcre.h&#62;</b>
</P>
<P>
<b>int pcre_copy_substring(const char *<i>subject</i>, int *<i>ovector</i>,</b>
<b>int <i>stringcount</i>, int <i>stringnumber</i>, char *<i>buffer</i>,</b>
<b>int <i>buffersize</i>);</b>
</P>
<br><b>
DESCRIPTION
</b><br>
<P>
This is a convenience function for extracting a captured substring into a given
buffer. The arguments are:
</P>
<P>
<pre>
<i>subject</i> Subject that has been successfully matched
<i>ovector</i> Offset vector that <b>pcre_exec()</b> used
<i>stringcount</i> Value returned by <b>pcre_exec()</b>
<i>stringnumber</i> Number of the required substring
<i>buffer</i> Buffer to receive the string
<i>buffersize</i> Size of buffer
</PRE>
</P>
<P>
The yield is the length of the substring, PCRE_ERROR_NOMEMORY if the buffer was
too small, or PCRE_ERROR_NOSUBSTRING if the string number is invalid.
</P>
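<P>
The behaviour described above can be sketched in standalone C. This is not the
library's source: sketch_copy_substring is a hypothetical name, the two error
constants are placeholders for the values that pcre.h defines, and the sketch
assumes the offset layout described in the pcreapi page, in which substring
<i>n</i> starts at ovector[2n] and ends just before ovector[2n+1].
</P>

```c
#include <string.h>

/* Placeholder error codes for this sketch; pcre.h defines the real
   PCRE_ERROR_NOMEMORY and PCRE_ERROR_NOSUBSTRING values. */
enum { SKETCH_ERROR_NOMEMORY = -1, SKETCH_ERROR_NOSUBSTRING = -2 };

/* Copy captured substring number `stringnumber` into `buffer`, adding a
   terminating zero. Returns the substring length or a negative error. */
int sketch_copy_substring(const char *subject, const int *ovector,
                          int stringcount, int stringnumber,
                          char *buffer, int buffersize)
{
    if (stringnumber < 0 || stringnumber >= stringcount)
        return SKETCH_ERROR_NOSUBSTRING;

    int start = ovector[2 * stringnumber];
    int length = ovector[2 * stringnumber + 1] - start;

    /* The buffer needs one extra byte for the terminating zero. */
    if (buffersize < length + 1)
        return SKETCH_ERROR_NOMEMORY;

    if (length > 0)
        memcpy(buffer, subject + start, (size_t)length);
    buffer[length] = '\0';
    return length;
}
```

<P>
Note that a buffer of exactly the substring length is reported as too small,
because room is also needed for the terminating zero.
</P>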
<P>
There is a complete description of the PCRE API in the
<a href="pcreapi.html"><b>pcreapi</b></a>
page.


@ -1,58 +0,0 @@
<html>
<head>
<title>pcre_exec specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
This HTML document has been generated automatically from the original man page.
If there is any nonsense in it, please consult the man page, in case the
conversion went wrong.<br>
<br><b>
SYNOPSIS
</b><br>
<P>
<b>#include &#60;pcre.h&#62;</b>
</P>
<P>
<b>int pcre_exec(const pcre *<i>code</i>, const pcre_extra *<i>extra</i>,</b>
<b>const char *<i>subject</i>, int <i>length</i>, int <i>startoffset</i>,</b>
<b>int <i>options</i>, int *<i>ovector</i>, int <i>ovecsize</i>);</b>
</P>
<br><b>
DESCRIPTION
</b><br>
<P>
This function matches a compiled regular expression against a given subject
string, and returns offsets to capturing subexpressions. Its arguments are:
</P>
<P>
<pre>
<i>code</i> Points to the compiled pattern
<i>extra</i> Points to an associated <b>pcre_extra</b> structure,
or is NULL
<i>subject</i> Points to the subject string
<i>length</i> Length of the subject string, in bytes
<i>startoffset</i> Offset in bytes in the subject at which to
start matching
<i>options</i> Option bits
<i>ovector</i> Points to a vector of ints for result offsets
<i>ovecsize</i> Size of the vector (a multiple of 3)
</PRE>
</P>
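<P>
The requirement that <i>ovecsize</i> be a multiple of 3 can be turned into a
small sizing rule. The sketch below is an assumption-laden illustration, not
part of the PCRE API: the function names are hypothetical, and it relies on the
pcreapi page's description that two thirds of the vector hold (start, end)
offset pairs while the final third is workspace.
</P>

```c
/* Smallest ovecsize that can report the whole match (pair 0) plus
   `captures` capturing subpatterns. The result is a multiple of 3. */
int sketch_ovecsize_for(int captures)
{
    return (captures + 1) * 3;
}

/* Number of (start, end) pairs that a vector of `ovecsize` ints can
   report; any remainder beyond a multiple of 3 goes unused. */
int sketch_reportable_pairs(int ovecsize)
{
    return ovecsize / 3;
}
```

<P>
For a pattern with two capturing subpatterns, for example, this rule gives a
vector of nine ints.
</P>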
<P>
The options are:
</P>
<P>
<pre>
PCRE_ANCHORED Match only at the first position
PCRE_NOTBOL Subject is not the beginning of a line
PCRE_NOTEOL Subject is not the end of a line
PCRE_NOTEMPTY An empty string is not a valid match
PCRE_NO_UTF8_CHECK Do not check the subject for UTF-8
validity (only relevant if PCRE_UTF8
was set at compile time)
</PRE>
</P>
<P>
There is a complete description of the PCRE API in the
<a href="pcreapi.html"><b>pcreapi</b></a>
page.


@ -1,29 +0,0 @@
<html>
<head>
<title>pcre_free_substring specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
This HTML document has been generated automatically from the original man page.
If there is any nonsense in it, please consult the man page, in case the
conversion went wrong.<br>
<br><b>
SYNOPSIS
</b><br>
<P>
<b>#include &#60;pcre.h&#62;</b>
</P>
<P>
<b>void pcre_free_substring(const char *<i>stringptr</i>);</b>
</P>
<br><b>
DESCRIPTION
</b><br>
<P>
This is a convenience function for freeing the store obtained by a previous
call to <b>pcre_get_substring()</b> or <b>pcre_get_named_substring()</b>. Its
only argument is a pointer to the string.
</P>
<P>
There is a complete description of the PCRE API in the
<a href="pcreapi.html"><b>pcreapi</b></a>
page.


@ -1,29 +0,0 @@
<html>
<head>
<title>pcre_free_substring_list specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
This HTML document has been generated automatically from the original man page.
If there is any nonsense in it, please consult the man page, in case the
conversion went wrong.<br>
<br><b>
SYNOPSIS
</b><br>
<P>
<b>#include &#60;pcre.h&#62;</b>
</P>
<P>
<b>void pcre_free_substring_list(const char **<i>stringptr</i>);</b>
</P>
<br><b>
DESCRIPTION
</b><br>
<P>
This is a convenience function for freeing the store obtained by a previous
call to <b>pcre_get_substring_list()</b>. Its only argument is a pointer to the
list of string pointers.
</P>
<P>
There is a complete description of the PCRE API in the
<a href="pcreapi.html"><b>pcreapi</b></a>
page.


@ -1,68 +0,0 @@
<html>
<head>
<title>pcre_fullinfo specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
This HTML document has been generated automatically from the original man page.
If there is any nonsense in it, please consult the man page, in case the
conversion went wrong.<br>
<br><b>
SYNOPSIS
</b><br>
<P>
<b>#include &#60;pcre.h&#62;</b>
</P>
<P>
<b>int pcre_fullinfo(const pcre *<i>code</i>, const pcre_extra *<i>extra</i>,</b>
<b>int <i>what</i>, void *<i>where</i>);</b>
</P>
<br><b>
DESCRIPTION
</b><br>
<P>
This function returns information about a compiled pattern. Its arguments are:
</P>
<P>
<pre>
<i>code</i> Compiled regular expression
<i>extra</i> Result of <b>pcre_study()</b> or NULL
<i>what</i> What information is required
<i>where</i> Where to put the information
</PRE>
</P>
<P>
The following information is available:
</P>
<P>
<pre>
PCRE_INFO_BACKREFMAX Number of highest back reference
PCRE_INFO_CAPTURECOUNT Number of capturing subpatterns
PCRE_INFO_FIRSTBYTE Fixed first byte for a match, or
-1 for start of string
or after newline, or
-2 otherwise
PCRE_INFO_FIRSTTABLE Table of first bytes
(after studying)
PCRE_INFO_LASTLITERAL Literal last byte required
PCRE_INFO_NAMECOUNT Number of named subpatterns
PCRE_INFO_NAMEENTRYSIZE Size of name table entry
PCRE_INFO_NAMETABLE Pointer to name table
PCRE_INFO_OPTIONS Options used for compilation
PCRE_INFO_SIZE Size of compiled pattern
</PRE>
</P>
<P>
The yield of the function is zero on success or:
</P>
<P>
<pre>
PCRE_ERROR_NULL the argument <i>code</i> was NULL
the argument <i>where</i> was NULL
PCRE_ERROR_BADMAGIC the "magic number" was not found
PCRE_ERROR_BADOPTION the value of <i>what</i> was invalid
</PRE>
</P>
<P>
There is a complete description of the PCRE API in the
<a href="pcreapi.html"><b>pcreapi</b></a>
page.


@ -1,46 +0,0 @@
<html>
<head>
<title>pcre_get_named_substring specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
This HTML document has been generated automatically from the original man page.
If there is any nonsense in it, please consult the man page, in case the
conversion went wrong.<br>
<br><b>
SYNOPSIS
</b><br>
<P>
<b>#include &#60;pcre.h&#62;</b>
</P>
<P>
<b>int pcre_get_named_substring(const pcre *<i>code</i>,</b>
<b>const char *<i>subject</i>, int *<i>ovector</i>,</b>
<b>int <i>stringcount</i>, const char *<i>stringname</i>,</b>
<b>const char **<i>stringptr</i>);</b>
</P>
<br><b>
DESCRIPTION
</b><br>
<P>
This is a convenience function for extracting a captured substring by name. The
arguments are:
</P>
<P>
<pre>
<i>code</i> Compiled pattern
<i>subject</i> Subject that has been successfully matched
<i>ovector</i> Offset vector that <b>pcre_exec()</b> used
<i>stringcount</i> Value returned by <b>pcre_exec()</b>
<i>stringname</i> Name of the required substring
<i>stringptr</i> Where to put the string pointer
</PRE>
</P>
<P>
The yield is the length of the extracted substring, PCRE_ERROR_NOMEMORY if
sufficient memory could not be obtained, or PCRE_ERROR_NOSUBSTRING if the
string name is invalid.
</P>
<P>
There is a complete description of the PCRE API in the
<a href="pcreapi.html"><b>pcreapi</b></a>
page.


@ -1,39 +0,0 @@
<html>
<head>
<title>pcre_get_stringnumber specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
This HTML document has been generated automatically from the original man page.
If there is any nonsense in it, please consult the man page, in case the
conversion went wrong.<br>
<br><b>
SYNOPSIS
</b><br>
<P>
<b>#include &#60;pcre.h&#62;</b>
</P>
<P>
<b>int pcre_get_stringnumber(const pcre *<i>code</i>,</b>
<b>const char *<i>name</i>);</b>
</P>
<br><b>
DESCRIPTION
</b><br>
<P>
This convenience function finds the number of a named substring capturing
parenthesis in a compiled pattern. Its arguments are:
</P>
<P>
<pre>
<i>code</i> Compiled regular expression
<i>name</i> Name whose number is required
</PRE>
</P>
<P>
The yield of the function is the number of the parenthesis if the name is
found, or PCRE_ERROR_NOSUBSTRING otherwise.
</P>
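<P>
A lookup of this kind can be sketched against the name table that
<b>pcre_fullinfo()</b> exposes. The code below is a hypothetical standalone
illustration, not the library's implementation. It assumes the table layout
given in the pcreapi page (each fixed-size entry holds the parenthesis number
in two bytes, most significant first, followed by the zero-terminated name),
uses a placeholder error constant, and does a plain linear search.
</P>

```c
#include <stddef.h>
#include <string.h>

/* Placeholder for this sketch; pcre.h defines PCRE_ERROR_NOSUBSTRING. */
enum { SKETCH_ERROR_NOSUBSTRING = -2 };

/* Find `name` in a table of `name_count` entries, each `entry_size`
   bytes long, and return the parenthesis number stored with it. */
int sketch_get_stringnumber(const unsigned char *name_table, int name_count,
                            int entry_size, const char *name)
{
    for (int i = 0; i < name_count; i++) {
        const unsigned char *entry =
            name_table + (size_t)i * (size_t)entry_size;
        /* Bytes 0 and 1 hold the number; the name starts at byte 2. */
        if (strcmp((const char *)(entry + 2), name) == 0)
            return (entry[0] << 8) | entry[1];
    }
    return SKETCH_ERROR_NOSUBSTRING;
}
```

<P>
The three table parameters correspond to the PCRE_INFO_NAMETABLE,
PCRE_INFO_NAMECOUNT, and PCRE_INFO_NAMEENTRYSIZE items of
<b>pcre_fullinfo()</b>.
</P>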
<P>
There is a complete description of the PCRE API in the
<a href="pcreapi.html"><b>pcreapi</b></a>
page.


@ -1,44 +0,0 @@
<html>
<head>
<title>pcre_get_substring specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
This HTML document has been generated automatically from the original man page.
If there is any nonsense in it, please consult the man page, in case the
conversion went wrong.<br>
<br><b>
SYNOPSIS
</b><br>
<P>
<b>#include &#60;pcre.h&#62;</b>
</P>
<P>
<b>int pcre_get_substring(const char *<i>subject</i>, int *<i>ovector</i>,</b>
<b>int <i>stringcount</i>, int <i>stringnumber</i>,</b>
<b>const char **<i>stringptr</i>);</b>
</P>
<br><b>
DESCRIPTION
</b><br>
<P>
This is a convenience function for extracting a captured substring. The
arguments are:
</P>
<P>
<pre>
<i>subject</i> Subject that has been successfully matched
<i>ovector</i> Offset vector that <b>pcre_exec()</b> used
<i>stringcount</i> Value returned by <b>pcre_exec()</b>
<i>stringnumber</i> Number of the required substring
<i>stringptr</i> Where to put the string pointer
</PRE>
</P>
<P>
The yield is the length of the substring, PCRE_ERROR_NOMEMORY if sufficient
memory could not be obtained, or PCRE_ERROR_NOSUBSTRING if the string number is
invalid.
</P>
<P>
There is a complete description of the PCRE API in the
<a href="pcreapi.html"><b>pcreapi</b></a>
page.


@ -1,41 +0,0 @@
<html>
<head>
<title>pcre_get_substring_list specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
This HTML document has been generated automatically from the original man page.
If there is any nonsense in it, please consult the man page, in case the
conversion went wrong.<br>
<br><b>
SYNOPSIS
</b><br>
<P>
<b>#include &#60;pcre.h&#62;</b>
</P>
<P>
<b>int pcre_get_substring_list(const char *<i>subject</i>,</b>
<b>int *<i>ovector</i>, int <i>stringcount</i>, const char ***<i>listptr</i>);</b>
</P>
<br><b>
DESCRIPTION
</b><br>
<P>
This is a convenience function for extracting a list of all the captured
substrings. The arguments are:
</P>
<P>
<pre>
<i>subject</i> Subject that has been successfully matched
<i>ovector</i> Offset vector that <b>pcre_exec()</b> used
<i>stringcount</i> Value returned by <b>pcre_exec()</b>
<i>listptr</i> Where to put a pointer to the list
</PRE>
</P>
<P>
The yield is zero on success or PCRE_ERROR_NOMEMORY if sufficient memory could
not be obtained.
</P>
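<P>
The shape of the result, a list of zero-terminated strings ending with a NULL
pointer, can be sketched as follows. This is a hypothetical illustration with
made-up names and a placeholder error constant. It assumes the usual offset
pair layout of the vector, and unlike the real function it allocates each
string separately for clarity, so its result must be released with the
matching sketch function rather than with <b>pcre_free_substring_list()</b>.
</P>

```c
#include <stdlib.h>
#include <string.h>

/* Placeholder for this sketch; pcre.h defines PCRE_ERROR_NOMEMORY. */
enum { SKETCH_ERROR_NOMEMORY = -1 };

/* Release a list built by sketch_get_substring_list(). */
void sketch_free_substring_list(char **list)
{
    if (list == NULL)
        return;
    for (char **p = list; *p != NULL; p++)
        free(*p);
    free(list);
}

/* Build a NULL-terminated list holding a copy of every captured
   substring. Returns 0 on success or a negative error code. */
int sketch_get_substring_list(const char *subject, const int *ovector,
                              int stringcount, char ***listptr)
{
    char **list = malloc(((size_t)stringcount + 1) * sizeof *list);
    if (list == NULL)
        return SKETCH_ERROR_NOMEMORY;
    for (int i = 0; i <= stringcount; i++)
        list[i] = NULL;              /* keeps partial cleanup safe */

    for (int i = 0; i < stringcount; i++) {
        int start = ovector[2 * i];
        int length = ovector[2 * i + 1] - start;
        list[i] = malloc((size_t)length + 1);
        if (list[i] == NULL) {
            sketch_free_substring_list(list);
            return SKETCH_ERROR_NOMEMORY;
        }
        if (length > 0)
            memcpy(list[i], subject + start, (size_t)length);
        list[i][length] = '\0';
    }
    *listptr = list;
    return 0;
}
```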
<P>
There is a complete description of the PCRE API in the
<a href="pcreapi.html"><b>pcreapi</b></a>
page.


@ -1,28 +0,0 @@
<html>
<head>
<title>pcre_info specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
This HTML document has been generated automatically from the original man page.
If there is any nonsense in it, please consult the man page, in case the
conversion went wrong.<br>
<br><b>
SYNOPSIS
</b><br>
<P>
<b>#include &#60;pcre.h&#62;</b>
</P>
<P>
<b>int pcre_info(const pcre *<i>code</i>, int *<i>optptr</i>, int</b>
<b>*<i>firstcharptr</i>);</b>
</P>
<br><b>
DESCRIPTION
</b><br>
<P>
This function is obsolete. You should be using <b>pcre_fullinfo()</b> instead.
</P>
<P>
There is a complete description of the PCRE API in the
<a href="pcreapi.html"><b>pcreapi</b></a>
page.


@ -1,31 +0,0 @@
<html>
<head>
<title>pcre_maketables specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
This HTML document has been generated automatically from the original man page.
If there is any nonsense in it, please consult the man page, in case the
conversion went wrong.<br>
<br><b>
SYNOPSIS
</b><br>
<P>
<b>#include &#60;pcre.h&#62;</b>
</P>
<P>
<b>const unsigned char *pcre_maketables(void);</b>
</P>
<br><b>
DESCRIPTION
</b><br>
<P>
This function builds a set of character tables which can be passed to
<b>pcre_compile()</b> to override PCRE's internal, built-in tables (which were
made by <b>pcre_maketables()</b> when PCRE was compiled). You might want to do
this if you are using a non-standard locale. The function yields a pointer to
the tables.
</P>
<P>
There is a complete description of the PCRE API in the
<a href="pcreapi.html"><b>pcreapi</b></a>
page.


@ -1,45 +0,0 @@
<html>
<head>
<title>pcre_study specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
This HTML document has been generated automatically from the original man page.
If there is any nonsense in it, please consult the man page, in case the
conversion went wrong.<br>
<br><b>
SYNOPSIS
</b><br>
<P>
<b>#include &#60;pcre.h&#62;</b>
</P>
<P>
<b>pcre_extra *pcre_study(const pcre *<i>code</i>, int <i>options</i>,</b>
<b>const char **<i>errptr</i>);</b>
</P>
<br><b>
DESCRIPTION
</b><br>
<P>
This function studies a compiled pattern, to see if additional information can
be extracted that might speed up matching. Its arguments are:
</P>
<P>
<pre>
<i>code</i> A compiled regular expression
<i>options</i> Options for <b>pcre_study()</b>
<i>errptr</i> Where to put an error message
</PRE>
</P>
<P>
If the function returns NULL, either it could not find any additional
information, or there was an error. You can tell the difference by looking at
the error value. It is NULL in the first case.
</P>
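<P>
Because a NULL result has two meanings, callers need to test both the returned
pointer and the error message. The helper below is a hypothetical sketch of
that decision, written without reference to pcre.h so that it stands alone; its
two parameters stand for the value returned by <b>pcre_study()</b> and the
message pointer that the call stored through <i>errptr</i>.
</P>

```c
#include <stddef.h>

typedef enum {
    SKETCH_STUDY_HAS_DATA,      /* extra data was produced */
    SKETCH_STUDY_NOTHING_EXTRA, /* NULL result, no error: nothing to add */
    SKETCH_STUDY_FAILED         /* NULL result with an error message */
} sketch_study_outcome;

/* Classify the outcome of a study call from its two results. */
sketch_study_outcome sketch_classify_study(const void *extra,
                                           const char *error)
{
    if (extra != NULL)
        return SKETCH_STUDY_HAS_DATA;
    return (error == NULL) ? SKETCH_STUDY_NOTHING_EXTRA
                           : SKETCH_STUDY_FAILED;
}
```

<P>
Only the last outcome needs to be reported as a failure; in the middle case the
compiled pattern can simply be used without any extra data.
</P>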
<P>
There are currently no options defined; the value of the second argument should
always be zero.
</P>
<P>
There is a complete description of the PCRE API in the
<a href="pcreapi.html"><b>pcreapi</b></a>
page.


@ -1,28 +0,0 @@
<html>
<head>
<title>pcre_version specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
This HTML document has been generated automatically from the original man page.
If there is any nonsense in it, please consult the man page, in case the
conversion went wrong.<br>
<br><b>
SYNOPSIS
</b><br>
<P>
<b>#include &#60;pcre.h&#62;</b>
</P>
<P>
<b>char *pcre_version(void);</b>
</P>
<br><b>
DESCRIPTION
</b><br>
<P>
This function returns a character string that gives the version number of the
PCRE library, and its date of release.
</P>
<P>
There is a complete description of the PCRE API in the
<a href="pcreapi.html"><b>pcreapi</b></a>
page.


@ -1,189 +0,0 @@
<html>
<head>
<title>pcrebuild specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
This HTML document has been generated automatically from the original man page.
If there is any nonsense in it, please consult the man page, in case the
conversion went wrong.<br>
<ul>
<li><a name="TOC1" href="#SEC1">PCRE BUILD-TIME OPTIONS</a>
<li><a name="TOC2" href="#SEC2">UTF-8 SUPPORT</a>
<li><a name="TOC3" href="#SEC3">CODE VALUE OF NEWLINE</a>
<li><a name="TOC4" href="#SEC4">BUILDING SHARED AND STATIC LIBRARIES</a>
<li><a name="TOC5" href="#SEC5">POSIX MALLOC USAGE</a>
<li><a name="TOC6" href="#SEC6">LIMITING PCRE RESOURCE USAGE</a>
<li><a name="TOC7" href="#SEC7">HANDLING VERY LARGE PATTERNS</a>
<li><a name="TOC8" href="#SEC8">AVOIDING EXCESSIVE STACK USAGE</a>
<li><a name="TOC9" href="#SEC9">USING EBCDIC CODE</a>
</ul>
<br><a name="SEC1" href="#TOC1">PCRE BUILD-TIME OPTIONS</a><br>
<P>
This document describes the optional features of PCRE that can be selected when
the library is compiled. They are all selected, or deselected, by providing
options to the <b>configure</b> script which is run before the <b>make</b>
command. The complete list of options for <b>configure</b> (which includes the
standard ones such as the selection of the installation directory) can be
obtained by running
</P>
<P>
<pre>
./configure --help
</PRE>
</P>
<P>
The following sections describe certain options whose names begin with --enable
or --disable. These settings specify changes to the defaults for the
<b>configure</b> command. Because of the way that <b>configure</b> works,
--enable and --disable always come in pairs, so the complementary option always
exists as well, but as it specifies the default, it is not described.
</P>
<br><a name="SEC2" href="#TOC1">UTF-8 SUPPORT</a><br>
<P>
To build PCRE with support for UTF-8 character strings, add
</P>
<P>
<pre>
--enable-utf8
</PRE>
</P>
<P>
to the <b>configure</b> command. Of itself, this does not make PCRE treat
strings as UTF-8. As well as compiling PCRE with this option, you also have
to set the PCRE_UTF8 option when you call the <b>pcre_compile()</b>
function.
</P>
<br><a name="SEC3" href="#TOC1">CODE VALUE OF NEWLINE</a><br>
<P>
By default, PCRE treats character 10 (linefeed) as the newline character. This
is the normal newline character on Unix-like systems. You can compile PCRE to
use character 13 (carriage return) instead by adding
</P>
<P>
<pre>
--enable-newline-is-cr
</PRE>
</P>
<P>
to the <b>configure</b> command. For completeness there is also a
--enable-newline-is-lf option, which explicitly specifies linefeed as the
newline character.
</P>
<br><a name="SEC4" href="#TOC1">BUILDING SHARED AND STATIC LIBRARIES</a><br>
<P>
The PCRE building process uses <b>libtool</b> to build both shared and static
Unix libraries by default. You can suppress one of these by adding one of
</P>
<P>
<pre>
--disable-shared
--disable-static
</PRE>
</P>
<P>
to the <b>configure</b> command, as required.
</P>
<br><a name="SEC5" href="#TOC1">POSIX MALLOC USAGE</a><br>
<P>
When PCRE is called through the POSIX interface (see the <b>pcreposix</b>
documentation), additional working storage is required for holding the pointers
to capturing substrings because PCRE requires three integers per substring,
whereas the POSIX interface provides only two. If the number of expected
substrings is small, the wrapper function uses space on the stack, because this
is faster than using <b>malloc()</b> for each call. The default threshold above
which the stack is no longer used is 10; it can be changed by adding a setting
such as
</P>
<P>
<pre>
--with-posix-malloc-threshold=20
</PRE>
</P>
<P>
to the <b>configure</b> command.
</P>
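<P>
The stack-or-heap decision can be sketched as follows. This is a hypothetical
illustration, not the POSIX wrapper's source: the macro and function names are
invented, the threshold is fixed at the documented default of 10, and the
matching step itself is replaced by a loop that just fills the vector.
</P>

```c
#include <stdlib.h>

#define SKETCH_POSIX_MALLOC_THRESHOLD 10

/* Obtain working storage of three ints per expected substring.
   Returns 0 if the stack array was used, 1 if malloc() was used,
   or -1 if the allocation failed. */
int sketch_with_offsets(size_t nmatch)
{
    int small_ovector[SKETCH_POSIX_MALLOC_THRESHOLD * 3];
    int *ovector = small_ovector;
    int used_heap = 0;

    if (nmatch > SKETCH_POSIX_MALLOC_THRESHOLD) {
        ovector = malloc(nmatch * 3 * sizeof *ovector);
        if (ovector == NULL)
            return -1;
        used_heap = 1;
    }

    /* A real wrapper would run the match here and then copy the
       start and end offsets into the caller's POSIX array. */
    for (size_t i = 0; i < nmatch * 3; i++)
        ovector[i] = -1;

    if (used_heap)
        free(ovector);
    return used_heap;
}
```

<P>
Raising the threshold trades a larger stack frame on every call for fewer
calls to <b>malloc()</b>.
</P>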
<br><a name="SEC6" href="#TOC1">LIMITING PCRE RESOURCE USAGE</a><br>
<P>
Internally, PCRE has a function called <b>match()</b> which it calls repeatedly
(possibly recursively) when performing a matching operation. By limiting the
number of times this function may be called, a limit can be placed on the
resources used by a single call to <b>pcre_exec()</b>. The limit can be changed
at run time, as described in the <b>pcreapi</b> documentation. The default is 10
million, but this can be changed by adding a setting such as
</P>
<P>
<pre>
--with-match-limit=500000
</PRE>
</P>
<P>
to the <b>configure</b> command.
</P>
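<P>
The idea of bounding work by counting calls can be shown with a toy matcher.
The code below is emphatically not PCRE's <b>match()</b> function: it is a
hypothetical, anchored matcher that understands only literal characters, the
dot, and the star quantifier, with invented names throughout. What it shares
with the mechanism described above is that every recursive call is counted and
matching is abandoned once the count passes the limit.
</P>

```c
enum { TOY_NOMATCH = 0, TOY_MATCH = 1, TOY_LIMIT = -1 };

typedef struct {
    unsigned long calls;  /* calls made so far */
    unsigned long limit;  /* maximum number of calls allowed */
} toy_budget;

/* Match all of `text` against `pat`, counting every call. */
static int toy_match(const char *pat, const char *text, toy_budget *b)
{
    if (++b->calls > b->limit)
        return TOY_LIMIT;
    if (pat[0] == '\0')
        return text[0] == '\0' ? TOY_MATCH : TOY_NOMATCH;

    if (pat[1] == '*') {
        /* Try the rest of the pattern after 0, 1, 2, ... repeats. */
        const char *t = text;
        for (;;) {
            int rc = toy_match(pat + 2, t, b);
            if (rc != TOY_NOMATCH)
                return rc;  /* a match, or the limit was hit */
            if (*t == '\0' || (pat[0] != '.' && *t != pat[0]))
                return TOY_NOMATCH;
            t++;
        }
    }

    if (text[0] != '\0' && (pat[0] == '.' || pat[0] == text[0]))
        return toy_match(pat + 1, text + 1, b);
    return TOY_NOMATCH;
}

/* Entry point: run the toy matcher with a fresh call budget. */
int toy_match_limited(const char *pat, const char *text, unsigned long limit)
{
    toy_budget b = { 0, limit };
    return toy_match(pat, text, &b);
}
```

<P>
A pattern such as a*a*a*b applied to a run of sixteen a characters fails only
after more than a thousand calls in this toy, so a small limit stops it early
while a generous limit lets it run to the no-match result.
</P>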
<br><a name="SEC7" href="#TOC1">HANDLING VERY LARGE PATTERNS</a><br>
<P>
Within a compiled pattern, offset values are used to point from one part to
another (for example, from an opening parenthesis to an alternation
metacharacter). By default two-byte values are used for these offsets, leading
to a maximum size for a compiled pattern of around 64K. This is sufficient to
handle all but the most gigantic patterns. Nevertheless, some people do want to
process enormous patterns, so it is possible to compile PCRE to use three-byte
or four-byte offsets by adding a setting such as
</P>
<P>
<pre>
--with-link-size=3
</PRE>
</P>
<P>
to the <b>configure</b> command. The value given must be 2, 3, or 4. Using
longer offsets slows down the operation of PCRE because it has to load
additional bytes when handling them.
</P>
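<P>
The effect of the link size on pattern size can be illustrated by storing an
offset in a fixed number of bytes. The helpers below are a hypothetical sketch
with invented names, and the byte order (most significant byte first) is an
assumption made for the illustration rather than a statement about PCRE's
compiled format.
</P>

```c
#include <stdint.h>

/* Store `value` in `link_size` bytes (2, 3, or 4), most significant
   byte first. Bits that do not fit are silently lost. */
void sketch_put_link(unsigned char *code, int link_size, uint32_t value)
{
    for (int i = link_size - 1; i >= 0; i--) {
        code[i] = (unsigned char)(value & 0xFF);
        value >>= 8;
    }
}

/* Read back an offset stored by sketch_put_link(). */
uint32_t sketch_get_link(const unsigned char *code, int link_size)
{
    uint32_t value = 0;
    for (int i = 0; i < link_size; i++)
        value = (value << 8) | code[i];
    return value;
}

/* Largest offset that fits in `link_size` bytes. */
uint32_t sketch_max_link(int link_size)
{
    if (link_size >= 4)
        return UINT32_MAX;
    return ((uint32_t)1 << (8 * link_size)) - 1;
}
```

<P>
Two bytes cannot represent an offset of 70000, which is why very large patterns
need a link size of 3 or 4; the extra byte per offset is also the source of the
additional load work mentioned above.
</P>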
<P>
If you build PCRE with an increased link size, test 2 (and test 5 if you are
using UTF-8) will fail. Part of the output of these tests is a representation
of the compiled pattern, and this changes with the link size.
</P>
<br><a name="SEC8" href="#TOC1">AVOIDING EXCESSIVE STACK USAGE</a><br>
<P>
PCRE implements backtracking while matching by making recursive calls to an
internal function called <b>match()</b>. In environments where the size of the
stack is limited, this can severely limit PCRE's operation. (The Unix
environment does not usually suffer from this problem.) An alternative approach
that uses memory from the heap to remember data, instead of using recursive
function calls, has been implemented to work round this problem. If you want to
build a version of PCRE that works this way, add
</P>
<P>
<pre>
--disable-stack-for-recursion
</PRE>
</P>
<P>
to the <b>configure</b> command. With this configuration, PCRE will use the
<b>pcre_stack_malloc</b> and <b>pcre_stack_free</b> variables to call memory
management functions. Separate functions are provided because the usage is very
predictable: the block sizes requested are always the same, and the blocks are
always freed in reverse order. A calling program might be able to implement
optimized functions that perform better than the standard <b>malloc()</b> and
<b>free()</b> functions. PCRE runs noticeably more slowly when built in this
way.
</P>
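<P>
The predictable usage pattern suggests a very simple allocator. The following
is a hypothetical sketch, not code supplied with PCRE: the function names are
invented, and it assumes, as stated above, that every request has the same
size. Released blocks are kept on a last-in, first-out list and handed out
again before <b>malloc()</b> is called. The sketch never returns cached blocks
to the system and is not thread-safe.
</P>

```c
#include <assert.h>
#include <stdlib.h>

/* A released block stores the list link in its own first bytes. */
typedef struct frame_node {
    struct frame_node *next;
} frame_node;

static frame_node *free_frames = NULL;  /* LIFO cache of released blocks */
static size_t frame_size = 0;           /* size seen on the first request */

/* Hand out a cached block if there is one, else fall back to malloc(). */
void *sketch_stack_malloc(size_t size)
{
    if (size < sizeof(frame_node))
        size = sizeof(frame_node);
    if (frame_size == 0)
        frame_size = size;
    assert(size == frame_size);  /* the sketch relies on uniform sizes */

    if (free_frames != NULL) {
        frame_node *node = free_frames;
        free_frames = node->next;
        return node;
    }
    return malloc(size);
}

/* Put a block back on the cache instead of calling free(). */
void sketch_stack_free(void *block)
{
    frame_node *node = block;
    if (node == NULL)
        return;
    node->next = free_frames;
    free_frames = node;
}
```

<P>
Functions with these signatures could then be assigned to the
<b>pcre_stack_malloc</b> and <b>pcre_stack_free</b> variables by a program that
uses a library built with --disable-stack-for-recursion.
</P>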
<br><a name="SEC9" href="#TOC1">USING EBCDIC CODE</a><br>
<P>
PCRE assumes by default that it will run in an environment where the character
code is ASCII (or UTF-8, which is a superset of ASCII). PCRE can, however, be
compiled to run in an EBCDIC environment by adding
</P>
<P>
<pre>
--enable-ebcdic
</PRE>
</P>
<P>
to the <b>configure</b> command.
</P>
<P>
Last updated: 09 December 2003
<br>
Copyright &copy; 1997-2003 University of Cambridge.


@ -1,117 +0,0 @@
<html>
<head>
<title>pcrecallout specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
This HTML document has been generated automatically from the original man page.
If there is any nonsense in it, please consult the man page, in case the
conversion went wrong.<br>
<ul>
<li><a name="TOC1" href="#SEC1">PCRE CALLOUTS</a>
<li><a name="TOC2" href="#SEC2">RETURN VALUES</a>
</ul>
<br><a name="SEC1" href="#TOC1">PCRE CALLOUTS</a><br>
<P>
<b>int (*pcre_callout)(pcre_callout_block *);</b>
</P>
<P>
PCRE provides a feature called "callout", which is a means of temporarily
passing control to the caller of PCRE in the middle of pattern matching. The
caller of PCRE provides an external function by putting its entry point in the
global variable <i>pcre_callout</i>. By default, this variable contains NULL,
which disables all calling out.
</P>
<P>
Within a regular expression, (?C) indicates the points at which the external
function is to be called. Different callout points can be identified by putting
a number less than 256 after the letter C. The default value is zero.
For example, this pattern has two callout points:
</P>
<P>
<pre>
(?C1)\dabc(?C2)def
</PRE>
</P>
<P>
During matching, when PCRE reaches a callout point (and <i>pcre_callout</i> is
set), the external function is called. Its only argument is a pointer to a
<b>pcre_callout_block</b> structure. This contains the following variables:
</P>
<P>
<pre>
int <i>version</i>;
int <i>callout_number</i>;
int *<i>offset_vector</i>;
const char *<i>subject</i>;
int <i>subject_length</i>;
int <i>start_match</i>;
int <i>current_position</i>;
int <i>capture_top</i>;
int <i>capture_last</i>;
void *<i>callout_data</i>;
</PRE>
</P>
<P>
The <i>version</i> field is an integer containing the version number of the
block format. The current version is zero. The version number may change in
future if additional fields are added, but the intention is never to remove any
of the existing fields.
</P>
<P>
The <i>callout_number</i> field contains the number of the callout, as compiled
into the pattern (that is, the number after ?C).
</P>
<P>
The <i>offset_vector</i> field is a pointer to the vector of offsets that was
passed by the caller to <b>pcre_exec()</b>. The contents can be inspected in
order to extract substrings that have been matched so far, in the same way as
for extracting substrings after a match has completed.
</P>
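<P>
As a minimal sketch of that inspection, assuming the usual <b>pcre_exec()</b>
layout in which entries 2n and 2n+1 of the vector hold the start and end
offsets of substring n, with -1 marking an unset group. The helper name
<i>copy_capture</i> and its parameters are hypothetical and not part of PCRE.
Group 0 is skipped on the assumption that the overall match is not yet
complete while a callout is running.
</P>

```c
#include <stddef.h>
#include <string.h>

/* Copy captured substring number n out of a PCRE-style offset vector.
 * Entry 2*n is the start offset and 2*n+1 the end offset; -1 means unset.
 * Returns the (possibly truncated) length, or -1 if group n is unavailable.
 * Hypothetical helper for illustration only; not a PCRE function. */
int copy_capture(const char *subject, const int *offset_vector,
                 int capture_top, int n, char *buf, size_t bufsize)
{
    if (n < 1 || n >= capture_top || bufsize == 0)
        return -1;                      /* nothing captured for group n yet */

    int start = offset_vector[2 * n];
    int end = offset_vector[2 * n + 1];
    if (start < 0 || end < start)
        return -1;                      /* group is numbered but unset */

    size_t len = (size_t)(end - start);
    if (len >= bufsize)
        len = bufsize - 1;              /* truncate to fit the buffer */
    memcpy(buf, subject + start, len);
    buf[len] = '\0';
    return (int)len;
}
```

<P>
A callout function could call this with the <i>subject</i>,
<i>offset_vector</i> and <i>capture_top</i> fields of the block it receives.
</P>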
<P>
The <i>subject</i> and <i>subject_length</i> fields contain copies of the values
that were passed to <b>pcre_exec()</b>.
</P>
<P>
The <i>start_match</i> field contains the offset within the subject at which the
current match attempt started. If the pattern is not anchored, the callout
function may be called several times for different starting points.
</P>
<P>
The <i>current_position</i> field contains the offset within the subject of the
current match pointer.
</P>
<P>
The <i>capture_top</i> field contains one more than the number of the highest
numbered captured substring so far. If no substrings have been captured,
the value of <i>capture_top</i> is one.
</P>
<P>
The <i>capture_last</i> field contains the number of the most recently captured
substring.
</P>
<P>
The <i>callout_data</i> field contains a value that is passed to
<b>pcre_exec()</b> by the caller specifically so that it can be passed back in
callouts. It is passed in the <i>callout_data</i> field of the <b>pcre_extra</b>
data structure. If no such data was passed, the value of <i>callout_data</i> in
a <b>pcre_callout_block</b> structure is NULL. There is a description of the
<b>pcre_extra</b> structure in the <b>pcreapi</b> documentation.
</P>
<br><a name="SEC2" href="#TOC1">RETURN VALUES</a><br>
<P>
The callout function returns an integer. If the value is zero, matching
proceeds as normal. If the value is greater than zero, matching fails at the
current point, but backtracking to test other possibilities goes ahead, just as
if a lookahead assertion had failed. If the value is less than zero, the match
is abandoned, and <b>pcre_exec()</b> returns the value.
</P>
<P>
Negative values should normally be chosen from the set of PCRE_ERROR_xxx
values. In particular, PCRE_ERROR_NOMATCH forces a standard "no match" failure.
The error number PCRE_ERROR_CALLOUT is reserved for use by callout functions;
it will never be used by PCRE itself.
</P>
<P>
Last updated: 21 January 2003
<br>
Copyright &copy; 1997-2003 University of Cambridge.


@ -1,136 +0,0 @@
<html>
<head>
<title>pcrecompat specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
This HTML document has been generated automatically from the original man page.
If there is any nonsense in it, please consult the man page, in case the
conversion went wrong.<br>
<ul>
<li><a name="TOC1" href="#SEC1">DIFFERENCES FROM PERL</a>
</ul>
<br><a name="SEC1" href="#TOC1">DIFFERENCES FROM PERL</a><br>
<P>
This document describes the differences in the ways that PCRE and Perl handle
regular expressions. The differences described here are with respect to Perl
5.8.
</P>
<P>
1. PCRE does not have full UTF-8 support. Details of what it does have are
given in the
<a href="pcre.html#utf8support">section on UTF-8 support</a>
in the main
<a href="pcre.html"><b>pcre</b></a>
page.
</P>
<P>
2. PCRE does not allow repeat quantifiers on lookahead assertions. Perl permits
them, but they do not mean what you might think. For example, (?!a){3} does
not assert that the next three characters are not "a". It just asserts that the
next character is not "a" three times.
</P>
<P>
3. Capturing subpatterns that occur inside negative lookahead assertions are
counted, but their entries in the offsets vector are never set. Perl sets its
numerical variables from any such patterns that are matched before the
assertion fails to match something (thereby succeeding), but only if the
negative lookahead assertion contains just one branch.
</P>
<P>
4. Though binary zero characters are supported in the subject string, they are
not allowed in a pattern string because it is passed as a normal C string,
terminated by zero. The escape sequence "\0" can be used in the pattern to
represent a binary zero.
</P>
<P>
5. The following Perl escape sequences are not supported: \l, \u, \L,
\U, \P, \p, \N, and \X. In fact these are implemented by Perl's general
string-handling and are not part of its pattern matching engine. If any of
these are encountered by PCRE, an error is generated.
</P>
<P>
6. PCRE does support the \Q...\E escape for quoting substrings. Characters in
between are treated as literals. This is slightly different from Perl in that $
and @ are also handled as literals inside the quotes. In Perl, they cause
variable interpolation (but of course PCRE does not have variables). Note the
following examples:
</P>
<P>
<pre>
  Pattern            PCRE matches      Perl matches
</PRE>
</P>
<P>
<pre>
  \Qabc$xyz\E        abc$xyz           abc followed by the
                                         contents of $xyz
  \Qabc\$xyz\E       abc\$xyz          abc\$xyz
  \Qabc\E\$\Qxyz\E   abc$xyz           abc$xyz
</PRE>
</P>
<P>
The \Q...\E sequence is recognized both inside and outside character classes.
</P>
<P>
7. Fairly obviously, PCRE does not support the (?{code}) and (?p{code})
constructions. However, there is some experimental support for recursive
patterns using the non-Perl items (?R), (?number) and (?P&#62;name). Also, the PCRE
"callout" feature allows an external function to be called during pattern
matching.
</P>
<P>
8. There are some differences that are concerned with the settings of captured
strings when part of a pattern is repeated. For example, matching "aba" against
the pattern /^(a(b)?)+$/ in Perl leaves $2 unset, but in PCRE it is set to "b".
</P>
<P>
9. PCRE provides some extensions to the Perl regular expression facilities:
</P>
<P>
(a) Although lookbehind assertions must match fixed length strings, each
alternative branch of a lookbehind assertion can match a different length of
string. Perl requires them all to have the same length.
</P>
<P>
(b) If PCRE_DOLLAR_ENDONLY is set and PCRE_MULTILINE is not set, the $
meta-character matches only at the very end of the string.
</P>
<P>
(c) If PCRE_EXTRA is set, a backslash followed by a letter with no special
meaning is faulted.
</P>
<P>
(d) If PCRE_UNGREEDY is set, the greediness of the repetition quantifiers is
inverted, that is, by default they are not greedy, but if followed by a
question mark they are.
</P>
<P>
(e) PCRE_ANCHORED can be used to force a pattern to be tried only at the first
matching position in the subject string.
</P>
<P>
(f) The PCRE_NOTBOL, PCRE_NOTEOL, PCRE_NOTEMPTY, and PCRE_NO_AUTO_CAPTURE
options for <b>pcre_exec()</b> have no Perl equivalents.
</P>
<P>
(g) The (?R), (?number), and (?P&#62;name) constructs allow for recursive pattern
matching (Perl can do this using the (?p{code}) construct, which PCRE cannot
support).
</P>
<P>
(h) PCRE supports named capturing substrings, using the Python syntax.
</P>
<P>
(i) PCRE supports the possessive quantifier "++" syntax, taken from Sun's Java
package.
</P>
<P>
(j) The (R) condition, for testing recursion, is a PCRE extension.
</P>
<P>
(k) The callout facility is PCRE-specific.
</P>
<P>
Last updated: 09 December 2003
<br>
Copyright &copy; 1997-2003 University of Cambridge.


@ -1,153 +0,0 @@
<html>
<head>
<title>pcregrep specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
This HTML document has been generated automatically from the original man page.
If there is any nonsense in it, please consult the man page, in case the
conversion went wrong.<br>
<ul>
<li><a name="TOC1" href="#SEC1">SYNOPSIS</a>
<li><a name="TOC2" href="#SEC2">DESCRIPTION</a>
<li><a name="TOC3" href="#SEC3">OPTIONS</a>
<li><a name="TOC4" href="#SEC4">LONG OPTIONS</a>
<li><a name="TOC5" href="#SEC5">DIAGNOSTICS</a>
<li><a name="TOC6" href="#SEC6">AUTHOR</a>
</ul>
<br><a name="SEC1" href="#TOC1">SYNOPSIS</a><br>
<P>
<b>pcregrep [-Vcfhilnrsuvx] [long options] [pattern] [file1 file2 ...]</b>
</P>
<br><a name="SEC2" href="#TOC1">DESCRIPTION</a><br>
<P>
<b>pcregrep</b> searches files for character patterns, in the same way as other
grep commands do, but it uses the PCRE regular expression library to support
patterns that are compatible with the regular expressions of Perl 5. See
<a href="pcrepattern.html"><b>pcrepattern</b></a>
for a full description of syntax and semantics of the regular expressions that
PCRE supports.
</P>
<P>
A pattern must be specified on the command line unless the <b>-f</b> option is
used (see below).
</P>
<P>
If no files are specified, <b>pcregrep</b> reads the standard input. By default,
each line that matches the pattern is copied to the standard output, and if
there is more than one file, the file name is printed before each line of
output. However, there are options that can change how <b>pcregrep</b> behaves.
</P>
<P>
Lines are limited to BUFSIZ characters. BUFSIZ is defined in <b>&#60;stdio.h&#62;</b>.
The newline character is removed from the end of each line before it is matched
against the pattern.
</P>
<br><a name="SEC3" href="#TOC1">OPTIONS</a><br>
<P>
<b>-V</b>
Write the version number of the PCRE library being used to the standard error
stream.
</P>
<P>
<b>-c</b>
Do not print individual lines; instead just print a count of the number of
lines that would otherwise have been printed. If several files are given, a
count is printed for each of them.
</P>
<P>
<b>-f</b><i>filename</i>
Read a number of patterns from the file, one per line, and match all of them
against each line of input. A line is output if any of the patterns match it.
When <b>-f</b> is used, no pattern is taken from the command line; all arguments
are treated as file names. There is a maximum of 100 patterns. Trailing white
space is removed, and blank lines are ignored. An empty file contains no
patterns and therefore matches nothing.
</P>
<P>
<b>-h</b>
Suppress printing of filenames when searching multiple files.
</P>
<P>
<b>-i</b>
Ignore upper/lower case distinctions during comparisons.
</P>
<P>
<b>-l</b>
Instead of printing lines from the files, just print the names of the files
containing lines that would have been printed. Each file name is printed
once, on a separate line.
</P>
<P>
<b>-n</b>
Precede each line by its line number in the file.
</P>
<P>
<b>-r</b>
If any file is a directory, recursively scan the files it contains. Without
<b>-r</b> a directory is scanned as a normal file.
</P>
<P>
<b>-s</b>
Work silently, that is, display nothing except error messages.
The exit status indicates whether any matches were found.
</P>
<P>
<b>-u</b>
Operate in UTF-8 mode. This option is available only if PCRE has been compiled
with UTF-8 support. Both the pattern and each subject line are assumed to be
valid strings of UTF-8 characters.
</P>
<P>
<b>-v</b>
Invert the sense of the match, so that lines which do <i>not</i> match the
pattern are now the ones that are found.
</P>
<P>
<b>-x</b>
Force the pattern to be anchored (it must start matching at the beginning of
the line) and in addition, require it to match the entire line. This is
equivalent to having ^ and $ characters at the start and end of each
alternative branch in the regular expression.
</P>
<br><a name="SEC4" href="#TOC1">LONG OPTIONS</a><br>
<P>
Long forms of all the options are available, as in GNU grep. They are shown in
the following table:
</P>
<P>
<pre>
  -c   --count
  -h   --no-filename
  -i   --ignore-case
  -l   --files-with-matches
  -n   --line-number
  -r   --recursive
  -s   --no-messages
  -u   --utf-8
  -V   --version
  -v   --invert-match
  -x   --line-regex
  -x   --line-regexp
</PRE>
</P>
<P>
In addition, --file=<i>filename</i> is equivalent to -f<i>filename</i>, and
--help shows the list of options and then exits.
</P>
<br><a name="SEC5" href="#TOC1">DIAGNOSTICS</a><br>
<P>
Exit status is 0 if any matches were found, 1 if no matches were found, and 2
for syntax errors or inaccessible files (even if matches were found).
</P>
<br><a name="SEC6" href="#TOC1">AUTHOR</a><br>
<P>
Philip Hazel &#60;ph10@cam.ac.uk&#62;
<br>
University Computing Service
<br>
Cambridge CB2 3QG, England.
</P>
<P>
Last updated: 03 February 2003
<br>
Copyright &copy; 1997-2003 University of Cambridge.

File diff suppressed because it is too large


@ -1,93 +0,0 @@
<html>
<head>
<title>pcreperform specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
This HTML document has been generated automatically from the original man page.
If there is any nonsense in it, please consult the man page, in case the
conversion went wrong.<br>
<ul>
<li><a name="TOC1" href="#SEC1">PCRE PERFORMANCE</a>
</ul>
<br><a name="SEC1" href="#TOC1">PCRE PERFORMANCE</a><br>
<P>
Certain items that may appear in regular expression patterns are more efficient
than others. It is more efficient to use a character class like [aeiou] than a
set of alternatives such as (a|e|i|o|u). In general, the simplest construction
that provides the required behaviour is usually the most efficient. Jeffrey
Friedl's book contains a lot of discussion about optimizing regular expressions
for efficient performance.
</P>
<P>
When a pattern begins with .* not in parentheses, or in parentheses that are
not the subject of a backreference, and the PCRE_DOTALL option is set, the
pattern is implicitly anchored by PCRE, since it can match only at the start of
a subject string. However, if PCRE_DOTALL is not set, PCRE cannot make this
optimization, because the . metacharacter does not then match a newline, and if
the subject string contains newlines, the pattern may match from the character
immediately following one of them instead of from the very start. For example,
the pattern
</P>
<P>
<pre>
.*second
</PRE>
</P>
<P>
matches the subject "first\nand second" (where \n stands for a newline
character), with the match starting at the seventh character. In order to do
this, PCRE has to retry the match starting after every newline in the subject.
</P>
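<P>
The restart positions can be pictured with a small sketch. This is not PCRE's
own code: the function name <i>next_line_start</i> is hypothetical, and it only
illustrates where a new match attempt could usefully begin, namely just after
each newline in the subject.
</P>

```c
#include <stddef.h>

/* Return the offset just after the next newline at or beyond `from`, which
 * is where a pattern starting with ".*" (without PCRE_DOTALL) could begin a
 * fresh match attempt. Returns -1 if no newline remains.
 * Hypothetical helper for illustration; not PCRE's implementation. */
long next_line_start(const char *subject, size_t length, size_t from)
{
    for (size_t i = from; i < length; i++) {
        if (subject[i] == '\n')
            return (long)(i + 1);       /* character following the newline */
    }
    return -1;
}
```

<P>
For the subject "first\nand second" the only restart position is offset 6,
that is, the seventh character.
</P>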
<P>
If you are using such a pattern with subject strings that do not contain
newlines, the best performance is obtained by setting PCRE_DOTALL, or starting
the pattern with ^.* to indicate explicit anchoring. That saves PCRE from
having to scan along the subject looking for a newline to restart at.
</P>
<P>
Beware of patterns that contain nested indefinite repeats. These can take a
long time to run when applied to a string that does not match. Consider the
pattern fragment
</P>
<P>
<pre>
(a+)*
</PRE>
</P>
<P>
This can match "aaaa" in 16 different ways, and this number increases very
rapidly as the string gets longer. (The * repeat can match 0, 1, 2, 3, or 4
times, and for each of those cases other than 0 or 4, the + repeats can match
different numbers of times.) When the remainder of the pattern is such that the
entire match is going to fail, PCRE has in principle to try every possible
variation, and this can take an extremely long time.
</P>
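<P>
The growth can be checked with a short counting sketch. It assumes one
particular convention for what counts as a distinct way: an ordered list of
group lengths, each at least one, whose sum does not exceed the length of the
string. The function name <i>count_ways</i> is hypothetical. Under this
convention the count doubles with every extra character.
</P>

```c
/* Count the ways (a+)* can consume a prefix of a run of n "a" characters.
 * Each way is an ordered list of positive group lengths with sum <= n;
 * the empty list (zero repeats of the group) counts as one way.
 * Hypothetical helper for illustration; runs in exponential time. */
unsigned long count_ways(int n)
{
    unsigned long total = 1;            /* zero repeats of the group */
    for (int first = 1; first <= n; first++)
        total += count_ways(n - first); /* first group takes `first` chars */
    return total;
}
```

<P>
The recursion itself takes exponential time, which mirrors the amount of work
a backtracking matcher faces when every one of these variations has to be
tried before the match can fail.
</P>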
<P>
An optimization catches some of the more simple cases such as
</P>
<P>
<pre>
(a+)*b
</PRE>
</P>
<P>
where a literal character follows. Before embarking on the standard matching
procedure, PCRE checks that there is a "b" later in the subject string, and if
there is not, it fails the match immediately. However, when there is no
following literal this optimization cannot be used. You can see the difference
by comparing the behaviour of
</P>
<P>
<pre>
(a+)*\d
</PRE>
</P>
<P>
with the pattern above. The pattern above, which ends in a literal "b", fails
almost instantly when applied to a whole line of "a" characters, whereas the
\d version takes an appreciable time with strings longer than about 20
characters.
</P>
<P>
Last updated: 03 February 2003
<br>
Copyright &copy; 1997-2003 University of Cambridge.


@ -1,237 +0,0 @@
<html>
<head>
<title>pcreposix specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
This HTML document has been generated automatically from the original man page.
If there is any nonsense in it, please consult the man page, in case the
conversion went wrong.<br>
<ul>
<li><a name="TOC1" href="#SEC1">SYNOPSIS OF POSIX API</a>
<li><a name="TOC2" href="#SEC2">DESCRIPTION</a>
<li><a name="TOC3" href="#SEC3">COMPILING A PATTERN</a>
<li><a name="TOC4" href="#SEC4">MATCHING NEWLINE CHARACTERS</a>
<li><a name="TOC5" href="#SEC5">MATCHING A PATTERN</a>
<li><a name="TOC6" href="#SEC6">ERROR MESSAGES</a>
<li><a name="TOC7" href="#SEC7">STORAGE</a>
<li><a name="TOC8" href="#SEC8">AUTHOR</a>
</ul>
<br><a name="SEC1" href="#TOC1">SYNOPSIS OF POSIX API</a><br>
<P>
<b>#include &#60;pcreposix.h&#62;</b>
</P>
<P>
<b>int regcomp(regex_t *<i>preg</i>, const char *<i>pattern</i>,</b>
<b>int <i>cflags</i>);</b>
</P>
<P>
<b>int regexec(regex_t *<i>preg</i>, const char *<i>string</i>,</b>
<b>size_t <i>nmatch</i>, regmatch_t <i>pmatch</i>[], int <i>eflags</i>);</b>
</P>
<P>
<b>size_t regerror(int <i>errcode</i>, const regex_t *<i>preg</i>,</b>
<b>char *<i>errbuf</i>, size_t <i>errbuf_size</i>);</b>
</P>
<P>
<b>void regfree(regex_t *<i>preg</i>);</b>
</P>
<br><a name="SEC2" href="#TOC1">DESCRIPTION</a><br>
<P>
This set of functions provides a POSIX-style API to the PCRE regular expression
package. See the
<a href="pcreapi.html"><b>pcreapi</b></a>
documentation for a description of the native API, which contains additional
functionality.
</P>
<P>
The functions described here are just wrapper functions that ultimately call
the PCRE native API. Their prototypes are defined in the <b>pcreposix.h</b>
header file, and on Unix systems the library itself is called
<b>pcreposix.a</b>, so can be accessed by adding <b>-lpcreposix</b> to the
command for linking an application which uses them. Because the POSIX functions
call the native ones, it is also necessary to add <b>-lpcre</b>.
</P>
<P>
I have implemented only those option bits that can be reasonably mapped to PCRE
native options. In addition, the options REG_EXTENDED and REG_NOSUB are defined
with the value zero. They have no effect, but since programs that are written
to the POSIX interface often use them, this makes it easier to slot in PCRE as
a replacement library. Other POSIX options are not even defined.
</P>
<P>
When PCRE is called via these functions, it is only the API that is POSIX-like
in style. The syntax and semantics of the regular expressions themselves are
still those of Perl, subject to the setting of various PCRE options, as
described below. "POSIX-like in style" means that the API approximates to the
POSIX definition; it is not fully POSIX-compatible, and in multi-byte encoding
domains it is probably even less compatible.
</P>
<P>
The header for these functions is supplied as <b>pcreposix.h</b> to avoid any
potential clash with other POSIX libraries. It can, of course, be renamed or
aliased as <b>regex.h</b>, which is the "correct" name. It provides two
structure types, <i>regex_t</i> for compiled internal forms, and
<i>regmatch_t</i> for returning captured substrings. It also defines some
constants whose names start with "REG_"; these are used for setting options and
identifying error codes.
</P>
<br><a name="SEC3" href="#TOC1">COMPILING A PATTERN</a><br>
<P>
The function <b>regcomp()</b> is called to compile a pattern into an
internal form. The pattern is a C string terminated by a binary zero, and
is passed in the argument <i>pattern</i>. The <i>preg</i> argument is a pointer
to a regex_t structure which is used as a base for storing information about
the compiled expression.
</P>
<P>
The argument <i>cflags</i> is either zero, or contains one or more of the bits
defined by the following macros:
</P>
<P>
<pre>
REG_ICASE
</PRE>
</P>
<P>
The PCRE_CASELESS option is set when the expression is passed for compilation
to the native function.
</P>
<P>
<pre>
REG_NEWLINE
</PRE>
</P>
<P>
The PCRE_MULTILINE option is set when the expression is passed for compilation
to the native function. Note that this does <i>not</i> mimic the defined POSIX
behaviour for REG_NEWLINE (see the following section).
</P>
<P>
In the absence of these flags, no options are passed to the native function.
This means that the regex is compiled with PCRE default semantics. In
particular, the way it handles newline characters in the subject string is the
Perl way, not the POSIX way. Note that setting PCRE_MULTILINE has only
<i>some</i> of the effects specified for REG_NEWLINE. It does not affect the way
newlines are matched by . (they aren't) or by a negative class such as [^a]
(they are).
</P>
<P>
The yield of <b>regcomp()</b> is zero on success, and non-zero otherwise. The
<i>preg</i> structure is filled in on success, and one member of the structure
is public: <i>re_nsub</i> contains the number of capturing subpatterns in
the regular expression. Various error codes are defined in the header file.
</P>
<br><a name="SEC4" href="#TOC1">MATCHING NEWLINE CHARACTERS</a><br>
<P>
This area is not simple, because POSIX and Perl take different views of things.
It is not possible to get PCRE to obey POSIX semantics, but then PCRE was never
intended to be a POSIX engine. The following table lists the different
possibilities for matching newline characters in PCRE:
</P>
<P>
<pre>
                          Default   Change with
</PRE>
</P>
<P>
<pre>
  . matches newline          no     PCRE_DOTALL
  newline matches [^a]       yes    not changeable
  $ matches \n at end        yes    PCRE_DOLLAR_ENDONLY
  $ matches \n in middle     no     PCRE_MULTILINE
  ^ matches \n in middle     no     PCRE_MULTILINE
</PRE>
</P>
<P>
This is the equivalent table for POSIX:
</P>
<P>
<pre>
                          Default   Change with
</PRE>
</P>
<P>
<pre>
  . matches newline          yes    REG_NEWLINE
  newline matches [^a]       yes    REG_NEWLINE
  $ matches \n at end        no     REG_NEWLINE
  $ matches \n in middle     no     REG_NEWLINE
  ^ matches \n in middle     no     REG_NEWLINE
</PRE>
</P>
<P>
PCRE's behaviour is the same as Perl's, except that there is no equivalent for
PCRE_DOLLAR_ENDONLY in Perl. In both PCRE and Perl, there is no way to stop
newline from matching [^a].
</P>
<P>
The default POSIX newline handling can be obtained by setting PCRE_DOTALL and
PCRE_DOLLAR_ENDONLY, but there is no way to make PCRE behave exactly as for the
REG_NEWLINE action.
</P>
<br><a name="SEC5" href="#TOC1">MATCHING A PATTERN</a><br>
<P>
The function <b>regexec()</b> is called to match a pre-compiled pattern
<i>preg</i> against a given <i>string</i>, which is terminated by a zero byte,
subject to the options in <i>eflags</i>. These can be:
</P>
<P>
<pre>
REG_NOTBOL
</PRE>
</P>
<P>
The PCRE_NOTBOL option is set when calling the underlying PCRE matching
function.
</P>
<P>
<pre>
REG_NOTEOL
</PRE>
</P>
<P>
The PCRE_NOTEOL option is set when calling the underlying PCRE matching
function.
</P>
<P>
The portion of the string that was matched, and also any captured substrings,
are returned via the <i>pmatch</i> argument, which points to an array of
<i>nmatch</i> structures of type <i>regmatch_t</i>, containing the members
<i>rm_so</i> and <i>rm_eo</i>. These contain the offset to the first character of
each substring and the offset to the first character after the end of each
substring, respectively. The 0th element of the vector relates to the entire
portion of <i>string</i> that was matched; subsequent elements relate to the
capturing subpatterns of the regular expression. Unused entries in the array
have both structure members set to -1.
</P>
<P>
A successful match yields a zero return; various error codes are defined in the
header file, of which REG_NOMATCH is the "expected" failure code.
</P>
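<P>
The compile and match steps can be sketched as follows. This is a minimal
example under stated assumptions, not a definitive implementation: it includes
the system <b>regex.h</b> so that it builds without PCRE, whereas a PCRE build
would include <b>pcreposix.h</b> and link with <b>-lpcreposix -lpcre</b>. The
function name <i>match_span</i> is hypothetical. REG_EXTENDED is passed because
a system POSIX library needs it for alternation; in the PCRE wrapper it is
defined as zero and has no effect.
</P>

```c
#include <regex.h>   /* assumption: with PCRE, include pcreposix.h instead */
#include <stddef.h>

/* Compile `pattern`, match it against `subject`, and report the offsets of
 * the whole match. Returns 0 on a match, otherwise the non-zero code from
 * regcomp() or regexec() (REG_NOMATCH when nothing matched).
 * Hypothetical helper for illustration only. */
int match_span(const char *pattern, const char *subject,
               int *start, int *end)
{
    regex_t re;
    regmatch_t pmatch[1];               /* element 0: the entire match */

    int rc = regcomp(&re, pattern, REG_EXTENDED);
    if (rc != 0)
        return rc;                      /* compilation failed */

    rc = regexec(&re, subject, 1, pmatch, 0);
    if (rc == 0) {
        *start = (int)pmatch[0].rm_so;  /* first character of the match */
        *end = (int)pmatch[0].rm_eo;    /* first character after the match */
    }
    regfree(&re);                       /* release memory tied to re */
    return rc;
}
```

<P>
Passing a larger <i>pmatch</i> array, with <i>nmatch</i> set to its size, would
also return the offsets of the capturing subpatterns.
</P>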
<br><a name="SEC6" href="#TOC1">ERROR MESSAGES</a><br>
<P>
The <b>regerror()</b> function maps a non-zero error code from either
<b>regcomp()</b> or <b>regexec()</b> to a printable message. If <i>preg</i> is not
NULL, the error should have arisen from the use of that structure. A message
terminated by a binary zero is placed in <i>errbuf</i>. The length of the
message, including the zero, is limited to <i>errbuf_size</i>. The yield of the
function is the size of buffer needed to hold the whole message.
</P>
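<P>
Because the yield is the size of buffer needed, a caller can size the buffer
in two steps. The sketch below assumes the POSIX convention that a zero
<i>errbuf_size</i> means the buffer is ignored and only the size is returned;
whether a given wrapper honours that convention should be checked before
relying on it. The function name <i>regex_error_message</i> is hypothetical,
and the system <b>regex.h</b> stands in for <b>pcreposix.h</b>.
</P>

```c
#include <regex.h>   /* assumption: with PCRE, include pcreposix.h instead */
#include <stdlib.h>

/* Return a freshly allocated message for a regcomp()/regexec() error code,
 * or NULL if allocation fails. The caller frees the result.
 * Hypothetical helper for illustration only. */
char *regex_error_message(int errcode, const regex_t *preg)
{
    /* First call: ask only for the size, including the terminating zero. */
    size_t needed = regerror(errcode, preg, NULL, 0);

    char *msg = malloc(needed > 0 ? needed : 1);
    if (msg == NULL)
        return NULL;
    msg[0] = '\0';                      /* stay valid if nothing is written */

    /* Second call: fetch the whole message into the right-sized buffer. */
    regerror(errcode, preg, msg, needed);
    return msg;
}
```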
<br><a name="SEC7" href="#TOC1">STORAGE</a><br>
<P>
Compiling a regular expression causes memory to be allocated and associated
with the <i>preg</i> structure. The function <b>regfree()</b> frees all such
memory, after which <i>preg</i> may no longer be used as a compiled expression.
</P>
<br><a name="SEC8" href="#TOC1">AUTHOR</a><br>
<P>
Philip Hazel &#60;ph10@cam.ac.uk&#62;
<br>
University Computing Service,
<br>
Cambridge CB2 3QG, England.
</P>
<P>
Last updated: 03 February 2003
<br>
Copyright &copy; 1997-2003 University of Cambridge.


@ -1,79 +0,0 @@
<html>
<head>
<title>pcresample specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
This HTML document has been generated automatically from the original man page.
If there is any nonsense in it, please consult the man page, in case the
conversion went wrong.<br>
<ul>
<li><a name="TOC1" href="#SEC1">PCRE SAMPLE PROGRAM</a>
</ul>
<br><a name="SEC1" href="#TOC1">PCRE SAMPLE PROGRAM</a><br>
<P>
A simple, complete demonstration program, to get you started with using PCRE,
is supplied in the file <i>pcredemo.c</i> in the PCRE distribution.
</P>
<P>
The program compiles the regular expression that is its first argument, and
matches it against the subject string in its second argument. No PCRE options
are set, and default character tables are used. If matching succeeds, the
program outputs the portion of the subject that matched, together with the
contents of any captured substrings.
</P>
<P>
If the -g option is given on the command line, the program then goes on to
check for further matches of the same regular expression in the same subject
string. The logic is a little bit tricky because of the possibility of matching
an empty string. Comments in the code explain what is going on.
</P>
<P>
On a Unix system that has PCRE installed in <i>/usr/local</i>, you can compile
the demonstration program using a command like this:
</P>
<P>
<pre>
gcc -o pcredemo pcredemo.c -I/usr/local/include \
-L/usr/local/lib -lpcre
</PRE>
</P>
<P>
Then you can run simple tests like this:
</P>
<P>
<pre>
./pcredemo 'cat|dog' 'the cat sat on the mat'
./pcredemo -g 'cat|dog' 'the dog sat on the cat'
</PRE>
</P>
<P>
Note that there is a much more comprehensive test program, called
<b>pcretest</b>, which supports many more facilities for testing regular
expressions and the PCRE library. The <b>pcredemo</b> program is provided as a
simple coding example.
</P>
<P>
On some operating systems (e.g. Solaris) you may get an error like this when
you try to run <b>pcredemo</b>:
</P>
<P>
<pre>
ld.so.1: a.out: fatal: libpcre.so.0: open failed: No such file or directory
</PRE>
</P>
<P>
This is caused by the way shared library support works on those systems. You
need to add
</P>
<P>
<pre>
-R/usr/local/lib
</PRE>
</P>
<P>
to the compile command to get round this problem.
</P>
<P>
Last updated: 28 January 2003
<br>
Copyright &copy; 1997-2003 University of Cambridge.


@ -1,443 +0,0 @@
<html>
<head>
<title>pcretest specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
This HTML document has been generated automatically from the original man page.
If there is any nonsense in it, please consult the man page, in case the
conversion went wrong.<br>
<ul>
<li><a name="TOC1" href="#SEC1">SYNOPSIS</a>
<li><a name="TOC2" href="#SEC2">OPTIONS</a>
<li><a name="TOC3" href="#SEC3">DESCRIPTION</a>
<li><a name="TOC4" href="#SEC4">PATTERN MODIFIERS</a>
<li><a name="TOC5" href="#SEC5">CALLOUTS</a>
<li><a name="TOC6" href="#SEC6">DATA LINES</a>
<li><a name="TOC7" href="#SEC7">OUTPUT FROM PCRETEST</a>
<li><a name="TOC8" href="#SEC8">AUTHOR</a>
</ul>
<br><a name="SEC1" href="#TOC1">SYNOPSIS</a><br>
<P>
<b>pcretest [-d] [-i] [-m] [-o osize] [-p] [-t] [source] [destination]</b>
</P>
<P>
<b>pcretest</b> was written as a test program for the PCRE regular expression
library itself, but it can also be used for experimenting with regular
expressions. This document describes the features of the test program; for
details of the regular expressions themselves, see the
<a href="pcrepattern.html"><b>pcrepattern</b></a>
documentation. For details of PCRE and its options, see the
<a href="pcreapi.html"><b>pcreapi</b></a>
documentation.
</P>
<br><a name="SEC2" href="#TOC1">OPTIONS</a><br>
<P>
<b>-C</b>
Output the version number of the PCRE library, and all available information
about the optional features that are included, and then exit.
</P>
<P>
<b>-d</b>
Behave as if each regex had the <b>/D</b> modifier (see below); the internal
form is output after compilation.
</P>
<P>
<b>-i</b>
Behave as if each regex had the <b>/I</b> modifier; information about the
compiled pattern is given after compilation.
</P>
<P>
<b>-m</b>
Output the size of each compiled pattern after it has been compiled. This is
equivalent to adding /M to each regular expression. For compatibility with
earlier versions of pcretest, <b>-s</b> is a synonym for <b>-m</b>.
</P>
<P>
<b>-o</b> <i>osize</i>
Set the number of elements in the output vector that is used when calling PCRE
to be <i>osize</i>. The default value is 45, which is enough for 14 capturing
subexpressions. The vector size can be changed for individual matching calls by
including \O in the data line (see below).
</P>
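<P>
The arithmetic behind the default can be sketched as follows, assuming the
<b>pcre_exec()</b> convention that the vector holds one pair of offsets for
the whole match and one pair per capturing subexpression, with a further third
of the vector used as workspace. The function name <i>ovector_size_for</i> is
hypothetical.
</P>

```c
/* Number of ints in the offset vector needed to capture `groups`
 * subexpressions: (groups + 1) pairs of offsets fill two thirds of the
 * vector, and the remaining third is workspace.
 * Hypothetical helper for illustration only. */
int ovector_size_for(int groups)
{
    return (groups + 1) * 3;
}
```

<P>
With 14 subexpressions this gives (14 + 1) * 3 = 45, the default quoted above.
</P>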
<P>
<b>-p</b>
Behave as if each regex had the <b>/P</b> modifier; the POSIX wrapper API is used
to call PCRE. None of the other options has any effect when <b>-p</b> is set.
</P>
<P>
<b>-t</b>
Run each compile, study, and match many times with a timer, and output
resulting time per compile or match (in milliseconds). Do not set <b>-t</b> with
<b>-m</b>, because you will then get the size output 20000 times and the timing
will be distorted.
</P>
<br><a name="SEC3" href="#TOC1">DESCRIPTION</a><br>
<P>
If <b>pcretest</b> is given two filename arguments, it reads from the first and
writes to the second. If it is given only one filename argument, it reads from
that file and writes to stdout. Otherwise, it reads from stdin and writes to
stdout, and prompts for each line of input, using "re&#62;" to prompt for regular
expressions, and "data&#62;" to prompt for data lines.
</P>
<P>
The program handles any number of sets of input on a single input file. Each
set starts with a regular expression, and continues with any number of data
lines to be matched against the pattern.
</P>
<P>
Each line is matched separately and independently. If you want to do
multiple-line matches, you have to use the \n escape sequence in a single line
of input to encode the newline characters. The maximum length of a data line is
30,000 characters.
</P>
<P>
An empty line signals the end of the data lines, at which point a new regular
expression is read. The regular expressions are given enclosed in any
non-alphameric delimiters other than backslash, for example
</P>
<P>
<pre>
/(a|bc)x+yz/
</PRE>
</P>
<P>
White space before the initial delimiter is ignored. A regular expression may
be continued over several input lines, in which case the newline characters are
included within it. It is possible to include the delimiter within the pattern
by escaping it, for example
</P>
<P>
<pre>
/abc\/def/
</PRE>
</P>
<P>
If you do so, the escape and the delimiter form part of the pattern, but since
delimiters are always non-alphameric, this does not affect its interpretation.
If the terminating delimiter is immediately followed by a backslash, for
example,
</P>
<P>
<pre>
/abc/\
</PRE>
</P>
<P>
then a backslash is added to the end of the pattern. This is done to provide a
way of testing the error condition that arises if a pattern finishes with a
backslash, because
</P>
<P>
<pre>
/abc\/
</PRE>
</P>
<P>
is interpreted as the first line of a pattern that starts with "abc/", causing
pcretest to read the next line as a continuation of the regular expression.
</P>
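<P>
The delimiter rules above can be sketched in C. This is only an illustration
under stated assumptions, not code from <b>pcretest</b>: the function name
<b>split_pattern_line()</b>, its return codes, and the assumption that the
trailing newline has already been stripped from the line are all hypothetical.
</P>

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not taken from the pcretest source.
 * Splits one input line such as "/abc\/def/i" into the pattern text and the
 * trailing modifier string. The line is assumed to have no trailing newline.
 * Returns 1 on success, 0 if the closing delimiter has not been seen yet
 * (the pattern continues on the next line; the buffers are then unspecified),
 * and -1 if there is no usable delimiter or a buffer is too small. */
int split_pattern_line(const char *line, char *pattern, size_t patsize,
                       char *modifiers, size_t modsize)
{
    size_t n = 0;
    char delim;

    assert(line != NULL && pattern != NULL && modifiers != NULL);
    if (patsize == 0 || modsize == 0) return -1;

    /* White space before the initial delimiter is ignored. */
    while (*line != '\0' && isspace((unsigned char)*line)) line++;

    /* The delimiter must be non-alphanumeric and must not be a backslash. */
    delim = *line;
    if (delim == '\0' || delim == '\\' || isalnum((unsigned char)delim))
        return -1;
    line++;

    while (*line != '\0' && *line != delim) {
        /* An escaped character, including an escaped delimiter, is copied
         * together with its backslash, so both stay part of the pattern. */
        if (*line == '\\' && line[1] != '\0') {
            if (n + 2 >= patsize) return -1;
            pattern[n++] = *line++;
        }
        if (n + 1 >= patsize) return -1;
        pattern[n++] = *line++;
    }
    if (*line == '\0') return 0;   /* no closing delimiter on this line */
    line++;                        /* step over the closing delimiter */

    /* A backslash directly after the closing delimiter is appended to the
     * pattern, which yields a pattern that ends in a backslash. */
    if (*line == '\\') {
        if (n + 1 >= patsize) return -1;
        pattern[n++] = *line++;
    }
    pattern[n] = '\0';

    /* Whatever follows is treated as the modifier string. */
    if (strlen(line) >= modsize) return -1;
    strcpy(modifiers, line);
    return 1;
}
```

<P>
With this sketch, the line /abc\/def/i yields the pattern abc\/def and the
modifier string "i", while /abc\/ returns 0 because the only slash after the
opening delimiter is escaped and so does not close the pattern.
</P>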
<br><a name="SEC4" href="#TOC1">PATTERN MODIFIERS</a><br>
<P>
The pattern may be followed by <b>i</b>, <b>m</b>, <b>s</b>, or <b>x</b> to set the
PCRE_CASELESS, PCRE_MULTILINE, PCRE_DOTALL, or PCRE_EXTENDED options,
respectively. For example:
</P>
<P>
<pre>
/caseless/i
</PRE>
</P>
<P>
These modifier letters have the same effect as they do in Perl. There are
others that set PCRE options that do not correspond to anything in Perl:
<b>/A</b>, <b>/E</b>, <b>/N</b>, <b>/U</b>, and <b>/X</b> set PCRE_ANCHORED,
PCRE_DOLLAR_ENDONLY, PCRE_NO_AUTO_CAPTURE, PCRE_UNGREEDY, and PCRE_EXTRA
respectively.
</P>
<P>
Searching for all possible matches within each subject string can be requested
by the <b>/g</b> or <b>/G</b> modifier. After finding a match, PCRE is called
again to search the remainder of the subject string. The difference between
<b>/g</b> and <b>/G</b> is that the former uses the <i>startoffset</i> argument to
<b>pcre_exec()</b> to start searching at a new point within the entire string
(which is in effect what Perl does), whereas the latter passes over a shortened
substring. This makes a difference to the matching process if the pattern
begins with a lookbehind assertion (including \b or \B).
</P>
<P>
If any call to <b>pcre_exec()</b> in a <b>/g</b> or <b>/G</b> sequence matches an
empty string, the next call is done with the PCRE_NOTEMPTY and PCRE_ANCHORED
flags set in order to search for another, non-empty, match at the same point.
If this second match fails, the start offset is advanced by one, and the normal
match is retried. This imitates the way Perl handles such cases when using the
<b>/g</b> modifier or the <b>split()</b> function.
</P>
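<P>
The retry rule for empty matches can be sketched as a loop. The code below is
an illustration only: it does not call PCRE, the option names OPT_NOTEMPTY and
OPT_ANCHORED are local stand-ins for the real flags, and the toy matcher for
the pattern a* exists just so that the loop can be run on its own. It models
the <b>/g</b> style (a start offset into the whole subject) and advances by one
byte, which ignores multi-byte characters.
</P>

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-ins for the PCRE option bits; these names and values are
 * hypothetical and are not the constants from pcre.h. */
#define OPT_NOTEMPTY 0x1
#define OPT_ANCHORED 0x2

/* Matcher callback: search subject[0..length) from start. On success store
 * the match bounds and return 1; return 0 for no match. */
typedef int (*match_fn)(const char *subject, int length, int start,
                        int options, int *mstart, int *mend);

/* Toy matcher for the pattern a* (zero or more 'a'), used only so the loop
 * below can be exercised without linking against PCRE. */
int match_a_star(const char *subject, int length, int start,
                 int options, int *mstart, int *mend)
{
    int pos;
    for (pos = start; pos <= length; pos++) {
        int end = pos;
        while (end < length && subject[end] == 'a') end++;
        if (end > pos || !(options & OPT_NOTEMPTY)) {
            *mstart = pos;
            *mend = end;
            return 1;
        }
        if (options & OPT_ANCHORED) break;   /* only try the first position */
    }
    return 0;
}

/* Collect up to max matches as (start, end) pairs in out[]; returns count. */
int find_all(match_fn match, const char *subject, int length,
             int *out, int max)
{
    int count = 0, start = 0, options = 0;

    assert(match != NULL && subject != NULL && out != NULL);
    while (start <= length && count < max) {
        int ms, me;
        if (match(subject, length, start, options, &ms, &me)) {
            out[2 * count] = ms;
            out[2 * count + 1] = me;
            count++;
            start = me;
            /* After an empty match, look for a non-empty one at this point. */
            options = (ms == me) ? (OPT_NOTEMPTY | OPT_ANCHORED) : 0;
        } else if (options != 0) {
            /* The retry failed: advance by one and match normally again. */
            start++;
            options = 0;
        } else {
            break;   /* a normal attempt failed: no more matches */
        }
    }
    return count;
}
```

<P>
Run against the subject "baac", this loop reports four matches: an empty match
at offset 0, "aa" at offsets 1 to 3, and empty matches at offsets 3 and 4.
</P>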
<P>
There are a number of other modifiers for controlling the way <b>pcretest</b>
operates.
</P>
<P>
The <b>/+</b> modifier requests that as well as outputting the substring that
matched the entire pattern, pcretest should in addition output the remainder of
the subject string. This is useful for tests where the subject contains
multiple copies of the same substring.
</P>
<P>
The <b>/L</b> modifier must be followed directly by the name of a locale, for
example,
</P>
<P>
<pre>
/pattern/Lfr
</PRE>
</P>
<P>
For this reason, it must be the last modifier letter. The given locale is set,
<b>pcre_maketables()</b> is called to build a set of character tables for the
locale, and this is then passed to <b>pcre_compile()</b> when compiling the
regular expression. Without an <b>/L</b> modifier, NULL is passed as the tables
pointer; that is, <b>/L</b> applies only to the expression on which it appears.
</P>
<P>
The <b>/I</b> modifier requests that <b>pcretest</b> output information about the
compiled expression (whether it is anchored, has a fixed first character, and
so on). It does this by calling <b>pcre_fullinfo()</b> after compiling an
expression, and outputting the information it gets back. If the pattern is
studied, the results of that are also output.
</P>
<P>
The <b>/D</b> modifier is a PCRE debugging feature, which also assumes <b>/I</b>.
It causes the internal form of compiled regular expressions to be output after
compilation. If the pattern was studied, the information returned is also
output.
</P>
<P>
The <b>/S</b> modifier causes <b>pcre_study()</b> to be called after the
expression has been compiled, and the results used when the expression is
matched.
</P>
<P>
The <b>/M</b> modifier causes the size of the memory block used to hold the compiled
pattern to be output.
</P>
<P>
The <b>/P</b> modifier causes <b>pcretest</b> to call PCRE via the POSIX wrapper
API rather than its native API. When this is done, all other modifiers except
<b>/i</b>, <b>/m</b>, and <b>/+</b> are ignored. REG_ICASE is set if <b>/i</b> is
present, and REG_NEWLINE is set if <b>/m</b> is present. The wrapper functions
force PCRE_DOLLAR_ENDONLY always, and PCRE_DOTALL unless REG_NEWLINE is set.
</P>
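<P>
As a rough illustration of the flag mapping for <b>/i</b> and <b>/m</b>, the
sketch below compiles and runs a pattern through the POSIX interface. It is
written against the system's &#60;regex.h&#62; rather than PCRE's wrapper, so
matching semantics differ from PCRE; the helper name <b>posix_match()</b> is
hypothetical, and REG_EXTENDED and REG_NOSUB are choices made for the system
library, not something <b>pcretest</b> is stated to use.
</P>

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper. Returns 1 if pattern matches subject, 0 if it does
 * not, and -1 if the pattern fails to compile. The icase and newline
 * arguments play the role of the /i and /m modifiers. */
int posix_match(const char *pattern, const char *subject,
                int icase, int newline)
{
    regex_t re;
    int cflags = REG_EXTENDED | REG_NOSUB;   /* assumption: system regex */
    int rc;

    assert(pattern != NULL && subject != NULL);
    if (icase)   cflags |= REG_ICASE;     /* set when /i is present */
    if (newline) cflags |= REG_NEWLINE;   /* set when /m is present */

    if (regcomp(&re, pattern, cflags) != 0) return -1;
    rc = regexec(&re, subject, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```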
<P>
The <b>/8</b> modifier causes <b>pcretest</b> to call PCRE with the PCRE_UTF8
option set. This turns on support for UTF-8 character handling in PCRE,
provided that it was compiled with this support enabled. This modifier also
causes any non-printing characters in output strings to be printed using the
\x{hh...} notation if they are valid UTF-8 sequences.
</P>
<P>
If the <b>/?</b> modifier is used with <b>/8</b>, it causes <b>pcretest</b> to
call <b>pcre_compile()</b> with the PCRE_NO_UTF8_CHECK option, to suppress the
checking of the string for UTF-8 validity.
</P>
<br><a name="SEC5" href="#TOC1">CALLOUTS</a><br>
<P>
If the pattern contains any callout requests, <b>pcretest</b>'s callout function
will be called. By default, it displays the callout number, and the start and
current positions in the text at the callout time. For example, the output
</P>
<P>
<pre>
---&#62;pqrabcdef
0 ^ ^
</PRE>
</P>
<P>
indicates that callout number 0 occurred for a match attempt starting at the
fourth character of the subject string, when the pointer was at the seventh
character. The callout function returns zero (carry on matching) by default.
</P>
<P>
Inserting callouts may be helpful when using <b>pcretest</b> to check
complicated regular expressions. For further information about callouts, see
the
<a href="pcrecallout.html"><b>pcrecallout</b></a>
documentation.
</P>
<P>
For testing the PCRE library, additional control of callout behaviour is
available via escape sequences in the data, as described in the following
section. In particular, it is possible to pass in a number as callout data (the
default is zero). If the callout function receives a non-zero number, it
returns that value instead of zero.
</P>
<br><a name="SEC6" href="#TOC1">DATA LINES</a><br>
<P>
Before each data line is passed to <b>pcre_exec()</b>, leading and trailing
whitespace is removed, and it is then scanned for \ escapes. Some of these are
pretty esoteric features, intended for checking out some of the more
complicated features of PCRE. If you are just testing "ordinary" regular
expressions, you probably don't need any of these. The following escapes are
recognized:
</P>
<P>
<pre>
\a alarm (= BEL)
\b backspace
\e escape
\f formfeed
\n newline
\r carriage return
\t tab
\v vertical tab
\nnn octal character (up to 3 octal digits)
\xhh hexadecimal character (up to 2 hex digits)
\x{hh...} hexadecimal character, any number of digits
in UTF-8 mode
\A pass the PCRE_ANCHORED option to <b>pcre_exec()</b>
\B pass the PCRE_NOTBOL option to <b>pcre_exec()</b>
\Cdd call pcre_copy_substring() for substring dd
after a successful match (any decimal number
less than 32)
\Cname call pcre_copy_named_substring() for substring
"name" after a successful match (name terminated
by next non-alphanumeric character)
\C+ show the current captured substrings at callout
time
\C- do not supply a callout function
\C!n return 1 instead of 0 when callout number n is
reached
\C!n!m return 1 instead of 0 when callout number n is
reached for the mth time
\C*n pass the number n (may be negative) as callout
data
\Gdd call pcre_get_substring() for substring dd
after a successful match (any decimal number
less than 32)
\Gname call pcre_get_named_substring() for substring
"name" after a successful match (name terminated
by next non-alphanumeric character)
\L call pcre_get_substring_list() after a
successful match
\M discover the minimum MATCH_LIMIT setting
\N pass the PCRE_NOTEMPTY option to <b>pcre_exec()</b>
\Odd set the size of the output vector passed to
<b>pcre_exec()</b> to dd (any number of decimal
digits)
\S output details of memory get/free calls during matching
\Z pass the PCRE_NOTEOL option to <b>pcre_exec()</b>
\? pass the PCRE_NO_UTF8_CHECK option to
<b>pcre_exec()</b>
</PRE>
</P>
<P>
If \M is present, <b>pcretest</b> calls <b>pcre_exec()</b> several times, with
different values in the <i>match_limit</i> field of the <b>pcre_extra</b> data
structure, until it finds the minimum number that is needed for
<b>pcre_exec()</b> to complete. This number is a measure of the amount of
recursion and backtracking that takes place, and checking it out can be
instructive. For most simple matches, the number is quite small, but for
patterns with very large numbers of matching possibilities, it can become large
very quickly with increasing length of subject string.
</P>
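<P>
The text does not say how <b>pcretest</b> chooses the trial values. One
plausible strategy, shown here purely as an assumption, is to double the limit
until the match completes and then bisect. The callback type, the function
<b>find_min_limit()</b>, and the toy predicate are hypothetical; the sketch
also assumes that a match which completes under one limit completes under any
larger limit.
</P>

```c
#include <assert.h>
#include <stddef.h>

/* Callback that reports whether a match attempt completes under the given
 * limit: returns 1 if it completes, 0 if the limit was hit first. */
typedef int (*completes_fn)(unsigned long limit, void *ctx);

/* Returns the smallest limit in 1..max_limit for which the attempt
 * completes, or 0 if it does not complete even at max_limit. */
unsigned long find_min_limit(completes_fn completes, void *ctx,
                             unsigned long max_limit)
{
    unsigned long lo = 1, hi = 1;

    assert(completes != NULL && max_limit >= 1);

    /* Double the trial limit until the attempt completes. */
    while (!completes(hi, ctx)) {
        if (hi >= max_limit) return 0;
        lo = hi + 1;
        hi = (hi > max_limit / 2) ? max_limit : hi * 2;
    }

    /* Bisect: completes(hi) holds, and every value below lo is known to fail. */
    while (lo < hi) {
        unsigned long mid = lo + (hi - lo) / 2;
        if (completes(mid, ctx)) hi = mid;
        else lo = mid + 1;
    }
    return hi;
}

/* Toy predicate for demonstration: completes once the limit reaches the
 * value that ctx points to. */
int needs_at_least(unsigned long limit, void *ctx)
{
    return limit >= *(const unsigned long *)ctx;
}
```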
<P>
When \O is used, the value specified may be higher or lower than the size set by the <b>-O</b>
option (or defaulted to 45); \O applies only to the call of <b>pcre_exec()</b>
for the line in which it appears.
</P>
<P>
A backslash followed by anything else just escapes the anything else. If the
very last character is a backslash, it is ignored. This gives a way of passing
an empty line as data, since a real empty line terminates the data input.
</P>
<P>
If <b>/P</b> was present on the regex, causing the POSIX wrapper API to be used,
only <b>\B</b> and <b>\Z</b> have any effect, causing REG_NOTBOL and REG_NOTEOL
to be passed to <b>regexec()</b> respectively.
</P>
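<P>
The effect of REG_NOTBOL and REG_NOTEOL can be seen with a small sketch. As an
assumption it uses the system's &#60;regex.h&#62; instead of PCRE's wrapper,
and the helper name <b>posix_match_at()</b> is hypothetical.
</P>

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper. Returns 1 on a match, 0 on no match, and -1 if the
 * pattern fails to compile. notbol and noteol correspond to the \B and \Z
 * data escapes when the POSIX API is in use. */
int posix_match_at(const char *pattern, const char *subject,
                   int notbol, int noteol)
{
    regex_t re;
    int eflags = 0;
    int rc;

    assert(pattern != NULL && subject != NULL);
    if (notbol) eflags |= REG_NOTBOL;   /* subject start is not a line start */
    if (noteol) eflags |= REG_NOTEOL;   /* subject end is not a line end */

    /* REG_EXTENDED and REG_NOSUB are choices for the system library. */
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0) return -1;
    rc = regexec(&re, subject, 0, NULL, eflags);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

<P>
With REG_NOTBOL set, the pattern ^abc no longer matches the subject "abc", and
with REG_NOTEOL set the same is true of abc$.
</P>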
<P>
The use of \x{hh...} to represent UTF-8 characters is not dependent on the use
of the <b>/8</b> modifier on the pattern. It is recognized always. There may be
any number of hexadecimal digits inside the braces. The result is from one to
six bytes, encoded according to the UTF-8 rules.
</P>
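<P>
The one-to-six-byte encoding can be sketched as follows. This is not the code
<b>pcretest</b> uses; the function name <b>encode_utf8()</b> is hypothetical,
and the sketch follows the original UTF-8 scheme, which allows code points up
to 0x7FFFFFFF.
</P>

```c
#include <assert.h>
#include <stddef.h>

/* Encode code point cp into out (which must hold at least 6 bytes) using the
 * original UTF-8 scheme of up to six bytes. Returns the number of bytes
 * written, or 0 if cp is above 0x7FFFFFFF. */
int encode_utf8(unsigned long cp, unsigned char *out)
{
    /* Largest code point that fits in 1, 2, ... 6 bytes. */
    static const unsigned long limit[6] = {
        0x7FUL, 0x7FFUL, 0xFFFFUL, 0x1FFFFFUL, 0x3FFFFFFUL, 0x7FFFFFFFUL
    };
    /* Lead-byte marker bits for each sequence length. */
    static const unsigned char lead[6] = {
        0x00, 0xC0, 0xE0, 0xF0, 0xF8, 0xFC
    };
    int extra, i;

    assert(out != NULL);

    /* Find how many continuation bytes are needed. */
    for (extra = 0; extra < 6; extra++)
        if (cp <= limit[extra]) break;
    if (extra == 6) return 0;

    /* Fill continuation bytes from the end, six payload bits each. */
    for (i = extra; i > 0; i--) {
        out[i] = (unsigned char)(0x80 | (cp & 0x3F));
        cp >>= 6;
    }
    /* The remaining high bits go into the lead byte. */
    out[0] = (unsigned char)(lead[extra] | cp);
    return extra + 1;
}
```

<P>
For example, \x{1234} becomes the three bytes E1 88 B4, and \x{100} becomes the
two bytes C4 80.
</P>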
<br><a name="SEC7" href="#TOC1">OUTPUT FROM PCRETEST</a><br>
<P>
When a match succeeds, pcretest outputs the list of captured substrings that
<b>pcre_exec()</b> returns, starting with number 0 for the string that matched
the whole pattern. Here is an example of an interactive pcretest run.
</P>
<P>
<pre>
$ pcretest
PCRE version 4.00 08-Jan-2003
</PRE>
</P>
<P>
<pre>
re&#62; /^abc(\d+)/
data&#62; abc123
0: abc123
1: 123
data&#62; xyz
No match
</PRE>
</P>
<P>
If the strings contain any non-printing characters, they are output as \0x
escapes, or as \x{...} escapes if the <b>/8</b> modifier was present on the
pattern. If the pattern has the <b>/+</b> modifier, then the output for
substring 0 is followed by the rest of the subject string, identified by
"0+" like this:
</P>
<P>
<pre>
re&#62; /cat/+
data&#62; cataract
0: cat
0+ aract
</PRE>
</P>
<P>
If the pattern has the <b>/g</b> or <b>/G</b> modifier, the results of successive
matching attempts are output in sequence, like this:
</P>
<P>
<pre>
re&#62; /\Bi(\w\w)/g
data&#62; Mississippi
0: iss
1: ss
0: iss
1: ss
0: ipp
1: pp
</PRE>
</P>
<P>
"No match" is output only if the first match attempt fails.
</P>
<P>
If any of the sequences <b>\C</b>, <b>\G</b>, or <b>\L</b> are present in a
data line that is successfully matched, the substrings extracted by the
convenience functions are output with C, G, or L after the string number
instead of a colon. This is in addition to the normal full list. The string
length (that is, the return from the extraction function) is given in
parentheses after each string for <b>\C</b> and <b>\G</b>.
</P>
<P>
Note that while patterns can be continued over several lines (a plain "&#62;"
prompt is used for continuations), data lines may not. However, newlines can be
included in data by means of the \n escape.
</P>
<br><a name="SEC8" href="#TOC1">AUTHOR</a><br>
<P>
Philip Hazel &#60;ph10@cam.ac.uk&#62;
<br>
University Computing Service,
<br>
Cambridge CB2 3QG, England.
</P>
<P>
Last updated: 09 December 2003
<br>
Copyright &copy; 1997-2003 University of Cambridge.

@ -1,174 +0,0 @@
.TH PCRE 3
.SH NAME
PCRE - Perl-compatible regular expressions
.SH DESCRIPTION
.rs
.sp
The PCRE library is a set of functions that implement regular expression
pattern matching using the same syntax and semantics as Perl, with just a few
differences. The current implementation of PCRE (release 4.x) corresponds
approximately with Perl 5.8, including support for UTF-8 encoded strings.
However, this support has to be explicitly enabled; it is not the default.
PCRE is written in C and released as a C library. However, a number of people
have written wrappers and interfaces of various kinds. A C++ class is included
in these contributions, which can be found in the \fIContrib\fR directory at
the primary FTP site, which is:
.\" HTML <a href="ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre">
.\" </a>
ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre
Details of exactly which Perl regular expression features are and are not
supported by PCRE are given in separate documents. See the
.\" HREF
\fBpcrepattern\fR
.\"
and
.\" HREF
\fBpcrecompat\fR
.\"
pages.
Some features of PCRE can be included, excluded, or changed when the library is
built. The
.\" HREF
\fBpcre_config()\fR
.\"
function makes it possible for a client to discover which features are
available. Documentation about building PCRE for various operating systems can
be found in the \fBREADME\fR file in the source distribution.
.SH USER DOCUMENTATION
.rs
.sp
The user documentation for PCRE has been split up into a number of different
sections. In the "man" format, each of these is a separate "man page". In the
HTML format, each is a separate page, linked from the index page. In the plain
text format, all the sections are concatenated, for ease of searching. The
sections are as follows:
pcre this document
pcreapi details of PCRE's native API
pcrebuild options for building PCRE
pcrecallout details of the callout feature
pcrecompat discussion of Perl compatibility
pcregrep description of the \fBpcregrep\fR command
pcrepattern syntax and semantics of supported
regular expressions
pcreperform discussion of performance issues
pcreposix the POSIX-compatible API
pcresample discussion of the sample program
pcretest the \fBpcretest\fR testing command
In addition, in the "man" and HTML formats, there is a short page for each
library function, listing its arguments and results.
.SH LIMITATIONS
.rs
.sp
There are some size limitations in PCRE but it is hoped that they will never in
practice be relevant.
The maximum length of a compiled pattern is 65539 (sic) bytes if PCRE is
compiled with the default internal linkage size of 2. If you want to process
regular expressions that are truly enormous, you can compile PCRE with an
internal linkage size of 3 or 4 (see the \fBREADME\fR file in the source
distribution and the
.\" HREF
\fBpcrebuild\fR
.\"
documentation for details). In these cases the limit is substantially larger.
However, the speed of execution will be slower.
All values in repeating quantifiers must be less than 65536.
The maximum number of capturing subpatterns is 65535.
There is no limit to the number of non-capturing subpatterns, but the maximum
depth of nesting of all kinds of parenthesized subpattern, including capturing
subpatterns, assertions, and other types of subpattern, is 200.
The maximum length of a subject string is the largest positive number that an
integer variable can hold. However, PCRE uses recursion to handle subpatterns
and indefinite repetition. This means that the available stack space may limit
the size of a subject string that can be processed by certain patterns.
.\" HTML <a name="utf8support"></a>
.SH UTF-8 SUPPORT
.rs
.sp
Starting at release 3.3, PCRE has had some support for character strings
encoded in the UTF-8 format. For release 4.0 this has been greatly extended to
cover most common requirements.
In order to process UTF-8 strings, you must build PCRE to include UTF-8 support in
the code, and, in addition, you must call
.\" HREF
\fBpcre_compile()\fR
.\"
with the PCRE_UTF8 option flag. When you do this, both the pattern and any
subject strings that are matched against it are treated as UTF-8 strings
instead of just strings of bytes.
If you compile PCRE with UTF-8 support, but do not use it at run time, the
library will be a bit bigger, but the additional run time overhead is limited
to testing the PCRE_UTF8 flag in several places, so should not be very large.
The following comments apply when PCRE is running in UTF-8 mode:
1. When you set the PCRE_UTF8 flag, the strings passed as patterns and subjects
are checked for validity on entry to the relevant functions. If an invalid
UTF-8 string is passed, an error return is given. In some situations, you may
already know that your strings are valid, and therefore want to skip these
checks in order to improve performance. If you set the PCRE_NO_UTF8_CHECK flag
at compile time or at run time, PCRE assumes that the pattern or subject it
is given (respectively) contains only valid UTF-8 codes. In this case, it does
not diagnose an invalid UTF-8 string. If you pass an invalid UTF-8 string to
PCRE when PCRE_NO_UTF8_CHECK is set, the results are undefined. Your program
may crash.
2. In a pattern, the escape sequence \\x{...}, where the contents of the braces
is a string of hexadecimal digits, is interpreted as a UTF-8 character whose
code number is the given hexadecimal number, for example: \\x{1234}. If a
non-hexadecimal digit appears between the braces, the item is not recognized.
This escape sequence can be used either as a literal, or within a character
class.
3. The original hexadecimal escape sequence, \\xhh, matches a two-byte UTF-8
character if the value is greater than 127.
4. Repeat quantifiers apply to complete UTF-8 characters, not to individual
bytes, for example: \\x{100}{3}.
5. The dot metacharacter matches one UTF-8 character instead of a single byte.
6. The escape sequence \\C can be used to match a single byte in UTF-8 mode,
but its use can lead to some strange effects.
7. The character escapes \\b, \\B, \\d, \\D, \\s, \\S, \\w, and \\W correctly
test characters of any code value, but the characters that PCRE recognizes as
digits, spaces, or word characters remain the same set as before, all with
values less than 256.
8. Case-insensitive matching applies only to characters whose values are less
than 256. PCRE does not support the notion of "case" for higher-valued
characters.
9. PCRE does not support the use of Unicode tables and properties or the Perl
escapes \\p, \\P, and \\X.
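The validity check mentioned in item 1 can be pictured with a simplified
sketch. This is not PCRE's own check: the function name utf8_is_well_formed()
is hypothetical, and the code only verifies that each lead byte is followed by
the right number of continuation bytes (accepting the original forms of up to
six bytes). It does not reject overlong encodings.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical structural check: returns 1 if every lead byte in s[0..len)
 * is followed by the expected number of continuation bytes, else 0. */
int utf8_is_well_formed(const unsigned char *s, size_t len)
{
    size_t i = 0;

    assert(s != NULL || len == 0);
    while (i < len) {
        unsigned char c = s[i++];
        int extra;

        if (c < 0x80) continue;                   /* single-byte character */
        else if ((c & 0xE0) == 0xC0) extra = 1;   /* 110xxxxx */
        else if ((c & 0xF0) == 0xE0) extra = 2;   /* 1110xxxx */
        else if ((c & 0xF8) == 0xF0) extra = 3;   /* 11110xxx */
        else if ((c & 0xFC) == 0xF8) extra = 4;   /* 111110xx */
        else if ((c & 0xFE) == 0xFC) extra = 5;   /* 1111110x */
        else return 0;       /* stray continuation byte, or 0xFE / 0xFF */

        if (len - i < (size_t)extra) return 0;    /* truncated sequence */
        while (extra-- > 0)
            if ((s[i++] & 0xC0) != 0x80) return 0; /* must be 10xxxxxx */
    }
    return 1;
}
```

A caller that has already run such a check on its own data is in the situation
item 1 describes, where skipping the library's check may be worthwhile.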
.SH AUTHOR
.rs
.sp
Philip Hazel <ph10@cam.ac.uk>
.br
University Computing Service,
.br
Cambridge CB2 3QG, England.
.br
Phone: +44 1223 334714
.in 0
Last updated: 20 August 2003
.br
Copyright (c) 1997-2003 University of Cambridge.

File diff suppressed because it is too large

@ -1,59 +0,0 @@
.TH PCRE 3
.SH NAME
PCRE - Perl-compatible regular expressions
.SH SYNOPSIS
.rs
.sp
.B #include <pcre.h>
.PP
.SM
.br
.B pcre *pcre_compile(const char *\fIpattern\fR, int \fIoptions\fR,
.ti +5n
.B const char **\fIerrptr\fR, int *\fIerroffset\fR,
.ti +5n
.B const unsigned char *\fItableptr\fR);
.SH DESCRIPTION
.rs
.sp
This function compiles a regular expression into an internal form. Its
arguments are:
\fIpattern\fR A zero-terminated string containing the
regular expression to be compiled
\fIoptions\fR Zero or more option bits
\fIerrptr\fR Where to put an error message
\fIerroffset\fR Offset in pattern where error was found
\fItableptr\fR Pointer to character tables, or NULL to
use the built-in default
The option bits are:
PCRE_ANCHORED Force pattern anchoring
PCRE_CASELESS Do caseless matching
PCRE_DOLLAR_ENDONLY $ not to match newline at end
PCRE_DOTALL . matches anything including NL
PCRE_EXTENDED Ignore whitespace and # comments
PCRE_EXTRA PCRE extra features
(not much use currently)
PCRE_MULTILINE ^ and $ match newlines within data
PCRE_NO_AUTO_CAPTURE Disable numbered capturing paren-
theses (named ones available)
PCRE_UNGREEDY Invert greediness of quantifiers
PCRE_UTF8 Run in UTF-8 mode
PCRE_NO_UTF8_CHECK Do not check the pattern for UTF-8
validity (only relevant if
PCRE_UTF8 is set)
PCRE must be compiled with UTF-8 support in order to use PCRE_UTF8
(or PCRE_NO_UTF8_CHECK).
The yield of the function is a pointer to a private data structure that
contains the compiled pattern, or NULL if an error was detected.
There is a complete description of the PCRE API in the
.\" HREF
\fBpcreapi\fR
.\"
page.

@ -1,45 +0,0 @@
.TH PCRE 3
.SH NAME
PCRE - Perl-compatible regular expressions
.SH SYNOPSIS
.rs
.sp
.B #include <pcre.h>
.PP
.SM
.br
.B int pcre_config(int \fIwhat\fR, void *\fIwhere\fR);
.SH DESCRIPTION
.rs
.sp
This function makes it possible for a client program to find out which optional
features are available in the version of the PCRE library it is using. Its
arguments are as follows:
\fIwhat\fR A code specifying what information is required
\fIwhere\fR Points to where to put the data
The available codes are:
PCRE_CONFIG_LINK_SIZE Internal link size: 2, 3, or 4
PCRE_CONFIG_MATCH_LIMIT Internal resource limit
PCRE_CONFIG_NEWLINE Value of the newline character
PCRE_CONFIG_POSIX_MALLOC_THRESHOLD
Threshold of return slots, above
which \fBmalloc()\fR is used by
the POSIX API
PCRE_CONFIG_STACKRECURSE Recursion implementation (1=stack 0=heap)
PCRE_CONFIG_UTF8 Availability of UTF-8 support (1=yes 0=no)
The function yields 0 on success or PCRE_ERROR_BADOPTION otherwise.
There is a complete description of the PCRE native API in the
.\" HREF
\fBpcreapi\fR
.\"
page, and a description of the POSIX API in the
.\" HREF
\fBpcreposix\fR
.\"
page.

@ -1,40 +0,0 @@
.TH PCRE 3
.SH NAME
PCRE - Perl-compatible regular expressions
.SH SYNOPSIS
.rs
.sp
.B #include <pcre.h>
.PP
.SM
.br
.B int pcre_copy_named_substring(const pcre *\fIcode\fR,
.ti +5n
.B const char *\fIsubject\fR, int *\fIovector\fR,
.ti +5n
.B int \fIstringcount\fR, const char *\fIstringname\fR,
.ti +5n
.B char *\fIbuffer\fR, int \fIbuffersize\fR);
.SH DESCRIPTION
.rs
.sp
This is a convenience function for extracting a captured substring, identified
by name, into a given buffer. The arguments are:
\fIcode\fR Pattern that was successfully matched
\fIsubject\fR Subject that has been successfully matched
\fIovector\fR Offset vector that \fBpcre_exec()\fR used
\fIstringcount\fR Value returned by \fBpcre_exec()\fR
\fIstringname\fR Name of the required substring
\fIbuffer\fR Buffer to receive the string
\fIbuffersize\fR Size of buffer
The yield is the length of the substring, PCRE_ERROR_NOMEMORY if the buffer was
too small, or PCRE_ERROR_NOSUBSTRING if the string name is invalid.
There is a complete description of the PCRE API in the
.\" HREF
\fBpcreapi\fR
.\"
page.

@ -1,37 +0,0 @@
.TH PCRE 3
.SH NAME
PCRE - Perl-compatible regular expressions
.SH SYNOPSIS
.rs
.sp
.B #include <pcre.h>
.PP
.SM
.br
.B int pcre_copy_substring(const char *\fIsubject\fR, int *\fIovector\fR,
.ti +5n
.B int \fIstringcount\fR, int \fIstringnumber\fR, char *\fIbuffer\fR,
.ti +5n
.B int \fIbuffersize\fR);
.SH DESCRIPTION
.rs
.sp
This is a convenience function for extracting a captured substring into a given
buffer. The arguments are:
\fIsubject\fR Subject that has been successfully matched
\fIovector\fR Offset vector that \fBpcre_exec()\fR used
\fIstringcount\fR Value returned by \fBpcre_exec()\fR
\fIstringnumber\fR Number of the required substring
\fIbuffer\fR Buffer to receive the string
\fIbuffersize\fR Size of buffer
The yield is the length of the string, PCRE_ERROR_NOMEMORY if the buffer was
too small, or PCRE_ERROR_NOSUBSTRING if the string number is invalid.
There is a complete description of the PCRE API in the
.\" HREF
\fBpcreapi\fR
.\"
page.
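The behaviour described above can be pictured with a stand-alone sketch. It is
not the library's source: the name sketch_copy_substring() and the two
SKETCH_ERROR_* codes are hypothetical, and the code assumes that substring n
occupies the offset pair ovector[2n], ovector[2n+1].

```c
#include <assert.h>
#include <string.h>

/* Hypothetical local error codes; the real function reports
 * PCRE_ERROR_NOMEMORY and PCRE_ERROR_NOSUBSTRING from pcre.h. */
#define SKETCH_ERROR_NOMEMORY    (-1)
#define SKETCH_ERROR_NOSUBSTRING (-2)

/* Copies captured substring number stringnumber into buffer as a
 * zero-terminated string and returns its length, or a negative error code. */
int sketch_copy_substring(const char *subject, const int *ovector,
                          int stringcount, int stringnumber,
                          char *buffer, int buffersize)
{
    int start, length;

    assert(subject != NULL && ovector != NULL && buffer != NULL);

    if (stringnumber < 0 || stringnumber >= stringcount)
        return SKETCH_ERROR_NOSUBSTRING;

    /* Assumed layout: substring n spans ovector[2n] .. ovector[2n+1]. */
    start = ovector[2 * stringnumber];
    length = ovector[2 * stringnumber + 1] - start;

    /* In this sketch an unset group (negative offsets) is an empty string. */
    if (start < 0 || length < 0) {
        start = 0;
        length = 0;
    }

    if (buffersize < length + 1)   /* need room for the terminating zero */
        return SKETCH_ERROR_NOMEMORY;

    memcpy(buffer, subject + start, (size_t)length);
    buffer[length] = '\0';
    return length;
}
```

For the subject "abc123" with offsets {0, 6, 3, 6} and a string count of 2,
asking for substring 1 copies "123" and yields 3.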

@ -1,48 +0,0 @@
.TH PCRE 3
.SH NAME
PCRE - Perl-compatible regular expressions
.SH SYNOPSIS
.rs
.sp
.B #include <pcre.h>
.PP
.SM
.br
.B int pcre_exec(const pcre *\fIcode\fR, "const pcre_extra *\fIextra\fR,"
.ti +5n
.B "const char *\fIsubject\fR," int \fIlength\fR, int \fIstartoffset\fR,
.ti +5n
.B int \fIoptions\fR, int *\fIovector\fR, int \fIovecsize\fR);
.SH DESCRIPTION
.rs
.sp
This function matches a compiled regular expression against a given subject
string, and returns offsets to capturing subexpressions. Its arguments are:
\fIcode\fR Points to the compiled pattern
\fIextra\fR Points to an associated \fBpcre_extra\fR structure,
or is NULL
\fIsubject\fR Points to the subject string
\fIlength\fR Length of the subject string, in bytes
\fIstartoffset\fR Offset in bytes in the subject at which to
start matching
\fIoptions\fR Option bits
\fIovector\fR Points to a vector of ints for result offsets
\fIovecsize\fR Size of the vector (a multiple of 3)
The options are:
PCRE_ANCHORED Match only at the first position
PCRE_NOTBOL Subject is not the beginning of a line
PCRE_NOTEOL Subject is not the end of a line
PCRE_NOTEMPTY An empty string is not a valid match
PCRE_NO_UTF8_CHECK Do not check the subject for UTF-8
validity (only relevant if PCRE_UTF8
was set at compile time)
There is a complete description of the PCRE API in the
.\" HREF
\fBpcreapi\fR
.\"
page.

@ -1,24 +0,0 @@
.TH PCRE 3
.SH NAME
PCRE - Perl-compatible regular expressions
.SH SYNOPSIS
.rs
.sp
.B #include <pcre.h>
.PP
.SM
.br
.B void pcre_free_substring(const char *\fIstringptr\fR);
.SH DESCRIPTION
.rs
.sp
This is a convenience function for freeing the store obtained by a previous
call to \fBpcre_get_substring()\fR or \fBpcre_get_named_substring()\fR. Its
only argument is a pointer to the string.
There is a complete description of the PCRE API in the
.\" HREF
\fBpcreapi\fR
.\"
page.

@ -1,24 +0,0 @@
.TH PCRE 3
.SH NAME
PCRE - Perl-compatible regular expressions
.SH SYNOPSIS
.rs
.sp
.B #include <pcre.h>
.PP
.SM
.br
.B void pcre_free_substring_list(const char **\fIstringptr\fR);
.SH DESCRIPTION
.rs
.sp
This is a convenience function for freeing the store obtained by a previous
call to \fBpcre_get_substring_list()\fR. Its only argument is a pointer to the
list of string pointers.
There is a complete description of the PCRE API in the
.\" HREF
\fBpcreapi\fR
.\"
page.

@ -1,53 +0,0 @@
.TH PCRE 3
.SH NAME
PCRE - Perl-compatible regular expressions
.SH SYNOPSIS
.rs
.sp
.B #include <pcre.h>
.PP
.SM
.br
.B int pcre_fullinfo(const pcre *\fIcode\fR, "const pcre_extra *\fIextra\fR,"
.ti +5n
.B int \fIwhat\fR, void *\fIwhere\fR);
.SH DESCRIPTION
.rs
.sp
This function returns information about a compiled pattern. Its arguments are:
\fIcode\fR Compiled regular expression
\fIextra\fR Result of \fBpcre_study()\fR or NULL
\fIwhat\fR What information is required
\fIwhere\fR Where to put the information
The following information is available:
PCRE_INFO_BACKREFMAX Number of highest back reference
PCRE_INFO_CAPTURECOUNT Number of capturing subpatterns
PCRE_INFO_FIRSTBYTE Fixed first byte for a match, or
-1 for start of string
or after newline, or
-2 otherwise
PCRE_INFO_FIRSTTABLE Table of first bytes
(after studying)
PCRE_INFO_LASTLITERAL Literal last byte required
PCRE_INFO_NAMECOUNT Number of named subpatterns
PCRE_INFO_NAMEENTRYSIZE Size of name table entry
PCRE_INFO_NAMETABLE Pointer to name table
PCRE_INFO_OPTIONS Options used for compilation
PCRE_INFO_SIZE Size of compiled pattern
The yield of the function is zero on success or:
PCRE_ERROR_NULL the argument \fIcode\fR was NULL
the argument \fIwhere\fR was NULL
PCRE_ERROR_BADMAGIC the "magic number" was not found
PCRE_ERROR_BADOPTION the value of \fIwhat\fR was invalid
There is a complete description of the PCRE API in the
.\" HREF
\fBpcreapi\fR
.\"
page.

@ -1,40 +0,0 @@
.TH PCRE 3
.SH NAME
PCRE - Perl-compatible regular expressions
.SH SYNOPSIS
.rs
.sp
.B #include <pcre.h>
.PP
.SM
.br
.B int pcre_get_named_substring(const pcre *\fIcode\fR,
.ti +5n
.B const char *\fIsubject\fR, int *\fIovector\fR,
.ti +5n
.B int \fIstringcount\fR, const char *\fIstringname\fR,
.ti +5n
.B const char **\fIstringptr\fR);
.SH DESCRIPTION
.rs
.sp
This is a convenience function for extracting a captured substring by name. The
arguments are:
\fIcode\fR Compiled pattern
\fIsubject\fR Subject that has been successfully matched
\fIovector\fR Offset vector that \fBpcre_exec()\fR used
\fIstringcount\fR Value returned by \fBpcre_exec()\fR
\fIstringname\fR Name of the required substring
\fIstringptr\fR Where to put the string pointer
The yield is the length of the extracted substring, PCRE_ERROR_NOMEMORY if
sufficient memory could not be obtained, or PCRE_ERROR_NOSUBSTRING if the
string name is invalid.
There is a complete description of the PCRE API in the
.\" HREF
\fBpcreapi\fR
.\"
page.

@ -1,31 +0,0 @@
.TH PCRE 3
.SH NAME
PCRE - Perl-compatible regular expressions
.SH SYNOPSIS
.rs
.sp
.B #include <pcre.h>
.PP
.SM
.br
.B int pcre_get_stringnumber(const pcre *\fIcode\fR,
.ti +5n
.B const char *\fIname\fR);
.SH DESCRIPTION
.rs
.sp
This convenience function finds the number of a named substring capturing
parenthesis in a compiled pattern. Its arguments are:
\fIcode\fR Compiled regular expression
\fIname\fR Name whose number is required
The yield of the function is the number of the parenthesis if the name is
found, or PCRE_ERROR_NOSUBSTRING otherwise.
There is a complete description of the PCRE API in the
.\" HREF
\fBpcreapi\fR
.\"
page.

@ -1,37 +0,0 @@
.TH PCRE 3
.SH NAME
PCRE - Perl-compatible regular expressions
.SH SYNOPSIS
.rs
.sp
.B #include <pcre.h>
.PP
.SM
.br
.B int pcre_get_substring(const char *\fIsubject\fR, int *\fIovector\fR,
.ti +5n
.B int \fIstringcount\fR, int \fIstringnumber\fR,
.ti +5n
.B const char **\fIstringptr\fR);
.SH DESCRIPTION
.rs
.sp
This is a convenience function for extracting a captured substring. The
arguments are:
\fIsubject\fR Subject that has been successfully matched
\fIovector\fR Offset vector that \fBpcre_exec()\fR used
\fIstringcount\fR Value returned by \fBpcre_exec()\fR
\fIstringnumber\fR Number of the required substring
\fIstringptr\fR Where to put the string pointer
The yield is the length of the substring, PCRE_ERROR_NOMEMORY if sufficient
memory could not be obtained, or PCRE_ERROR_NOSUBSTRING if the string number is
invalid.
There is a complete description of the PCRE API in the
.\" HREF
\fBpcreapi\fR
.\"
page.

@ -1,33 +0,0 @@
.TH PCRE 3
.SH NAME
PCRE - Perl-compatible regular expressions
.SH SYNOPSIS
.rs
.sp
.B #include <pcre.h>
.PP
.SM
.br
.B int pcre_get_substring_list(const char *\fIsubject\fR,
.ti +5n
.B int *\fIovector\fR, int \fIstringcount\fR, "const char ***\fIlistptr\fR);"
.SH DESCRIPTION
.rs
.sp
This is a convenience function for extracting a list of all the captured
substrings. The arguments are:
\fIsubject\fR Subject that has been successfully matched
\fIovector\fR Offset vector that \fBpcre_exec()\fR used
\fIstringcount\fR Value returned by \fBpcre_exec()\fR
\fIlistptr\fR Where to put a pointer to the list
The yield is zero on success or PCRE_ERROR_NOMEMORY if sufficient memory could
not be obtained.
There is a complete description of the PCRE API in the
.\" HREF
\fBpcreapi\fR
.\"
page.

@ -1,23 +0,0 @@
.TH PCRE 3
.SH NAME
PCRE - Perl-compatible regular expressions
.SH SYNOPSIS
.rs
.sp
.B #include <pcre.h>
.PP
.SM
.br
.B int pcre_info(const pcre *\fIcode\fR, int *\fIoptptr\fR, int
.B *\fIfirstcharptr\fR);
.SH DESCRIPTION
.rs
.sp
This function is obsolete. You should be using \fBpcre_fullinfo()\fR instead.
There is a complete description of the PCRE API in the
.\" HREF
\fBpcreapi\fR
.\"
page.

@ -1,26 +0,0 @@
.TH PCRE 3
.SH NAME
PCRE - Perl-compatible regular expressions
.SH SYNOPSIS
.rs
.sp
.B #include <pcre.h>
.PP
.SM
.br
.B const unsigned char *pcre_maketables(void);
.SH DESCRIPTION
.rs
.sp
This function builds a set of character tables which can be passed to
\fBpcre_compile()\fR to override PCRE's internal, built-in tables (which were
made by \fBpcre_maketables()\fR when PCRE was compiled). You might want to do
this if you are using a non-standard locale. The function yields a pointer to
the tables.
There is a complete description of the PCRE API in the
.\" HREF
\fBpcreapi\fR
.\"
page.

@ -1,36 +0,0 @@
.TH PCRE 3
.SH NAME
PCRE - Perl-compatible regular expressions
.SH SYNOPSIS
.rs
.sp
.B #include <pcre.h>
.PP
.SM
.br
.B pcre_extra *pcre_study(const pcre *\fIcode\fR, int \fIoptions\fR,
.ti +5n
.B const char **\fIerrptr\fR);
.SH DESCRIPTION
.rs
.sp
This function studies a compiled pattern, to see if additional information can
be extracted that might speed up matching. Its arguments are:
\fIcode\fR A compiled regular expression
\fIoptions\fR Options for \fBpcre_study()\fR
\fIerrptr\fR Where to put an error message
If the function returns NULL, either it could not find any additional
information, or there was an error. You can tell the difference by looking at
the error value. It is NULL in the first case.
There are currently no options defined; the value of the second argument should
always be zero.
There is a complete description of the PCRE API in the
.\" HREF
\fBpcreapi\fR
.\"
page.

@ -1,23 +0,0 @@
.TH PCRE 3
.SH NAME
PCRE - Perl-compatible regular expressions
.SH SYNOPSIS
.rs
.sp
.B #include <pcre.h>
.PP
.SM
.br
.B char *pcre_version(void);
.SH DESCRIPTION
.rs
.sp
This function returns a character string that gives the version number of the
PCRE library, and its date of release.
There is a complete description of the PCRE API in the
.\" HREF
\fBpcreapi\fR
.\"
page.

File diff suppressed because it is too large

@ -1,145 +0,0 @@
.TH PCRE 3
.SH NAME
PCRE - Perl-compatible regular expressions
.SH PCRE BUILD-TIME OPTIONS
.rs
.sp
This document describes the optional features of PCRE that can be selected when
the library is compiled. They are all selected, or deselected, by providing
options to the \fBconfigure\fR script which is run before the \fBmake\fR
command. The complete list of options for \fBconfigure\fR (which includes the
standard ones such as the selection of the installation directory) can be
obtained by running
  ./configure --help
The following sections describe certain options whose names begin with --enable
or --disable. These settings specify changes to the defaults for the
\fBconfigure\fR command. Because of the way that \fBconfigure\fR works,
--enable and --disable always come in pairs, so the complementary option always
exists as well, but as it specifies the default, it is not described.
.SH UTF-8 SUPPORT
.rs
.sp
To build PCRE with support for UTF-8 character strings, add
--enable-utf8
to the \fBconfigure\fR command. Of itself, this does not make PCRE treat
strings as UTF-8. As well as compiling PCRE with this option, you also have
to set the PCRE_UTF8 option when you call the \fBpcre_compile()\fR
function.
.SH CODE VALUE OF NEWLINE
.rs
.sp
By default, PCRE treats character 10 (linefeed) as the newline character. This
is the normal newline character on Unix-like systems. You can compile PCRE to
use character 13 (carriage return) instead by adding
--enable-newline-is-cr
to the \fBconfigure\fR command. For completeness there is also a
--enable-newline-is-lf option, which explicitly specifies linefeed as the
newline character.
.SH BUILDING SHARED AND STATIC LIBRARIES
.rs
.sp
The PCRE building process uses \fBlibtool\fR to build both shared and static
Unix libraries by default. You can suppress one of these by adding one of
--disable-shared
--disable-static
to the \fBconfigure\fR command, as required.
.SH POSIX MALLOC USAGE
.rs
.sp
When PCRE is called through the POSIX interface (see the \fBpcreposix\fR
documentation), additional working storage is required for holding the pointers
to capturing substrings because PCRE requires three integers per substring,
whereas the POSIX interface provides only two. If the number of expected
substrings is small, the wrapper function uses space on the stack, because this
is faster than using \fBmalloc()\fR for each call. The default threshold above
which the stack is no longer used is 10; it can be changed by adding a setting
such as
--with-posix-malloc-threshold=20
to the \fBconfigure\fR command.
.SH LIMITING PCRE RESOURCE USAGE
.rs
.sp
Internally, PCRE has a function called \fBmatch()\fR which it calls repeatedly
(possibly recursively) when performing a matching operation. By limiting the
number of times this function may be called, a limit can be placed on the
resources used by a single call to \fBpcre_exec()\fR. The limit can be changed
at run time, as described in the \fBpcreapi\fR documentation. The default is 10
million, but this can be changed by adding a setting such as
--with-match-limit=500000
to the \fBconfigure\fR command.
.SH HANDLING VERY LARGE PATTERNS
.rs
.sp
Within a compiled pattern, offset values are used to point from one part to
another (for example, from an opening parenthesis to an alternation
metacharacter). By default two-byte values are used for these offsets, leading
to a maximum size for a compiled pattern of around 64K. This is sufficient to
handle all but the most gigantic patterns. Nevertheless, some people do want to
process enormous patterns, so it is possible to compile PCRE to use three-byte
or four-byte offsets by adding a setting such as
--with-link-size=3
to the \fBconfigure\fR command. The value given must be 2, 3, or 4. Using
longer offsets slows down the operation of PCRE because it has to load
additional bytes when handling them.
If you build PCRE with an increased link size, test 2 (and test 5 if you are
using UTF-8) will fail. Part of the output of these tests is a representation
of the compiled pattern, and this changes with the link size.
.SH AVOIDING EXCESSIVE STACK USAGE
.rs
.sp
PCRE implements backtracking while matching by making recursive calls to an
internal function called \fBmatch()\fR. In environments where the size of the
stack is limited, this can severely limit PCRE's operation. (The Unix
environment does not usually suffer from this problem.) An alternative approach
that uses memory from the heap to remember data, instead of using recursive
function calls, has been implemented to work round this problem. If you want to
build a version of PCRE that works this way, add
--disable-stack-for-recursion
to the \fBconfigure\fR command. With this configuration, PCRE will use the
\fBpcre_stack_malloc\fR and \fBpcre_stack_free\fR variables to call memory
management functions. Separate functions are provided because the usage is very
predictable: the block sizes requested are always the same, and the blocks are
always freed in reverse order. A calling program might be able to implement
optimized functions that perform better than the standard \fBmalloc()\fR and
\fBfree()\fR functions. PCRE runs noticeably more slowly when built in this
way.
.SH USING EBCDIC CODE
.rs
.sp
PCRE assumes by default that it will run in an environment where the character
code is ASCII (or UTF-8, which is a superset of ASCII). PCRE can, however, be
compiled to run in an EBCDIC environment by adding
--enable-ebcdic
to the \fBconfigure\fR command.
.in 0
Last updated: 09 December 2003
.br
Copyright (c) 1997-2003 University of Cambridge.

@ -1,92 +0,0 @@
.TH PCRE 3
.SH NAME
PCRE - Perl-compatible regular expressions
.SH PCRE CALLOUTS
.rs
.sp
.B int (*pcre_callout)(pcre_callout_block *);
.PP
PCRE provides a feature called "callout", which is a means of temporarily
passing control to the caller of PCRE in the middle of pattern matching. The
caller of PCRE provides an external function by putting its entry point in the
global variable \fIpcre_callout\fR. By default, this variable contains NULL,
which disables all calling out.
Within a regular expression, (?C) indicates the points at which the external
function is to be called. Different callout points can be identified by putting
a number less than 256 after the letter C. The default value is zero.
For example, this pattern has two callout points:
  (?C1)\\dabc(?C2)def
During matching, when PCRE reaches a callout point (and \fIpcre_callout\fR is
set), the external function is called. Its only argument is a pointer to a
\fBpcre_callout\fR block. This contains the following variables:
int \fIversion\fR;
int \fIcallout_number\fR;
int *\fIoffset_vector\fR;
const char *\fIsubject\fR;
int \fIsubject_length\fR;
int \fIstart_match\fR;
int \fIcurrent_position\fR;
int \fIcapture_top\fR;
int \fIcapture_last\fR;
void *\fIcallout_data\fR;
The \fIversion\fR field is an integer containing the version number of the
block format. The current version is zero. The version number may change in
future if additional fields are added, but the intention is never to remove any
of the existing fields.
The \fIcallout_number\fR field contains the number of the callout, as compiled
into the pattern (that is, the number after ?C).
The \fIoffset_vector\fR field is a pointer to the vector of offsets that was
passed by the caller to \fBpcre_exec()\fR. The contents can be inspected in
order to extract substrings that have been matched so far, in the same way as
for extracting substrings after a match has completed.
The \fIsubject\fR and \fIsubject_length\fR fields contain copies of the values
that were passed to \fBpcre_exec()\fR.
The \fIstart_match\fR field contains the offset within the subject at which the
current match attempt started. If the pattern is not anchored, the callout
function may be called several times for different starting points.
The \fIcurrent_position\fR field contains the offset within the subject of the
current match pointer.
The \fIcapture_top\fR field contains one more than the number of the highest
numbered captured substring so far. If no substrings have been captured,
the value of \fIcapture_top\fR is one.
The \fIcapture_last\fR field contains the number of the most recently captured
substring.
The \fIcallout_data\fR field contains a value that is passed to
\fBpcre_exec()\fR by the caller specifically so that it can be passed back in
callouts. It is passed in the \fIcallout_data\fR field of the \fBpcre_extra\fR
data structure. If no such data was passed, the value of \fIcallout_data\fR in
a \fBpcre_callout\fR block is NULL. There is a description of the
\fBpcre_extra\fR structure in the \fBpcreapi\fR documentation.
.SH RETURN VALUES
.rs
.sp
The callout function returns an integer. If the value is zero, matching
proceeds as normal. If the value is greater than zero, matching fails at the
current point, but backtracking to test other possibilities goes ahead, just as
if a lookahead assertion had failed. If the value is less than zero, the match
is abandoned, and \fBpcre_exec()\fR returns the value.
Negative values should normally be chosen from the set of PCRE_ERROR_xxx
values. In particular, PCRE_ERROR_NOMATCH forces a standard "no match" failure.
The error number PCRE_ERROR_CALLOUT is reserved for use by callout functions;
it will never be used by PCRE itself.
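The three kinds of return value can be sketched as follows. This is an
illustration only: \fBdemo_callout_block\fR is a hypothetical stand-in that
carries a few of the fields listed above (it is not layout-compatible with the
real block), \fBDEMO_ERROR_CALLOUT\fR is a placeholder for PCRE_ERROR_CALLOUT,
and the policy itself (callout 2 rejects paths longer than 8 characters) is
invented for the example.

```c
/* Hypothetical stand-in for the callout block, reduced to the fields this
   sketch reads. Real code would include <pcre.h> and use the library's
   pcre_callout_block type instead. */
typedef struct {
    int version;
    int callout_number;
    const char *subject;
    int subject_length;
    int start_match;
    int current_position;
} demo_callout_block;

/* Placeholder negative code. A real callout would return PCRE_ERROR_CALLOUT
   or another PCRE_ERROR_xxx value taken from <pcre.h>. */
#define DEMO_ERROR_CALLOUT (-1000)

/* Invented policy, for illustration:
     callout 1      only observes, so matching proceeds as normal (0);
     callout 2      fails the current path once more than 8 characters have
                    been consumed, leaving backtracking free to continue (1);
     anything else  abandons the whole match (negative value). */
int demo_callout(demo_callout_block *block)
{
    int consumed = block->current_position - block->start_match;

    switch (block->callout_number) {
    case 1:
        return 0;
    case 2:
        return consumed > 8 ? 1 : 0;
    default:
        return DEMO_ERROR_CALLOUT;
    }
}
```

With the real library, a function of this shape (taking a
\fBpcre_callout_block\fR pointer) would be installed by storing its address in
the global variable \fIpcre_callout\fR before \fBpcre_exec()\fR is called.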
.in 0
Last updated: 21 January 2003
.br
Copyright (c) 1997-2003 University of Cambridge.

@ -1,107 +0,0 @@
.TH PCRE 3
.SH NAME
PCRE - Perl-compatible regular expressions
.SH DIFFERENCES FROM PERL
.rs
.sp
This document describes the differences in the ways that PCRE and Perl handle
regular expressions. The differences described here are with respect to Perl
5.8.
1. PCRE does not have full UTF-8 support. Details of what it does have are
given in the
.\" HTML <a href="pcre.html#utf8support">
.\" </a>
section on UTF-8 support
.\"
in the main
.\" HREF
\fBpcre\fR
.\"
page.
2. PCRE does not allow repeat quantifiers on lookahead assertions. Perl permits
them, but they do not mean what you might think. For example, (?!a){3} does
not assert that the next three characters are not "a". It just asserts that the
next character is not "a" three times.
3. Capturing subpatterns that occur inside negative lookahead assertions are
counted, but their entries in the offsets vector are never set. Perl sets its
numerical variables from any such patterns that are matched before the
assertion fails to match something (thereby succeeding), but only if the
negative lookahead assertion contains just one branch.
4. Though binary zero characters are supported in the subject string, they are
not allowed in a pattern string because it is passed as a normal C string,
terminated by zero. The escape sequence "\\0" can be used in the pattern to
represent a binary zero.
5. The following Perl escape sequences are not supported: \\l, \\u, \\L,
\\U, \\P, \\p, \\N, and \\X. In fact these are implemented by Perl's general
string-handling and are not part of its pattern matching engine. If any of
these are encountered by PCRE, an error is generated.
6. PCRE does support the \\Q...\\E escape for quoting substrings. Characters in
between are treated as literals. This is slightly different from Perl in that $
and @ are also handled as literals inside the quotes. In Perl, they cause
variable interpolation (but of course PCRE does not have variables). Note the
following examples:
.sp
  Pattern            PCRE matches      Perl matches
.sp
  \\Qabc$xyz\\E        abc$xyz           abc followed by the
                                       contents of $xyz
  \\Qabc\\$xyz\\E       abc\\$xyz          abc\\$xyz
  \\Qabc\\E\\$\\Qxyz\\E   abc$xyz           abc$xyz
.sp
The \\Q...\\E sequence is recognized both inside and outside character classes.
7. Fairly obviously, PCRE does not support the (?{code}) and (?p{code})
constructions. However, there is some experimental support for recursive
patterns using the non-Perl items (?R), (?number) and (?P>name). Also, the PCRE
"callout" feature allows an external function to be called during pattern
matching.
8. There are some differences that are concerned with the settings of captured
strings when part of a pattern is repeated. For example, matching "aba" against
the pattern /^(a(b)?)+$/ in Perl leaves $2 unset, but in PCRE it is set to "b".
9. PCRE provides some extensions to the Perl regular expression facilities:
(a) Although lookbehind assertions must match fixed length strings, each
alternative branch of a lookbehind assertion can match a different length of
string. Perl requires them all to have the same length.
(b) If PCRE_DOLLAR_ENDONLY is set and PCRE_MULTILINE is not set, the $
meta-character matches only at the very end of the string.
(c) If PCRE_EXTRA is set, a backslash followed by a letter with no special
meaning is faulted.
(d) If PCRE_UNGREEDY is set, the greediness of the repetition quantifiers is
inverted, that is, by default they are not greedy, but if followed by a
question mark they are.
(e) PCRE_ANCHORED can be used to force a pattern to be tried only at the first
matching position in the subject string.
(f) The PCRE_NOTBOL, PCRE_NOTEOL, PCRE_NOTEMPTY, and PCRE_NO_AUTO_CAPTURE
options for \fBpcre_exec()\fR have no Perl equivalents.
(g) The (?R), (?number), and (?P>name) constructs allow for recursive pattern
matching (Perl can do this using the (?p{code}) construct, which PCRE cannot
support.)
(h) PCRE supports named capturing substrings, using the Python syntax.
(i) PCRE supports the possessive quantifier "++" syntax, taken from Sun's Java
package.
(j) The (R) condition, for testing recursion, is a PCRE extension.
(k) The callout facility is PCRE-specific.
.in 0
Last updated: 09 December 2003
.br
Copyright (c) 1997-2003 University of Cambridge.

@ -1,130 +0,0 @@
.TH PCREGREP 1
.SH NAME
pcregrep - a grep with Perl-compatible regular expressions.
.SH SYNOPSIS
.B pcregrep [-Vcfhilnrsuvx] [long options] [pattern] [file1 file2 ...]
.SH DESCRIPTION
.rs
.sp
\fBpcregrep\fR searches files for character patterns, in the same way as other
grep commands do, but it uses the PCRE regular expression library to support
patterns that are compatible with the regular expressions of Perl 5. See
.\" HREF
\fBpcrepattern\fR
.\"
for a full description of syntax and semantics of the regular expressions that
PCRE supports.
A pattern must be specified on the command line unless the \fB-f\fR option is
used (see below).
If no files are specified, \fBpcregrep\fR reads the standard input. By default,
each line that matches the pattern is copied to the standard output, and if
there is more than one file, the file name is printed before each line of
output. However, there are options that can change how \fBpcregrep\fR behaves.
Lines are limited to BUFSIZ characters. BUFSIZ is defined in \fB<stdio.h>\fR.
The newline character is removed from the end of each line before it is matched
against the pattern.
.SH OPTIONS
.rs
.sp
.TP 10
\fB-V\fR
Write the version number of the PCRE library being used to the standard error
stream.
.TP
\fB-c\fR
Do not print individual lines; instead just print a count of the number of
lines that would otherwise have been printed. If several files are given, a
count is printed for each of them.
.TP
\fB-f\fR\fIfilename\fR
Read a number of patterns from the file, one per line, and match all of them
against each line of input. A line is output if any of the patterns match it.
When \fB-f\fR is used, no pattern is taken from the command line; all arguments
are treated as file names. There is a maximum of 100 patterns. Trailing white
space is removed, and blank lines are ignored. An empty file contains no
patterns and therefore matches nothing.
.TP
\fB-h\fR
Suppress printing of filenames when searching multiple files.
.TP
\fB-i\fR
Ignore upper/lower case distinctions during comparisons.
.TP
\fB-l\fR
Instead of printing lines from the files, just print the names of the files
containing lines that would have been printed. Each file name is printed
once, on a separate line.
.TP
\fB-n\fR
Precede each line by its line number in the file.
.TP
\fB-r\fR
If any file is a directory, recursively scan the files it contains. Without
\fB-r\fR a directory is scanned as a normal file.
.TP
\fB-s\fR
Work silently, that is, display nothing except error messages.
The exit status indicates whether any matches were found.
.TP
\fB-u\fR
Operate in UTF-8 mode. This option is available only if PCRE has been compiled
with UTF-8 support. Both the pattern and each subject line are assumed to be
valid strings of UTF-8 characters.
.TP
\fB-v\fR
Invert the sense of the match, so that lines which do \fInot\fR match the
pattern are now the ones that are found.
.TP
\fB-x\fR
Force the pattern to be anchored (it must start matching at the beginning of
the line) and in addition, require it to match the entire line. This is
equivalent to having ^ and $ characters at the start and end of each
alternative branch in the regular expression.
.SH LONG OPTIONS
.rs
.sp
Long forms of all the options are available, as in GNU grep. They are shown in
the following table:
-c --count
-h --no-filename
-i --ignore-case
-l --files-with-matches
-n --line-number
-r --recursive
-s --no-messages
-u --utf-8
-V --version
-v --invert-match
-x --line-regex
-x --line-regexp
In addition, --file=\fIfilename\fR is equivalent to -f\fIfilename\fR, and
--help shows the list of options and then exits.
.SH DIAGNOSTICS
.rs
.sp
Exit status is 0 if any matches were found, 1 if no matches were found, and 2
for syntax errors or inaccessible files (even if matches were found).
.SH AUTHOR
.rs
.sp
Philip Hazel <ph10@cam.ac.uk>
.br
University Computing Service
.br
Cambridge CB2 3QG, England.
.in 0
Last updated: 03 February 2003
.br
Copyright (c) 1997-2003 University of Cambridge.

@ -1,124 +0,0 @@
PCREGREP(1) PCREGREP(1)
NAME
pcregrep - a grep with Perl-compatible regular expressions.
SYNOPSIS
pcregrep [-Vcfhilnrsuvx] [long options] [pattern] [file1 file2 ...]
DESCRIPTION
pcregrep searches files for character patterns, in the same way as
other grep commands do, but it uses the PCRE regular expression library
to support patterns that are compatible with the regular expressions of
Perl 5. See pcrepattern for a full description of syntax and semantics
of the regular expressions that PCRE supports.
A pattern must be specified on the command line unless the -f option is
used (see below).
If no files are specified, pcregrep reads the standard input. By
default, each line that matches the pattern is copied to the standard
output, and if there is more than one file, the file name is printed
before each line of output. However, there are options that can change
how pcregrep behaves.
Lines are limited to BUFSIZ characters. BUFSIZ is defined in <stdio.h>.
The newline character is removed from the end of each line before it is
matched against the pattern.
OPTIONS
-V Write the version number of the PCRE library being used to
the standard error stream.
-c Do not print individual lines; instead just print a count of
the number of lines that would otherwise have been printed.
If several files are given, a count is printed for each of
them.
-ffilename
Read a number of patterns from the file, one per line, and
match all of them against each line of input. A line is output
if any of the patterns match it. When -f is used, no
pattern is taken from the command line; all arguments are
treated as file names. There is a maximum of 100 patterns.
Trailing white space is removed, and blank lines are ignored.
An empty file contains no patterns and therefore matches
nothing.
-h Suppress printing of filenames when searching multiple files.
-i Ignore upper/lower case distinctions during comparisons.
-l Instead of printing lines from the files, just print the
names of the files containing lines that would have been
printed. Each file name is printed once, on a separate line.
-n Precede each line by its line number in the file.
-r If any file is a directory, recursively scan the files it
contains. Without -r a directory is scanned as a normal file.
-s Work silently, that is, display nothing except error
messages. The exit status indicates whether any matches were
found.
-u Operate in UTF-8 mode. This option is available only if PCRE
has been compiled with UTF-8 support. Both the pattern and
each subject line are assumed to be valid strings of UTF-8
characters.
-v Invert the sense of the match, so that lines which do not
match the pattern are now the ones that are found.
-x Force the pattern to be anchored (it must start matching at
the beginning of the line) and in addition, require it to
match the entire line. This is equivalent to having ^ and $
characters at the start and end of each alternative branch in
the regular expression.
LONG OPTIONS
Long forms of all the options are available, as in GNU grep. They are
shown in the following table:
-c --count
-h --no-filename
-i --ignore-case
-l --files-with-matches
-n --line-number
-r --recursive
-s --no-messages
-u --utf-8
-V --version
-v --invert-match
-x --line-regex
-x --line-regexp
In addition, --file=filename is equivalent to -ffilename, and --help
shows the list of options and then exits.
DIAGNOSTICS
Exit status is 0 if any matches were found, 1 if no matches were found,
and 2 for syntax errors or inaccessible files (even if matches were
found).
AUTHOR
Philip Hazel <ph10@cam.ac.uk>
University Computing Service
Cambridge CB2 3QG, England.
Last updated: 03 February 2003
Copyright (c) 1997-2003 University of Cambridge.

File diff suppressed because it is too large

@ -1,66 +0,0 @@
.TH PCRE 3
.SH NAME
PCRE - Perl-compatible regular expressions
.SH PCRE PERFORMANCE
.rs
.sp
Certain items that may appear in regular expression patterns are more efficient
than others. It is more efficient to use a character class like [aeiou] than a
set of alternatives such as (a|e|i|o|u). In general, the simplest construction
that provides the required behaviour is usually the most efficient. Jeffrey
Friedl's book contains a lot of discussion about optimizing regular expressions
for efficient performance.
When a pattern begins with .* not in parentheses, or in parentheses that are
not the subject of a backreference, and the PCRE_DOTALL option is set, the
pattern is implicitly anchored by PCRE, since it can match only at the start of
a subject string. However, if PCRE_DOTALL is not set, PCRE cannot make this
optimization, because the . metacharacter does not then match a newline, and if
the subject string contains newlines, the pattern may match from the character
immediately following one of them instead of from the very start. For example,
the pattern
  .*second
matches the subject "first\\nand second" (where \\n stands for a newline
character), with the match starting at the seventh character. In order to do
this, PCRE has to retry the match starting after every newline in the subject.
If you are using such a pattern with subject strings that do not contain
newlines, the best performance is obtained by setting PCRE_DOTALL, or starting
the pattern with ^.* to indicate explicit anchoring. That saves PCRE from
having to scan along the subject looking for a newline to restart at.
Beware of patterns that contain nested indefinite repeats. These can take a
long time to run when applied to a string that does not match. Consider the
pattern fragment
(a+)*
This can match "aaaa" in 16 different ways, and this number increases very
rapidly as the string gets longer. (The * repeat can match 0, 1, 2, 3, or 4
times, and for each of those cases other than 0 or 4, the + repeats can match
different numbers of times.) When the remainder of the pattern is such that the
entire match is going to fail, PCRE has in principle to try every possible
variation, and this can take an extremely long time.
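The growth can be made concrete with a small counting sketch. It does not call
PCRE. It assumes one particular counting convention: a "way" is an ordered list
of lengths taken by successive + repeats, covering any prefix of the subject,
with zero repetitions counted once. The function names are invented for the
example.

```c
/* Number of ways the group (a+) can be repeated so that exactly n characters
   are consumed: each repetition takes at least one character, so this counts
   the ordered sums of positive integers that add up to n. n == 0 is the
   single "zero repetitions" case. The recursion is exponential, which is
   acceptable here because only small n are of interest. */
unsigned long ways_exact(unsigned n)
{
    unsigned long total = 0;
    unsigned first;

    if (n == 0)
        return 1;
    for (first = 1; first <= n; first++)   /* length taken by the first a+ */
        total += ways_exact(n - first);
    return total;
}

/* Number of ways (a+)* can match some prefix (possibly empty) of a subject
   made of n "a" characters. */
unsigned long ways_any_prefix(unsigned n)
{
    unsigned long total = 0;
    unsigned k;

    for (k = 0; k <= n; k++)
        total += ways_exact(k);
    return total;
}
```

Under this convention, the result of ways_any_prefix() doubles each time the
subject grows by one character, which is why a failing match against even a
moderately long string can take so long.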
An optimization catches some of the more simple cases such as
(a+)*b
where a literal character follows. Before embarking on the standard matching
procedure, PCRE checks that there is a "b" later in the subject string, and if
there is not, it fails the match immediately. However, when there is no
following literal this optimization cannot be used. You can see the difference
by comparing the behaviour of
(a+)*\\d
with the pattern above. The pattern above, (a+)*b, gives a failure almost
instantly when applied to a whole line of "a" characters, whereas (a+)*\\d
takes an appreciable time with strings longer than about 20 characters.
.in 0
Last updated: 03 February 2003
.br
Copyright (c) 1997-2003 University of Cambridge.

@ -1,194 +0,0 @@
.TH PCRE 3
.SH NAME
PCRE - Perl-compatible regular expressions.
.SH SYNOPSIS OF POSIX API
.B #include <pcreposix.h>
.PP
.SM
.br
.B int regcomp(regex_t *\fIpreg\fR, const char *\fIpattern\fR,
.ti +5n
.B int \fIcflags\fR);
.PP
.br
.B int regexec(regex_t *\fIpreg\fR, const char *\fIstring\fR,
.ti +5n
.B size_t \fInmatch\fR, regmatch_t \fIpmatch\fR[], int \fIeflags\fR);
.PP
.br
.B size_t regerror(int \fIerrcode\fR, const regex_t *\fIpreg\fR,
.ti +5n
.B char *\fIerrbuf\fR, size_t \fIerrbuf_size\fR);
.PP
.br
.B void regfree(regex_t *\fIpreg\fR);
.SH DESCRIPTION
.rs
.sp
This set of functions provides a POSIX-style API to the PCRE regular expression
package. See the
.\" HREF
\fBpcreapi\fR
.\"
documentation for a description of the native API, which contains additional
functionality.
The functions described here are just wrapper functions that ultimately call
the PCRE native API. Their prototypes are defined in the \fBpcreposix.h\fR
header file, and on Unix systems the library itself is called
\fBpcreposix.a\fR, so can be accessed by adding \fB-lpcreposix\fR to the
command for linking an application which uses them. Because the POSIX functions
call the native ones, it is also necessary to add \fB-lpcre\fR.
I have implemented only those option bits that can be reasonably mapped to PCRE
native options. In addition, the options REG_EXTENDED and REG_NOSUB are defined
with the value zero. They have no effect, but since programs that are written
to the POSIX interface often use them, this makes it easier to slot in PCRE as
a replacement library. Other POSIX options are not even defined.
When PCRE is called via these functions, it is only the API that is POSIX-like
in style. The syntax and semantics of the regular expressions themselves are
still those of Perl, subject to the setting of various PCRE options, as
described below. "POSIX-like in style" means that the API approximates to the
POSIX definition; it is not fully POSIX-compatible, and in multi-byte encoding
domains it is probably even less compatible.
The header for these functions is supplied as \fBpcreposix.h\fR to avoid any
potential clash with other POSIX libraries. It can, of course, be renamed or
aliased as \fBregex.h\fR, which is the "correct" name. It provides two
structure types, \fIregex_t\fR for compiled internal forms, and
\fIregmatch_t\fR for returning captured substrings. It also defines some
constants whose names start with "REG_"; these are used for setting options and
identifying error codes.
.SH COMPILING A PATTERN
.rs
.sp
The function \fBregcomp()\fR is called to compile a pattern into an
internal form. The pattern is a C string terminated by a binary zero, and
is passed in the argument \fIpattern\fR. The \fIpreg\fR argument is a pointer
to a regex_t structure which is used as a base for storing information about
the compiled expression.
The argument \fIcflags\fR is either zero, or contains one or more of the bits
defined by the following macros:
REG_ICASE
The PCRE_CASELESS option is set when the expression is passed for compilation
to the native function.
REG_NEWLINE
The PCRE_MULTILINE option is set when the expression is passed for compilation
to the native function. Note that this does \fInot\fR mimic the defined POSIX
behaviour for REG_NEWLINE (see the following section).
In the absence of these flags, no options are passed to the native function.
This means that the regex is compiled with PCRE default semantics. In
particular, the way it handles newline characters in the subject string is the
Perl way, not the POSIX way. Note that setting PCRE_MULTILINE has only
\fIsome\fR of the effects specified for REG_NEWLINE. It does not affect the way
newlines are matched by . (they aren't) or by a negative class such as [^a]
(they are).
The yield of \fBregcomp()\fR is zero on success, and non-zero otherwise. The
\fIpreg\fR structure is filled in on success, and one member of the structure
is public: \fIre_nsub\fR contains the number of capturing subpatterns in
the regular expression. Various error codes are defined in the header file.
.SH MATCHING NEWLINE CHARACTERS
.rs
.sp
This area is not simple, because POSIX and Perl take different views of things.
It is not possible to get PCRE to obey POSIX semantics, but then PCRE was never
intended to be a POSIX engine. The following table lists the different
possibilities for matching newline characters in PCRE:
.sp
                            Default  Change with
.sp
  . matches newline          no       PCRE_DOTALL
  newline matches [^a]       yes      not changeable
  $ matches \\n at end        yes      PCRE_DOLLAR_ENDONLY
  $ matches \\n in middle     no       PCRE_MULTILINE
  ^ matches \\n in middle     no       PCRE_MULTILINE
.sp
This is the equivalent table for POSIX:
.sp
                            Default  Change with
.sp
  . matches newline          yes      REG_NEWLINE
  newline matches [^a]       yes      REG_NEWLINE
  $ matches \\n at end        no       REG_NEWLINE
  $ matches \\n in middle     no       REG_NEWLINE
  ^ matches \\n in middle     no       REG_NEWLINE
.sp
PCRE's behaviour is the same as Perl's, except that there is no equivalent for
PCRE_DOLLAR_ENDONLY in Perl. In both PCRE and Perl, there is no way to stop
newline from matching [^a].
The default POSIX newline handling can be obtained by setting PCRE_DOTALL and
PCRE_DOLLAR_ENDONLY, but there is no way to make PCRE behave exactly as for the
REG_NEWLINE action.
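The POSIX column of the comparison can be checked against a native POSIX
implementation. The sketch below deliberately uses the system's \fB<regex.h>\fR
rather than \fBpcreposix.h\fR, so it shows the behaviour that PCRE does
\fInot\fR reproduce. The helper name \fBposix_matches\fR is invented for the
example, and REG_EXTENDED is added because the system library needs it for
extended syntax.

```c
#include <regex.h>   /* the system POSIX implementation, not pcreposix.h */
#include <stddef.h>

/* Returns 1 if pattern matches somewhere in subject when compiled with the
   given extra cflags, 0 if it does not match, and -1 if it fails to compile. */
int posix_matches(const char *pattern, const char *subject, int cflags)
{
    regex_t preg;
    int rc;

    if (regcomp(&preg, pattern, cflags | REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&preg, subject, 0, NULL, 0);  /* no substrings requested */
    regfree(&preg);
    return rc == 0;
}
```

For the subject "a\\nb", the pattern "a.b" is expected to match without
REG_NEWLINE and to fail with it, while "a$" and "^b" are expected to match only
when REG_NEWLINE is set. The same calls made through the PCRE wrapper would
instead follow the PCRE table.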
.SH MATCHING A PATTERN
.rs
.sp
The function \fBregexec()\fR is called to match a pre-compiled pattern
\fIpreg\fR against a given \fIstring\fR, which is terminated by a zero byte,
subject to the options in \fIeflags\fR. These can be:
REG_NOTBOL
The PCRE_NOTBOL option is set when calling the underlying PCRE matching
function.
REG_NOTEOL
The PCRE_NOTEOL option is set when calling the underlying PCRE matching
function.
The portion of the string that was matched, and also any captured substrings,
are returned via the \fIpmatch\fR argument, which points to an array of
\fInmatch\fR structures of type \fIregmatch_t\fR, containing the members
\fIrm_so\fR and \fIrm_eo\fR. These contain the offset to the first character of
each substring and the offset to the first character after the end of each
substring, respectively. The 0th element of the vector relates to the entire
portion of \fIstring\fR that was matched; subsequent elements relate to the
capturing subpatterns of the regular expression. Unused entries in the array
have both structure members set to -1.
A successful match yields a zero return; various error codes are defined in the
header file, of which REG_NOMATCH is the "expected" failure code.
.SH ERROR MESSAGES
.rs
.sp
The \fBregerror()\fR function maps a non-zero error code from either
\fBregcomp()\fR or \fBregexec()\fR to a printable message. If \fIpreg\fR is not
NULL, the error should have arisen from the use of that structure. A message
terminated by a binary zero is placed in \fIerrbuf\fR. The length of the
message, including the zero, is limited to \fIerrbuf_size\fR. The yield of the
function is the size of buffer needed to hold the whole message.
.SH STORAGE
.rs
.sp
Compiling a regular expression causes memory to be allocated and associated
with the \fIpreg\fR structure. The function \fBregfree()\fR frees all such
memory, after which \fIpreg\fR may no longer be used as a compiled expression.
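The four functions described above fit together roughly as follows. This is a minimal sketch, assuming the system <regex.h> rather than PCRE's own pcreposix.h; the function names are the same, but REG_EXTENDED is a flag of the system library and may not be needed by the PCRE wrapper. The helper name posix_match_once() is hypothetical and exists only for illustration.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Compile `pattern`, match it once against `subject`, and free the
 * compiled form. Returns 0 on a match, REG_NOMATCH when nothing matched,
 * or another non-zero code on error. On any non-zero return a
 * zero-terminated message is written to errbuf, truncated to errbuf_size
 * bytes including the zero.
 *
 * Assumption: this uses the system <regex.h>; posix_match_once() is a
 * hypothetical helper, not part of PCRE or POSIX.
 */
int posix_match_once(const char *pattern, const char *subject,
                     regmatch_t *pmatch, size_t nmatch,
                     char *errbuf, size_t errbuf_size)
{
    regex_t preg;
    int rc = regcomp(&preg, pattern, REG_EXTENDED);

    if (rc != 0) {
        /* Compilation failed: nothing was allocated that we must free. */
        regerror(rc, &preg, errbuf, errbuf_size);
        return rc;
    }

    /* pmatch[0] describes the whole match, pmatch[1..] the subpatterns. */
    rc = regexec(&preg, subject, nmatch, pmatch, 0);
    if (rc != 0)
        regerror(rc, &preg, errbuf, errbuf_size);

    /* Release the memory associated with the compiled pattern. */
    regfree(&preg);
    return rc;
}
```

After a successful call, pmatch[i].rm_so and pmatch[i].rm_eo give the start offset and the offset just past the end of substring i, so its length is rm_eo - rm_so. Entries beyond the last capturing subpattern hold -1 in both members.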
.SH AUTHOR
.rs
.sp
Philip Hazel <ph10@cam.ac.uk>
.br
University Computing Service,
.br
Cambridge CB2 3QG, England.
.in 0
Last updated: 03 February 2003
.br
Copyright (c) 1997-2003 University of Cambridge.

@ -1,52 +0,0 @@
.TH PCRE 3
.SH NAME
PCRE - Perl-compatible regular expressions
.SH PCRE SAMPLE PROGRAM
.rs
.sp
A simple, complete demonstration program, to get you started with using PCRE,
is supplied in the file \fIpcredemo.c\fR in the PCRE distribution.
The program compiles the regular expression that is its first argument, and
matches it against the subject string in its second argument. No PCRE options
are set, and default character tables are used. If matching succeeds, the
program outputs the portion of the subject that matched, together with the
contents of any captured substrings.
If the -g option is given on the command line, the program then goes on to
check for further matches of the same regular expression in the same subject
string. The logic is a little bit tricky because of the possibility of matching
an empty string. Comments in the code explain what is going on.
On a Unix system that has PCRE installed in \fI/usr/local\fR, you can compile
the demonstration program using a command like this:
gcc -o pcredemo pcredemo.c -I/usr/local/include \\
-L/usr/local/lib -lpcre
Then you can run simple tests like this:
./pcredemo 'cat|dog' 'the cat sat on the mat'
./pcredemo -g 'cat|dog' 'the dog sat on the cat'
Note that there is a much more comprehensive test program, called
\fBpcretest\fR, which supports many more facilities for testing regular
expressions and the PCRE library. The \fBpcredemo\fR program is provided as a
simple coding example.
On some operating systems (e.g. Solaris) you may get an error like this when
you try to run \fBpcredemo\fR:
ld.so.1: a.out: fatal: libpcre.so.0: open failed: No such file or directory
This is caused by the way shared library support works on those systems. You
need to add
-R/usr/local/lib
to the compile command to get round this problem.
.in 0
Last updated: 28 January 2003
.br
Copyright (c) 1997-2003 University of Cambridge.

@ -1,364 +0,0 @@
.TH PCRETEST 1
.SH NAME
pcretest - a program for testing Perl-compatible regular expressions.
.SH SYNOPSIS
.B pcretest "[-d] [-i] [-m] [-o osize] [-p] [-t] [source] [destination]"
\fBpcretest\fR was written as a test program for the PCRE regular expression
library itself, but it can also be used for experimenting with regular
expressions. This document describes the features of the test program; for
details of the regular expressions themselves, see the
.\" HREF
\fBpcrepattern\fR
.\"
documentation. For details of PCRE and its options, see the
.\" HREF
\fBpcreapi\fR
.\"
documentation.
.SH OPTIONS
.rs
.sp
.TP 10
\fB-C\fR
Output the version number of the PCRE library, and all available information
about the optional features that are included, and then exit.
.TP 10
\fB-d\fR
Behave as if each regex had the \fB/D\fR modifier (see below); the internal
form is output after compilation.
.TP 10
\fB-i\fR
Behave as if each regex had the \fB/I\fR modifier; information about the
compiled pattern is given after compilation.
.TP 10
\fB-m\fR
Output the size of each compiled pattern after it has been compiled. This is
equivalent to adding /M to each regular expression. For compatibility with
earlier versions of pcretest, \fB-s\fR is a synonym for \fB-m\fR.
.TP 10
\fB-o\fR \fIosize\fR
Set the number of elements in the output vector that is used when calling PCRE
to be \fIosize\fR. The default value is 45, which is enough for 14 capturing
subexpressions. The vector size can be changed for individual matching calls by
including \\O in the data line (see below).
.TP 10
\fB-p\fR
Behave as if each regex had the \fB/P\fR modifier; the POSIX wrapper API is used
to call PCRE. None of the other options has any effect when \fB-p\fR is set.
.TP 10
\fB-t\fR
Run each compile, study, and match many times with a timer, and output the
resulting time per compile or match (in milliseconds). Do not set \fB-t\fR with
\fB-m\fR, because you will then get the size output 20000 times and the timing
will be distorted.
.SH DESCRIPTION
.rs
.sp
If \fBpcretest\fR is given two filename arguments, it reads from the first and
writes to the second. If it is given only one filename argument, it reads from
that file and writes to stdout. Otherwise, it reads from stdin and writes to
stdout, and prompts for each line of input, using "re>" to prompt for regular
expressions, and "data>" to prompt for data lines.
The program handles any number of sets of input on a single input file. Each
set starts with a regular expression, and continues with any number of data
lines to be matched against the pattern.
Each line is matched separately and independently. If you want to do
multiple-line matches, you have to use the \\n escape sequence in a single line
of input to encode the newline characters. The maximum length of a data line is
30,000 characters.
An empty line signals the end of the data lines, at which point a new regular
expression is read. The regular expressions are given enclosed in any
non-alphameric delimiters other than backslash, for example
/(a|bc)x+yz/
White space before the initial delimiter is ignored. A regular expression may
be continued over several input lines, in which case the newline characters are
included within it. It is possible to include the delimiter within the pattern
by escaping it, for example
/abc\\/def/
If you do so, the escape and the delimiter form part of the pattern, but since
delimiters are always non-alphameric, this does not affect its interpretation.
If the terminating delimiter is immediately followed by a backslash, for
example,
/abc/\\
then a backslash is added to the end of the pattern. This is done to provide a
way of testing the error condition that arises if a pattern finishes with a
backslash, because
/abc\\/
is interpreted as the first line of a pattern that starts with "abc/", causing
pcretest to read the next line as a continuation of the regular expression.
.SH PATTERN MODIFIERS
.rs
.sp
The pattern may be followed by \fBi\fR, \fBm\fR, \fBs\fR, or \fBx\fR to set the
PCRE_CASELESS, PCRE_MULTILINE, PCRE_DOTALL, or PCRE_EXTENDED options,
respectively. For example:
/caseless/i
These modifier letters have the same effect as they do in Perl. There are
others that set PCRE options that do not correspond to anything in Perl:
\fB/A\fR, \fB/E\fR, \fB/N\fR, \fB/U\fR, and \fB/X\fR set PCRE_ANCHORED,
PCRE_DOLLAR_ENDONLY, PCRE_NO_AUTO_CAPTURE, PCRE_UNGREEDY, and PCRE_EXTRA
respectively.
Searching for all possible matches within each subject string can be requested
by the \fB/g\fR or \fB/G\fR modifier. After finding a match, PCRE is called
again to search the remainder of the subject string. The difference between
\fB/g\fR and \fB/G\fR is that the former uses the \fIstartoffset\fR argument to
\fBpcre_exec()\fR to start searching at a new point within the entire string
(which is in effect what Perl does), whereas the latter passes over a shortened
substring. This makes a difference to the matching process if the pattern
begins with a lookbehind assertion (including \\b or \\B).
If any call to \fBpcre_exec()\fR in a \fB/g\fR or \fB/G\fR sequence matches an
empty string, the next call is done with the PCRE_NOTEMPTY and PCRE_ANCHORED
flags set in order to search for another, non-empty, match at the same point.
If this second match fails, the start offset is advanced by one, and the normal
match is retried. This imitates the way Perl handles such cases when using the
\fB/g\fR modifier or the \fBsplit()\fR function.
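The empty-match handling described above can be sketched as a loop. To keep the example self-contained, a toy matcher that understands only the pattern a* stands in for pcre_exec(), and TOY_NOTEMPTY and TOY_ANCHORED stand in for PCRE_NOTEMPTY and PCRE_ANCHORED. All names here are hypothetical; this is an illustration of the control flow, not the code used by pcretest.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical flags standing in for PCRE_NOTEMPTY and PCRE_ANCHORED. */
#define TOY_NOTEMPTY 0x1
#define TOY_ANCHORED 0x2

/*
 * Toy stand-in for pcre_exec() that implements only the pattern a*.
 * Searches subject[start..len] and stores the match bounds in *so and *eo.
 * Returns 1 on a match, 0 on no match.
 */
static int toy_exec(const char *subject, size_t len, size_t start,
                    int flags, size_t *so, size_t *eo)
{
    for (size_t pos = start; pos <= len; pos++) {
        size_t end = pos;
        while (end < len && subject[end] == 'a')
            end++;
        if (end > pos || !(flags & TOY_NOTEMPTY)) {
            *so = pos;
            *eo = end;
            return 1;
        }
        if (flags & TOY_ANCHORED)
            break;              /* anchored: only try the start position */
    }
    return 0;
}

/*
 * Find all matches in the manner described for /g. Stores up to `max`
 * (start, end) pairs in out[] and returns the number of matches found.
 */
size_t toy_match_all(const char *subject, size_t out[][2], size_t max)
{
    size_t len = strlen(subject);
    size_t offset = 0, count = 0, so, eo;
    int flags = 0;

    while (offset <= len && count < max) {
        if (!toy_exec(subject, len, offset, flags, &so, &eo)) {
            if (flags == 0)
                break;          /* a normal search failed: no more matches */
            offset++;           /* the retry failed: advance by one ... */
            flags = 0;          /* ... and go back to a normal search */
            continue;
        }
        out[count][0] = so;
        out[count][1] = eo;
        count++;
        offset = eo;
        /* After an empty match, retry at the same point for a non-empty one. */
        flags = (so == eo) ? (TOY_NOTEMPTY | TOY_ANCHORED) : 0;
    }
    return count;
}
```

Without the retry step, an empty match would leave the start offset unchanged and the loop would report the same empty match forever; advancing by one only after the anchored, non-empty retry fails avoids skipping a real match that begins at the same point.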
There are a number of other modifiers for controlling the way \fBpcretest\fR
operates.
The \fB/+\fR modifier requests that as well as outputting the substring that
matched the entire pattern, pcretest should in addition output the remainder of
the subject string. This is useful for tests where the subject contains
multiple copies of the same substring.
The \fB/L\fR modifier must be followed directly by the name of a locale, for
example,
/pattern/Lfr
For this reason, it must be the last modifier letter. The given locale is set,
\fBpcre_maketables()\fR is called to build a set of character tables for the
locale, and this is then passed to \fBpcre_compile()\fR when compiling the
regular expression. Without an \fB/L\fR modifier, NULL is passed as the tables
pointer; that is, \fB/L\fR applies only to the expression on which it appears.
The \fB/I\fR modifier requests that \fBpcretest\fR output information about the
compiled expression (whether it is anchored, has a fixed first character, and
so on). It does this by calling \fBpcre_fullinfo()\fR after compiling an
expression, and outputting the information it gets back. If the pattern is
studied, the results of that are also output.
The \fB/D\fR modifier is a PCRE debugging feature, which also assumes \fB/I\fR.
It causes the internal form of compiled regular expressions to be output after
compilation. If the pattern was studied, the information returned is also
output.
The \fB/S\fR modifier causes \fBpcre_study()\fR to be called after the
expression has been compiled, and the results used when the expression is
matched.
The \fB/M\fR modifier causes the size of the memory block used to hold the compiled
pattern to be output.
The \fB/P\fR modifier causes \fBpcretest\fR to call PCRE via the POSIX wrapper
API rather than its native API. When this is done, all other modifiers except
\fB/i\fR, \fB/m\fR, and \fB/+\fR are ignored. REG_ICASE is set if \fB/i\fR is
present, and REG_NEWLINE is set if \fB/m\fR is present. The wrapper functions
force PCRE_DOLLAR_ENDONLY always, and PCRE_DOTALL unless REG_NEWLINE is set.
The \fB/8\fR modifier causes \fBpcretest\fR to call PCRE with the PCRE_UTF8
option set. This turns on support for UTF-8 character handling in PCRE,
provided that it was compiled with this support enabled. This modifier also
causes any non-printing characters in output strings to be printed using the
\\x{hh...} notation if they are valid UTF-8 sequences.
If the \fB/?\fR modifier is used with \fB/8\fR, it causes \fBpcretest\fR to
call \fBpcre_compile()\fR with the PCRE_NO_UTF8_CHECK option, to suppress the
checking of the string for UTF-8 validity.
.SH CALLOUTS
.rs
.sp
If the pattern contains any callout requests, \fBpcretest\fR's callout function
will be called. By default, it displays the callout number, and the start and
current positions in the text at the callout time. For example, the output
  --->pqrabcdef
    0    ^  ^
indicates that callout number 0 occurred for a match attempt starting at the
fourth character of the subject string, when the pointer was at the seventh
character. The callout function returns zero (carry on matching) by default.
Inserting callouts may be helpful when using \fBpcretest\fR to check
complicated regular expressions. For further information about callouts, see
the
.\" HREF
\fBpcrecallout\fR
.\"
documentation.
For testing the PCRE library, additional control of callout behaviour is
available via escape sequences in the data, as described in the following
section. In particular, it is possible to pass in a number as callout data (the
default is zero). If the callout function receives a non-zero number, it
returns that value instead of zero.
.SH DATA LINES
.rs
.sp
Before each data line is passed to \fBpcre_exec()\fR, leading and trailing
whitespace is removed, and it is then scanned for \\ escapes. Some of these are
pretty esoteric features, intended for checking out some of the more
complicated features of PCRE. If you are just testing "ordinary" regular
expressions, you probably don't need any of these. The following escapes are
recognized:
  \\a         alarm (= BEL)
  \\b         backspace
  \\e         escape
  \\f         formfeed
  \\n         newline
  \\r         carriage return
  \\t         tab
  \\v         vertical tab
  \\nnn       octal character (up to 3 octal digits)
  \\xhh       hexadecimal character (up to 2 hex digits)
  \\x{hh...}  hexadecimal character, any number of digits
                in UTF-8 mode
  \\A         pass the PCRE_ANCHORED option to \fBpcre_exec()\fR
  \\B         pass the PCRE_NOTBOL option to \fBpcre_exec()\fR
  \\Cdd       call pcre_copy_substring() for substring dd
                after a successful match (any decimal number
                less than 32)
  \\Cname     call pcre_copy_named_substring() for substring
                "name" after a successful match (name
                terminated by next non-alphanumeric character)
  \\C+        show the current captured substrings at callout
                time
  \\C-        do not supply a callout function
  \\C!n       return 1 instead of 0 when callout number n is
                reached
  \\C!n!m     return 1 instead of 0 when callout number n is
                reached for the mth time
  \\C*n       pass the number n (may be negative) as callout
                data
  \\Gdd       call pcre_get_substring() for substring dd
                after a successful match (any decimal number
                less than 32)
  \\Gname     call pcre_get_named_substring() for substring
                "name" after a successful match (name
                terminated by next non-alphanumeric character)
  \\L         call pcre_get_substringlist() after a
                successful match
  \\M         discover the minimum MATCH_LIMIT setting
  \\N         pass the PCRE_NOTEMPTY option to \fBpcre_exec()\fR
  \\Odd       set the size of the output vector passed to
                \fBpcre_exec()\fR to dd (any number of decimal
                digits)
  \\S         output details of memory get/free calls during matching
  \\Z         pass the PCRE_NOTEOL option to \fBpcre_exec()\fR
  \\?         pass the PCRE_NO_UTF8_CHECK option to
                \fBpcre_exec()\fR
If \\M is present, \fBpcretest\fR calls \fBpcre_exec()\fR several times, with
different values in the \fImatch_limit\fR field of the \fBpcre_extra\fR data
structure, until it finds the minimum number that is needed for
\fBpcre_exec()\fR to complete. This number is a measure of the amount of
recursion and backtracking that takes place, and checking it out can be
instructive. For most simple matches, the number is quite small, but for
patterns with very large numbers of matching possibilities, it can become large
very quickly with increasing length of subject string.
When \\O is used, the value may be higher or lower than the size set by the
\fB-o\fR option (or defaulted to 45); \\O applies only to the call of \fBpcre_exec()\fR
for the line in which it appears.
A backslash followed by anything else just escapes the anything else. If the
very last character is a backslash, it is ignored. This gives a way of passing
an empty line as data, since a real empty line terminates the data input.
If \fB/P\fR was present on the regex, causing the POSIX wrapper API to be used,
only \fB\\B\fR and \fB\\Z\fR have any effect, causing REG_NOTBOL and REG_NOTEOL
to be passed to \fBregexec()\fR respectively.
The use of \\x{hh...} to represent UTF-8 characters is not dependent on the use
of the \fB/8\fR modifier on the pattern. It is recognized always. There may be
any number of hexadecimal digits inside the braces. The result is from one to
six bytes, encoded according to the UTF-8 rules.
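The "one to six bytes" figure comes from the original UTF-8 scheme, in which values up to 0x7fffffff are representable. The following sketch shows how a value given inside \\x{...} could be turned into such a byte sequence. The function name utf8_encode() is hypothetical, and the code is an illustration of the encoding rules rather than pcretest's own routine.

```c
#include <stddef.h>

/*
 * Encode code point `cp` using the original UTF-8 scheme, which allows
 * sequences of up to six bytes for values up to 0x7fffffff. Writes the
 * bytes to out[] (room for 6 is required) and returns the byte count.
 *
 * Assumptions: cp <= 0x7fffffff; utf8_encode() is a hypothetical helper.
 */
size_t utf8_encode(unsigned long cp, unsigned char out[6])
{
    /* Largest value representable in 1, 2, ... 6 bytes. */
    static const unsigned long limit[6] = {
        0x7fUL, 0x7ffUL, 0xffffUL, 0x1fffffUL, 0x3ffffffUL, 0x7fffffffUL
    };
    /* Marker bits of the leading byte for each sequence length. */
    static const unsigned char lead[6] = {
        0x00, 0xc0, 0xe0, 0xf0, 0xf8, 0xfc
    };
    size_t n = 0;

    while (n < 5 && cp > limit[n])
        n++;                        /* n = number of continuation bytes */

    /* Fill continuation bytes from the end, six payload bits at a time. */
    for (size_t i = n; i > 0; i--) {
        out[i] = (unsigned char)(0x80 | (cp & 0x3f));
        cp >>= 6;
    }
    out[0] = (unsigned char)(lead[n] | cp);
    return n + 1;
}
```

For instance, the value 0xe9 needs two bytes (0xc3 0xa9) and 0x20ac needs three (0xe2 0x82 0xac); only values above 0x3ffffff need the full six.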
.SH OUTPUT FROM PCRETEST
.rs
.sp
When a match succeeds, pcretest outputs the list of captured substrings that
\fBpcre_exec()\fR returns, starting with number 0 for the string that matched
the whole pattern. Here is an example of an interactive pcretest run.
$ pcretest
PCRE version 4.00 08-Jan-2003
re> /^abc(\\d+)/
data> abc123
0: abc123
1: 123
data> xyz
No match
If the strings contain any non-printing characters, they are output as \\xhh
escapes, or as \\x{...} escapes if the \fB/8\fR modifier was present on the
pattern. If the pattern has the \fB/+\fR modifier, then the output for
substring 0 is followed by the rest of the subject string, identified by
"0+" like this:
re> /cat/+
data> cataract
0: cat
0+ aract
If the pattern has the \fB/g\fR or \fB/G\fR modifier, the results of successive
matching attempts are output in sequence, like this:
re> /\\Bi(\\w\\w)/g
data> Mississippi
0: iss
1: ss
0: iss
1: ss
0: ipp
1: pp
"No match" is output only if the first match attempt fails.
If any of the sequences \fB\\C\fR, \fB\\G\fR, or \fB\\L\fR are present in a
data line that is successfully matched, the substrings extracted by the
convenience functions are output with C, G, or L after the string number
instead of a colon. This is in addition to the normal full list. The string
length (that is, the return from the extraction function) is given in
parentheses after each string for \fB\\C\fR and \fB\\G\fR.
Note that while patterns can be continued over several lines (a plain ">"
prompt is used for continuations), data lines may not. However, newlines can be
included in data by means of the \\n escape.
.SH AUTHOR
.rs
.sp
Philip Hazel <ph10@cam.ac.uk>
.br
University Computing Service,
.br
Cambridge CB2 3QG, England.
.in 0
Last updated: 09 December 2003
.br
Copyright (c) 1997-2003 University of Cambridge.

@ -1,357 +0,0 @@
PCRETEST(1) PCRETEST(1)
NAME
pcretest - a program for testing Perl-compatible regular expressions.
SYNOPSIS
pcretest [-d] [-i] [-m] [-o osize] [-p] [-t] [source] [destination]
pcretest was written as a test program for the PCRE regular expression
library itself, but it can also be used for experimenting with regular
expressions. This document describes the features of the test program;
for details of the regular expressions themselves, see the pcrepattern
documentation. For details of PCRE and its options, see the pcreapi
documentation.
OPTIONS
-C Output the version number of the PCRE library, and all avail-
able information about the optional features that are
included, and then exit.
-d Behave as if each regex had the /D modifier (see below); the
internal form is output after compilation.
-i Behave as if each regex had the /I modifier; information
about the compiled pattern is given after compilation.
-m Output the size of each compiled pattern after it has been
compiled. This is equivalent to adding /M to each regular
expression. For compatibility with earlier versions of
pcretest, -s is a synonym for -m.
-o osize Set the number of elements in the output vector that is used
when calling PCRE to be osize. The default value is 45, which
is enough for 14 capturing subexpressions. The vector size
can be changed for individual matching calls by including \O
in the data line (see below).
-p Behave as if each regex had the /P modifier; the POSIX wrapper
API is used to call PCRE. None of the other options has any
effect when -p is set.
-t Run each compile, study, and match many times with a timer,
and output the resulting time per compile or match (in
milliseconds). Do not set -t with -m, because you will then get the
size output 20000 times and the timing will be distorted.
DESCRIPTION
If pcretest is given two filename arguments, it reads from the first
and writes to the second. If it is given only one filename argument, it
reads from that file and writes to stdout. Otherwise, it reads from
stdin and writes to stdout, and prompts for each line of input, using
"re>" to prompt for regular expressions, and "data>" to prompt for data
lines.
The program handles any number of sets of input on a single input file.
Each set starts with a regular expression, and continues with any num-
ber of data lines to be matched against the pattern.
Each line is matched separately and independently. If you want to do
multiple-line matches, you have to use the \n escape sequence in a sin-
gle line of input to encode the newline characters. The maximum length
of a data line is 30,000 characters.
An empty line signals the end of the data lines, at which point a new
regular expression is read. The regular expressions are given enclosed
in any non-alphameric delimiters other than backslash, for example
/(a|bc)x+yz/
White space before the initial delimiter is ignored. A regular expres-
sion may be continued over several input lines, in which case the new-
line characters are included within it. It is possible to include the
delimiter within the pattern by escaping it, for example
/abc\/def/
If you do so, the escape and the delimiter form part of the pattern,
but since delimiters are always non-alphameric, this does not affect
its interpretation. If the terminating delimiter is immediately fol-
lowed by a backslash, for example,
/abc/\
then a backslash is added to the end of the pattern. This is done to
provide a way of testing the error condition that arises if a pattern
finishes with a backslash, because
/abc\/
is interpreted as the first line of a pattern that starts with "abc/",
causing pcretest to read the next line as a continuation of the regular
expression.
PATTERN MODIFIERS
The pattern may be followed by i, m, s, or x to set the PCRE_CASELESS,
PCRE_MULTILINE, PCRE_DOTALL, or PCRE_EXTENDED options, respectively.
For example:
/caseless/i
These modifier letters have the same effect as they do in Perl. There
are others that set PCRE options that do not correspond to anything in
Perl: /A, /E, /N, /U, and /X set PCRE_ANCHORED, PCRE_DOLLAR_ENDONLY,
PCRE_NO_AUTO_CAPTURE, PCRE_UNGREEDY, and PCRE_EXTRA respectively.
Searching for all possible matches within each subject string can be
requested by the /g or /G modifier. After finding a match, PCRE is
called again to search the remainder of the subject string. The differ-
ence between /g and /G is that the former uses the startoffset argument
to pcre_exec() to start searching at a new point within the entire
string (which is in effect what Perl does), whereas the latter passes
over a shortened substring. This makes a difference to the matching
process if the pattern begins with a lookbehind assertion (including \b
or \B).
If any call to pcre_exec() in a /g or /G sequence matches an empty
string, the next call is done with the PCRE_NOTEMPTY and PCRE_ANCHORED
flags set in order to search for another, non-empty, match at the same
point. If this second match fails, the start offset is advanced by
one, and the normal match is retried. This imitates the way Perl han-
dles such cases when using the /g modifier or the split() function.
There are a number of other modifiers for controlling the way pcretest
operates.
The /+ modifier requests that as well as outputting the substring that
matched the entire pattern, pcretest should in addition output the
remainder of the subject string. This is useful for tests where the
subject contains multiple copies of the same substring.
The /L modifier must be followed directly by the name of a locale, for
example,
/pattern/Lfr
For this reason, it must be the last modifier letter. The given locale
is set, pcre_maketables() is called to build a set of character tables
for the locale, and this is then passed to pcre_compile() when compil-
ing the regular expression. Without an /L modifier, NULL is passed as
the tables pointer; that is, /L applies only to the expression on which
it appears.
The /I modifier requests that pcretest output information about the
compiled expression (whether it is anchored, has a fixed first charac-
ter, and so on). It does this by calling pcre_fullinfo() after compil-
ing an expression, and outputting the information it gets back. If the
pattern is studied, the results of that are also output.
The /D modifier is a PCRE debugging feature, which also assumes /I. It
causes the internal form of compiled regular expressions to be output
after compilation. If the pattern was studied, the information returned
is also output.
The /S modifier causes pcre_study() to be called after the expression
has been compiled, and the results used when the expression is matched.
The /M modifier causes the size of the memory block used to hold the
compiled pattern to be output.
The /P modifier causes pcretest to call PCRE via the POSIX wrapper API
rather than its native API. When this is done, all other modifiers
except /i, /m, and /+ are ignored. REG_ICASE is set if /i is present,
and REG_NEWLINE is set if /m is present. The wrapper functions force
PCRE_DOLLAR_ENDONLY always, and PCRE_DOTALL unless REG_NEWLINE is set.
The /8 modifier causes pcretest to call PCRE with the PCRE_UTF8 option
set. This turns on support for UTF-8 character handling in PCRE, pro-
vided that it was compiled with this support enabled. This modifier
also causes any non-printing characters in output strings to be printed
using the \x{hh...} notation if they are valid UTF-8 sequences.
If the /? modifier is used with /8, it causes pcretest to call
pcre_compile() with the PCRE_NO_UTF8_CHECK option, to suppress the
checking of the string for UTF-8 validity.
CALLOUTS
If the pattern contains any callout requests, pcretest's callout func-
tion will be called. By default, it displays the callout number, and
the start and current positions in the text at the callout time. For
example, the output
  --->pqrabcdef
    0    ^  ^
indicates that callout number 0 occurred for a match attempt starting
at the fourth character of the subject string, when the pointer was at
the seventh character. The callout function returns zero (carry on
matching) by default.
Inserting callouts may be helpful when using pcretest to check compli-
cated regular expressions. For further information about callouts, see
the pcrecallout documentation.
For testing the PCRE library, additional control of callout behaviour
is available via escape sequences in the data, as described in the fol-
lowing section. In particular, it is possible to pass in a number as
callout data (the default is zero). If the callout function receives a
non-zero number, it returns that value instead of zero.
DATA LINES
Before each data line is passed to pcre_exec(), leading and trailing
whitespace is removed, and it is then scanned for \ escapes. Some of
these are pretty esoteric features, intended for checking out some of
the more complicated features of PCRE. If you are just testing "ordi-
nary" regular expressions, you probably don't need any of these. The
following escapes are recognized:
  \a         alarm (= BEL)
  \b         backspace
  \e         escape
  \f         formfeed
  \n         newline
  \r         carriage return
  \t         tab
  \v         vertical tab
  \nnn       octal character (up to 3 octal digits)
  \xhh       hexadecimal character (up to 2 hex digits)
  \x{hh...}  hexadecimal character, any number of digits
               in UTF-8 mode
  \A         pass the PCRE_ANCHORED option to pcre_exec()
  \B         pass the PCRE_NOTBOL option to pcre_exec()
  \Cdd       call pcre_copy_substring() for substring dd
               after a successful match (any decimal number
               less than 32)
  \Cname     call pcre_copy_named_substring() for substring
               "name" after a successful match (name
               terminated by next non-alphanumeric character)
  \C+        show the current captured substrings at callout
               time
  \C-        do not supply a callout function
  \C!n       return 1 instead of 0 when callout number n is
               reached
  \C!n!m     return 1 instead of 0 when callout number n is
               reached for the mth time
  \C*n       pass the number n (may be negative) as callout
               data
  \Gdd       call pcre_get_substring() for substring dd
               after a successful match (any decimal number
               less than 32)
  \Gname     call pcre_get_named_substring() for substring
               "name" after a successful match (name
               terminated by next non-alphanumeric character)
  \L         call pcre_get_substringlist() after a
               successful match
  \M         discover the minimum MATCH_LIMIT setting
  \N         pass the PCRE_NOTEMPTY option to pcre_exec()
  \Odd       set the size of the output vector passed to
               pcre_exec() to dd (any number of decimal
               digits)
  \S         output details of memory get/free calls during matching
  \Z         pass the PCRE_NOTEOL option to pcre_exec()
  \?         pass the PCRE_NO_UTF8_CHECK option to
               pcre_exec()
If \M is present, pcretest calls pcre_exec() several times, with dif-
ferent values in the match_limit field of the pcre_extra data struc-
ture, until it finds the minimum number that is needed for pcre_exec()
to complete. This number is a measure of the amount of recursion and
backtracking that takes place, and checking it out can be instructive.
For most simple matches, the number is quite small, but for patterns
with very large numbers of matching possibilities, it can become large
very quickly with increasing length of subject string.
When \O is used, the value may be higher or lower than the size set by the -o
option (or defaulted to 45); \O applies only to the call of pcre_exec()
for the line in which it appears.
A backslash followed by anything else just escapes the anything else.
If the very last character is a backslash, it is ignored. This gives a
way of passing an empty line as data, since a real empty line termi-
nates the data input.
If /P was present on the regex, causing the POSIX wrapper API to be
used, only \B and \Z have any effect, causing REG_NOTBOL and REG_NOTEOL
to be passed to regexec() respectively.
The use of \x{hh...} to represent UTF-8 characters is not dependent on
the use of the /8 modifier on the pattern. It is recognized always.
There may be any number of hexadecimal digits inside the braces. The
result is from one to six bytes, encoded according to the UTF-8 rules.
OUTPUT FROM PCRETEST
When a match succeeds, pcretest outputs the list of captured substrings
that pcre_exec() returns, starting with number 0 for the string that
matched the whole pattern. Here is an example of an interactive
pcretest run.
$ pcretest
PCRE version 4.00 08-Jan-2003
re> /^abc(\d+)/
data> abc123
0: abc123
1: 123
data> xyz
No match
If the strings contain any non-printing characters, they are output as
\xhh escapes, or as \x{...} escapes if the /8 modifier was present on
the pattern. If the pattern has the /+ modifier, then the output for
substring 0 is followed by the rest of the subject string,
identified by "0+" like this:
re> /cat/+
data> cataract
0: cat
0+ aract
If the pattern has the /g or /G modifier, the results of successive
matching attempts are output in sequence, like this:
re> /\Bi(\w\w)/g
data> Mississippi
0: iss
1: ss
0: iss
1: ss
0: ipp
1: pp
"No match" is output only if the first match attempt fails.
If any of the sequences \C, \G, or \L are present in a data line that
is successfully matched, the substrings extracted by the convenience
functions are output with C, G, or L after the string number instead of
a colon. This is in addition to the normal full list. The string length
(that is, the return from the extraction function) is given in paren-
theses after each string for \C and \G.
Note that while patterns can be continued over several lines (a plain
">" prompt is used for continuations), data lines may not. However,
newlines can be included in data by means of the \n escape.
AUTHOR
Philip Hazel <ph10@cam.ac.uk>
University Computing Service,
Cambridge CB2 3QG, England.
Last updated: 09 December 2003
Copyright (c) 1997-2003 University of Cambridge.

@ -1,34 +0,0 @@
The perltest program
--------------------
The perltest program tests Perl's regular expressions; it has the same
specification as pcretest, and so can be given identical input, except that
input patterns can be followed only by Perl's lower case modifiers and /+ (as
used by pcretest), which is recognized and handled by the program.
The data lines are processed as Perl double-quoted strings, so if they contain
" \ $ or @ characters, these have to be escaped. For this reason, all such
characters in testinput1 and testinput3 are escaped so that they can be used
for perltest as well as for pcretest, and the special upper case modifiers such
as /A that pcretest recognizes are not used in these files. The output should
be identical, apart from the initial identifying banner.
The perltest script can also test UTF-8 features. It works as is for Perl 5.8
or higher. It recognizes the special modifier /8 that pcretest uses to invoke
UTF-8 functionality. The testinput5 file can be fed to perltest to run UTF-8
tests.
For Perl 5.6, perltest won't work unmodified for the UTF-8 tests. You need to
uncomment the "use utf8" lines that it contains. It is best to do this on a
copy of the script, because for non-UTF-8 tests, these lines should remain
commented out.
The testinput2 and testinput4 files are not suitable for feeding to perltest,
since they do make use of the special upper case modifiers and escapes that
pcretest uses to test some features of PCRE. The first of these files also
contains malformed regular expressions, in order to check that PCRE diagnoses
them correctly. Similarly, testinput6 tests UTF-8 features that do not relate
to Perl.
Philip Hazel <ph10@cam.ac.uk>
August 2002


@ -1,251 +0,0 @@
#!/bin/sh
#
# install - install a program, script, or datafile
# This comes from X11R5 (mit/util/scripts/install.sh).
#
# Copyright 1991 by the Massachusetts Institute of Technology
#
# Permission to use, copy, modify, distribute, and sell this software and its
# documentation for any purpose is hereby granted without fee, provided that
# the above copyright notice appear in all copies and that both that
# copyright notice and this permission notice appear in supporting
# documentation, and that the name of M.I.T. not be used in advertising or
# publicity pertaining to distribution of the software without specific,
# written prior permission. M.I.T. makes no representations about the
# suitability of this software for any purpose. It is provided "as is"
# without express or implied warranty.
#
# Calling this script install-sh is preferred over install.sh, to prevent
# `make' implicit rules from creating a file called install from it
# when there is no Makefile.
#
# This script is compatible with the BSD install script, but was written
# from scratch. It can only install one file at a time, a restriction
# shared with many OS's install programs.
# set DOITPROG to echo to test this script
# Don't use :- since 4.3BSD and earlier shells don't like it.
doit="${DOITPROG-}"
# put in absolute paths if you don't have them in your path; or use env. vars.
mvprog="${MVPROG-mv}"
cpprog="${CPPROG-cp}"
chmodprog="${CHMODPROG-chmod}"
chownprog="${CHOWNPROG-chown}"
chgrpprog="${CHGRPPROG-chgrp}"
stripprog="${STRIPPROG-strip}"
rmprog="${RMPROG-rm}"
mkdirprog="${MKDIRPROG-mkdir}"
transformbasename=""
transformarg=""
instcmd="$mvprog"
chmodcmd="$chmodprog 0755"
chowncmd=""
chgrpcmd=""
stripcmd=""
rmcmd="$rmprog -f"
mvcmd="$mvprog"
src=""
dst=""
dir_arg=""
while [ x"$1" != x ]; do
case $1 in
-c) instcmd="$cpprog"
shift
continue;;
-d) dir_arg=true
shift
continue;;
-m) chmodcmd="$chmodprog $2"
shift
shift
continue;;
-o) chowncmd="$chownprog $2"
shift
shift
continue;;
-g) chgrpcmd="$chgrpprog $2"
shift
shift
continue;;
-s) stripcmd="$stripprog"
shift
continue;;
-t=*) transformarg=`echo $1 | sed 's/-t=//'`
shift
continue;;
-b=*) transformbasename=`echo $1 | sed 's/-b=//'`
shift
continue;;
*) if [ x"$src" = x ]
then
src=$1
else
# this colon is to work around a 386BSD /bin/sh bug
:
dst=$1
fi
shift
continue;;
esac
done
if [ x"$src" = x ]
then
echo "install: no input file specified"
exit 1
else
true
fi
if [ x"$dir_arg" != x ]; then
dst=$src
src=""
if [ -d $dst ]; then
instcmd=:
chmodcmd=""
else
instcmd=mkdir
fi
else
# Waiting for this to be detected by the "$instcmd $src $dsttmp" command
# might cause directories to be created, which would be especially bad
# if $src (and thus $dsttmp) contains '*'.
if [ -f $src -o -d $src ]
then
true
else
echo "install: $src does not exist"
exit 1
fi
if [ x"$dst" = x ]
then
echo "install: no destination specified"
exit 1
else
true
fi
# If destination is a directory, append the input filename; if your system
# does not like double slashes in filenames, you may need to add some logic
if [ -d $dst ]
then
dst="$dst"/`basename $src`
else
true
fi
fi
## this sed command emulates the dirname command
dstdir=`echo $dst | sed -e 's,[^/]*$,,;s,/$,,;s,^$,.,'`
# Make sure that the destination directory exists.
# this part is taken from Noah Friedman's mkinstalldirs script
# Skip lots of stat calls in the usual case.
if [ ! -d "$dstdir" ]; then
defaultIFS='
'
IFS="${IFS-${defaultIFS}}"
oIFS="${IFS}"
# Some sh's can't handle IFS=/ for some reason.
IFS='%'
set - `echo ${dstdir} | sed -e 's@/@%@g' -e 's@^%@/@'`
IFS="${oIFS}"
pathcomp=''
while [ $# -ne 0 ] ; do
pathcomp="${pathcomp}${1}"
shift
if [ ! -d "${pathcomp}" ] ;
then
$mkdirprog "${pathcomp}"
else
true
fi
pathcomp="${pathcomp}/"
done
fi
if [ x"$dir_arg" != x ]
then
$doit $instcmd $dst &&
if [ x"$chowncmd" != x ]; then $doit $chowncmd $dst; else true ; fi &&
if [ x"$chgrpcmd" != x ]; then $doit $chgrpcmd $dst; else true ; fi &&
if [ x"$stripcmd" != x ]; then $doit $stripcmd $dst; else true ; fi &&
if [ x"$chmodcmd" != x ]; then $doit $chmodcmd $dst; else true ; fi
else
# If we're going to rename the final executable, determine the name now.
if [ x"$transformarg" = x ]
then
dstfile=`basename $dst`
else
dstfile=`basename $dst $transformbasename |
sed $transformarg`$transformbasename
fi
# don't allow the sed command to completely eliminate the filename
if [ x"$dstfile" = x ]
then
dstfile=`basename $dst`
else
true
fi
# Make a temp file name in the proper directory.
dsttmp=$dstdir/#inst.$$#
# Move or copy the file name to the temp name
$doit $instcmd $src $dsttmp &&
trap "rm -f ${dsttmp}" 0 &&
# and set any options; do chmod last to preserve setuid bits
# If any of these fail, we abort the whole thing. If we want to
# ignore errors from any of these, just make sure not to ignore
# errors from the above "$doit $instcmd $src $dsttmp" command.
if [ x"$chowncmd" != x ]; then $doit $chowncmd $dsttmp; else true;fi &&
if [ x"$chgrpcmd" != x ]; then $doit $chgrpcmd $dsttmp; else true;fi &&
if [ x"$stripcmd" != x ]; then $doit $stripcmd $dsttmp; else true;fi &&
if [ x"$chmodcmd" != x ]; then $doit $chmodcmd $dsttmp; else true;fi &&
# Now rename the file to the real destination.
$doit $rmcmd -f $dstdir/$dstfile &&
$doit $mvcmd $dsttmp $dstdir/$dstfile
fi &&
exit 0
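
The dirname emulation used by the script above can be tried on its own. This
is a minimal sketch that wraps the same sed expression in a function; the name
emulated_dirname is an assumption made for illustration and does not appear in
install-sh.

```sh
#!/bin/sh
# Same sed expression as in install-sh; only the wrapper function is new.
#   s,[^/]*$,,   drop the last path component
#   s,/$,,       drop the trailing slash that is left behind
#   s,^$,.,      an empty result means the current directory
emulated_dirname() {
    printf '%s\n' "$1" | sed -e 's,[^/]*$,,;s,/$,,;s,^$,.,'
}

emulated_dirname /usr/local/bin/pcretest   # prints: /usr/local/bin
emulated_dirname pcretest                  # prints: .
```

One difference from dirname(1) is worth knowing: for a file directly under the
root directory, such as /vmunix, the second substitution removes the only
slash, so the result is "." rather than "/".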


@ -1,19 +0,0 @@
LIBRARY libpcre
EXPORTS
pcre_malloc
pcre_free
pcre_config
pcre_callout
pcre_compile
pcre_copy_substring
pcre_exec
pcre_get_substring
pcre_get_stringnumber
pcre_get_substring_list
pcre_free_substring
pcre_free_substring_list
pcre_info
pcre_fullinfo
pcre_maketables
pcre_study
pcre_version


@ -1,24 +0,0 @@
LIBRARY libpcreposix
EXPORTS
pcre_malloc
pcre_free
pcre_config
pcre_callout
pcre_compile
pcre_copy_substring
pcre_exec
pcre_get_substring
pcre_get_stringnumber
pcre_get_substring_list
pcre_free_substring
pcre_free_substring_list
pcre_info
pcre_fullinfo
pcre_maketables
pcre_study
pcre_version
regcomp
regexec
regerror
regfree

File diff suppressed because it is too large


@ -1,25 +0,0 @@
@echo off
REM This file was contributed by Alexander Tokarev for building PCRE for use
REM with Virtual Pascal. It has not been tested with the latest PCRE release.
REM CHANGE THIS FOR YOUR BORLAND C++ COMPILER PATH
SET BORLAND=c:\usr\apps\bcc55
sh configure
bcc32 -DDFTABLES -DSTATIC -DVPCOMPAT -I%BORLAND%\include -L%BORLAND%\lib dftables.c
dftables > chartables.c
bcc32 -c -RT- -y- -v- -u- -P- -O2 -5 -DSTATIC -DVPCOMPAT -UDFTABLES -I%BORLAND%\include get.c maketables.c pcre.c study.c
tlib %BORLAND%\lib\cw32.lib *calloc *del *strncmp *memcpy *memmove *memset
tlib pcre.lib +get.obj +maketables.obj +pcre.obj +study.obj +calloc.obj +del.obj +strncmp.obj +memcpy.obj +memmove.obj +memset.obj
del *.obj *.exe *.tds *.bak >nul 2>nul
echo ---
echo Now the library should be complete. Please check all messages above.
echo You can ignore any warnings.


@ -1,40 +0,0 @@
#! /bin/sh
# mkinstalldirs --- make directory hierarchy
# Author: Noah Friedman <friedman@prep.ai.mit.edu>
# Created: 1993-05-16
# Public domain
# $Id: mkinstalldirs,v 1.12.2.1 1998/12/26 17:32:14 bje Exp $
errstatus=0
for file
do
set fnord `echo ":$file" | sed -ne 's/^:\//#/;s/^://;s/\// /g;s/^#/\//;p'`
shift
pathcomp=
for d
do
pathcomp="$pathcomp$d"
case "$pathcomp" in
-* ) pathcomp=./$pathcomp ;;
esac
if test ! -d "$pathcomp"; then
echo "mkdir $pathcomp"
mkdir "$pathcomp" || lasterr=$?
if test ! -d "$pathcomp"; then
errstatus=$lasterr
fi
fi
pathcomp="$pathcomp/"
done
done
exit $errstatus
# mkinstalldirs ends here
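
The way mkinstalldirs walks a path can be seen without creating anything. The
sketch below reuses the script's own sed expression and loop, but prints each
cumulative prefix instead of calling mkdir; the function name
list_path_prefixes is hypothetical and is not part of mkinstalldirs.

```sh
#!/bin/sh
# Dry-run version of the mkinstalldirs loop (illustrative only).
list_path_prefixes() {
    # The leading ":" protects names that begin with "-" from echo, and
    # "fnord" keeps "set" from treating the first component as an option.
    set fnord `echo ":$1" | sed -ne 's/^:\//#/;s/^://;s/\// /g;s/^#/\//;p'`
    shift
    pathcomp=
    for d
    do
        pathcomp="$pathcomp$d"
        case "$pathcomp" in
            -* ) pathcomp=./$pathcomp ;;
        esac
        echo "$pathcomp"          # mkinstalldirs would test and mkdir here
        pathcomp="$pathcomp/"
    done
}

list_path_prefixes /usr/local/lib
```

For /usr/local/lib this prints /usr, /usr/local and /usr/local/lib, one per
line. Because the components are separated by word splitting, a path that
contains spaces is not handled, which is a limitation the sketch shares with
the original loop.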


@ -1,22 +0,0 @@
EXPORTS
pcre_malloc DATA
pcre_free DATA
pcre_compile
pcre_copy_substring
pcre_exec
pcre_get_substring
pcre_get_substring_list
pcre_free_substring
pcre_free_substring_list
pcre_info
pcre_fullinfo
pcre_maketables
pcre_study
pcre_version
regcomp
regexec
regerror
regfree


@ -1,193 +0,0 @@
/*************************************************
* Perl-Compatible Regular Expressions *
*************************************************/
/* Copyright (c) 1997-2003 University of Cambridge */
#ifndef _PCRE_H
#define _PCRE_H
/* The file pcre.h is built by "configure". Do not edit it; instead
make changes to pcre.in. */
#define PCRE_MAJOR @PCRE_MAJOR@
#define PCRE_MINOR @PCRE_MINOR@
#define PCRE_DATE @PCRE_DATE@
/* Win32 uses DLL by default */
#ifdef _WIN32
# ifdef PCRE_DEFINITION
# ifdef DLL_EXPORT
# define PCRE_DATA_SCOPE __declspec(dllexport)
# endif
# else
# ifndef PCRE_STATIC
# define PCRE_DATA_SCOPE extern __declspec(dllimport)
# endif
# endif
#endif
#ifndef PCRE_DATA_SCOPE
# define PCRE_DATA_SCOPE extern
#endif
/* Have to include stdlib.h in order to ensure that size_t is defined;
it is needed here for malloc. */
#include <stdlib.h>
/* Allow for C++ users */
#ifdef __cplusplus
extern "C" {
#endif
/* Options */
#define PCRE_CASELESS 0x0001
#define PCRE_MULTILINE 0x0002
#define PCRE_DOTALL 0x0004
#define PCRE_EXTENDED 0x0008
#define PCRE_ANCHORED 0x0010
#define PCRE_DOLLAR_ENDONLY 0x0020
#define PCRE_EXTRA 0x0040
#define PCRE_NOTBOL 0x0080
#define PCRE_NOTEOL 0x0100
#define PCRE_UNGREEDY 0x0200
#define PCRE_NOTEMPTY 0x0400
#define PCRE_UTF8 0x0800
#define PCRE_NO_AUTO_CAPTURE 0x1000
#define PCRE_NO_UTF8_CHECK 0x2000
/* Exec-time and get/set-time error codes */
#define PCRE_ERROR_NOMATCH (-1)
#define PCRE_ERROR_NULL (-2)
#define PCRE_ERROR_BADOPTION (-3)
#define PCRE_ERROR_BADMAGIC (-4)
#define PCRE_ERROR_UNKNOWN_NODE (-5)
#define PCRE_ERROR_NOMEMORY (-6)
#define PCRE_ERROR_NOSUBSTRING (-7)
#define PCRE_ERROR_MATCHLIMIT (-8)
#define PCRE_ERROR_CALLOUT (-9) /* Never used by PCRE itself */
#define PCRE_ERROR_BADUTF8 (-10)
#define PCRE_ERROR_BADUTF8_OFFSET (-11)
/* Request types for pcre_fullinfo() */
#define PCRE_INFO_OPTIONS 0
#define PCRE_INFO_SIZE 1
#define PCRE_INFO_CAPTURECOUNT 2
#define PCRE_INFO_BACKREFMAX 3
#define PCRE_INFO_FIRSTBYTE 4
#define PCRE_INFO_FIRSTCHAR 4 /* For backwards compatibility */
#define PCRE_INFO_FIRSTTABLE 5
#define PCRE_INFO_LASTLITERAL 6
#define PCRE_INFO_NAMEENTRYSIZE 7
#define PCRE_INFO_NAMECOUNT 8
#define PCRE_INFO_NAMETABLE 9
#define PCRE_INFO_STUDYSIZE 10
/* Request types for pcre_config() */
#define PCRE_CONFIG_UTF8 0
#define PCRE_CONFIG_NEWLINE 1
#define PCRE_CONFIG_LINK_SIZE 2
#define PCRE_CONFIG_POSIX_MALLOC_THRESHOLD 3
#define PCRE_CONFIG_MATCH_LIMIT 4
#define PCRE_CONFIG_STACKRECURSE 5
/* Bit flags for the pcre_extra structure */
#define PCRE_EXTRA_STUDY_DATA 0x0001
#define PCRE_EXTRA_MATCH_LIMIT 0x0002
#define PCRE_EXTRA_CALLOUT_DATA 0x0004
/* Types */
struct real_pcre; /* declaration; the definition is private */
typedef struct real_pcre pcre;
/* The structure for passing additional data to pcre_exec(). This is defined in
such a way as to be extensible. */
typedef struct pcre_extra {
unsigned long int flags; /* Bits for which fields are set */
void *study_data; /* Opaque data from pcre_study() */
unsigned long int match_limit; /* Maximum number of calls to match() */
void *callout_data; /* Data passed back in callouts */
} pcre_extra;
/* The structure for passing out data via the pcre_callout_function. We use a
structure so that new fields can be added on the end in future versions,
without changing the API of the function, thereby allowing old clients to work
without modification. */
typedef struct pcre_callout_block {
int version; /* Identifies version of block */
/* ------------------------ Version 0 ------------------------------- */
int callout_number; /* Number compiled into pattern */
int *offset_vector; /* The offset vector */
const char *subject; /* The subject being matched */
int subject_length; /* The length of the subject */
int start_match; /* Offset to start of this match attempt */
int current_position; /* Where we currently are */
int capture_top; /* Max current capture */
int capture_last; /* Most recently closed capture */
void *callout_data; /* Data passed in with the call */
/* ------------------------------------------------------------------ */
} pcre_callout_block;
/* Indirection for store get and free functions. These can be set to
alternative malloc/free functions if required. Special ones are used in the
non-recursive case for "frames". There is also an optional callout function
that is triggered by the (?C) regex item. Some magic is required for Win32 DLL;
it is null on other OS. For Virtual Pascal, these have to be different again.
*/
#ifndef VPCOMPAT
PCRE_DATA_SCOPE void *(*pcre_malloc)(size_t);
PCRE_DATA_SCOPE void (*pcre_free)(void *);
PCRE_DATA_SCOPE void *(*pcre_stack_malloc)(size_t);
PCRE_DATA_SCOPE void (*pcre_stack_free)(void *);
PCRE_DATA_SCOPE int (*pcre_callout)(pcre_callout_block *);
#else /* VPCOMPAT */
extern void *pcre_malloc(size_t);
extern void pcre_free(void *);
extern void *pcre_stack_malloc(size_t);
extern void pcre_stack_free(void *);
extern int pcre_callout(pcre_callout_block *);
#endif /* VPCOMPAT */
/* Exported PCRE functions */
extern pcre *pcre_compile(const char *, int, const char **,
int *, const unsigned char *);
extern int pcre_config(int, void *);
extern int pcre_copy_named_substring(const pcre *, const char *,
int *, int, const char *, char *, int);
extern int pcre_copy_substring(const char *, int *, int, int,
char *, int);
extern int pcre_exec(const pcre *, const pcre_extra *,
const char *, int, int, int, int *, int);
extern void pcre_free_substring(const char *);
extern void pcre_free_substring_list(const char **);
extern int pcre_fullinfo(const pcre *, const pcre_extra *, int,
void *);
extern int pcre_get_named_substring(const pcre *, const char *,
int *, int, const char *, const char **);
extern int pcre_get_stringnumber(const pcre *, const char *);
extern int pcre_get_substring(const char *, int *, int, int,
const char **);
extern int pcre_get_substring_list(const char *, int *, int,
const char ***);
extern int pcre_info(const pcre *, int *, int *);
extern const unsigned char *pcre_maketables(void);
extern pcre_extra *pcre_study(const pcre *, int, const char **);
extern const char *pcre_version(void);
#ifdef __cplusplus
} /* extern "C" */
#endif
#endif /* End of pcre.h */

File diff suppressed because it is too large

File diff suppressed because it is too large


@ -1,65 +0,0 @@
/^[\w]+/
*** Failers
École
/^[\w]+/Lfr_FR
École
/^[\w]+/
*** Failers
École
/^[\W]+/
École
/^[\W]+/Lfr_FR
*** Failers
École
/[\b]/
\b
*** Failers
a
/[\b]/Lfr_FR
\b
*** Failers
a
/^\w+/
*** Failers
École
/^\w+/Lfr_FR
École
/(.+)\b(.+)/
École
/(.+)\b(.+)/Lfr_FR
*** Failers
École
/École/i
École
*** Failers
école
/École/iLfr_FR
École
école
/\w/IS
/\w/ISLfr_FR
/^[\xc8-\xc9]/iLfr_FR
École
école
/^[\xc8-\xc9]/Lfr_FR
École
*** Failers
école
/ End of testinput3 /


@ -1,517 +0,0 @@
/-- Do not use the \x{} construct except with patterns that have the --/
/-- /8 option set, because PCRE doesn't recognize them as UTF-8 unless --/
/-- that option is set. However, the latest Perls recognize them always. --/
/a.b/8
acb
a\x7fb
a\x{100}b
*** Failers
a\nb
/a(.{3})b/8
a\x{4000}xyb
a\x{4000}\x7fyb
a\x{4000}\x{100}yb
*** Failers
a\x{4000}b
ac\ncb
/a(.*?)(.)/
a\xc0\x88b
/a(.*?)(.)/8
a\x{100}b
/a(.*)(.)/
a\xc0\x88b
/a(.*)(.)/8
a\x{100}b
/a(.)(.)/
a\xc0\x92bcd
/a(.)(.)/8
a\x{240}bcd
/a(.?)(.)/
a\xc0\x92bcd
/a(.?)(.)/8
a\x{240}bcd
/a(.??)(.)/
a\xc0\x92bcd
/a(.??)(.)/8
a\x{240}bcd
/a(.{3})b/8
a\x{1234}xyb
a\x{1234}\x{4321}yb
a\x{1234}\x{4321}\x{3412}b
*** Failers
a\x{1234}b
ac\ncb
/a(.{3,})b/8
a\x{1234}xyb
a\x{1234}\x{4321}yb
a\x{1234}\x{4321}\x{3412}b
axxxxbcdefghijb
a\x{1234}\x{4321}\x{3412}\x{3421}b
*** Failers
a\x{1234}b
/a(.{3,}?)b/8
a\x{1234}xyb
a\x{1234}\x{4321}yb
a\x{1234}\x{4321}\x{3412}b
axxxxbcdefghijb
a\x{1234}\x{4321}\x{3412}\x{3421}b
*** Failers
a\x{1234}b
/a(.{3,5})b/8
a\x{1234}xyb
a\x{1234}\x{4321}yb
a\x{1234}\x{4321}\x{3412}b
axxxxbcdefghijb
a\x{1234}\x{4321}\x{3412}\x{3421}b
axbxxbcdefghijb
axxxxxbcdefghijb
*** Failers
a\x{1234}b
axxxxxxbcdefghijb
/a(.{3,5}?)b/8
a\x{1234}xyb
a\x{1234}\x{4321}yb
a\x{1234}\x{4321}\x{3412}b
axxxxbcdefghijb
a\x{1234}\x{4321}\x{3412}\x{3421}b
axbxxbcdefghijb
axxxxxbcdefghijb
*** Failers
a\x{1234}b
axxxxxxbcdefghijb
/^[a\x{c0}]/8
*** Failers
\x{100}
/(?<=aXb)cd/8
aXbcd
/(?<=a\x{100}b)cd/8
a\x{100}bcd
/(?<=a\x{100000}b)cd/8
a\x{100000}bcd
/(?:\x{100}){3}b/8
\x{100}\x{100}\x{100}b
*** Failers
\x{100}\x{100}b
/\x{ab}/8
\x{ab}
\xc2\xab
*** Failers
\x00{ab}
/(?<=(.))X/8
WXYZ
\x{256}XYZ
*** Failers
XYZ
/X(\C{3})/8
X\x{1234}
/X(\C{4})/8
X\x{1234}YZ
/X\C*/8
XYZabcdce
/X\C*?/8
XYZabcde
/X\C{3,5}/8
Xabcdefg
X\x{1234}
X\x{1234}YZ
X\x{1234}\x{512}
X\x{1234}\x{512}YZ
/X\C{3,5}?/8
Xabcdefg
X\x{1234}
X\x{1234}YZ
X\x{1234}\x{512}
/[^a]+/8g
bcd
\x{100}aY\x{256}Z
/^[^a]{2}/8
\x{100}bc
/^[^a]{2,}/8
\x{100}bcAa
/^[^a]{2,}?/8
\x{100}bca
/[^a]+/8ig
bcd
\x{100}aY\x{256}Z
/^[^a]{2}/8i
\x{100}bc
/^[^a]{2,}/8i
\x{100}bcAa
/^[^a]{2,}?/8i
\x{100}bca
/\x{100}{0,0}/8
abcd
/\x{100}?/8
abcd
\x{100}\x{100}
/\x{100}{0,3}/8
\x{100}\x{100}
\x{100}\x{100}\x{100}\x{100}
/\x{100}*/8
abce
\x{100}\x{100}\x{100}\x{100}
/\x{100}{1,1}/8
abcd\x{100}\x{100}\x{100}\x{100}
/\x{100}{1,3}/8
abcd\x{100}\x{100}\x{100}\x{100}
/\x{100}+/8
abcd\x{100}\x{100}\x{100}\x{100}
/\x{100}{3}/8
abcd\x{100}\x{100}\x{100}XX
/\x{100}{3,5}/8
abcd\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}XX
/\x{100}{3,}/8
abcd\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}XX
/(?<=a\x{100}{2}b)X/8+
Xyyya\x{100}\x{100}bXzzz
/\D*/8
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
/\D*/8
\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}
/\D/8
1X2
1\x{100}2
/>\S/8
> >X Y
> >\x{100} Y
/\W/8
A.B
A\x{100}B
/\d/8
\x{100}3
/\s/8
\x{100} X
/\w/8
\x{100}X
/\D+/8
12abcd34
*** Failers
1234
/\D{2,3}/8
12abcd34
12ab34
*** Failers
1234
12a34
/\D{2,3}?/8
12abcd34
12ab34
*** Failers
1234
12a34
/\d+/8
12abcd34
*** Failers
/\d{2,3}/8
12abcd34
1234abcd
*** Failers
1.4
/\d{2,3}?/8
12abcd34
1234abcd
*** Failers
1.4
/\S+/8
12abcd34
*** Failers
\ \
/\S{2,3}/8
12abcd34
1234abcd
*** Failers
\ \
/\S{2,3}?/8
12abcd34
1234abcd
*** Failers
\ \
/>\s+</8+
12> <34
*** Failers
/>\s{2,3}</8+
ab> <cd
ab> <ce
*** Failers
ab> <cd
/>\s{2,3}?</8+
ab> <cd
ab> <ce
*** Failers
ab> <cd
/\w+/8
12 34
*** Failers
+++=*!
/\w{2,3}/8
ab cd
abcd ce
*** Failers
a.b.c
/\w{2,3}?/8
ab cd
abcd ce
*** Failers
a.b.c
/\W+/8
12====34
*** Failers
abcd
/\W{2,3}/8
ab====cd
ab==cd
*** Failers
a.b.c
/\W{2,3}?/8
ab====cd
ab==cd
*** Failers
a.b.c
/[\x{100}]/8
\x{100}
Z\x{100}
\x{100}Z
*** Failers
/[Z\x{100}]/8
Z\x{100}
\x{100}
\x{100}Z
*** Failers
/[\x{100}\x{200}]/8
ab\x{100}cd
ab\x{200}cd
*** Failers
/[\x{100}-\x{200}]/8
ab\x{100}cd
ab\x{200}cd
ab\x{111}cd
*** Failers
/[z-\x{200}]/8
ab\x{100}cd
ab\x{200}cd
ab\x{111}cd
abzcd
ab|cd
*** Failers
/[Q\x{100}\x{200}]/8
ab\x{100}cd
ab\x{200}cd
Q?
*** Failers
/[Q\x{100}-\x{200}]/8
ab\x{100}cd
ab\x{200}cd
ab\x{111}cd
Q?
*** Failers
/[Qz-\x{200}]/8
ab\x{100}cd
ab\x{200}cd
ab\x{111}cd
abzcd
ab|cd
Q?
*** Failers
/[\x{100}\x{200}]{1,3}/8
ab\x{100}cd
ab\x{200}cd
ab\x{200}\x{100}\x{200}\x{100}cd
*** Failers
/[\x{100}\x{200}]{1,3}?/8
ab\x{100}cd
ab\x{200}cd
ab\x{200}\x{100}\x{200}\x{100}cd
*** Failers
/[Q\x{100}\x{200}]{1,3}/8
ab\x{100}cd
ab\x{200}cd
ab\x{200}\x{100}\x{200}\x{100}cd
*** Failers
/[Q\x{100}\x{200}]{1,3}?/8
ab\x{100}cd
ab\x{200}cd
ab\x{200}\x{100}\x{200}\x{100}cd
*** Failers
/(?<=[\x{100}\x{200}])X/8
abc\x{200}X
abc\x{100}X
*** Failers
X
/(?<=[Q\x{100}\x{200}])X/8
abc\x{200}X
abc\x{100}X
abQX
*** Failers
X
/(?<=[\x{100}\x{200}]{3})X/8
abc\x{100}\x{200}\x{100}X
*** Failers
abc\x{200}X
X
/[^\x{100}\x{200}]X/8
AX
\x{150}X
\x{500}X
*** Failers
\x{100}X
\x{200}X
/[^Q\x{100}\x{200}]X/8
AX
\x{150}X
\x{500}X
*** Failers
\x{100}X
\x{200}X
QX
/[^\x{100}-\x{200}]X/8
AX
\x{500}X
*** Failers
\x{100}X
\x{150}X
\x{200}X
/a\Cb/
aXb
a\nb
/a\Cb/8
aXb
a\nb
*** Failers
a\x{100}b
/[z-\x{100}]/8i
z
Z
\x{100}
*** Failers
\x{101}
y
/[\xFF]/
>\xff<
/[\xff]/8
>\x{ff}<
/[^\xFF]/
XYZ
/[^\xff]/8
XYZ
\x{123}
/^[ac]*b/8
xb
/^[ac\x{100}]*b/8
xb
/^[^x]*b/8i
xb
/^[^x]*b/8
xb
/^\d*b/8
xb
/(|a)/g8
catac
a\x{256}a
/ End of testinput4 /


@ -1,258 +0,0 @@
/\x{100}/8DM
/\x{1000}/8DM
/\x{10000}/8DM
/\x{100000}/8DM
/\x{1000000}/8DM
/\x{4000000}/8DM
/\x{7fffFFFF}/8DM
/[\x{ff}]/8DM
/[\x{100}]/8DM
/\x{ffffffff}/8
/\x{100000000}/8
/^\x{100}a\x{1234}/8
\x{100}a\x{1234}bcd
/\x80/8D
/\xff/8D
/\x{0041}\x{2262}\x{0391}\x{002e}/D8
\x{0041}\x{2262}\x{0391}\x{002e}
/\x{D55c}\x{ad6d}\x{C5B4}/D8
\x{D55c}\x{ad6d}\x{C5B4}
/\x{65e5}\x{672c}\x{8a9e}/D8
\x{65e5}\x{672c}\x{8a9e}
/\x{80}/D8
/\x{084}/D8
/\x{104}/D8
/\x{861}/D8
/\x{212ab}/D8
/.{3,5}X/D8
\x{212ab}\x{212ab}\x{212ab}\x{861}X
/.{3,5}?/D8
\x{212ab}\x{212ab}\x{212ab}\x{861}
/-- These tests are here rather than in testinput4 because Perl 5.6 has --/
/-- some problems with UTF-8 support, in the area of \x{..} where the --/
/-- value is < 255. It grumbles about invalid UTF-8 strings. --/
/^[a\x{c0}]b/8
\x{c0}b
/^([a\x{c0}]*?)aa/8
a\x{c0}aaaa/
/^([a\x{c0}]*?)aa/8
a\x{c0}aaaa/
a\x{c0}a\x{c0}aaa/
/^([a\x{c0}]*)aa/8
a\x{c0}aaaa/
a\x{c0}a\x{c0}aaa/
/^([a\x{c0}]*)a\x{c0}/8
a\x{c0}aaaa/
a\x{c0}a\x{c0}aaa/
/-- --/
/(?<=\C)X/8
Should produce an error diagnostic
/-- This one is here not because it's different to Perl, but because the --/
/-- way the captured single-byte is displayed. (In Perl it becomes a --/
/-- character, and you can't tell the difference.) --/
/X(\C)(.*)/8
X\x{1234}
X\nabc
/^[ab]/8D
bar
*** Failers
c
\x{ff}
\x{100}
/^[^ab]/8D
c
\x{ff}
\x{100}
*** Failers
aaa
/[^ab\xC0-\xF0]/8SD
\x{f1}
\x{bf}
\x{100}
\x{1000}
*** Failers
\x{c0}
\x{f0}
/Ā{3,4}/8SD
\x{100}\x{100}\x{100}\x{100\x{100}
/(\x{100}+|x)/8SD
/(\x{100}*a|x)/8SD
/(\x{100}{0,2}a|x)/8SD
/(\x{100}{1,2}a|x)/8SD
/\x{100}*(\d+|"(?1)")/8
1234
"1234"
\x{100}1234
"\x{100}1234"
\x{100}\x{100}12ab
\x{100}\x{100}"12"
*** Failers
\x{100}\x{100}abcd
/\x{100}/8D
/\x{100}*/8D
/a\x{100}*/8D
/ab\x{100}*/8D
/a\x{100}\x{101}*/8D
/a\x{100}\x{101}+/8D
/\x{100}*A/8D
A
/\x{100}*\d(?R)/8D
/[^\x{c4}]/D
/[^\x{c4}]/8D
/[\x{100}]/8DM
\x{100}
Z\x{100}
\x{100}Z
*** Failers
/[Z\x{100}]/8DM
Z\x{100}
\x{100}
\x{100}Z
*** Failers
/[\x{200}-\x{100}]/8
/[Ā-Ą]/8
\x{100}
\x{104}
*** Failers
\x{105}
\x{ff}
/[z-\x{100}]/8D
/[z-\x{100}]/8Di
/[z\Qa-d]Ā\E]/8D
\x{100}
Ā
/[\xFF]/D
>\xff<
/[\xff]/D8
>\x{ff}<
/[^\xFF]/D
/[^\xff]/8D
/[Ä-Ü]/8
Ö # Matches without Study
\x{d6}
/[Ä-Ü]/8S
Ö <-- Same with Study
\x{d6}
/[\x{c4}-\x{dc}]/8
Ö # Matches without Study
\x{d6}
/[\x{c4}-\x{dc}]/8S
Ö <-- Same with Study
\x{d6}
/[Ã]/8
/Ã/8
/ÃÃÃxxx/8
/ÃÃÃxxx/8?D
/abc/8
Ã]
Ã
ÃÃÃ
ÃÃÃ\?
/anything/8
\xc0\x80
\xc1\x8f
\xe0\x9f\x80
\xf0\x8f\x80\x80
\xf8\x87\x80\x80\x80
\xfc\x83\x80\x80\x80\x80
\xfe\x80\x80\x80\x80\x80
\xff\x80\x80\x80\x80\x80
\xc3\x8f
\xe0\xaf\x80
\xe1\x80\x80
\xf0\x9f\x80\x80
\xf1\x8f\x80\x80
\xf8\x88\x80\x80\x80
\xf9\x87\x80\x80\x80
\xfc\x84\x80\x80\x80\x80
\xfd\x83\x80\x80\x80\x80
/\x{100}abc(xyz(?1))/8D
/[^\x{100}]abc(xyz(?1))/8D
/[ab\x{100}]abc(xyz(?1))/8D
/(\x{100}(b(?2)c))?/D8
/(\x{100}(b(?2)c)){0,2}/D8
/(\x{100}(b(?1)c))?/D8
/(\x{100}(b(?1)c)){0,2}/D8
/ End of testinput5 /

File diff suppressed because it is too large

File diff suppressed because it is too large


@ -1,115 +0,0 @@
PCRE version 4.5 01-December-2003
/^[\w]+/
*** Failers
No match
École
No match
/^[\w]+/Lfr_FR
École
0: École
/^[\w]+/
*** Failers
No match
École
No match
/^[\W]+/
École
0: \xc9
/^[\W]+/Lfr_FR
*** Failers
0: ***
École
No match
/[\b]/
\b
0: \x08
*** Failers
No match
a
No match
/[\b]/Lfr_FR
\b
0: \x08
*** Failers
No match
a
No match
/^\w+/
*** Failers
No match
École
No match
/^\w+/Lfr_FR
École
0: École
/(.+)\b(.+)/
École
0: \xc9cole
1: \xc9
2: cole
/(.+)\b(.+)/Lfr_FR
*** Failers
0: *** Failers
1: ***
2: Failers
École
No match
/École/i
École
0: \xc9cole
*** Failers
No match
école
No match
/École/iLfr_FR
École
0: École
école
0: école
/\w/IS
Capturing subpattern count = 0
No options
No first char
No need char
Starting character set: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P
Q R S T U V W X Y Z _ a b c d e f g h i j k l m n o p q r s t u v w x y z
/\w/ISLfr_FR
Capturing subpattern count = 0
No options
No first char
No need char
Starting character set: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P
Q R S T U V W X Y Z _ a b c d e f g h i j k l m n o p q r s t u v w x y z
µ À Á Â Ã Ä Å Æ Ç È É Ê Ë Ì Í Î Ï Ð Ñ Ò Ó Ô Õ Ö Ø Ù Ú Û Ü Ý Þ ß à á â ã ä
å æ ç è é ê ë ì í î ï ð ñ ò ó ô õ ö ø ù ú û ü ý þ ÿ
/^[\xc8-\xc9]/iLfr_FR
École
0: É
école
0: é
/^[\xc8-\xc9]/Lfr_FR
École
0: É
*** Failers
No match
école
No match
/ End of testinput3 /


@ -1,909 +0,0 @@
PCRE version 4.5 01-December-2003
/-- Do not use the \x{} construct except with patterns that have the --/
/-- /8 option set, because PCRE doesn't recognize them as UTF-8 unless --/
No match
/-- that option is set. However, the latest Perls recognize them always. --/
No match
/a.b/8
acb
0: acb
a\x7fb
0: a\x{7f}b
a\x{100}b
0: a\x{100}b
*** Failers
No match
a\nb
No match
/a(.{3})b/8
a\x{4000}xyb
0: a\x{4000}xyb
1: \x{4000}xy
a\x{4000}\x7fyb
0: a\x{4000}\x{7f}yb
1: \x{4000}\x{7f}y
a\x{4000}\x{100}yb
0: a\x{4000}\x{100}yb
1: \x{4000}\x{100}y
*** Failers
No match
a\x{4000}b
No match
ac\ncb
No match
/a(.*?)(.)/
a\xc0\x88b
0: a\xc0
1:
2: \xc0
/a(.*?)(.)/8
a\x{100}b
0: a\x{100}
1:
2: \x{100}
/a(.*)(.)/
a\xc0\x88b
0: a\xc0\x88b
1: \xc0\x88
2: b
/a(.*)(.)/8
a\x{100}b
0: a\x{100}b
1: \x{100}
2: b
/a(.)(.)/
a\xc0\x92bcd
0: a\xc0\x92
1: \xc0
2: \x92
/a(.)(.)/8
a\x{240}bcd
0: a\x{240}b
1: \x{240}
2: b
/a(.?)(.)/
a\xc0\x92bcd
0: a\xc0\x92
1: \xc0
2: \x92
/a(.?)(.)/8
a\x{240}bcd
0: a\x{240}b
1: \x{240}
2: b
/a(.??)(.)/
a\xc0\x92bcd
0: a\xc0
1:
2: \xc0
/a(.??)(.)/8
a\x{240}bcd
0: a\x{240}
1:
2: \x{240}
/a(.{3})b/8
a\x{1234}xyb
0: a\x{1234}xyb
1: \x{1234}xy
a\x{1234}\x{4321}yb
0: a\x{1234}\x{4321}yb
1: \x{1234}\x{4321}y
a\x{1234}\x{4321}\x{3412}b
0: a\x{1234}\x{4321}\x{3412}b
1: \x{1234}\x{4321}\x{3412}
*** Failers
No match
a\x{1234}b
No match
ac\ncb
No match
/a(.{3,})b/8
a\x{1234}xyb
0: a\x{1234}xyb
1: \x{1234}xy
a\x{1234}\x{4321}yb
0: a\x{1234}\x{4321}yb
1: \x{1234}\x{4321}y
a\x{1234}\x{4321}\x{3412}b
0: a\x{1234}\x{4321}\x{3412}b
1: \x{1234}\x{4321}\x{3412}
axxxxbcdefghijb
0: axxxxbcdefghijb
1: xxxxbcdefghij
a\x{1234}\x{4321}\x{3412}\x{3421}b
0: a\x{1234}\x{4321}\x{3412}\x{3421}b
1: \x{1234}\x{4321}\x{3412}\x{3421}
*** Failers
No match
a\x{1234}b
No match
/a(.{3,}?)b/8
a\x{1234}xyb
0: a\x{1234}xyb
1: \x{1234}xy
a\x{1234}\x{4321}yb
0: a\x{1234}\x{4321}yb
1: \x{1234}\x{4321}y
a\x{1234}\x{4321}\x{3412}b
0: a\x{1234}\x{4321}\x{3412}b
1: \x{1234}\x{4321}\x{3412}
axxxxbcdefghijb
0: axxxxb
1: xxxx
a\x{1234}\x{4321}\x{3412}\x{3421}b
0: a\x{1234}\x{4321}\x{3412}\x{3421}b
1: \x{1234}\x{4321}\x{3412}\x{3421}
*** Failers
No match
a\x{1234}b
No match
/a(.{3,5})b/8
a\x{1234}xyb
0: a\x{1234}xyb
1: \x{1234}xy
a\x{1234}\x{4321}yb
0: a\x{1234}\x{4321}yb
1: \x{1234}\x{4321}y
a\x{1234}\x{4321}\x{3412}b
0: a\x{1234}\x{4321}\x{3412}b
1: \x{1234}\x{4321}\x{3412}
axxxxbcdefghijb
0: axxxxb
1: xxxx
a\x{1234}\x{4321}\x{3412}\x{3421}b
0: a\x{1234}\x{4321}\x{3412}\x{3421}b
1: \x{1234}\x{4321}\x{3412}\x{3421}
axbxxbcdefghijb
0: axbxxb
1: xbxx
axxxxxbcdefghijb
0: axxxxxb
1: xxxxx
*** Failers
No match
a\x{1234}b
No match
axxxxxxbcdefghijb
No match
/a(.{3,5}?)b/8
a\x{1234}xyb
0: a\x{1234}xyb
1: \x{1234}xy
a\x{1234}\x{4321}yb
0: a\x{1234}\x{4321}yb
1: \x{1234}\x{4321}y
a\x{1234}\x{4321}\x{3412}b
0: a\x{1234}\x{4321}\x{3412}b
1: \x{1234}\x{4321}\x{3412}
axxxxbcdefghijb
0: axxxxb
1: xxxx
a\x{1234}\x{4321}\x{3412}\x{3421}b
0: a\x{1234}\x{4321}\x{3412}\x{3421}b
1: \x{1234}\x{4321}\x{3412}\x{3421}
axbxxbcdefghijb
0: axbxxb
1: xbxx
axxxxxbcdefghijb
0: axxxxxb
1: xxxxx
*** Failers
No match
a\x{1234}b
No match
axxxxxxbcdefghijb
No match
/^[a\x{c0}]/8
*** Failers
No match
\x{100}
No match
/(?<=aXb)cd/8
aXbcd
0: cd
/(?<=a\x{100}b)cd/8
a\x{100}bcd
0: cd
/(?<=a\x{100000}b)cd/8
a\x{100000}bcd
0: cd
/(?:\x{100}){3}b/8
\x{100}\x{100}\x{100}b
0: \x{100}\x{100}\x{100}b
*** Failers
No match
\x{100}\x{100}b
No match
/\x{ab}/8
\x{ab}
0: \x{ab}
\xc2\xab
0: \x{ab}
*** Failers
No match
\x00{ab}
No match
/(?<=(.))X/8
WXYZ
0: X
1: W
\x{256}XYZ
0: X
1: \x{256}
*** Failers
No match
XYZ
No match
/X(\C{3})/8
X\x{1234}
0: X\x{1234}
1: \x{1234}
/X(\C{4})/8
X\x{1234}YZ
0: X\x{1234}Y
1: \x{1234}Y
/X\C*/8
XYZabcdce
0: XYZabcdce
/X\C*?/8
XYZabcde
0: X
/X\C{3,5}/8
Xabcdefg
0: Xabcde
X\x{1234}
0: X\x{1234}
X\x{1234}YZ
0: X\x{1234}YZ
X\x{1234}\x{512}
0: X\x{1234}\x{512}
X\x{1234}\x{512}YZ
0: X\x{1234}\x{512}
/X\C{3,5}?/8
Xabcdefg
0: Xabc
X\x{1234}
0: X\x{1234}
X\x{1234}YZ
0: X\x{1234}
X\x{1234}\x{512}
0: X\x{1234}
/[^a]+/8g
bcd
0: bcd
\x{100}aY\x{256}Z
0: \x{100}
0: Y\x{256}Z
/^[^a]{2}/8
\x{100}bc
0: \x{100}b
/^[^a]{2,}/8
\x{100}bcAa
0: \x{100}bcA
/^[^a]{2,}?/8
\x{100}bca
0: \x{100}b
/[^a]+/8ig
bcd
0: bcd
\x{100}aY\x{256}Z
0: \x{100}
0: Y\x{256}Z
/^[^a]{2}/8i
\x{100}bc
0: \x{100}b
/^[^a]{2,}/8i
\x{100}bcAa
0: \x{100}bc
/^[^a]{2,}?/8i
\x{100}bca
0: \x{100}b
/\x{100}{0,0}/8
abcd
0:
/\x{100}?/8
abcd
0:
\x{100}\x{100}
0: \x{100}
/\x{100}{0,3}/8
\x{100}\x{100}
0: \x{100}\x{100}
\x{100}\x{100}\x{100}\x{100}
0: \x{100}\x{100}\x{100}
/\x{100}*/8
abce
0:
\x{100}\x{100}\x{100}\x{100}
0: \x{100}\x{100}\x{100}\x{100}
/\x{100}{1,1}/8
abcd\x{100}\x{100}\x{100}\x{100}
0: \x{100}
/\x{100}{1,3}/8
abcd\x{100}\x{100}\x{100}\x{100}
0: \x{100}\x{100}\x{100}
/\x{100}+/8
abcd\x{100}\x{100}\x{100}\x{100}
0: \x{100}\x{100}\x{100}\x{100}
/\x{100}{3}/8
abcd\x{100}\x{100}\x{100}XX
0: \x{100}\x{100}\x{100}
/\x{100}{3,5}/8
abcd\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}XX
0: \x{100}\x{100}\x{100}\x{100}\x{100}
/\x{100}{3,}/8
abcd\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}XX
0: \x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}
/(?<=a\x{100}{2}b)X/8+
Xyyya\x{100}\x{100}bXzzz
0: X
0+ zzz
/\D*/8
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
0: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
/\D*/8
\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}
0: \x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}
/\D/8
1X2
0: X
1\x{100}2
0: \x{100}
/>\S/8
> >X Y
0: >X
> >\x{100} Y
0: >\x{100}
/\W/8
A.B
0: .
A\x{100}B
0: \x{100}
/\d/8
\x{100}3
0: 3
/\s/8
\x{100} X
0:
/\w/8
\x{100}X
0: X
/\D+/8
12abcd34
0: abcd
*** Failers
0: *** Failers
1234
No match
/\D{2,3}/8
12abcd34
0: abc
12ab34
0: ab
*** Failers
0: ***
1234
No match
12a34
No match
/\D{2,3}?/8
12abcd34
0: ab
12ab34
0: ab
*** Failers
0: **
1234
No match
12a34
No match
/\d+/8
12abcd34
0: 12
*** Failers
No match
/\d{2,3}/8
12abcd34
0: 12
1234abcd
0: 123
*** Failers
No match
1.4
No match
/\d{2,3}?/8
12abcd34
0: 12
1234abcd
0: 12
*** Failers
No match
1.4
No match
/\S+/8
12abcd34
0: 12abcd34
*** Failers
0: ***
\ \
No match
/\S{2,3}/8
12abcd34
0: 12a
1234abcd
0: 123
*** Failers
0: ***
\ \
No match
/\S{2,3}?/8
12abcd34
0: 12
1234abcd
0: 12
*** Failers
0: **
\ \
No match
/>\s+</8+
12> <34
0: > <
0+ 34
*** Failers
No match
/>\s{2,3}</8+
ab> <cd
0: > <
0+ cd
ab> <ce
0: > <
0+ ce
*** Failers
No match
ab> <cd
No match
/>\s{2,3}?</8+
ab> <cd
0: > <
0+ cd
ab> <ce
0: > <
0+ ce
*** Failers
No match
ab> <cd
No match
/\w+/8
12 34
0: 12
*** Failers
0: Failers
+++=*!
No match
/\w{2,3}/8
ab cd
0: ab
abcd ce
0: abc
*** Failers
0: Fai
a.b.c
No match
/\w{2,3}?/8
ab cd
0: ab
abcd ce
0: ab
*** Failers
0: Fa
a.b.c
No match
/\W+/8
12====34
0: ====
*** Failers
0: ***
abcd
No match
/\W{2,3}/8
ab====cd
0: ===
ab==cd
0: ==
*** Failers
0: ***
a.b.c
No match
/\W{2,3}?/8
ab====cd
0: ==
ab==cd
0: ==
*** Failers
0: **
a.b.c
No match
/[\x{100}]/8
\x{100}
0: \x{100}
Z\x{100}
0: \x{100}
\x{100}Z
0: \x{100}
*** Failers
No match
/[Z\x{100}]/8
Z\x{100}
0: Z
\x{100}
0: \x{100}
\x{100}Z
0: \x{100}
*** Failers
No match
/[\x{100}\x{200}]/8
ab\x{100}cd
0: \x{100}
ab\x{200}cd
0: \x{200}
*** Failers
No match
/[\x{100}-\x{200}]/8
ab\x{100}cd
0: \x{100}
ab\x{200}cd
0: \x{200}
ab\x{111}cd
0: \x{111}
*** Failers
No match
/[z-\x{200}]/8
ab\x{100}cd
0: \x{100}
ab\x{200}cd
0: \x{200}
ab\x{111}cd
0: \x{111}
abzcd
0: z
ab|cd
0: |
*** Failers
No match
/[Q\x{100}\x{200}]/8
ab\x{100}cd
0: \x{100}
ab\x{200}cd
0: \x{200}
Q?
0: Q
*** Failers
No match
/[Q\x{100}-\x{200}]/8
ab\x{100}cd
0: \x{100}
ab\x{200}cd
0: \x{200}
ab\x{111}cd
0: \x{111}
Q?
0: Q
*** Failers
No match
/[Qz-\x{200}]/8
ab\x{100}cd
0: \x{100}
ab\x{200}cd
0: \x{200}
ab\x{111}cd
0: \x{111}
abzcd
0: z
ab|cd
0: |
Q?
0: Q
*** Failers
No match
/[\x{100}\x{200}]{1,3}/8
ab\x{100}cd
0: \x{100}
ab\x{200}cd
0: \x{200}
ab\x{200}\x{100}\x{200}\x{100}cd
0: \x{200}\x{100}\x{200}
*** Failers
No match
/[\x{100}\x{200}]{1,3}?/8
ab\x{100}cd
0: \x{100}
ab\x{200}cd
0: \x{200}
ab\x{200}\x{100}\x{200}\x{100}cd
0: \x{200}
*** Failers
No match
/[Q\x{100}\x{200}]{1,3}/8
ab\x{100}cd
0: \x{100}
ab\x{200}cd
0: \x{200}
ab\x{200}\x{100}\x{200}\x{100}cd
0: \x{200}\x{100}\x{200}
*** Failers
No match
/[Q\x{100}\x{200}]{1,3}?/8
ab\x{100}cd
0: \x{100}
ab\x{200}cd
0: \x{200}
ab\x{200}\x{100}\x{200}\x{100}cd
0: \x{200}
*** Failers
No match
/(?<=[\x{100}\x{200}])X/8
abc\x{200}X
0: X
abc\x{100}X
0: X
*** Failers
No match
X
No match
/(?<=[Q\x{100}\x{200}])X/8
abc\x{200}X
0: X
abc\x{100}X
0: X
abQX
0: X
*** Failers
No match
X
No match
/(?<=[\x{100}\x{200}]{3})X/8
abc\x{100}\x{200}\x{100}X
0: X
*** Failers
No match
abc\x{200}X
No match
X
No match
/[^\x{100}\x{200}]X/8
AX
0: AX
\x{150}X
0: \x{150}X
\x{500}X
0: \x{500}X
*** Failers
No match
\x{100}X
No match
\x{200}X
No match
/[^Q\x{100}\x{200}]X/8
AX
0: AX
\x{150}X
0: \x{150}X
\x{500}X
0: \x{500}X
*** Failers
No match
\x{100}X
No match
\x{200}X
No match
QX
No match
/[^\x{100}-\x{200}]X/8
AX
0: AX
\x{500}X
0: \x{500}X
*** Failers
No match
\x{100}X
No match
\x{150}X
No match
\x{200}X
No match
/a\Cb/
aXb
0: aXb
a\nb
0: a\x0ab
/a\Cb/8
aXb
0: aXb
a\nb
0: a\x{0a}b
*** Failers
No match
a\x{100}b
No match
/[z-\x{100}]/8i
z
0: z
Z
0: Z
\x{100}
0: \x{100}
*** Failers
No match
\x{101}
No match
y
No match
/[\xFF]/
>\xff<
0: \xff
/[\xff]/8
>\x{ff}<
0: \x{ff}
/[^\xFF]/
XYZ
0: X
/[^\xff]/8
XYZ
0: X
\x{123}
0: \x{123}
/^[ac]*b/8
xb
No match
/^[ac\x{100}]*b/8
xb
No match
/^[^x]*b/8i
xb
No match
/^[^x]*b/8
xb
No match
/^\d*b/8
xb
No match
/(|a)/g8
catac
0:
1:
0:
1:
0: a
1: a
0:
1:
0:
1:
0: a
1: a
0:
1:
0:
1:
a\x{256}a
0:
1:
0: a
1: a
0:
1:
0:
1:
0: a
1: a
0:
1:
/ End of testinput4 /
