UriString

A class to parse a URI string according to RFC3986.

Tags
link
https://tools.ietf.org/html/rfc3986
author

Ignace Nyamagana Butera nyamsprod@gmail.com

since
6.0.0

Table of Contents

REGEXP_HOST_PORT  = ',^(?<host>\[.*\]|[^:]*)(:(?<port>.*))?$,'
Host and Port splitter regular expression.
REGEXP_IDN_PATTERN  = '/[^\x20-\x7f]/'
IDN Host detector regular expression.
REGEXP_INVALID_HOST_CHARS  = '/ [:\/?#\[\]@ ] # gen-delims characters as well as the space character /ix'
Invalid characters in host regular expression.
REGEXP_INVALID_PATH  = ',^(([^/]*):)(.*)?/,'
Invalid path for URI without scheme and authority regular expression.
REGEXP_INVALID_URI_CHARS  = '/[\x00-\x1f\x7f]/'
Range of invalid characters in URI string.
REGEXP_IP_FUTURE  = '/^ v(?<version>[A-F0-9])+\. (?: (?<unreserved>[a-z0-9_~\-\.])| (?<sub_delims>[!$&'()*+,;=:]) # also include the : character )+ $/ix'
IPvFuture regular expression.
REGEXP_REGISTERED_NAME  = '/(?(DEFINE) (?<unreserved>[a-z0-9_~\-]) # . is missing as it is used to separate labels (?<sub_delims>[!$&'()*+,;=]) (?<encoded>%[A-F0-9]{2}) (?<reg_name>(?:(?&unreserved)|(?&sub_delims)|(?&encoded))*) ) ^(?:(?&reg_name)\.)*(?&reg_name)\.?$/ix'
General registered name regular expression.
REGEXP_URI_PARTS  = ',^ (?<scheme>(?<scontent>[^:/?\#]+):)? # URI scheme component (?<authority>//(?<acontent>[^/?\#]*))? # URI authority part (?<path>[^?\#]*) # URI path component (?<query>\?(?<qcontent>[^\#]*))? # URI query component (?<fragment>\#(?<fcontent>.*))? # URI fragment component ,x'
RFC3986 regular expression URI splitter.
REGEXP_URI_SCHEME  = '/^([a-z][a-z\d\+\.\-]*)?$/i'
URI scheme regular expression.
URI_COMPONENTS  = ['scheme' => null, 'user' => null, 'pass' => null, 'host' => null, 'port' => null, 'path' => '', 'query' => null, 'fragment' => null]
Default URI component values.
URI_SCHORTCUTS  = ['' => [], '#' => ['fragment' => ''], '?' => ['query' => ''], '?#' => ['query' => '', 'fragment' => ''], '/' => ['path' => '/'], '//' => ['host' => '']]
Simple URIs which do not need any parsing.
ZONE_ID_ADDRESS_BLOCK  = "\xfe\x80"
Only the address block fe80::/10 can have a Zone ID attached to it; this constant is used to detect the 10 significant bits of a link-local address.
build()  : string
Generate a URI string representation from its parsed representation returned by League\UriString::parse() or PHP's parse_url.
parse()  : array{scheme: ?string, user: ?string, pass: ?string, host: ?string, port: ?int, path: string, query: ?string, fragment: ?string}
Parse a URI string into its components.
filterHost()  : string
Returns whether a hostname is valid.
filterPort()  : int|null
Filter and format the port component.
filterRegisteredName()  : string
Returns whether the host is an IPv4 or a registered name.
isIpHost()  : bool
Validates an IPv6/IPvFuture host.
parseAuthority()  : array{user: ?string, pass: ?string, host: ?string, port: ?int}
Parses the URI authority part.

Constants

REGEXP_HOST_PORT

Host and Port splitter regular expression.

private mixed REGEXP_HOST_PORT = ',^(?<host>\[.*\]|[^:]*)(:(?<port>.*))?$,'
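
The pattern can be tried on its own with preg_match(). The following is a minimal sketch, not the class's private implementation: the constant name HOST_PORT_PATTERN and the helper splitHostPort() are hypothetical, and no validation of the host or port is attempted.

```php
<?php

declare(strict_types=1);

// Pattern copied from REGEXP_HOST_PORT above.
const HOST_PORT_PATTERN = ',^(?<host>\[.*\]|[^:]*)(:(?<port>.*))?$,';

/**
 * Hypothetical helper: splits "host[:port]" into its two raw parts.
 * Neither part is validated.
 *
 * @return array{host: string, port: ?string}
 */
function splitHostPort(string $hostPort): array
{
    if (1 !== preg_match(HOST_PORT_PATTERN, $hostPort, $matches)) {
        throw new InvalidArgumentException('Unable to split host and port.');
    }

    return [
        'host' => $matches['host'],
        // The port group is absent from $matches when no ":" follows the host.
        'port' => $matches['port'] ?? null,
    ];
}
```

The bracketed alternative is tried first, so the colons inside an IP literal such as [::1] are not mistaken for the port separator.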

REGEXP_IDN_PATTERN

IDN Host detector regular expression.

private mixed REGEXP_IDN_PATTERN = '/[^\x20-\x7f]/'

REGEXP_INVALID_URI_CHARS

Range of invalid characters in URI string.

private mixed REGEXP_INVALID_URI_CHARS = '/[\x00-\x1f\x7f]/'
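
As an illustration, the character range can be checked with a single preg_match() call. This is a sketch only; the helper name hasInvalidUriChars() is hypothetical and the class may apply the pattern differently.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: tells whether a string contains an ASCII control
 * character (0x00-0x1F) or DEL (0x7F), the range covered by
 * REGEXP_INVALID_URI_CHARS.
 */
function hasInvalidUriChars(string $uri): bool
{
    // Pattern copied from REGEXP_INVALID_URI_CHARS above.
    return 1 === preg_match('/[\x00-\x1f\x7f]/', $uri);
}
```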

REGEXP_REGISTERED_NAME

General registered name regular expression.

private mixed REGEXP_REGISTERED_NAME = '/(?(DEFINE)
    (?<unreserved>[a-z0-9_~\-])   # . is missing as it is used to separate labels
    (?<sub_delims>[!$&'()*+,;=])
    (?<encoded>%[A-F0-9]{2})
    (?<reg_name>(?:(?&unreserved)|(?&sub_delims)|(?&encoded))*)
)
^(?:(?&reg_name)\.)*(?&reg_name)\.?$/ix'
Tags
link
https://tools.ietf.org/html/rfc3986#section-3.2.2

REGEXP_URI_PARTS

RFC3986 regular expression URI splitter.

private mixed REGEXP_URI_PARTS = ',^
    (?<scheme>(?<scontent>[^:/?\#]+):)?    # URI scheme component
    (?<authority>//(?<acontent>[^/?\#]*))? # URI authority part
    (?<path>[^?\#]*)                       # URI path component
    (?<query>\?(?<qcontent>[^\#]*))?       # URI query component
    (?<fragment>\#(?<fcontent>.*))?        # URI fragment component
,x'
Tags
link
https://tools.ietf.org/html/rfc3986#appendix-B
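
To see what the splitter captures, the pattern can be run directly through preg_match(). The sketch below is illustrative and is not the class's own code: the constant name URI_PARTS_PATTERN and the helper splitUri() are hypothetical. It maps unmatched groups to null and matched-but-empty groups to the empty string.

```php
<?php

declare(strict_types=1);

// Pattern copied from REGEXP_URI_PARTS; the x flag permits whitespace and # comments.
const URI_PARTS_PATTERN = ',^
    (?<scheme>(?<scontent>[^:/?\#]+):)?    # URI scheme component
    (?<authority>//(?<acontent>[^/?\#]*))? # URI authority part
    (?<path>[^?\#]*)                       # URI path component
    (?<query>\?(?<qcontent>[^\#]*))?       # URI query component
    (?<fragment>\#(?<fcontent>.*))?        # URI fragment component
,x';

/**
 * Hypothetical helper: splits a URI into its five top-level parts.
 *
 * @return array{scheme: ?string, authority: ?string, path: string, query: ?string, fragment: ?string}
 */
function splitUri(string $uri): array
{
    // PREG_UNMATCHED_AS_NULL reports groups that did not take part in the match as null.
    preg_match(URI_PARTS_PATTERN, $uri, $parts, PREG_UNMATCHED_AS_NULL);

    return [
        'scheme' => $parts['scontent'] ?? null,
        'authority' => $parts['acontent'] ?? null,
        'path' => $parts['path'] ?? '',
        'query' => $parts['qcontent'] ?? null,
        'fragment' => $parts['fcontent'] ?? null,
    ];
}
```

The inner groups (scontent, acontent, qcontent, fcontent) hold each component without its delimiter, which is why the sketch reads them rather than the outer groups.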

URI_COMPONENTS

Default URI component values.

private mixed URI_COMPONENTS = ['scheme' => null, 'user' => null, 'pass' => null, 'host' => null, 'port' => null, 'path' => '', 'query' => null, 'fragment' => null]

URI_SCHORTCUTS

Simple URIs which do not need any parsing.

private mixed URI_SCHORTCUTS = ['' => [], '#' => ['fragment' => ''], '?' => ['query' => ''], '?#' => ['query' => '', 'fragment' => ''], '/' => ['path' => '/'], '//' => ['host' => '']]

ZONE_ID_ADDRESS_BLOCK

Only the address block fe80::/10 can have a Zone ID attached to it; this constant is used to detect the 10 significant bits of a link-local address.

private mixed ZONE_ID_ADDRESS_BLOCK = "\xfe\x80"
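
One way to test for the fe80::/10 block is a byte-wise mask on the packed address. The sketch below assumes the constant holds the two bytes 0xFE 0x80; the helper isLinkLocal() is hypothetical and the class's private check may be written differently.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: tells whether an IPv6 address (without brackets or
 * zone identifier) belongs to the link-local block fe80::/10.
 */
function isLinkLocal(string $ipv6): bool
{
    // inet_pton() emits a warning and returns false on invalid input.
    $packed = @inet_pton($ipv6);
    if (false === $packed || 16 !== strlen($packed)) {
        return false; // not a valid IPv6 address
    }

    // A /10 prefix covers the first byte and the top two bits of the second,
    // hence the mask 0xFF 0xC0. The byte-wise AND of two strings is truncated
    // to the shorter operand, so the result is two bytes long.
    return ($packed & "\xff\xc0") === "\xfe\x80";
}
```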

Methods

build()

Generate a URI string representation from its parsed representation returned by League\UriString::parse() or PHP's parse_url.

public static build(array{scheme: ?string, user: ?string, pass: ?string, host: ?string, port: ?int, path: ?string, query: ?string, fragment: ?string} $components) : string

If you supply your own array, you are responsible for providing valid components without their URI delimiters.

Parameters
$components : array{scheme: ?string, user: ?string, pass: ?string, host: ?string, port: ?int, path: ?string, query: ?string, fragment: ?string}
Tags
link
https://tools.ietf.org/html/rfc3986#section-5.3
link
https://tools.ietf.org/html/rfc3986#section-7.5
Return values
string
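
The recomposition described in RFC 3986 section 5.3 can be sketched as follows. This is an illustration under simplifying assumptions, not the build() implementation: the function name recompose() is hypothetical, and it performs none of the validation or path adjustments the real method may apply.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical sketch of RFC 3986 section 5.3 component recomposition.
 * A null (undefined) component is skipped; an empty string still emits
 * its delimiter.
 *
 * @param array<string, string|int|null> $components
 */
function recompose(array $components): string
{
    $uri = '';
    if (isset($components['scheme'])) {
        $uri .= $components['scheme'].':';
    }

    // The authority is only emitted when a host is defined.
    if (isset($components['host'])) {
        $uri .= '//';
        if (isset($components['user'])) {
            $uri .= $components['user'];
            if (isset($components['pass'])) {
                $uri .= ':'.$components['pass'];
            }
            $uri .= '@';
        }
        $uri .= $components['host'];
        if (isset($components['port'])) {
            $uri .= ':'.$components['port'];
        }
    }

    $uri .= $components['path'] ?? '';
    if (isset($components['query'])) {
        $uri .= '?'.$components['query'];
    }
    if (isset($components['fragment'])) {
        $uri .= '#'.$components['fragment'];
    }

    return $uri;
}
```

Because isset() is false only for null or missing keys, an empty fragment still produces a trailing "#", which preserves the empty versus undefined distinction made by parse().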

parse()

Parse a URI string into its components.

public static parse(mixed $uri) : array{scheme: ?string, user: ?string, pass: ?string, host: ?string, port: ?int, path: string, query: ?string, fragment: ?string}

This method parses a URI and returns an associative array containing any of the various components of the URI that are present.

$components = UriString::parse('http://foo@test.example.com:42?query#');
var_export($components);
//will display
array(
    'scheme' => 'http',            // the URI scheme component
    'user' => 'foo',               // the URI user component
    'pass' => null,                // the URI pass component
    'host' => 'test.example.com',  // the URI host component
    'port' => 42,                  // the URI port component
    'path' => '',                  // the URI path component
    'query' => 'query',            // the URI query component
    'fragment' => '',              // the URI fragment component
);

The returned array is similar to PHP's parse_url return value with the following differences:

  • All components are always present in the returned array
  • Empty and undefined components are treated differently: an empty component is set to the empty string, while an undefined component is set to `null`.
  • The path component is never undefined
  • The method parses the URI following the RFC3986 rules, but you are still required to validate the returned components against the rules specific to their scheme.
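
To make the first and third differences concrete, the sketch below pads the result of PHP's parse_url() with the defaults listed in URI_COMPONENTS. It is an approximation for illustration only: the names DEFAULT_COMPONENTS and withAllComponents() are hypothetical, and it does not reproduce the RFC3986 parsing or the errors raised by parse().

```php
<?php

declare(strict_types=1);

// Defaults copied from URI_COMPONENTS.
const DEFAULT_COMPONENTS = [
    'scheme' => null, 'user' => null, 'pass' => null, 'host' => null,
    'port' => null, 'path' => '', 'query' => null, 'fragment' => null,
];

/**
 * Hypothetical helper: returns the parse_url() result with every
 * component key present.
 *
 * @return array<string, string|int|null>
 */
function withAllComponents(string $uri): array
{
    $parsed = parse_url($uri);
    if (false === $parsed) {
        throw new InvalidArgumentException('The URI could not be parsed.');
    }

    // Keys returned by parse_url() override the defaults; missing keys keep them.
    return array_merge(DEFAULT_COMPONENTS, $parsed);
}
```

With this padding, a URI without a path yields 'path' => '' rather than a missing key, and an absent password yields 'pass' => null.
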
Parameters
$uri : mixed

any scalar or stringable object

Tags
link
https://tools.ietf.org/html/rfc3986
throws
SyntaxError

if the URI contains invalid characters

throws
SyntaxError

if the URI contains an invalid scheme

throws
SyntaxError

if the URI contains an invalid path

Return values
array{scheme: ?string, user: ?string, pass: ?string, host: ?string, port: ?int, path: string, query: ?string, fragment: ?string}

parseAuthority()

Parses the URI authority part.

private static parseAuthority(string $authority) : array{user: ?string, pass: ?string, host: ?string, port: ?int}
Parameters
$authority : string
Tags
link
https://tools.ietf.org/html/rfc3986#section-3.2
throws
SyntaxError

If the port component is invalid

Return values
array{user: ?string, pass: ?string, host: ?string, port: ?int}
