public class URI extends Object
- URI character sequence: char
- octet sequence: byte
- original character sequence: String
So a URI is a sequence of characters held as an array of type char; it is not always represented as a sequence of octets held as an array of type byte.

URI Syntactic Components
- In general, written as follows:
  Absolute URI = <scheme>:<scheme-specific-part>
  Generic URI = <scheme>://<authority><path>?<query>
- Syntax:
  absoluteURI = scheme ":" ( hier_part | opaque_part )
  hier_part = ( net_path | abs_path ) [ "?" query ]
  net_path = "//" authority [ abs_path ]
  abs_path = "/" path_segments
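The generic components named above can be separated with the regular expression given in RFC 2396, Appendix B. The sketch below is a standalone illustration of that approach, not how this class parses URIs; the class name `UriComponentsSketch` and its `split` helper are hypothetical. The expression only separates components and does not validate them against the grammar.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UriComponentsSketch {
    // Regular expression from RFC 2396, Appendix B.
    // Group 2 = scheme, 4 = authority, 5 = path, 7 = query, 9 = fragment.
    private static final Pattern URI_PARTS = Pattern.compile(
            "^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\\?([^#]*))?(#(.*))?");

    /**
     * Returns {scheme, authority, path, query, fragment}. An absent scheme,
     * authority, query, or fragment is null; the path is never null (possibly empty).
     */
    public static String[] split(String uriReference) {
        Matcher m = URI_PARTS.matcher(uriReference);
        if (!m.matches()) {
            throw new IllegalArgumentException("cannot split: " + uriReference);
        }
        return new String[] {m.group(2), m.group(4), m.group(5), m.group(7), m.group(9)};
    }

    public static void main(String[] args) {
        String[] names = {"scheme", "authority", "path", "query", "fragment"};
        String[] parts = split("http://www.math.uio.no/faq/compression-faq/part1.html");
        for (int i = 0; i < names.length; i++) {
            System.out.println(names[i] + " = " + parts[i]);
        }
    }
}
```

For an opaque URI such as `mailto:mduerst@ifi.unizh.ch`, the authority group does not participate, so everything after the scheme lands in the path group.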
The following examples illustrate URIs that are in common use.
- ftp://ftp.is.co.za/rfc/rfc1808.txt -- ftp scheme for File Transfer Protocol services
- gopher://spinaltap.micro.umn.edu/00/Weather/California/Los%20Angeles -- gopher scheme for Gopher and Gopher+ Protocol services
- http://www.math.uio.no/faq/compression-faq/part1.html -- http scheme for Hypertext Transfer Protocol services
- mailto:mduerst@ifi.unizh.ch -- mailto scheme for electronic mail addresses
- news:comp.infosystems.www.servers.unix -- news scheme for USENET news groups and articles
- telnet://melvyl.ucop.edu/ -- telnet scheme for interactive services via the TELNET Protocol

Note that there are many modifications from URL (RFC 1738) and relative URL (RFC 1808).

The expressions for a URI
For escaped URI forms:
- URI(char[]) // constructor
- char[] getRawXxx() // method
- String getEscapedXxx() // method
- String toString() // method

For unescaped URI forms:
- URI(String) // constructor
- String getXxx() // method
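The split between raw (escaped) and decoded accessors also appears in the JDK's `java.net.URI`, which is a different class from the one documented here but pairs `getRawPath()` with `getPath()` in the same spirit as `getRawXxx()` and `getXxx()`. Unlike the `URI(String)` constructor described above, `java.net.URI.create` expects its argument to be already escaped. The class name `RawVersusDecodedSketch` is hypothetical.

```java
import java.net.URI;

public class RawVersusDecodedSketch {
    /** Returns {raw (escaped) path, decoded path} for an already-escaped URI string. */
    public static String[] paths(String escapedUri) {
        URI uri = URI.create(escapedUri); // throws IllegalArgumentException on a syntax error
        return new String[] {uri.getRawPath(), uri.getPath()};
    }

    public static void main(String[] args) {
        String[] p = paths("gopher://spinaltap.micro.umn.edu/00/Weather/California/Los%20Angeles");
        System.out.println(p[0]); // /00/Weather/California/Los%20Angeles
        System.out.println(p[1]); // /00/Weather/California/Los Angeles
    }
}
```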
Modifier and Type | Field | Description |
---|---|---|
protected static BitSet | abs_path | URI absolute path. |
protected static BitSet | absoluteURI | BitSet for absoluteURI. |
static BitSet | allowed_abs_path | Those characters that are allowed for the abs_path. |
static BitSet | allowed_authority | Those characters that are allowed for the authority component. |
static BitSet | allowed_fragment | Those characters that are allowed for the fragment component. |
static BitSet | allowed_host | Those characters that are allowed for the host component. |
static BitSet | allowed_IPv6reference | Those characters that are allowed for the IPv6reference component. |
static BitSet | allowed_opaque_part | Those characters that are allowed for the opaque_part. |
static BitSet | allowed_query | Those characters that are allowed for the query component. |
static BitSet | allowed_reg_name | Those characters that are allowed for the reg_name. |
static BitSet | allowed_rel_path | Those characters that are allowed for the rel_path. |
static BitSet | allowed_userinfo | Those characters that are allowed for the userinfo component. |
static BitSet | allowed_within_authority | Those characters that are allowed within the authority component. |
static BitSet | allowed_within_path | Those characters that are allowed within the path. |
static BitSet | allowed_within_query | Those characters that are allowed within the query component. |
static BitSet | allowed_within_userinfo | Those characters that are allowed within the userinfo component. |
protected static BitSet | alpha | BitSet for alpha. |
protected static BitSet | alphanum | BitSet for alphanum (join of alpha & digit). |
protected static BitSet | authority | BitSet for authority. |
static BitSet | control | BitSet for control. |
static BitSet | delims | BitSet for delims. |
protected static BitSet | digit | BitSet for digit. |
static BitSet | disallowed_opaque_part | Disallowed opaque_part before escaping. |
static BitSet | disallowed_rel_path | Disallowed rel_path before escaping. |
protected static BitSet | escaped | BitSet for escaped. |
protected static BitSet | fragment | BitSet for fragment (alias for uric). |
protected static BitSet | hex | BitSet for hex. |
protected static BitSet | hier_part | BitSet for hier_part. |
protected static BitSet | host | BitSet for host. |
protected static BitSet | hostname | BitSet for hostname. |
protected static BitSet | hostport | BitSet for hostport. |
protected static BitSet | IPv4address | BitSet that combines digit and dot for IPv4address. |
protected static BitSet | IPv6address | BitSet for IPv6address (RFC 2373). |
protected static BitSet | IPv6reference | BitSet for IPv6reference (RFC 2732, RFC 2373). |
protected static BitSet | mark | BitSet for mark. |
protected static BitSet | net_path | BitSet for net_path. |
protected static BitSet | opaque_part | URI BitSet that combines uric_no_slash and uric. |
protected static BitSet | param | BitSet for param (alias for pchar). |
protected static BitSet | path | URI BitSet that combines absolute path and opaque part. |
protected static BitSet | path_segments | BitSet for path segments. |
protected static BitSet | pchar | BitSet for pchar. |
protected static BitSet | percent | The percent "%" character always has the reserved purpose of being the escape indicator; it must be escaped as "%25" in order to be used as data within a URI. |
protected static BitSet | port | Port, a logical alias for digit. |
protected static BitSet | query | BitSet for query (alias for uric). |
protected static BitSet | reg_name | BitSet for reg_name. |
protected static BitSet | rel_path | BitSet for rel_path. |
protected static BitSet | rel_segment | BitSet for rel_segment. |
protected static BitSet | relativeURI | BitSet for relativeURI. |
protected static BitSet | reserved | BitSet for reserved. |
protected static BitSet | scheme | BitSet for scheme. |
protected static BitSet | segment | BitSet for segment. |
protected static BitSet | server | BitSet for server. |
static BitSet | space | BitSet for space. |
protected static BitSet | toplabel | BitSet for toplabel. |
protected static BitSet | unreserved | Data characters that are allowed in a URI but do not have a reserved purpose are called unreserved. |
static BitSet | unwise | BitSet for unwise. |
protected static BitSet | URI_reference | BitSet for URI-reference. |
protected static BitSet | uric | BitSet for uric. |
protected static BitSet | uric_no_slash | URI BitSet for encoding typical non-slash characters. |
protected static BitSet | userinfo | BitSet for userinfo. |
static BitSet | within_userinfo | BitSet for characters within the userinfo component, such as user and password. |

Constructor | Description |
---|---|
URI() | |

Modifier and Type | Method | Description |
---|---|---|
protected static String | decode(char[] component, String charset) | Decodes a URI-encoded string. |
protected static String | decode(String component, String charset) | Decodes a URI-encoded string. |
protected static char[] | encode(String original, BitSet allowed, String charset) | Encodes a string into URI form. |

public static final BitSet within_userinfo
public static final BitSet control
public static final BitSet space
public static final BitSet delims
public static final BitSet unwise
public static final BitSet disallowed_rel_path
public static final BitSet disallowed_opaque_part
public static final BitSet allowed_authority
public static final BitSet allowed_opaque_part
public static final BitSet allowed_reg_name
public static final BitSet allowed_userinfo
public static final BitSet allowed_within_userinfo
public static final BitSet allowed_IPv6reference
public static final BitSet allowed_host
public static final BitSet allowed_within_authority
public static final BitSet allowed_abs_path
public static final BitSet allowed_rel_path
public static final BitSet allowed_within_path
public static final BitSet allowed_query
public static final BitSet allowed_within_query
public static final BitSet allowed_fragment
protected static final BitSet percent
protected static final BitSet digit
digit = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
protected static final BitSet alpha
alpha = lowalpha | upalpha
protected static final BitSet alphanum
alphanum = alpha | digit
protected static final BitSet hex
hex = digit | "A" | "B" | "C" | "D" | "E" | "F" | "a" | "b" | "c" | "d" | "e" | "f"
protected static final BitSet escaped
escaped = "%" hex hex
protected static final BitSet mark
mark = "-" | "_" | "." | "!" | "~" | "*" | "'" | "(" | ")"
protected static final BitSet unreserved
unreserved = alphanum | mark
protected static final BitSet reserved
reserved = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" | "$" | ","
protected static final BitSet uric
uric = reserved | unreserved | escaped
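The BitSet fields in this class are character classes built by union, following productions such as the ones above. The sketch below rebuilds a few of them with `java.util.BitSet` to show how `uric` follows from `reserved`, `unreserved`, and `escaped`; the class and helper names are hypothetical, and the real field initialization may differ. A BitSet for `escaped` can only record which characters may appear in an escape triplet ("%" and hex digits), not the "%" hex hex structure itself.

```java
import java.util.BitSet;

public class UriCharClassesSketch {
    // Hypothetical helpers: a contiguous range, an explicit list, and a union of sets.
    static BitSet range(char from, char to) {
        BitSet s = new BitSet(256);
        s.set(from, to + 1); // BitSet.set(from, to) excludes 'to', so add 1
        return s;
    }

    static BitSet chars(String list) {
        BitSet s = new BitSet(256);
        for (char c : list.toCharArray()) {
            s.set(c);
        }
        return s;
    }

    static BitSet union(BitSet... sets) {
        BitSet s = new BitSet(256);
        for (BitSet b : sets) {
            s.or(b);
        }
        return s;
    }

    public static final BitSet DIGIT = range('0', '9');
    public static final BitSet ALPHA = union(range('a', 'z'), range('A', 'Z'));
    public static final BitSet ALPHANUM = union(ALPHA, DIGIT);
    public static final BitSet HEX = union(DIGIT, range('a', 'f'), range('A', 'F'));
    public static final BitSet ESCAPED = union(chars("%"), HEX); // characters only, not the triplet shape
    public static final BitSet MARK = chars("-_.!~*'()");
    public static final BitSet UNRESERVED = union(ALPHANUM, MARK);
    public static final BitSet RESERVED = chars(";/?:@&=+$,");
    public static final BitSet URIC = union(RESERVED, UNRESERVED, ESCAPED);

    public static void main(String[] args) {
        System.out.println(URIC.cardinality()); // 82
        System.out.println(URIC.get(' '));      // false
        System.out.println(URIC.get('#'));      // false: "#" is a delimiter, not a uric
    }
}
```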
protected static final BitSet fragment
fragment = *uric
protected static final BitSet query
query = *uric
protected static final BitSet pchar
pchar = unreserved | escaped | ":" | "@" | "&" | "=" | "+" | "$" | ","
protected static final BitSet param
param = *pchar
protected static final BitSet segment
segment = *pchar *( ";" param )
protected static final BitSet path_segments
path_segments = segment *( "/" segment )
protected static final BitSet abs_path
abs_path = "/" path_segments
protected static final BitSet uric_no_slash
uric_no_slash = unreserved | escaped | ";" | "?" | ":" | "@" | "&" | "=" | "+" | "$" | ","
protected static final BitSet opaque_part
opaque_part = uric_no_slash *uric
protected static final BitSet path
path = [ abs_path | opaque_part ]
protected static final BitSet port
protected static final BitSet IPv4address
IPv4address = 1*digit "." 1*digit "." 1*digit "." 1*digit
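The `IPv4address` BitSet is described as combining digit and dot, so a membership test against it checks characters, not structure. The sketch below contrasts that character check with a regular expression for the production above; both helpers are hypothetical and written for illustration. Neither limits each part to 0-255, because the production does not either.

```java
import java.util.BitSet;
import java.util.regex.Pattern;

public class Ipv4CheckSketch {
    // Character class: digits plus '.', as the IPv4address BitSet is described.
    static final BitSet IPV4_CHARS = new BitSet(256);
    static {
        IPV4_CHARS.set('0', '9' + 1);
        IPV4_CHARS.set('.');
    }

    // IPv4address = 1*digit "." 1*digit "." 1*digit "." 1*digit
    static final Pattern IPV4_GRAMMAR = Pattern.compile("[0-9]+\\.[0-9]+\\.[0-9]+\\.[0-9]+");

    /** True if every character is a digit or '.'; says nothing about the structure. */
    public static boolean onlyAllowedChars(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (!IPV4_CHARS.get(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    /** True if the whole string matches the IPv4address production. */
    public static boolean matchesGrammar(String s) {
        return IPV4_GRAMMAR.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(onlyAllowedChars("1..2") + " " + matchesGrammar("1..2"));               // true false
        System.out.println(onlyAllowedChars("192.168.0.1") + " " + matchesGrammar("192.168.0.1")); // true true
    }
}
```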
protected static final BitSet IPv6address
IPv6address = hexpart [ ":" IPv4address ]
protected static final BitSet IPv6reference
IPv6reference = "[" IPv6address "]"
protected static final BitSet toplabel
toplabel = alpha | alpha *( alphanum | "-" ) alphanum
protected static final BitSet hostname
hostname = *( domainlabel "." ) toplabel [ "." ]
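The hostname production depends on `domainlabel`, which has no field in this class but is defined in RFC 2396 as `domainlabel = alphanum | alphanum *( alphanum | "-" ) alphanum`. Written out as one regular expression, the production can be checked as in the sketch below; `HostnameCheckSketch` is a hypothetical name, and this is not this class's validation code.

```java
import java.util.regex.Pattern;

public class HostnameCheckSketch {
    // domainlabel = alphanum | alphanum *( alphanum | "-" ) alphanum
    private static final String DOMAINLABEL = "[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?";
    // toplabel = alpha | alpha *( alphanum | "-" ) alphanum
    private static final String TOPLABEL = "[A-Za-z](?:[A-Za-z0-9-]*[A-Za-z0-9])?";
    // hostname = *( domainlabel "." ) toplabel [ "." ]
    private static final Pattern HOSTNAME =
            Pattern.compile("(?:" + DOMAINLABEL + "\\.)*" + TOPLABEL + "\\.?");

    public static boolean isHostname(String s) {
        return HOSTNAME.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isHostname("www.math.uio.no")); // true
        System.out.println(isHostname("1.2.3.4"));         // false: toplabel must start with a letter
    }
}
```

The requirement that `toplabel` start with a letter is what keeps a dotted IPv4 address from also parsing as a hostname in `host = hostname | IPv4address | IPv6reference`.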
protected static final BitSet host
host = hostname | IPv4address | IPv6reference
protected static final BitSet hostport
hostport = host [ ":" port ]
protected static final BitSet userinfo
userinfo = *( unreserved | escaped | ";" | ":" | "&" | "=" | "+" | "$" | "," )
protected static final BitSet server
server = [ [ userinfo "@" ] hostport ]
protected static final BitSet reg_name
reg_name = 1*( unreserved | escaped | "$" | "," | ";" | ":" | "@" | "&" | "=" | "+" )
protected static final BitSet authority
authority = server | reg_name
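Under the `server` alternative, an authority breaks down into optional userinfo, a host, and an optional port. The sketch below shows one way to perform that split. It assumes a server-based authority, because a `reg_name` may itself contain "@" and ":", which makes the split ambiguous. The class name and the sample authorities `anonymous@ftp.is.co.za:21` and `[::1]:8080` are hypothetical.

```java
public class AuthoritySplitSketch {
    /**
     * Splits a server-based authority into {userinfo, host, port}; absent parts are null.
     * server = [ [ userinfo "@" ] hostport ],  hostport = host [ ":" port ]
     */
    public static String[] split(String authority) {
        String userinfo = null;
        String hostport = authority;
        int at = authority.lastIndexOf('@');
        if (at >= 0) {
            userinfo = authority.substring(0, at);
            hostport = authority.substring(at + 1);
        }
        String host = hostport;
        String port = null;
        int colon = hostport.lastIndexOf(':');
        // A ':' inside "[...]" belongs to an IPv6reference; only a colon after ']' starts the port.
        if (colon > hostport.lastIndexOf(']')) {
            host = hostport.substring(0, colon);
            port = hostport.substring(colon + 1);
        }
        return new String[] {userinfo, host, port};
    }

    public static void main(String[] args) {
        for (String a : new String[] {"ftp.is.co.za", "anonymous@ftp.is.co.za:21", "[::1]:8080"}) {
            String[] p = split(a);
            System.out.println("userinfo=" + p[0] + " host=" + p[1] + " port=" + p[2]);
        }
    }
}
```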
protected static final BitSet scheme
scheme = alpha *( alpha | digit | "+" | "-" | "." )
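Because `scheme` treats its first character differently from the rest, a single BitSet cannot check it; two sets are needed. A hypothetical check along those lines follows. The input `svn+ssh` is only an illustrative string, not a scheme named on this page.

```java
import java.util.BitSet;

public class SchemeCheckSketch {
    static final BitSet ALPHA = new BitSet(256);
    static final BitSet SCHEME_REST = new BitSet(256);
    static {
        ALPHA.set('a', 'z' + 1);
        ALPHA.set('A', 'Z' + 1);
        SCHEME_REST.or(ALPHA);         // alpha
        SCHEME_REST.set('0', '9' + 1); // digit
        SCHEME_REST.set('+');
        SCHEME_REST.set('-');
        SCHEME_REST.set('.');
    }

    // scheme = alpha *( alpha | digit | "+" | "-" | "." )
    public static boolean isScheme(String s) {
        if (s.isEmpty() || !ALPHA.get(s.charAt(0))) {
            return false;
        }
        for (int i = 1; i < s.length(); i++) {
            if (!SCHEME_REST.get(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"http", "gopher", "svn+ssh", "1http"}) {
            System.out.println(s + " -> " + isScheme(s));
        }
    }
}
```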
protected static final BitSet rel_segment
rel_segment = 1*( unreserved | escaped | ";" | "@" | "&" | "=" | "+" | "$" | "," )
protected static final BitSet rel_path
rel_path = rel_segment [ abs_path ]
protected static final BitSet net_path
net_path = "//" authority [ abs_path ]
protected static final BitSet hier_part
hier_part = ( net_path | abs_path ) [ "?" query ]
protected static final BitSet relativeURI
relativeURI = ( net_path | abs_path | rel_path ) [ "?" query ]
protected static final BitSet absoluteURI
absoluteURI = scheme ":" ( hier_part | opaque_part )
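Whether an absolute URI uses `hier_part` or `opaque_part` is decided by the first character after the scheme's ":": a "/" starts a hierarchical part, and anything else starts an opaque one. The JDK's `java.net.URI` (a different class from this one) exposes that distinction through `isOpaque()`. The sketch applies it to examples from the list at the top of this page; `OpaqueVersusHierarchicalSketch` is a hypothetical name.

```java
import java.net.URI;

public class OpaqueVersusHierarchicalSketch {
    /** True if the absolute URI's scheme-specific part does not start with '/'. */
    public static boolean isOpaque(String uri) {
        return URI.create(uri).isOpaque();
    }

    public static void main(String[] args) {
        String[] examples = {
            "mailto:mduerst@ifi.unizh.ch",
            "news:comp.infosystems.www.servers.unix",
            "telnet://melvyl.ucop.edu/",
        };
        for (String e : examples) {
            URI uri = URI.create(e);
            System.out.println(e + " opaque=" + uri.isOpaque() + " ssp=" + uri.getSchemeSpecificPart());
        }
    }
}
```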
protected static final BitSet URI_reference
URI-reference = [ absoluteURI | relativeURI ] [ "#" fragment ]
protected static char[] encode(String original, BitSet allowed, String charset) throws org.apache.http.HttpException
original character sequence->octet sequence->URI character sequence
An escaped octet is encoded as a character triplet consisting of the percent character "%" followed by the two hexadecimal digits representing the octet code. For example, "%20" is the escaped encoding for the US-ASCII space character. Conversion from the local filesystem character set to UTF-8 normally involves a two-step process: first convert the local character set to the UCS, then convert the UCS to UTF-8. The first step can be performed by maintaining a mapping table that pairs each local character set code with the corresponding UCS code. The second step converts the UCS character code to its UTF-8 encoding. Mapping between vendor code pages can be done in a very similar manner. Escape encodings may legitimately be made only when a URI is being created from its component parts. This method performs the escape and validation steps internally.
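The three-step conversion and the "%" hex hex triplet can be sketched in plain Java. This is a minimal sketch, assuming a `java.nio.charset.Charset` in place of the charset name and an `IllegalArgumentException` in place of `org.apache.http.HttpException`; it is not this class's implementation, and `EncodeSketch` and its `unreserved()` helper are hypothetical. Octets whose characters are in `allowed` are copied, and every other octet, including "%" itself, becomes an escape triplet. The example also shows that "café" is 4 characters but 5 UTF-8 octets, which is why the original character sequence, the octet sequence, and the URI character sequence are kept distinct.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

public class EncodeSketch {
    private static final char[] HEX_DIGITS = "0123456789ABCDEF".toCharArray();

    /** original character sequence -> octet sequence -> URI character sequence */
    public static char[] encode(String original, BitSet allowed, Charset charset) {
        if (original == null) {
            throw new IllegalArgumentException("original string may not be null");
        }
        byte[] octets = original.getBytes(charset); // characters -> octets
        StringBuilder out = new StringBuilder(octets.length * 3);
        for (byte b : octets) {
            int octet = b & 0xFF; // unsigned value 0..255
            if (octet < 0x80 && allowed.get(octet)) {
                out.append((char) octet); // allowed US-ASCII octet, copied as-is
            } else {
                out.append('%') // escaped = "%" hex hex
                   .append(HEX_DIGITS[octet >> 4])
                   .append(HEX_DIGITS[octet & 0x0F]);
            }
        }
        return out.toString().toCharArray();
    }

    /** Hypothetical "allowed" set for demonstration: unreserved = alphanum | mark. */
    public static BitSet unreserved() {
        BitSet s = new BitSet(256);
        s.set('a', 'z' + 1);
        s.set('A', 'Z' + 1);
        s.set('0', '9' + 1);
        for (char c : "-_.!~*'()".toCharArray()) {
            s.set(c);
        }
        return s;
    }

    public static void main(String[] args) {
        String original = "caf\u00e9"; // "café": 4 characters
        System.out.println(original.getBytes(StandardCharsets.UTF_8).length);                         // 5
        System.out.println(new String(encode(original, unreserved(), StandardCharsets.UTF_8)));       // caf%C3%A9
        System.out.println(new String(encode(original, unreserved(), StandardCharsets.ISO_8859_1)));  // caf%E9
        System.out.println(new String(encode("Los Angeles", unreserved(), StandardCharsets.UTF_8)));  // Los%20Angeles
    }
}
```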
Parameters:
original - the original character sequence
allowed - those characters that are allowed within a component
charset - the protocol charset
Throws:
org.apache.http.HttpException - null component or unsupported character encoding

protected static String decode(char[] component, String charset) throws org.apache.http.HttpException
URI character sequence->octet sequence->original character sequence
A URI must be separated into its components before the escaped characters within those components can be safely decoded. Note that URI characters that are not UTF-8 may nevertheless be parsed as valid UTF-8. A recent non-scientific analysis found that EUC-encoded Japanese words had a 2.7% false-reading rate, SJIS had 0.0005%, and other encodings such as ASCII or KOI-8 had 0%. The percent "%" character always has the reserved purpose of being the escape indicator; it must be escaped as "%25" in order to be used as data within a URI. This method performs the unescape step internally.
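The reverse direction can be sketched the same way: collect octets, turning each "%" hex hex triplet back into one octet, then decode the octets with the charset. The same assumptions apply as for encoding (a `Charset` argument, `IllegalArgumentException` instead of `HttpException`, hypothetical class name `DecodeSketch`). Decoding "caf%C3%A9" with ISO-8859-1 instead of UTF-8 yields "cafÃ©", which shows why the charset must match the one used to encode. Note that `new String(byte[], Charset)` replaces malformed input with U+FFFD rather than reporting it.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DecodeSketch {
    /** URI character sequence -> octet sequence -> original character sequence */
    public static String decode(char[] component, Charset charset) {
        if (component == null) {
            throw new IllegalArgumentException("component may not be null");
        }
        ByteArrayOutputStream octets = new ByteArrayOutputStream(component.length);
        for (int i = 0; i < component.length; i++) {
            char c = component[i];
            if (c == '%') {
                if (i + 2 >= component.length) {
                    throw new IllegalArgumentException("incomplete trailing escape pattern");
                }
                int high = hexValue(component[i + 1]);
                int low = hexValue(component[i + 2]);
                if (high < 0 || low < 0) {
                    throw new IllegalArgumentException("invalid escape pattern at index " + i);
                }
                octets.write((high << 4) | low); // one escaped octet
                i += 2;
            } else if (c < 0x80) {
                octets.write(c); // unescaped US-ASCII character = one octet
            } else {
                throw new IllegalArgumentException("non-ASCII character in URI component");
            }
        }
        return new String(octets.toByteArray(), charset); // octets -> characters
    }

    private static int hexValue(char c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(decode("Los%20Angeles".toCharArray(), StandardCharsets.UTF_8)); // Los Angeles
        System.out.println(decode("100%25".toCharArray(), StandardCharsets.UTF_8));        // 100%
        // Wrong charset: each UTF-8 octet becomes its own character.
        System.out.println(decode("caf%C3%A9".toCharArray(), StandardCharsets.ISO_8859_1));
    }
}
```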
Parameters:
component - the URI character sequence
charset - the protocol charset
Throws:
org.apache.http.HttpException - incomplete trailing escape pattern or unsupported character encoding

protected static String decode(String component, String charset) throws org.apache.http.HttpException
URI character sequence->octet sequence->original character sequence
A URI must be separated into its components before the escaped characters within those components can be safely decoded. Note that URI characters that are not UTF-8 may nevertheless be parsed as valid UTF-8. A recent non-scientific analysis found that EUC-encoded Japanese words had a 2.7% false-reading rate, SJIS had 0.0005%, and other encodings such as ASCII or KOI-8 had 0%. The percent "%" character always has the reserved purpose of being the escape indicator; it must be escaped as "%25" in order to be used as data within a URI. This method performs the unescape step internally.
Parameters:
component - the URI character sequence
charset - the protocol charset
Throws:
org.apache.http.HttpException - incomplete trailing escape pattern or unsupported character encoding