Package com.pnfsoftware.jeb.util.format
Class Strings
java.lang.Object
com.pnfsoftware.jeb.util.format.Strings
Utility methods for Strings and CharSequences.
-
Field Summary
Fields -
Constructor Summary
Constructors -
Method Summary
Modifier and Type
Method
Description
static String
Convert a camel-case string to a sentence.
static String
camelCaseToString(String s, boolean breakOnDigits, boolean keepUppercaseAcronyms)
Convert a camel-case string to a sentence.
static String
Capitalize the first character of a string.
static boolean
A many-element variant of String.contains.
static boolean
containsAt(String s, int index, String elt)
Indicates if a String s contains a particular substring at a specified index.
static int
Count the number of occurrences of a character within a string.
static int
Count the number of occurrences of a sub-string within a string.
static int
Count the number of non blank characters in the provided string.
static String[]
Decode an encoded array of objects.
static String
decodeASCII(byte[] bytes)
Decode a byte buffer using an ASCII decoder.
static String
decodeASCII(byte[] bytes, int offset, int length)
Decode a byte buffer using an ASCII decoder.
decodeList(String s)
Decode an encoded list of objects.
static String
decodeLocal(byte[] bytes)
Decode a byte buffer using the local platform's default charset.
static String
decodeLocal(byte[] bytes, int offset, int length)
Decode a byte buffer using the local platform's default charset.
Decode an encoded map.
static String
decodeUTF8(byte[] bytes)
Decode a byte buffer using a UTF-8 decoder.
static String
decodeUTF8(byte[] bytes, int offset, int length)
Decode a byte buffer using a UTF-8 decoder.
static String
decodeUTF8Ex(byte[] bytes, boolean useStandardDecoderFirst)
static String
decodeUTF8Ex(byte[] bytes, int off, int len, boolean useStandardDecoderFirst)
static Charset
determinePotentialEncoding(byte[] data, int offset, int size)
Heuristically determine the encoding of a string.
static String
encodeArray(Object... array)
Encode an array of objects.
static byte[]
Encode a string using an ASCII encoder.
static byte[]
Generate a byte array consisting of the low-bytes of the input string characters.
static String
encodeList(List<?> list)
Encode a list of objects.
static byte[]
Encode a string using the local platform's default charset.
static String
Encode a dictionary.
static byte[]
encodeUTF8(String s)
Encode a string using a UTF-8 encoder.
static boolean
A many-element variant of String.endsWith.
static boolean
A safer version of String.equals(Object).
static boolean
equalsIgnoreCase(String a, String b)
A safer version of String.equalsIgnoreCase(String).
static String
Format using the US locale.
static Appendable
ff(Appendable sink, String format, Object... args)
A faster version of String.format(String, Object...).
static String
A faster version of String.format(String, Object...).
static Appendable
ff(Locale l, Appendable sink, String format, Object... args)
A faster version of String.format(String, Object...).
static String
A faster version of String.format(String, Object...).
static int[]
findWordBoundaries(String str, int offset)
Find a word in the string
static int[]
findWordBoundaries(String str, int offset, Predicate<Character> boundaryTester)
Find a word in the string
static String
static String
generate(char c, int count)
Generate a repeated-character String.
static String
generate(CharSequence s, int count)
Generate a repeated string.
static int
getAsciiLength(byte[] data)
Same as getAsciiLength(data, data.length).
static int
getAsciiLength(byte[] data, int maxlen)
Retrieve the length of a potentially ASCII-encoded string.
static int
getBOMSize(byte[] input)
Retrieve the size taken by the BOM or equivalent encoding mark.
static Comparator<String>
Get a case-sensitive string comparator that treats hexadecimal sequences as numbers, and orders them accordingly, instead of as simple strings.
static Comparator<String>
getComparator(boolean caseSensitive, boolean scanHexadecimal)
Get a string comparator that can treat hexadecimal sequences as numbers (and order them accordingly) instead of as simple strings.
static int
static int
static int
getInitialBlankSize(InputStream stream, boolean includeBOM, char... extraWhitespaceCharacters)
Retrieve the initial blank bytes at the beginning of a stream (non data)
static boolean
Determine if a string contains one or more WSP characters.
static boolean
Determine if a string is non-null and non-empty.
static boolean
Determine if a string contains right-to-left (RTL) characters, eg Arabic or Hebrew characters.
static String
indentBlock(String blk)
Indent a buffer using a 4-space indentation.
static String
indentBlock(String blk, String indent)
Indent a buffer.
static int
indexOf(CharSequence text, char c)
Implementation of indexOf for CharSequence.
static int
indexOf2(CharSequence text, char c0, char c1)
Find the first one of two characters and return its position.
static int
indexOf2(CharSequence text, int from, char c0, char c1)
Find the first one of two characters and return its position.
static int
indexOfAny(CharSequence text, Set<Character> cset)
Find the first one of any of the provided characters and return its position.
static int
indexOfNotInGroup(CharSequence text, char c, int fromIndex, char[]... ingoreInGroups)
Find the index of a character, ignoring some groups.
static boolean
isAsciiWhitespace(int b, char... extraWhitespaceCharacters)
Determine if a character is a white-space, per the ASCII standard.
static boolean
Determine if a character sequence is null, empty, or contains WSP chars exclusively.
static boolean
isContainedIn(String s, String... elts)
Determine if a string is contained in a var-arg list of provided strings.
static boolean
isHexNumber(String text)
Check that every character of the text parameter is a hexadecimal digit.
static boolean
Check that every character of the text parameter is a digit.
static boolean
isPrintableCharsetHeader(byte[] headerBytes, Charset charset)
Validate if some starting bytes may be encoded with a particular charset.
static boolean
isPrintableUTF8Header(byte[] headerBytes)
Validate if some starting bytes may be considered as a UTF-8 printable character header.
static boolean
isWellFormedUTF8(byte[] bytes)
static boolean
isWellFormedUTF8(byte[] bytes, int off, int len)
static boolean
isWhitespace(char c)
Determine if a character is a white-space, per the Unicode standard.
static String
Join the string representations of a sequence of objects using the provided separator.
static <T> String
join(String separator, Iterable<T> iterator, Function<T, CharSequence> f)
Join a series of items.
static String
Join a series of non-null strings.
static String
Join the elements of a list using "," as a separator and surround the resulting string with square brackets.
static String
Join the string representations of a sequence of objects using the provided separator.
static String
Join the string representations of a sequence of objects using the provided separator.
static int
lastIndexOf2(CharSequence text, char c0, char c1)
Find the last one of two characters and return its position.
static int
lastIndexOf2(CharSequence text, int from, char c0, char c1)
Find the last one of two characters and return its position.
static int
lastIndexOfAny(CharSequence text, Set<Character> cset)
Find the last one of any of the provided characters and return its position.
static boolean
Check whether an input string matches a provided regex pattern.
static boolean
Check whether an input string matches a provided regex pattern.
static String
Left trim all chars less than or equal to ' '.
static String
Left trim on a given character.
static void
Append a new-line character to the provided buffer unless the buffer is empty or the last character in the buffer is a new-line.
static String
Replace all newline sequences by the standard \n LF character.
static CharSequence
pad(char c, int count)
Repeat character c count times and build a CharSequence from it.
static String[]
Parse a string as a command line.
static String
parseUrlParameter(String s, String entry)
Same as parseUrlParameters(java.lang.String, java.lang.String...) with a single entry.
static String[]
parseUrlParameters(String s, String... entries)
Extract the parameters of a URL-like encoded string.
static String
Generate a 32-character long random unique identifier.
static String
readBOM(byte[] input)
Retrieve the charset from start bytes.
static String
replaceLast(String str, String target, String replacement)
Replace the last occurrence of target in str by the replacement
static String
replaceNewLines(String s, String repl)
Replace newline sequences.
static String
replaceWhitespaces(String str, char repl)
Efficiently replace all Unicode white-spaces by the provided char.
static void
static String
Right trim all chars less than or equal to ' '.
static String
Right trim on a given character.
static String
Get the string representation of the parameter object, or the empty string if the object is null.
static String
Get the string representation of the parameter object, or the provided string if the object is null.
static String
Get the string representation of the parameter object, or the provided non-empty string if the object is null or its string representation is the empty string.
static int
search(CharSequence data, int index, String pattern, boolean regex, boolean caseSensitive, boolean reverseSearch)
Search for a sub-string.
static String
spaces(int count)
Generate a repeated string of spaces.
static String[]
static String[]
splitLines(String s)
Split a text into an array of lines.
static String[]
splitLines(String s, boolean doNotReturnFinalEmptyLine)
Split a text into an array of lines.
static boolean
starMatches(String str, String pat)
Check whether an input string matches a provided pattern using a StarMatcher.
static boolean
startsWith(String s, String... elts)
A many-element variant of String.startsWith.
static String
Flexible version of String.substring(int, int).
static String
A safe version of String.toString.
static String
A safe version of String.toString.
static String
Trim (left and right) all chars less than or equal to ' '.
static String
Trim (left and right) all occurrences of the provided character.
static String
Trim (left and right) all characters considered to be white-space by the Unicode standard.
static String
Truncate a string.
static String
truncateWithSuffix(String s, int maxLength, String suffix)
Truncate a string and append an optional suffix to it if it was actually truncated.
static String
Decode a URL-encoded string.
static String
Urlencode a string.
-
Field Details
-
LINESEP
Line-separator for *this* platform.
-
-
Constructor Details
-
Strings
public Strings()
-
-
Method Details
-
hasLength
Determine if a string is non-null and non-empty.
- Parameters:
s
-
- Returns:
- true iff the string contains at least one character
-
replaceNewLines
Replace newline sequences. This method accepts null strings as input.- Parameters:
s
- a string or null; in the latter case, null will be returnedrepl
- the non-null substitution string, which must not contain new-line characters- Returns:
-
normalizeNewLines
Replace all newline sequences by the standard \n LF character.
- Parameters:
s
- a string
- Returns:
-
safe
Get the string representation of the parameter object, or the empty string if the object is null.- Parameters:
s
- an object, possibly null- Returns:
- the object's toString(java.lang.Object) representation, or the empty string
-
safe
Get the string representation of the parameter object, or the provided string if the object is null.- Parameters:
s
- an object, possibly nulldef
- a non-null string- Returns:
- a non-null string, possibly empty
-
safe2
Get the string representation of the parameter object, or the provided non-empty string if the object is null or its string representation is the empty string.- Parameters:
s
- an object, possibly nulldef
- a non-null, non-empty string- Returns:
- a string guaranteed to be non-empty
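The difference between the two-argument safe and safe2 only shows for objects whose string representation is empty. The sketch below restates the documented contracts with plain JDK code; the class SafeSketch and its method bodies are illustrative assumptions, not the library's implementation.

```java
final class SafeSketch {
    // Assumed stand-in for safe(Object, String): the default is used only for a null object.
    static String safe(Object s, String def) {
        return s == null ? def : s.toString();
    }

    // Assumed stand-in for safe2(Object, String): the default is also used
    // when the string representation is empty.
    static String safe2(Object s, String def) {
        String str = s == null ? "" : s.toString();
        return str.isEmpty() ? def : str;
    }

    public static void main(String[] args) {
        System.out.println(safe("", "n/a").isEmpty()); // true: an empty input is returned as-is
        System.out.println(safe2("", "n/a"));          // n/a
        System.out.println(safe2(null, "n/a"));        // n/a
    }
}
```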
-
joinList
Join the elements of a list using "," as a separator and surround the resulting string with square brackets. Careful: this method does not abide by the common semantics of join.
- Parameters:
objects
- a list of objects- Returns:
- the resulting string
-
joinv
Join the string representations of a sequence of objects using the provided separator. Null objects will be formatted as "null".- Parameters:
separator
- a non-null separatorobjects
- an array of objects- Returns:
- the resulting string
-
joinv
Join the string representations of a sequence of objects using the provided separator.- Parameters:
separator
- a non-null separatordefaultValue
- String representation for null Objectsobjects
- an array of objects- Returns:
- the resulting string
-
join
Join the string representations of a sequence of objects using the provided separator. Null objects will be formatted as "null".- Parameters:
separator
- a non-null separatoriterator
- an iterator- Returns:
- the resulting string
-
join
Join a series of items, formatting each item with the provided function. For example, to display a list of longs as hexadecimal values separated by commas:
Strings.join(", ", Arrays.asList(0x10L, 0x20L), l -> Long.toHexString(l))
- Type Parameters:
T
- any Object
- Parameters:
separator
- a non-null separator
iterator
-
f
- toString()-equivalent function applied to each object of the sequence
- Returns:
- the resulting string
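For readers without the library at hand, a function-based join with the same signature shape can be sketched in a few lines. The class JoinSketch and its body are assumptions made for illustration; only the call in main mirrors the documented example.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.function.Function;

final class JoinSketch {
    // Assumed stand-in: formats each item with f and places the separator between items.
    static <T> String join(String separator, Iterable<T> items, Function<T, CharSequence> f) {
        StringBuilder sb = new StringBuilder();
        Iterator<T> it = items.iterator();
        while (it.hasNext()) {
            sb.append(f.apply(it.next()));
            if (it.hasNext()) {
                sb.append(separator);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Same call shape as the documented example.
        System.out.println(join(", ", Arrays.asList(0x10L, 0x20L), l -> Long.toHexString(l))); // 10, 20
    }
}
```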
-
join
Join a series of non-null strings.- Parameters:
separator
-elts
-begin
- inclusive start indexend
- exclusive end index- Returns:
-
splitLines
Split a text into an array of lines. Empty lines are returned. The final new-line character(s) are trimmed off. Works for all new-line characters (\r, \n) and sequences of characters (\r\n).
- Parameters:
s
- mandatory input string
doNotReturnFinalEmptyLine
-
- Returns:
- the lines
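One plausible reading of this contract can be expressed with String.split. The sketch below is an assumption about the behavior (in particular about how the final empty line is handled), not the library's code; SplitLinesSketch is a hypothetical name.

```java
import java.util.Arrays;

final class SplitLinesSketch {
    // Assumed behavior: split on \r\n, \r or \n, keep empty lines, and
    // optionally drop the empty element produced by a trailing line break.
    static String[] splitLines(String s, boolean doNotReturnFinalEmptyLine) {
        String[] lines = s.split("\r\n|\r|\n", -1);
        int n = lines.length;
        if (doNotReturnFinalEmptyLine && n > 0 && lines[n - 1].isEmpty()) {
            n--;
        }
        return Arrays.copyOf(lines, n);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitLines("a\r\n\nb\r", true)));  // [a, , b]
        System.out.println(Arrays.toString(splitLines("a\r\n\nb\r", false))); // [a, , b, ]
    }
}
```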
-
splitLines
Split a text into an array of lines. Empty lines are returned. The final new-line character(s) are trimmed off. Works for all new-line characters (\r, \n) and sequences of characters (\r\n).
- Parameters:
s
- mandatory input string
- Returns:
- the lines
-
splitall
-
firstLine
-
search
public static int search(CharSequence data, int index, String pattern, boolean regex, boolean caseSensitive, boolean reverseSearch)
Search for a sub-string.
Note: on JDK 11+, this implementation for regex=false, caseSensitive=false, reverseSearch=false may be slower than doing data.toLowerCase().indexOf(pattern.toLowerCase()).
- Parameters:
data
- buffer to be searched (aka, the haystack)
index
- in the case of a regular (forward) search, the search range is [index,EOS); in the case of a reverse (backward) search, the search range is [0,index)
pattern
- text that is being searched (aka, the needle)
regex
- if true, the pattern will be treated as a regular expression; if the regex is invalid, it will be treated as a regular string and no error will be reported
caseSensitive
- search is case-sensitive
reverseSearch
- search is done in reverse
- Returns:
- index where the substring was found, or -1 if nothing was found
-
isContainedIn
Determine if a string is contained in a var-arg list of provided strings.
- Parameters:
s
- string to be searched
elts
- the list of elements
- Returns:
- true iff the input string was not null and found in the list of elements
-
contains
A many-element variant of String.contains.
- Parameters:
s
- the stringelts
- a list of string elements- Returns:
- true if the string contains at least one of the provided elements
-
startsWith
A many-element variant of String.startsWith.
- Parameters:
s
- the stringelts
- a list of string elements- Returns:
- true if the string starts with one of the provided elements
-
containsAt
Indicates if a String s contains a particular substring at a specified index. Semantically equivalent to s.substring(index).startsWith(elt), without intermediate substring creation.
- Parameters:
s
- the string
index
- String index to look at
elt
- element to identify
- Returns:
- true if s.substring(index).startsWith(elt) returns true
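The JDK already offers an allocation-free check with the same meaning for in-range indexes: String.startsWith(String, int). The sketch below uses it as a stand-in; whether the library delegates to it is an assumption, and ContainsAtSketch is a hypothetical name. Note that startsWith returns false for an out-of-range index, whereas substring would throw.

```java
final class ContainsAtSketch {
    // Assumed stand-in: tests for elt at the given index without creating a substring.
    static boolean containsAt(String s, int index, String elt) {
        return s.startsWith(elt, index);
    }

    public static void main(String[] args) {
        System.out.println(containsAt("hello world", 6, "world")); // true
        System.out.println(containsAt("hello world", 5, "world")); // false
    }
}
```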
-
endsWith
A many-element variant of String.endsWith.
- Parameters:
s
- the stringelts
- a list of string elements- Returns:
- true if the string ends with one of the provided elements
-
equals
A safer version of String.equals(Object).
- Parameters:
a
- first string, may be null
b
- second string, may be null
- Returns:
- true iff both strings are non-null and equal
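The stated return contract differs from java.util.Objects.equals for two null inputs. The sketch below illustrates that contrast; EqualsSketch and equalsSketch are hypothetical names, and the body is an assumption derived from the documented return value.

```java
import java.util.Objects;

final class EqualsSketch {
    // Assumed stand-in: true only when both strings are non-null and equal.
    static boolean equalsSketch(String a, String b) {
        return a != null && a.equals(b);
    }

    public static void main(String[] args) {
        System.out.println(equalsSketch("a", "a"));    // true
        System.out.println(equalsSketch(null, null));  // false
        System.out.println(Objects.equals(null, null)); // true: the JDK helper treats two nulls as equal
    }
}
```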
-
equalsIgnoreCase
A safer version of String.equalsIgnoreCase(String).
- Parameters:
a
- first string, may be null
b
- second string, may be null
- Returns:
- true iff both strings are non-null and equal, ignoring case
-
toString
A safe version of String.toString.
- Parameters:
o
- an object, could be null- Returns:
- the String representation of the provided object, or "null"
-
toString
A safe version of String.toString.
- Parameters:
o
- an object, could be nulldefaultValue
- default String representation if o is null- Returns:
- the String representation of the provided object, or the default value
-
generate
Generate a repeated-character String.
- Parameters:
c
- character to repeatcount
- repeat count (ie, string length)- Returns:
- the string
-
generate
Generate a repeated string.- Parameters:
s
- string to repeatcount
- repeat count- Returns:
- the resulting string
-
spaces
Generate a repeated string of spaces.- Parameters:
count
-- Returns:
-
isBlank
Determine if a character sequence is null, empty, or contains WSP chars exclusively.- Parameters:
s
- the character sequence- Returns:
- true if the sequence is null or blank
-
countNonBlankCharacters
Count the number of non blank characters in the provided string.- Parameters:
s
-- Returns:
-
indexOf
Implementation of indexOf for CharSequence. Same behavior as String.indexOf(int).
- Parameters:
text
- stringc
- char- Returns:
- the index position, or -1 if not found
-
indexOf2
Find the first one of two characters and return its position. This is a 2-element implementation of String.indexOf(int).
- Parameters:
text
- stringc0
- first charc1
- second char- Returns:
- the position of the first occurrence of c0 or c1 (whichever came first), -1 if not found
-
indexOf2
Find the first one of two characters and return its position.- Parameters:
text
- stringfrom
- start indexc0
- first charc1
- second char- Returns:
- the position of the first occurrence of c0 or c1 (whichever came first), -1 if not found
-
indexOfAny
Find the first one of any of the provided characters and return its position. This is an N-element implementation of String.indexOf(int).
- Parameters:
text
-cset
- a set of characters- Returns:
-
indexOfNotInGroup
public static int indexOfNotInGroup(CharSequence text, char c, int fromIndex, char[]... ingoreInGroups)
Find the index of a character, ignoring some groups.
For example:
- ignore some text in parentheses: indexOfNotInGroup("it is (almost) done", 'o', 0, ['(', ')']) will return 16
- ignore generics: indexOfNotInGroup("std::myclass<a,b>::mymethod(type a, type b)", ',', 0, ['<', '>']) will return 34
- Parameters:
text
- string
c
- character to find
fromIndex
- start index, use 0 by default
ingoreInGroups
- list of character groups to be ignored, eg {'(', ')'}, {'<', '>'}. Each character group must contain at least 2 elements (one for the open element, one for the close element)
- Returns:
- positive index if found, -1 when not found, -2 in case of malformed
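The group-skipping search can be sketched with a single nesting counter. This is a minimal illustration under stated assumptions: the class name is hypothetical, "malformed" is taken to mean unbalanced delimiters, and mismatched delimiter kinds are not detected. In Java, each group is passed as a char[] such as new char[]{'(', ')'}.

```java
final class IndexOfNotInGroupSketch {
    // Assumed stand-in: returns the first index of c at nesting depth 0,
    // -1 if absent, -2 if delimiters are unbalanced (assumed meaning of "malformed").
    static int indexOfNotInGroup(CharSequence text, char c, int fromIndex, char[]... groups) {
        int depth = 0; // number of currently open groups, all kinds combined
        for (int i = fromIndex; i < text.length(); i++) {
            char ch = text.charAt(i);
            boolean delimiter = false;
            for (char[] g : groups) {
                if (ch == g[0]) {          // open element
                    depth++;
                    delimiter = true;
                    break;
                }
                if (ch == g[1]) {          // close element
                    if (depth == 0) {
                        return -2;         // close without a matching open
                    }
                    depth--;
                    delimiter = true;
                    break;
                }
            }
            if (!delimiter && depth == 0 && ch == c) {
                return i;
            }
        }
        return depth == 0 ? -1 : -2;       // a group left open is treated as malformed
    }

    public static void main(String[] args) {
        System.out.println(indexOfNotInGroup("it is (almost) done", 'o', 0, new char[]{'(', ')'})); // 16
        System.out.println(indexOfNotInGroup(
                "std::myclass<a,b>::mymethod(type a, type b)", ',', 0, new char[]{'<', '>'}));      // 34
    }
}
```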
-
lastIndexOf2
Find the last one of two characters and return its position. This is a 2-element implementation of String.lastIndexOf(int).
- Parameters:
text
- stringc0
- first charc1
- second char- Returns:
- the position of the last occurrence of c0 or c1 (whichever comes last), -1 if not found
-
lastIndexOf2
Find the last one of two characters and return its position.- Parameters:
text
- stringfrom
- start indexc0
- first charc1
- second char- Returns:
- the position of the last occurrence of c0 or c1 (whichever comes last), -1 if not found
-
lastIndexOfAny
Find the last one of any of the provided characters and return its position. This is an N-element implementation of String.lastIndexOf(int).
- Parameters:
text
-cset
- a set of characters- Returns:
-
hasBlank
Determine if a string contains one or more WSP characters.- Parameters:
s
-- Returns:
-
isWhitespace
public static boolean isWhitespace(char c)
Determine if a character is a white-space, per the Unicode standard. This method differs from Character.isWhitespace(char) (Java language definition of a WSP).
- Parameters:
c
-- Returns:
-
isAsciiWhitespace
public static boolean isAsciiWhitespace(int b, char... extraWhitespaceCharacters)
Determine if a character is a white-space, per the ASCII standard. It only processes regular space, tab, CR and LF characters.
- Parameters:
b
- the int to test
extraWhitespaceCharacters
- additional ASCII characters considered as whitespace
- Returns:
-
replaceWhitespaces
Efficiently replace all Unicode white-spaces by the provided char.- Parameters:
str
-repl
-- Returns:
-
trimWhitespaces
Trim (left and right) all characters considered to be white-space by the Unicode standard.- Parameters:
s
- the input string- Returns:
- the trimmed string
-
trim
Trim (left and right) all chars less than or equal to ' '. Note that this method differs from String.trim() which, for instance, does not consider CR or LF to be WSP.
- Parameters:
s
- a string- Returns:
- the trimmed string
-
trim
Trim (left and right) all occurrences of the provided character.
- Parameters:
s
- a stringc
- the character to be removed- Returns:
- the trimmed string
-
ltrim
Left trim all chars less than or equal to ' '. Note that this method differs from String.trim() which, for instance, does not consider CR or LF to be WSP.
- Parameters:
s
- a string- Returns:
- the left-trimmed string
-
rtrim
Right trim all chars less than or equal to ' '. Note that this method differs from String.trim() which, for instance, does not consider CR or LF to be WSP.
- Parameters:
s
- a string- Returns:
- the right-trimmed string
-
ltrim
Left trim on a given character.- Parameters:
s
-c
-- Returns:
-
rtrim
Right trim on a given character.- Parameters:
s
-c
-- Returns:
-
getAsciiLength
public static int getAsciiLength(byte[] data, int maxlen)
Retrieve the length of a potentially ASCII-encoded string. The allowed characters are CR, LF, TAB, and any character in the [0x20, 0x7E] range.
- Parameters:
data
- a byte array
maxlen
- maximum length
- Returns:
- the length of the string
-
getAsciiLength
public static int getAsciiLength(byte[] data)
Same as getAsciiLength(data, data.length).
- Parameters:
data
- a byte array
- Returns:
- the length of the string
-
determinePotentialEncoding
Heuristically determine the encoding of a string.- Parameters:
data
-offset
-size
-- Returns:
- null if unknown, else one of ASCII, UTF-8, UTF-16, UTF-16LE, UTF-16BE, UTF-32LE or UTF-32BE
-
isNumber
Check that every character of the text parameter is a digit.- Parameters:
text
-- Returns:
- true if text is a valid decimal number
-
isHexNumber
Check that every character of the text parameter is a hexadecimal digit. Upper-case as well as lower-case characters are allowed (only lower or only upper).
- Parameters:
text
-- Returns:
- true if text is a valid hexadecimal number
-
f
Format using the US locale.- Parameters:
format
-args
-- Returns:
-
getFastFormatInvocationCount
public static int getFastFormatInvocationCount()
-
getFastFormatFailureCount
public static int getFastFormatFailureCount()
-
resetFastFormatCounts
public static void resetFastFormatCounts()
-
ff
A faster version of String.format(String, Object...).
- Parameters:
sink
- optional recipient (if null, a new builder will be created; the formatted string is appended to the sink)l
- locale to be usedformat
- format stringargs
- format arguments- Returns:
- the sink, never null
-
ff
A faster version of String.format(String, Object...).
- Parameters:
sink
- optional recipient (if null, a new builder will be created; the formatted string is appended to the sink)format
- format stringargs
- format arguments- Returns:
- the sink, never null
-
ff
A faster version of String.format(String, Object...).
- Parameters:
l
- locale to be usedformat
- format stringargs
- format arguments- Returns:
- the formatted string
-
ff
A faster version of String.format(String, Object...).
- Parameters:
format
- format stringargs
- format arguments- Returns:
- the formatted string
-
replaceLast
Replace the last occurrence of target in str by the replacement- Parameters:
str
- the string to search intarget
- the string to search forreplacement
- the replacement part- Returns:
- the new string, with the last occurrence of target replaced by the replacement, or the original string if target was not found
-
substring
Flexible version of String.substring(int, int). Allows Python-like negative indexes for convenience.
- Parameters:
s
- a stringbegin
- index in the [-s_length, +s_length] rangeend
- index in the [-s_length, +s_length] range- Returns:
- the substring
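Python-like indexing means a negative index counts from the end of the string. A minimal sketch of that convention follows; the class name and the exact handling of out-of-range values are assumptions, since only the accepted index range is documented.

```java
final class SubstringSketch {
    // Assumed stand-in: a negative index i is interpreted as s.length() + i.
    static String substring(String s, int begin, int end) {
        int len = s.length();
        int b = begin < 0 ? len + begin : begin;
        int e = end < 0 ? len + end : end;
        return s.substring(b, e);
    }

    public static void main(String[] args) {
        System.out.println(substring("hello world", -5, 11)); // world
        System.out.println(substring("hello", 1, -1));        // ell
    }
}
```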
-
truncate
Truncate a string.- Parameters:
s
- a stringmaxLength
- positive length- Returns:
- the truncated string, which will contain at most `maxLength` characters
-
truncateWithSuffix
Truncate a string and append an optional suffix to it if it was actually truncated.- Parameters:
s
- a stringmaxLength
- positive length, which must be greater than or equal to the length of the suffix, if one was provided
suffix
- optional suffix appended to a string that is actually truncated
- Returns:
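The constraint on maxLength suggests that the suffix counts toward the maximum length. The sketch below follows that reading, which is an assumption rather than documented behavior; TruncateSketch is a hypothetical name.

```java
final class TruncateSketch {
    // Assumed behavior: the result never exceeds maxLength characters, suffix included.
    static String truncateWithSuffix(String s, int maxLength, String suffix) {
        if (s.length() <= maxLength) {
            return s; // not truncated, so no suffix is appended
        }
        if (suffix == null) {
            return s.substring(0, maxLength);
        }
        return s.substring(0, maxLength - suffix.length()) + suffix;
    }

    public static void main(String[] args) {
        System.out.println(truncateWithSuffix("hello world", 8, "...")); // hello...
        System.out.println(truncateWithSuffix("short", 8, "..."));       // short
    }
}
```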
-
indentBlock
Indent a buffer.- Parameters:
blk
-indent
-- Returns:
-
indentBlock
Indent a buffer using a 4-space indentation.- Parameters:
blk
-- Returns:
-
urlencodeUTF8
Urlencode a string. The resulting string will have the following characteristics:
- a-z, A-Z, 0-9 remain the same
- ., -, *, _ remain the same
- space is converted to +
- all other characters are UTF8 encoded using the "%xx" scheme
- Parameters:
s
- the string to be encoded- Returns:
- the encoded string
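The four rules listed above coincide with the documented behavior of java.net.URLEncoder when used with UTF-8. The sketch below demonstrates those rules with the JDK class; whether this method delegates to URLEncoder is an assumption, and UrlEncodeSketch is a hypothetical name.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class UrlEncodeSketch {
    // Assumed stand-in built on the JDK encoder, which applies the same four rules.
    static String urlencodeUTF8(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 support is mandatory on every JVM
        }
    }

    public static void main(String[] args) {
        // Letters, digits and . - * _ are kept, space becomes +, the rest is %xx-encoded UTF-8.
        System.out.println(urlencodeUTF8("a b*c_d-e.f/\u00e9")); // a+b*c_d-e.f%2F%C3%A9
    }
}
```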
-
urldecodeUTF8
Decode a URL-encoded string.
- Parameters:
s
- the encoded string- Returns:
- the decoded string
-
parseUrlParameters
Extract the parameters of a URL-like encoded string. No decoding is taking place. Example:
- s: "type=home&subtype=house&[another_key]=[another_value]"
- entries: "type", "subtype"
- returns: ["home", "house"]
- Parameters:
s
- the string to be parsedentries
- the entries, whose count must match the number of key-value pairs- Returns:
- the list of parameters, as they were (ie, without any decoding applied)
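A minimal sketch of this extraction follows. It assumes that the keys appear in the same order as the entries and that a mismatch raises an exception; neither point is stated by the documentation, and ParseUrlParametersSketch is a hypothetical name.

```java
import java.util.Arrays;

final class ParseUrlParametersSketch {
    // Assumed stand-in: returns the raw (undecoded) values of the expected keys, in order.
    static String[] parseUrlParameters(String s, String... entries) {
        String[] pairs = s.split("&");
        if (pairs.length != entries.length) {
            throw new IllegalArgumentException("expected " + entries.length + " key-value pairs");
        }
        String[] values = new String[entries.length];
        for (int i = 0; i < pairs.length; i++) {
            String prefix = entries[i] + "=";
            if (!pairs[i].startsWith(prefix)) {
                throw new IllegalArgumentException("unexpected key in: " + pairs[i]);
            }
            values[i] = pairs[i].substring(prefix.length());
        }
        return values;
    }

    public static void main(String[] args) {
        String[] values = parseUrlParameters("type=home&subtype=house", "type", "subtype");
        System.out.println(Arrays.toString(values)); // [home, house]
    }
}
```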
-
parseUrlParameter
Same as parseUrlParameters(java.lang.String, java.lang.String...) with a single entry.
- Parameters:
s
- the URL-like string to be parsed, containing a single key-value pair, eg hometype=house
entry
-- Returns:
- the parameter (without decoding applied)
-
encodeArray
Encode an array of objects.- Parameters:
array
- the array of objects- Returns:
- the encoded array as a string
-
decodeArray
Decode an encoded array of objects.- Parameters:
s
- the encoded array- Returns:
- the array of decoded strings
-
encodeList
Encode a list of objects.- Parameters:
list
- the list of objects- Returns:
- the encoded list as a string
-
decodeList
Decode an encoded list of objects.- Parameters:
s
- optional encoded list- Returns:
- the list of decoded strings
-
encodeMap
Encode a dictionary. The encoding scheme will produce strings like:
encodedKey1=encodedValue1&encodedKey2=encodedValue2&...
- Parameters:
map
- the map of key/values- Returns:
- the encoded map as a string
-
decodeMap
Decode an encoded map.- Parameters:
s
- optional encoded map- Returns:
- the decoded map
-
encodeUTF8
Encode a string using a UTF-8 encoder. If the encoder is not available, the string is encoded using the system's default encoder. This should never happen.- Parameters:
s
- mandatory string- Returns:
- the encoded byte buffer
-
decodeUTF8
Decode a byte buffer using a UTF-8 decoder. If the decoder is not available, the byte buffer is decoded using the system's default decoder.- Parameters:
bytes
- byte bufferoffset
- start offsetlength
- count of bytes to be decoded- Returns:
- the decoded string
-
decodeUTF8
Decode a byte buffer using a UTF-8 decoder. If the decoder is not available, the byte buffer is decoded using the system's default decoder.- Parameters:
bytes
- mandatory byte buffer- Returns:
- the decoded string
-
encodeASCII
Encode a string using an ASCII encoder. If the encoder is not available, the string is encoded using the system's default encoder. This should never happen.- Parameters:
s
- mandatory string- Returns:
- the encoded byte buffer
-
decodeASCII
Decode a byte buffer using an ASCII decoder. If the decoder is not available, the byte buffer is decoded using the system's default decoder.- Parameters:
bytes
- byte bufferoffset
- start offsetlength
- count of bytes to be decoded- Returns:
- the decoded string
-
decodeASCII
Decode a byte buffer using an ASCII decoder. If the decoder is not available, the byte buffer is decoded using the system's default decoder.- Parameters:
bytes
- mandatory byte buffer- Returns:
- the decoded string
-
encodeLocal
Encode a string using the local platform's default charset. This method is potentially dangerous.- Parameters:
s
- mandatory string- Returns:
- the encoded byte buffer
-
decodeLocal
Decode a byte buffer using the local platform's default charset. This method is potentially dangerous.- Parameters:
bytes
- byte bufferoffset
- start offsetlength
- count of bytes to be decoded- Returns:
- the decoded string
-
decodeLocal
Decode a byte buffer using the local platform's default charset. This method is potentially dangerous.- Parameters:
bytes
- mandatory byte buffer- Returns:
- the decoded string
-
encodeBinary
Generate a byte array consisting of the low-bytes of the input string characters.- Parameters:
s
-- Returns:
-
getComparator
Get a case-sensitive string comparator that treats hexadecimal sequences as numbers, and orders them accordingly, instead of as simple strings. Refer to NumberComparator and AlphanumCharComparator for details.
- Returns:
- the comparator
-
getComparator
Get a string comparator that can treat hexadecimal sequences as numbers (and order them accordingly) instead of as simple strings. Refer to NumberComparator and AlphanumCharComparator for details.
- Parameters:
caseSensitive
-scanHexadecimal
-- Returns:
- the comparator
-
makeNewLine
Append a new-line character to the provided buffer unless the buffer is empty or the last character in the buffer is a new-line.- Parameters:
sb
- a string builder
-
randomUniqueId
Generate a 32-character long random unique identifier. The UID returned consists of the digits 0 to 9 and letters a to f (lower-case).
- Returns:
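An identifier with the same shape can be produced from java.util.UUID by removing the hyphens. This is only a sketch of the output format; how the library actually generates its identifiers is not documented here, and RandomIdSketch is a hypothetical name.

```java
import java.util.UUID;

final class RandomIdSketch {
    // Assumed stand-in: a random UUID printed as 32 lower-case hexadecimal characters.
    static String randomUniqueId() {
        return UUID.randomUUID().toString().replace("-", "");
    }

    public static void main(String[] args) {
        String id = randomUniqueId();
        System.out.println(id.length());                  // 32
        System.out.println(id.matches("[0-9a-f]{32}"));   // true
    }
}
```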
-
pad
Repeat character c count times and build a CharSequence from it. For example, pad('0', 4) will return "0000".
- Parameters:
c
- inner charactercount
- times to repeat character.- Returns:
CharSequence
-
capitalizeFirst
Capitalize the first character of a string.- Parameters:
s
-- Returns:
-
camelCaseToString
public static String camelCaseToString(String s, boolean breakOnDigits, boolean keepUppercaseAcronyms) throws ParseException
Convert a camel-case string to a sentence. Example:
ThisIsACamelCaseString -> This is a camel case string
ThisIsACamel44CaseString -> This is a camel44 case string
CountryUSA -> Country u s a
with breakOnDigits=true:
ThisIsACamel44CaseString -> This is a camel 44 case string
with keepUppercaseAcronyms=true:
CountryUSA -> Country USA
A legal camel-case string always starts with an upper-case letter, and does not contain whitespace characters.
- Parameters:
s
- the input camel-case string
breakOnDigits
- if true, base-10 numbers will also be used as breaks
keepUppercaseAcronyms
- keep 2+ upper-case letter acronyms intact, eg: CountryUSA would be converted to Country USA instead of Country u s a
- Returns:
- the result sentence
- Throws:
ParseException
- if the input string was not camel-case formatted
-
camelCaseToString
Convert a camel-case string to a sentence. Example:
ThisIsACamelCaseString -> This is a camel case string
A legal camel-case string always starts with an upper-case letter, and does not contain whitespace characters.
- Parameters:
s
- the input camel-case string- Returns:
- the result sentence
- Throws:
ParseException
- if the input string was not camel-case formatted
-
hasRtl
Determine if a string contains right-to-left (RTL) characters, e.g. Arabic or Hebrew characters.
- Parameters:
s
-- Returns:
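A plausible way to perform such a check with the JDK alone is to inspect the Unicode directionality of each code point. The sketch below assumes that "RTL character" means a code point whose bidirectional class is R or AL; the library may use a different criterion, and `RtlSketch` is a hypothetical name.

```java
final class RtlSketch {
    /** Returns true if any code point has a right-to-left bidi class (R or AL). */
    static boolean hasRtl(String s) {
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            byte dir = Character.getDirectionality(cp);
            if (dir == Character.DIRECTIONALITY_RIGHT_TO_LEFT
                    || dir == Character.DIRECTIONALITY_RIGHT_TO_LEFT_ARABIC) {
                return true;
            }
            i += Character.charCount(cp);
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(hasRtl("hello"));                    // prints false
        System.out.println(hasRtl("\u05E9\u05DC\u05D5\u05DD")); // Hebrew text, prints true
    }
}
```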
-
parseCommandline
Parse a string as a command line. Source: ant.jar.
- Parameters:
s
- the command line to process
- Returns:
- the command line broken into strings
-
isWellFormedUTF8
public static boolean isWellFormedUTF8(byte[] bytes) - Parameters:
bytes
-- Returns:
-
isWellFormedUTF8
public static boolean isWellFormedUTF8(byte[] bytes, int off, int len) - Parameters:
bytes
-off
-len
-- Returns:
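The two isWellFormedUTF8 overloads carry no description here. Judging by the name alone, a well-formedness check could look like the sketch below, which runs a strict JDK decoder over the requested range and reports whether it failed. The class and method names are hypothetical, and the library may apply different rules.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class Utf8CheckSketch {
    /** Returns true if bytes[off, off+len) decodes as UTF-8 without errors. */
    static boolean isWellFormedUtf8(byte[] bytes, int off, int len) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes, off, len));
            return true;
        } catch (CharacterCodingException e) {
            return false; // malformed or truncated sequence
        }
    }

    static boolean isWellFormedUtf8(byte[] bytes) {
        return isWellFormedUtf8(bytes, 0, bytes.length);
    }
}
```

Unlike `new String(bytes, StandardCharsets.UTF_8)`, which silently substitutes U+FFFD for bad input, a decoder configured with `CodingErrorAction.REPORT` throws on the first malformed sequence.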
-
isPrintableUTF8Header
public static boolean isPrintableUTF8Header(byte[] headerBytes)
Validate whether some starting bytes may be considered a UTF-8 printable character header.
- Parameters:
headerBytes
- starting bytes. May be truncated without ill effect (though the result will be more accurate with more bytes).
- Returns:
- true if the bytes appear to represent UTF-8.
-
isPrintableCharsetHeader
Validate whether some starting bytes may be encoded with a particular charset.
- Parameters:
headerBytes
- starting bytes. May be truncated without ill effect (though the result will be more accurate with more bytes).
charset
- the charset to detect. Use isPrintableUTF8Header(byte[]) for UTF-8.
- Returns:
- true if the bytes appear to represent the provided charset.
-
decodeUTF8Ex
- Parameters:
bytes
-useStandardDecoderFirst
-- Returns:
-
decodeUTF8Ex
- Parameters:
bytes
-off
-len
-useStandardDecoderFirst
-- Returns:
-
getBOMSize
public static int getBOMSize(byte[] input)
Retrieve the size taken by the BOM or equivalent encoding mark. Detects UTF-8, UTF-16 and UTF-32.
- Parameters:
input
- byte array. Provide at least 4 bytes so that all supported marks can be detected.
- Returns:
- the size taken by BOM or 0 if no BOM was detected
-
readBOM
Retrieve the charset from the starting bytes. Detects UTF-8, UTF-16LE/BE and UTF-32LE/BE.
- Parameters:
input
- first bytes of a string
- Returns:
- the detected charset or null if no BOM was detected.
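The BOM handling described for getBOMSize and readBOM can be sketched with the standard Unicode byte-order marks. The code below is an illustration under assumptions: `BomSketch` and its methods are hypothetical, it returns a charset name rather than a Charset object, and it covers only the plain BOMs, not any "equivalent encoding mark" the library may also recognize.

```java
final class BomSketch {
    /** Returns the charset name implied by a leading BOM, or null if none is found. */
    static String bomCharsetName(byte[] in) {
        // The UTF-32LE mark starts with the UTF-16LE mark, so test 4-byte marks first.
        if (startsWith(in, 0x00, 0x00, 0xFE, 0xFF)) return "UTF-32BE";
        if (startsWith(in, 0xFF, 0xFE, 0x00, 0x00)) return "UTF-32LE";
        if (startsWith(in, 0xEF, 0xBB, 0xBF)) return "UTF-8";
        if (startsWith(in, 0xFE, 0xFF)) return "UTF-16BE";
        if (startsWith(in, 0xFF, 0xFE)) return "UTF-16LE";
        return null;
    }

    /** Returns the BOM length in bytes, or 0 if no BOM is present. */
    static int bomSize(byte[] in) {
        String name = bomCharsetName(in);
        if (name == null) return 0;
        if (name.startsWith("UTF-32")) return 4;
        return name.equals("UTF-8") ? 3 : 2;
    }

    private static boolean startsWith(byte[] in, int... prefix) {
        if (in.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((in[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }
}
```

The ordering of the checks is the reason at least 4 input bytes are needed: with only 2 bytes, `FF FE` cannot be told apart from the start of a UTF-32LE mark.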
-
getInitialBlankSize
public static int getInitialBlankSize(InputStream stream, boolean includeBOM, char... extraWhitespaceCharacters) throws IOException
Retrieve the number of initial blank (non-data) bytes at the beginning of a stream.
- Parameters:
stream
- input stream to analyze
includeBOM
- if true, a BOM at the start of the stream is counted as initial blank bytes
- Returns:
- the number of bytes considered as blank
- Throws:
IOException
-
count
Count the number of occurrences of a character within a string.
- Parameters:
str
- haystack
ch
- needle
- Returns:
- the number of occurrences
-
count
Count the number of occurrences of a sub-string within a string.
Note: when overlaps are counted, a search for 'aaa' inside 'aaaaaa' returns 4, not 2!
- Parameters:
str
- haystack
sub
- needle
countOverlaps
- if true, a search for 'aaa' inside 'aaaaaa' will return 4 instead of 2
- Returns:
- the number of occurrences
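The difference between overlapping and non-overlapping counting comes down to how far the search position advances after each hit. A minimal standalone sketch follows; `CountSketch` is a hypothetical name, and returning 0 for an empty needle is an assumption.

```java
final class CountSketch {
    static int count(String str, String sub, boolean countOverlaps) {
        if (sub.isEmpty()) {
            return 0; // assumption: an empty needle never matches
        }
        int n = 0;
        int from = 0;
        while ((from = str.indexOf(sub, from)) >= 0) {
            n++;
            // Advance by one char to allow overlaps, or skip the whole match otherwise.
            from += countOverlaps ? 1 : sub.length();
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(count("aaaaaa", "aaa", true));  // prints 4
        System.out.println(count("aaaaaa", "aaa", false)); // prints 2
    }
}
```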
-
like
Check whether an input string matches a provided regex pattern. This method is case-sensitive.
- Parameters:
str
- a string
pat
- a regular expression
- Returns:
-
likei
Check whether an input string matches a provided regex pattern. This method is case-insensitive.
- Parameters:
str
- a string
pat
- a regular expression
- Returns:
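The case-sensitive and case-insensitive variants can be mimicked with java.util.regex as sketched below. Whether the library requires the whole input to match or accepts a partial match is not stated here; this sketch assumes a whole-string match, and `LikeSketch` is a hypothetical name.

```java
import java.util.regex.Pattern;

final class LikeSketch {
    /** Case-sensitive whole-string regex match. */
    static boolean like(String str, String pat) {
        return Pattern.compile(pat).matcher(str).matches();
    }

    /** Case-insensitive whole-string regex match. */
    static boolean likei(String str, String pat) {
        return Pattern.compile(pat, Pattern.CASE_INSENSITIVE).matcher(str).matches();
    }

    public static void main(String[] args) {
        System.out.println(like("Strings", "str.*"));  // prints false
        System.out.println(likei("Strings", "str.*")); // prints true
    }
}
```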
-
starMatches
Check whether an input string matches a provided pattern using a StarMatcher.
- Parameters:
str
- a string
pat
- a wildcard pattern
- Returns:
-
findWordBoundaries
Find a word in the string.
- Parameters:
str
- a string
offset
- offset in the string, for which the underlying word should be found
- Returns:
- a tuple (start, end) in the string, specifying the word boundaries; if nothing is found, the tuple returned will be (provided_offset, provided_offset)
-
findWordBoundaries
Find a word in the string.
- Parameters:
str
- a string
offset
- offset in the string, for which the underlying word should be found
optional
- custom boundaryTester; leave null to use the default boundary tester (in that case, characters considered as boundaries are: white-space characters, punctuation characters except dash and underscore)
- Returns:
- a tuple (start, end) in the string, specifying the word boundaries; if nothing is found, the tuple returned will be (provided_offset, provided_offset)
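The word lookup described above can be sketched as a scan to the left and right of the offset until a boundary character is met. Everything in this sketch is an assumption made for illustration: the class `WordBoundarySketch`, the use of `IntPredicate` for the tester, the `int[]` pair standing in for the tuple, the exclusive end index, and the default tester (anything that is not a letter, digit, dash or underscore counts as a boundary).

```java
import java.util.function.IntPredicate;

final class WordBoundarySketch {
    // Assumed default: whitespace, punctuation and symbols are boundaries,
    // except dash and underscore.
    static final IntPredicate DEFAULT_BOUNDARY =
            c -> !(Character.isLetterOrDigit(c) || c == '-' || c == '_');

    /** Returns {start, end} with end exclusive, or {offset, offset} if no word is found. */
    static int[] findWordBoundaries(String str, int offset, IntPredicate boundary) {
        IntPredicate isBoundary = boundary != null ? boundary : DEFAULT_BOUNDARY;
        if (offset < 0 || offset >= str.length() || isBoundary.test(str.charAt(offset))) {
            return new int[] {offset, offset};
        }
        int start = offset;
        while (start > 0 && !isBoundary.test(str.charAt(start - 1))) {
            start--;
        }
        int end = offset + 1;
        while (end < str.length() && !isBoundary.test(str.charAt(end))) {
            end++;
        }
        return new int[] {start, end};
    }

    public static void main(String[] args) {
        String line = "call foo_bar(x)";
        int[] b = findWordBoundaries(line, 7, null);
        System.out.println(line.substring(b[0], b[1])); // prints foo_bar
    }
}
```

A typical use is resolving the identifier under a caret position in a text view.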
-