public class CsvParserSettings extends CommonParserSettings<CsvFormat>
This is the configuration class used by the CSV parser (CsvParser).
In addition to the configuration options provided by CommonParserSettings, the CsvParserSettings include:
emptyValue (defaults to null): when reading, if the parser does not read any character from the input, and the input is within quotes, the emptyValue is used instead of an empty string.
See Also: CsvParser, CsvFormat, CommonParserSettings
Fields inherited from class CommonParserSettings: headerExtractionEnabled
Constructor and Description |
---|
CsvParserSettings() |
Modifier and Type | Method and Description |
---|---|
protected void | addConfiguration(Map<String,Object> out) |
CsvParserSettings | clone(): Clones this configuration object. |
CsvParserSettings | clone(boolean clearInputSpecificSettings): Clones this configuration object to reuse user-provided settings. |
protected CsvFormat | createDefaultFormat(): Returns the default CsvFormat configured to handle CSV inputs compliant with the RFC4180 standard. |
void | detectFormatAutomatically(): Convenience method to turn on all format detection features in a single method call, namely setDelimiterDetectionEnabled(boolean, char[]), setQuoteDetectionEnabled(boolean) and CommonParserSettings.setLineSeparatorDetectionEnabled(boolean). |
void | detectFormatAutomatically(char... delimitersForDetection): Convenience method to turn on all format detection features in a single method call, namely setDelimiterDetectionEnabled(boolean, char[]), setQuoteDetectionEnabled(boolean) and CommonParserSettings.setLineSeparatorDetectionEnabled(boolean). |
char[] | getDelimitersForDetection(): Returns the sequence of possible delimiters for detection when isDelimiterDetectionEnabled() evaluates to true, in order of priority. |
String | getEmptyValue(): Returns the String representation of an empty value (defaults to null). |
int | getFormatDetectorRowSampleCount(): Returns the number of sample rows used in the CSV format auto-detection process (defaults to 20). |
boolean | getIgnoreLeadingWhitespacesInQuotes(): Returns whether or not leading whitespaces from quoted values should be skipped (defaults to false). Note: if keepQuotes evaluates to true, values won't be trimmed. |
boolean | getIgnoreTrailingWhitespacesInQuotes(): Returns whether or not trailing whitespaces from within quoted values should be skipped (defaults to false). Note: if keepQuotes evaluates to true, values won't be trimmed. |
boolean | getKeepQuotes(): Flag indicating whether the parser should keep enclosing quote characters in the values parsed from the input. |
UnescapedQuoteHandling | getUnescapedQuoteHandling(): Returns the method of handling values with unescaped quotes. |
boolean | isDelimiterDetectionEnabled(): Returns a flag indicating whether the parser should analyze the input to discover the column delimiter character. |
boolean | isEscapeUnquotedValues(): Indicates whether escape sequences should be processed in unquoted values. |
boolean | isKeepEscapeSequences(): Indicates whether the parser should keep any escape sequences if they are present in the input (i.e. a quote escape sequence such as "" won't be replaced by a single double quote "). |
boolean | isNormalizeLineEndingsWithinQuotes(): Flag indicating whether the parser should replace line separators, specified in Format.getLineSeparator(), by the normalized line separator character specified in Format.getNormalizedNewline(), even on quoted values. |
boolean | isParseUnescapedQuotes(): Deprecated. Use getUnescapedQuoteHandling() instead. The configuration returned by getUnescapedQuoteHandling() will override this setting if not null. |
boolean | isParseUnescapedQuotesUntilDelimiter(): Deprecated. Use getUnescapedQuoteHandling() instead. The configuration returned by getUnescapedQuoteHandling() will override this setting if not null. |
boolean | isQuoteDetectionEnabled(): Returns a flag indicating whether the parser should analyze the input to discover the quote character. |
protected CharAppender | newCharAppender(): Returns an instance of CharAppender with the configured limit of maximum characters per column and the default value used to represent an empty value (when the String parsed from the input, within quotes, is empty). |
void | setDelimiterDetectionEnabled(boolean separatorDetectionEnabled): Configures the parser to analyze the input before parsing to discover the column delimiter character. |
void | setDelimiterDetectionEnabled(boolean separatorDetectionEnabled, char... delimitersForDetection): Configures the parser to analyze the input before parsing to discover the column delimiter character. |
void | setEmptyValue(String emptyValue): Sets the String representation of an empty value (defaults to null). |
void | setEscapeUnquotedValues(boolean escapeUnquotedValues): Configures the parser to process escape sequences in unquoted values. |
void | setFormatDetectorRowSampleCount(int formatDetectorRowSampleCount): Updates the number of sample rows used in the CSV format auto-detection process (defaults to 20). |
void | setIgnoreLeadingWhitespacesInQuotes(boolean ignoreLeadingWhitespacesInQuotes): Defines whether or not leading whitespaces from quoted values should be skipped (defaults to false). Note: if keepQuotes evaluates to true, values won't be trimmed. |
void | setIgnoreTrailingWhitespacesInQuotes(boolean ignoreTrailingWhitespacesInQuotes): Defines whether or not trailing whitespaces from quoted values should be skipped (defaults to false). Note: if keepQuotes evaluates to true, values won't be trimmed. |
void | setKeepEscapeSequences(boolean keepEscapeSequences): Configures the parser to keep any escape sequences if they are present in the input (i.e. a quote escape sequence such as "" won't be replaced by a single double quote "). |
void | setKeepQuotes(boolean keepQuotes): Configures the parser to keep enclosing quote characters in the values parsed from the input. |
void | setNormalizeLineEndingsWithinQuotes(boolean normalizeLineEndingsWithinQuotes): Configures the parser to replace line separators, specified in Format.getLineSeparator(), by the normalized line separator character specified in Format.getNormalizedNewline(), even on quoted values. |
void | setParseUnescapedQuotes(boolean parseUnescapedQuotes): Deprecated. Use setUnescapedQuoteHandling(UnescapedQuoteHandling) instead. The configuration returned by getUnescapedQuoteHandling() will override this setting if not null. |
void | setParseUnescapedQuotesUntilDelimiter(boolean parseUnescapedQuotesUntilDelimiter): Deprecated. Use setUnescapedQuoteHandling(UnescapedQuoteHandling) instead. The configuration returned by getUnescapedQuoteHandling() will override this setting if not null. |
void | setQuoteDetectionEnabled(boolean quoteDetectionEnabled): Configures the parser to analyze the input before parsing to discover the quote character. |
void | setUnescapedQuoteHandling(UnescapedQuoteHandling unescapedQuoteHandling): Configures the handling of values with unescaped quotes. |
void | trimQuotedValues(boolean trim): Configures the parser to trim any whitespaces around values extracted from within quotes. |
Methods inherited from class CommonParserSettings: addInputAnalysisProcess, clearInputSpecificSettings, configureFromAnnotations, getInputAnalysisProcesses, getInputBufferSize, getNumberOfRecordsToRead, getNumberOfRowsToSkip, getProcessor, getReadInputOnSeparateThread, getRowProcessor, isAutoClosingEnabled, isColumnReorderingEnabled, isCommentCollectionEnabled, isCommentProcessingEnabled, isHeaderExtractionEnabled, isLineSeparatorDetectionEnabled, newCharInputReader, setAutoClosingEnabled, setColumnReorderingEnabled, setCommentCollectionEnabled, setCommentProcessingEnabled, setHeaderExtractionEnabled, setInputBufferSize, setLineSeparatorDetectionEnabled, setNumberOfRecordsToRead, setNumberOfRowsToSkip, setProcessor, setReadInputOnSeparateThread, setRowProcessor
Methods inherited from class CommonSettings: excludeFields, excludeFields, excludeIndexes, getErrorContentLength, getFormat, getHeaders, getIgnoreLeadingWhitespaces, getIgnoreTrailingWhitespaces, getMaxCharsPerColumn, getMaxColumns, getNullValue, getProcessorErrorHandler, getRowProcessorErrorHandler, getSkipBitsAsWhitespace, getSkipEmptyLines, getWhitespaceRangeStart, isAutoConfigurationEnabled, isProcessorErrorHandlerDefined, selectFields, selectFields, selectIndexes, setAutoConfigurationEnabled, setErrorContentLength, setFormat, setHeaders, setIgnoreLeadingWhitespaces, setIgnoreTrailingWhitespaces, setMaxCharsPerColumn, setMaxColumns, setNullValue, setProcessorErrorHandler, setRowProcessorErrorHandler, setSkipBitsAsWhitespace, setSkipEmptyLines, toString, trimValues
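
public String getEmptyValue()
Returns the String representation of an empty value (defaults to null).
When reading, if the parser does not read any character from the input, and the input is within quotes, the emptyValue is used instead of an empty string.

public void setEmptyValue(String emptyValue)
Sets the String representation of an empty value (defaults to null).
When reading, if the parser does not read any character from the input, and the input is within quotes, the emptyValue is used instead of an empty string.
Parameters: emptyValue - the String representation of an empty value

protected CharAppender newCharAppender()
Returns an instance of CharAppender with the configured limit of maximum characters per column and the default value used to represent an empty value (when the String parsed from the input, within quotes, is empty).
This overrides the parent's version because the CSV parser does not rely on the appender to identify null values. The appender is, however, required to identify empty values.
Overrides: newCharAppender in class CommonParserSettings<CsvFormat>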
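
The emptyValue substitution can be illustrated with a standalone sketch. The class below does not call univocity-parsers: EmptyValueSketch and parseLine are hypothetical names, and mapping an unquoted field with no content to a separate nullValue is an assumption based on the getNullValue/setNullValue settings inherited from CommonSettings. It splits one comma-delimited line and shows that only a quoted field with no content is replaced by emptyValue.

```java
import java.util.ArrayList;
import java.util.List;

class EmptyValueSketch {

    // Splits one comma-delimited line. A quoted field with no content yields emptyValue;
    // an unquoted field with no content yields nullValue (assumed behavior, see lead-in).
    static List<String> parseLine(String line, String nullValue, String emptyValue) {
        List<String> out = new ArrayList<>();
        int n = line.length();
        int i = 0;
        while (i <= n) {
            StringBuilder value = new StringBuilder();
            boolean quoted = false;
            if (i < n && line.charAt(i) == '"') {
                quoted = true;
                i++; // skip the opening quote
                while (i < n) {
                    char c = line.charAt(i);
                    if (c == '"') {
                        if (i + 1 < n && line.charAt(i + 1) == '"') {
                            value.append('"'); // "" inside quotes is an escaped quote
                            i += 2;
                            continue;
                        }
                        i++; // skip the closing quote
                        break;
                    }
                    value.append(c);
                    i++;
                }
                // Simplification: ignore anything between the closing quote and the delimiter.
                while (i < n && line.charAt(i) != ',') {
                    i++;
                }
            } else {
                while (i < n && line.charAt(i) != ',') {
                    value.append(line.charAt(i));
                    i++;
                }
            }
            if (value.length() == 0) {
                out.add(quoted ? emptyValue : nullValue);
            } else {
                out.add(value.toString());
            }
            i++; // skip the delimiter, or step past the end of the line
        }
        return out;
    }

    public static void main(String[] args) {
        // Second field is quoted and empty, third field is unquoted and empty.
        System.out.println(parseLine("a,\"\",,b", null, "<EMPTY>"));
        // prints [a, <EMPTY>, null, b]
    }
}
```

With the default emptyValue of null, the second and third fields would be indistinguishable, which is why a distinct replacement can be useful when the difference between a quoted empty string and a missing value matters.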
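
protected CsvFormat createDefaultFormat()
Returns the default CsvFormat configured to handle CSV inputs compliant with the RFC4180 standard.
Specified by: createDefaultFormat in class CommonSettings<CsvFormat>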

@Deprecated public boolean isParseUnescapedQuotes()
Deprecated. Use getUnescapedQuoteHandling() instead. The configuration returned by getUnescapedQuoteHandling() will override this setting if not null.
Indicates whether the CSV parser should accept unescaped quotes inside quoted values. Defaults to true.

@Deprecated public void setParseUnescapedQuotes(boolean parseUnescapedQuotes)
Deprecated. Use setUnescapedQuoteHandling(UnescapedQuoteHandling) instead. The configuration returned by getUnescapedQuoteHandling() will override this setting if not null.
Configures whether the CSV parser should accept unescaped quotes inside quoted values. If set to true, the parser will parse the quote normally as part of the value. If set to false, a TextParsingException will be thrown. Defaults to true.
Parameters: parseUnescapedQuotes - indicates whether or not the CSV parser should accept unescaped quotes inside quoted values.

@Deprecated public void setParseUnescapedQuotesUntilDelimiter(boolean parseUnescapedQuotesUntilDelimiter)
Deprecated. Use setUnescapedQuoteHandling(UnescapedQuoteHandling) instead. The configuration returned by getUnescapedQuoteHandling() will override this setting if not null.
Configures the parser to stop accumulating values when a field delimiter character is found while parsing unquoted and unescaped values (defaults to true).
Parameters: parseUnescapedQuotesUntilDelimiter - a flag indicating that the parser should stop accumulating values when a field delimiter character is found when parsing unquoted and unescaped values.

@Deprecated public boolean isParseUnescapedQuotesUntilDelimiter()
Deprecated. Use getUnescapedQuoteHandling() instead. The configuration returned by getUnescapedQuoteHandling() will override this setting if not null.
Indicates whether the parser should stop accumulating values when a field delimiter character is found while parsing unquoted and unescaped values (defaults to true).

public boolean isEscapeUnquotedValues()
Indicates whether escape sequences should be processed in unquoted values. Defaults to false.
By default, this is disabled and if the input is A""B,C, the resulting values will be [A""B] and [C] (i.e. the content is read as-is). However, if the parser is configured to process escape sequences in unquoted values, the result will be [A"B] and [C].
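
The A""B,C example can be reproduced with a standalone sketch. It does not use univocity-parsers: EscapeUnquotedSketch and parseUnquoted are hypothetical names, the line is assumed to contain only unquoted values, and both the quote and the quote escape are assumed to be the double quote character.

```java
import java.util.ArrayList;
import java.util.List;

class EscapeUnquotedSketch {

    // Splits a line of unquoted values on commas. When escapeUnquotedValues is true,
    // each "" escape sequence is collapsed into a single double quote.
    static List<String> parseUnquoted(String line, boolean escapeUnquotedValues) {
        List<String> values = new ArrayList<>();
        for (String raw : line.split(",", -1)) {
            values.add(escapeUnquotedValues ? raw.replace("\"\"", "\"") : raw);
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(parseUnquoted("A\"\"B,C", false)); // prints [A""B, C]
        System.out.println(parseUnquoted("A\"\"B,C", true));  // prints [A"B, C]
    }
}
```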

public void setEscapeUnquotedValues(boolean escapeUnquotedValues)
Configures the parser to process escape sequences in unquoted values. Defaults to false.
By default, this is disabled and if the input is A""B,C, the resulting values will be [A""B] and [C] (i.e. the content is read as-is). However, if the parser is configured to process escape sequences in unquoted values, the result will be [A"B] and [C].
Parameters: escapeUnquotedValues - a flag indicating whether escape sequences should be processed in unquoted values

public final boolean isKeepEscapeSequences()
Indicates whether the parser should keep any escape sequences if they are present in the input (i.e. a quote escape sequence such as "" won't be replaced by a single double quote ").
This is disabled by default.
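
public final void setKeepEscapeSequences(boolean keepEscapeSequences)
Configures the parser to keep any escape sequences if they are present in the input (i.e. a quote escape sequence such as "" won't be replaced by a single double quote ").
This is disabled by default.
Parameters: keepEscapeSequences - the flag indicating whether escape sequences should be kept (and not replaced) by the parser.

public final boolean isDelimiterDetectionEnabled()
Returns a flag indicating whether the parser should analyze the input to discover the column delimiter character.
Note that the detection process is not guaranteed to discover the correct column delimiter. In this case the delimiter provided by CsvFormat.getDelimiter() will be used.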
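
The fallback applied when detection fails can be sketched in standalone Java. This is not the library's detection algorithm: the consistency heuristic, the class name DelimiterFallbackSketch and the method chooseDelimiter are all hypothetical. Only the fallback order (the first candidate delimiter if any were supplied, otherwise the format's delimiter) follows the note on setDelimiterDetectionEnabled.

```java
import java.util.List;

class DelimiterFallbackSketch {

    // Hypothetical heuristic: a candidate is accepted when it occurs the same non-zero
    // number of times in every sample row. Candidates are tried in priority order.
    static char chooseDelimiter(List<String> sampleRows, char formatDelimiter, char... candidates) {
        for (char candidate : candidates) {
            int expected = -1;
            boolean consistent = !sampleRows.isEmpty();
            for (String row : sampleRows) {
                int count = 0;
                for (int i = 0; i < row.length(); i++) {
                    if (row.charAt(i) == candidate) {
                        count++;
                    }
                }
                if (expected == -1) {
                    expected = count;
                }
                if (count == 0 || count != expected) {
                    consistent = false;
                    break;
                }
            }
            if (consistent) {
                return candidate;
            }
        }
        // Detection failed: use the first candidate if one was given, else the format's delimiter.
        return candidates.length > 0 ? candidates[0] : formatDelimiter;
    }

    public static void main(String[] args) {
        System.out.println(chooseDelimiter(List.of("a;b;c", "1;2;3"), ',', ',', ';')); // prints ;
        System.out.println(chooseDelimiter(List.of("abc", "123"), ',', '|', ';'));     // prints |
        System.out.println(chooseDelimiter(List.of("abc", "123"), ','));               // prints ,
    }
}
```

The sketch ignores quoting entirely, so a delimiter character inside a quoted value would be counted. It is meant only to make the fallback order concrete.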

public final void setDelimiterDetectionEnabled(boolean separatorDetectionEnabled)
Configures the parser to analyze the input before parsing to discover the column delimiter character.
Note that the detection process is not guaranteed to discover the correct column delimiter. The first character in the list of delimiters allowed for detection will be used, if available, otherwise the delimiter returned by CsvFormat.getDelimiter() will be used.
Parameters: separatorDetectionEnabled - the flag to enable/disable discovery of the column delimiter character.

public final void setDelimiterDetectionEnabled(boolean separatorDetectionEnabled, char... delimitersForDetection)
Configures the parser to analyze the input before parsing to discover the column delimiter character.
Note that the detection process is not guaranteed to discover the correct column delimiter. The first character in the list of delimiters allowed for detection will be used, if available, otherwise the delimiter returned by CsvFormat.getDelimiter() will be used.
Parameters:
separatorDetectionEnabled - the flag to enable/disable discovery of the column delimiter character.
delimitersForDetection - possible delimiters for detection when isDelimiterDetectionEnabled() evaluates to true, in order of priority.

public final boolean isQuoteDetectionEnabled()
Returns a flag indicating whether the parser should analyze the input to discover the quote character.
Note that the detection process is not guaranteed to discover the correct quote and quote escape characters. In this case the characters provided by CsvFormat.getQuote() and CsvFormat.getQuoteEscape() will be used.
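
public final void setQuoteDetectionEnabled(boolean quoteDetectionEnabled)
Configures the parser to analyze the input before parsing to discover the quote character.
Note that the detection process is not guaranteed to discover the correct quote and quote escape characters. In this case the characters provided by CsvFormat.getQuote() and CsvFormat.getQuoteEscape() will be used.
Parameters: quoteDetectionEnabled - the flag to enable/disable discovery of the quote character. The quote escape will also be detected as part of this process.

public final void detectFormatAutomatically()
Convenience method to turn on all format detection features in a single method call, namely setDelimiterDetectionEnabled(boolean, char[]), setQuoteDetectionEnabled(boolean) and CommonParserSettings.setLineSeparatorDetectionEnabled(boolean).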

public final void detectFormatAutomatically(char... delimitersForDetection)
Convenience method to turn on all format detection features in a single method call, namely setDelimiterDetectionEnabled(boolean, char[]), setQuoteDetectionEnabled(boolean) and CommonParserSettings.setLineSeparatorDetectionEnabled(boolean).
Parameters: delimitersForDetection - possible delimiters for detection, in order of priority.

public boolean isNormalizeLineEndingsWithinQuotes()
Flag indicating whether the parser should replace line separators, specified in Format.getLineSeparator(), by the normalized line separator character specified in Format.getNormalizedNewline(), even on quoted values.
This is enabled by default and is used to ensure data can be read on any platform without introducing unwanted blank lines.
For example, consider the quoted value "Line1 \r\n Line2". If this is parsed using "\r\n" as the line separator sequence, and the normalized new line is set to '\n' (the default), the output will be:
[Line1 \n Line2]
However, if the value is meant to be kept untouched, and the original line separator should be maintained, set normalizeLineEndingsWithinQuotes to false. This will make the parser read the value as-is, producing:
[Line1 \r\n Line2]
Returns: true if line separators in quoted values will be normalized, false otherwise

public void setNormalizeLineEndingsWithinQuotes(boolean normalizeLineEndingsWithinQuotes)
Configures the parser to replace line separators, specified in Format.getLineSeparator(), by the normalized line separator character specified in Format.getNormalizedNewline(), even on quoted values.
This is enabled by default and is used to ensure data can be read on any platform without introducing unwanted blank lines.
For example, consider the quoted value "Line1 \r\n Line2". If this is parsed using "\r\n" as the line separator sequence, and the normalized new line is set to '\n' (the default), the output will be:
[Line1 \n Line2]
However, if the value is meant to be kept untouched, and the original line separator should be maintained, set normalizeLineEndingsWithinQuotes to false. This will make the parser read the value as-is, producing:
[Line1 \r\n Line2]
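
The two outcomes can be mimicked with a few lines of standalone Java. This sketch only models the documented result and does not call the parser: LineEndingSketch and readQuotedValue are hypothetical names, and the line separator and normalized newline are passed in directly instead of being read from a Format instance.

```java
class LineEndingSketch {

    // Returns the content of a quoted value as the parser would report it:
    // with line separators replaced by the normalized newline, or untouched.
    static String readQuotedValue(String quotedContent,
                                  String lineSeparator,
                                  char normalizedNewline,
                                  boolean normalizeLineEndingsWithinQuotes) {
        if (!normalizeLineEndingsWithinQuotes) {
            return quotedContent; // value is kept as-is
        }
        return quotedContent.replace(lineSeparator, String.valueOf(normalizedNewline));
    }

    public static void main(String[] args) {
        String content = "Line1 \r\n Line2";
        String normalized = readQuotedValue(content, "\r\n", '\n', true);
        String untouched = readQuotedValue(content, "\r\n", '\n', false);
        System.out.println(normalized.equals("Line1 \n Line2"));   // prints true
        System.out.println(untouched.equals("Line1 \r\n Line2"));  // prints true
    }
}
```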
Parameters: normalizeLineEndingsWithinQuotes - flag indicating whether line separators in quoted values should be replaced by the character specified in Format.getNormalizedNewline().

public void setUnescapedQuoteHandling(UnescapedQuoteHandling unescapedQuoteHandling)
Configures the handling of values with unescaped quotes. Defaults to null, for backward compatibility with isParseUnescapedQuotes() and isParseUnescapedQuotesUntilDelimiter().
If set to a non-null value, this setting will override the configuration of isParseUnescapedQuotes() and isParseUnescapedQuotesUntilDelimiter().
Parameters: unescapedQuoteHandling - the handling method to be used when unescaped quotes are found in the input.

public UnescapedQuoteHandling getUnescapedQuoteHandling()
Returns the method of handling values with unescaped quotes. Defaults to null, for backward compatibility with isParseUnescapedQuotes() and isParseUnescapedQuotesUntilDelimiter().
If set to a non-null value, this setting will override the configuration of isParseUnescapedQuotes() and isParseUnescapedQuotesUntilDelimiter().
Returns: the configured handling method, or null if not set.

public boolean getKeepQuotes()
Flag indicating whether the parser should keep enclosing quote characters in the values parsed from the input. Defaults to false.

public void setKeepQuotes(boolean keepQuotes)
Configures the parser to keep enclosing quote characters in the values parsed from the input. Defaults to false.
Parameters: keepQuotes - flag indicating whether enclosing quotes should be maintained when parsing quoted values.

protected void addConfiguration(Map<String,Object> out)
Overrides: addConfiguration in class CommonParserSettings<CsvFormat>

public final CsvParserSettings clone()
Description copied from class: CommonSettings
Clones this configuration object. Use the CommonSettings.clone(boolean) method to reset properties that are specific to a given input, such as header names and selection of fields.
Overrides: clone in class CommonParserSettings<CsvFormat>
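
public final CsvParserSettings clone(boolean clearInputSpecificSettings)
Description copied from class: CommonSettings
Clones this configuration object to reuse user-provided settings. Properties that are specific to a given input, such as header names and selection of fields, are reset if the clearInputSpecificSettings flag is set to true.
Overrides: clone in class CommonParserSettings<CsvFormat>
Parameters: clearInputSpecificSettings - flag indicating whether to clear settings that are likely to be associated with a given input.

public final char[] getDelimitersForDetection()
Returns the sequence of possible delimiters for detection when isDelimiterDetectionEnabled() evaluates to true, in order of priority.

public boolean getIgnoreTrailingWhitespacesInQuotes()
Returns whether or not trailing whitespaces from within quoted values should be skipped (defaults to false).
Note: if keepQuotes evaluates to true, values won't be trimmed.

public void setIgnoreTrailingWhitespacesInQuotes(boolean ignoreTrailingWhitespacesInQuotes)
Defines whether or not trailing whitespaces from quoted values should be skipped (defaults to false).
Note: if keepQuotes evaluates to true, values won't be trimmed.
Parameters: ignoreTrailingWhitespacesInQuotes - whether trailing whitespaces from quoted values should be skipped

public boolean getIgnoreLeadingWhitespacesInQuotes()
Returns whether or not leading whitespaces from quoted values should be skipped (defaults to false).
Note: if keepQuotes evaluates to true, values won't be trimmed.

public void setIgnoreLeadingWhitespacesInQuotes(boolean ignoreLeadingWhitespacesInQuotes)
Defines whether or not leading whitespaces from quoted values should be skipped (defaults to false).
Note: if keepQuotes evaluates to true, values won't be trimmed.
Parameters: ignoreLeadingWhitespacesInQuotes - whether leading whitespaces from quoted values should be skipped

public final void trimQuotedValues(boolean trim)
Configures the parser to trim any whitespaces around values extracted from within quotes. Shorthand for setIgnoreLeadingWhitespacesInQuotes(boolean) and setIgnoreTrailingWhitespacesInQuotes(boolean).
Note: if keepQuotes evaluates to true, values won't be trimmed.
Parameters: trim - a flag indicating whether whitespaces around values extracted from a quoted field should be removed

public int getFormatDetectorRowSampleCount()
Returns the number of sample rows used in the CSV format auto-detection process (defaults to 20).

public void setFormatDetectorRowSampleCount(int formatDetectorRowSampleCount)
Updates the number of sample rows used in the CSV format auto-detection process (defaults to 20).
Parameters: formatDetectorRowSampleCount - the number of sample rows used in the CSV format auto-detection process

Copyright © 2024 Univocity Software Pty Ltd. All rights reserved.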