F - the format supported by this settings class.
public abstract class CommonSettings<F extends Format> extends Object implements Cloneable
This is the parent class for all configuration classes used by parsers (AbstractParser) and writers (AbstractWriter).
By default, all parsers and writers support at least the following configuration options:
nullValue (defaults to null):
When reading, if the parser does not read any character from the input, the nullValue is used instead of an empty string.
When writing, if the writer has a null object to write to the output, the nullValue is used instead of an empty string.
maxCharsPerColumn:
This limit avoids OutOfMemoryErrors when a file does not have a valid format. In such cases the parser might keep reading from the input until its end or until memory is exhausted. Setting a limit prevents unwanted JVM crashes.
skipEmptyLines (defaults to true):
When reading, if the parser reads a line that is empty, it will be skipped.
When writing, if the writer receives an empty or null row to write to the output, it will be ignored.
headers (defaults to null):
When reading, the given header names will be used to refer to each column, irrespective of whether or not the input contains a header row.
When writing, the given header names will be used to refer to each column and can be used for writing the header row.
Field selection:
When reading, only the selected fields will be parsed and the remaining fields will be discarded.
When writing, only the selected fields will be written and the remaining fields will be discarded.
See also: CommonParserSettings, CommonWriterSettings, CsvParserSettings, CsvWriterSettings, FixedWidthParserSettings, FixedWidthWriterSettings
Constructor and Description |
---|
CommonSettings(): Creates a new instance of this settings object using the default format specified by the concrete class that inherits from CommonSettings. |
Modifier and Type | Method and Description |
---|---|
protected void | addConfiguration(Map<String,Object> out) |
protected void | clearInputSpecificSettings(): Clears settings that are likely to be specific to a given input. |
protected CommonSettings | clone(): Clones this configuration object. |
protected CommonSettings | clone(boolean clearInputSpecificSettings): Clones this configuration object to reuse user-provided settings. |
protected abstract F | createDefaultFormat(): Extending classes must implement this method to return the default format settings for their parser/writer. |
FieldSet<Enum> | excludeFields(Enum... columns): Selects columns which will not be read/written, by their names. |
FieldSet<String> | excludeFields(String... fieldNames): Selects fields which will not be read/written, by their names. |
FieldSet<Integer> | excludeIndexes(Integer... fieldIndexes): Selects columns which will not be read/written, by their positions. |
int | getErrorContentLength(): Configures the parser/writer to limit the length of displayed contents being parsed/written in the exception message when an error occurs. |
F | getFormat(): The format of the file to be parsed/written (returns the format's defaults). |
String[] | getHeaders(): Returns the field names in the input/output, in the sequence they occur (defaults to null). |
boolean | getIgnoreLeadingWhitespaces(): Returns whether or not leading whitespaces from values being read/written should be skipped (defaults to true). |
boolean | getIgnoreTrailingWhitespaces(): Returns whether or not trailing whitespaces from values being read/written should be skipped (defaults to true). |
int | getMaxCharsPerColumn(): The maximum number of characters allowed for any given value being written/read. |
int | getMaxColumns(): Returns the hard limit of how many columns a record can have (defaults to 512). |
String | getNullValue(): Returns the String representation of a null value (defaults to null). |
<T extends Context> ProcessorErrorHandler<T> | getProcessorErrorHandler(): Returns the custom error handler to be used to capture and handle errors that might happen while processing records with a Processor or a RowWriterProcessor (i.e. non-fatal DataProcessingExceptions). |
RowProcessorErrorHandler | getRowProcessorErrorHandler(): Deprecated. Use the getProcessorErrorHandler() method as it allows format-specific error handlers to be built to work with different implementations of Context. Implementations based on RowProcessorErrorHandler allow only parsers that provide a ParsingContext to be used. |
boolean | getSkipBitsAsWhitespace(): Returns a flag indicating whether the parser/writer should skip bit values as whitespace. |
boolean | getSkipEmptyLines(): Returns whether or not empty lines should be ignored (defaults to true). |
protected int | getWhitespaceRangeStart(): Returns the starting decimal range for characters <= ' ' that should be skipped as whitespace, as determined by getSkipBitsAsWhitespace(). |
boolean | isAutoConfigurationEnabled(): Indicates whether this settings object can automatically derive configuration options. |
boolean | isProcessorErrorHandlerDefined(): Returns a flag indicating whether or not a ProcessorErrorHandler has been defined through the use of method setProcessorErrorHandler(ProcessorErrorHandler). |
FieldSet<Enum> | selectFields(Enum... columns): Selects a sequence of fields for reading/writing by their names. |
FieldSet<String> | selectFields(String... fieldNames): Selects a sequence of fields for reading/writing by their names. |
FieldSet<Integer> | selectIndexes(Integer... fieldIndexes): Selects a sequence of fields for reading/writing by their positions. |
void | setAutoConfigurationEnabled(boolean autoConfigurationEnabled): Indicates whether this settings object can automatically derive configuration options. |
void | setErrorContentLength(int errorContentLength): Configures the parser/writer to limit the length of displayed contents being parsed/written in the exception message when an error occurs. |
void | setFormat(F format): Defines the format of the file to be parsed/written. |
void | setHeaders(String... headers): Defines the field names in the input/output, in the sequence they occur (defaults to null). |
void | setIgnoreLeadingWhitespaces(boolean ignoreLeadingWhitespaces): Defines whether or not leading whitespaces from values being read/written should be skipped (defaults to true). |
void | setIgnoreTrailingWhitespaces(boolean ignoreTrailingWhitespaces): Defines whether or not trailing whitespaces from values being read/written should be skipped (defaults to true). |
void | setMaxCharsPerColumn(int maxCharsPerColumn): Defines the maximum number of characters allowed for any given value being written/read. |
void | setMaxColumns(int maxColumns): Defines a hard limit of how many columns a record can have (defaults to 512). |
void | setNullValue(String emptyValue): Sets the String representation of a null value (defaults to null). |
void | setProcessorErrorHandler(ProcessorErrorHandler<? extends Context> processorErrorHandler): Defines a custom error handler to capture and handle errors that might happen while processing records with a Processor or a RowWriterProcessor (i.e. non-fatal DataProcessingExceptions). |
void | setRowProcessorErrorHandler(RowProcessorErrorHandler rowProcessorErrorHandler): Deprecated. Use the setProcessorErrorHandler(ProcessorErrorHandler) method as it allows format-specific error handlers to be built to work with different implementations of Context. Implementations based on RowProcessorErrorHandler allow only parsers that provide a ParsingContext to be used. |
void | setSkipBitsAsWhitespace(boolean skipBitsAsWhitespace): Configures the parser to skip bit values as whitespace. |
void | setSkipEmptyLines(boolean skipEmptyLines): Defines whether or not empty lines should be ignored (defaults to true). |
String | toString() |
void | trimValues(boolean trim): Configures the parser/writer to trim or keep leading and trailing whitespaces around values. This has the same effect as invoking both setIgnoreLeadingWhitespaces(boolean) and setIgnoreTrailingWhitespaces(boolean) with the same value. |
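public CommonSettings()
Creates a new instance of this settings object using the default format specified by the concrete class that inherits from CommonSettings.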
public String getNullValue()
Returns the String representation of a null value (defaults to null).
When reading, if the parser does not read any character from the input, the nullValue is used instead of an empty string.
When writing, if the writer has a null object to write to the output, the nullValue is used instead of an empty string.
public void setNullValue(String emptyValue)
Sets the String representation of a null value (defaults to null).
When reading, if the parser does not read any character from the input, the nullValue is used instead of an empty string.
When writing, if the writer has a null object to write to the output, the nullValue is used instead of an empty string.
emptyValue - the String representation of a null value
public int getMaxCharsPerColumn()
The maximum number of characters allowed for any given value being written/read. If set to -1, the internal array will expand automatically, up to the limit allowed by the JVM.
public void setMaxCharsPerColumn(int maxCharsPerColumn)
Defines the maximum number of characters allowed for any given value being written/read. To enable auto-expansion of the internal array, set this property to -1.
maxCharsPerColumn - the maximum number of characters allowed for any given value being written/read
public boolean getSkipEmptyLines()
Returns whether or not empty lines should be ignored (defaults to true).
When reading, if the parser reads a line that is empty, it will be skipped.
When writing, if the writer receives an empty or null row to write to the output, it will be ignored.
public void setSkipEmptyLines(boolean skipEmptyLines)
Defines whether or not empty lines should be ignored (defaults to true).
When reading, if the parser reads a line that is empty, it will be skipped.
When writing, if the writer receives an empty or null row to write to the output, it will be ignored.
skipEmptyLines - true if empty lines should be ignored, false otherwise
public boolean getIgnoreTrailingWhitespaces()
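Returns whether or not trailing whitespaces from values being read/written should be skipped (defaults to true).
public void setIgnoreTrailingWhitespaces(boolean ignoreTrailingWhitespaces)
Defines whether or not trailing whitespaces from values being read/written should be skipped (defaults to true).
ignoreTrailingWhitespaces - true if trailing whitespaces from values being read/written should be skipped, false otherwise
public boolean getIgnoreLeadingWhitespaces()
Returns whether or not leading whitespaces from values being read/written should be skipped (defaults to true).
public void setIgnoreLeadingWhitespaces(boolean ignoreLeadingWhitespaces)
Defines whether or not leading whitespaces from values being read/written should be skipped (defaults to true).
ignoreLeadingWhitespaces - true if leading whitespaces from values being read/written should be skipped, false otherwise
public void setHeaders(String... headers)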
Defines the field names in the input/output, in the sequence they occur (defaults to null).
When reading, the given header names will be used to refer to each column, irrespective of whether or not the input contains a header row.
When writing, the given header names will be used to refer to each column and can be used for writing the header row.
headers - the field name sequence associated with each column in the input/output
public String[] getHeaders()
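Returns the field names in the input/output, in the sequence they occur (defaults to null).
When reading, the given header names will be used to refer to each column, irrespective of whether or not the input contains a header row.
When writing, the given header names will be used to refer to each column and can be used for writing the header row.
public int getMaxColumns()
Returns the hard limit of how many columns a record can have (defaults to 512).
public void setMaxColumns(int maxColumns)
Defines a hard limit of how many columns a record can have (defaults to 512).
maxColumns - the maximum number of columns a record can have
public F getFormat()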
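The format of the file to be parsed/written (returns the format's defaults).
public void setFormat(F format)
Defines the format of the file to be parsed/written.
format - the format of the file to be parsed/written
public FieldSet<String> selectFields(String... fieldNames)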
Selects a sequence of fields for reading/writing by their names.
When reading, only the values of the selected columns will be parsed, and the content of the other columns will be ignored. The resulting rows will be returned with the selected columns only, in the order specified. To obtain the original row format, with all columns included and nulls in the fields that have not been selected, call CommonParserSettings.setColumnReorderingEnabled(boolean) with false.
When writing, the sequence provided represents the expected format of the input rows. For example, headers can be "H1,H2,H3", but the input data is coming with values for two columns and in a different order, such as "V_H3, V_H1". Selecting fields "H3" and "H1" will allow the writer to write values in the expected locations. Using the given example, the output row will be generated as: "V_H1,null,V_H3"
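The writing example above can be modelled with a small standalone sketch. It does not use univocity-parsers and is not the library's implementation; `FieldSelectionSketch`, `arrangeForWriting` and `join` are hypothetical names introduced only to show how a selection of "H3" and "H1" places the input values "V_H3, V_H1" under headers "H1,H2,H3".

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, NOT part of univocity-parsers: models how a field
// selection maps input values onto header positions when writing.
final class FieldSelectionSketch {

    // Places each input value at the header position of the selected field
    // it belongs to. Positions of unselected headers stay null.
    static String[] arrangeForWriting(String[] headers, String[] selectedFields, String[] inputRow) {
        List<String> headerList = Arrays.asList(headers);
        String[] out = new String[headers.length];
        for (int i = 0; i < selectedFields.length && i < inputRow.length; i++) {
            int position = headerList.indexOf(selectedFields[i]);
            if (position < 0) {
                throw new IllegalArgumentException("Unknown field: " + selectedFields[i]);
            }
            out[position] = inputRow[i];
        }
        return out;
    }

    // Joins values with commas, rendering a null value as the text "null".
    static String join(String[] row) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < row.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(row[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] headers = {"H1", "H2", "H3"};
        String[] selected = {"H3", "H1"};   // order of the incoming values
        String[] input = {"V_H3", "V_H1"};
        System.out.println(join(arrangeForWriting(headers, selected, input)));
        // prints: V_H1,null,V_H3
    }
}
```

In a real writer the text emitted for the empty position would depend on the configured nullValue; the sketch simply prints "null" to match the example.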
fieldNames - the field names to read/write
public FieldSet<String> excludeFields(String... fieldNames)
Selects fields which will not be read/written, by their names.
When reading, only the values of the columns that are not excluded will be parsed, and the content of the excluded columns will be ignored. The resulting rows will be returned with the non-excluded columns only. To obtain the original row format, with all columns included and nulls in the fields that have been excluded, call CommonParserSettings.setColumnReorderingEnabled(boolean) with false.
When writing, the sequence of non-excluded fields represents the expected format of the input rows. For example, headers can be "H1,H2,H3", but the input data is coming with values for two columns and in a different order, such as "V_H3, V_H1". Selecting fields "H3" and "H1" will allow the writer to write values in the expected locations. Using the given example, the output row will be generated as: "V_H1,null,V_H3"
fieldNames - the field names to exclude from the parsing/writing process
public FieldSet<Integer> selectIndexes(Integer... fieldIndexes)
Selects a sequence of fields for reading/writing by their positions.
When reading, only the values of the selected columns will be parsed, and the content of the other columns will be ignored. The resulting rows will be returned with the selected columns only, in the order specified. To obtain the original row format, with all columns included and nulls in the fields that have not been selected, call CommonParserSettings.setColumnReorderingEnabled(boolean) with false.
When writing, the sequence provided represents the expected format of the input rows. For example, headers can be "H1,H2,H3", but the input data is coming with values for two columns and in a different order, such as "V_H3, V_H1". Selecting indexes "2" and "0" will allow the writer to write values in the expected locations. Using the given example, the output row will be generated as: "V_H1,null,V_H3"
fieldIndexes - the indexes to read/write
public FieldSet<Integer> excludeIndexes(Integer... fieldIndexes)
Selects columns which will not be read/written, by their positions.
When reading, only the values of the columns that are not excluded will be parsed, and the content of the excluded columns will be ignored. The resulting rows will be returned with the non-excluded columns only. To obtain the original row format, with all columns included and nulls in the fields that have been excluded, call CommonParserSettings.setColumnReorderingEnabled(boolean) with false.
When writing, the sequence of non-excluded fields represents the expected format of the input rows. For example, headers can be "H1,H2,H3", but the input data is coming with values for two columns and in a different order, such as "V_H3, V_H1". Selecting fields by index, such as "2" and "0" will allow the writer to write values in the expected locations. Using the given example, the output row will be generated as: "V_H1,null,V_H3"
fieldIndexes - the indexes of columns to exclude from the parsing/writing process
public FieldSet<Enum> selectFields(Enum... columns)
Selects a sequence of fields for reading/writing by their names.
When reading, only the values of the selected columns will be parsed, and the content of the other columns will be ignored. The resulting rows will be returned with the selected columns only, in the order specified. To obtain the original row format, with all columns included and nulls in the fields that have not been selected, call CommonParserSettings.setColumnReorderingEnabled(boolean) with false.
When writing, the sequence provided represents the expected format of the input rows. For example, headers can be "H1,H2,H3", but the input data is coming with values for two columns and in a different order, such as "V_H3, V_H1". Selecting fields "H3" and "H1" will allow the writer to write values in the expected locations. Using the given example, the output row will be generated as: "V_H1,null,V_H3"
columns - the columns to read/write
public FieldSet<Enum> excludeFields(Enum... columns)
Selects columns which will not be read/written, by their names.
When reading, only the values of the columns that are not excluded will be parsed, and the content of the excluded columns will be ignored. The resulting rows will be returned with the non-excluded columns only. To obtain the original row format, with all columns included and nulls in the fields that have been excluded, call CommonParserSettings.setColumnReorderingEnabled(boolean) with false.
When writing, the sequence of non-excluded fields represents the expected format of the input rows. For example, headers can be "H1,H2,H3", but the input data is coming with values for two columns and in a different order, such as "V_H3, V_H1". Selecting fields "H3" and "H1" will allow the writer to write values in the expected locations. Using the given example, the output row will be generated as: "V_H1,null,V_H3"
columns - the columns to exclude from the parsing/writing process
public final boolean isAutoConfigurationEnabled()
Indicates whether this settings object can automatically derive configuration options. This is used, for example, to define the headers for a BeanWriterProcessor where the bean class contains a Headers annotation, or to enable header extraction when the bean class of a BeanProcessor has attributes mapping to header names.
Defaults to true.
Returns true if the automatic configuration feature is enabled, false otherwise.
public final void setAutoConfigurationEnabled(boolean autoConfigurationEnabled)
Indicates whether this settings object can automatically derive configuration options. This is used, for example, to define the headers for a BeanWriterProcessor where the bean class contains a Headers annotation, or to enable header extraction when the bean class of a BeanProcessor has attributes mapping to header names.
autoConfigurationEnabled - a flag to turn the automatic configuration feature on/off
@Deprecated public RowProcessorErrorHandler getRowProcessorErrorHandler()
Deprecated. Use the getProcessorErrorHandler() method as it allows format-specific error handlers to be built to work with different implementations of Context. Implementations based on RowProcessorErrorHandler allow only parsers that provide a ParsingContext to be used.
Returns the custom error handler to be used to capture and handle errors that might happen while processing records with a RowProcessor or a RowWriterProcessor (i.e. non-fatal DataProcessingExceptions).
The parsing/writing process won't stop (unless the error handler rethrows the DataProcessingException or manually stops the process).
Returns the callback error handler with custom code to manage occurrences of DataProcessingException.
@Deprecated public void setRowProcessorErrorHandler(RowProcessorErrorHandler rowProcessorErrorHandler)
Deprecated. Use the setProcessorErrorHandler(ProcessorErrorHandler) method as it allows format-specific error handlers to be built to work with different implementations of Context. Implementations based on RowProcessorErrorHandler allow only parsers that provide a ParsingContext to be used.
Defines a custom error handler to capture and handle errors that might happen while processing records with a RowProcessor or a RowWriterProcessor (i.e. non-fatal DataProcessingExceptions).
The parsing/writing process won't stop (unless the error handler rethrows the DataProcessingException or manually stops the process).
rowProcessorErrorHandler - the callback error handler with custom code to manage occurrences of DataProcessingException
public <T extends Context> ProcessorErrorHandler<T> getProcessorErrorHandler()
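Returns the custom error handler to be used to capture and handle errors that might happen while processing records with a Processor or a RowWriterProcessor (i.e. non-fatal DataProcessingExceptions).
The parsing/writing process won't stop (unless the error handler rethrows the DataProcessingException or manually stops the process).
T - the Context type provided by the parser implementation
Returns the callback error handler with custom code to manage occurrences of DataProcessingException.
public void setProcessorErrorHandler(ProcessorErrorHandler<? extends Context> processorErrorHandler)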
Defines a custom error handler to capture and handle errors that might happen while processing records with a Processor or a RowWriterProcessor (i.e. non-fatal DataProcessingExceptions).
The parsing/writing process won't stop (unless the error handler rethrows the DataProcessingException or manually stops the process).
processorErrorHandler - the callback error handler with custom code to manage occurrences of DataProcessingException
public boolean isProcessorErrorHandlerDefined()
Returns a flag indicating whether or not a ProcessorErrorHandler has been defined through the use of method setProcessorErrorHandler(ProcessorErrorHandler).
Returns true if the parser/writer is configured to use a ProcessorErrorHandler.
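protected abstract F createDefaultFormat()
Extending classes must implement this method to return the default format settings for their parser/writer.
public final void trimValues(boolean trim)
Configures the parser/writer to trim or keep leading and trailing whitespaces around values. This has the same effect as invoking both setIgnoreLeadingWhitespaces(boolean) and setIgnoreTrailingWhitespaces(boolean) with the same value.
trim - a flag indicating whether whitespaces around values parsed/written should be removed
public int getErrorContentLength()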
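Returns the limit on the length of displayed contents being parsed/written in the exception message when an error occurs.
If set to 0, no exceptions will include the content being manipulated in their attributes, and the "<omitted>" string will appear in error messages as the parsed/written content.
Defaults to -1 (no limit).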
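The three cases described here (0, -1, and a positive limit) can be illustrated with a standalone sketch. `ErrorContentSketch` and `restrict` are hypothetical names, not library API. How the library shortens content that exceeds a positive limit is not stated on this page, so keeping the leading characters and appending "..." is purely an assumption of this sketch.

```java
// Hypothetical helper, NOT part of univocity-parsers: models the documented
// meaning of the errorContentLength setting.
final class ErrorContentSketch {

    static String restrict(int errorContentLength, String content) {
        if (content == null) {
            return null;
        }
        if (errorContentLength == 0) {
            return "<omitted>";          // documented: content is never shown
        }
        if (errorContentLength < 0) {
            return content;              // documented default -1: no limit
        }
        if (content.length() <= errorContentLength) {
            return content;
        }
        // Assumption: keep the first characters and mark the cut with "...".
        return content.substring(0, errorContentLength) + "...";
    }

    public static void main(String[] args) {
        System.out.println(restrict(0, "secret,row,data"));   // prints: <omitted>
        System.out.println(restrict(-1, "secret,row,data"));  // prints: secret,row,data
        System.out.println(restrict(6, "secret,row,data"));   // prints: secret...
    }
}
```

Setting the limit to 0 is the option to reach for when rows may contain sensitive data that should not end up in logs.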
public void setErrorContentLength(int errorContentLength)
Configures the parser/writer to limit the length of displayed contents being parsed/written in the exception message when an error occurs.
If set to 0, no exceptions will include the content being manipulated in their attributes, and the "<omitted>" string will appear in error messages as the parsed/written content.
Defaults to -1 (no limit).
errorContentLength - the maximum length of contents displayed in exception messages in case of errors while parsing/writing
public final boolean getSkipBitsAsWhitespace()
Returns a flag indicating whether the parser/writer should skip bit values as whitespace. By default, any character for which character <= ' ' evaluates to true is considered whitespace. This includes bit values, i.e. 0 (the \0 character) and 1, which might be produced by database dumps. Disabling this flag will prevent the parser/writer from discarding these characters when getIgnoreLeadingWhitespaces() or getIgnoreTrailingWhitespaces() evaluate to true.
Defaults to true.
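The effect of this flag on trimming can be sketched without the library. `WhitespaceSketch` and its methods are hypothetical names. The concrete range-start values (-1 when bit values are skipped, 1 otherwise) are an assumption chosen to match the description of getWhitespaceRangeStart(), not a confirmed detail of the implementation.

```java
// Hypothetical helper, NOT part of univocity-parsers: models trimming where
// whitespace is any character <= ' ' above a configurable range start.
final class WhitespaceSketch {

    // Assumption: -1 lets characters 0 and 1 count as whitespace,
    // while 1 excludes them from the whitespace range.
    static int whitespaceRangeStart(boolean skipBitsAsWhitespace) {
        return skipBitsAsWhitespace ? -1 : 1;
    }

    static boolean isSkippable(char ch, int rangeStart) {
        return ch <= ' ' && ch > rangeStart;
    }

    // Removes skippable characters from both ends of the value.
    static String trim(String value, boolean skipBitsAsWhitespace) {
        int rangeStart = whitespaceRangeStart(skipBitsAsWhitespace);
        int start = 0;
        int end = value.length();
        while (start < end && isSkippable(value.charAt(start), rangeStart)) {
            start++;
        }
        while (end > start && isSkippable(value.charAt(end - 1), rangeStart)) {
            end--;
        }
        return value.substring(start, end);
    }

    public static void main(String[] args) {
        // A value padded with a space and the bit characters 1 and 0.
        String raw = " " + (char) 1 + "a" + (char) 0 + " ";
        System.out.println(trim(raw, true).length());   // prints: 1
        System.out.println(trim(raw, false).length());  // prints: 3
    }
}
```

With the flag enabled only "a" survives; with it disabled the bit characters next to "a" are kept and only the outer spaces are removed.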
public final void setSkipBitsAsWhitespace(boolean skipBitsAsWhitespace)
Configures the parser to skip bit values as whitespace. By default, any character for which character <= ' ' evaluates to true is considered whitespace. This includes bit values, i.e. 0 (the \0 character) and 1, which might be produced by database dumps. Disabling this flag will prevent the parser/writer from discarding these characters when getIgnoreLeadingWhitespaces() or getIgnoreTrailingWhitespaces() evaluate to true.
Defaults to true.
skipBitsAsWhitespace - a flag indicating whether bit values (0 or 1) should be considered whitespace
protected final int getWhitespaceRangeStart()
Returns the starting decimal range for characters <= ' ' that should be skipped as whitespace, as determined by getSkipBitsAsWhitespace().
protected CommonSettings clone(boolean clearInputSpecificSettings)
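Clones this configuration object to reuse user-provided settings. Properties that are specific to a given input (such as header names and selection of fields) will be reset to their defaults if the clearInputSpecificSettings flag is set to true.
clearInputSpecificSettings - flag indicating whether to clear settings that are likely to be associated with a given input
protected CommonSettings clone()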
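Clones this configuration object. Use the alternative clone(boolean) method to reset properties that are specific to a given input, such as header names and selection of fields.
protected void clearInputSpecificSettings()
Clears settings that are likely to be specific to a given input.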
Copyright © 2022 Univocity Software Pty Ltd. All rights reserved.