Skip to content

Performance issue in DefaultFieldSet due to the usage of SimpleDateFormat #1694

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
spring-projects-issues opened this issue Oct 25, 2012 · 8 comments

Comments

@spring-projects-issues
Copy link
Collaborator

Brian Tarbox opened BATCH-1902 and commented

DefaultFieldSet creates a SimpleDateFormat (SDF) object as a member variable. This means it can not be overridden. SDF eventually calls TimeZone which calls getDefaultInAppContext. This is a static, synchronized and very slow method. This results in extremely slow reads.

To fix the problem in my own code means I have to basically make a copy of the entire DefaultFieldSet class. If the SDF were injected then I could change the slow behavior without having to copy the whole class.

I spoke with Gary Gregory (Spring Batch In Action) and he liked the idea of this change.


Affects: 2.1.8

Reference URL: http://stackoverflow.com/questions/12984345/java-7-calendar-getinstance-timezone-gettimezone-got-synchronized-and-slow-any

0 votes, 5 watchers

@spring-projects-issues
Copy link
Collaborator Author

Markus Bernhardt commented

We analyzed a small batch which reads and writes 1 million rows with YourKit and found this initializer costs roughly 6% of the overall execution time of our batch.

Couldn't that at least be replaced with something more performant like:

private final static ThreadLocal<DateFormat> DEFAULT_DATE_FORMAT = new ThreadLocal<DateFormat>() {
    @Override
    protected DateFormat initialValue() {
        DateFormat dateFormat = new SimpleDateFormat(DEFAULT_DATE_PATTERN);
        dateFormat.setLenient(false);
        return dateFormat;
    }
};
private DateFormat dateFormat = DEFAULT_DATE_FORMAT.get();

So only one SimpleDateFormat get initialized per worker thread.
The change also doesn't change the behaviour or interface of the class.


BTW the same problem existst for the NumberFormat instance in that class. His initializer costs another 3% of the overall execution time of our batch. Perhaps you could change that accordingly:

private final static ThreadLocal<NumberFormat> DEFAULT_NUMBER_FORMAT = new ThreadLocal<NumberFormat>() {
    @Override
    protected NumberFormat initialValue() {
        return NumberFormat.getInstance(Locale.US);
    }
};
private NumberFormat numberFormat = DEFAULT_NUMBER_FORMAT.get();

And last but not leats, one of the constructors does a unneccessary double initialization, which costs in our batch another 3% of the overall execution.

/**
 * Create a FieldSet with anonymous tokens. They can only be retrieved by
 * column number.
 * @param tokens the token values
 * @see FieldSet#readString(int)
 */
public DefaultFieldSet(String[] tokens) {
     this.tokens = tokens == null ? null : (String[]) tokens.clone();
           // Remove the following line. The number format field is already initialized.
     // setNumberFormat(NumberFormat.getInstance(Locale.US)); 
}

What do you think about those changes?

@spring-projects-issues
Copy link
Collaborator Author

Brian Tarbox commented

I think those are fine changes and would likely address the problem at hand.
If these changes are deemed to be less intrusive than letting the user of the class inject/override the date stuff then I'd be fine with the changes.

@spring-projects-issues
Copy link
Collaborator Author

Michael Minella commented

A couple thoughts:

  1. I would be interested in, from a numbers perspective, what constitutes an "extremely slow read". Given how widely this class is used in production environments, I'm hesitant to make any drastic changes yet.
  2. We do allow the date formatter to be injected (see http://static.springsource.org/spring-batch/apidocs/org/springframework/batch/item/file/transform/DefaultFieldSet.html#setDateFormat(java.text.DateFormat) ). We currently handle things the way we do because we the class is "newed" via the framework so we don't have any hooks that allow us to verify the dependency has been satisfied (ala InitializingBean#afterPropertiesSet). We could require this via a constructor or add logic to the DefaultFieldSetFactory to prevent it from being instantiated if an alternative is going to be injected.
  3. With regards to the ThreadLocal option, I'm not a fan due to complexity. The FieldSet shouldn't need to be worrying about threads directly like that. I'm also not convinced that there is a large enough performance impact and we can address the issue in a more direct route as mentioned in the previous bullet.
  4. I agree with the extra initialization that can be removed from the various constructors.

@spring-projects-issues
Copy link
Collaborator Author

Brian Tarbox commented

I don't have the exact numbers handy anymore (entered the bug a while ago), but I recall the system was spinning at 99% cpu traceable directly to this.

WRT to solutions...in my own local copy of DefaultFieldSet I went for simple and took advantage of the fact that the SimpleDateFormater in question always uses a hardcoded format of "yyyy-MM-dd", and so did the following for the standard case:

private final static String DEFAULT_DATE_PATTERN = "yyyy-MM-dd";
private static DateFormat dateFormat = new SimpleDateFormat(DEFAULT_DATE_PATTERN);
{
   dateFormat.setLenient(false);
}

and then added a setter for DateFormat in case someone wanted a different format.

This solved the problem for us.

@spring-projects-issues
Copy link
Collaborator Author

Michael Minella commented

I see. With regards to the solution you used, that wouldn't work on the general case since SimpleDateFormat isn't thread safe. For the setter, that is already there per my previous comment.

I definitely think we can update the DefaultFieldSet to not create the formatter by default and have the factory inject it if there is not another to be injected.

@spring-projects-issues
Copy link
Collaborator Author

Markus Bernhardt commented

Hi Michael,

I found this ticket, because we are evaluating at the moment to switch our batches to Spring Batch. I ported a small batch that reads roughly 1.5 million rows (350MB) fixed length data in EBCDIC from the file system, does some not too complicated computations and writes them back. I have implemented one prototype based on the Spring Batch samples, one in plain old java.

So far so good. The problem is, the plain old java solution does the job in under 6 seconds. The Spring Batch based solution took in the first version 28 Minutes. So we fired up YourKit and looked into it. We found lots of database transactions. After a little while we found that we were reading the data in chunks of 1 row. So for every row the application context was stored. By increasing the chunk size to 100,000 rows, this problem was solved and the overall executing time sank to a little over 4 minutes.

Still ways to slow. So we looked again into it with yourkit and found, that 12% of the overall runtime, thats 30 seconds, are spent initializing this two formatters. Only the FormatterLineAggregator.doAggregate is worse performance wise in our code.

Regarding your points:

  1. To be honest I'm a little bit shocked to find such code at evaluating Spring Batch. Creating Formatters is slow in Java since JDK 1.1.
  2. I'm not familiar with the principles of Spring Batch, so you should have a much better opinion on that. I only like to mention, if you really have to initialize this variable insied the DefaultFieldSet, then please do it at some other way.
  3. From a user perspective I would like to have the default behaviour that the Formatters are only created once or if needed once per thread.
  4. Great.
  5. Please do not forget the NumberFormatter in this class. Same problem.

@spring-projects-issues
Copy link
Collaborator Author

Brian Tarbox commented

Michael,
You are right of course about SimpleDateFormat not being thread safe...we did a quick/dirty that worked for us.

I'll mention that we also had to create our own DefaultFieldSetFactory that was called by the file tokenizer. We had to do this to have a place to do the injection of the NumberFormat object (i.e. the SimpleDateFormat).

To Markus's point we also had to create a new BufferedReaderFactory to give the FlatFileItemReader a large buffer with which to read files. We were reading files over the network which the roundtrip time dominated and the default size of a buffered reader (1024 bytes I think....) was way too small.

I'm hoping that the end resolution to this jira issue will address all of these highly-related issues. Thanks.

@spring-projects-issues
Copy link
Collaborator Author

Jimmy Praet commented

The way the framework currently allows you to override the default DateFormat and NumberFormat through DefaultFieldSetFactory.setDateFormat() and DefaultFieldSetFactory.setNumberFormat() is also quite risky. As both of these SimpleDateFormat and NumberFormat classes aren't thread safe, you should pay attention as a user to define these beans with scope="step".

The javadoc of DefaultFieldSetFactory is also incorrect: it states that the NumberFormat defaults to the default locale, and the DateFormat defaults to yyyy/MM/dd. But in fact the NumberFormat defaults to Locale.US and the DateFormat defaults to yyyy-MM-dd.

@fmbenhassine fmbenhassine changed the title DefaultFieldSet creates a new SimpleDateFormat as a class member; newer JVMs cause SDF to get stuck in TimeZone.getDefaultInAppContext which is slow [BATCH-1902] DefaultFieldSet creates a new SimpleDateFormat as a class member; newer JVMs cause SDF to get stuck in TimeZone.getDefaultInAppContext which is slow May 7, 2024
@fmbenhassine fmbenhassine added this to the 5.2.0-M1 milestone May 7, 2024
natedanner pushed a commit to natedanner/spring-projects__spring-batch that referenced this issue May 20, 2024
@fmbenhassine fmbenhassine changed the title DefaultFieldSet creates a new SimpleDateFormat as a class member; newer JVMs cause SDF to get stuck in TimeZone.getDefaultInAppContext which is slow Performance issue in DefaultFieldSet due to the usage of SimpleDateFormat Sep 18, 2024
FBibonne pushed a commit to FBibonne/spring-batch that referenced this issue Feb 2, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants