
support different compositions with particles #5963

Open
tjhei wants to merge 4 commits into main from composition-particles
Conversation

@tjhei (Member) commented Jul 10, 2024

This is a first version of particle support with different kinds of compositional fields.

@tjhei force-pushed the composition-particles branch 2 times, most recently from 5df5cff to 686a968 on July 11, 2024 at 23:42
@tjhei (Member Author) commented Jul 12, 2024

@gassmoeller This seems to be working now if you want to take a first look. I still need to add some tests and verify if everything works correctly.

I left two code paths in here (see line 881, "if (false)") to switch between grouping FEs that are the same (like main does right now) and not grouping them. Do you have a suggestion for how I can test the performance difference between the two? My current test sees no measurable difference, but I am not sure I am setting this up in a sensible way.

@gassmoeller (Member) commented:

> Do you have a suggestion for how I can test the performance difference between the two? My current test sees no measurable difference, but I am not sure I am setting this up in a sensible way.

You need a test that has plenty of particles per cell and many particle properties that require dynamic updates during the run (the particle advection only uses the velocity evaluator, so you won't see a difference in the "Particles: advect" timing, only in "Particles: update properties"). Maybe benchmark/integrated_strain/ (pure_shear and/or simple_shear) would be a good test case. You would have to increase the number of particles (it currently only creates 1), but those benchmarks have a particle property that requires updates and is relatively cheap to compute, so the solution evaluation should stand out as one of the major costs of the particle update process.

@gassmoeller (Member) left a comment:

Very cool, I am looking forward to using this. I left some comments; mostly, I forgot whether we group composition base elements of the same type that are not consecutive. If we do, I think your implementation here likely doesn't work (it evaluates n consecutive components as far as I can see, not n arbitrary components). If we do not, it should be fine.

@@ -627,7 +776,7 @@ namespace aspect
     // we create this derived class with an additional template. This makes it possible
     // to access the functionality through the base class, but create an object of this
     // derived class with the correct number of components at runtime.
-    template <int dim, int n_compositional_fields>
+    template <int dim>
     class SolutionEvaluatorsImplementation: public SolutionEvaluators<dim>
gassmoeller (Member):

If we no longer need this template, can we remove the distinction SolutionEvaluators/SolutionEvaluatorsImplementation again? That would simplify the code somewhat.

tjhei (Member Author):

Yes, this could go. Do you want to review this in a single large PR or separately?

DynamicFEPointEvaluationImpl(NonMatching::MappingInfo<dim> &mapping,
                             const FiniteElement<dim> &fe,
                             const unsigned int first_selected_component)
  : DynamicFEPointEvaluation<dim>(first_selected_component, n_components),
gassmoeller (Member):

What do you do if the n_components are not consecutive? FEPointEvaluation only assumes consecutive components, but as far as I remember we now allow merging the base elements of non-consecutive components?

tjhei (Member Author):

This only works for consecutive items (and this is what introspection returns). The only place where we do magic with non-consecutive items is with matrix block locations.
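For context, a minimal deal.II sketch of what evaluating a block of n consecutive components with FEPointEvaluation looks like. The element layout (velocity followed by three identical compositional fields), the names, and the component indices below are illustrative assumptions, not the PR's actual code:

```cpp
#include <deal.II/base/point.h>
#include <deal.II/dofs/dof_handler.h>
#include <deal.II/fe/fe_point_evaluation.h>
#include <deal.II/fe/fe_q.h>
#include <deal.II/fe/fe_system.h>
#include <deal.II/fe/mapping_q.h>
#include <deal.II/grid/grid_generator.h>
#include <deal.II/grid/tria.h>
#include <deal.II/lac/vector.h>

#include <vector>

int main()
{
  constexpr int dim = 2;
  constexpr unsigned int n_compositions = 3; // hypothetical group size

  dealii::Triangulation<dim> triangulation;
  dealii::GridGenerator::hyper_cube(triangulation);

  // Hypothetical element layout: velocity (dim components) followed by a
  // consecutive block of n_compositions identical compositional fields.
  const dealii::FESystem<dim> fe(dealii::FE_Q<dim>(2), dim,
                                 dealii::FE_Q<dim>(2), n_compositions);
  dealii::DoFHandler<dim> dof_handler(triangulation);
  dof_handler.distribute_dofs(fe);

  dealii::Vector<double> solution(dof_handler.n_dofs());
  const dealii::MappingQ<dim> mapping(1);

  // One evaluator covers the whole consecutive block of compositional
  // components, starting at first_selected_component = dim.
  dealii::FEPointEvaluation<n_compositions, dim> evaluator(
    mapping, fe, dealii::update_values, /*first_selected_component=*/dim);

  const auto cell = dof_handler.begin_active();
  const std::vector<dealii::Point<dim>> unit_points = {dealii::Point<dim>(0.25, 0.5)};

  std::vector<double> cell_dof_values(fe.dofs_per_cell);
  cell->get_dof_values(solution, cell_dof_values.begin(), cell_dof_values.end());

  evaluator.reinit(cell, unit_points);
  evaluator.evaluate(cell_dof_values, dealii::EvaluationFlags::values);

  // One Tensor<1, n_compositions> per evaluation point: all fields of the
  // consecutive group are evaluated together.
  const auto composition_values = evaluator.get_value(0);
  (void)composition_values;
}
```

The key point is that a single evaluator with a first_selected_component and a component count handles exactly the kind of consecutive range that introspection returns.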

source/particle/world.cc (outdated conversation, resolved)
Comment on lines 650 to 654
template <int dim>
Tensor<1,dim> to_tensor(const Tensor<1,dim> &in)
{
return in;
}
gassmoeller (Member):

I don't understand the necessity for the to_tensor functions. At first I thought you use them to create a non-const copy, but in the place they are used below they are not modified. Can you explain and document?

Edit: Oh, do you need the specialization for dim == 1 because the type is different? That would be something to document.

tjhei (Member Author):

Exactly. This is documented now.
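For readers following along, a hedged sketch of what such a dim == 1 overload could look like (the exact signature in the PR may differ); the point is only that in 1d the evaluator's value type for a single component is a plain double rather than a Tensor<1,1>:

```cpp
#include <deal.II/base/tensor.h>

// Illustrative sketch, not necessarily the PR's exact code: in 1d the
// evaluator hands back a plain double, so this overload wraps it in a
// Tensor<1,1> to give all dimensions a common return type.
dealii::Tensor<1, 1> to_tensor(const double &in)
{
  dealii::Tensor<1, 1> result;
  result[0] = in;
  return result;
}
```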

result[c] = x[c];
return result;
}

gassmoeller (Member):

Check the empty lines between functions. Also, maybe add some documentation?

source/particle/world.cc (outdated conversation, resolved)
j);
for (const auto &eval : compositions)
{
const std::vector<double> values = eval->get_value(evaluation_point);
gassmoeller (Member):

Can we avoid the copy and maybe use a type that avoids the memory allocation altogether (e.g. boost::container::small_vector)?

tjhei (Member Author):

This should not allocate (return value optimization, as we return a copy from a locally constructed temporary in that function). I think a bigger optimization would be to do all quadrature points = particles at once. Is there a reason you evaluate them separately?

tjhei (Member Author):

(but we obviously allocate the std::vector inside that function)

gassmoeller (Member):

> I think a bigger optimization would be to do all quadrature points = particles at once. Is there a reason you evaluate them separately?

Why would that be faster? I thought get_value only returns the value that was computed earlier in evaluate (which is called in SolutionEvaluatorsImplementation::reinit)
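As an aside, a minimal sketch of the boost::container::small_vector idea suggested above; the capacity of 8 and the function name are arbitrary assumptions, chosen only to show that values for a handful of compositional fields can stay in the in-place buffer and avoid the heap entirely:

```cpp
#include <boost/container/small_vector.hpp>

// Hypothetical helper: up to 8 composition values live in the in-place
// buffer, so neither constructing nor returning the container allocates.
using CompositionValues = boost::container::small_vector<double, 8>;

CompositionValues get_composition_values_sketch(const unsigned int n_compositions)
{
  CompositionValues values(n_compositions, 0.0); // stack storage for n <= 8
  // ... fill with evaluated solution values here ...
  return values;                                 // RVO/move, still no heap allocation
}
```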

source/particle/world.cc (outdated conversation, resolved)
}
else
{
// We consider each group of consecutive compositions of the same time together. This is because it is more efficient
gassmoeller (Member):

Suggested change:
- // We consider each group of consecutive compositions of the same time together. This is because it is more efficient
+ // We consider each group of consecutive compositions of the same degree together. This is because it is more efficient

@gassmoeller (Member) left a comment:

> Yes, this could go. Do you want to review this in a single large PR or separately?

Let's do it separately.

I think this is almost good to go, but can you add a test?

@tjhei (Member Author) commented Jul 15, 2024

> I think this is almost good to go

Close, but I still need to address a few points.

I just did a quick benchmark (4 fields, 1M particles) with the following conclusions:

New batched:    117.7 B cycles
New unbatched:  119.1 B cycles
current main:   115.8 B cycles
(experimental:  115.7 B cycles)

where "particle: update properties" is 4.2% of the total runtime. Experimental is me trying to optimize the current version by:

  1. processing all quadrature points (= particles) at once
  2. filling the data in place instead of returning a std::vector

This is a bit messy, but I am basically replacing:

evaluators.reinit(..)
for (unsigned int i = 0; i<n_particles_in_cell; ++i)
  evaluators.get_solution(i, solution[i]);

by

evaluators.reinit(..)
evaluators.get_solutions(solution);

and yes, this is doable as reinit() initializes values with all quadrature points at once.
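To make the interface difference explicit, here is a purely hypothetical sketch of the two variants; the names get_solution/get_solutions come from the pseudocode above, everything else is invented for illustration:

```cpp
#include <vector>

// Sketch only: the batched get_solutions() replaces one virtual call per
// particle with a single call per cell, and writes into preallocated storage
// instead of creating temporaries inside the per-particle loop.
class EvaluatorSketch
{
public:
  virtual ~EvaluatorSketch() = default;

  // Per-particle variant: called n_particles_in_cell times after reinit().
  virtual void get_solution(const unsigned int particle_index,
                            std::vector<double> &particle_solution) const = 0;

  // Batched variant: called once per cell; solutions[i] receives the values
  // for particle i, for all positions that were passed to reinit().
  virtual void get_solutions(std::vector<std::vector<double>> &solutions) const = 0;
};
```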

@tjhei (Member Author) commented Jul 15, 2024

In other words, this PR is measurably slower, maybe by up to 25% (either because of the virtual function calls per particle or the temporary std::vectors).
