Java::Stream Collectors

Learn
4 min read · Oct 14, 2020

Streams paradigm
One of the most important parts of the streams paradigm is to view the overall computation as a sequence of transformations of the input data.

Pure functions
These transformations are best expressed as pure functions. A pure function is one whose output depends only on its input: it does not read any mutable state, and it does not update any state.

forEach with computational logic is not streams paradigm code
Typically, forEach should be used only to report the final result of a stream computation. If complex logic is happening inside the forEach, consider moving it upstream into the pipeline. If that is not possible, replace the stream with an ordinary loop, since the use case is most likely better suited to an iterative style of coding.

“Bad smell in code”
forEach operation that does more than report the final result of the computation.
A lambda that mutates state.
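
The contrast can be sketched as follows. The class and method names are hypothetical, chosen only for illustration: the first method shows the smell (a forEach lambda that mutates an external list), the second expresses the same computation as a pure transformation followed by a collector.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

public class ForEachSmell {

    // Smell: the lambda passed to forEach mutates state outside the stream.
    static List<String> upperWithForEach(List<String> words) {
        List<String> result = new ArrayList<>();
        words.stream().forEach(w -> result.add(w.toUpperCase()));
        return result;
    }

    // Streams style: a pure transformation, then a collector does the reduction.
    static List<String> upperWithCollector(List<String> words) {
        return words.stream()
                .map(String::toUpperCase)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> words = List.of("java", "stream");
        // forEach is used only to report the final result.
        upperWithCollector(words).forEach(System.out::println);
    }
}
```

Both methods return the same list; the second one simply keeps all the logic inside the pipeline.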

Collector as reduction strategy
A collector acts as a reducer: it combines many elements into one object.
For example, it can turn a stream of many elements into one list, one set, or one map.

Statically import Collectors API for code readability
Statically import Collectors.toList, Collectors.toSet, Collectors.toMap, etc., so that the stream can be collected with concise and readable code.

words.collect(toList());
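
For completeness, here is a minimal sketch of what the static imports look like in a full source file. The class and method names are made up for the example; only toList and toSet come from the Collectors API.

```java
import static java.util.stream.Collectors.toList;
import static java.util.stream.Collectors.toSet;

import java.util.List;
import java.util.Set;
import java.util.stream.Stream;

public class StaticImportDemo {

    // Thanks to the static import, no "Collectors." prefix is needed.
    static List<String> asList(Stream<String> words) {
        return words.collect(toList());
    }

    // toSet drops duplicates and makes no promise about ordering.
    static Set<String> asSet(Stream<String> words) {
        return words.collect(toSet());
    }

    public static void main(String[] args) {
        System.out.println(asList(Stream.of("a", "b", "a")));
        System.out.println(asSet(Stream.of("a", "b", "a")));
    }
}
```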

Top N using limit
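To keep only the first N elements of a stream, use limit(N). limit does not rank anything: it takes elements in encounter order, so to get a true top N the stream must be sorted first.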

words.limit(10).collect(toList());
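
Because limit only truncates, a ranking needs a sort before it. The sketch below assumes that "top" means "longest word", which is an assumption made purely for illustration; any comparator could take its place. The class and method names are hypothetical.

```java
import static java.util.stream.Collectors.toList;

import java.util.Comparator;
import java.util.List;

public class TopN {

    // Sort by descending length, then keep the first n elements.
    static List<String> longest(List<String> words, int n) {
        return words.stream()
                .sorted(Comparator.comparingInt(String::length).reversed())
                .limit(n)
                .collect(toList());
    }

    public static void main(String[] args) {
        System.out.println(longest(List.of("aa", "b", "cccc", "ddd"), 2));
    }
}
```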

Map collection
Collecting into a map is more involved than collecting into a list or set, because every element must yield both a key and a value.
The simplest form of collecting into a map is

words.collect(toMap(keyMapper, valueMapper));

where keyMapper is a function that returns, for each element in the stream, the object to be used as the key in the map.

For example, if the class of the stream elements has a name and you want to use the name as the key in the map, you would pass the method reference MyClass::getName.

Analogously, valueMapper is the function that returns, for each element in the stream, the object to be used as the value in the map.
If the stream element itself should be the value, pass e -> e (or Function.identity()).

toMap simple form works only if each element maps to a unique key

If multiple elements of the stream map to the same key, the simple form of toMap throws an IllegalStateException.
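
A small sketch of the simple form, and of what happens on a duplicate key. Person is a hypothetical class standing in for MyClass above, and the names used are invented.

```java
import static java.util.stream.Collectors.toMap;

import java.util.List;
import java.util.Map;

public class ToMapBasics {

    // Hypothetical element type, used only for illustration.
    static class Person {
        private final String name;
        Person(String name) { this.name = name; }
        String getName() { return name; }
    }

    // keyMapper is Person::getName, valueMapper returns the element itself.
    static Map<String, Person> byName(List<Person> people) {
        return people.stream().collect(toMap(Person::getName, p -> p));
    }

    public static void main(String[] args) {
        System.out.println(byName(List.of(new Person("Ann"), new Person("Bob"))).keySet());
        try {
            // Two elements map to the key "Ann", so the simple form fails.
            byName(List.of(new Person("Ann"), new Person("Ann")));
        } catch (IllegalStateException e) {
            System.out.println("Duplicate key rejected: " + e.getMessage());
        }
    }
}
```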

toMap breaking collisions with a merge function
We have a stream of elements as the input. For each element, the keyMapper extracts the key and the valueMapper extracts the value. The third argument, a BinaryOperator<V>, is used in the event of a collision.

What collides? The value already present in the map under the key just computed by the keyMapper, and the value just produced by the valueMapper for the current stream element. The merge function decides which value ends up stored under that key.

The format would be

toMap(keyMapper, valueMapper, BinaryOperator<V>)

The BinaryOperator combines the two colliding values into a single value, which is then stored in the map under that key. The collision is thus resolved.

For example, when the values are numbers, a simple merge function could be addition or multiplication.
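
As an illustration of addition as the merge function, the sketch below counts word occurrences: every word maps to the value 1, and colliding values are summed. The class and method names are hypothetical.

```java
import static java.util.stream.Collectors.toMap;

import java.util.List;
import java.util.Map;

public class MergeBySum {

    // Each word contributes 1; Integer::sum merges the values on a key collision.
    static Map<String, Integer> frequency(List<String> words) {
        return words.stream().collect(toMap(w -> w, w -> 1, Integer::sum));
    }

    public static void main(String[] args) {
        System.out.println(frequency(List.of("a", "b", "a")));
    }
}
```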

toMap breaking collisions with chosen element associated with the key
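For example, BinaryOperator.maxBy (statically imported here, and not to be confused with Collectors.maxBy) keeps the album with the highest sales for each artist:

albums.collect(toMap(Album::artist, a -> a, maxBy(comparing(Album::sales))));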
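
A self-contained sketch of this idea follows. The Album class here is hypothetical: only the artist and sales accessors are implied by the text, and the title field and the sample data are added for illustration. Note that maxBy is the one from BinaryOperator.

```java
import static java.util.Comparator.comparing;
import static java.util.function.BinaryOperator.maxBy;
import static java.util.stream.Collectors.toMap;

import java.util.List;
import java.util.Map;

public class BestSellers {

    // Hypothetical Album type, used only for illustration.
    static class Album {
        private final String artist;
        private final String title;
        private final int sales;
        Album(String artist, String title, int sales) {
            this.artist = artist;
            this.title = title;
            this.sales = sales;
        }
        String artist() { return artist; }
        String title() { return title; }
        int sales() { return sales; }
    }

    // On a collision, keep whichever album has the higher sales.
    static Map<String, Album> bestByArtist(List<Album> albums) {
        return albums.stream()
                .collect(toMap(Album::artist, a -> a, maxBy(comparing(Album::sales))));
    }

    public static void main(String[] args) {
        List<Album> albums = List.of(
                new Album("X", "A", 10), new Album("X", "B", 30), new Album("Y", "C", 5));
        bestByArtist(albums).forEach((artist, album) ->
                System.out.println(artist + " -> " + album.title()));
    }
}
```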

toMap breaking collisions with last-write wins policy
That is, in case of a collision, overwrite with the latest value.

words.collect(toMap(keyMapper, w -> w, (w1, w2) -> w2));
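
A runnable sketch of the last-write-wins policy, assuming for illustration that the key is the first letter of each word (the keyMapper is left unspecified in the text). The class and method names are hypothetical.

```java
import static java.util.stream.Collectors.toMap;

import java.util.List;
import java.util.Map;

public class LastWriteWins {

    // Key: first letter of the word. On a collision the newer value replaces the older one.
    static Map<Character, String> lastByFirstLetter(List<String> words) {
        return words.stream()
                .collect(toMap(w -> w.charAt(0), w -> w, (older, newer) -> newer));
    }

    public static void main(String[] args) {
        System.out.println(lastByFirstLetter(List.of("apple", "avocado", "banana")));
    }
}
```

For a sequential stream over a list, "avocado" overwrites "apple" because it is encountered later.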

toMap with map factory
In addition to the merge function, toMap accepts a fourth parameter: a factory that creates the map.
This is useful when you need a particular type of map, such as an EnumMap or a TreeMap.
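
A brief sketch of the four-argument form, using TreeMap::new as the map factory so that the keys come out sorted. The word-frequency use case and all names are assumptions made for the example.

```java
import static java.util.stream.Collectors.toMap;

import java.util.List;
import java.util.TreeMap;

public class ToMapWithFactory {

    // Arguments: keyMapper, valueMapper, merge function, map factory.
    static TreeMap<String, Integer> sortedFrequency(List<String> words) {
        return words.stream()
                .collect(toMap(w -> w, w -> 1, Integer::sum, TreeMap::new));
    }

    public static void main(String[] args) {
        System.out.println(sortedFrequency(List.of("c", "a", "b", "a")));
    }
}
```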

groupingBy and Classifier function
groupingBy is another collector that produces a map, similar to toMap. While toMap takes an explicit keyMapper and valueMapper, giving the programmer control over what serves as the key and what serves as the value for each stream element, the basic form of groupingBy always uses a list of stream elements as the value. The key is computed by a classifier function, and the value is the list of all elements for which the classifier returns that key.

words.collect(groupingBy(word -> alphabetize(word)));
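
The text does not define alphabetize. The sketch below assumes it sorts the letters of a word, so that anagrams classify to the same key. That helper and the class name are hypothetical.

```java
import static java.util.stream.Collectors.groupingBy;

import java.util.Arrays;
import java.util.List;
import java.util.Map;

public class Anagrams {

    // Assumed helper: returns the letters of the word in sorted order.
    static String alphabetize(String word) {
        char[] letters = word.toCharArray();
        Arrays.sort(letters);
        return new String(letters);
    }

    // Words with the same sorted letters end up in the same list.
    static Map<String, List<String>> anagramGroups(List<String> words) {
        return words.stream().collect(groupingBy(word -> alphabetize(word)));
    }

    public static void main(String[] args) {
        System.out.println(anagramGroups(List.of("staple", "petals", "cat")));
    }
}
```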

groupingBy with classifier and downstream collector
If we want groupingBy to produce something other than a list as the value type of the map, the collector to apply to each group can be passed as the second argument.

downstream collector with toSet

words.collect(groupingBy(word->word.toUpperCase(), toSet()));

downstream collector with collectionFactory

words.collect(groupingBy(word -> word.toLowerCase(), toCollection(LinkedList::new)));
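
Both downstream variants can be sketched side by side. The class name, method names and sample words are invented for the example.

```java
import static java.util.stream.Collectors.groupingBy;
import static java.util.stream.Collectors.toCollection;
import static java.util.stream.Collectors.toSet;

import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class DownstreamDemo {

    // Each group is a Set, so repeated spellings collapse into one entry.
    static Map<String, Set<String>> spellingsByUpper(List<String> words) {
        return words.stream()
                .collect(groupingBy(word -> word.toUpperCase(), toSet()));
    }

    // Each group is a LinkedList, so every occurrence is kept.
    static Map<String, LinkedList<String>> occurrencesByLower(List<String> words) {
        return words.stream()
                .collect(groupingBy(word -> word.toLowerCase(), toCollection(LinkedList::new)));
    }

    public static void main(String[] args) {
        List<String> words = List.of("Java", "java", "Java", "Stream");
        System.out.println(spellingsByUpper(words));
        System.out.println(occurrencesByLower(words));
    }
}
```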

COUNT(*) functionality using groupingBy with classifier and counting() as the downstream collector
groupingBy with the counting() collector corresponds closely to COUNT(*) with GROUP BY in the database world.
The counting() collector counts the stream elements that classify to the same key. It produces a Long, so the value type of the map is Long.

Map<String, Long> wordCount = words.collect(groupingBy(String::toUpperCase, counting()));
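
Since counting() yields Long values, a map with Integer values needs a different downstream collector. One option, sketched here with hypothetical names, is summingInt with a constant 1 per element.

```java
import static java.util.stream.Collectors.counting;
import static java.util.stream.Collectors.groupingBy;
import static java.util.stream.Collectors.summingInt;

import java.util.List;
import java.util.Map;

public class WordCounts {

    // counting() produces Long counts.
    static Map<String, Long> countsAsLong(List<String> words) {
        return words.stream().collect(groupingBy(String::toUpperCase, counting()));
    }

    // summingInt adds 1 per element and produces Integer counts.
    static Map<String, Integer> countsAsInt(List<String> words) {
        return words.stream().collect(groupingBy(String::toUpperCase, summingInt(w -> 1)));
    }

    public static void main(String[] args) {
        List<String> words = List.of("Java", "java", "Stream");
        System.out.println(countsAsLong(words));
        System.out.println(countsAsInt(words));
    }
}
```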

groupingBy with classifier, map factory and downstream collector
This variant is applicable when you want to specify the type of map that gets created. Note the argument order: the map factory is the second argument and the downstream collector is the third.

Map<String, Long> wordCount = words.collect(groupingBy(String::toLowerCase, TreeMap::new, counting()));
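
A compact, runnable sketch of the three-argument form of groupingBy, whose parameters are the classifier, the map factory and the downstream collector, in that order. The names and sample data are made up.

```java
import static java.util.stream.Collectors.counting;
import static java.util.stream.Collectors.groupingBy;

import java.util.List;
import java.util.TreeMap;

public class SortedWordCounts {

    // Arguments: classifier, map factory, downstream collector.
    static TreeMap<String, Long> sortedCounts(List<String> words) {
        return words.stream()
                .collect(groupingBy(String::toLowerCase, TreeMap::new, counting()));
    }

    public static void main(String[] args) {
        System.out.println(sortedCounts(List.of("Stream", "java", "Java")));
    }
}
```

Because the result is a TreeMap, iterating over it visits the keys in sorted order.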

joining
joining is a specialized collector for concatenating a stream of strings. The three-argument form takes the delimiter as its first argument, and the prefix and suffix as its second and third arguments.

words.collect(joining(",", "(", ")"));
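
A minimal sketch of the three-argument form in a complete program. The class and method names are hypothetical.

```java
import static java.util.stream.Collectors.joining;

import java.util.List;

public class JoiningDemo {

    // Delimiter ",", prefix "(", suffix ")".
    static String parenthesized(List<String> words) {
        return words.stream().collect(joining(",", "(", ")"));
    }

    public static void main(String[] args) {
        System.out.println(parenthesized(List.of("a", "b", "c"))); // prints (a,b,c)
    }
}
```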
