What should I do when it is difficult to implement code with Java Stream?

Stream was introduced in Java 8. The library offers a rich, fluent Lambda syntax that makes set-oriented operations much easier to write. But Stream is still far from a professional tool for structured computing, and it has many shortcomings in that area.

When the members of a set are of simple data types (integer, floating-point, string or date), Stream makes set-oriented operations convenient. Below is an example of filtering, sorting and summing an integer array:

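//A stream can be operated on only once, so each computation builds its own stream
IntStream r1=IntStream.of(1,3,5,2,3,6).filter(m->m>2);
Stream<Integer> r2=IntStream.of(1,3,5,2,3,6).boxed().sorted();
int r3=IntStream.of(1,3,5,2,3,6).sum();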
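
One detail worth spelling out: a stream object can be operated on only once, so filtering, sorting and summing the same numbers needs three separate streams. The following is a minimal runnable sketch of that rule. The class name `StreamReuseSketch` and the `SOURCE` supplier are hypothetical helpers introduced here for illustration only.

```java
import java.util.List;
import java.util.function.Supplier;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class StreamReuseSketch {
    // Hypothetical helper: hands out a fresh stream over the same values on every call.
    static final Supplier<IntStream> SOURCE = () -> IntStream.of(1, 3, 5, 2, 3, 6);

    // Keeps the members greater than 2, in encounter order.
    static List<Integer> greaterThanTwo() {
        return SOURCE.get().filter(m -> m > 2).boxed().collect(Collectors.toList());
    }

    // Sorts the members in ascending order.
    static List<Integer> sorted() {
        return SOURCE.get().sorted().boxed().collect(Collectors.toList());
    }

    // Sums the members.
    static int sum() {
        return SOURCE.get().sum();
    }

    // Returns true if a second operation on the same stream object is rejected.
    static boolean reuseFails() {
        IntStream s = IntStream.of(1, 3, 5, 2, 3, 6);
        s.filter(m -> m > 2);   // first operation: links a new pipeline stage to s
        try {
            s.sum();            // second operation on the same stream object
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(greaterThanTwo()); // [3, 5, 3, 6]
        System.out.println(sorted());         // [1, 2, 3, 3, 5, 6]
        System.out.println(sum());            // 20
        System.out.println(reuseFails());     // true
    }
}
```

Wrapping the source in a `Supplier` is only one way to get a fresh stream per computation; collecting the values into a list first and calling `stream()` on it each time works just as well.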

However, the data object of structured computing is not a simple data type but a record (a Map, an entity or a record). Once the data object becomes a record, Stream is no longer so convenient. For example, to group orders by year and seller and sum the amounts:

Calendar cal=Calendar.getInstance();
Map<Object, DoubleSummaryStatistics> c=Orders.collect(Collectors.groupingBy(
  r->{
    //Build a composite key of the form "year_sellerId"
    cal.setTime(r.OrderDate);
    return cal.get(Calendar.YEAR)+"_"+r.SellerId;
  },
  Collectors.summarizingDouble(r->{
    return r.Amount;
  })
));
for(Object sellerid:c.keySet()){
  DoubleSummaryStatistics r=c.get(sellerid);
  //Split the composite key back into year and seller ID
  String year_sellerid[]=((String)sellerid).split("_");
  System.out.println("group  is (year):"+year_sellerid[0]+"\t (sellerid):"+year_sellerid[1]+"\t  sum is:"+r.getSum());
}
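
Much of the awkwardness above comes from packing two grouping fields into one string and splitting it again afterwards. A common alternative in plain Java is a small record used as a composite key. The sketch below is self-contained and assumes a hypothetical `Order` record with `java.time.LocalDate` dates; the names `Order`, `YearSeller` and `totalByYearAndSeller` are illustrative and do not come from the original data model.

```java
import java.time.LocalDate;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class GroupByYearSellerSketch {
    // Hypothetical stand-in for one row of the Orders data; field names are assumptions.
    record Order(int orderId, String client, int sellerId, double amount, LocalDate orderDate) {}

    // Composite grouping key. A record supplies equals/hashCode, so it can be used
    // as a map key directly and no string concatenation or splitting is needed.
    record YearSeller(int year, int sellerId) {}

    // Groups orders by (year, seller) and sums the amount within each group.
    static Map<YearSeller, Double> totalByYearAndSeller(List<Order> orders) {
        return orders.stream().collect(Collectors.groupingBy(
                o -> new YearSeller(o.orderDate().getYear(), o.sellerId()),
                Collectors.summingDouble(Order::amount)));
    }

    public static void main(String[] args) {
        List<Order> orders = List.of(
                new Order(1, "ACME", 7, 100.0, LocalDate.of(2020, 3, 1)),
                new Order(2, "ACME", 7, 50.0, LocalDate.of(2020, 9, 15)),
                new Order(3, "Initech", 7, 30.0, LocalDate.of(2021, 1, 5)),
                new Order(4, "Initech", 8, 20.0, LocalDate.of(2020, 6, 30)));
        totalByYearAndSeller(orders).forEach((key, total) ->
                System.out.println("year=" + key.year() + "\tsellerId=" + key.sellerId() + "\tsum=" + total));
    }
}
```

This removes the string handling, but note the cost: one more pre-declared type, written only to serve as the key of a single grouping.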

Stream does not directly support join operations. For instance, to perform an inner join between the Orders table and the Employee table, group the result by Employee.Dept and sum Orders.Amount:

Map<Integer, Employee> EIds = Employees.collect(Collectors.toMap(Employee::EId, Function.identity()));
//Create a new OrderRelation class, where SellerId holds a single value that points to the corresponding Employee object
record OrderRelation(int OrderID, String Client, Employee SellerId, double Amount, Date OrderDate){}
Stream<OrderRelation> ORS=Orders.map(r -> {
  Employee e=EIds.get(r.SellerId);
  OrderRelation or=new OrderRelation(r.OrderID,r.Client,e,r.Amount,r.OrderDate);
  return or;
}).filter(e->e.SellerId!=null);  //Inner join: drop orders without a matching employee

Map<String, DoubleSummaryStatistics> c=ORS.collect(Collectors.groupingBy(r->r.SellerId.Dept,Collectors.summarizingDouble(r->r.Amount)));
for(String dept:c.keySet()){
  DoubleSummaryStatistics r=c.get(dept);
  System.out.println("group(dept):"+dept+" sum(Amount):"+r.getSum());
}

The hardcoded inner join is long and complex. Left joins and full outer joins must be hardcoded too, each with different and more complicated logic, which is a challenge even for Java programmers.
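
To illustrate how the logic changes, here is a minimal sketch of the left-join variant of the same Dept summary: orders without a matching employee are kept rather than filtered out. Because `Collectors.groupingBy` rejects null keys, unmatched orders have to be grouped under a placeholder. The records `Employee` and `Order`, the constant `NO_DEPT` and the method name are all hypothetical names chosen for this example.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class LeftJoinSketch {
    // Hypothetical, trimmed-down versions of the two tables.
    record Employee(int eId, String dept) {}
    record Order(int orderId, int sellerId, double amount) {}

    // Placeholder group for orders whose seller is not in the employee list.
    static final String NO_DEPT = "(no employee)";

    // Left join of orders to employees on sellerId = eId, then sum of amount per department.
    static Map<String, Double> amountByDept(List<Order> orders, List<Employee> employees) {
        // Index employees by ID so each order can look up its seller in constant time.
        Map<Integer, Employee> byId = employees.stream()
                .collect(Collectors.toMap(Employee::eId, Function.identity()));
        return orders.stream().collect(Collectors.groupingBy(
                o -> {
                    Employee e = byId.get(o.sellerId());
                    // Keep unmatched orders instead of dropping them (the left-join part).
                    return e == null ? NO_DEPT : e.dept();
                },
                Collectors.summingDouble(Order::amount)));
    }

    public static void main(String[] args) {
        List<Employee> employees = List.of(new Employee(1, "Sales"), new Employee(2, "R&D"));
        List<Order> orders = List.of(
                new Order(1, 1, 100.0), new Order(2, 2, 50.0),
                new Order(3, 9, 25.0), new Order(4, 1, 20.0));
        amountByDept(orders, employees).forEach((dept, sum) ->
                System.out.println("group(dept):" + dept + " sum(Amount):" + sum));
    }
}
```

A full outer join would additionally need a second pass to pick up the employees that no order refers to, which is where the hand-written logic grows further.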

Before Stream, set-oriented operations were cumbersome in Java. Stream added support for structured data computation to the language, including basic set-oriented operations and a Lambda-friendly design. Yet Stream still has to work with ordinary Java data types because it lacks professional structured data objects, so the improvements are only superficial.

In fact, no class library that implements computations within Java is truly professional, because the solid low-level support is missing. The fundamental reason is that Java lacks professional structured data objects. A structured computation returns a result set whose structure varies with the computing process, and it generates intermediate results with dynamic structures, so it is almost impossible to pre-define the structures. Yet Java is a strongly typed language that requires the structures of data objects to be pre-defined (otherwise only hard-to-manipulate types such as Map can be used), which makes coding rigid and inflexible and greatly restricts what Lambda syntax can do.

An interpreted language can simplify parameter definitions by letting a function specify whether a parameter expression is parsed as a value or as a function. Java is a compiled language and cannot distinguish parameters in this way; it can only implement an anonymous function (using Lambda syntax) by designing a complicated and hard-to-understand interface, which is difficult even for SQL programmers. Structured data computations can be considerably simplified by skipping the data object and referencing fields directly (as in “UnitPrice*Quantity”), but Java cannot support this concise syntax because it lacks professional structured data objects. It thus produces lengthy and non-intuitive code (like “x.UnitPrice*x.Quantity”).
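
The trade-off can be made concrete with a single derived column. In the sketch below, adding an amount field to each row requires either a second pre-declared type or a fall back to untyped `Map` rows whose values must be cast when read back. The records and method names are hypothetical and exist only for this illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class DerivedColumnSketch {
    // Hypothetical source row.
    record OrderLine(String product, double unitPrice, int quantity) {}

    // Option 1: a second, pre-declared type whose only purpose is to carry one extra column.
    record OrderLineWithAmount(String product, double unitPrice, int quantity, double amount) {}

    static List<OrderLineWithAmount> typed(List<OrderLine> lines) {
        return lines.stream()
                // Every field reference goes through the lambda parameter x.
                .map(x -> new OrderLineWithAmount(x.product(), x.unitPrice(), x.quantity(),
                        x.unitPrice() * x.quantity()))
                .collect(Collectors.toList());
    }

    // Option 2: dynamic structure via Map rows; no new type, but values are plain Objects.
    static List<Map<String, Object>> dynamic(List<OrderLine> lines) {
        return lines.stream().map(x -> {
            Map<String, Object> row = new LinkedHashMap<>();
            row.put("product", x.product());
            row.put("unitPrice", x.unitPrice());
            row.put("quantity", x.quantity());
            row.put("amount", x.unitPrice() * x.quantity());
            return row;
        }).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<OrderLine> lines = List.of(new OrderLine("pen", 2.5, 4), new OrderLine("ink", 10.0, 1));
        System.out.println(typed(lines));
        // Reading a value back from a Map row needs a cast.
        double firstAmount = (Double) dynamic(lines).get(0).get("amount");
        System.out.println(firstAmount);
    }
}
```

Neither option is wrong, but each new computed column or intermediate result repeats the same choice, which is the rigidity the paragraphs above describe.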

Stream is far from professional at computing structured data because it lacks dedicated structured data objects, while SQL is professional enough but relies heavily on databases. Each has its advantages and disadvantages. Sometimes we need both the professional structured computing syntax of SQL and the out-of-database computing capability of Stream. What should we do in this situation?

Solution: esProc – the professional computational package for Java

esProc is a class library dedicated to Java-based calculations that aims to simplify Java code. SPL is the scripting language built on the esProc computing package. It can be deployed together with a Java program and can be thought of as a stored procedure outside the database, and it is invoked the same way a stored procedure is called from a Java program: the script is submitted for execution through the JDBC interface, performs structured computing step by step, and returns a ResultSet object.

Data set filtering, sorting, grouping and summarizing, joining, etc., are simple to write and easy to read in SPL. For example, to find the classes with an average English score below 70:

=T("E:/txt/Students_scores.txt")
=A1.groups(CLASS;avg(English):avg_En)    
=A2.select(avg_En<70)

This block of code can be debugged or executed in esProc IDE, or stored as a script file (like condition.dfx) for invocation from a Java program through the JDBC interface. Below is the code for invocation:

package Test;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
public class test1 {
  public static void main(String[] args) throws Exception {
    //Load the esProc JDBC driver and open a local connection
    Class.forName("com.esproc.jdbc.InternalDriver");
    Connection connection = DriverManager.getConnection("jdbc:esproc:local://");
    Statement statement = connection.createStatement();
    //Call the script file the same way as a stored procedure
    ResultSet result = statement.executeQuery("call condition.dfx");
    printResult(result);
    if(connection != null) connection.close();
  }

…

}

This is similar to calling a stored procedure. SPL also supports the SQL-like way of embedding the code directly into a Java program without the need of storing it as a script file. Below is the code for embedding:

ResultSet result = statement.executeQuery("=file(\"D:\\sOrder.csv\").groups(CLASS;avg(English):avg_En).select(avg_En<70)");
…

For details on integration with Java programs, please refer to How to Call an SPL Script in Java.

Use SPL to implement association (join) calculations:

For example, sales orders and products are stored in two text files, and we want to calculate the sales amount of each order. The script below relies on the ProductID and Quantity fields of the sales file and on the ID, Name and Price fields of the product file:

=T("e:/orders/sales.csv")
=T("e:/orders/product.csv").keys(ID)    
=A1.join(ProductID,A2,Name,Price)    
=A3.derive(Quantity*Price:amount)

SPL provides a complete method of querying data with SQL:

For example, State, Department, and Employee data are stored in 3 text files, and we want to query the employees in New York state whose manager is in California:

$select e.NAME as ENAME from E:/txt/EMPLOYEE.txt as e join E:/txt/DEPARTMENT.txt as d on e.DEPT=d.NAME join E:/txt/EMPLOYEE.txt as emp on d.MANAGER=emp.EID where e.STATE='New York' and emp.STATE='California'

Using SPL can greatly simplify the calculation of structured data in Java programs. Examples are summarized as follows:

Loop operations

Accessing members of data set by sequence numbers

Locate operations on ordered sets

Alignment operations between ordered sets

TopN operations

Existence checking

Membership test

Unconventional aggregation

Alignment grouping

Select operation

More calculation examples: Use SPL in applications