This guide will walk you through using query expressions (a.k.a. language integrated queries) on data to filter, sort, and join with different data sets to produce new data.
Ballerina has first-class support for writing SQL-like queries to process data. Language-integrated queries can process any Ballerina iterable.
Set up the prerequisites
To run this guide, you need the following prerequisites:
- Ballerina 2202.0.0 (Swan Lake) or greater
- A text editor
Tip: Preferably, Visual Studio Code with the Ballerina extension installed.
- A command terminal
Create the Ballerina package
Ballerina uses packages to group code. You need to create a Ballerina package and write the business logic in it. In the terminal, execute the command below to create the Ballerina package for the implementation.
Info: For more information on Ballerina packages, see Organizing Ballerina code.
You view the output below.
This creates a directory named query_expressions
with the default module along with a sample code for the service as shown below.
Create the dataset
To keep things simple, an in-memory table is used to store the COVID-19 dataset. To create the dataset, replace the main.bal
file with the code below.
In this code:
- Each record of type
CovidEntry
in the table represents the COVID-19 data related to a particular country. - The
iso_code
is used to uniquely identify a country and other fields are self-explanatory.
Filter the data
Create the filterCountriesByCases
function
To define a function, which will filter out the records, which have values higher than 100,000 for the cases
field, add the code below to the main.bal
file.
In this code:
- The
filterCountriesByCases
function uses a query expression to iterate the records in thedataTable
table and filter only the records, which have more thannoOfCases
cases. Awhere
clause with a condition is used to filter. - All records, which satisfy the condition in the
where
clause will be selected.
Create the main
function
To call the filterCountriesByCases
function from inside the main
function, add the code below to the main.bal
file.
In this code,
- The
filterCountriesByCases
function is called andcovidTable
and10000000
are provided as parameters so that the function will filter the countries, which have more than 10000000 COVID-19 cases. - The next line prints the result of the function.
The complete code of the filtering function
Below is the complete code after adding the filtering function.
Run the package for filtering
In the terminal, navigate to the query_expressions
directory, and execute the command below to run the service package.
Info: The console should have warning logs related to the isolatedness of resources. It is a built-in service concurrency safety feature of Ballerina.
You view the output below.
Sort countries by COVID-19 deaths
Create the findCountriesByHighestNoOfDeaths
function
To define a new function to find the top three countries with the highest number of COVID-19 deaths, add the code below to the main.bal
file.
In this code,
- You use another query to sort and retrieve a limited number of records from the table.
- The
findCountriesByHighestNoOfDeaths
function uses a query expression to find the top three countries with the highest COVID-19 deaths. - The
order by
clause is used to sort the records in the table indescending
order and thelimit
clause to limit the number of output records of the query ton
. - The query produces an array of tuples of type
[string, decimal]
as the result. Each tuple contains the country name and the number of reported deaths. - The produced array is in descending order by the number of deaths.
Update the main
function for sorting
To call the findCountriesByHighestNoOfDeaths
function from within the main
function to find the top three countries by the number of deaths, add the code below to the main
function of the main.bal
file.
The complete code with the sorting function
Below is the complete code after adding the sorting function.
Run the package for sorting
In the terminal, navigate to the query_expressions
directory, and execute the command below to run the service package.
You view the output below.
Join with another data source
Create the findRecoveredPatientsOfCountries
function
Using query expressions, you can join two collections and produce a new collection. The join
operation is similar to the SQL join
operation. To join the covidTable
with a string array, which contains three countries, add the code below to the main.bal
file.
In this code:
- The
findRecoveredPatientsOfCountries
function uses a query expression to join thedataTable
table with an array of strings namedcountries
. The join condition is provided after theon
keyword. - For every record in the
dataTable
, all the elements in thecountries
array will be joined. - The output array of tuples will have the country and the number of recovered patients only if the condition after the
on
keyword is satisfied for that particular pair of table records and element of the array being joined.
Update the main
function for joining
To call the findRecoveredPatientsOfCountries
function at the end to get the number of recovered patients, add the code below to the main
function of the main.bal
file.
Info: You will get the number of recovered patients in the USA, India, and Afghanistan.
The complete code with the joining function
Below is the complete code after adding the joining function.
Run the package for joining
In the terminal, navigate to the query_expressions
directory, and execute the command below to run the service package.
You view the output below.
Find discrepancies in the dataset
Create the printErroneousData
function
This example shows how you can use the let
clause to maintain an intermediate state while iterating a collection using query expression and how to use that intermediate state for further processing. For example, in this dataset, the total number of reported cases should be equal to the sum of the number of deaths, recovered, and active. If they are not equal, an error should have occurred while the dataset is populated.
To define a function called printErroneousData
to find any erroneous records in the dataset, add the code below to the main.bal
file.
In this code:
- If there is any record in which the number of reported
cases
is not equal to the sum ofrecovered
,active
, anddeaths
, this function will print it. - Here, you use the
sum
variable to hold the results of intermediate calculations in the query expression.
Note: Even though this particular example uses a separate variable to demonstrate the usage of the
let
clause, you can do the calculations inline from within thewhere
clause also.
Call the printErroneousData
function
To call the findRecoveredPatientsOfCountries
function at the end to get the number of recovered patients, add the code below to the main
function of the main.bal
file.
The complete code
Below is the complete code after adding the finding function.
Run the package
In the terminal, navigate to the query_expressions
directory, and execute the command below to run the service package.
You view the output below.