Mashup Programming

Introduction

Mashup and Use

Mashup or mashing-up is when a website makes requests to multiple services to provide content requested by a user.

It is a relatively new technique that enables a website to leverage services from other sites and data providers. Mashing-up can be used to customize a service, e.g. simplifying the interaction. It can also be used to enhance a service, e.g. adding additional features.

Request Flow

The typical request flow is that the user makes a request to the host website; the host website responds by making multiple requests to remote services, packages the responses from the remote services into a map, and finally passes the map to the view to respond to the user’s request.

In this request flow there are two types of requests and responses. There is the request made by the user to the host website and the requests made by the host website to remote services. Likewise there are two types of responses: the responses made by the remote services and the response made by the host website to the user. These requests and responses are distinct and illustrated below.

 

Request flow:  User -------->> Host website ------->> Remote service
Response flow: User <<-------- Host website <<------- Remote service

For a simple request made by the user, e.g. clicking on a link, the request is encoded in the URL and Grails routes the request to a controller and its action. In a more complex case, e.g. clicking on a submit button, Grails packages the form data into the “request” object and sends it to the controller’s action.

http://docs.grails.org/latest/ref/Servlet%20API/request.html

In Grails, or any MVC framework, requests are routed to and handled by the controller. The controller may need to access the domain or other services to fulfill the request. After accessing the services, the controller responds to the request by packaging the data into a map and sending the map to the view. To be consistent with this design pattern, the controller’s action should make the requests to the remote services and package the data, i.e. the responses from the remote services are packaged into a map for the view.
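
For example, a controller action following this pattern might look like the sketch below (a minimal sketch; the controller name, action, and map key are hypothetical):

class ErosionController {

    def show() {
        def remoteData = [:]        // responses from remote services are collected here
        // ... make requests to remote services and fill remoteData ...
        [analysis: remoteData]      // the map packaged for the view
    }
}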

The request object in Grails is named in reference to the host website, meaning that it is a request made of the website. It is not the object the website uses to make a request to a remote service. A different object must be used by the controller to make requests to remote services. In Groovy, that object is HTTPBuilder.

https://github.com/jgritman/httpbuilder/wiki

In other languages and environments, similar functionality goes by the name curl.

Curl is a funny name, but it is really cURL, short for “client URL.” Groovy has a URL class, but it is not nearly as powerful as curl.

http://docs.groovy-lang.org/latest/html/groovy-jdk/java/net/URL.html

HTTPBuilder

Using HTTPBuilder is a two step process. First you make an HTTPBuilder object using the domain name of the site you wish to make the request to; then, using the HTTPBuilder object, you make the request, specifying the method and content-type. The third argument to the request is a closure for handling the different responses, e.g. 200 or 404.

HTTPBuilder can handle several request methods; we will study two of them, GET and POST, starting with the easier of the two.

GET Method

This section references the HTTPBuilder wiki at

https://github.com/jgritman/httpbuilder/wiki/GET-Examples

The code for the first GET example:

/**
 * Grab gets the http-builder jar from the maven site
 *
 * If you get the error
 *
 *       error groovyc cannot @Grab without Ivy
 *
 * then
 *      1. Download the binary for Ivy at
 *            http://ant.apache.org/ivy/
 *      2. Unzip and extract the jar
 *      3. Put it in a nearby directory
 *      4. Add it as a module to the project by
 *             i. File -> Project structure -> Modules -> Dependencies
 *             ii. Add by clicking on the "+" on the right, select JARs
 *             iii. Navigate to where you put the Ivy jar
 *      Reference
 *        https://intellij-support.jetbrains.com/hc/en-us/community/posts/206913575-Installing-Ivy-plugin-
 */
@Grab(group='org.codehaus.groovy.modules.http-builder', module='http-builder', version='0.7')

import groovyx.net.http.HTTPBuilder
import static groovyx.net.http.Method.GET
import static groovyx.net.http.ContentType.TEXT

def http = new HTTPBuilder("http://example.org")

/* This works */
http.request(GET, TEXT ){ req ->
    response.success = { resp, reader ->
        println "success"
        println "My response handler got response: ${resp.statusLine}"
        println "Response length: ${resp.headers.'Content-Length'}"
        System.out << reader
    }
    response.'404' = {println "Not Found"}
}

HTTPBuilder is currently not part of standard Groovy, so you have to get the HTTPBuilder API from the Maven repository. When I first tried to use Grab, my program got an error:

error groovyc cannot @Grab without Ivy

I followed the instructions for adding Ivy to the build for the script at

https://intellij-support.jetbrains.com/hc/en-us/community/posts/206913575-Installing-Ivy-plugin-

I installed the jar in a directory called “jars” in my workspace and then added it to the module by

  1. File -> Project structure -> Modules -> Dependencies
  2. Add by clicking on the “+” on the right, select JARs
  3. Navigate to where you put the Ivy jar

Grab then works for all groovy scripts in the workspace.

On “success”, the script just outputs the response text, i.e. the HTML source. Run the code from your own development machine. Try other websites.

HTTPBuilder has convenience methods for both the GET and POST request methods. The convenience methods return the default response. For success, the get method returns the HTML as a parsed DOM. Your program can then navigate the DOM and extract the text and attributes from the nodes.

Below is example code using the convenience get method and navigating the DOM.

/**
 * Created by Robert Pastel on 1/8/2017.
 */
// Grab HTTPBuilder component from maven repository
@Grab(group='org.codehaus.groovy.modules.http-builder', module='http-builder', version='0.7')

// import of HttpBuilder related stuff
import groovyx.net.http.HTTPBuilder

def http = new HTTPBuilder("http://example.org")

html = http.get(path : '')
println "html: "
println html

// Now try traversing the DOM
println "Extract text from nodes"
println "H1: " + html.BODY.DIV.H1
println "Anchor: " + html.BODY.DIV.P.A
println ""

// Extract attributes
println "Extract attributes from nodes"
println "href: " + html.BODY.DIV.P.A.@href
println ""

//Extract the name of a tag
println "Extract names of tags"
println "html name: " + html.name()
println "html.BODY.DIV name: " + html.BODY.DIV.name()
println ""

// Depth first search of nodes
println "Find all paragraph elements"
html."**".findAll {it.name() == "P"}.each{
    println ""
    println it
}

 

In the example code, the “html” object is the parsed DOM. It is a GPath object, actually a GPathResult object:

http://www.groovy-lang.org/processing-xml.html#_gpath

Printing “html” prints the entire GPath object, but only the content of the tags, not the tags themselves or their attributes.

You can traverse the DOM by designating the tag sequence down the branch. For example:

html.BODY.DIV.H1

will navigate into the html tag, then the body tag, to the first div, and then the h1 tag. While:

html.BODY.DIV.P.A

will navigate from html to body, to the first div, the first paragraph, and finally the anchor tag. To get the value of a tag attribute, use the “@” operator to navigate into the tag. For example

html.BODY.DIV.P.A.@href

retrieves the value of the href attribute of the anchor tag. To get the name of the tag, use the name() methods. For example

html.name()

returns “HTML.” You need the parentheses on the name() method, otherwise GPath will think it is looking for the next tag. This may not seem very useful, since you already know the name of the tag, but you can also make breadth first and depth first searches in a GPath object, and then you will want the name of the tag. Breadth first and depth first searches have shorthand notations, “*” for breadth first search and “**” for depth first.

http://www.groovy-lang.org/processing-xml.html#_speed_things_up_with_breadthfirst_and_depthfirst

For example, we can use the depth first search with the findAll method to find all the paragraphs:

html."**".findAll {it.name() == "P"}.each{...}
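
Similarly, a depth first search can collect every anchor tag and report its name and href. A small sketch, reusing the html object from the example above:

html."**".findAll { it.name() == "A" }.each { a ->
    println "${a.name()}: href = ${a.@href}"
}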

Copy the above code, load another webpage and navigate its DOM.

POST Method

Requesting by a POST method requires a post body. Typically this is the form data sent when a user clicks submit.

HTTPBuilder has a “post” convenience method that has a “body” argument. The body argument is a map or JSON object, which HTTPBuilder will encode.

The example code below uses the “restmirror.appspot.com” web site to send a post. The “restmirror” site just mirrors the post back.

/**
 * Created by Robert Pastel on 11/12/2016.
 */
// Grab HTTPBuilder component from maven repository
import groovy.json.JsonSlurper
@Grab(group='org.codehaus.groovy.modules.http-builder', module='http-builder', version='0.7')

import groovyx.net.http.HTTPBuilder
import static groovyx.net.http.ContentType.*

def http = new HTTPBuilder( 'http://restmirror.appspot.com/' )
def postBody = [name: [first: 'robert', last: 'pastel'], title: 'programmer'] // will be url-encoded

def html = http.post(path:'/', body: postBody, requestContentType: JSON)

println "*** html ****"
println html
println""

// Use as a JSON
// Unfortunately the response is not really json,
// so we make a JSON
def jsonSlurper = new JsonSlurper()
def json = jsonSlurper.parseText(html.toString())
println "json.name.first = " + json.name.first

Unfortunately the response from “restmirror” is not really a JSON, but a node from a DOM. We have to convert the response to a String and then use JsonSlurper to convert the response to a JSON.
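
If you know the response will be text, another option is to override the response content type in the post call so that HTTPBuilder hands back the raw text instead of a parsed DOM. A sketch, reusing the http and postBody objects from the example above; my understanding is that with ContentType.TEXT the convenience method returns a Reader:

def reader = http.post(path: '/', body: postBody,
                       requestContentType: JSON, contentType: TEXT)
def json2 = new JsonSlurper().parseText(reader.text)
println "json2.name.first = " + json2.name.first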

Mashing Up an Old Website: Disturbed WEPP

Explanation of Current Website

This is an old style mashup example. It demonstrates programmatically submitting a website form and parsing a response that is a web page with text. It requires a thorough analysis of the website: what it is doing and what it returns to the user.

Visit

https://forest.moscowfsl.wsu.edu/cgi-bin/fswepp/wd/weppdist.pl

The website is for Forest Service personnel to estimate the erosion at slopes. The basic function of the webpage is a form formatted as a table that the user submits by clicking the “Run WEPP” button. When the user clicks the “Run WEPP” button, JavaScript collects the parameters entered in the table and sends the parameters to a Perl script which in turn feeds the parameters to a model called WEPP.

In order to make the estimate of erosion, the WEPP model needs:

  • The number of years for the estimate. See the “Years to simulate” field.
  • The climate model for the region. See the selection field under “Climate.”
  • The soil texture. See the selection field under “Soil Texture”
  • Parameters for the slope, which is composed of two parts. See table rows “Upper” and “Lower.” The slope parameters include:
    • The vegetation or treatment. See the selection field under “Vegetation/Treatment”.   
    • The gradient of the slope. See the “Gradient” field.
    • The length of the slope. See the “Horizontal Length” field.
    • The percent coverage of the slope, which is complicated. Click the “?” adjacent to the “Cover” field.
    • The percent of rocks on the slope surface. See the “Rock” field.

Try running the model for different values. Be sure to try several different values for “Years to simulate.” The response to clicking the “Run WEPP” button is another webpage with some of the model output in several tables. In addition, the results webpage has links at the bottom. The first five just show the input parameters that the user gave to the model. The last link, “WEPP results,” shows the complete output from the WEPP model. Click on the link. You’ll see that it is plain text with many tables. The four tables in the results web page are derived from the tables in the complete output from WEPP.

Client Goals and Basic Implementation

In a sense, this website simplifies the use of a complex model developed by scientists. The website enables Forest Service personnel to use the model by simplifying the input and output of the script. Our scientist/client wants citizens to be able to use the website. Our client understands that the current “Disturbed WEPP” website is too complex for untrained citizens to use. Our client proposes several changes to simplify the website:

  • Making the slope have only one gradient instead of “upper” and “lower.”
  • The client has a geodatabase, so that it can determine the soil texture, rock coverage and possibly the slope given the latitude and longitude location of the slope.
  • The result can be only one table, the “Return period analysis” table, and a single parameter from the complete results, “AVERAGE ANNUAL SEDIMENT LEAVING PROFILE” in tons per hectare (t/ha).

We can implement the client’s desires by mashing up. We can make our own website with a form that inputs only the parameters we need. Our website can make a remote service request to the client’s geodatabase and then call the script that the “Disturbed WEPP” website calls, with a body that our program constructs. When we get the response, we parse the webpage, extract the results that we want, and display them on our website.

Analysis of Current Website

Request Analysis

We know our goal and basically how to implement it; now is the time to get to work. First we must study how the website works. Run the website by clicking on the “Run WEPP” button. On the results web page, open the developer tools. In Chrome, right click the page and select “Inspect.” Make sure the “Network” tab is showing and displays the list of requests made by the web page. You may have to click “Network” and refresh the page. At the top of the list of network requests should be “wd.pl”, run as a POST method. This is the script that returns the results for the original request of the webpage.

Click “wd.pl” (it is a link), and you should see an accordion with sections “General”, “Response Headers”, “Request Headers”, and “Form Data.” We are interested in the “Form Data”, so click the arrow adjacent to “Form Data” to see the details. What you see is the body of the post. If it is not well formatted, click “view parsed.” If it is well formatted, you can view the original format by clicking “view source.” We want to look at the well formatted form data. It is the map (body) of the post request sent to the script that runs the WEPP model. What the keys correspond to in the form input should be obvious. If not, play with the form website, inputting values that you can recognize in this list.

Web Page Structure

Now let us analyze the results page structure. Right click on the results web page and select “View page source.” In the HEAD of the source are several long scripts. They are basically what is run when the user clicks the links at the bottom of the page. We’ll ignore them for the time being, but will inspect them later.

Scan down to the BODY of the source. It is at the very bottom. Now search for the table of interest, “Return period analysis.” Recall that we use HTTPBuilder to make the request, which will give us a GPath to find our table in the DOM. What is the path to the table? Working backwards through the tags, you discover that it is

html.BODY.FONT.BLOCKQUOTE.CENTER.P.TABLE

There are several tables in the web page, so this path might not have been unique. But luckily for us, it is a unique path: it is the only table within a paragraph tag, <p> … </p>, after the center tag. If it were not, we would have to search for the CENTER tags and check that the H3 tag had content beginning with “Return period analysis.” We could then grab the table within that center tag, as sketched below.
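
A sketch of that fallback search, in case the path had not been unique (hypothetical, since the simple GPath works for this page):

// Find the CENTER whose H3 heading begins with "Return period analysis",
// then grab the table inside its paragraph
def center = html."**".findAll { it.name() == "CENTER" }.find {
    it.H3.toString().startsWith("Return period analysis")
}
def erodeTable = center.P.TABLE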

Now go to the very bottom of the BODY where the links at the bottom of the page are defined. Note that the last one is for “WEPP results”

<a href="javascript:void(showextendedoutput())">WEPP results</a>

The href defines the JavaScript function to run. The JavaScript “void” operator results in the web browser showing the results of the JavaScript on a new page. It is a trick.

Search for the JavaScript function, showextendedoutput; it is near the top of the page. In fact, it is the bulk of the source. You’ll see that the function is primarily composed of lines like

filewindow.document.writeln("...")

Each line just writes a line to the window. Although it is a very long script, its structure is basically very simple. Recall that our client wants the value of “AVERAGE ANNUAL SEDIMENT LEAVING PROFILE” in units of tons per hectare. Search for that section in the JavaScript function. Note that the units are designated “t/ha”. Search the page source for “t/ha”. You’ll notice that it is the only occurrence in the whole page source. We got lucky again. This will make coding easy, as you will see.

Coding the Groovy Script

Now it is time to code the script that will request results from Disturbed WEPP website and parse the response. We know everything we need:

  • Domain: https://forest.moscowfsl.wsu.edu
  • Path: /cgi-bin/fswepp/wd/wd.pl
  • Body for the POST from the Form Data
  • GPath to our table: html.BODY.FONT.BLOCKQUOTE.CENTER.P.TABLE
  • How to find the tons per hectare value: searching on “t/ha”

The rest is hard and tedious work of coding.

/**
 * Created by Robert Pastel on 11/12/2016.
 */
/**
 * Grab gets the http-builder jar from the maven site
 *
 * If you get the error
 *
 *       error groovyc cannot @Grab without Ivy
 *
 * then
 *      1. Download the binary for Ivy at
 *            http://ant.apache.org/ivy/
 *      2. Unzip and extract the jar
 *      3. Put it in a nearby directory
 *      4. Add it as a module to the project by
 *             i. File -> Project structure -> Modules -> Dependencies
 *             ii. Add by clicking on the "+" on the right, select JARs
 *             iii. Navigate to where you put the Ivy jar
 *      Reference
 *        https://intellij-support.jetbrains.com/hc/en-us/community/posts/206913575-Installing-Ivy-plugin-
 */
@Grab(group='org.codehaus.groovy.modules.http-builder', module='http-builder', version='0.7')

import groovyx.net.http.HTTPBuilder
import static groovyx.net.http.ContentType.*
import static groovyx.net.http.Method.*

/**
 * Make the post request
 *
 * Note that the HTTPBulider should reference only the site
 * and  the path specifies the path to script.
 * This avoids 403 (Forbidden)
 *
 * Documentation for HTTPBuilder is at
 * https://github.com/jgritman/httpbuilder/wiki
 */
def http = new HTTPBuilder( 'https://forest.moscowfsl.wsu.edu' )
// You can find the post variables and value, by submitting a request from the website
// then inspect -> Networks -> click wd.pl -> Form Data.
// You can even copy and paste from the inspector to your script.
def postBody = [
        me:'' ,
        units:'ft',
        description:'' ,
        climyears:'10',
        Climate:'../climates/al010831',
        achtung:'WEPP run',
        SoilType:'clay',
        UpSlopeType:'OldForest',
        ofe1_top_slope:'0',
        ofe1_length:'50',
        ofe1_pcover:'100',
        ofe1_rock:'20',
        ofe1_mid_slope:'30',
        LowSlopeType:'OldForest',
        ofe2_top_slope:'30',
        ofe2_length:'50',
        ofe2_pcover:'100',
        ofe2_rock:'20',
        ofe2_bot_slope:'5',
        climate_name:'BIRMINGHAM WB AP AL',
        Units:'m',
        actionw:'Run WEPP'
]
// Make the post request and get back the GPath for the html.
// Note the path to the script. It is necessary to split up the URI this way.
def html = http.post(path: '/cgi-bin/fswepp/wd/wd.pl', body: postBody)



// Now get the table of interest results using GPATH
// See http://groovy-lang.org/processing-xml.html#_gpath
def erodeTable = html.BODY.FONT.BLOCKQUOTE.CENTER.P.TABLE
// Note that sometimes the GPath hierarchy is broken,
// but you can always make the depth first searches
/**
 * Map of Maps
 *
 * We want a map like this:
 * analysis[period][variable] -> value
 *
 * We also want to make a Map from variables to units
 *
 * Note that this should work for any value of "years to simulate".
 */
// create the analysis Map
def analysis = [:]

// Gather the keys and make the units map
def i = 0 // counts table rows, so we can do something special for the first table row
def periods = []
def variables = []
def units = [:]
// Note that erodeTable is a GPathResult, so we can search it.
erodeTable."**".findAll{it.name() == "TR"}.each{tr ->
    // The first table row lists the variables with their units
    if ( i == 0){
        for(j = 0; j < tr.TH.size(); j++){
            // We want to skip the first header
            if(j > 0){
                String variable_unit = tr.TH[j]
                // Some regular expression to extract variable names and units
                // See http://groovy-lang.org/operators.html#_regular_expression_operators
                // and http://www.regular-expressions.info/
                // Variable names have only alphabetic characters and units are inside parentheses
                def m = variable_unit =~ /([A-Za-z]+)\((.+)\)/
                if (m){
                    // Note that the capture groups are Strings
                    variables[j-1] = m[0][1]
                    units.put(variables[j-1], m[0][2])
                }
            }
        }
    }
    // Table rows greater than 0 contain the periods
    else if(i > 0){
        // We will want to use the table header as a key to a map,
        // so we MUST use toString method so that the hashing works properly.
        // Java hashes Objects differently from Strings
        periods[i-1] = tr.TH.toString()
    }
    i++
}

//println periods
//println variables
//println units

// Now construct the analysis table from the bottom up
i = 0 // for tracking the periods and table row
erodeTable."**".findAll{it.name() == "TR"}.each{ tr ->
    // construct the period_variable map
    if (i > 0) { // skip the first row because it is a header row
        def j = 0; // for tracking the variables
        def period_variable = [:]
        tr."**".findAll { it.name() == "TD" }.each { td ->
            period_variable.put(variables[j], td)
            j++
        }
        analysis.put(periods[i-1], period_variable) // use i-1 because we skip the first table row
    }
    i++
}
// now we can access the analysis table like this
String period = "Average"
String variable = "Runoff"
println "The ${period} ${variable} is ${analysis[period][variable]} ${units[variable]}"

/**
 *  Find the average annual sediment leaving profile in units t/ha
 *  It is in the function showextendedoutput() that is invoked by the "WEPP results" link
 *
 *  Note that there is only one occurrence of "t/ha" in the entire response.
 *
 */
def scriptNode = html.HEAD.SCRIPT
// extract the SCRIPT
String script = scriptNode.toString() // Make sure it is a string
// create the variables to save captures from matches
def leavingLine
def leavingValues = []
def leavingUnits = ["t/ha", "ha"]

// Use regular expressions to get the line. Note that quotes delineate the line
//def m = script =~ /".+t\/ha.+"/  // this works, but the slash must be escaped
// We can also use Groovy string interpolation, and then we need not escape the /
def m = script =~ /".+${leavingUnits[0]}.+"/
if(m){
    // Found it, so clean up the leavingLine. Note the use of regular expressions in the replaceAll
    leavingLine = m[0]
    leavingLine = leavingLine.replaceAll(/ +/, ' ') // remove extra spaces
    leavingLine = leavingLine.replaceAll(/" /,'') // remove leading quote
    leavingLine = leavingLine.replaceAll(/"/,'') // remove remaining quotes

    // Now extract the values
    m = leavingLine =~ /([\d\.]+) ${leavingUnits[0]} \([\w\s]+([\d\.]+) ${leavingUnits[1]}/
    if(m){
        leavingValues[0] = m[0][1]
        leavingValues[1] = m[0][2]
    }
}
// Now we can use our regular expression captures like this
//println leavingLine
println "The average annual sediment leaving is ${leavingValues[0]} ${leavingUnits[0]}"

You can also download the code from the resource/mashup/ directory.

The first part of the code, making the HTTPBuilder object and making the POST request, should be familiar. The only trick is to be sure to separate the domain from the path. If you put the full URL in the argument to the HTTPBuilder constructor, i.e.

https://forest.moscowfsl.wsu.edu/cgi-bin/fswepp/wd/wd.pl

You will get a “forbidden” response. I believe that the cgi-bin/ directory is protected so that only requests from the https://forest.moscowfsl.wsu.edu domain have access.

Grabbing and Analyzing the Table

Then the code grabs the table of interest, which is the erodeTable:

def erodeTable = html.BODY.FONT.BLOCKQUOTE.CENTER.P.TABLE

Now the code constructs a map of the table values so that it can be passed to our own view. The map is the “analysis” object. It will be a map of maps.

analysis[period][variable] -> value

For each period (row in the table) there is a map of values with keys: Precipitation, Runoff, Erosion, Sediment. If you played with the website, in particular tried different years to simulate, you will have discovered that the number of rows and the years for the return periods differ. We’ll need to parse these years, called periods in the code, and use them as keys to the maps that represent the rows in the table. To do all this, we need to use Regular Expressions. Hopefully you paid attention in your formal methods course.

Brief Introduction to Regular Expressions

Groovy has regular expressions built into the language as operators. Study the syntax at

http://groovy-lang.org/operators.html#_regular_expression_operators

I use the find operator

http://groovy-lang.org/operators.html#_find_operator

There are many tutorials on the web for regular expressions, but please note that regular expressions come in different flavors depending on the programming language. My favorite reference is

http://www.regular-expressions.info/

Although it is not the best tutorial, it is the most complete reference I have found. If you go to the reference manual

http://www.regular-expressions.info/reference.html

You’ll notice that you can select the language of your choice to display the tables. We are interested in Java; that is the flavor Groovy uses. And you might need the JavaScript tables.

I assume you know the basics of making patterns, but go back to the quick tutorials

http://www.regular-expressions.info/tutorial.html

Study the “Special Characters”, “Character Classes”, “Repetition”, and “Grouping and Capturing” tutorials.

I use these aspects of regular expressions extensively.

In the code, we first parse the keys for the map we are going to make from the table. Note that the keys are the first table row and column, which use the TH tag. We use a double loop to parse the keys for the map, periods and variables, but the outer loop is not a for-loop. The looping is done by the “each” method on the findAll of the depth first search.

erodeTable."**".findAll{it.name() == "TR"}.each{tr -> ... }

We control the index variable, i, by hand so that we can identify which row the code is parsing. The first row, i == 0, has the variable keys and their units. We need the for loop to go through this row. The variable key is made of alphabetic characters, lower and upper case, and the units are anything inside parentheses. We want to capture them both. The pattern is

def m = variable_unit =~ /([A-Za-z]+)\((.+)\)/

Capture groups are designated by “(…)” in the pattern, so to match literal parentheses you have to escape them. The first capture group is ([A-Za-z]+) and the second capture group is inside the escaped parentheses, (.+). The variable “m” contains the matches, which act like a 2 dimensional array, i.e. a matrix. The element m[0] holds the first match, and m[0][0] contains the entire matched text. The element m[0][1] contains the first capture group, the variable name for the key, and m[0][2] is the second capture group, the units for the variable.
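
A standalone sketch of this match (the header text “Runoff(mm)” is an assumed illustration, not copied from the actual page):

def variable_unit = "Runoff(mm)"
def m = variable_unit =~ /([A-Za-z]+)\((.+)\)/
assert m[0][0] == "Runoff(mm)"   // the entire matched text
assert m[0][1] == "Runoff"       // first capture group: the variable name
assert m[0][2] == "mm"           // second capture group: the units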

Getting the years for the period key is easier. They are the content of the TH tag. Note that because the key will be hashed by Java/Groovy, we must ensure that it is a String. What is returned from the GPath is not a String but an object, so we use the toString method to convert the object to a string.
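
A minimal sketch of why the conversion matters, using XmlSlurper to stand in for the parsed page:

def row = new groovy.util.XmlSlurper().parseText('<tr><th>Average</th></tr>')
def map = [:]
map[row.th] = 1                    // the key is a GPathResult object
assert map['Average'] == null      // so a String lookup misses it
map[row.th.toString()] = 2
assert map['Average'] == 2         // a String key hashes as expected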

Now that we have the keys for our map, we can construct the map. This does not require regular expressions. We use two interleaved depth first searches: first on the TR tags to find the rows, then on the TD tags to find the data cells. The inner depth first search makes the period_variable map, and the outer depth first search puts the period_variable map into the analysis map.

Grabbing the ANNUAL SEDIMENT LEAVING PROFILE

Because the tons per hectare unit, “t/ha”, occurs only once in the page source, it is fairly easy to get this value, but it still takes two matches to get the values. First we grab only the SCRIPT part of the GPath, and then match on the line that contains “t/ha”.

def m = script =~ /".+${leavingUnits[0]}.+"/

This matching pattern uses Groovy string interpolation, ${…}. The value of the array element leavingUnits[0] is “t/ha”. So the match is anything between quotes, “…”, that has the character sequence “t/ha”. Now that we have the line with the “t/ha”, we clean it up by removing unnecessary spaces and the quotes. Then we use capture groups to grab the values we want:

m = leavingLine =~ /([\d\.]+) ${leavingUnits[0]} \([\w\s]+([\d\.]+) ${leavingUnits[1]}/

The values contain digits and a decimal point. The period representing the decimal point must be escaped because it is a regular expression special character. Again we use Groovy string interpolation to make sure that the capture groups are in the correct location.

Mashing Up a New Service

Our client will need to make an API for the geodatabase, and that will be one of our new services. Because our website will just be retrieving entries from the database, our HTTPBuilder object will probably use a GET method. It might need a query string for the latitude and longitude. Study the examples on the GET Examples page.

https://github.com/jgritman/httpbuilder/wiki/GET-Examples

The query string is specified in the “query” parameter of the get method. It is a map of key-value pairs.

query : [q:'Groovy']

In the example above, “q” is the key and ‘Groovy’ is the value. The keys must be recognized by the service. Here is another example with two query keys:

uri.query = [ v:'1.0', q: 'Calvin and Hobbes' ]

The query parameter will be converted into a query string, with the values URL-encoded, e.g.

?v=1.0&q=Calvin+and+Hobbes

and appended to the URL.

Client Point Query API

Our client has provided a simple API with one query, point_query. An example using the API:

http://rred.mtri.org/baer/hci/point_query?lat=37&lon=-105

If you point your browser at the URL or click on the link, the return will be:

{"slope": 47.5440521240234, "soil_rock_percent": 5.0, "soil_texture": "loam"}

This is a JSON object with three properties:

  • slope
  • soil rock percent
  • soil texture

Coding the Groovy Script

/**
 * Created by Robert Pastel on 1/19/2017.
 */

import groovy.json.JsonSlurper
@Grab(group='org.codehaus.groovy.modules.http-builder', module='http-builder', version='0.7')

// import of HttpBuilder related stuff
import groovyx.net.http.HTTPBuilder

def http = new HTTPBuilder("http://geodjango.mtri.org")

def json = http.get( path : '/baer/hci/point_query', query : [lat:37, lon:-105] )

println "json = " + json
println json.getClass() // It is a JSON Map
println "keySet = " + json.keySet() // with these strings

// We can access values like this
println "slope = " + json.slope
println "soil_rock_percent = " + json.soil_rock_percent
println "soil_texture = " + json.soil_texture

We create the HTTPBuilder object, http, with the domain of the service and then use HTTPBuilder’s convenience get method to specify the path to “point_query” and the query. The JSON, json, is returned.

The “json” object in the code does not print like a JSON. That is because it has already been converted to a Groovy Map by HTTPBuilder.

http://groovy-lang.org/groovy-dev-kit.html#Collections-Maps

We can access the values for the keys using the “dot” notation.

Restricting Access

Some services restrict access so that not just anyone can use the service. This is done by providing either an API key or a token to developers. The API key is added to the URL, and unless the API key matches one in the service’s list, the service will deny the request. A token is like an API key but is placed in the header of the request. For examples using API keys, see the Weather Underground API:

https://www.wunderground.com/weather/api/d/docs

For examples using a token, see the NOAA API:

https://www.ncdc.noaa.gov/cdo-web/webservices/v2

To add to the header using HTTPBuilder, you can use the full version of the request method. See the last example on the GET Examples page.

https://github.com/jgritman/httpbuilder/wiki/GET-Examples

It might also be possible to pass a headers map to the get convenience method.
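
As an illustration, below is a sketch of the full request form sending a token in a header to the NOAA service linked above. The path and header name follow NOAA’s public documentation; the token value is a placeholder that you must replace with your own token.

@Grab(group='org.codehaus.groovy.modules.http-builder', module='http-builder', version='0.7')

import groovyx.net.http.HTTPBuilder
import static groovyx.net.http.Method.GET
import static groovyx.net.http.ContentType.JSON

def http = new HTTPBuilder('https://www.ncdc.noaa.gov')
http.request(GET, JSON) {
    uri.path = '/cdo-web/api/v2/datasets'
    headers.'token' = 'YOUR_TOKEN'   // placeholder: request a token from NOAA
    response.success = { resp, json ->
        println json
    }
    response.failure = { resp ->
        println "Request failed: ${resp.statusLine}"
    }
}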

Also look at the RESTClient, for more ways to make requests.

https://github.com/jgritman/httpbuilder/wiki/RESTClient

Building HTTPBuilder in a Grails App

The “@Grab” is a Grape annotation which adds dependencies at run time.

http://docs.groovy-lang.org/latest/html/documentation/grape.html

This works well for quickly writing Groovy scripts without using a build script, but it will not work in a Grails app deployed on the Tomcat server. Even adding the Ivy dependencies in build.gradle will result in a 500 Internal Server Error.

We need to add the HTTPBuilder dependency directly to the project’s build. There are two ways to do this. We can download HTTPBuilder from the Maven Central repository to the project’s local repository, or we can configure build.gradle to access the Maven Central repository directly. Both techniques are outlined below.

Adding HTTPBuilder to Local Maven Repository

We can use IntelliJ IDEA to download HTTPBuilder to the local Maven repository and then associate it with the project modules. This takes multiple steps:

  1. Select “File” menu and then select “Project Structure …”  to open the Project Structure window.
  2. In the Project Structure middle pane, click the green “+” and select “From Maven…” to open the Download Library From Maven Repository window.
  3. Search for HTTPBuilder in the Maven repository by typing “org.codehaus.groovy.modules.http-builder” in the text box and clicking the search icon to right of the text box.
  4. Wait while IntelliJ IDEA searches for all the versions of HTTPBuilder.
  5. After the search is complete, select the highest version, currently “org.codehaus.groovy.modules.http-builder:http-builder:0.7.1”, and click “OK”; the Choose Modules window opens.
  6. In the Choose Modules window, select all modules of the project and click OK.

HTTPBuilder is now downloaded into your local repository and associated with the project’s modules. Now the dependency should be added to build.gradle. Add the compile dependency to the dependencies section in build.gradle.

dependencies {
    compile "org.codehaus.groovy.modules.http-builder:http-builder:0.7.1"
    ...
}

You can now import groovyx.net.http.HTTPBuilder in your controller classes and use HTTPBuilder.
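
For example, a controller action can now mash up the point_query service from the earlier script and pass the result to its view. A sketch; the controller name, action, and model keys are hypothetical:

import groovyx.net.http.HTTPBuilder

class ErosionController {

    def index() {
        // request the soil data from the remote service
        def http = new HTTPBuilder('http://geodjango.mtri.org')
        def json = http.get(path: '/baer/hci/point_query',
                            query: [lat: params.lat, lon: params.lon])
        // package the response into a map for the view
        [slope: json.slope, soilTexture: json.soil_texture]
    }
}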

Configuring build.gradle to Access Maven Central

Configuring the build to access Maven Central directly requires only adding the “mavenCentral()” function to the repositories section in build.gradle.

repositories {
    mavenLocal()
    maven { url "https://repo.grails.org/grails/core" }
    mavenCentral()
}

You do not want to add it to the repositories section inside “buildscript”. It is also best to add “mavenCentral()” to the bottom of the list, because the list determines the order in which Gradle searches the repositories. We want Gradle to search Maven Central last.

Add the compile dependency to the dependencies section in build.gradle.

dependencies {
    compile "org.codehaus.groovy.modules.http-builder:http-builder:0.7.1"
    ...
}

You can now import groovyx.net.http.HTTPBuilder in your controller classes and use HTTPBuilder.

It may appear that configuring build.gradle to access Maven Central is easier, but you have to know the jar version. Also, the process of using IntelliJ IDEA does not take that long.

If you want to search Maven Central for the jar without using IntelliJ IDEA, you can use the Maven Search website.

https://search.maven.org/

From the Maven Search website, you can download the pom.xml, jar, or source.