Reading an http file in csv format

I want to read this CSV file, http://api.pluralsight.com/api-v0.9/courses, into a Snap.
When I used a Mapper and a File Reader, I got this error message:
Failure: Unable to read from http://api.pluralsight.com/api-v0.9/courses, Reason: The file can not be accessed: http://api.pluralsight.com/api-v0.9/courses, Resolution: Verify that access permissions are granted on the file: http://api.pluralsight.com/api-v0.9/courses

Hi @bepelletier,

I tried to recreate the error you are getting by going on the path with the File Reader Snap, and yes, I got the same error message.

So it occurred to me to implement a workaround, and it appears to be functioning. Please find my solution in the attached pipeline.

The solution consists of using a REST Get Snap to make a call to http://api.pluralsight.com/api-v0.9/courses. The REST Get Snap will throw an error because, naturally, it tries to parse the $content field of the response as JSON, but the value of that field is CSV encoded data. Regardless, in the Mapper Snap connected to the error view of the REST Get Snap I map the necessary fields in order to write the $content (the actual CSV encoded data we want) to sldb with the use of a File Writer. After that I proceed to read the file from sldb as usual.

I hope this solution is useful to you in your particular circumstances. If you have any questions, I’d be happy to help you further.

BR,

Dimitri

Read_HTTP_File_In_CSV_Format_2020_04_16.slp (15.2 KB)

P.S. : I should warn you that, at least in my case, when I try to preview the data at the error view of the REST Get Snap, the page in my browser tends to freeze. I guess that’s because the value of the $content is rather large, the file being 8+ MB in size.

That’s clever! It’s kind of weird to start a pipeline getting good data from an error. My mind is not set for this kind of thing. But it is definitely working.
I thank you very much Dimitri.

As an addition, or rather a subtraction :), I tried removing the File Writer and File Reader Snaps and connecting the binary output of the Mapper Snap directly to the binary input of the CSV Parser Snap, and it still works :)

Cheers,
Dimitri

Hi. I have a few things to share about this.

You don’t need to connect anything to the REST Get’s error view as a workaround. Just change the REST Get’s “Response entity type” setting from DEFAULT to BINARY. Follow that with a Mapper that maps $entity to $content (or whatever you want to call it), and use the Views section of the Mapper’s settings to change the output view’s Type from Document to Binary.
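For anyone who wants to see the same idea outside of SnapLogic, here is a minimal Python sketch: treat the response body as raw bytes rather than JSON, then hand those bytes to a CSV parser. The function name parse_csv_bytes, the UTF-8 encoding, and the sample column names are all assumptions for illustration; they are not taken from the actual Pluralsight file.

```python
import csv
import io


def parse_csv_bytes(payload: bytes, encoding: str = "utf-8") -> list[dict[str, str]]:
    """Decode a binary HTTP body and parse it as CSV with a header row."""
    text = payload.decode(encoding)
    # newline="" leaves line endings untouched so the csv module can handle them.
    return list(csv.DictReader(io.StringIO(text, newline="")))


# Hypothetical sample standing in for the binary entity of the response.
sample = b"CourseId,Title\r\nabc-101,Intro to ABC\r\n"
rows = parse_csv_bytes(sample)
print(rows)
```

This is roughly what the binary Mapper output followed by a CSV Parser does: no JSON parsing is attempted, so nothing ends up on an error view.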

As for the File Reader’s failure to read the URL, there are a few interesting things going on here.

The File Reader is implemented a bit differently than the REST Get. The HTTP request it sends to the server has a User-Agent header whose value is the Java version you’re using. The HTTP requests sent by the REST Get snap don’t include a User-Agent header.

Oddly, the server responds differently depending on the value of the User-Agent header.

If you use Java 8, the User-Agent header is something like Java/1.8.0_181, and the server rejects the request with a 403 Forbidden response. I have no idea why the server is doing this. I’ve never seen this before.

If you use Java 11, the server responds with a 301 Moved Permanently, plus the header “Location: https://api.pluralsight.com:443/api-v0.9/courses”. This is much more normal. It’s known as a redirect.
The server is asking the client to use https instead of http to make the request. In other words, use SSL/TLS so the response will be encrypted. Many HTTP clients automatically follow such redirects by making another request to the new location. The REST Get Snap follows redirects when its “Follow redirects” setting is checked, which is why it works. Unfortunately, the File Reader does not follow redirects. If you use the File Reader with the http URL on Java 11, all you get back is the body of the “301 Moved Permanently” response (a short HTML page), which isn’t very useful. To work around this, simply change the File Reader’s File setting to specify https instead of http.
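To make the difference between following and not following a redirect concrete, here is a small Python sketch. It does not contact the Pluralsight server. Instead it starts a throwaway local server (the paths /old and /new and the sample CSV are made up) that answers /old with a 301 and a Location header. A client that does not follow redirects only sees the 301, while one that does ends up with the CSV.

```python
import http.client
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

CSV_BODY = b"CourseId,Title\r\nabc-101,Intro to ABC\r\n"  # hypothetical sample data


class RedirectingHandler(BaseHTTPRequestHandler):
    """Local stand-in: /old redirects to /new, which serves the CSV."""

    def do_GET(self):
        if self.path == "/old":
            self.send_response(301)
            self.send_header("Location", "/new")
            self.send_header("Content-Length", "0")
            self.end_headers()
        else:
            self.send_response(200)
            self.send_header("Content-Type", "text/csv")
            self.send_header("Content-Length", str(len(CSV_BODY)))
            self.end_headers()
            self.wfile.write(CSV_BODY)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


server = HTTPServer(("127.0.0.1", 0), RedirectingHandler)  # port 0 = any free port
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# Client 1: http.client never follows redirects, so it only sees the 301.
conn = http.client.HTTPConnection("127.0.0.1", port, timeout=10)
conn.request("GET", "/old")
resp = conn.getresponse()
raw_status, raw_location = resp.status, resp.getheader("Location")
resp.read()
conn.close()

# Client 2: urllib follows the redirect and receives the CSV.
# An empty ProxyHandler keeps any system proxy settings out of the way.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
with opener.open(f"http://127.0.0.1:{port}/old", timeout=10) as followed:
    followed_status, followed_body = followed.status, followed.read()

server.shutdown()
server.server_close()

print(raw_status, raw_location)
print(followed_status, followed_body)
```

The first client is in the same position as the File Reader with the http URL: the only way for it to get the data is to be given the final URL directly.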

In short, the simplest way to read the CSV file is to use a File Reader configured with the URL https://api.pluralsight.com/api-v0.9/courses, and to make sure your plex is running Java 11.

You can use the curl command-line utility to see how the server responds differently depending on the value of the User-Agent header. Try these:

curl -v -L http://api.pluralsight.com/api-v0.9/courses -o courses.csv -H "User-Agent: Java/11.0.7"

curl -v -L http://api.pluralsight.com/api-v0.9/courses -o courses.csv -H "User-Agent: Java/1.8"

The first command successfully retrieves the csv file. The second fails with a 403 Forbidden response.
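If you would rather experiment from a script than from curl, the same kind of check can be sketched in Python. To keep it runnable anywhere, this version talks to a local stand-in server that imitates the behavior described above (403 for a User-Agent starting with Java/1.8, 200 otherwise). The handler, the status_for helper, and the sample body are assumptions for illustration, not the real server's logic.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class PickyHandler(BaseHTTPRequestHandler):
    """Local stand-in that rejects requests based on the User-Agent header."""

    def do_GET(self):
        agent = self.headers.get("User-Agent", "")
        status = 403 if agent.startswith("Java/1.8") else 200
        body = b"" if status == 403 else b"CourseId,Title\r\n"  # hypothetical data
        self.send_response(status)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def status_for(url: str, user_agent: str) -> int:
    """Return the HTTP status the server gives for this User-Agent."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    # An empty ProxyHandler keeps any system proxy settings out of the way.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(request, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # urllib raises on 4xx/5xx; the code is what we want


server = HTTPServer(("127.0.0.1", 0), PickyHandler)  # port 0 = any free port
url = f"http://127.0.0.1:{server.server_address[1]}/courses"
threading.Thread(target=server.serve_forever, daemon=True).start()

statuses = {ua: status_for(url, ua) for ua in ("Java/11.0.7", "Java/1.8")}

server.shutdown()
server.server_close()
print(statuses)
```

Pointing status_for at a real URL instead of the local one lets you compare User-Agent values against an actual server.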

Thank you very much, Patrick! Your explanation is very interesting.
