January 30, 2025
Bringing ERDDAP to ArcGIS Online
Jerad King, GCOOS DMAC GIS Manager and Developer
Run time: 41:49
00:00:01:00 - 00:00:38:13
Matt Biddle
Welcome to the January edition of the DMAC Tech Webinar. Thank you all for joining us. Today we have a really exciting presentation from Jerad King from the Gulf of Mexico Coastal Ocean Observing System, and he's going to talk about ERDDAP to ArcGIS Online. But before I get into that, I'd like to give a little background. The IOOS Tech Webinar is — it's really intended to be an opportunity for the DMAC community to come together and talk about topics related to data management and cyberinfrastructure approaches within Regional Associations, projects, or partners of the IOOS community.
00:00:38:14 - 00:01:05:19
Matt Biddle
So thank you all for joining us for this webinar today. Topics can be technical, diving into a specific code base like we're going to talk about today. They can also go into other topics like reviewing metadata and data standards or an overview of other topics of interest to the rest of the community. So if you have things of interest, please feel free to reach out and we can get you on the calendar to join in a future meeting or to lead a discussion on something of interest.
00:01:05:21 - 00:01:35:08
Matt Biddle
So with that, Jerad King is going to be talking to us about the project ERDDAP2agol, or ERDDAP to ArcGIS Online. Jerad King is a GIS developer with the Gulf of Mexico Coastal Ocean Observing System, or GCOOS as we all know it, where he leads the technical components of GIS-centric projects. He is dedicated to advancing GIS as a core technology within GCOOS's existing services, in addition to the broader marine science community.
00:01:35:10 - 00:01:43:02
Matt Biddle
So with that, I'll let Jerad take us away, and then we'll jump into Q&A at the end of his talk. All right.
00:01:43:04 - 00:01:52:21
Jerad King
Thank you for the introduction, Matt. And how does it look? Everything okay?
00:01:52:23 - 00:01:54:07
Matt Biddle
Let's go ahead.
00:01:54:09 - 00:02:24:24
Jerad King
Perfect. Well, howdy, everyone, as we say here at Texas A&M. Once again, thank you to Matt for the opportunity to present today and talk a little bit about what we're doing at GCOOS, particularly with respect to bringing ERDDAP to ArcGIS Online, and what that means for marine science, our organization, and the IOOS community as a whole.
00:02:25:01 - 00:02:46:02
Jerad King
So, just a little bit about what we'll talk about, just some quick background and about me. I want to talk about GIS and marine science and some of the challenges that are faced by it, as well as some of the project catalysts, right? What brought us to developing this program? And then we'll go into the program itself.
00:02:46:04 - 00:03:17:07
Jerad King
In which case we'll talk about some of the challenges and solutions that we came up with during development. Punctuated by demonstrations. And then I'll close out with the roadmap, what we intend to develop next, and what our development priorities are, and have a little discussion about it. You know, I'm excited to share this with the audience of such expertise and draw from your knowledge because this ultimately is something that saves me time at the end of the day.
00:03:17:09 - 00:03:46:01
Jerad King
And, as valuable as time is, you know, this might be a program that could help. So I graduated from Texas A&M University with an M.S. in geography. I studied under Dr. Zhe Zhang in the Cyberinfrastructure and Spatial Decision Intelligence Lab. And my thesis actually related to GPS-based remote sensing methods of wildfires, so shout out to the National Geodetic Survey.
00:03:46:03 - 00:04:10:07
Jerad King
If there's anybody here from there, well, I guess the fires got a little too hot, and I needed to go for a swim because the next step was joining GCOOS. And I joined GCOOS as the GIS manager and developer. So what I'm responsible for at GCOOS is the synthesis, publication, and management of our data — the administration and development of our infrastructure itself.
00:04:10:09 - 00:04:38:16
Jerad King
So right now we have actually two ArcGIS Online platforms and are in the process of deploying an onsite ArcGIS Enterprise system, as well as the research and development of new data and applications, which is what ERDDAP to ArcGIS Online falls into. So like I mentioned earlier, I was met with an interesting challenge, something I didn't anticipate coming from a more holistic, cross-domain background,
00:04:38:18 - 00:05:07:12
Jerad King
and that was creating GIS products that serve oceanographers. Now traditionally, GIS has not been the first choice for rigid marine science. And a lot of that is predicated on the fact that spatial algorithms are designed to consider movement in certain dimensions, right? The X and the Y. But as oceanographers, you know the importance of depth, right?
00:05:07:12 - 00:05:39:09
Jerad King
That additional dimension. And that is not something that traditionally GIS software has been fantastic at handling. Because at the end of the day, with all this data, what are we trying to do but learn about the natural world and gain insight? But nonetheless, there's this clear, concerted effort to make GIS applicable for marine science and use that as an additional tool in an oceanographer's toolbelt.
00:05:39:11 - 00:06:09:21
Jerad King
And part of that is this collaboration between NOAA and ESRI, and this screenshot in the top right, I think, is really interesting. It's a form at the bottom of the Oceans Hub, and it asks the question, what data are you most interested in, right? Because in oceanography we have these certain architectural patterns that might not be reflected in GIS if GIS has traditionally not been a choice.
00:06:09:23 - 00:06:40:04
Jerad King
What I also find interesting is that analytical capability. What analytical capability would you like added to ArcGIS? So, there's this clear effort in understanding what the community needs and what further development from ESRI's side needs to be done to make this a competitive option, so that, you know, GIS is not going to be something that you're sacrificing any sort of utility for if you choose to use it.
00:06:40:06 - 00:07:14:21
Jerad King
So that said, I'd like to highlight a few applications. The first one is ERMA, so shout out to the ERMA team. I think this is a very nice data explorer, and there are quite a number of data sets that are connected to ArcGIS-hosted services. I think the ability to put datasets on top of each other on a map and explore them visually assists users in not only understanding the spatial relationship of data, but potentially inspiring study itself.
00:07:14:23 - 00:07:45:11
Jerad King
So in this case, we're looking at these bathymetric contours, which leads us to an ArcGIS REST services link, and that's from coast.noaa.gov, so that's an enterprise system. An additional application similar to ERMA is the NWS GIS viewer. And in this case we actually have data being hosted on ArcGIS Online itself through the NOAA Geoplatform.
00:07:45:13 - 00:08:13:09
Jerad King
So going to this page, it's a lot more friendly than, say, a REST services link where it's a JSON object printed on a blank web page. It's a familiar thing that will encourage people to go to, say, those additional links, and actually download and acquire the data itself. But that first step of hosting on ArcGIS Online promotes that discovery of the data.
00:08:13:11 - 00:08:53:16
Jerad King
Yeah, and the connection with GCOOS here is the CETACEAN project, the Compilation of Environmental Threat and Animal Data for Cetacean Population Health Analyses platform, which you can find at cetacean.gcoos.org. One of the challenges that we were faced with is combining all these different datasets from disparate data sources into one single catalog, and creating hosted services from that data, such that users would now have an enhanced utility, and be able to access, view, and visualize the data in the browser in a more meaningful way.
00:08:53:18 - 00:09:26:20
Jerad King
So the idea is a one-stop shop for studying impacts to large marine mammal populations in the Gulf of Mexico. Now, in conversations about where to gather environmental data from, one of the clear options was ourselves. We are the Gulf of Mexico Coastal Ocean Observing System, and we have a number of ERDDAPs. So, we have the historical collections (which are 25,000 datasets, I believe?)
00:09:26:22 - 00:10:01:06
Jerad King
that date back to previous in situ measurements. We have our near real time data ERDDAP, which is our in situ oceanographic and meteorological observing system, and then we also have a biological and socioeconomics ERDDAP that distributes biological and socioeconomic data. So we were sitting on a treasure trove of data, but it's a lot of data and it's on ERDDAP, and we want to bring it to ArcGIS Online.
00:10:01:08 - 00:10:28:18
Jerad King
I think you can see where this is headed, right? So, given the audience, you know, I’ll briefly talk about ERDDAP. When I first heard about ERDDAP, while getting acquainted with the GCOOS IT, I mean, my ears perked up when I heard standardized variable names. The bane of any GIS developer’s existence is the creative ways that people name their latitude and longitude fields.
00:10:28:20 - 00:10:51:04
Jerad King
Is it latitude with a capital L or a lowercase L? Is it just lat, right? Do you want to have a dictionary of all those possible combinations, and check every time you pull in large amounts of data? Probably not. But on ERDDAP, right, all those variable names are the same and you can consistently pull data and know what you're pulling from.
00:10:51:06 - 00:11:19:04
Jerad King
So that brings us to ERDDAP to ArcGIS Online, and how we connect these two services. So the current capabilities of ERDDAP to ArcGIS Online is the ability to publish static ERDDAP data - so, a dataset that's not updated. This is currently limited to tabledap, and datasets must have latitude, longitude, and time.
00:11:19:06 - 00:11:44:06
Jerad King
There's also the capability to create glider track datasets when you select the Glider DAC menu from the Glider DAC ERDDAP. And what that does is that actually takes the tabular data and creates multiline segments in which that data is binned into those segments. And then finally there's the ability to create near real time hosted feature layers,
00:11:44:06 - 00:12:19:20
Jerad King
and with that comes updating those near real time datasets. So, we'll talk about next, a high level overview, and then first begin with the process of near real time datasets. But additionally, in terms of design philosophy, I think ease of use is really important, right? Even if you're making a program as a data manager for other data managers, you don't want to have to read a whole bunch of documentation for something that's supposed to be saving you time.
00:12:19:22 - 00:12:48:16
Jerad King
So, no matter your technical skill set, the learning curve should be minimal to none because this is really about convenience. And the second point is that the development of features is determined by organizational needs. So this is where, you know, you have that GCOOS fingerprint on this project. We prioritize the development of features based upon what's needed to support our existing projects.
00:12:48:17 - 00:13:15:05
Jerad King
So that comes first and foremost. And of course, it’s free and open source, or else, you know, I wouldn't be here talking today and sharing this PowerPoint. And the last point is data accessibility. You know, there's so much rich data on ERDDAP, so much quality data, but a lot of people might be turned off by the view of the dataset list, and it can be quite daunting.
00:13:15:07 - 00:13:48:20
Jerad King
So we solved that issue with reflecting that data, bringing it into a one-to-one from that ERDDAP service to ArcGIS Online. That assists accessibility for users who wish to consume your data, but it also assists accessibility for developers. For example, just yesterday I was in a meeting in which folks from a commercial sector were familiar with ArcGIS and interfacing with that API,
00:13:49:01 - 00:14:13:23
Jerad King
but they were not familiar with ERDDAP. So, if we're going to create a meaningful collaboration with them, you know, the best thing that we can do is provide them something that they're familiar with rather than putting the burden on that developer to, say, learn a new API. So, at a high level, you know, what really happens is we talk to ERDDAP,
00:14:14:00 - 00:14:39:00
Jerad King
we make that dataset selection, and then we pass it to ArcGIS Online. The name basically says what's going on, but that ERDDAP selection - currently this is tied with the command line user interface. However, I'm interested in thoughts on how a programmatic workflow might be useful because that is certainly planned for version one.
00:14:39:02 - 00:15:08:01
Jerad King
So once you connect to the ERDDAP server, you can get that list of datasets. And after you select the datasets you want, we’re simply getting a list of those dataset strings, and we're actually creating dataset objects of those IDs. And upon initialization, we do things like get the dataset attribute structure and write it to the object's attributes
00:15:08:03 - 00:15:29:20
Jerad King
so that we know how to request the data. And that's what you see going on here is we have this dataset object, and as it's initialized, that metadata is received. It knows everything it needs to know to then generate the URL and write it to your temp directory so it can be posted to ArcGIS Online.
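A minimal sketch of the dataset-object idea described here, assuming a simplified class: `TabledapDataset`, its fields, and the `example.org` address are hypothetical and are not taken from the erddap2agol code. It shows only how a dataset ID, a variable list, and a time range combine into a tabledap request URL.

```python
from dataclasses import dataclass, field
from urllib.parse import quote


@dataclass
class TabledapDataset:
    """Hypothetical stand-in for the dataset object described in the talk."""

    base_url: str      # ERDDAP root, e.g. "https://example.org/erddap" (placeholder)
    dataset_id: str
    variables: list[str] = field(default_factory=list)

    def data_url(self, start: str, end: str, file_type: str = "csv") -> str:
        """Build a tabledap request URL limited to [start, end] (ISO 8601, UTC)."""
        columns = ",".join(self.variables)
        # The < and > characters in constraints are percent-encoded in the query.
        constraints = "".join(
            "&" + quote(f"time{op}{value}", safe="=:")
            for op, value in ((">=", start), ("<=", end))
        )
        return (
            f"{self.base_url}/tabledap/{self.dataset_id}.{file_type}"
            f"?{columns}{constraints}"
        )


ds = TabledapDataset(
    "https://example.org/erddap", "demo_id", ["time", "latitude", "longitude"]
)
print(ds.data_url("2025-01-01T00:00:00Z", "2025-01-08T00:00:00Z"))
```

In the workflow described in the talk, the variable list would come from the dataset's attribute metadata rather than being typed by hand.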
00:15:29:22 - 00:15:48:18
Jerad King
And after that, the processing is done. Those dataset objects are inherited by the ArcGIS Online. It's called the wrangler class - ArcGIS on wrangler. One because it's Texas, and two, that's what we're doing at the end of the day is we have to wrangle these two services to get them to talk to each other.
00:15:48:20 - 00:16:14:16
Jerad King
It's like two friends from different social circles and they don't really like each other, but you want them to, right? So it helps to divide that workflow into different classes, and have those methods that know, ok, this is what I'm expecting from ERDDAP, this is what I'm expecting from ArcGIS Online. So at a high level, this is that division of processing that occurs when you make a request.
00:16:14:18 - 00:16:48:09
Jerad King
Right. So going into near real time data - the reason I'm starting with this is this was actually the initial inspiration for the project. We work with a collaborator from ESRI named Keith Van Graafeiland who shared a script with me that took an active - I believe it was some sort of glider mission - and overwrote the dataset at a certain interval using a package known as overwrite feature service.
00:16:48:11 - 00:17:19:14
Jerad King
So this overwrite feature service is a throughpoint through development. So this is kind of where things started, and this is a package that is still currently used in the NRT update because it's quite powerful. You can overwrite the data or metadata in a hosted feature service while preserving the endpoint. So I can completely transform a content item on ArcGIS Online without breaking the link essentially.
00:17:19:16 - 00:17:44:06
Jerad King
And while that's not a core API functionality yet for the API itself, you can use this package overwrite feature service to do so. So then development began on that rest URL generation for the GCOOS ERDDAP in particular because we thought wouldn't it be cool if we could have these near real time datasets in ArcGIS Online?
00:17:44:08 - 00:18:06:21
Jerad King
So with that comes the question of where does the code live, right? Where is it hosted? Is it on Docker that's on a server somewhere? It turned out to be quite a simple solution. We use ArcGIS Online notebooks, right? So the program is formatted as a package that can be installed into an ArcGIS Online environment,
00:18:06:23 - 00:18:38:04
Jerad King
and then you can schedule your update tasks with the ArcGIS Online notebook task scheduler. So this provides user control over update frequency without having to, say, modify the program or pass any arguments to any functions. And this is also where you can see logs, so it helps you adjust that granularity of update. Another benefit of running this in an ArcGIS Online environment is that a user is automatically authenticated.
00:18:38:06 - 00:19:11:06
Jerad King
So sometimes if say an organization has single sign on, you have to do a lot of tricks with the ArcGIS API to recognize who you are. So if your organization has additional steps, then you can simply log into your ArcGIS Online, authenticate as necessary, and you can take care of that that way. So, these are some screenshots from the first near real time test, which happened to coincide with Hurricane Milton,
00:19:11:08 - 00:19:43:00
Jerad King
and this was interesting because just in Map Viewer, we were seeing the latest measurements from different buoys that report data to the ERDDAP, the GCOOS Near Real Time ERDDAP, and we’re able to make charts of that data. And that functionality also allows us to then overlay, say, this ESRI Living Atlas Layer that tracks hurricanes in real time, and allows you to visualize that in conjunction with your sensor data.
00:19:43:02 - 00:20:11:22
Jerad King
And on the right we have a chart of that air pressure dropping as the hurricane moved over - that was C13, which I believe suffered some damage. So, the next problem I want to talk about is how do we know which datasets to update? And ideally if, say, this was just a GCOOS tool, we would have a hosted database somewhere that we would make a call to and get the relevant information, and update the dataset.
00:20:11:24 - 00:20:38:13
Jerad King
So that initially looked like creating a database in the user's ArcGIS Online python environment, but that raised some questions of reliability. That environment can be a little bit chaotic. Files could get deleted or overwritten, and say if you have 200 datasets that you want to update or manage, you can't afford to have just your database sitting in an ArcGIS Online file, right?
00:20:38:15 - 00:21:04:15
Jerad King
So the solution is to include the information that we need to update the data in the data itself, in that hosted feature layer. And what that looks like actually within the program is this. So we use the tags to first find the datasets we need to update. In this case, we're looking for the erddap2agol tag, which would return all our ERDDAP datasets,
00:21:04:17 - 00:21:36:15
Jerad King
but we have this additional e2a_nrt tag. So we're searching for those datasets there. And once those datasets are found, we pull that base URL that is also included in the tags. Now I’m not a huge fan of how that looks, so some of these things will be moved to more hidden locations of the content item that the program can still access, but users won't necessarily see because it's not the prettiest thing to have a URL sitting in the tag.
00:21:36:17 - 00:22:07:11
Jerad King
The same goes for the dataset ID. We pull that from the title, but nonetheless, that is another thing that we would like to move to tags so that you can modify the title in any way that you would like. Currently you can, as long as there is the dataset ID and a space, you can have anything you want from that because we split that dataset ID from the first index of a title after splitting the string.
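The tag and title lookup described above can be sketched in plain Python. This assumes that the base URL is stored as a tag beginning with "http" and that each item exposes a title string and a list of tags; the function name and the sample values are hypothetical, and only the `e2a_nrt` tag name comes from the talk.

```python
from typing import Optional

NRT_TAG = "e2a_nrt"  # tag name mentioned in the talk


def parse_update_info(title: str, tags: list[str]) -> Optional[tuple[str, str]]:
    """Return (base_url, dataset_id) for a near real time item, else None."""
    if NRT_TAG not in tags:
        return None
    # Assumption: the ERDDAP base URL is the tag that starts with "http".
    base_url = next((t for t in tags if t.startswith("http")), None)
    if base_url is None or not title.strip():
        return None
    # The dataset ID is the first whitespace-separated token of the title.
    dataset_id = title.split()[0]
    return base_url, dataset_id


print(parse_update_info(
    "demo_buoy_01 Hourly Met Data",
    ["erddap2agol", "e2a_nrt", "https://example.org/erddap"],
))
```

Because only the first token of the title is read, anything after the dataset ID and a space can be edited freely, which matches the constraint described above.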
00:22:07:13 - 00:22:29:15
Jerad King
So with that information, we know the base URL and the dataset IDs so we can request a dataset attribute structure and get the rest of the information that we need. And near real time is defined by ERDDAP to ArcGIS Online as a seven day moving window. So we just take the current time in UTC, and then get a time delta for seven days,
00:22:29:15 - 00:22:53:06
Jerad King
and that's our start and end time. Then we pass that update URL and feature service ID to overwrite feature service, and it updates our dataset for us. So from the Hurricane Milton test, this is what that actually looked like in terms of the time it took to update datasets and the result. And it wasn’t pretty.
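The seven-day moving window can be sketched with the standard library alone. The function name and the output format are assumptions; the talk only states that the current UTC time and a seven-day time delta give the start and end times.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def nrt_window(now: Optional[datetime] = None, days: int = 7) -> tuple[str, str]:
    """Return (start, end) ISO 8601 strings for a moving window ending now (UTC)."""
    end = now or datetime.now(timezone.utc)
    start = end - timedelta(days=days)
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    return start.strftime(fmt), end.strftime(fmt)


# Fixed "now" so the example is reproducible.
print(nrt_window(datetime(2025, 1, 30, 12, 0, 0, tzinfo=timezone.utc)))
```

These two strings would then serve as the lower and upper time constraints of the update request.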
00:22:53:06 - 00:23:22:05
Jerad King
It was about 20 — it could take 20 to 40 minutes to update 28 datasets — and there was about a 40% failure rate. So, needless to say, I mean, running ArcGIS Online notebook and a Python environment doesn't scream efficiency, but with that comes our solution, and that's concurrency. So, until Python’s global interpreter lock is banished,
00:23:23:00 - 00:23:46:21
Jerad King
splitting off those subprocesses from the thread is the best we can do. So we submit four to five requests simultaneously. That's a little bit of the voodoo that goes on. That just happens to be the number that works well. And what that looks like in terms of performance is now we're updating 68 datasets in 8 to 10 minutes
00:23:46:23 - 00:24:12:05
Jerad King
with a failure rate always below 10%. And you can see in this first record here that this actually took 7 minutes to update. So a significant improvement in the amount of time it takes to update the datasets as well as a reliability improvement in the reduction of failure rate. Now in this case, you see that the data set failed after 20 minutes.
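A minimal sketch of running four updates at a time, assuming a thread pool from `concurrent.futures`; the talk does not say which concurrency primitive the program uses. `update_one` is a hypothetical callable standing in for the per-dataset update, and failures are collected rather than stopping the batch.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Iterable


def update_all(
    dataset_ids: Iterable[str],
    update_one: Callable[[str], None],
    max_workers: int = 4,
) -> tuple[list[str], list[str]]:
    """Run update_one for each ID with a small worker pool; return (ok, failed)."""
    ok, failed = [], []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(update_one, ds): ds for ds in dataset_ids}
        for future in as_completed(futures):
            ds = futures[future]
            try:
                future.result()  # re-raises any exception from the worker
                ok.append(ds)
            except Exception:
                failed.append(ds)
    return sorted(ok), sorted(failed)


def demo_update(dataset_id: str) -> None:
    """Placeholder update that fails for one ID to show the error path."""
    if dataset_id == "bad":
        raise RuntimeError("simulated failure")


print(update_all(["a", "b", "bad", "c"], demo_update))
```

Threads suit this kind of work because each update spends most of its time waiting on network responses, during which the global interpreter lock is released.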
00:24:12:10 - 00:24:26:18
Jerad King
So because we know it takes about 8 to 10 minutes to update our datasets, we can set the notebook to time out after 20 minutes. That way it doesn't interfere with the next task.
00:24:26:20 - 00:24:55:06
Jerad King
And in terms of our current production near real time ERDDAP datasets, there are two places you can kind of access that right now. The first is our Sea Turtle Atlas site, which is in production now, and that is the view on the right. So what I like about including, for example, the dataset attributes in the tags is say, for example, you wanted the latest measurements for just air temperature.
00:24:55:08 - 00:25:17:01
Jerad King
Because we encode those dataset attributes into the text, then what you could do in the catalog is select air temperature, and then now you know that, ok, these stations are only including air temperature because there's different sensors aboard different stations, and if you don't have that a priori knowledge of what's where, then that won't be very helpful.
00:25:17:03 - 00:25:32:02
Jerad King
And then there is a content group that also can be accessed on our ArcGIS Online homepage. So now comes a little demonstration. I swapped tabs. Are we still good?
00:25:32:04 - 00:25:33:07
Matt Biddle
Yeah. That's good.
00:25:33:09 - 00:25:45:22
Jerad King
Great. So, the readme. If you go to the bottom here, you can access the ready-to-go notebook on ArcGIS Online.
00:25:45:24 - 00:25:59:01
Jerad King
And from here, you will then open the notebook,
00:25:59:03 - 00:26:24:03
Jerad King
and it will take a second to load, and after it loads - give it a few more seconds because it doesn't like to run immediately after loading. So the ArcGIS notebooks can be a little bit problematic, but that's why there's the other option of running the program in a local environment, which is described in the readme.
00:26:24:05 - 00:26:42:00
Jerad King
So as long as you're working with a Python interpreter configured and an ArcGIS enabled Conda environment, you can run the local run script.
00:26:42:02 - 00:26:49:08
Jerad King
Let’s try it again.
00:26:49:10 - 00:27:09:03
Jerad King
Now this is what I'm excited about for enterprise, because if our service is slow, then the only person I have to blame is myself, but because ArcGIS Online is a SaaS platform, we have to blame ESRI.
00:27:09:05 - 00:27:42:09
Jerad King
So in terms of the process of running the program, this is actually no longer necessary, but first we uninstall any existing version of ArcGIS Online. Then we install, we pip install from the repository. If you're cool like me, you get an administrator role caution. So now we're building the program.
00:27:42:11 - 00:28:12:15
Jerad King
Alright, so now we are using ERDDAP to ArcGIS Online. So we want to create near real time items, so we'll select that menu. And now we're looking at a list of 63 different ERDDAP servers. So this list is actually pulled from the Irish Marine Institute's awesome ERDDAP page. So as additional ERDDAPs are added to that list, they will be reflected in this program.
00:28:12:17 - 00:28:25:02
Jerad King
So let's actually go to the IOOS Sensors ERDDAP, which is number eleven. So we'll type 11. Say yes.
00:28:25:04 - 00:28:52:02
Jerad King
Alright, so now we are looking at the datasets on the IOOS Sensors ERDDAP. Now because we selected the near real time menu, we are only seeing datasets that have had data which have been updated within the last seven days. So we're sorting by recency there with the advanced search function. So, I'm gonna pick homefield advantage because I'm from La Jolla.
00:28:52:04 - 00:29:14:16
Jerad King
I miss San Diego dearly, especially in the hot Texas summer days. So we added that, we typed the number of the dataset we want. We can see, ok, we have that in the cart. And for now we're just gonna say done. So we're downloading the data, connected to ArcGIS Online. We first add the item,
00:29:14:16 - 00:29:33:05
Jerad King
so we add the file itself, and then we create a hosted feature layer from that item on ArcGIS Online. So this will actually be the slowest step in the process because we're relying on that service itself.
00:29:33:07 - 00:29:48:12
Jerad King
Alright, so that finished up, and we can go ahead and go to our content and refresh it. And now we have that data here.
00:29:48:14 - 00:30:15:12
Jerad King
And we have the terms of use, a description, we populate the snippet text. There are the tags for the datasets, so the attributes of the data, and the credits for the dataset. And then we can go over here and see that we have seven days of data.
00:30:15:14 - 00:30:41:09
Jerad King
Alright, so with that, now we’ll talk about the, well, let me do this. If you leave a notebook alone for too long, it will get upset. It’ll time itself out. So we'll just restart the kernel. So, I'd like to now talk about the creation of static ERDDAP datasets, particularly with regards to large datasets.
00:30:41:11 - 00:31:07:20
Jerad King
So while that update logic is its own independent function, and relies heavily on that overwrite feature service package, the one thing that is shared by all three of those processes - the static, glider, and near real time is that initial creation of datasets with an object oriented approach. And, like I mentioned before, this is due to the quirks of each service that requires special attention,
00:31:07:22 - 00:31:37:00
Jerad King
and we want to make sure that this is easy to use and handles those errors that you might come across. So, if we want to create these ERDDAP datasets individually or in batch, typically there would be things you would worry about, chiefly, say, dataset size. But we want to make sure that, say if you get an email saying we need, you know, X Y and Z datasets added, you don't have to go to the ERDDAP itself,
00:31:37:01 - 00:32:04:04
Jerad King
you can just run the program and trust that it will know how to properly handle the datasets. However, there's a problem with that. Posting large data sets to ArcGIS Online can be challenging. ArcGIS Online actually has a hard coded threshold in their backend for the size of tabular data, and if that threshold is exceeded, then your request or the job you submit will be throttled.
00:32:04:06 - 00:32:33:00
Jerad King
And this can be quite extreme if, say, you're posting a 4 GB structured tabular dataset. But this isn’t only something that's affected by ArcGIS Online. You can also overrun ERDDAP’s memory limit if you request a dataset that's too large. So, the solution is to see if a dataset exceeds that threshold, and then we divide it into chunks and post the chunks to ArcGIS Online.
00:32:33:02 - 00:32:59:02
Jerad King
However, there's a challenge with that, right? Because we established that ArcGIS throttles by rows, but for tabledap we're requesting our data by time. So this is where, you know, the program does a little bit of voodoo and estimation. With that comes estimating the time for each chunk based upon rows and the time range.
00:32:59:04 - 00:33:24:18
Jerad King
So after getting the dataset attribute structure and taking out the relevant metadata, we request a netCDF header, and from the netCDF header, we can get the number of rows. If the number of rows exceeds the ArcGIS Online threshold of 50,000, then we calculate the subset. So with the total seconds divided by the number of rows, we get seconds per row,
00:33:24:20 - 00:33:47:08
Jerad King
multiply that by the row threshold, and we have our seconds per chunk, then you can build your start and end time. You know, that is predicated on the fact that there won’t be any gaps in the data, but nonetheless, if there are gaps in the data, the worst that we're doing is overestimating, right? We won't be underestimating.
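The estimation described here can be sketched as follows. Assumptions: the header text contains a dimension line of the form `row = N;`, rows are spread evenly over the time range, and the function names are hypothetical; only the 50,000-row threshold comes from the talk.

```python
import re
from datetime import datetime, timedelta, timezone

ROW_THRESHOLD = 50_000  # row limit quoted in the talk


def rows_from_header(header_text: str) -> int:
    """Pull the row count from header text containing a line like 'row = 123;'."""
    match = re.search(r"\brow\s*=\s*(\d+)\s*;", header_text)
    if match is None:
        raise ValueError("no row dimension found in header")
    return int(match.group(1))


def time_chunks(start, end, n_rows, threshold=ROW_THRESHOLD):
    """Split [start, end] into windows expected to hold about `threshold` rows."""
    if n_rows <= threshold or end <= start:
        return [(start, end)]
    seconds_per_row = (end - start).total_seconds() / n_rows
    chunk = timedelta(seconds=seconds_per_row * threshold)
    windows, cursor = [], start
    while cursor < end:
        upper = min(cursor + chunk, end)  # clamp the final window to the end time
        windows.append((cursor, upper))
        cursor = upper
    return windows


t0 = datetime(2025, 1, 1, tzinfo=timezone.utc)
n = rows_from_header("dimensions:\n  row = 150000;\n")
for lo, hi in time_chunks(t0, t0 + timedelta(seconds=300_000), n):
    print(lo.isoformat(), hi.isoformat())
```

Adjacent windows share a boundary timestamp, so a real request would need a strict upper bound on all but the last chunk to avoid fetching boundary rows twice.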
00:33:47:10 - 00:34:12:03
Jerad King
So what that looks like for the ArcGIS API is, as we saw earlier in that executive run function, we have this post and published method that's configured to handle both the individual and subset scenarios. So in the case of a subset scenario, what we do is we add that first file, and then publish a hosted feature layer from that first file.
00:34:12:04 - 00:34:29:15
Jerad King
We get that hosted feature layer, then append the subsequent records to that dataset. While this would seem like it wouldn't be faster than just posting the file outright, it helps quite a bit.
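The publish-then-append flow can be sketched without tying it to a specific API. Here `publish` and `append` are hypothetical callables standing in for whatever ArcGIS calls the program makes; the sketch shows only the ordering: the first chunk creates the layer and every later chunk is appended to it.

```python
from typing import Callable, Iterable, TypeVar

Layer = TypeVar("Layer")


def publish_in_chunks(
    chunk_files: Iterable[str],
    publish: Callable[[str], Layer],
    append: Callable[[Layer, str], None],
) -> Layer:
    """Publish the first chunk as a new layer, then append the remaining chunks."""
    files = iter(chunk_files)
    try:
        first = next(files)
    except StopIteration:
        raise ValueError("at least one chunk file is required") from None
    layer = publish(first)
    for path in files:
        append(layer, path)
    return layer


# Stand-in callables that just record what would happen.
log = []
publish_in_chunks(
    ["part1.csv", "part2.csv"],
    publish=lambda path: log.append(("publish", path)) or "layer-1",
    append=lambda layer, path: log.append(("append", layer, path)),
)
print(log)
```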
00:34:29:17 - 00:34:45:14
Jerad King
Alright. So, we'll go back to our notebook.
00:34:45:16 - 00:35:11:08
Jerad King
And we're going to create ERDDAP datasets. So now we know that the list of datasets we see will now be sorted by recency. Let's select the GCOOS Biological and Socioeconomics server, so we'll say 30. We say yes. So this is an example of a very real need.
00:35:11:10 - 00:35:37:10
Jerad King
We were asked to publish all of our CAGES trawl line studies to ArcGIS Online. Now, that's a lot of datasets. You could write a one off script to do that, or you could use something like this program. So, in order to do that, you could say something like one through 50, and that would then add one through 50, all of those datasets.
00:35:37:10 - 00:36:07:02
Jerad King
But for demonstration’s sake, we're gonna say 1 to 3. So we add the first three data sets there and, you know, you otherwise could search next and view the remaining pages, and then add datasets here, but we're gonna say we're all done. So, you can see that we now have visibility of that NC header step,
00:36:07:03 - 00:36:34:08
Jerad King
because when we create that near real time data set, because we know we're operating with a limited range, we won't be requesting the size of the data. So in this case we connected, we have added the first item, and we are on that publishing step. And then we can go over to our content,
00:36:34:10 - 00:36:41:09
Jerad King
and check out that hosted feature layer we just created.
00:36:41:11 - 00:36:57:19
Jerad King
And you can see we have a rich text snippet and description for terms of use, or tags what's included in the dataset. Let's take a peek at it in Map Viewer.
00:36:57:21 - 00:37:09:21
Jerad King
So now we have these CAGES trawl line survey datasets in ArcGIS Online, just like that.
00:37:09:23 - 00:37:36:15
Jerad King
And then finally I'd like to talk a little bit about the glider data on ArcGIS Online. So like I said, we take those tabular datasets and we create line segments between each measurement, and within those line segments, we bin the properties or the readings from the sensor, using a last-observation-carried-forward approach, into the constructed line segment.
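[Editor's sketch] The segment construction described above can be illustrated roughly as follows. This is a simplified sketch under stated assumptions, not the program's implementation: the function name `build_segments`, the column names, and the output dictionary layout are all hypothetical. Each segment joins two consecutive measurements and takes its attributes from the starting measurement, with any missing reading filled by the most recent earlier value.

```python
# Assumed coordinate column names; real glider datasets may differ.
COORDS = ("longitude", "latitude", "depth")

def build_segments(rows):
    """Turn ordered measurement rows into two-point line segments.

    rows: list of dicts holding the COORDS keys plus sensor readings,
    where None marks a missing reading.
    """
    carried = {}   # last observed value for each sensor reading
    segments = []
    for start, end in zip(rows, rows[1:]):
        # Update the carried-forward values with whatever this row observed.
        for key, value in start.items():
            if key not in COORDS and value is not None:
                carried[key] = value
        segments.append({
            "path": [[start[c] for c in COORDS], [end[c] for c in COORDS]],
            "attributes": dict(carried),  # copy, so later updates don't leak back
        })
    return segments
```

A mission with N measurements yields N - 1 segments, which is why long missions produce large layers and why a single multiline with one set of properties is being considered as an alternative.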
00:37:36:17 - 00:38:00:21
Jerad King
Now, glider missions can have a lot of data, so we're thinking about different ways to do this approach. Perhaps not binning into every line segment, but perhaps just a multiline with one set of properties. But that is a capability that is offered and, you know, hints toward that three-dimensional component where you do have depth
00:38:00:23 - 00:38:31:23
Jerad King
and depth is important. So, as ArcGIS Online grows to be more accommodating for such analysis, I'm excited to see where datasets like this can go and what insights they can unlock. But until then, you can visualize and chart glider data directly in Map Viewer, which is quite convenient to have all at your fingertips just within that viewer.
00:38:32:00 - 00:39:01:02
Jerad King
So kind of wrapping up here and talking about our roadmap, this is more focused on development priorities rather than saying we do this at a certain date. Our current focus for the next development cycle will be griddap. So like I mentioned earlier, the program is currently limited to tabledap. However, that takes away a, you know, very sizable chunk of available datasets that we can use.
00:39:01:04 - 00:39:28:24
Jerad King
I'm not a huge fan of interfacing with tiled imagery layers in ArcGIS Online, so ideally, we would like to vectorize that griddap data, and to meet some additional requirements, we'll enhance that glider add feature with an n-day moving window so that you can specify what your overwrite period is. And then with Task Scheduler, you'll be able to set update frequency.
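[Editor's sketch] One way an n-day moving window could translate into an ERDDAP tabledap request is shown below. This is only an illustration of the idea: the function name `moving_window_url`, the example server URL, and the dataset ID are placeholders, and the planned feature may work differently. The sketch relies on the standard tabledap pattern of a variable list followed by `&` constraints, with `>` percent-encoded.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import quote

def moving_window_url(server, dataset_id, variables, n_days, now=None):
    """Build a tabledap CSV URL limited to the last n_days of data.

    Hypothetical helper; server and dataset_id are supplied by the caller.
    """
    now = now or datetime.now(timezone.utc)
    start = (now - timedelta(days=n_days)).strftime("%Y-%m-%dT%H:%M:%SZ")
    # Percent-encode '>' but leave '=' and ':' readable.
    constraint = quote(f"time>={start}", safe="=:")
    return f"{server}/tabledap/{dataset_id}.csv?{','.join(variables)}&{constraint}"
```

Running the same request on a schedule with `n_days=7`, for example, would always fetch the most recent week, which could then replace the contents of the hosted feature layer.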
00:39:29:01 - 00:39:54:07
Jerad King
There comes a point, though, where using, say, overwrite feature service might not make sense because you're overwriting a large amount of data. An append operation might be a better choice. And that's exactly what is the first bullet point for midterm development focuses. Something that's really important in GIS and queries in general is spatial queries.
00:39:54:09 - 00:40:21:01
Jerad King
And how do we pass that to the ERDDAP API? One of the ideas is to simply reference an existing content item on ArcGIS Online that provides a bounding box, then use the API to find what that bounding box is and encode it into the request. Additionally, greater user control over publication parameters.
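[Editor's sketch] The bounding box idea mentioned above could be encoded into a tabledap query roughly like this. It is a sketch under assumptions, not a planned implementation: the function name `bbox_constraints` is hypothetical, the extent is assumed to arrive as `[[xmin, ymin], [xmax, ymax]]` in decimal degrees, and the dataset is assumed to expose `longitude` and `latitude` variables.

```python
from urllib.parse import quote

def bbox_constraints(extent):
    """Convert [[xmin, ymin], [xmax, ymax]] into tabledap constraint text.

    Hypothetical helper; the extent layout is an assumption for this example.
    """
    (xmin, ymin), (xmax, ymax) = extent
    parts = [
        f"longitude>={xmin}",
        f"longitude<={xmax}",
        f"latitude>={ymin}",
        f"latitude<={ymax}",
    ]
    # Percent-encode '<' and '>' and prefix each constraint with '&'.
    return "".join("&" + quote(p, safe="=") for p in parts)
```

The returned string can be appended to a tabledap request after the variable list, so only records inside the referenced item's extent are downloaded.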
00:40:21:03 - 00:40:50:17
Jerad King
So currently when you add a file, the file that's added is shared at an individual level, and the hosted feature layer that is created from that file is shared with everyone. But I imagine something in the command line user interface such as this where you'll be able to adjust those permissions, as well as that title that I had mentioned earlier. Rather than using the dataset ID as the title, use the dataset title.
00:40:50:19 - 00:41:15:07
Jerad King
Another key component will be creating views for the datasets, the near real time datasets, so that if say our update process does hang, then we know we can revert back to that view for a dataset and we’ll preserve our dataset attributes and metadata. And that kind of brings us to like what version one looks like.
00:41:15:08 - 00:41:43:03
Jerad King
And that's a full command line user interface with these features mentioned here, as well as programmatic workflows that would make sense to use. So, with that, that kind of just about wraps it up. Here's some contact information, links to our resources. Once again, thank you very much for listening and I hope this program can help.
00:41:43:05 - 00:41:47:07
Matt Biddle
Awesome. Thank you so much, Jerad. That was a fantastic presentation.