This repository investigates whether the storm impacts ([Sallenger, 2000](https://www.jstor.org/stable/4300099#metadata_info_tab_contents)) of the June 2016 Narrabeen Storm could have been forecasted in advance. At 100 m intervals along each beach, we hindcast the storm impact as one of the four regimes defined by Sallenger (2000): swash, collision, overwash or inundation.
![image](https://i.imgur.com/urMx8Yx.jpg)
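As a rough illustration of how each 100 m section is assigned to one of these regimes, the sketch below compares runup levels against the dune toe and crest elevations, following the definitions in Sallenger (2000). The variable names are placeholders for illustration and are not taken from this repository's code.

```python
def storm_impact_regime(r_high, r_low, d_low, d_high):
    """Classify storm impact following Sallenger (2000).

    r_high, r_low: elevations reached by the high and low runup limits (m)
    d_low, d_high: elevations of the dune toe and dune crest (m)
    """
    if r_low > d_high:
        return "inundation"  # even the runup lows exceed the dune crest
    elif r_high > d_high:
        return "overwash"    # runup overtops the dune crest
    elif r_high > d_low:
        return "collision"   # runup reaches the dune face
    else:
        return "swash"       # runup stays below the dune toe

# Example: runup peaking at 3.2 m against a 2.5 m dune toe and 5.0 m crest
print(storm_impact_regime(r_high=3.2, r_low=1.1, d_low=2.5, d_high=5.0))  # collision
```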
## Repository and analysis format
This repository follows the [Cookiecutter Data Science](https://drivendata.github.io/cookiecutter-data-science/) project structure.
Development is conducted using a [gitflow](https://www.atlassian.com/git/tutorials/comparing-workflows/gitflow-workflow) workflow.
#### Getting software requirements
The following requirements are needed to run various bits:
- [Anaconda](https://www.anaconda.com/download/): Used for processing and analysing data. The Anaconda distribution is used for managing environments and is available for Windows, Mac and Linux. Jupyter notebooks are used for exploratory analysis and communication.
- [QGIS](https://www.qgis.org/en/site/forusers/download): Used for looking at raw LIDAR pre/post storm surveys and extracting dune crests/toes
- [rclone](https://rclone.org/downloads/): Data is not tracked by this repository, but is backed up to a remote Chris Leaman working directory located on the WRL coastal drive. Rclone is used to sync local and remote copies. Ensure `rclone.exe` is located on your `PATH`.
- [gnuMake](http://gnuwin32.sourceforge.net/packages/make.htm): A list of commands for processing data is provided in the `./Makefile`. Use gnuMake to launch these commands. Ensure `make.exe` is located on your `PATH`.
- [git](https://git-scm.com/book/en/v2/Getting-Started-Installing-Git): You'll need to have git installed to push and pull from this repo. If you're not familiar with the command line usage of git, [Git Extensions](http://gitextensions.github.io/) is a Windows based GUI which makes it easier to work with git. There are a whole bunch of other [git clients](https://github.com/dictcp/awesome-git#client) that are available as well.
#### Getting the repository
Clone the repository into your local environment.
Commands for setting up the Python environment are provided in the `Makefile`. Simply run the following commands in the repo root directory, ensuring `make` is located on your `PATH`:
```sh
make venv-init
make venv-activate
make venv-requirements-install
```
You can see what these commands are actually running by inspecting the `Makefile`.
#### Pull data
The actual raw, interim and processed data are not tracked by the repository as part of good git practices. A copy of the raw data is stored on the WRL Coastal `J:\` drive and can be copied using the following command.
```sh
make pull-data
```
If you have updated the data and want to copy it back to the `J:\` drive, use the following command. Note that it is probably not a good idea to modify data stored in `./data/raw/`.
```sh
make push-data
```
#### View notebooks
Jupyter notebooks have been set up to help explore the data and do preliminary analysis. Once you have set up your environment and pulled the data, this is probably a good place to start. To run the notebooks, use the following command and navigate to the `./notebooks/` folder once the Jupyter interface opens in your web browser.
```sh
make notebook
```
In order to allow notebooks to be version controlled, [nbstripout](https://github.com/kynan/nbstripout) has been installed as a git filter. It will run automatically when committing any changes to the notebooks and strip out their outputs.
## Available data
Raw, interim and processed data used in this analysis is kept in the `./data/` folder. Data is not tracked in the repository due to size constraints, but stored locally. A mirror is kept on the WRL Coastal `J:\` drive, which you can push to and pull from using rclone. In order to get the data, run `make pull-data`.
List of data:
- `./data/raw/grain_size/`: The `sites_grain_size.csv` file contains the D50 grain size of each beach as well as the references for where these values were taken from. Grain size is needed to estimate wave runup using the Power et al. (2018) runup model.
- `./data/raw/land_lims/`: Not used (?) CKL to check.
- `./data/raw/near_maps/`: This folder contains aerial imagery of some of the beaches taken from Nearmaps. It can be loaded into QGIS and examined to determine storm impacts by comparing pre and post storm images.
- `./data/raw/processed_shorelines/`: This data was received from Tom Beuzen in October 2018. It consists of pre/post storm profiles at 100 m intervals along beaches ranging from Dee Why to Nambucca. Profiles are based on raw aerial LIDAR and were processed by Mitch Harley. Tides and waves (10 m contour and reverse shoaled deepwater) for each individual 100 m section are also provided; a sketch of how these files might be loaded is given after this list.
- `./data/raw/profile_features/`: Dune toe and crest locations based on prestorm LIDAR. Refer to `./notebooks/qgis.qgz`, which shows how they were manually extracted. Note that the shapefiles only show the location (lat/lon) of the dune crest and toe. For actual elevations, these locations need to be related to the processed shorelines.
- `./data/raw/profile_features_chris_leaman/`: An Excel file containing manually selected dune toes, crests, berms and impacts by Chris Leaman. The values in this file should take precedence over values picked by Tom Beuzen.
- `./data/raw/profile_features_tom_beuzen/`: This `.mat` file contains dune toes and crests that Tom Beuzen picked out for each profile. This is used as a basis for the toe/crest locations, but is overridden by data contained in `./data/raw/profile_features_chris_leaman/`.
- `./data/raw/raw_lidar/`: This is the raw pre/post storm aerial LIDAR which was taken for the June 2016 storm. `.las` files are the raw files which have been processed into `.tiff` files using `PDAL`. Note that these files have not been corrected for systematic errors, so actual elevations should be taken from the `processed_shorelines` folder. Obtained November 2018 from Mitch Harley from the black external HDD labeled "UNSW LIDAR".
- `./data/raw/vol_change_kml/`: This data was obtained from Mitch Harley in Feb 2019 and is a `.kml` showing the change in subaerial volume during the storm. It is included for reference only and is not used in the analysis.
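To give a feel for how these files are typically consumed, below is a minimal sketch of loading a profile table and the manually picked dune features with pandas and joining them by site. The file paths and column names (`profiles.csv`, `profile_features.csv`, `site_id`) are assumptions for illustration only; see `./notebooks/01_exploration.ipynb` for how the data is actually imported.

```python
import pandas as pd

# Hypothetical file and column names - the real layout is demonstrated in
# ./notebooks/01_exploration.ipynb
profiles = pd.read_csv("./data/interim/profiles.csv")
dune_features = pd.read_csv("./data/interim/profile_features.csv")

# Attach dune toe/crest information to each 100 m profile by its site id
merged = profiles.merge(dune_features, on="site_id", how="left")
print(merged.head())
```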
## Notebooks
- `./notebooks/01_exploration.ipynb`: Shows how to import processed shorelines, waves and tides. An interactive widget plots the location and cross sections.
- `./notebooks/qgis.qgz`: A QGIS file which is used to explore the aerial LIDAR data in `./data/raw/raw_lidar/`. By examining the pre-storm LIDAR, dune crest and dune toe lines are manually extracted. These are stored in `./data/raw/profile_features/`.
## TODO
- [ ] Mitch updated the raw profiles.mat to include more information about the survey time. Our data scripts should be updated to parse this new information and include it in our dataframes.
- [ ] Set up pre-commit hook for automatic code formatting using [black](https://ljvmiranda921.github.io/notebook/2018/06/21/precommits-using-black-and-flake8/). Low priority as black can be run using the command `make format`.
- [ ] Raw tide WLs are interpolated based on location from tide gauges. This probably isn't the most accurate method, but should have a small effect since surge elevation was low during this event. Need to assess the effect of this method.
- [ ] Estimate max TWL from elevation where pre storm and post storm profiles are the same. Need to think more about this as runup impacting the dune toe will move the dune face back, incorrectly raising the observed TWL. Perhaps this estimation of max TWL is only useful for the swash regime.
- [ ] Implement a [Bayesian change detection algorithm](https://github.com/hildensia/bayesian_changepoint_detection) to help detect dune crests and toes from profiles. Probably low priority at the moment since we are doing manual detection.
- [ ] Implement dune impact calculations as per Palmsten & Holman. Calculation should be done in a new dataframe.
- [ ] Implement `./data/interim/*.csv` file checking using py.test. Check for correct columns, number of nans etc. Testing of code is probably a lower priority than just checking the interim data files at the moment. Some functions which should be tested are the slope functions in `forecast_twl.py`, as these can be tricky with different profiles.
- [ ] Investigate using [modin](https://github.com/modin-project/modin) to help speed up analysis.
- [ ] Convert runup model functions to use numpy arrays instead of pandas dataframes. This should give a bit of a speedup.
- [X] Need to think about how relative imports are handled, see [here](https://chrisyeh96.github.io/2017/08/08/definitive-guide-python-imports.html). Maybe the click CLI interface should be moved to the `./src/` folder and it can import all the other packages?
- [ ] Simplify `runup_models` in Stockdon06 - we should really only have one function for each runup model. Need to make it work with individual values or an entire dataframe. Use [np.maskedarray](https://docs.scipy.org/doc/numpy-1.15.0/reference/maskedarray.generic.html). A vectorized sketch is given after this list.
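For reference, a vectorized sketch of the Stockdon et al. (2006) 2% exceedance runup formula is shown below. It accepts either single values or numpy arrays (e.g. whole dataframe columns), which is the behaviour the runup model TODO item above is after. The function and argument names are illustrative and are not the ones used in this repository.

```python
import numpy as np

def stockdon_2006_r2(hs0, tp, beta):
    """2% exceedance runup (Stockdon et al., 2006) for scalars or arrays.

    hs0:  deepwater significant wave height (m)
    tp:   peak wave period (s)
    beta: foreshore beach slope (-)
    """
    hs0 = np.asarray(hs0, dtype=float)
    tp = np.asarray(tp, dtype=float)
    beta = np.asarray(beta, dtype=float)
    lp = 9.81 * tp ** 2 / (2 * np.pi)  # deepwater wavelength
    setup = 0.35 * beta * np.sqrt(hs0 * lp)
    swash = np.sqrt(hs0 * lp * (0.563 * beta ** 2 + 0.004)) / 2
    return 1.1 * (setup + swash)

# Works on single values or on whole columns of a dataframe
print(stockdon_2006_r2(hs0=3.0, tp=12.0, beta=0.1))
```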
## Misc
If a Qt5Agg error comes up when trying to plot in PyCharm, see https://stackoverflow.com/a/42231526.
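If you don't want to dig into the linked answer, one generic workaround (assuming the problem is just the Qt5Agg backend failing to load) is to force a different matplotlib backend before `pyplot` is imported:

```python
import matplotlib
matplotlib.use("TkAgg")  # or "Agg" for non-interactive use; pick any backend available on your machine

import matplotlib.pyplot as plt

plt.plot([0, 1], [0, 1])
plt.show()
```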