Merge branch 'develop'

master
Chris Leaman 6 years ago
commit 2a27f4e535

@@ -1,30 +1,56 @@
# 2016 Narrabeen Storm EWS Performance
This repository investigates whether the storm impacts (i.e. Sallenger, 2000) of the June 2016 Narrabeen Storm could
have been forecasted in advance.
## Repository and analysis format
This repository follows the [Cookiecutter Data Science](https://drivendata.github.io/cookiecutter-data-science/)
structure where possible. The analysis is done in Python (see the `/src/` folder) with some interactive,
exploratory notebooks located at `/notebooks`.

Development is conducted using a [gitflow](https://www.atlassian.com/git/tutorials/comparing-workflows/gitflow-workflow)
approach - mainly, the `master` branch stores the official release history and the `develop` branch serves as an
integration branch for features. Other `hotfix` and `feature` branches should be created and merged as necessary.
## Where to start?
1. Clone this repository.
2. Pull data from the WRL coastal J drive with `make pull-data`.
3. Check out the jupyter notebook `./notebooks/01_exploration.ipynb`, which has an example of how to import the data
   and some interactive widgets. A minimal sketch of the data import is also shown below.
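
For example, once the data has been pulled, the interim `.csv` outputs can be loaded directly with pandas. A minimal
sketch (which files are present depends on which processing steps have been run):

```python
import pandas as pd

# Profiles are indexed by (site_id, profile_type, x), as written by the scripts in /src/.
df_profiles = pd.read_csv('./data/interim/profiles.csv', index_col=[0, 1, 2])

# Observed impacts are indexed by site_id.
df_observed = pd.read_csv('./data/interim/impacts_observed.csv', index_col=[0])

print(df_profiles.head())
```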
## Requirements
The following requirements are needed to run various bits:
- [Python 3.6+](https://conda.io/docs/user-guide/install/windows.html): Used for processing and analysing data.
  Jupyter notebooks are used for exploratory analysis and communication.
- [QGIS](https://www.qgis.org/en/site/forusers/download): Used for looking at raw LIDAR pre/post storm surveys and
  extracting dune crests/toes.
- [rclone](https://rclone.org/downloads/): Data is not tracked by this repository, but is backed up to a remote
  Chris Leaman working directory located on the WRL coastal drive. Rclone is used to sync local and remote copies.
  Ensure rclone.exe is located on your `PATH` environment variable.
- [gnuMake](http://gnuwin32.sourceforge.net/packages/make.htm): A list of commands for processing data is provided in
  the `./Makefile`. Use gnuMake to launch these commands. Ensure make.exe is located on your `PATH` environment
  variable.
## Available data
Raw, interim and processed data used in this analysis is kept in the `/data/` folder. Data is not tracked in the
repository due to size constraints, but is stored locally. A mirror is kept on the coastal J drive which you can
push/pull to using rclone. To get the data, run `make pull-data`.
List of data:
- `/data/raw/processed_shorelines`: This data was received from Tom Beuzen in October 2018. It consists of pre/post
  storm profiles at every 100 m section along beaches ranging from Dee Why to Nambucca. Profiles are based on raw
  aerial LIDAR and were processed by Mitch Harley. Tides and waves (10 m contour and reverse-shoaled deepwater) for
  each individual 100 m section are also provided.
- `/data/raw/raw_lidar`: This is the raw pre/post storm aerial LIDAR which was taken for the June 2016 storm. `.las`
  files are the raw files, which have been processed into `.tiff` files using `PDAL`. Note that these files have not
  been corrected for systematic errors, so actual elevations should be taken from the `processed_shorelines` folder.
  Obtained November 2018 from Mitch Harley from the black external HDD labeled "UNSW LIDAR".
- `/data/raw/profile_features`: Dune toe and crest locations based on prestorm LIDAR. Refer to `/notebooks/qgis.qgz`,
  which shows how they were manually extracted. Note that the shapefiles only show the location (lat/lon) of the dune
  crest and toe. For actual elevations, these locations need to be related to the processed shorelines.
## Notebooks
- `/notebooks/01_exploration.ipynb`: Shows how to import processed shorelines, waves and tides. An interactive widget
  plots the location and cross sections.
- `/notebooks/qgis.qgz`: A QGIS file which is used to explore the aerial LIDAR data in `/data/raw/raw_lidar`. By
  examining the pre-storm LIDAR, dune crest and dune toe lines are manually extracted. These are stored in
  `/data/profile_features/`.

File diff suppressed because one or more lines are too long

@@ -0,0 +1,32 @@
"""
Compares forecasted and observed impacts, putting them into one data frame and exporting the results.
"""
import logging.config
import os
import pandas as pd
logging.config.fileConfig('./src/logging.conf', disable_existing_loggers=False)
logger = logging.getLogger(__name__)
def compare_impacts(df_forecasted, df_observed):
"""
Merge forecasted and observed storm impacts
:param df_forecasted:
:param df_observed:
:return:
"""
df_compared = df_forecasted.merge(df_observed, left_index=True, right_index=True,
suffixes=['_forecasted', '_observed'])
return df_compared
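

# A minimal sketch (hypothetical site_id values) of how the suffix merge behaves:
# two frames sharing a column name come out with '_forecasted'/'_observed' suffixes.
def _example_compare_impacts():
    idx = pd.Index(['NARRA0001', 'NARRA0002'], name='site_id')
    df_f = pd.DataFrame({'storm_regime': ['swash', 'collision']}, index=idx)
    df_o = pd.DataFrame({'storm_regime': ['swash', 'swash']}, index=idx)
    # Result columns: storm_regime_forecasted, storm_regime_observed
    return compare_impacts(df_f, df_o)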
if __name__ == '__main__':
    logger.info('Importing existing data')
    data_folder = './data/interim'
    df_forecasted = pd.read_csv(os.path.join(data_folder, 'impacts_forecasted_mean_slope_sto06.csv'), index_col=[0])
    df_observed = pd.read_csv(os.path.join(data_folder, 'impacts_observed.csv'), index_col=[0])

    df_compared = compare_impacts(df_forecasted, df_observed)
    df_compared.to_csv(os.path.join(data_folder, 'impacts_observed_vs_forecasted_mean_slope_sto06.csv'))

@@ -0,0 +1,73 @@
"""
Estimates the forecasted storm impacts based on the forecasted water level and dune crest/toe.
"""
import logging.config
import os
import pandas as pd
logging.config.fileConfig('./src/logging.conf', disable_existing_loggers=False)
logger = logging.getLogger(__name__)
def forecasted_impacts(df_profile_features, df_forecasted_twl):
"""
Combines our profile features (containing dune toes and crests) with water levels, to get the forecasted storm
impacts.
:param df_profile_features:
:param df_forecasted_twl:
:return:
"""
logger.info('Getting forecasted storm regimes')
df_forecasted_impacts = pd.DataFrame(index=df_profile_features.index)
# For each site, find the maximum R_high value and the corresponding R_low value.
idx = df_forecasted_twl.groupby(level=['site_id'])['R_high'].idxmax().dropna()
df_r_vals = df_forecasted_twl.loc[idx, ['R_high', 'R_low']].reset_index(['datetime'])
df_forecasted_impacts = df_forecasted_impacts.merge(df_r_vals, how='left', left_index=True, right_index=True)
# Join with df_profile features to find dune toe and crest elevations
df_forecasted_impacts = df_forecasted_impacts.merge(df_profile_features[['dune_toe_z', 'dune_crest_z']],
how='left',
left_index=True,
right_index=True)
# Compare R_high and R_low wirth dune crest and toe elevations
df_forecasted_impacts = storm_regime(df_forecasted_impacts)
return df_forecasted_impacts
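

# A minimal sketch (hypothetical values) of the groupby/idxmax step above: for a
# (site_id, datetime)-indexed TWL frame, idxmax returns, per site, the index of
# the row with peak R_high, which .loc then uses to pull the matching R_low.
def _example_peak_twl():
    idx = pd.MultiIndex.from_tuples([('NARRA0001', '2016-06-05'), ('NARRA0001', '2016-06-06')],
                                    names=['site_id', 'datetime'])
    df_twl = pd.DataFrame({'R_high': [2.1, 3.4], 'R_low': [1.0, 1.8]}, index=idx)
    peak_idx = df_twl.groupby(level=['site_id'])['R_high'].idxmax().dropna()
    # Returns the 2016-06-06 row (R_high=3.4, R_low=1.8), with datetime as a column.
    return df_twl.loc[peak_idx, ['R_high', 'R_low']].reset_index(['datetime'])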
def storm_regime(df_forecasted_impacts):
    """
    Returns the dataframe with an additional column of storm impacts based on the Storm Impact Scale. Refer to
    Sallenger (2000) for details.
    :param df_forecasted_impacts:
    :return:
    """
    logger.info('Getting forecasted storm regimes')

    # Regimes are assigned in order of increasing severity, so a later (more severe)
    # condition overwrites an earlier one where both are satisfied.
    df_forecasted_impacts.loc[
        df_forecasted_impacts.R_high <= df_forecasted_impacts.dune_toe_z, 'storm_regime'] = 'swash'
    df_forecasted_impacts.loc[
        df_forecasted_impacts.dune_toe_z <= df_forecasted_impacts.R_high, 'storm_regime'] = 'collision'
    df_forecasted_impacts.loc[(df_forecasted_impacts.dune_crest_z <= df_forecasted_impacts.R_high) &
                              (df_forecasted_impacts.R_low <= df_forecasted_impacts.dune_crest_z),
                              'storm_regime'] = 'overwash'
    df_forecasted_impacts.loc[(df_forecasted_impacts.dune_crest_z <= df_forecasted_impacts.R_low) &
                              (df_forecasted_impacts.dune_crest_z <= df_forecasted_impacts.R_high),
                              'storm_regime'] = 'inundation'

    return df_forecasted_impacts
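

# A worked toy example (hypothetical elevations): with dune_toe_z=2.0 and
# dune_crest_z=4.0, R_high below the toe gives 'swash', R_high above the toe gives
# 'collision', R_high above the crest gives 'overwash', and R_low above the crest
# gives 'inundation'.
def _example_storm_regime():
    df = pd.DataFrame({'R_high': [1.5, 3.0, 4.5, 5.5],
                       'R_low': [0.5, 1.0, 3.5, 4.5],
                       'dune_toe_z': [2.0] * 4,
                       'dune_crest_z': [4.0] * 4})
    # Expected storm_regime: ['swash', 'collision', 'overwash', 'inundation']
    return storm_regime(df)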
if __name__ == '__main__':
    logger.info('Importing existing data')
    data_folder = './data/interim'
    df_profiles = pd.read_csv(os.path.join(data_folder, 'profiles.csv'), index_col=[0, 1, 2])
    df_profile_features = pd.read_csv(os.path.join(data_folder, 'profile_features.csv'), index_col=[0])
    df_forecasted_twl = pd.read_csv(os.path.join(data_folder, 'twl_mean_slope_sto06.csv'), index_col=[0, 1])

    df_forecasted_impacts = forecasted_impacts(df_profile_features, df_forecasted_twl)
    df_forecasted_impacts.to_csv(os.path.join(data_folder, 'impacts_forecasted_mean_slope_sto06.csv'))

@@ -0,0 +1,137 @@
import logging.config
import os

import numpy as np
import pandas as pd
from scipy.integrate import simps

logging.config.fileConfig('./src/logging.conf', disable_existing_loggers=False)
logger = logging.getLogger(__name__)


def return_first_or_nan(l):
    """
    Returns the first value of a list, or nan if the list is empty. Used for getting dune toe and crest values.
    :param l:
    :return:
    """
    if len(l) == 0:
        return np.nan
    else:
        return l[0]
def volume_change(df_profiles, df_profile_features, zone):
    """
    Calculates the volume change between prestorm and poststorm profiles.
    :param df_profiles:
    :param df_profile_features:
    :param zone: Either 'swash' or 'dune_face'
    :return:
    """
    logger.info('Calculating change in beach volume in {} zone'.format(zone))

    df_vol_changes = pd.DataFrame(index=df_profile_features.index)
    df_profiles = df_profiles.sort_index()
    sites = df_profiles.groupby(level=['site_id'])

    for site_id, df_site in sites:
        logger.debug('Calculating change in beach volume at {} in {} zone'.format(site_id, zone))
        prestorm_dune_toe_x = df_profile_features.loc[df_profile_features.index == site_id].dune_toe_x.tolist()
        prestorm_dune_crest_x = df_profile_features.loc[df_profile_features.index == site_id].dune_crest_x.tolist()

        # We may not have a dune toe or crest defined, or there may be multiple defined.
        prestorm_dune_crest_x = return_first_or_nan(prestorm_dune_crest_x)
        prestorm_dune_toe_x = return_first_or_nan(prestorm_dune_toe_x)

        # If no dune toe has been defined, Dlow = Dhigh. Refer to Sallenger (2000).
        if np.isnan(prestorm_dune_toe_x):
            prestorm_dune_toe_x = prestorm_dune_crest_x

        # Find the last x coordinate where we have both prestorm and poststorm measurements. If we don't do this,
        # the prestorm and poststorm volumes are going to be calculated over different lengths.
        df_zone = df_site.dropna(subset=['z'])
        x_last_obs = min([max(df_zone.query("profile_type == '{}'".format(profile_type)).index.get_level_values('x'))
                          for profile_type in ['prestorm', 'poststorm']])

        # Where we measure pre and post storm volume depends on the zone selected.
        if zone == 'swash':
            x_min = prestorm_dune_toe_x
            x_max = x_last_obs
        elif zone == 'dune_face':
            x_min = prestorm_dune_crest_x
            x_max = prestorm_dune_toe_x
        else:
            logger.warning('Zone argument not properly specified. Please check')
            x_min = None
            x_max = None

        # Now, compute the volume of sand between x_min and x_max for both the prestorm and poststorm profiles.
        prestorm_vol = beach_volume(x=df_zone.query("profile_type=='prestorm'").index.get_level_values('x'),
                                    z=df_zone.query("profile_type=='prestorm'").z,
                                    x_min=x_min,
                                    x_max=x_max)
        poststorm_vol = beach_volume(x=df_zone.query("profile_type=='poststorm'").index.get_level_values('x'),
                                     z=df_zone.query("profile_type=='poststorm'").z,
                                     x_min=x_min,
                                     x_max=x_max)

        df_vol_changes.loc[site_id, 'prestorm_{}_vol'.format(zone)] = prestorm_vol
        df_vol_changes.loc[site_id, 'poststorm_{}_vol'.format(zone)] = poststorm_vol
        df_vol_changes.loc[site_id, '{}_vol_change'.format(zone)] = prestorm_vol - poststorm_vol

    return df_vol_changes
def beach_volume(x, z, x_min=np.NINF, x_max=np.inf):
    """
    Returns the beach volume of a profile, calculated with Simpson's rule.
    :param x: x-coordinates of beach profile
    :param z: z-coordinates of beach profile
    :param x_min: Minimum x-coordinate to consider when calculating volume
    :param x_max: Maximum x-coordinate to consider when calculating volume
    :return:
    """
    profile_mask = [x_min < x_coord < x_max for x_coord in x]
    x_masked = np.array(x)[profile_mask]
    z_masked = np.array(z)[profile_mask]

    if len(x_masked) == 0 or len(z_masked) == 0:
        return np.nan
    else:
        return simps(z_masked, x_masked)
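

# A quick sanity check (hypothetical profile): integrating a constant z = 2 m over
# a 10 m wide strip with Simpson's rule should give ~20 m^3 per metre of beach.
def _example_beach_volume():
    x = np.linspace(0, 10, 11)
    z = np.full_like(x, 2.0)
    return beach_volume(x, z)  # ~20.0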
def storm_regime(df_observed_impacts):
    """
    Returns the dataframe with an additional column of storm impacts based on the Storm Impact Scale. Refer to
    Sallenger (2000) for details.
    :param df_observed_impacts:
    :return:
    """
    logger.info('Getting observed storm regimes')
    df_observed_impacts.loc[df_observed_impacts.swash_vol_change < 3, 'storm_regime'] = 'swash'
    df_observed_impacts.loc[df_observed_impacts.dune_face_vol_change > 3, 'storm_regime'] = 'collision'
    return df_observed_impacts
if __name__ == '__main__':
    logger.info('Importing existing data')
    data_folder = './data/interim'
    df_profiles = pd.read_csv(os.path.join(data_folder, 'profiles.csv'), index_col=[0, 1, 2])
    df_profile_features = pd.read_csv(os.path.join(data_folder, 'profile_features.csv'), index_col=[0])

    logger.info('Creating new dataframe for observed impacts')
    df_observed_impacts = pd.DataFrame(index=df_profile_features.index)

    logger.info('Getting pre/post storm volumes')
    df_swash_vol_changes = volume_change(df_profiles, df_profile_features, zone='swash')
    df_dune_face_vol_changes = volume_change(df_profiles, df_profile_features, zone='dune_face')
    df_observed_impacts = df_observed_impacts.join([df_swash_vol_changes, df_dune_face_vol_changes])

    # Classify regime based on volume changes
    df_observed_impacts = storm_regime(df_observed_impacts)

    # Save dataframe to csv
    df_observed_impacts.to_csv(os.path.join(data_folder, 'impacts_observed.csv'))

@@ -7,6 +7,7 @@ from datetime import datetime, timedelta
import pandas as pd
from mat4py import loadmat
import numpy as np

logging.config.fileConfig('./src/logging.conf', disable_existing_loggers=False)
logger = logging.getLogger(__name__)
@@ -152,6 +153,25 @@ def parse_profiles(profiles_mat):
    df = pd.DataFrame(rows)
    return df
def remove_zeros(df_profiles):
    """
    When parsing the pre/post storm profiles, the end of some profiles have constant values of zero. Change these
    to NaNs for consistency. A blanket zero-to-NaN replacement isn't used because 0 may still be a valid elevation
    elsewhere in the profile; only the trailing zeros are changed.
    :param df_profiles:
    :return:
    """
    df_profiles = df_profiles.sort_index()
    groups = df_profiles.groupby(level=['site_id', 'profile_type'])
    for key, _ in groups:
        logger.debug('Removing zeros from {} profile at {}'.format(key[1], key[0]))
        idx_site = (df_profiles.index.get_level_values('site_id') == key[0]) & \
                   (df_profiles.index.get_level_values('profile_type') == key[1])
        df_profile = df_profiles[idx_site]

        # Find the last x with a non-zero elevation, then NaN everything seaward of it.
        x_last_ele = df_profile[df_profile.z != 0].index.get_level_values('x')[-1]
        df_profiles.loc[idx_site & (df_profiles.index.get_level_values('x') > x_last_ele), 'z'] = np.nan

    return df_profiles
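

# A minimal sketch (hypothetical single-site frame) of the trailing-zero cleanup:
# only zeros seaward of the last non-zero elevation become NaN.
def _example_remove_zeros():
    idx = pd.MultiIndex.from_product([['NARRA0001'], ['prestorm'], [0, 1, 2, 3]],
                                     names=['site_id', 'profile_type', 'x'])
    df = pd.DataFrame({'z': [5.0, 0.0, 2.0, 0.0]}, index=idx)
    # z becomes [5.0, 0.0, 2.0, nan]: the mid-profile zero is kept.
    return remove_zeros(df)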
def matlab_datenum_to_datetime(matlab_datenum):
    # https://stackoverflow.com/a/13965852

@@ -228,6 +248,9 @@ def main():
    df_tides.set_index(['site_id', 'datetime'], inplace=True)
    df_sites.set_index(['site_id'], inplace=True)

    logger.info('Nanning profile zero elevations')
    df_profiles = remove_zeros(df_profiles)

    logger.info('Outputting .csv files')
    df_profiles.to_csv('./data/interim/profiles.csv')
    df_tides.to_csv('./data/interim/tides.csv')
