gallery/rafn 2

Tools

TOOLS REACHABLE BY RIGHT CLICKING DATA COLUMN

Add column in front of selected

Creates a new column to the left of the selected column

Add column after selected

Creates a new column to the right of the selected column

Delete Selected columns

Deletes selected columns. Multiple columns can be selected and deleted at once

Copy content of selected column

Copies every row of the selected column to memory

Paste to selected column

Pastes rows saved to memory by using the Copy content of selected column tool into the selected column

Rename Column

Opens an input control that lets user type a new header text for selected column

Set column data format

Changes the data format of the selected column. Format can be set as Text, Boolean, DateTime, Integer, Decimal or Double.

Save selected cell value to memory

Saves the value of selected cell to memory. A maximum of five value items can be saved (slot 1-5). Saved items can be inserted using the Insert from memory tool in the String operations category

Delete rows

Deletes the selected rows. To use this tool, select the Select rows option in the menu above the data area. Don’t forget to change back to Select columns before continuing

First to header

Changes the first row in the data area to the header row. The tool is selected in the menu above the data area

Edit cells manually

Manual cell edit operations will also be saved into current workflow if Save workflow as is checked and a workflow name is entered

GRID OPERATIONS

Delete rows by value

Presents user with a list of all unique values in selected column. Values can be checked in list and rows with checked values are deleted

Delete rows that contain

Rows containing entered string value will be deleted

Delete rows that dont contain

Rows not containing entered string value will be deleted

Delete duplicate rows

Rows with duplicate values will be deleted. Row with the first occurrence of each value will be kept

Add content to column

Content of the selected column will be added to (after) the content of column selected in the Column(x) drop-down menu

Merge columns

Contents of the selected column will be merged with the content of column selected in the drop-down menu showing all columns currently in the data area. Merged parts may be separated by a separator entered in separator field

Calculate frequency

Presents user with a list of all unique values in selected column and a count of occurrences of each value

Insert source name

Inserts into the selected column either the source file name or the complete source file path. If source is a database, SQL-query will be inserted. If source is an Excel file, you may choose to append the sheet name to the file path

Delete row if/if not in memory

Rows with/without string value held in memory (slot 1-5, see Save selected cell value to memory) in the selected column will be deleted

Delete empty rows

Empty rows (containing empty string value) in selected column will be deleted. Rows with NULL values will not be affected

STRING OPERATIONS

Replace

Value of (or sequence of characters in) row in selected column containing string entered in Value to replace textbox will be replaced by value entered in Replacement value textbox. The tool is case sensitive

Limit left

String value in rows of selected column will be truncated to number of characters entered in String length textbox, counting from the left side

Limit right

String value in rows of selected column will be truncated to number of characters entered in String length textbox, counting from the right side

Limit mid

String value in rows of selected column will be truncated to number of characters entered in String length textbox, counting from position entered in Starting position textbox (index starting with 0)

Split

String values in selected columns will be split at the first instance of character entered in the Split character textbox. Second part of the split string (including the split character) is moved to column selected in the Column(x) drop-down menu

Add text/prefix

Text entered in the Text/prefix textbox will be added in front of values of rows in selected column

Add suffix

Text entered in the Text/suffix textbox will be added after values of rows in selected column

Add suffix if contains value

Text entered in the Text/suffix textbox will be added after values of rows in selected column if the rows contains a value

Remove letters

Characters that are letters will be removed from rows in selected columns (special characters are not affected)

Remove non-numeric

Non numerical characters will be removed from rows in selected columns (special characters included)

Remove numeric

Numerical characters will be removed from rows in selected columns

Remove text

Text entered in the Text to remove textbox will be removed from rows in selected columns

Trim

Leading and trailing spaces will be removed from rows in selected columns

Insert from memory

Values held in memory (slot 1-5, see Save selected cell value to memory) will be inserted into every row of selected column

To lowercase

Letters of rows in selected column will be converted to uppercase

To uppercase

Letters of rows in selected column will be converted to lowercase

CONDITIONAL OPERATIONS

Insert in column(x) if

Inserts value entered in the Value textbox into every row of the column chosen in the Column(x) dropdown menu IF the value of the corresponding row in selected column fits condition entered into the Condition textbox with respect to logic selected in the Logical operator drop-down menu. Logical rules may be that row in selected column should Start with, contain or not contain the entered condition, or that it contains value or is null or empty

Copy to column(x) if

Copies value in row of selected column to column chosen in the Column(x) dropdown menu IF it fits condition entered into the Condition textbox with respect to logic selected in the Logical operator drop-down menu. Logical rules may be that row in selected column should Start with, contain or not contain the entered condition, or that it contains value or is null or empty

Column to column(x) if column(y)

Copies value in row of selected column to column chosen in the Column(x) dropdown menu IF corresponding row in column selected in the Column(y) drop-down menu fits condition entered into the Condition textbox with respect to logic selected in the Logical operator drop-down menu. Logical rules may be that row in selected column should Start with, contain or not contain the entered condition, or that it contains value or is null or empty

Insert in column (x) if in/not in numeric range

Inserts value entered in the Value textbox into every row of the column chosen in the Column(x) dropdown menu IF the numerical value of the corresponding row in selected column is within or not within (select logic in the In/Not in drop-down menu) numerical range entered in the Range start and Range end textboxes

Copy to column(x) if in/not in numeric range

Copies values of rows of selected column to corresponding rows in the column chosen in the Column(x) dropdown menu IF the numerical value of row in selected column is within or not within (select logic in the In/Not in drop-down menu) numerical range entered in the Range start and Range end textboxes

Insert in column (x) if in/not in temporal range

Inserts value entered in the Value textbox into every row of the column chosen in the Column(x) dropdown menu IF the temporal value of the corresponding row in selected column is within or not within (select logic in the In/Not in drop-down menu) temporal range entered in the Range start and Range end textboxes.

Copy to column(x) if in/not in temporal range

Copies values of rows of selected column to corresponding rows in the column chosen in the Column(x) dropdown menu IF the numerical value of row in selected column is within or not within (select logic in the In/Not in drop-down menu) temporal range entered in the Range start and Range end textboxes

Value with highest frequency to memory

The most frequent value in row of selected column will be saved in memory. A maximum of five value items can be saved (slot 1-5). Saved items can be inserted using the Insert from memory tool in the String operations category

Insert in column(x) if in memory

Inserts value entered in the Value textbox into every row of the column chosen in the Column(x) dropdown menu IF the corresponding row of selected column contains value held in memory (slot 1-5)

MATHEMATICAL OPERATIONS

Count values

Displays the number of non-null or empty values in selected column. Results can be saved to memory (slot 1-5).  Use the Save method drop-down menu to decide whether to overwrite or add to existing value in memory slot

Sum values

Displays the sum of numeric values in selected column. Results can be saved to memory (slot 1-5).  Use the Save method drop-down menu to decide whether to overwrite or add to existing value in memory slot. Selected column must contain only numerical values

Basic arithmetic

Performs arithmetic operation on selected column. Operation may be addition, subtraction, multiplication, division, exponentiation, root of nth degree or logarithm. Values of rows in selected column can be added to, subtracted from, etc. a fixed value entered in the Fixed value for calculation textbox or the value in corresponding row in column selected in the Calculation values column drop-down menu. Columns involved in the operation must contain only numerical values

Algebra

Performs an algebraic calculation on values in selected column with respect to values of one or multiple other columns. A mathematical expression may be stated in the Expression textbox, using X to denote the selected column and Y1, Y2 and Y3 to denote columns selected in the Y1, Y2 and Y3 drop-down menus (for example X = 1 + Y1^3 / Y2). Valid mathematical operators are +, -, *, /, ^ and ∑. Roots should be expressed in exponential form, for example ^0,5 for square root. Columns involved in the operation must contain only numerical values

Calculate DateTimes

Adds or subtracts TimeSpan values to or from DateTime values in selected column. Calculations may be carried out with respect to corresponding values of column selected in the TimeSpan/Number column drop-down menu or with respect to a Fixed value in the Fixed TimeSpan/Number textbox. Values used to add to or subtract from DateTimes in selected column should be in hh:mm:ss format or an integer or decimal number denoting hours. Use the Select add or subtract drop-down menu to chose whether to add to or subtract from DateTime-value in selected column

Linear regression

Performs linear OLS regression to estimate the relationship between an independent and a dependent variable. Selected column represents the independent variable. Column selected in the Independent variable drop-down menu represents the dependent variable. Results are displayed in a new column added in front of existing columns. Columns involved in the operation must contain only numerical values. Information about OLS can be found here.

Multiple regression

Performs linear OLS regression to estimate the relationship between multiple independent variables and a dependent variable. Selected column represents the dependent variable. Columns checked in the list showing all columns in the data area represents the dependent variables. Results are displayed in a new column added in front of existing columns. Columns involved in the operation must contain only numerical values. Information about OLS can be found here.

Remove outliers

Removes rows containing outliers (data points that differs significantly from other observations) from loaded data set using the Interquartile Range method. Rows of selected column must contain only numerical values. Information about the method can be found here. Note that since the tool calculates one value to be removed at a time and then recalculates, the method will be slow for large data sets. For such situations the Shrink sample around mean tool may be an alternative.

Shrink sample around mean

Calculates the mean value of rows in selected columns and removes rows containing values that differs from the mean at a selected level. Use the Net sample size textbox to provide the percentage (integer number) of the data set that should remain after the operation is carried out. Rows of selected column must contain only numerical values

Variable selection

Uses a forward selection algorithm to estimate if a subset of loaded data may be relevant in predicting the value of a dependent variable.  Dependent variable is represented by the selected column. All other columns present in the loaded data set are tested for relevance in predicting the value of the dependent column. All values in the data set must be numerical. Use the Selection method drop-down menu to select one or a combination of methods to be used.

 

One by one (1b1) method performs linear OLS regression on each of the columns present in the data set and returns output for each of the variables that are statistically significant at the confidence level selected in the Minimum confidence level drop-down menu and that have a calculated coefficient of determination (r2) higher than the value typed in the Minimum R2 textbox. The result is displayed as a new data set, with one row detailing each of the remaining variables. Hit Undo to get back to your data.

 

1b1 + Collinearity test (coltest) performs the 1b1 method described above, and then goes on to test whether there is a linear association between any two of the remaining variables using the Pearson correlation method. If such association is found, that have higher Pearson correlation than the value typed in the Max allowed collinearity textbox, the less significant variable of the pair is removed from the result.

 

1b1 + F-test performs the 1b1 method, then calculates the overall significance of a model using only the most significant variable. It then adds each of the remaining variables to the model, keeping the added variable only if the overall significance of the model is increased by adding the variable. The result is presented in a new column added in front of existing columns, detailing the overall model and each of the remaining variables.

 

1b1 + coltest + F-test performs all methods described above in a sequence

Value to dummy

Creates a new column for each distinct value in selected column. The rows of the created columns take the value 0 or 1 depending on whether the distinct value that caused the column to be created exists in the corresponding row of the selected column

Validate or calculate check digit

Uses the Luhn-algorithm to verify or calculate check digits of numerical values in selected column (e.g. a credit card number or IMEI-number). Select whether to Verify check digit or Calculate check digit in the Action drop-down menu. Verify will return true or false (is the check digit correct) in column selected in the Column(x) drop-down menu. Calculate will calculate the check digit and add it to the number in selected column

T-distribution table

Displays a two-tailed T-distribution table that can be used to check for statistical significance of OLS-regression results at different confidence levels

UTC to local time

Converts DateTime or string representations of DateTime values in selected column from UTC-time (Coordinated Universal Time) to local time (depending on culture settings on the local system)

CONVERSIONS

Date and time

Converts DateTime or string representations of DateTime values in selected column from format selected in the Source format drop-down menu to format selected in the Target format drop-down menu. The source format may be any valid DateTime format (dates may be validly expressed f. ex as yyyy-MM-dd, yyyy/MM/dd, MM/dd/yyyy or MMM/dd/yy, time may be validly expressed f. ex as hh:mm:ss or H.mm.ss AM) or a value that cannot be validly parsed as DateTime but that can be selected in the Source format menu

Time span

Converts TimeSpan or string representations of TimeSpan values in selected column from format selected in the Source format drop-down menu to format selected in the Target format drop-down menu. Source formats may be hh:mm:ss, strings using h, s or m/min to denote hours minutes and seconds, integer seconds, decimal minutes, decimal hours or 1899-12-30 hh:mm:ss (which is often the result of converting a time-only value to DateTime)

Amount

Converts multiple representations of numerical amounts into current culture numeric form. Use the Grouping character drop-down menu to allow for selected thousand separators or other grouping of numbers. Use the Separate result into +/- columns drop-down menu to decide whether resulting positive and negative numbers should be separated into different columns (new columns will automatically be created)

Geographic coordinates

Converts geographical coordinates in decimal WGS84-format, DMS WGS84-format, UTM-format or MGRS-format into any other of these formats. Select source and target formats in the Source format and Target format drop-down menus. For UTM and MGRS formats hemisphere must be selected. For MGRS format UTM zone/GZD and Grid square must also be entered

Phone number

Converts phone numbers in selected column into different standard formats. Use the Source format and Target format drop-down menus to select the source and target formats. If rows of selected columns have mixed source formats Mixed or other may be selected to choose a generic conversion algorithm. If any of the phone numbers to convert include country codes, or if you want country codes in the output values, values for Expected country code and Min length of international number must be provided (standard length is 10)

Number

Converts numbers in selected column between decimal, hexadecimal, binary and octal formats. Use the Source format and Target format drop-down menus to select source and target formats

EXPORT

Export to new file

Exports loaded and transformed data to a new file. Export can be performed into Excel, .csv, tab separated text, XML and Json formats. For file exports, you will find additional settings in the general settings reachable through the top-level Menu item

Export to database

Exports loaded and transformed data to a database. Supported database types are MS Access, SQL Server, SQLite and MySQL. Use the Connect to database control (opens automatically) to connect to a table of a database. You may then select a database field using the Select database field drop-down menu showing all fields in selected table, and hit the Add pair button to pair it up with selected column in the data area to designate the selected column to be imported into the selected field. Repeat until all the columns that you wish to import are paired with a database table field. Use the Datatable key drop-down menu to select a field in the database table that is a primary key to prevent duplication of values in this field.

 

To make imports to multiple tables, select a new table and repeat the actions described above

Create GUID

Creates a unique GUID (Globally Unique Identifier, also referred to as UUID) in each row of the selected column. GUIDs are often used as unique identifiers in databases

Create one GUID/value in selected columns

Creates one unique GUID (Globally Unique Identifier, also referred to as UUID) in each row of multiple selected columns. Identical values in multiple columns will be represented by identical GUIDs. New columns containing the created GUIDs will be created to the left of each selected column.  GUIDs are often used as unique identifiers in databases

Create timestamp

Creates a timestamp (current date and time) in each row of the selected column

Get key from database

Imports primary keys from a database table if a value of that table is identical to a value present in the loaded data set. The tool is useful if you want to export to a database table using primary keys that already exists in another table of that database. Use the Connect to database control (opens automatically) to connect to a table of a database. Then select the database field containing the key in the DB field containing key drop-down menu and select the database field containing the condition in the DB field containing condition drop-down menu. Select the column of your data set that should equal the condition for import to be performed in the Grid column containing condition drop-down menu. Imported data will be placed in the selected column of the data set

Validate output

Creates output validation rules that can be used to verify that values in an export (to file or database) satisfies a set of conditions. Select the column of your loaded data that you wish that your rule should apply to.  Select the type of rule in the Validation type drop-down menu. You may condition the output data of the selected column on having a specific data type (selected in the Data-type drop-down menu), to be within an entered numeric range, to have a character length within an entered range, to have a certain alphanumeric content (text), or to match an entered regular expression. Rules for ranges, content and regex should be entered in the Rule textbox. Use the Sensitivity textbox to enter the percentage of errors with respect to the selected column and the created rule that must occur for an error to be reported in the output validation report. Multiple rules can be applied to multiple columns. See Display output validation for information on how to access the results of your validation rules

Display output validation

Checks loaded data against rules created with the Validate output tool and creates an error report that opens in a new .txt-file. The error report includes the time of creation of the report, information about the source of the loaded data and a list of errors exceeding the sensitivity limit set while creating rules, detailing the column index in which errors occur, the applied rule and the percentage of errors among the values of that column

GET FROM EXTERNAL SOURCE

Geocode

Translates addresses in selected column into decimal latitude and longitude using a Google API. Results will be displayed in the column selected in the Column(x) drop-down menu (latitude) and the column to the right of that column (longitude). The tool requires internet connection and a Google API key that you can obtain for free from here

Translate

Translates text in selected column from a language selected in the Source language drop-down menu to the language selected in the Target language drop down menu, using Google translate. Results will be displayed in the column selected in the Column(x) drop-down menu. The tool requires internet connection and a Google API key that you can obtain for free from here

Currency conversion

Converts currencies in selected column from currency selected in the From currency drop-down menu to currency selected in the To currency drop down menu, with respect to the date entered in the Conversion date textbox (provide date in yyyy-MM-dd format). Results will be displayed in the column selected in the Column(x) drop-down menu. Currency rates are retrieved from ecb.europa.eu. The tool requires internet connection

Screen scrape

Scrapes text from URL in selected column and searches for phrase entered in the Search phrase textbox. The first line of text in each URL that contains the search phrase will be displayed in the column selected in the Column(x) drop-down menu. The tool can be combined with other tools to create custom screen scraping capability. The tool requires internet connection

IP Geolocation

Displays geolocation data for IP-addresses in selected column. Results will be displayed in the column selected in the Column(x) drop-down menu and four columns to the right of that column, detailing city, country, latitude, longitude and organization associated to the IP-address. If you do not create empty columns to hold the retrieved data, make sure each column that displays results has string data format. The tool requires internet connection. IP details are retrieved from ipapi.co

OSINT Entity Extraction

Extracts person names, addresses, phone numbers, email addresses, IP-addresses and user defined text patterns (defined by a regular expression) from the first 10-100 URLs hit by Google search for search phrase entered in the Search phrase or Custom regex textboxes. Select the number of hit URLs to extract data from in the Search hits to extract from drop-down menu. The tool requires internet connection, a Google API key that you can obtain for free from here, and a custom Google search engine that you can create here. Results are well suited to be displayed with the Show as network chart tool found in Analysis operations. Add additional tools to your workflow to filter your search results

US Public data finder

Displays and retrieves data sets available through www.data.gov (US Government open data). Select a publisher in the Publisher drop-down menu and select an action to perform in the Action drop-down menu. You may choose to retrieve a list of all data sets available from the selected publisher, to check availability as XML or .csv for each data set in the retrieved list (note that checking availability for thousands of rows will be very slow, users are advised to save the list of available data sets locally and manually edit them to include only data sets interesting to the user before checking availability) or to retrieve data from the data set represented by the row index of data set in list entered in the row number textbox. Publisher must be selected while checking availability or retrieving a data set. The tool requires internet connection. Also note that some available data sets are too large to retrieve on standard systems

ANALYSIS OPERATIONS

Show and filter positions on map

Displays geographical coordinates (expressed as decimal WGS84 latitudes and longitudes) over a map and allows for filtering of data by highlighting a map area. Select the columns containing coordinates in the Latitude and Longitude drop-down menus and select whether to show or filter geodata in the Action drop-down menu. The selected column in the data area will be used to provide coordinate labels. Hit Go to display coordinates on the map. Use left mouse button to scroll the map and use the mouse wheel to change the scale. Click Exit map view to return to the grid representation of your data set.

 

You can use the right mouse button to highlight an area on the map. Selecting Filter positions in the Action drop-down menu and hitting Go with an area selected will filter your data set to only include rows with positions within the selected area. The tool requires internet connection

Show as network graph

Displays data in columns selected in the First node and Second node drop-down menus as a link chart detailing connection between the columns. Use the Layout drop-down menu to select the layout algorithm to use for displaying your data. Use the Zoom control to determine zoom level of your displayed data. Click Exit network view to return to the grid representation of your data set

Cluster analysis

Examines every link between nodes selected in the First node and Second node drop-down menus and groups them into clusters. Every value in one of the selected columns that do not have a relationship to a value already related to a cluster will be added to a new cluster. Every value in one of the selected columns that does have a relationship to a value related to a cluster will be added to that cluster. Each unique value in included columns will be presented in a list, accompanied by its cluster number. Results are well suited to be visualized using the Show as network graph tool

Find pattern or word in text document

Searches for a phrase or a regular expression pattern entered in the Search word/pattern textbox in a loaded MS Word (.doc or .docx extension) document. Use the Word/pattern drop-down menu to select whether to search for a specific phrase or for a pattern defined by a regular expression. Multiple documents can be loaded an examined simultaneously. Results will be displayed in a new data set detailing the source document, the selected search phrase and the number of occurrences of entered search phrase or pattern

Find related words

Searches for words related to search word or phrase entered in the Search phrase textbox by returning the frequency of words in each row of the selected column containing the search phrase. The tool can be used to find patterns related to word or a phrase among a large set of observations

Create analysis database

Creates an SQLite analysis database with the Rafn Analytics data model. Clicking Go opens a directory selection dialog, creating the database at the selected location. The created database will be named RafnAnalyticsDatabase. and will, at creation, contain only the tables links and themes.

 

Use the Export to database tool to export data to the database. Export to links table should include two entity instances (phone numbers, financial accounts, a person and an address or any pair of things that are linked in your source data), IDs for each entity as well as for the unique event (preferably GUIDs), and the entity type of each entity instance. The export may also include temporal data, geographical coordinates expressed as decimal WGS84 (dot MUST be used as decimal separator) and free text. You may add additional tables representing each entity type using the Add entity type to analysis database tool. Rafn Analytics Studio will be connected to the created database until you connect to another database. To reconnect, use the Connect to existing database tool

Connect to existing database

Connects to an existing SQLite analysis database with the Rafn Analytics data model. Clicking Go will open a file selection dialog, connection will be established to database at selected location

Add entity type to analysis database

Adds an entity type to connected analysis database. Enter the entity type name in the Entity type textbox (entity type name may NOT contain spaces). Then enter each attribute of the entity type in the Entity attribute textbox and select the data type of the attribute in the Attribute data type drop-down menu. Clicking the Add attribute button will add the attribute to list of attributes. Hitting Go will create the new entity type with selected attributes in database

Query analysis database

Queries analysis database with respect to selected filters (see below) and shows result in grid. Filtering may include entity types and attributes filters, temporal filter, geographical filter, free text filter and theme filter. If selected integer value in the Degrees of separation drop-down menu is larger than 1, result will be expanded with entities connected to entities in result for selected number of iterations

Set entity filter

Sets filter used by the Query analysis database tool with respect to entity type and attribute. Set entity type filter by selecting a value in the Entity type drop-down menu. You may optionally also filter by attribute by selecting an attribute in the Attribute drop-down menu, a logical operator (=, <, >, Contains) in the Logic drop-down menu and a condition in the Filter textbox (e.g. filter by vehicles (entity type) where the attribute color = blue)

Set time filter

Sets filter used by the Query analysis database tool with respect to temporal data. Result will be limited to events with temporal data within the range stated in the From time and To time textboxes. Input should be a valid representation of Date or DateTime (although actual data format may also be string)

Set text filter

Sets filter used by the Query analysis database tool. Result will be limited to events containing text entered in the Filter text textbox

Set geographic filter

Sets filter used by the Query analysis database tool with respect to geographical data. Use the right mouse button to draw a rectangle over the map view. Result will be limited to objects inside the area selected. (You may use the Show and filter positions on map tool to focus map view to your data)

Add theme to selected row

Adds a new (entered in the New theme textbox) or existing (selected in the Existing theme drop-down menu) theme to selected row (check Select rows radio button to select rows). Rows must be retrieved using the Query analysis database tool. Theme can be applied to only one row at a time

Set theme filter

Sets filter used by the Query analysis database tool. Result will be limited to events linked to theme selected in the Theme drop-down menu

Clear filters

Clears all filters used by the Query analysis database tool

SQL to analysis database

Manually queries analysis database. Enter SQL query in the SQL query textbox. Use SQLite query syntax

Entity frequency

The tool is identical to the Query analysis database tool and works with the same filters but, instead of returning the filtered data, entity frequencies of the filtered data will be returned

CUSTOM TOOLS

Get country from phone calling code (example)

It is possible to patch custom tools into Rafn Analytics Studio with no need for reinstalling the application. The Get country from phone calling code tool is provided as an example. Origin country of phone numbers in selected column (depending on phone calling code) will be displayed in the column selected in the Column(x) drop-down menu