Tools
Insert in column(x) if
Inserts value entered in the Value textbox into every row of the column chosen in the Column(x) dropdown menu IF the value of the corresponding row in selected column fits condition entered into the Condition textbox with respect to logic selected in the Logical operator drop-down menu. Logical rules may be that row in selected column should Start with, contain or not contain the entered condition, or that it contains value or is null or empty
Copy to column(x) if
Copies value in row of selected column to column chosen in the Column(x) dropdown menu IF it fits condition entered into the Condition textbox with respect to logic selected in the Logical operator drop-down menu. Logical rules may be that row in selected column should Start with, contain or not contain the entered condition, or that it contains value or is null or empty
Column to column(x) if column(y)
Copies value in row of selected column to column chosen in the Column(x) dropdown menu IF corresponding row in column selected in the Column(y) drop-down menu fits condition entered into the Condition textbox with respect to logic selected in the Logical operator drop-down menu. Logical rules may be that row in selected column should Start with, contain or not contain the entered condition, or that it contains value or is null or empty
Insert in column (x) if in/not in numeric range
Inserts value entered in the Value textbox into every row of the column chosen in the Column(x) dropdown menu IF the numerical value of the corresponding row in selected column is within or not within (select logic in the In/Not in drop-down menu) numerical range entered in the Range start and Range end textboxes
Copy to column(x) if in/not in numeric range
Copies values of rows of selected column to corresponding rows in the column chosen in the Column(x) dropdown menu IF the numerical value of row in selected column is within or not within (select logic in the In/Not in drop-down menu) numerical range entered in the Range start and Range end textboxes
Insert in column (x) if in/not in temporal range
Inserts value entered in the Value textbox into every row of the column chosen in the Column(x) dropdown menu IF the temporal value of the corresponding row in selected column is within or not within (select logic in the In/Not in drop-down menu) temporal range entered in the Range start and Range end textboxes.
Copy to column(x) if in/not in temporal range
Copies values of rows of selected column to corresponding rows in the column chosen in the Column(x) dropdown menu IF the numerical value of row in selected column is within or not within (select logic in the In/Not in drop-down menu) temporal range entered in the Range start and Range end textboxes
Basic arithmetic
Performs arithmetic operation on selected column. Operation may be addition, subtraction, multiplication, division, exponentiation, root of nth degree or logarithm. Values of rows in selected column can be added to, subtracted from, etc. a fixed value entered in the Fixed value for calculation textbox or the value in corresponding row in column selected in the Calculation values column drop-down menu. Columns involved in the operation must contain only numerical values
Algebra
Performs an algebraic calculation on values in selected column with respect to values of one or multiple other columns. A mathematical expression may be stated in the Expression textbox, using X to denote the selected column and Y1, Y2 and Y3 to denote columns selected in the Y1, Y2 and Y3 drop-down menus (for example X = 1 + Y1^3 / Y2). Valid mathematical operators are +, -, *, /, ^ and ∑. Roots should be expressed in exponential form, for example ^0,5 for square root. Columns involved in the operation must contain only numerical values
Calculate DateTimes
Adds or subtracts TimeSpan values to or from DateTime values in selected column. Calculations may be carried out with respect to corresponding values of column selected in the TimeSpan/Number column drop-down menu or with respect to a Fixed value in the Fixed TimeSpan/Number textbox. Values used to add to or subtract from DateTimes in selected column should be in hh:mm:ss format or an integer or decimal number denoting hours. Use the Select add or subtract drop-down menu to chose whether to add to or subtract from DateTime-value in selected column
Linear regression
Performs linear OLS regression to estimate the relationship between an independent and a dependent variable. Selected column represents the independent variable. Column selected in the Independent variable drop-down menu represents the dependent variable. Results are displayed in a new column added in front of existing columns. Columns involved in the operation must contain only numerical values. Information about OLS can be found here.
Multiple regression
Performs linear OLS regression to estimate the relationship between multiple independent variables and a dependent variable. Selected column represents the dependent variable. Columns checked in the list showing all columns in the data area represents the dependent variables. Results are displayed in a new column added in front of existing columns. Columns involved in the operation must contain only numerical values. Information about OLS can be found here.
Remove outliers
Removes rows containing outliers (data points that differs significantly from other observations) from loaded data set using the Interquartile Range method. Rows of selected column must contain only numerical values. Information about the method can be found here. Note that since the tool calculates one value to be removed at a time and then recalculates, the method will be slow for large data sets. For such situations the Shrink sample around mean tool may be an alternative.
Shrink sample around mean
Calculates the mean value of rows in selected columns and removes rows containing values that differs from the mean at a selected level. Use the Net sample size textbox to provide the percentage (integer number) of the data set that should remain after the operation is carried out. Rows of selected column must contain only numerical values
Variable selection
Uses a forward selection algorithm to estimate if a subset of loaded data may be relevant in predicting the value of a dependent variable. Dependent variable is represented by the selected column. All other columns present in the loaded data set are tested for relevance in predicting the value of the dependent column. All values in the data set must be numerical. Use the Selection method drop-down menu to select one or a combination of methods to be used.
One by one (1b1) method performs linear OLS regression on each of the columns present in the data set and returns output for each of the variables that are statistically significant at the confidence level selected in the Minimum confidence level drop-down menu and that have a calculated coefficient of determination (r2) higher than the value typed in the Minimum R2 textbox. The result is displayed as a new data set, with one row detailing each of the remaining variables. Hit Undo to get back to your data.
1b1 + Collinearity test (coltest) performs the 1b1 method described above, and then goes on to test whether there is a linear association between any two of the remaining variables using the Pearson correlation method. If such association is found, that have higher Pearson correlation than the value typed in the Max allowed collinearity textbox, the less significant variable of the pair is removed from the result.
1b1 + F-test performs the 1b1 method, then calculates the overall significance of a model using only the most significant variable. It then adds each of the remaining variables to the model, keeping the added variable only if the overall significance of the model is increased by adding the variable. The result is presented in a new column added in front of existing columns, detailing the overall model and each of the remaining variables.
1b1 + coltest + F-test performs all methods described above in a sequence
Validate or calculate check digit
Uses the Luhn-algorithm to verify or calculate check digits of numerical values in selected column (e.g. a credit card number or IMEI-number). Select whether to Verify check digit or Calculate check digit in the Action drop-down menu. Verify will return true or false (is the check digit correct) in column selected in the Column(x) drop-down menu. Calculate will calculate the check digit and add it to the number in selected column
Date and time
Converts DateTime or string representations of DateTime values in selected column from format selected in the Source format drop-down menu to format selected in the Target format drop-down menu. The source format may be any valid DateTime format (dates may be validly expressed f. ex as yyyy-MM-dd, yyyy/MM/dd, MM/dd/yyyy or MMM/dd/yy, time may be validly expressed f. ex as hh:mm:ss or H.mm.ss AM) or a value that cannot be validly parsed as DateTime but that can be selected in the Source format menu
Time span
Converts TimeSpan or string representations of TimeSpan values in selected column from format selected in the Source format drop-down menu to format selected in the Target format drop-down menu. Source formats may be hh:mm:ss, strings using h, s or m/min to denote hours minutes and seconds, integer seconds, decimal minutes, decimal hours or 1899-12-30 hh:mm:ss (which is often the result of converting a time-only value to DateTime)
Amount
Converts multiple representations of numerical amounts into current culture numeric form. Use the Grouping character drop-down menu to allow for selected thousand separators or other grouping of numbers. Use the Separate result into +/- columns drop-down menu to decide whether resulting positive and negative numbers should be separated into different columns (new columns will automatically be created)
Geographic coordinates
Converts geographical coordinates in decimal WGS84-format, DMS WGS84-format, UTM-format or MGRS-format into any other of these formats. Select source and target formats in the Source format and Target format drop-down menus. For UTM and MGRS formats hemisphere must be selected. For MGRS format UTM zone/GZD and Grid square must also be entered
Phone number
Converts phone numbers in selected column into different standard formats. Use the Source format and Target format drop-down menus to select the source and target formats. If rows of selected columns have mixed source formats Mixed or other may be selected to choose a generic conversion algorithm. If any of the phone numbers to convert include country codes, or if you want country codes in the output values, values for Expected country code and Min length of international number must be provided (standard length is 10)
Export to database
Exports loaded and transformed data to a database. Supported database types are MS Access, SQL Server, SQLite and MySQL. Use the Connect to database control (opens automatically) to connect to a table of a database. You may then select a database field using the Select database field drop-down menu showing all fields in selected table, and hit the Add pair button to pair it up with selected column in the data area to designate the selected column to be imported into the selected field. Repeat until all the columns that you wish to import are paired with a database table field. Use the Datatable key drop-down menu to select a field in the database table that is a primary key to prevent duplication of values in this field.
To make imports to multiple tables, select a new table and repeat the actions described above
Create one GUID/value in selected columns
Creates one unique GUID (Globally Unique Identifier, also referred to as UUID) in each row of multiple selected columns. Identical values in multiple columns will be represented by identical GUIDs. New columns containing the created GUIDs will be created to the left of each selected column. GUIDs are often used as unique identifiers in databases
Get key from database
Imports primary keys from a database table if a value of that table is identical to a value present in the loaded data set. The tool is useful if you want to export to a database table using primary keys that already exists in another table of that database. Use the Connect to database control (opens automatically) to connect to a table of a database. Then select the database field containing the key in the DB field containing key drop-down menu and select the database field containing the condition in the DB field containing condition drop-down menu. Select the column of your data set that should equal the condition for import to be performed in the Grid column containing condition drop-down menu. Imported data will be placed in the selected column of the data set
Validate output
Creates output validation rules that can be used to verify that values in an export (to file or database) satisfies a set of conditions. Select the column of your loaded data that you wish that your rule should apply to. Select the type of rule in the Validation type drop-down menu. You may condition the output data of the selected column on having a specific data type (selected in the Data-type drop-down menu), to be within an entered numeric range, to have a character length within an entered range, to have a certain alphanumeric content (text), or to match an entered regular expression. Rules for ranges, content and regex should be entered in the Rule textbox. Use the Sensitivity textbox to enter the percentage of errors with respect to the selected column and the created rule that must occur for an error to be reported in the output validation report. Multiple rules can be applied to multiple columns. See Display output validation for information on how to access the results of your validation rules
Display output validation
Checks loaded data against rules created with the Validate output tool and creates an error report that opens in a new .txt-file. The error report includes the time of creation of the report, information about the source of the loaded data and a list of errors exceeding the sensitivity limit set while creating rules, detailing the column index in which errors occur, the applied rule and the percentage of errors among the values of that column
Geocode
Translates addresses in selected column into decimal latitude and longitude using a Google API. Results will be displayed in the column selected in the Column(x) drop-down menu (latitude) and the column to the right of that column (longitude). The tool requires internet connection and a Google API key that you can obtain for free from here
Translate
Translates text in selected column from a language selected in the Source language drop-down menu to the language selected in the Target language drop down menu, using Google translate. Results will be displayed in the column selected in the Column(x) drop-down menu. The tool requires internet connection and a Google API key that you can obtain for free from here
Currency conversion
Converts currencies in selected column from currency selected in the From currency drop-down menu to currency selected in the To currency drop down menu, with respect to the date entered in the Conversion date textbox (provide date in yyyy-MM-dd format). Results will be displayed in the column selected in the Column(x) drop-down menu. Currency rates are retrieved from ecb.europa.eu. The tool requires internet connection
Screen scrape
Scrapes text from URL in selected column and searches for phrase entered in the Search phrase textbox. The first line of text in each URL that contains the search phrase will be displayed in the column selected in the Column(x) drop-down menu. The tool can be combined with other tools to create custom screen scraping capability. The tool requires internet connection
IP Geolocation
Displays geolocation data for IP-addresses in selected column. Results will be displayed in the column selected in the Column(x) drop-down menu and four columns to the right of that column, detailing city, country, latitude, longitude and organization associated to the IP-address. If you do not create empty columns to hold the retrieved data, make sure each column that displays results has string data format. The tool requires internet connection. IP details are retrieved from ipapi.co
OSINT Entity Extraction
Extracts person names, addresses, phone numbers, email addresses, IP-addresses and user defined text patterns (defined by a regular expression) from the first 10-100 URLs hit by Google search for search phrase entered in the Search phrase or Custom regex textboxes. Select the number of hit URLs to extract data from in the Search hits to extract from drop-down menu. The tool requires internet connection, a Google API key that you can obtain for free from here, and a custom Google search engine that you can create here. Results are well suited to be displayed with the Show as network chart tool found in Analysis operations. Add additional tools to your workflow to filter your search results
US Public data finder
Displays and retrieves data sets available through www.data.gov (US Government open data). Select a publisher in the Publisher drop-down menu and select an action to perform in the Action drop-down menu. You may choose to retrieve a list of all data sets available from the selected publisher, to check availability as XML or .csv for each data set in the retrieved list (note that checking availability for thousands of rows will be very slow, users are advised to save the list of available data sets locally and manually edit them to include only data sets interesting to the user before checking availability) or to retrieve data from the data set represented by the row index of data set in list entered in the row number textbox. Publisher must be selected while checking availability or retrieving a data set. The tool requires internet connection. Also note that some available data sets are too large to retrieve on standard systems
Show and filter positions on map
Displays geographical coordinates (expressed as decimal WGS84 latitudes and longitudes) over a map and allows for filtering of data by highlighting a map area. Select the columns containing coordinates in the Latitude and Longitude drop-down menus and select whether to show or filter geodata in the Action drop-down menu. The selected column in the data area will be used to provide coordinate labels. Hit Go to display coordinates on the map. Use left mouse button to scroll the map and use the mouse wheel to change the scale. Click Exit map view to return to the grid representation of your data set.
You can use the right mouse button to highlight an area on the map. Selecting Filter positions in the Action drop-down menu and hitting Go with an area selected will filter your data set to only include rows with positions within the selected area. The tool requires internet connection
Show as network graph
Displays data in columns selected in the First node and Second node drop-down menus as a link chart detailing connection between the columns. Use the Layout drop-down menu to select the layout algorithm to use for displaying your data. Use the Zoom control to determine zoom level of your displayed data. Click Exit network view to return to the grid representation of your data set
Cluster analysis
Examines every link between nodes selected in the First node and Second node drop-down menus and groups them into clusters. Every value in one of the selected columns that do not have a relationship to a value already related to a cluster will be added to a new cluster. Every value in one of the selected columns that does have a relationship to a value related to a cluster will be added to that cluster. Each unique value in included columns will be presented in a list, accompanied by its cluster number. Results are well suited to be visualized using the Show as network graph tool
Find pattern or word in text document
Searches for a phrase or a regular expression pattern entered in the Search word/pattern textbox in a loaded MS Word (.doc or .docx extension) document. Use the Word/pattern drop-down menu to select whether to search for a specific phrase or for a pattern defined by a regular expression. Multiple documents can be loaded an examined simultaneously. Results will be displayed in a new data set detailing the source document, the selected search phrase and the number of occurrences of entered search phrase or pattern
Find related words
Searches for words related to search word or phrase entered in the Search phrase textbox by returning the frequency of words in each row of the selected column containing the search phrase. The tool can be used to find patterns related to word or a phrase among a large set of observations
Create analysis database
Creates an SQLite analysis database with the Rafn Analytics data model. Clicking Go opens a directory selection dialog, creating the database at the selected location. The created database will be named RafnAnalyticsDatabase. and will, at creation, contain only the tables links and themes.
Use the Export to database tool to export data to the database. Export to links table should include two entity instances (phone numbers, financial accounts, a person and an address or any pair of things that are linked in your source data), IDs for each entity as well as for the unique event (preferably GUIDs), and the entity type of each entity instance. The export may also include temporal data, geographical coordinates expressed as decimal WGS84 (dot MUST be used as decimal separator) and free text. You may add additional tables representing each entity type using the Add entity type to analysis database tool. Rafn Analytics Studio will be connected to the created database until you connect to another database. To reconnect, use the Connect to existing database tool
Add entity type to analysis database
Adds an entity type to connected analysis database. Enter the entity type name in the Entity type textbox (entity type name may NOT contain spaces). Then enter each attribute of the entity type in the Entity attribute textbox and select the data type of the attribute in the Attribute data type drop-down menu. Clicking the Add attribute button will add the attribute to list of attributes. Hitting Go will create the new entity type with selected attributes in database
Query analysis database
Queries analysis database with respect to selected filters (see below) and shows result in grid. Filtering may include entity types and attributes filters, temporal filter, geographical filter, free text filter and theme filter. If selected integer value in the Degrees of separation drop-down menu is larger than 1, result will be expanded with entities connected to entities in result for selected number of iterations
Set entity filter
Sets filter used by the Query analysis database tool with respect to entity type and attribute. Set entity type filter by selecting a value in the Entity type drop-down menu. You may optionally also filter by attribute by selecting an attribute in the Attribute drop-down menu, a logical operator (=, <, >, Contains) in the Logic drop-down menu and a condition in the Filter textbox (e.g. filter by vehicles (entity type) where the attribute color = blue)
Set time filter
Sets filter used by the Query analysis database tool with respect to temporal data. Result will be limited to events with temporal data within the range stated in the From time and To time textboxes. Input should be a valid representation of Date or DateTime (although actual data format may also be string)
Set geographic filter
Sets filter used by the Query analysis database tool with respect to geographical data. Use the right mouse button to draw a rectangle over the map view. Result will be limited to objects inside the area selected. (You may use the Show and filter positions on map tool to focus map view to your data)
Add theme to selected row
Adds a new (entered in the New theme textbox) or existing (selected in the Existing theme drop-down menu) theme to selected row (check Select rows radio button to select rows). Rows must be retrieved using the Query analysis database tool. Theme can be applied to only one row at a time
Get country from phone calling code (example)
It is possible to patch custom tools into Rafn Analytics Studio with no need for reinstalling the application. The Get country from phone calling code tool is provided as an example. Origin country of phone numbers in selected column (depending on phone calling code) will be displayed in the column selected in the Column(x) drop-down menu