
Digital tools for research

Find information on digital tools to analyse and visualise data and text.

What is text analysis and data mining?

Text analysis

Text analysis (or text mining) identifies patterns or connections in large collections of text, or "unstructured data". Automated computer tools process the text, so no reading or viewing of the materials is necessary. For this reason, text analysis is considered ‘non-consumptive’ research.

Adapted from "What is text and data mining" by The University of Adelaide Library is licensed under CC BY-NC-SA 4.0
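As a rough illustration of finding patterns in text without reading it, the sketch below counts word frequencies across a few documents using only Python's standard library. The sample sentences and the function names `tokenize` and `word_frequencies` are hypothetical, and real projects usually need more careful tokenisation than this.

```python
import re
from collections import Counter

# Hypothetical sample corpus; a real project would load many documents.
documents = [
    "Text mining finds patterns in large collections of text.",
    "Data mining finds patterns in structured data.",
    "Large collections of text are processed by automated tools.",
]

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def word_frequencies(docs):
    """Count how often each word occurs across all documents."""
    counts = Counter()
    for doc in docs:
        counts.update(tokenize(doc))
    return counts

frequencies = word_frequencies(documents)

# Show the five most frequent words and their counts.
print(frequencies.most_common(5))
```

Frequent words are only a starting point: common words such as "in" and "of" will rank highly unless they are filtered out.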

How does Text Mining Work? (1:34 mins) by Elsevier (YouTube)

Data mining

Data mining is the use of computational techniques to find patterns or relationships within large sets of organised or "structured" data. These datasets need to be organised into specific, defined formats before mining processes can be performed. 

Adapted from "What is text and data mining" by The University of Adelaide Library is licensed under CC BY-NC-SA 4.0
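To make the idea of mining structured data more concrete, here is a minimal sketch that counts which values appear together in the same record. The CSV layout, the column names `loan_id` and `subjects`, and the function `pair_counts` are all assumptions made for illustration. Pair counting is one simple technique among many.

```python
import csv
import io
from collections import Counter
from itertools import combinations

# Hypothetical structured dataset: one row per record, in a defined CSV format.
raw_csv = """loan_id,subjects
1,history;law
2,history;law;art
3,art;design
4,history;law
"""

def pair_counts(csv_text):
    """Count how often each pair of subjects appears in the same row."""
    counts = Counter()
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        # Sort so that each pair is always stored in the same order.
        subjects = sorted(set(row["subjects"].split(";")))
        counts.update(combinations(subjects, 2))
    return counts

pairs = pair_counts(raw_csv)

# Print the three most common subject pairs.
for pair, count in pairs.most_common(3):
    print(pair, count)
```

The counting step can rely on every row having a `subjects` column only because the data is already organised into a defined format.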

All major data mining techniques explained with examples (13:03 mins) by Learn with Whiteboard (YouTube)

Key considerations

There are a number of factors to be aware of when conducting text analysis and data mining. See below for issues related to Ethics, Copyright, Permissions, Licensing, and Referencing.

For more details on definitions and legal implications of text and data mining see the Australian Law Reform Commission page.

Ethics

Even when access for text and data mining is permitted, it is important that researchers respect the rights of the owners of the content and abide by their terms of access. Researchers also need to respect the privacy of the subjects of research and be aware that data mining may reveal confidential details.

Information on the responsibilities of researchers can be found on this page on Research integrity.

Copyright

There is no Australian copyright exemption for text and data analysis, as explained in this Australian Law Reform Commission discussion paper. Even publicly accessible arrangements of datasets are still protected by copyright and may require permission for use in a text analysis or data mining project.

Permissions

For some data, you may need to acquire permission from the rightsholder before performing analysis on datasets. Be aware that if you are granted permission to use data for your research, this may not extend to use for publication. It is easier to seek permission for all uses of the data upfront.

For tips on permission seeking for researchers, please see the Copyright guide's Seeking permission section. 

Licensing

Data and database publishers vary widely in the degree to which they permit text and data mining of their collections. First consult the licence in the LibrarySearch record for the database, as illustrated in the image below:

LibrarySearch database record with licensing options

Image: Copyright © Ex Libris. Used under licence.

If the 'Show License' option does not appear, or if the information does not mention data mining, contact the Library Research Services team.

Websites and social media platforms have terms of service which may include clauses around data mining and text analysis. Check the website terms of service or terms of use to determine what is allowed for the site you intend to use.

The Australian Research Data Commons (ARDC) has several flowcharts that illustrate the licensing process and a data rights management guide that focuses on rights information and licences.

Referencing

Acknowledge and cite, in your chosen referencing style, any material borrowed from others for text analysis and data mining. This includes data sets and raw data, stop word lists, algorithms, visualisations and other textual data.

See the RMIT Easy Cite referencing guide to determine how to cite data sources in a variety of referencing styles.

Data sources

Overview

Raw data available for text analysis and data mining can be derived from many sources, including library databases and the open web. See the following tabs for some licensed and open access data sources that may be useful for your research. 

Licensed library sources

Note: The following data sources are licensed library resources and permitted for use by RMIT staff and students (RMIT login required).

Open access sources

Note: The Library does not license the following open access resources, and does not assist with API management, text and data mining, and other services.

Coding tools and tool indexes

Overview

Programming languages such as Python and R are well suited to developing text analysis applications because many libraries are available for them that deliver natural language processing (NLP) functions.

Note: Some basic familiarity with programming languages may be required to use these tools, and where possible, training resources have been provided for inexperienced users.
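As a small taste of the kind of step that NLP libraries package up, the sketch below removes stop words and counts adjacent word pairs (bigrams) using only Python's standard library. The stop word list, the sample text, and the function names `content_words` and `bigram_counts` are hypothetical. A dedicated library would normally handle tokenisation and sentence boundaries more carefully.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop word list; published lists are much longer.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "is", "to", "are"}

def content_words(text):
    """Tokenize text into lowercase words and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

def bigram_counts(text):
    """Count adjacent pairs of content words (ignores sentence boundaries)."""
    words = content_words(text)
    return Counter(zip(words, words[1:]))

sample = "The library guide lists tools. The library guide is open to all."
bigrams = bigram_counts(sample)

# Show the two most frequent bigrams and their counts.
print(bigrams.most_common(2))
```

If you borrow a published stop word list in place of the toy one here, cite it as you would any other data source.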

Python

Python is a general-purpose programming language with a focus on code readability for projects of all sizes.

Access

RMIT provides access to Python for staff and students. 

  • myDesktop - go to myDesktop (RMIT login required) and select Python from the Apps tab.
  • Personal device - to download/install Python on your own device: 
    • Go to the Python homepage.
    • Locate the Downloads menu and follow the on-screen instructions.

Training resources

R

R is a programming language and software platform focused on statistical analysis and graphical presentation, and is widely used in data mining.

Access

RMIT provides access to R for staff and students. 

  • myDesktop - go to myDesktop (RMIT login required) and select R from the Apps tab.
  • Personal device - to download/install R on your own device: 
    • Go to the R homepage.
    • Locate the Download menu and follow the on-screen instructions.

Note: For access to RStudio and related training resources, see the RStudio section of this guide.

Training resources

Tool collections and indexes

Visual data analysis

Web-based tools

Overview

Web-based tools offer visualisation and analysis features that are easy to use and manage. The tools listed below include word clouds, charts, graphics, and other features that create visual images and statistically interpret your text.

Jupyter  

Leximancer

Orange

Voyant