One of the things I've come across is legacy applications that use Oracle, where we don't have access to the database itself but are provided with Oracle .dmp files. These aren't very helpful when all you have (or are used to!) is MySQL and PostgreSQL.
One approach that I've had success with is to download a VirtualBox image of Oracle and then play with the data in there. I chose the Database App Development VM since I wasn't sure which parts of Oracle I'd need, having strenuously (and pretty successfully) avoided Oracle for most of my time in the industry.
I then imported this into VirtualBox (running on OS X Lion) and configured networking:
- One adapter running NAT so that I could browse the internet in the guest OS; this let me download the .dmp file to the guest OS filesystem.
- Port forwarding set up on the NAT interface, so that ssh to port 2022 on the host goes to port 22 on the guest, allowing ssh access to the VM.
- Optionally, another adapter (Host-only) so that I can set up NFS shares to mount part of the host filesystem in the guest. (The equivalent VBoxManage commands for all of this are sketched after this list.)
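All of this can be done through the VirtualBox GUI, but for reference, a rough sketch of the equivalent VBoxManage calls (the VM name "Oracle Developer Days" is illustrative; substitute whatever you named your imported VM):
$ VBoxManage modifyvm "Oracle Developer Days" --nic1 nat
$ VBoxManage modifyvm "Oracle Developer Days" --natpf1 "guestssh,tcp,,2022,,22"
$ VBoxManage modifyvm "Oracle Developer Days" --nic2 hostonly --hostonlyadapter2 vboxnet0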
Next, I needed to get the .dmp data onto the guest OS (and later get the transformed data off the guest).
ssh-copy-id is good for this: it puts an SSH public key into the authorized_keys file for the oracle user on the guest. You can also get data into and out of the guest using python -m SimpleHTTPServer, run in the appropriate directory, which let me browse the host or guest filesystem as needed. Running ifconfig on the host or guest lets me know which IP address to use.
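Concretely, assuming the port-forwarding rule above and illustrative filenames, the transfer steps from the host look something like:
$ ssh-copy-id -p 2022 oracle@127.0.0.1
$ scp -P 2022 path/to/data.dmp oracle@127.0.0.1:
$ python -m SimpleHTTPServer 8000
The last command serves the current directory over HTTP on port 8000 (Python 2), so the other side can fetch files with a browser or wget. (Older versions of ssh-copy-id don't accept -p directly; you may need to embed the port in the quoted argument instead.)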
Now, I needed to create a tablespace and user to allow me to import the data. I advise doing this since (for me at least!) importing the data is an iterative process, and creating a separate tablespace (with separate data files) is good practice: it avoids bloating the system tablespace and means the disk space can be reclaimed afterwards. That's pretty much the only Oracle knowledge I have! Before you create the tablespace, it's a good idea to check the size of your .dmp against the space available on the filesystem. I had a 1.4GB .dmp which didn't fit into the space left on the fs, and I burnt a bit of time deciphering Oracle error messages from the failed import before I worked out that the filesystem wasn't big enough. In this case, I created a symlink in $ORACLE_HOME/dbs/ which pointed to a large enough partition, and set the owner and permissions as required.
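A sketch of that check and the symlink, with /u02/bigdisk standing in for whichever partition has enough room:
$ ls -lh path/to/data.dmp                 # size of the dump
$ df -h $ORACLE_HOME/dbs                  # free space where the data files would go
$ mkdir /u02/bigdisk/mytablespace
$ chown oracle:dba /u02/bigdisk/mytablespace
$ ln -s /u02/bigdisk/mytablespace $ORACLE_HOME/dbs/mytablespace
With enough space available, creating the tablespace was just a case of running: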
$ sqlplus / as sysdba
...
SQL> CREATE BIGFILE TABLESPACE mytablespace DATAFILE 'mytablespace/f1.dat' SIZE 20M AUTOEXTEND ON;
Tablespace created.
SQL> CREATE USER myuser IDENTIFIED BY password DEFAULT TABLESPACE mytablespace;
User created.
SQL> GRANT CREATE SESSION, CREATE SYNONYM, CONNECT, RESOURCE, CREATE VIEW, IMP_FULL_DATABASE TO myuser;
Grant succeeded.
SQL> exit
We should now be in a position to try to import the data.
$ time imp myuser/password file=path/to/data.dmp full=yes
If this fails because the schema the data was exported from doesn't match the user you created, stop the import and clear out the user and tablespace:
$ sqlplus / as sysdba
SQL> DROP USER myuser CASCADE;
User dropped.
SQL> DROP TABLESPACE mytablespace INCLUDING CONTENTS AND DATAFILES;
Then re-create the tablespace and the new user and try the import again.
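Alternatively, if you can see which schema the data was exported from (imp's output usually names it), imp can remap it on import with its fromuser/touser parameters, avoiding the drop-and-recreate cycle. Something like the following, where origuser stands for the schema named in the dump:
$ imp myuser/password file=path/to/data.dmp fromuser=origuser touser=myuser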
Once the import has succeeded, you want to get the data out of the database into a less proprietary format. One way is to use SQL Developer (a GUI tool included in the VM image).
Open SQL Developer and define a new database connection:
- Connection name: myuser
- User name: myuser
- Password: password
- Save Password?: checked
Note that the SID is orcl rather than xe in the Developer Days VM that I used.
Test the connection. It should work. Then open the connection and examine the tables.
- In the menu, click Tools | Database Export
- Choose to export the data only, in CSV format.
- Choose the connection, choose the tables, choose the destination file.
For large databases, this can take a while to process (2.5 hours in my case). It may be faster to write your own export routine using Perl, PL/SQL or similar. In the end, that's what I did, so that I could script the entire process like so:
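A minimal sketch of such a script, assuming the myuser account from earlier and driving sqlplus from the shell. Note that COLSEP output is comma-separated but space-padded, so strictly valid CSV would need each SELECT to concatenate its columns explicitly:
#!/bin/sh
# Sketch: dump every table in the myuser schema to a .csv file.
CONN=myuser/password

# List the tables in the schema, one name per line.
TABLES=$(sqlplus -s $CONN <<EOF
SET HEADING OFF FEEDBACK OFF PAGESIZE 0
SELECT table_name FROM user_tables;
EXIT
EOF
)

for T in $TABLES; do
    echo "Exporting $T"
    sqlplus -s $CONN <<EOF > $T.csv
SET HEADING OFF FEEDBACK OFF PAGESIZE 0 LINESIZE 32767 TRIMOUT ON COLSEP ','
SELECT * FROM $T;
EXIT
EOF
done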