Showing posts with label tech memoir 技术备忘. Show all posts

Monday, June 15, 2009

librdf Python API summary

I reviewed my code written using the Redland librdf Python API, and made a brief summary as a memoir. For more advanced and powerful parsing, I'm turning to some Java libraries, such as Jena, owlapi, Pellet, ...

Update: I have kept working with python-librdf all this time, and my Java stuff has been put away...

** RDF.Model object

import RDF

model = RDF.Model(RDF.MemoryStorage())
Model using in-memory storage

model = RDF.Model(RDF.HashStorage(bdb_location,options="hash-type='bdb'"))
Model using Berkeley DB storage

p = RDF.Parser('raptor')

file_uri = RDF.Uri('file:/path/to/rdf_file')
Create a URI pointing to a local file

p.parse_into_model(model,file_uri)
Parse an RDF file into the model. Returns a boolean indicating whether the operation succeeded. (Multiple files can be parsed into one model.)

len(model)
Returns the number of statements in the model. This only works for models with in-memory storage; it won't work with Berkeley DB storage.

** RDF.Node object type

The node.type attribute is an integer indicating the type of the node:
1: resource node; its RDF.Uri object is available as node.uri
2: literal node; its value can be extracted as node.literal_value['string']
4: blank node, which in OWL usually appears as the object of rdfs:subClassOf, expressing a restriction on properties.

It is better to use node.is_literal(), node.is_resource() and node.is_blank() to test node types, to avoid confusion.
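
As a rough sketch of that advice, the helper below dispatches on the is_*() methods rather than on the raw integer. The names describe_node and FakeLiteral are made up for illustration; FakeLiteral is only a stand-in so the snippet runs without librdf installed, and with the real library you would pass in an RDF.Node instead.

```python
def describe_node(node):
    """Return a (kind, value) pair for an RDF.Node-like object."""
    if node.is_resource():
        return ('resource', node.uri)
    if node.is_literal():
        return ('literal', node.literal_value['string'])
    if node.is_blank():
        return ('blank', node.blank_identifier)
    return ('unknown', None)


class FakeLiteral(object):
    """Hypothetical stand-in for a literal RDF.Node (not part of librdf)."""
    literal_value = {'string': 'hello'}

    def is_resource(self):
        return False

    def is_literal(self):
        return True

    def is_blank(self):
        return False


kind, value = describe_node(FakeLiteral())
```

The point of the design is only that the caller never compares node.type against 1, 2 or 4 directly.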

** Simple query methods (bound to RDF.Model object)

All simple query methods of the RDF.Model object accept RDF.Node objects as arguments, and they also return RDF.Node objects.

(model indicates RDF.Model object)

result = model.get_target(a,b)

result = model.get_predicate(a,b)
result = model.get_source(a,b)
Each returns an RDF.Node object, or None upon failure

results = model.get_targets(a,b)
results = model.get_predicates(a,b)
results = model.get_sources(a,b)
These always return an RDF.Iterator object, which yields a sequence of RDF.Node objects.

Iteration:
for result in results:

Check for end:
results.end() # returns 0 or 1 depending on whether the iterator is exhausted.

Membership test:
my_node in results # return boolean

** Not simple query methods

Create a RDF.Query object:
query = RDF.Query(query_string,query_language='xxx')

The query language is either rdql or sparql; the default is rdql.

SPARQL query built with the new string format() syntax:
query = RDF.Query('SELECT ?s WHERE {{ ?s <{0}> <{1}> }}'.format(...),query_language='sparql')

results = query.execute(model)
results is an RDF.QueryResults object, which is also an iterator

Check for end:
results.finished()
for this_re in results:
    print this_re['s']
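
The doubled braces in the format string are easy to get wrong, so here is a small sketch of just the string-building step. The helper name build_sparql is my own, and the two URIs (rdf:type and owl:Class) are only example arguments; nothing here touches librdf itself.

```python
def build_sparql(predicate_uri, object_uri):
    """Build a SPARQL SELECT string for subjects with the given predicate/object."""
    # Doubled braces survive str.format() as literal { and }.
    template = 'SELECT ?s WHERE {{ ?s <{0}> <{1}> }}'
    return template.format(predicate_uri, object_uri)


query_string = build_sparql(
    'http://www.w3.org/1999/02/22-rdf-syntax-ns#type',
    'http://www.w3.org/2002/07/owl#Class',
)
# With librdf installed, this string would then be passed on as:
#   RDF.Query(query_string, query_language='sparql').execute(model)
```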

** Blank node

Blank nodes get a special note here because they are frequently used in collection-type domain/range declarations and in property restrictions on classes, all of which occur in OWL.

A blank node has no uri attribute and cannot be converted to an RDF.Uri object. It can easily be used in the queries bound to the RDF.Model object, as they readily accept node objects as arguments.

To use it in a ``not-simple'' query, SPARQL syntax (not rdql) has to be used:

# node is a blank node
node_str = '_:'+node.blank_identifier
q = RDF.Query('SELECT ?predicate ?object WHERE {{ {0} ?predicate ?object }}'.format(node_str),query_language='sparql')
results = q.execute(model)

Problem: when working with the Uniprot OWL, such a query retrieves all the blank nodes. For example, a SPARQL query on a blank node with the predicate owl:unionOf retrieves 26 blank nodes as objects, whereas a get_targets() query correctly retrieves only 1 blank node. I will check whether this is a bug or not.

Nodes with collection parseType are frequently used, e.g. in domain/range specifications.

Friday, June 12, 2009

Generic genome browser on Ubuntu

I'm preparing some stuff for the workshop, so I'm getting back to gbrowse again.

** Installation steps:

gbrowse version: 1.69
Ubuntu version: 9.04
perl version: 5.10.0
bioperl version: don't know how to figure that out...

$ sudo apt-get install libapache2-mod-perl2
$ sudo apt-get install libapache2-mod-perl2-dev
$ sudo apt-get install libapache2-mod-perl2-doc
$ sudo apt-get install apache2-doc

Verify that the directory /usr/lib/cgi-bin exists; if not, create it.

$ sudo apt-get install libgd2-noxpm-dev
$ sudo apt-get install mysql-server
$ sudo apt-get install mysql-client

Use cpan to install all the prerequisite Perl modules listed in INSTALL.

Then I started to install gbrowse from source:

$ perl Makefile.PL

It complained that the Bio::Graphics module was too old, so I upgraded it using cpan (this did not succeed until the graphviz software was installed).

$ cpan
cpan> upgrade Bio::Graphics

$ perl Makefile.PL
$ make
$ sudo make install

Finished. I was really a bit surprised to see so many fancy features in this version of gbrowse. The last time I worked with gbrowse was in 2006.

** It put some scripts into my system directories. For example, these are found in /usr/local/bin:
bp_search2alnblocks bp_search2tribe bp_seqfeature_load.pl
bp_search2alnblocks.pl bp_search2tribe.pl bp_seq_length
bp_search2BSML bp_search_overview bp_seq_length.pl
bp_search2BSML.pl bp_seqconvert bp_seqret
bp_search2gff bp_seqconvert.pl bp_seqret.pl
bp_search2gff.pl bp_seqfeature_delete.pl bp_seqretsplit.pl
bp_search2table bp_seqfeature_gff3.pl
bp_search2table.pl bp_seqfeature_load
... and much more!!

Gbrowse configuration files are located in ``/etc/apache2/gbrowse.conf/''

** Use bp_seqfeature_load.pl to initialize the MySQL database.

** GFF3 format file nuisance:

The GFF3 file for E. coli that I downloaded from NCBI was rejected by gbrowse!
The scaffold declaration entry (the first row of the GFF contents) differs from the example GFF3 file that ships with gbrowse. The TYPE (3rd column) has to be ``chromosome'', and the 9th column has to contain ID and Name entries.
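
A minimal sketch of that fix, assuming the two points above are the only things gbrowse objects to. The function name fix_scaffold_row and the sample row are made up for illustration, and the ID=/Name= spelling follows the usual GFF3 attribute convention rather than anything I verified against gbrowse 1.69.

```python
def fix_scaffold_row(row, seq_id):
    """Rewrite one GFF3 feature row so its type is 'chromosome' with ID and Name set."""
    cols = row.rstrip('\n').split('\t')
    if len(cols) != 9:
        raise ValueError('expected 9 tab-separated GFF3 columns')
    cols[2] = 'chromosome'  # column 3: feature type
    # Keep any other attributes, but drop existing ID/Name so they are not duplicated.
    attrs = [a for a in cols[8].split(';')
             if a and not a.startswith(('ID=', 'Name='))]
    cols[8] = ';'.join(['ID=' + seq_id, 'Name=' + seq_id] + attrs)
    return '\t'.join(cols)


# Made-up example row, only for illustration.
old_row = 'NC_000913\tRefSeq\tsource\t1\t4639675\t.\t+\t.\torganism=Escherichia coli'
new_row = fix_scaffold_row(old_row, 'NC_000913')
```

Only the first feature row of the file would be passed through this function; all other rows stay untouched.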

Friday, June 5, 2009

X11 display reconfiguration for Ubuntu on Dell T300

I just upgraded Ubuntu on our Dell T300 machine to 9.04. Fancy... but we got a surprisingly slow response in gnome-terminal.

I googled, and found out that Ubuntu 9.04 is not working very well with the Intel video chipset of the machine, as discussed here.

The solution is quite simple (though the instructions kept me trying for a while). As root, edit the file /etc/X11/xorg.conf and add the following line to the Device section:

Option "MigrationHeuristic" "greedy"

Then log out and log back in. Solved!

Friday, May 15, 2009

Ubuntu install from USB stick and what

Well, I just made a successful install of Ubuntu from a USB stick, not a live CD.

This turns out to be very simple: download an Ubuntu install CD image onto an Ubuntu machine, insert a working USB stick (make sure there is no important data on it), and use the ``USB Startup Disk Creator'' to do the job. It is crucial that the running system and the CD image are the same version (version 9.04 was tested and works).

After this, plug the stick into a PC that can boot from a USB device and install the system. I used a Dell Studio computer and the installation worked fine.

==========

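I went to the library this afternoon, and on the way I saw a small ceremony for the graduating cadets (students who are going to be military officers). I joined their families to watch.

The crowd was very quiet. The students standing at attention in the middle (both men and women!) walked forward when their names were called, saluted someone who looked like an officer, shook hands and exchanged a few words, then stepped back. Many of the onlookers must have been the cadets' families, busy recording the moment with their cameras or just watching quietly.

Texas A&M has four thousand cadets, so students dressed like the one in the last photo are a common sight on campus. While running in the research park I once met fully equipped cadets on a training march carrying standard-issue rifles, looking a bit like the American soldiers stationed overseas that you see in the media. The university also has a Memorial Student Center, where walking on the lawn is forbidden (which is rare in the US). A small stone tablet on the lawn tells you why: the place is dedicated to the memory of the cadets who graduated from this school and gave their lives for the United States of America.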

Monday, March 16, 2009

Python-C software installation

I've been trying to install the xMAN software, named for "extreme mapping of oligonucleotides". This kind of software is new to me: Python extended with C. Its installation is a bit different.

First install the prerequisites:

sudo yum install swig.x86_64
sudo yum install python-numarray.x86_64

The package comes with a setup.py, which I ran as suggested:

sudo python setup.py install

Got error:

error: Python.h: No such file or directory

I looked through Google and saw that the Python development headers (python-devel here) need to be installed:

sudo yum install python-devel.x86_64

Then the installation worked just fine!
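
Before running setup.py for a C-extended package, one can also ask the interpreter where it expects Python.h to be. This is a minimal sketch, assuming a Python recent enough to ship the sysconfig module (my 2009-era install would not have had it); the helper name python_header_path is made up.

```python
import os
import sysconfig


def python_header_path():
    """Return the path where this interpreter expects Python.h to live."""
    include_dir = sysconfig.get_paths()['include']
    return os.path.join(include_dir, 'Python.h')


header = python_header_path()
have_headers = os.path.exists(header)
print('%s: %s' % (header, 'found' if have_headers else 'missing'))
```

If it reports "missing", the development package is the first thing to install.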

Thursday, March 12, 2009

Make friends with SAMBA

I'm doing a system admin's job, at which I'm a total dummy.

The following are the steps I used to make friends with Samba (or to coax it into working for me...)

Start/stop service:
# /etc/init.d/smb start
# /etc/init.d/smb stop


Configuration file: /etc/samba/smb.conf
Content:
[global]
workgroup = MYGROUP
netbios name = HOBBIT
security = SHARE
[data]
comment = Data
path = xxxx
force user = xx
force group = xx
read only = No
guest ok = Yes

To use it properly, set SELinux to log warnings instead of blocking. I did this via the GUI. Here is a method that modifies its config file instead:

http://www.revsys.com/writings/quicktips/turn-off-selinux.html

The network location can then be viewed on my laptop via:
smb://location/dir

The location can be viewed on a Windows machine via:
\\location\dir

To map it as a network drive for Windows applications, use the following command in CMD:

net use X: \\location\dir

Tuesday, March 10, 2009

Compiling R on CentOS

What's the problem with the CentOS I'm working with... so many necessary components seem to be missing compared with the regular Linux distros I have been working with >_<

During configuration, errors told me that the F77 compiler was missing, so I installed the following:
# yum install compat-gcc-34-g77.x86_64

On re-running configure, another error complained that the readline utility was missing.
After a yum search readline, I chose to install this:

yum install readline.x86_64

but that did not work. Then I installed this:

yum install readline-devel.x86_64

and it worked!

Go on to install bioconductor:

> source("http://bioconductor.org/biocLite.R")
> biocLite()

Everything OK.

Thursday, March 5, 2009

Mount NTFS device on CentOS

This morning I plugged an external hard drive into my dear server and got an angry message: Unsupported file system type (ntfs)!

I consulted this CentOS wiki page:

http://wiki.centos.org/TipsAndTricks/NTFS?highlight=(ntfs)

and followed all the instructions, including installing the rpmforge repository, running yum update and installing that weird stuff (fuse fuse-ntfs-3g dkms dkms-fuse), then tried but failed again:

# mount /dev/sdf1 /mnt/test/ -t ntfs-3g
FATAL: Module fuse not found.
ntfs-3g-mount: fuse device is missing, try 'modprobe fuse' as root


Two hours later, when I was almost in despair, I searched Google again and found this top-hit link:

http://www.wains.be/index.php/2007/02/28/mount-ntfs-disks-under-centos/

I immediately went to rpmfind.org to search for the RPM package matching the kernel version of the machine, and I got it.

# rpm -ivh kernel-module-ntfs-2.6.18-92.1.10.el5-2.1.27-0.rr.10.11.x86_64.rpm
# /sbin/modprobe ntfs


This just works!

Wednesday, January 14, 2009

Severe errors that might ruin a project

I recently ran into a severe error that could have spoiled a project... and luckily I got to know a bit about why.

I used tgicl (the TIGR sequence assembly software) for an assembly job. tgicl called formatdb to index all the sequence reads, which produced .nhr, .nin and .nsq files.

After assembly, I wanted the sequences of all singletons, so I used fastacmd to fetch them from the database formatted by tgicl. Everything seemed normal... except the output: the identifiers of the fetched sequences were strange names like the following:

>gnl|BL_ORD_ID|10 000067_0726_3676 ...descriptions...

The second word is indeed the sequence name, and I have no idea where the strange first name came from. This would cause serious trouble in further processing if the name change were not taken into account.

Anyway, I found that if I run formatdb with the following parameters:

$ formatdb -i xxx.fasta -p F

the resulting file xxx.fasta.nhr contains all the name correspondences, holding the whole big content on a single line! This indirectly saved my ass in this nasty issue, especially as I was racing to meet the deadline!
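
For downstream scripts, the original read name can also be recovered from the fetched headers directly. The sketch below is based only on the single header shown above, so treat the format assumption (generated gnl|BL_ORD_ID|N identifier first, real name as the next word) as unverified; the function name original_name is mine.

```python
def original_name(header):
    """Recover the read name from a header that was prefixed with gnl|BL_ORD_ID|N."""
    text = header.lstrip('>')
    first, _, rest = text.partition(' ')
    if first.startswith('gnl|BL_ORD_ID|') and rest:
        # The real name is the first word after the generated identifier.
        return rest.split()[0]
    return first


name = original_name('>gnl|BL_ORD_ID|10 000067_0726_3676 some description')
```

Headers without the generated prefix pass through unchanged, so the same function can be applied to both kinds of file.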

Tuesday, January 13, 2009

Ubuntu for Lenovo Thinkpad X61

I have got an X61 and now I want to use Ubuntu on it.

I made a 40 GB partition for /home, 15 GB for / and 300 MB for /boot
(oddly, my lab PC, which runs Ubuntu 8.03, accumulates a lot of stuff in /boot; its tiny 100 MB is used up and the update manager refuses to work!!)

** The system is in English.

** Enable Chinese input

I had previously installed scim-chinese, but that didn't help.

I followed the link below, and it's quite easy: go to System > Administration > Language Support and select Chinese. Ubuntu will download a lot of stuff. After that, select "allow input complex characters". Finally select OK. You will need to log out.

http://www.blog.highub.com/linux/ubuntu-install-chinese-input/

现在我可以输入中文了 :-)
(Now I can input Chinese!)


** Adjust bash prompt appearance

Edit ~/.bashrc and, around line 55, make the following replacement:

old: PS1='${debian_chroot:+($debian_chroot)}\u@\h:\w\$ '
new: PS1='${debian_chroot:+($debian_chroot)}[\u@\h \W]\$ '

I have no idea what that's about -_-||


** Personalize Vim

Vim is my favorite unix software...

$ sudo apt-get install vim

And I created a .vimrc file with the following contents:
set hls
set nonumber
set ts=4
" use indent of 4 spaces, and have them copied all around
set shiftwidth=4
set shiftround
set expandtab
set autoindent
syntax on


A more comprehensive (and old) .vimrc sample file is here:
http://www.stripey.com/vim/vimrc.html


** Personal data for Firefox

The personal data on my lab PC was packed into a tarball and transferred to the X61.
The original contents of ~/.mozilla/firefox/ were backed up and deleted, and my personal data was put in their place.

And Firefox happily accepted my personal data and new configuration.


** Install the following goodies (and problems)

$ sudo apt-get install dia
xxxxxxxxxx
xxxxxxxxxx
The following packages have unmet dependencies:
dia: Depends: dia-common (= 0.96.1-5ubuntu2) but 0.96.1-7ubuntu1 is to be installed
Depends: dia-libs (= 0.96.1-5ubuntu2) but 0.96.1-7ubuntu1 is to be installed
E: Broken packages


I was using a Chinese version of sources.list (located in /etc/apt/), and I changed it to a "western" version I found online.

$ sudo apt-get update
## finishing update
$ sudo apt-get install dia

Then Dia 0.96.1 was installed successfully.

Monday, October 27, 2008

Dummy bug in my Python programming

My buggy program:

for line in open('onefile'):
    list = line.strip().split('\t')
    ... (other goodies)

def revComplimentary(seq):
    li = list(seq)
    ... (other goodies)

When the self-defined function is called, a TypeError is thrown saying "'list' object is not callable". What the hell is this problem? It turns out that I had rebound the name "list" to a list object during file processing, which hides the built-in list().
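
The fix is simply to pick another variable name. Here is a small sketch; the body of the function was elided in my program above, so the reverse-complement logic, the name rev_complement and the sample line are filled in purely for illustration.

```python
def rev_complement(seq):
    """Return the reverse complement of a DNA string."""
    pairs = {'A': 'T', 'T': 'A', 'C': 'G', 'G': 'C', 'N': 'N'}
    bases = list(seq.upper())  # the built-in list() is still intact here
    return ''.join(pairs[b] for b in reversed(bases))


line = 'seq1\tACGTN\n'
fields = line.strip().split('\t')  # 'fields' instead of 'list'
result = rev_complement(fields[1])
```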

Sunday, June 22, 2008

Install libsbml on Ubuntu

I have changed my OS to Ubuntu 8.04 (hardy), so everything I had installed and configured on my old Fedora Core 4 was lost

>_< :-(

Now I'm installing libsbml.

./configure complained that the C++ compiler cannot produce executables, so I installed "build-essential".

Then ./configure complained as follows:
*** Could not find 'xml2-config' in directory yes/bin/.
*** Please check that the PATH supplied to --with-libxml=PATH
*** is of the form '/usr/local' and not '/usr/local/lib'; in
*** other words, omit the 'lib' part of the name. The 'configure'
*** utility will append 'lib' to the given path automatically.

I googled, and installed "libxml2-dev".

Then configure ran successfully.
./configure --with-python

But make produced a lot of errors. This was because the Python development files had not been installed.
apt-get install python-dev

Compilation succeeded this time!

make
make install

ldconfig

After this, I ran python, and "from libsbml import *" succeeded!

Saturday, June 7, 2008

Install Python API to libsbml

I installed the Python API for libsbml, the famous SBML library.

First build libsbml with "--with-python" during configure. The libsbml library is installed in /usr/local/lib/ by default, and the Python API is installed in /usr/local/lib/python2.4/site-packages/libsbml.

Then add the following lines to .bashrc:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib
export PYTHONPATH=/usr/local/lib/python2.4/site-packages/libsbml:/usr/local/lib/python2.4/site-packages

This way Python can import libsbml!

Tuesday, May 6, 2008

Be a blogger...

I hasten to add a few words to my blog whenever I discover something, so as to be a blogger...

While coding this morning, I ran into a strange problem that kept me scratching my head for an hour...



file = 'tmp.ode'
fout = open(file,'w')
... (write goodies) ...
fout.write('done\n')
subprocess.Popen(['xppaut %s -silent > out 2>err' % file], shell=True).wait()




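Obviously I forgot to close the output file object. For me, this had the peculiar effect of disturbing the following system call: the xppaut interpreter simply refused to read the ODE file, even though I later found the file right there in the directory with all the correct contents. (Presumably the last writes were still sitting in Python's buffer when xppaut started, and only reached the disk when the script exited.)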
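
A minimal sketch of the safer pattern: let a with-block close the file before the child process starts. The helper name write_then_run is made up, and since xppaut is not assumed to be installed, a tiny Python child that prints the file back stands in for it.

```python
import os
import subprocess
import sys
import tempfile


def write_then_run(path, lines, command):
    """Write lines to path, close the file, then run command with path appended."""
    # Leaving the with-block closes the file, which flushes buffered writes to disk.
    with open(path, 'w') as fout:
        for line in lines:
            fout.write(line + '\n')
    return subprocess.run(command + [path], stdout=subprocess.PIPE,
                          universal_newlines=True)


# Stand-in for xppaut: a child process that simply prints the file back.
reader = [sys.executable, '-c',
          'import sys; sys.stdout.write(open(sys.argv[1]).read())']
tmp_dir = tempfile.mkdtemp()
result = write_then_run(os.path.join(tmp_dir, 'tmp.ode'), ['done'], reader)
```

With the real tool, the command list would be something like ['xppaut'] plus whatever options are needed, which I have not re-checked here.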

Wednesday, April 16, 2008

Find files and delete

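The first time I used "find" to delete all old and obsolete temporary files (here, those whose status last changed more than 14 days ago):

# find /xxx/tmp/ -ctime +14 -exec /bin/rm '{}' \;

The same -exec trick unzips every zip file under the current directory:

$ find . -name '*.zip' -exec unzip {} \;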
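
For scripts where I would rather look before deleting, roughly the same cleanup can be sketched in Python. This is an approximation, not an exact port: find rounds file ages down to whole days, so its cutoff differs slightly from the plain comparison used here. The function names, the dry_run default and the now parameter (which exists only so the demo can pretend time has passed) are my own additions.

```python
import os
import tempfile
import time


def stale_files(root, days, now=None):
    """Yield paths under root whose ctime is more than `days` days before `now`."""
    now = time.time() if now is None else now
    cutoff = now - days * 86400
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.lstat(path).st_ctime < cutoff:
                yield path


def delete_stale(root, days, now=None, dry_run=True):
    """Delete (or, by default, only list) the stale files; return their paths."""
    victims = sorted(stale_files(root, days, now))
    if not dry_run:
        for path in victims:
            os.remove(path)
    return victims


# Demo in a scratch directory: pretend it is 15 days later, dry run only.
demo_dir = tempfile.mkdtemp()
open(os.path.join(demo_dir, 'old.tmp'), 'w').close()
listed = delete_stale(demo_dir, 14, now=time.time() + 15 * 86400)
```

Calling delete_stale(root, 14, dry_run=False) is the step that actually removes anything.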

Thursday, February 21, 2008

Article submitted

Today I submitted my article to BMC Bioinformatics. The article is my first real attempt at computational biology, and the first article I prepared using LaTeX and BibTeX. How convenient! Sadly I have no hope of preparing my dissertation the same way here.

I used BMC's TeX template when drafting the manuscript. Although BMC instructed me to remove all dummy contents from the template file, I found the following lines essential for submission; without them BMC complains that the manuscript was NOT prepared with their template:

%% BioMed_Central_Tex_Template_v1.05
%% %
% bmc_article.tex ver: 1.05 %
% %

Wednesday, January 9, 2008

table border...

After years of being haunted by the ugly-looking HTML tables I produced, I finally found out that some simple HTML settings make a table look smarter.


a nice table...

Tuesday, November 27, 2007

Windows to serve? :-(

"Platform independent" is indeed a motto for professional developers: quality software should be able to move between platforms with great ease, not sorrow. My recent experience proved it right, but in another field, web service development.

I'm the author of several junky databases, and on Monday this week I got the task of "transplanting" them onto a ThinkPad X31, which runs Windows XP, which is a genuine Home Edition, and which is guaranteed to be, eh, slow.

After complaining for a couple of hours, I finally settled down to work and analyzed the situation I was facing. (Those databases were written on Linux and run on Linux. They were never intended to have anything to do with Microsoft products!)

Anyway, I had to do the task, and two days later it's nearly finished. I think I should write a few lines here as a memoir in case I meet similar tasks again!!

1. MySQL issue
MySQL databases on Linux can be dumped and used to rebuild identical ones on Windows as follows:
c:\mysql -uroot dbname < dbname.dump
dbname.dump is a dump file generated by mysqldump.
There seemed to be a version incompatibility, as MySQL on Windows is 4.0.1 while on Linux it is 4.1. When loading the dump on Windows, mysql complained about unrecognized words like:
"ENGINE=MyISAM DEFAULT CHARSET=latin1;"
They appear after the table creation SQL statements. I simply replaced them with "TYPE=MyISAM;" and it works.

2. File path in Perl
The header of every Perl CGI program has to be changed to:
#!c:\perl\bin\perl.exe
and absolute directories have to be handled with care. When opening files, directories can safely be written with slashes, while in system calls (I frequently call "move"!) backslashes have to be used.
For directories with spaces in their names (e.g. "Program Files"), double quotes are sometimes necessary. However, note that the following are valid:
require 'C:\Program Files\Apache Software Foundation\Apache2.2\cgi-bin\easygo\code.pl';
my $dir = 'c:\program files\apache software foundation\apache2.2';
system "move script\\funky_stuff \"$dir\\htdocs\"";

3. File path in R
The program needs to call R in batch mode and do some file input/output. On Windows, R accepts slashes as directory separators, but spaces need to be escaped with a backslash. So to refer to a file, write the following in Perl:
$dirInR .= "c:/program\\ files/aaa/bbb/";
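
For the MySQL issue in item 1, the replacement can be scripted instead of done by hand. This is a sketch only: the regular expression generalizes my one observed case to any engine and charset name, which is an assumption, and like my manual edit it silently drops the DEFAULT CHARSET part. The names TABLE_OPTIONS and downgrade_dump are made up.

```python
import re

# Matches the table options that the older server rejected, e.g.
# "ENGINE=MyISAM DEFAULT CHARSET=latin1;"
TABLE_OPTIONS = re.compile(r'ENGINE=(\w+) DEFAULT CHARSET=\w+;')


def downgrade_dump(sql_text):
    """Rewrite ENGINE=... DEFAULT CHARSET=... into the older TYPE=... form."""
    return TABLE_OPTIONS.sub(r'TYPE=\1;', sql_text)


sample = ') ENGINE=MyISAM DEFAULT CHARSET=latin1;\n'
converted = downgrade_dump(sample)
```

The whole dump file would be read as text, passed through downgrade_dump, and written to a new file before loading it on the older server.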

Monday, November 19, 2007

Turning to Python

I finally decided to turn to Python, although I've been Perl-lish for 3 years...

However, on my old FC4, Tkinter could not be used...
>>> import Tkinter
^*&**&%^%^&*(%$$$#=
ImportError: No module named Tkinter :-(

I went to the Python community for help, but they said Tkinter is a "built-in" facility. My Python is 2.4.1; is that an antique??
I checked my PC and confirmed that tcl/tk was installed and functional. I even searched the whole file system and found no such file as "Tkinter.py".

A friend of a friend told me that Tkinter is a wrapper for tcl/tk, and that the tkinter package needs to be installed to use it (it is named "python-tk" on Debian/Ubuntu).

I resorted to rpm.pbone.net, downloaded "tkinter-2.4.1-2.i386.rpm" and installed it. Then import Tkinter worked!
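
A small check like the following would have told me sooner that the package was simply missing. It is a sketch, with a made-up helper name load_tkinter; trying the lowercase spelling as well is an addition of mine for newer Pythons, where the module is called tkinter.

```python
def load_tkinter():
    """Return the Tkinter module under either spelling, or None if it is absent."""
    for name in ('Tkinter', 'tkinter'):  # Python 2 name first, then Python 3
        try:
            return __import__(name)
        except ImportError:
            continue
    return None


tk = load_tkinter()
if tk is None:
    print('Tkinter is missing; install the tkinter / python-tk package')
```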

Wednesday, August 15, 2007

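Installing Ubuntu on a laptop

Today, with generous help from lidaof, I installed and configured Ubuntu on my laptop. From now on I can hold my head high!
Words alone prove nothing, so here is a picture as evidence:

Summary: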

1.
Partition layout
swap 1G
/boot 100M
/ 40G

2.
The installation went smoothly, except that I chose English and found in the end that the system had no Chinese input method :-O


3.
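Wireless network configuration:
The wireless card was detected automatically, and the Properties window automatically found the Network name (ESSID) TP-LINK
Fill in the passwd; note that the passwd type should be set to ascii
Set Configuration to DHCP
Then click OK and Ubuntu goes looking for the network, after which you are online. If it does not connect right away, wait a little longer...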

4.
Installing software: configuring and using apt-get
The apt-get source file is at: /etc/apt/sources.list

export http_proxy=http://proxy:port

4.1 various
apt-get update
apt-get install apache2-mpm-worker
apt-get install mysql-server
apt-get install dia
apt-get install ssh
apt-get install gftp
apt-get install rar
apt-get install mplayer
apt-get install graphviz
apt-get install scilab
apt-get install r-base
apt-get install xchm
apt-get install tetex-base
apt-get install tetex-bin
apt-get install tetex-extra
apt-get install build-essential

4.2 Perl GD
To install Perl GD, libgd is needed. However, the following installation won't help:
apt-get install libgd-dev
Instead, this is what is needed:
apt-get install libgd2-xpm-dev

Then I installed GD.pm manually and successfully (perl Makefile.PL -> make -> make test -> make install)

4.3 VI
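The VI shipped with this distribution does not seem to be complete: it refused to do any highlighting when editing scripts and web pages. So I installed it again:
apt-get install vim

and created a .vimrc as its configuration:
set ts=4
set sw=4
set cindent
set autoindent
syntax on

And that's it!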

4.4 Chinese input method
The Chinese input method does not work easily:
apt-get install scim
apt-get install ttf-arphic-gbsn001p
apt-get install chinput

Neither scim nor chinput works... GOD save me!!!

4.5 PDF reader
I could not get Acrobat Reader by issuing "apt-get install acroread".
Hmmm...

4.6 distribution update
This last one takes quite a while:
apt-get dist-upgrade


5.
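Apache system paths:
/var/www/apache2-default (the original default path; web pages are only recognized when placed here)
Configuration file: /etc/apache2/apache2.conf
Script for changing the service state: /etc/init.d/apache2 (on FC, use /sbin/service)
Error log file: /var/log/apache2/error.log
Use the following directives to make apache execute CGI scripts: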

<Virtualhost localhost>
DocumentRoot /var/www/
<Directory />
Options FollowSymLinks
AllowOverride all
</Directory>
<Directory /var/www/cgi-bin/>
Options ExecCGI
</Directory>
</Virtualhost>

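Something odd happened at first: after the configuration above, apache flatly refused to execute .pl scripts, saying the script could not be found. The error.log showed that apache had actually gone looking for the script under /usr/local/!! Scary; if it had happened to execute some system command there, that would have been the end. Without any change to the configuration file, the problem went away by itself after a reboot. Really strange!

And then, Good night!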