Dear Deepak :
( Julia Computing )
--------------------------------------------
I write to you as suggested by Viral Shah ( thru LinkedIn )
Some 2.5 years back , I had tried to get Ms Rohini Damahe ( Lecturer - L&T Institute of Technology ) , to take up a DATA MINING project for her MS studies . But this did not work out
I wonder if Julia Computing would want to do this - as a Service to the Nation
What I have in mind , is explained in the attachment
Feel free to write / phone for any clarifications
with regards,
hemen parekh
www.hemenparekh.in > Blogs > Towards a National Job Portal > Reports > www.ResumesExchange.com
Marol , Mumbai , India
10 Feb 2016
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Proposal for Julia Computing
From : hemen parekh / hcp@RecruitGuru.com / (M) 0 - 98,67,55,08,08 / www.hemenparekh.in
10 Feb 2016
Mumbai
------------------------------------------------------------------------------------------------------------------------------------------
07 May 2013
------------------------------
Dear Rohini
During our discussions yesterday , you expressed your desire to work on some project involving Data mining
At that time , I mentioned that we have a database of over 5 million job advts , downloaded over the past 6 / 7 years from various job portals of India
Each job advt record in this database consists of :
Ø Advt ID
Ø Designation ( being advertised )
Ø Company Name
Ø Job Description
Ø Desired Profile
Ø Compensation
Ø Experience ( desired ) – Years
Ø Industry Type
Ø Education
Ø Location ( Posting City )
Ø Keywords
Ø Post Date
Ø Expiry Date
We were displaying PIE-CHARTS of :
Ø Industry-wise Jobs
Ø City-wise Jobs
You will observe that , with a much larger database available now , it is possible to analyze / display the “ No of Jobs “ , in many more ways
Not only that , it should be possible to analyze this huge database to predict the future expected PATTERN of the occurrence of jobs , in many different ways !
Beyond that , it should be possible to evolve some sort of an EXPERT SYSTEM , by extracting patterns that tell us ,
Ø IF this , THEN that
Ø IF this , THEN not that
type of DECISION RULES !
I have already written down a few such possible RULES , that I can send you later , in case you wish to take up this project
If you do , I can even prepare U/I of a web page that will enable any visitor to search such co-relations amongst various data fields
Considering that , currently , we are downloading approx 1,000 job advts EVERY DAY , this would refine and improve as time goes by
Pl let me know in case of interest
hcp
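To make the " IF this , THEN that " idea concrete, here is a minimal Python sketch of how such rules might be extracted from two fields of the job advts. It is only an illustration under stated assumptions: the field names ( designation , education ), the thresholds and the five sample records are all hypothetical, and the counting approach ( support and confidence, borrowed from association-rule mining ) is one possible technique, not something the emails prescribe.

```python
from collections import Counter

def mine_rules(adverts, if_field, then_field, min_support=2, min_confidence=0.8):
    """Return (if_value, then_value, confidence) for pairs that clear both thresholds."""
    # How often each (IF value, THEN value) pair occurs together.
    pair_counts = Counter((ad[if_field], ad[then_field]) for ad in adverts)
    # How often each IF value occurs at all.
    if_counts = Counter(ad[if_field] for ad in adverts)
    rules = []
    for (if_value, then_value), n in pair_counts.items():
        confidence = n / if_counts[if_value]
        if n >= min_support and confidence >= min_confidence:
            rules.append((if_value, then_value, confidence))
    return sorted(rules)

# Hypothetical sample records; real advts would come from the job advt database.
sample_adverts = [
    {"designation": "Production Manager", "education": "BE"},
    {"designation": "Production Manager", "education": "BE"},
    {"designation": "Production Manager", "education": "Diploma"},
    {"designation": "Accounts Officer", "education": "B Com"},
    {"designation": "Accounts Officer", "education": "B Com"},
]

rules = mine_rules(sample_adverts, "designation", "education", min_confidence=0.6)
for if_value, then_value, confidence in rules:
    print(f"IF designation = {if_value} THEN education = {then_value} ({confidence:.0%})")
```

A pair that never occurs in a large sample ( say , a Production Manager advt asking for a CA ) would be a candidate for the " IF this , THEN not that " type of rule, though absence from the data is only suggestive.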
---------------------------------------------------------------------------------------------------------------------------
07 June 2013
At any given time , the number of jobs getting advertised , is an important Economic Indicator
If the economy is booming and company Order Books are getting fatter , then more jobs will get advertized – and vice-versa
Hence , a time-series analysis of the no of new jobs getting posted on job portals , should show a straight line relationship with the state of the economy ( a high co-efficient of correlation )
Apart from that , can Data mining of 5 million jobs , answer ( even partially ) , the following questions ?
Ø Who ( which Companies ) are advertizing and when
Ø What jobs / vacancies / positions are being advertized
Ø What is the frequency with which a particular job gets advertized ? By entire industry ? By a given Company ?
Ø Which regions / cities have max / min no of new jobs
Ø What are regional disparities due to
Ø Which Industries are advertising most – creating most jobs
Ø What Edu Qualifications are in max demand
Ø What kind of jobs demand what kind of Edu Qualifications
Ø What is the level of co-relation between , Position and the years of Experience demanded
Ø For identical positions being advertized , how much do “ Job Descriptions / Desired Profiles “ differ , from company to company
Ø Are there significant differences in the “ No of years of Experience “ being demanded , for identical positions
Ø What is the probability of finding the “ Keywords “ in “ Job Description / Desired Profile “
Ø What is the extent of duplication ( redundancy ? ) between , “ Job Description “ and “ Desired Profile “
Ø What percentage of Advts fail to make any mention of , Compensation Offered
Ø When a company posts an advt for same / identical position , at different points of time , are there any differences in values ( fields )
Ø From an analysis of all the advts posted by a given Company ( over past 7 years ) , can any conclusion be reached as to the changing nature of that company’s business ( by co-relating the “ Skills related Keywords “ )
Ø Can the algorithm predict what job a company will advertize next – and when
Ø Is there any correlation between , “ Designation / Position “ and the “ Keywords “
Ø From analyzing this huge data , can software auto-generate , a complete / editable job advt , as soon as a Recruiter simply types the “ Designation / Position “
I believe , so far , no one has undertaken such a Data mining project
If carried out diligently , I am sure , the outcome would be of immense benefit to :
Ø HR Managers ( for Manpower Planning / Compensation Planning )
Ø Recruiting Managers ( for framing Man Specifications / Job Description Manuals )
Ø Educationists ( for deciding what Edu Qualifications are in demand and tailoring the Courses )
Ø Students ( to figure out what “ Skills “ are in demand by Industry and prepare )
Ø Planning Commission ( for allocating Resources to States / Regions , based on imbalances )
Ø HRD Ministry ( for long term Macro-Planning in respect of Education )
Ø National Skills Development Commission ( for chalking out Skills Development Programs in collaboration with Companies / Industries )
I do hope , you will consider my proposal sincerely
Regards
hcp
----------------------------------------------------------------------------------------------------------------------------
15 July 2013
Rohini
I am glad that you liked the idea of Data mining of 5+ million job advts
What can / will such a project yield ?
Without exaggerating , it would be safe to assume that , this vast database of job advts would contain :
Ø 50 million phrases / sentences
Ø 500 million words
Obviously , each word / phrase / sentence , is nothing more than a “ Database of Intentions “ of the Employer Companies ( to borrow from John Battelle’s well-researched book about Google )
Our goal shall be to make this ( Data mining Algorithm ) a dynamic / continuous “ Process “ , so that , we can measure the changing nature of these “ Intentions “ , over a long , long period
And we must enable a “ Researching Visitor ( of our web site ) “ , to benefit from these trends / patterns
If your Guide approves of this project , we will sit down to draw up a plan , along with Shuklendu
In the meantime , I attach a very small list of the Words / Phrases / Sentences , that I had manually compiled some 16 years ago
Regards
hcparekh
-------------------------------------------------------------------------------------------------------------------------------
18 July 2013
GOOGLE N-GRAM PROJECT
[ Search form of the Google Books Ngram Viewer : graph case-sensitive , comma-separated phrases between two chosen years , from a chosen corpus ( English ) , with a chosen smoothing ( 3 ) ]
Run your own experiment ! Raw data is available for download
--------------------------------------------------------------------------------------------------------------------------------------------
18 July 2013
Rohini
Even though 5 million job advts may contain 500 million “ words “ , these are not Unique
Most of these are used again and again , hundreds or thousands of times
Thru data mining , it is not difficult to compute their “ Frequency of Usage “
And then , these frequencies can be graphically plotted against any particular time-period
Such Graphical Representations can be further broken up by ,
Ø City Names
Ø Company Names
Ø Industry Names
Ø Function Names
Ø Designations ( Vacancy Names ) .. etc
And such graphical analysis can be done , not only for “ Keywords “ but even for “ Key Phrases “ and “ Sentences “ !
Regards
Hcp
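The " Frequency of Usage " computation described in this email could be sketched along the following lines. This is only an illustration, not the project's actual code: the field names ( post_date , job_description ), the ISO date format and the three sample advts are assumptions made for the example. Frequencies are normalised per 1,000 words so that months with more advts remain comparable.

```python
import re
from collections import Counter

def phrase_frequency_by_month(adverts, phrase):
    """Map 'YYYY-MM' -> occurrences of `phrase` per 1,000 words posted in that month."""
    hits = Counter()
    words = Counter()
    # Whole-word, case-insensitive match; works for keywords and multi-word phrases.
    pattern = re.compile(r"\b" + re.escape(phrase.lower()) + r"\b")
    for ad in adverts:
        month = ad["post_date"][:7]  # assumes dates stored as 'YYYY-MM-DD'
        text = ad["job_description"].lower()
        hits[month] += len(pattern.findall(text))
        words[month] += len(text.split())
    return {m: 1000 * hits[m] / words[m] for m in sorted(words) if words[m]}

# Hypothetical sample advts.
sample_adverts = [
    {"post_date": "2013-06-03", "job_description": "Java developer with SQL skills"},
    {"post_date": "2013-06-20", "job_description": "Business analyst with SQL and Excel"},
    {"post_date": "2013-07-01", "job_description": "Sales manager for Mumbai region"},
]

freq = phrase_frequency_by_month(sample_adverts, "SQL")
for month, per_thousand in freq.items():
    print(month, round(per_thousand, 1))
```

To break the graph up by City , Company or Industry , the same function can be run on the subset of advts that match the chosen value, and the resulting month-wise figures plotted with any charting tool.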
-------------------------------------------------------------------------------------------------------------------------------------------------
22 July 2013
Rohini
Take a look at this project paper
It is all about data mining of some 150 million records ( location points ) and about uncovering “ trends / patterns “ of physical movements of 300 human volunteers , over a “ period of time “
I quote from an article in Times of India ( 19 July 2013 ) :
“ ..the first system of its kind to predict long term human mobility in a unified way , parse the data. Far Out does not need to be told exactly what to look for --- it automatically discovered regularities in the data “
“ Do you know precisely where you’ll be 285 days from now at 2 pm ? Researchers have developed a new tracking software that can tell you exactly where you will be on a precise time and date , years into the future “
What we want to do with 5 million job advts database , is quite similar – viz ; predict WHO ( which Company / Industry ) , will advertize WHAT ( vacancies / positions / designations ) and WHEN ( time )
It is do-able !
Regards
hcp
-----------------------------------------------------------------------------------------------------------------------------------------------------------
31 July 2013
Rohini
No problem
Based on all the emails that I have sent so far , you should prepare an outline of the Data mining Project
That paper would help all of us to know , in advance , what to expect when the project gets completed ( hopefully , by Dec 2013 ? )
As explained to you over phone yesterday , this “ Data Mining and trend / pattern generation “ must happen online on www.CustomizeResume.com
Quite likely , we currently have some 3 million job advts in CustomizeResume web site
By a copy of this email , I am requesting Shuklendu , to add to this , another 3 million job advts which are available in www.IndiaRecruiter.net web site
And , since this database ( of job advts ) keeps growing at approx 1,000 per day , the software that you develop and install on CustomizeResume web site should be such that , the trends / patterns / search results etc , are generated dynamically / on-the-fly , any time a visitor selects any given ,
Ø Search Criteria ( Industry / Company / Position / Time Period etc )
Ø Tabular or Graphical Display ( graphs are critical to visualizing trends / patterns )
If there are any questions , feel free to phone me
I hope , you could talk to Shuklendu re your technical queries
Regards
Hcp
CC: Shuklendu
We should seriously consider reviving , www.IndiaRecruiter.net
About a year back , while talking about this ( revival ) , Nitin mentioned that , it may take 2/3 hours to “ connect-up “ the software code ( available with you , in the back-up taken at Reliance Server Farm ) , with the databases of IndiaRecruiter
-----------------------------------------------------------------------------------------------------------------------------------------
02 Aug 2013
Shuklendu
Ø Demand of the Project
I do not understand what is meant by “ Demand “ !
I presume , this has nothing to do with the “ Market Demand “ – as for a product or a service
If , what is meant is , what is the “ Object “ or “ Purpose “ of this project , then , that has been amply explained in my 4/5 earlier emails ( with copies to you ) sent to Rohini
Very briefly stated , the purpose is for the software to be able to “ Predict “ , WHO ( which Company or which Industry ) will advertise for WHAT ( Vacancy / Position / Designation ) and WHEN ( specific time in future )
The software will accomplish this by examining / analyzing millions of Job Advts thru PARSING / INDEXING its contents and graphically plotting Trends / Patterns , along a “ Specified Time Axis “
The Contents are :
· Every field of an Advt
· Millions of Sentences / Phrases / Keywords , contained in those advts and computing their Frequencies of Occurrences
Ø Deadline
Dec 2013
Ø Software Tools / Languages to be used
You are best placed to advise Rohini re this .
From “ Availability to the Users “ point-of-view , this project / feature must work on CustomizeResume web site . It will be freely available to both , Employers as well as Jobseekers – and without login
Being web site – based , it must dynamically accommodate the inflow of 1000+ job advts getting added to Jobs Database daily.
This is NOT an Enterprise based TOOL
It can be demonstrated directly from the web site only
Ø Technical Help
I hope you / Nitin can guide Rohini , whenever required , as far as integration with Job Advt Database is concerned.
I have a strong belief that what Rohini develops , will be of immense help to our own team in developing our “ Job Recommendation System “ , for which , you already have with you ,
· A folder containing my various handwritten notes
· Several past emails , laying down the precise logic
We should , jointly monitor this project , once-a-fortnight , in a face-to-face meeting with Rohini
She should continue to work from LTIT premises ( - unless , you want her to sit at Sentient premises )
hcp
--------------------------------------------------------------------------------------------------------------------------------------
02 Aug 2013
Hello Rohini,
In consultation with Parekh Sir, these are the responses to your queries:
What is the Demand of the Project
I take it that you do not want to know the ‘Market Demand’ of the project (as it is irrelevant for an ME project), but ‘What is Demanded of the Project’. Parekh Sir has already explained the Objective & Requirement of the project in detail in his mails.
In summary, he had said
the purpose is for the software to be able to “ Predict “ , WHO ( which Company or which Industry ) will advertise for WHAT ( Vacancy / Position / Designation ) and WHEN ( specific time in future )
The software will accomplish this by examining / analyzing millions of Job Advts thru PARSING / INDEXING its contents and graphically plotting Trends / Patterns , along a “ Specified Time Axis “
The Contents are :
· Every field of an Advt
· Millions of Sentences / Phrases / Keywords , contained in those advts and computing their Frequencies of Occurrences
Deadline of the Project
December 2013
Software Tools and Language
This project is to be part of our existing site www.customizeresume.com. Therefore the same software platform is to be used, which is
· ASP.Net 3.5
· MS SQL Server 2005
· C#
So, you can use Visual Studio 2008 or Visual Web Developer Express 2010. For SQL Server, you can use SQL Server Management Studio.
Place of Development
You can develop it from any place convenient to you, e.g. LTIT or the Institute where you are doing ME or Home. You may visit our office in Malad for discussion, showing what you have done, troubleshooting, etc.
Demonstrating it Outside
Once the project is approved by Parekh Sir, it will go online on www.customizeresume.com. So, it will be in the public domain and anyone can see it. You can send the link to anyone you want to demonstrate to. Parekh Sir is very generous in giving credit where due, so I am sure he will give due credit to you for your efforts.
Who will give Technical Help
You can contact me or my colleague Nitin Ruge for technical help. Nitin can be contacted at nitin.ruge@sentientsystems.net or 022 42666657.
Hope this answers your queries.
Regards
Shuklendu Baji
---------------------------------------------------------------------------------------------------------------------------------------------------
07 Aug 2013
Rohini
Pl ignore my earlier email of today morning – which , I had sent without looking at this
Anyway , Shuklendu’s answers to your queries are satisfactory
During one of our meetings , I had also talked to you about developing an “ Expert System “ , thru discovery of specific “ Co-relations “ amongst various Data Fields of 5 million job advts
Eg :
Ø What is the Co-relation between , any given “ Designation / Vacancy-Name / Advertized Position “ and “ Educational Qualifications “ ?
Here are some examples :
Ø Any designation such as “ Production Manager “ would call for an “ Engineering Degree / Diploma “ ( but never a CS / CA )
Ø Any designation in “ Finance Function “ will require ,
· B Com
· M Com
· CA etc
But never a BE ( M ) / BE ( Chem )
Ø Any designation at Manager level will call for a minimum experience of 5 years ( but never a Fresh Graduate with NIL experience )
Ø MBA / BBA / MMS etc are the most preferred Edu Qualifications for positions in Marketing
Ø No vacancy in an Automobile Manufacturing Company , will call for a degree in Pharmacy
Ø No Electrical Machinery Manufacturing company will ever demand a Medical Degree ( MBBS )
To a human mind , these ( rules ) are SOO OBVIOUS !
But , no human mind can write-down ALL of such RULES , in 2 minutes ! – something that your Data mining Software can – and will – do in 5 seconds !
All that you need , after computing “ Frequencies of Occurrences “ , is to :
Ø Plot the Co-efficients of Co-relations between various Fields ( of job advts )
Ø Compute Probabilities for each and create hundreds of Probability Tables
And , since a thousand new job advts are getting added to our Job Advt Database , daily , the SAMPLE SIZE is perpetually increasing – thereby , increasing the Accuracies of your Predictions !
Having done this , imagine the following scenario :
Recruitment Officer of Wipro , comes to our “ Post Job “ page and , in the field for “ Designation “ simply types ,
“ Business Analyst “
And Presto !
The entire Job Advt Form gets auto-filled , with MOST PROBABLE values !
Would not that amaze her ?
All that our software has done is analyzed job advts of all “ Software Companies “ ( an Industry ) – and of WIPRO – for the position of Business Analyst and filled in the most probable values
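The Probability Tables and the auto-fill scenario above could be sketched in Python roughly as follows. This is a minimal illustration under assumptions: the field names and the four past advts are hypothetical, and the " most probable value " is taken simply as the most frequent value seen for that Designation, which is one plausible reading of the idea rather than a confirmed design.

```python
from collections import Counter, defaultdict

def build_probability_tables(adverts, key_field="designation"):
    """tables[key][field] is a Counter of the values seen for that field."""
    tables = defaultdict(lambda: defaultdict(Counter))
    for ad in adverts:
        for field, value in ad.items():
            if field != key_field and value:
                tables[ad[key_field]][field][value] += 1
    return tables

def autofill(tables, designation):
    """Return {field: (most probable value, its probability)} for one designation."""
    suggestion = {}
    for field, counts in tables.get(designation, {}).items():
        value, n = counts.most_common(1)[0]
        suggestion[field] = (value, n / sum(counts.values()))
    return suggestion

# Hypothetical past advts; a real table would be built from the full database.
past_adverts = [
    {"designation": "Business Analyst", "education": "MBA", "experience_years": "3", "location": "Bangalore"},
    {"designation": "Business Analyst", "education": "MBA", "experience_years": "5", "location": "Bangalore"},
    {"designation": "Business Analyst", "education": "BE", "experience_years": "3", "location": "Pune"},
    {"designation": "Production Manager", "education": "BE", "experience_years": "8", "location": "Pune"},
]

tables = build_probability_tables(past_adverts)
form = autofill(tables, "Business Analyst")
for field, (value, probability) in form.items():
    print(f"{field}: {value} ({probability:.0%})")
```

Restricting past_adverts to one Industry or one Company before building the tables would give the industry-level and company-level suggestions mentioned above. A Designation that has never been seen returns an empty suggestion, so the form would simply stay blank.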
This is no rocket science !
We had actually , partially attempted it – albeit in a crude way – in our earlier web site , www.IndiaRecruiter.net
What surprises me is , how come no one has attempted this so far ! Especially , Naukri / TimesJobs / MonsterIndia , who have accumulated millions of job advts !
Any way , the fact that they have , so far , ignored this “ Line of Examination “ , will work to your advantage – making YOU the very first person in the entire world to come up with a PREDICTION MODEL in the area of JOBS
Let us keep our HORIZONS way wide
hcp
--------------------------------------------------------------------------------------------------------------------------------------
27 Aug 2013
I refer to our telecon today morning
From what I understood , your guide would like to know the “ Prior Knowledge “ in this area ( Research Papers )
I did some searching on Google and came across the following
All of these may not be directly / immediately relevant to our project , but these are worth going thru
You may even short-list 3 or 4 , that you may wish to submit to your guide
Next question ( of your guide ) is also very relevant , viz ; “ What do you expect to achieve thru this project ? How will it benefit Jobseekers / Employers / Edu Institutions / Policy Makers etc ? “
From my own past experience ( of designing 8 web sites over last 16 years ) , I have found that , this question cannot be satisfactorily answered by writing a long and esoteric description !
The best way to answer is to conceive / design / display , the User Interface !
That alone can force you to answer : “ When a visitor arrives on this web page , will he easily understand what he can select / click ? Will he intuitively expect , what ( texts / figures / graphs ) he will get to see when he takes any action ? “
Pl do prepare U/I and that would convince your guide !
hcp
http://www.cnts.ua.ac.be/papers/2000/extract00.pdf
http://edpath.typepad.com/source_scholars/2013/02/does-labor-market-intelligence-software-along-with-spidering-and-data-mining-of-online-job-advertise.html
http://www.trainingindustry.com/blog/blog-entries/new-providers-of-data-collection-and-analysis-of-online-job-ads-in-real-time-may-be-helping-community-colleges-create-better-curricula.aspx
http://www.youtube.com/watch?v=6LeUiFcfpyw
http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=5232798&url=http%3A%2F%2Fieeexplore.ieee.org%2Fiel5%2F52%2F5370751%2F05232798.pdf%3Farnumber%3D5232798
http://www.questia.com/library/1G1-277519055/changing-trends-in-lis-job-advertisements
http://lexicometrica.univ-paris3.fr/jadt/jadt2012/Communications/Fioredistella%20Iezzi,%20Domenica%20et%20al.%20-%20Text%20clustering%20based%20on%20centrality%20measures.pdf
https://www.google.co.in/url?sa=t&rct=j&q=&esrc=s&source=web&cd=38&cad=rja&ved=0CFwQFjAHOB4&url=http%3A%2F%2Fwww.lirgjournal.org.uk%2Flir%2Fojs%2Findex.php%2Flir%2Farticle%2Fdownload%2F499%2F548&ei=aDgcUrW8Osf_rQe_2YDQDg&usg=AFQjCNFLjH3CRQgJe7DmX8sfiO1Ju7rUIA&sig2=N7ObhwbL-Qmf5YRBYOAVRA&bvm=bv.51156542,d.bmk
http://books.google.co.in/books?id=ImJPbmcgF4wC&pg=PA590&lpg=PA590&dq=%22data+mining%22+%2B+%22job+advertisements%22&source=bl&ots=nowszCkygZ&sig=-IC9g67l6q9HqY9eC2H_LHn3VYE&hl=en&sa=X&ei=nDwcUsrAENCmrAemwgE&ved=0CEMQ6AEwAThG#v=onepage&q=%22data%20mining%22%20%2B%20%22job%20advertisements%22&f=false
https://www.google.co.in/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&ved=0CCsQFjAA&url=http%3A%2F%2Fwww.comp.hkbu.edu.hk%2F~william%2Fpapers_slides%2Fcheung_am_icdm02.ps&ei=-T0cUrHlLMfIrQfP9oCwDQ&usg=AFQjCNHwvqNLJP3WLpniR4VX348B6BXG4w&sig2=dUVpsLN0tXXowSTrAgwjVQ
http://etjanst.hb.se/bhs/ith/2-99/ja.htm
http://codecamp.fi/lib/exe/fetch.php/wiki/a_tool_for_visualizing_skill_requirements_in_ict_job_advertisements---preprint.pdf
http://www.bloomberg.com/news/2013-04-03/algorithms-play-matchmaker-to-fight-7-7-u-s-unemployment-jobs.html ( quite interesting ! –hcp )
http://it.vtc.edu.hk/itjobanalysis/
-------------------------------------------------------------------------------------------------------------------------------------------
01 Sept 2013
Following is a list of links from the first page of Google , when you type search term : “ Download Data Mining Software “
Altogether , there were more than 2 Million results !
Nearly all of these can be downloaded for FREE
After examining , you can decide if any of these can be gainfully employed for our project . If so , go ahead and download
Nothing can be discovered without “ experimenting “ !
Remember , too much of “ Analysis “ , often leads to “ Paralysis “ !
It is important to “ Get Going “ !
In the meantime , I hope you have gone thru the links that I sent to you earlier
In those , could you find any Research Papers that you want to submit to your guide ?
It has been over 6 weeks since we started talking about this project . It is high time we put this in “ Second Gear “ !
hcp
---------------------------------------------------------------------------------------------------------------------------------------------
02 Sept 2013
Take a look at the counter for “ Live Jobs “ on , http://www.customizeresume.com/Jobseeker/JobSearchConventional.aspx
Today , it reads …. 14,861
Some 6 months back , it read , approx 30,000
This counter is constructed from Jobs RSS Feeds from , Naukri / TimesJobs / MonsterIndia and ClickJobs
Hence , it is fairly representative of the job market in the organized sector
It would have been a fairly simple exercise to plot the daily figures in a graph to reveal the gradually declining no of jobs being advertized
That would not require use of any Data Mining tool
However , without applying some simple data mining tool , it would not be possible to answer the following questions :
Where is the greatest decline of jobs being advertized ? How much is the percentage decline ?
Ø In which Industry ?
Ø In which Company ?
Ø In which City ?
Ø In which Region ?
Ø In which Skills ?
Ø For which Positions ?
Ø For which Education Levels ? ………… etc
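The " where is the greatest decline , and by what percentage " questions above amount to counting advts per segment in two periods and comparing them. A minimal Python sketch follows; the field names , the ISO date format and the sample counts are hypothetical and only serve to show the arithmetic.

```python
from collections import Counter

def percent_change_by(adverts, field, earlier_month, later_month):
    """Percentage change in advt counts, per value of `field`, between two months."""
    earlier = Counter(ad[field] for ad in adverts if ad["post_date"].startswith(earlier_month))
    later = Counter(ad[field] for ad in adverts if ad["post_date"].startswith(later_month))
    # Only segments present in the earlier month have a defined percentage change.
    return {value: 100.0 * (later[value] - n) / n for value, n in earlier.items()}

# Hypothetical sample: IT advts halve between March and September, Pharma stays flat.
sample_adverts = (
    [{"industry": "IT", "post_date": "2013-03-10"}] * 4
    + [{"industry": "IT", "post_date": "2013-09-02"}] * 2
    + [{"industry": "Pharma", "post_date": "2013-03-15"}] * 2
    + [{"industry": "Pharma", "post_date": "2013-09-05"}] * 2
)

change = percent_change_by(sample_adverts, "industry", "2013-03", "2013-09")
worst = min(change, key=change.get)
print(change)
print("Greatest decline in :", worst)
```

Passing " city " , " company " or any other field name in place of " industry " answers the same question for that segment, provided the advt records carry that field.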
One could even co-relate these graphs with other , publicly available statistical data such as :
Ø IIP ( Index of Industrial Production )
Ø Stock Market Index
Ø Currency Exchange Rate ( eg; declining Rupee )
Ø Decline in GDP / Increasing Fiscal Deficit
Ø CAD ( Current Account Deficit )
Ø Foreign Investments
Ø Primary Bank Rates of RBI …. etc
With proper co-relations , one could even predict how much the job market will further shrink , over the next 6 months !
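As a sketch of what " co-relating " the job counts with an economic series might look like, the Python below computes the standard Pearson co-efficient of correlation between two monthly series. The numbers are invented purely for illustration ( they are not real job counts or real index values ) , and a high co-efficient on past data would not by itself guarantee a good forecast.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson co-efficient of correlation between two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length, at least 2 points each")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / (spread_x * spread_y)

# Invented monthly figures, for illustration only.
monthly_live_jobs = [30000, 27000, 24000, 20000, 17000, 15000]
monthly_indicator = [100, 98, 95, 91, 89, 86]  # stand-in for an index such as IIP

r = pearson(monthly_live_jobs, monthly_indicator)
print(f"co-efficient of correlation : {r:.3f}")
```

The same function can be applied to each of the indicators listed above, one at a time, to see which of them moves most closely with the number of jobs advertized.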
Such a “ Predictive Model of Job Market “ , would be of immense interest to , not only the economists but also to the HRD Ministry / Planning Commission / Educational Institutions and of course the students themselves
I believe you could now , accelerate the pace of your project
I await your reply
hcp
------------------------------------------------------------------------------------------------------------------------------------------
04 Sept 2013
Thank you for your email , mentioning that tomorrow you will let us know the exact status of the project
In the meantime , you may want to look up the following
InMobi Ad Network delivers every month , billions of Advts on millions of mobile phones in 165 countries
This portal shows some interesting methods of graphically displaying their findings , in a continuous / dynamic manner
You may , well , consider this ( portal ) to be THE LARGEST DATA MINING project ever undertaken ( barring , possibly , Google Analytics )
Although our project is very small ( only 5 million job advts and only approx 1000 added every day ) , we , too , should be able to present our analytics / findings in beautiful / meaningful graphs
And in our case , we want the visitors ( to our web site ) themselves , to be able to select any Search Parameter and be able to generate the graphs on-the-fly
hcp
http://www.inmobi.com/hstar/netres/netres.php?country=India&month=6&year=2013&cmonth=3&cyear=2013
--------------------------------------------------------------------------------------------------------------------------------------