Post by Jake Niemiec
https://github.com/segmentio/ui-box
Post by Jake Niemiec
React components are run client-side, meaning the text you are looking
for is inserted into the document after the page runs its <script> tags. I
would take a look at the Sources tab in Chrome; you can find all the loaded
scripts there.
I should think that JavaScript is involved. I am sure you asked a
similar question before when you were trying to scrape a website and
couldn't find the text in the HTML.
Colin
Post by fugee ohu
I'm not very good with the consoles in Chrome and Firefox, but I
couldn't find the text I was looking for in the source, even though it
seems to be displayed as text: the cursor changes to a vertical line on
mouse-over. I found the HTML below in the source. How does this HTML create
the text that displays?
<div class="ui-box product-description-main" id="j-product-description">
  <div class="ui-box-title">Product Description</div>
  <div class="ui-box-body">
    <div class="description-content" data-role="description" data-spm="1000023">
      <div class="loading32"></div>
    </div>
  </div>
</div>
--
You received this message because you are subscribed to the Google
Groups "Ruby on Rails: Talk" group.
To unsubscribe from this group and stop receiving emails from it, send
To view this discussion on the web visit
https://groups.google.com/d/msgid/rubyonrails-talk/8e0eb26a-517a-4216-bb9c-8bd05e4412a5%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Yes, within that context (JavaScript), how does it happen that the text
I'm viewing in the browser isn't visible in the source?
So far I'm trying to get up to the table, the last element shown below.
doc.at_css("div#j-product-description div.ui-box-body
div.description-content") gets me back the div class="description-content"
element, but doc.at_css("div#j-product-description div.ui-box-body
div.description-content div.origin-part") returns nil. There's a lot inside
kse:widget that I'm not including here.
<div class="ui-box product-description-main" id="j-product-description"
data-widget-cid="widget-27">
<div class="ui-box-title">Product Description</div>
<div class="ui-box-body">
<div class="description-content" data-role="description"
data-spm="1000023"><div class="origin-part"><p> <br> <br> <br> </p>
<kse:widget data-widget-type="relatedProduct" id="24226336" title="TOP"
type="relation">...</kse:widget>
It seems to me that you are going to have to identify the data source that
the in-page JavaScript is using to generate the dynamic table data, and
query that rather than trying to work everything out from the HTML (which
is just a template for the in-page script to fill). There's probably a JSON
URL somewhere that is being loaded into the page, and the script is
building from that. This entire approach is pretty fraught with peril,
though, because (like any scraping project, only more so) any change to the
scheme that the site's developer chooses to implement will break your
scraper immediately.
Following this path is going to force you to learn about how the site is
working on a code level -- and to figure out how they go from data to
presentation.
Another approach might be to use a headless browser on the server to
construct a "real" DOM of the page, and query that. To be clear -- I do not
recommend you follow this path -- I am noting it here to illustrate how
ridiculous this effort will be.
One way to visualize this difference is to use the Web Inspector in Safari
or Chrome to look at the differences between the raw HTML (Safari labels
this tab "Resources") and the DOM (Safari calls this "Elements"). There is
likely very little in common outside of the overall outline, if the page is
changing as dramatically as you describe. If you hunt through the Resources
tab (in Safari) you may find a link to a JSON file that is being required
into the page. Loading that URL, rather than the HTML, may give you a much
cleaner set of data (which you can parse directly using Ruby) rather than
trying to execute JS on your server in order to construct an HTML DOM that
you can parse with Nokogiri.
Walter
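Walter's suggestion of loading the JSON directly could be sketched roughly as follows, using only Ruby's standard library. The URL, the "description" key, and both helper names are hypothetical; the real endpoint and payload shape would have to be found in the browser's inspector, and may not be JSON at all.

```ruby
require "json"
require "net/http"
require "uri"

# Hypothetical endpoint: a placeholder, not the site's real data URL.
DESCRIPTION_URL = "https://example.com/product/description.json"

# Download the response body from the given URL as a string.
def fetch_json_text(url)
  Net::HTTP.get(URI(url))
end

# Pull the description out of the payload. The "description" key is an
# assumption about the payload shape, not something the site documents.
def description_from(json_text)
  data = JSON.parse(json_text)
  data["description"]
end

# Offline demonstration with a made-up payload, so nothing is fetched here.
sample = '{"description": "Example product text"}'
puts description_from(sample)
```

Against a real endpoint the call would be description_from(fetch_json_text(DESCRIPTION_URL)). As noted above, this still breaks whenever the site changes its URL or payload format.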